Phase E — Multi-Role E2E Test Harness Spec

Date: 2026-04-20
Status: Superseded by docs/specs/2026-04-21-e2e-strategy.md. The multi-role harness (Q1–Q10 locked decisions, 6 canonical scenarios, TestModule design, milestones M-E1–M-E7) is carried forward verbatim as the M4 track of the new spec. Read this doc for deep architectural rationale; read the 2026-04-21 spec for the full-stack strategy (web + iOS + Android + env matrix + CI cadences).
Superseded-by: docs/specs/2026-04-21-e2e-strategy.md
Scope: Build the automated multi-role (consumer ↔ professional) E2E harness for Ideony MVP 0: deterministic tenancy, clock control, GPS injection, 6 canonical scenarios on Playwright web + a direct Socket.IO integration layer.
Phase mapping: Phase E of the Ideony MVP 0 blueprint. Unblocks confident pre-demo regression + post-demo nightly gate.
Related docs:

/Users/acidrums7/.claude/projects/-Users-acidrums7-Documents-Coding-Lavoro-Projects-Ideony/memory/project_multi_role_e2e_decisions.md (locked Q1–Q10)
plans/research/2026-04-19-multi-role-e2e-deep-dive.md (5565-word SOTA deep-dive — rationale only)
plans/specs/2026-04-20-ux-phase-c-design.md, plans/specs/2026-04-20-ux-phase-d-design.md (style reference)
/Users/acidrums7/Documents/Coding/Lavoro/Projects/Ideony/CLAUDE.md (monorepo + conventions)

Table of Contents

1. Executive Summary
2. Locked Architecture (Q1–Q10)
3. Test Harness Components
4. Six Canonical Scenarios
5. Milestones M-E1 … M-E7
6. Directory Structure
7. CI + Local Integration
8. Risk List + Mitigations
9. NOT In Scope
10. Change Log

1. Executive Summary

Phase E delivers an automated multi-role E2E harness that catches consumer ↔ pro interaction bugs — booking state races, SOS cascade dispatch ordering, chat delivery across WebSocket, credential-trust ripple — that single-actor tests structurally miss. The harness uses Playwright TypeScript driving two browser contexts against a shared NestJS backend on the Hetzner dev instance, coordinated by a test-only module that exposes deterministic seeding, test_tenant isolation, FakeClock time control, and PostGIS geo injection. Six canonical 2-actor scenarios cover the marketplace’s critical interaction surface. Local-only for MVP 0 (pre-demo); nightly cron activates post-demo on the same Hetzner dev box. Zero ongoing cost (Playwright OSS, self-hosted runner, dev Clerk/Stripe/Novu already configured).

2. Locked Architecture (Q1–Q10)

Every decision below is locked. Full rationale lives in the memory file — do not reopen during impl.

#	Question	Choice	One-line rationale
Q1	Tenancy model	`test_tenant` UUID column + `TenantMiddleware` on shared dev DB	Avoids ephemeral-DB 30s startup penalty; Hetzner prod is repurposed as dev → no extra infra
Q2	GPS injection	Hybrid — BE `POST /test/geo-feed` default + 3 device-level smoke tests (web/iOS/Android)	Fast + deterministic for dispatch matching; 3 tests preserve client GPS upload coverage
Q3	Clock control	`ClockService` DI + `DelayService` BullMQ wrapper + `POST /test/advance-time`	SOS 30s countdown runs in BullMQ; per-request header alone misses cron/WS emitters
Q4	Test DSL	Maestro (mobile, deferred) + Playwright (web) — no TestRigor	€300/mo pre-revenue unjustified; lock-in risk; Maestro YAML already ~70% English-readable
Q5	Execution location	Local-only MVP 0; cron nightly post-demo on dev instance	Fast-dev philosophy; Mac M-series handles 4 pairs parallel; €0
Q6	Visual regression	Dedicated single-actor suite, deferred post-MVP 0	Separation of signal — multi-role snapshots flake; 44 screens baseline cost high
Q7	Mobile E2E orchestration	Playwright web only for MVP 0; Maestro added on mobile traction	Expo web renders 95% of mobile logic; Maestro Cloud €39–449/mo unjustified
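Q8	Fail-fast vs run-all	Hybrid: `--bail=1` default, `--no-bail` via `:full` script (the Playwright CLI itself spells bail-on-first-failure as `--max-failures=1` or `-x`, so the scripts map onto that)	Dev iteration wants fast feedback; nightly + pre-demo want full triage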
Q9	External services	All real dev envs (Stripe test, Clerk dev, Novu dev)	Mock drift is real maintenance tax; real integrations catch webhook-signature/JWT-claim/template drift
Q10	3-actor scenarios	Defer — 2-actor + single-actor admin integration tests	MVP 0 admin surface small (credentials only); 3-actor orchestration complexity > value

Infra side-effect from Q1: Hetzner 178.104.154.74 CAX11 is now the dev environment. Rename Dokploy project env production → development. Prod spins up fresh when revenue exists.

3. Test Harness Components

3.1 Backend test module — `apps/api/src/modules/test/`

Gated entirely by env flag ENABLE_TEST_ENDPOINTS=true. Module is conditionally imported in AppModule, so in production the entire module tree is absent from the Nest registry and every /test route returns 404. Double-gate with a TestGuard that re-checks the env flag + an X-Test-Tenant header signed with a shared secret (TEST_HARNESS_SECRET, rotated per env).

Routes (all prefix /test):

Route	Method	Body / Query	Response	Purpose
`/test/tenant/create`	POST	`{}`	`{ test_tenant: string }` (UUID v4)	Allocate a fresh tenant namespace for a scenario run
`/test/cleanup`	POST	`?tenant=<uuid>`	`{ deleted: { users, bookings, quotes, reviews, credentials, stripe_accounts, clerk_users, novu_subscribers } }`	Sweep all rows + external-service artefacts tagged with the tenant
`/test/geo-feed`	POST	`{ professionalId, lat, lng }`	`{ ok: true, updatedAt }`	Write PostGIS point into `ProfessionalProfile.location` for dispatch/matching
`/test/advance-time`	POST	`?ms=<int>`	`{ now: ISO8601, jobsFired: int }`	Advance `FakeClock` by `ms`, trigger every BullMQ job whose `runAt ≤ now`
`/test/seed/:scenario`	POST	`?tenant=<uuid>`	`{ seeded: object }`	Materialise a named scenario fixture (6 built-ins, see §4)
`/test/state/booking/:id`	GET	—	`{ status, … }`	Polling endpoint for state convergence (sync primitive B from research §4.4)
`/test/ws-tap/:channel`	GET	—	SSE stream	Proxy Redis `pub/sub` events for a tenant so Playwright can assert on events without WS client setup

Request chain: TenantMiddleware → TestGuard → handler (Nest runs middleware before guards). TenantMiddleware rejects any request without X-Test-Tenant header on /test/* routes. On non-test routes, if the header is present, the middleware scopes all Prisma queries via a ClsModule-stashed tenant ID; Prisma client extensions inject WHERE test_tenant = $1 OR test_tenant IS NULL into reads, and set test_tenant = $1 on writes for tagged tables.
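The scoping rules in the paragraph above can be sketched as pure functions over Prisma-style query arguments. This is only an illustration: the helper names (scopeRead, scopeWrite, scopeDelete) are hypothetical, and a real Prisma client extension would call something like them from its query hooks.

```typescript
// Hypothetical helpers; the where/data shapes mimic Prisma query arguments.
type Where = Record<string, unknown>;
type Data = Record<string, unknown>;

/** Reads see the tenant's own rows plus untagged (NULL) rows. */
function scopeRead(where: Where | undefined, tenant: string): Where {
  const tenantFilter = { OR: [{ test_tenant: tenant }, { test_tenant: null }] };
  return where ? { AND: [where, tenantFilter] } : tenantFilter;
}

/** Writes are always tagged with the tenant, overriding any caller-supplied value. */
function scopeWrite(data: Data, tenant: string): Data {
  return { ...data, test_tenant: tenant };
}

/** Deletes match tagged rows only, so untagged rows stay out of reach. */
function scopeDelete(where: Where | undefined, tenant: string): Where {
  const tenantFilter = { test_tenant: tenant };
  return where ? { AND: [where, tenantFilter] } : tenantFilter;
}
```

Wrapping the caller's filter in AND rather than merging keys avoids clobbering a caller that already uses OR at the top level.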

Files to add:

apps/api/src/modules/test/test.module.ts
apps/api/src/modules/test/test.controller.ts
apps/api/src/modules/test/test.service.ts
apps/api/src/modules/test/test.guard.ts
apps/api/src/modules/test/fixtures/*.ts (6 scenario seeders)
apps/api/src/common/tenant/tenant.middleware.ts
apps/api/src/common/tenant/tenant.cls-store.ts (ClsModule-backed)
apps/api/src/common/prisma/tenant-extension.ts (Prisma client extension scoping reads/writes)

3.2 Clock abstraction — `apps/api/src/common/clock/`

ClockService wraps every new Date() / Date.now() in business logic. Two implementations:

SystemClockService (prod, default) — returns new Date() verbatim; advance() throws ForbiddenException.
FakeClockService (test, activated by ENABLE_TEST_ENDPOINTS=true) — holds an internal offset + mutex, now() returns new Date(Date.now() + offsetMs), advance(ms) bumps the offset + emits an internal ClockAdvanced event.

Refactor surface (~40 call sites): BookingService, QuoteService, SOSService, AvailabilityService, ReviewService, CredentialsService, BullMQ processors. Replace every new Date() + Date.now() with this.clock.now() / this.clock.ms().
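A minimal sketch of the two clock implementations, kept free of NestJS so it runs standalone. It assumes the interface exposes now(), ms(), and advance(); a plain Error stands in for ForbiddenException, the mutex and ClockAdvanced event are omitted, and isQuoteExpired is a hypothetical call site shown only to illustrate the refactor.

```typescript
interface ClockService {
  now(): Date;
  ms(): number;
  advance(ms: number): void;
}

class SystemClockService implements ClockService {
  now(): Date { return new Date(); }
  ms(): number { return Date.now(); }
  advance(_ms: number): void {
    // The spec calls for ForbiddenException; Error keeps this sketch dependency-free.
    throw new Error("advance() is not allowed on the system clock");
  }
}

class FakeClockService implements ClockService {
  private offsetMs = 0;

  now(): Date { return new Date(Date.now() + this.offsetMs); }
  ms(): number { return Date.now() + this.offsetMs; }

  advance(ms: number): void {
    // Rejecting negative values keeps the fake clock monotonic.
    if (ms < 0 || Math.floor(ms) !== ms) throw new RangeError("ms must be a non-negative integer");
    this.offsetMs += ms;
  }
}

// Hypothetical call site: business logic depends on the interface, never on Date directly.
function isQuoteExpired(clock: ClockService, createdAtMs: number, ttlMs: number): boolean {
  return clock.ms() >= createdAtMs + ttlMs;
}
```

With this shape, a test can create a quote, call advance(48 * 3_600_000), and observe the expiry without waiting.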

3.3 Delay abstraction — `apps/api/src/common/delay/`

DelayService.schedule(queue, jobData, delayMs) wraps queue.add(…, { delay }) calls. In test mode, registers each pending delay against FakeClockService — when /test/advance-time fires, DelayService promotes every delay whose dueAt ≤ clock.now() into an immediate queue.add(…, { delay: 0 }). This covers:

SOS 30s countdown (sos-countdown queue)
Booking reminders (booking-reminders queue, 24h + 1h before)
Quote expiry (quote-expiry queue, 48h TTL)
Review prompts (review-prompts queue, post-completion)

Key invariant: production SystemClockService + direct queue.add path is unchanged; DelayService becomes a pure pass-through in prod (no test hooks loaded).
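The test-mode bookkeeping described above can be sketched as an in-memory registry. TestDelayRegistry and its fields are hypothetical names; in the real service each entry returned by flushDueBy would be re-added to its BullMQ queue with delay 0, and the current time would come from FakeClockService rather than being passed in.

```typescript
interface PendingDelay<T> {
  queue: string;
  jobData: T;
  dueAtMs: number;
}

class TestDelayRegistry<T> {
  private pending: PendingDelay<T>[] = [];

  /** Records a delayed job against the fake clock instead of a real timer. */
  schedule(queue: string, jobData: T, delayMs: number, nowMs: number): void {
    this.pending.push({ queue, jobData, dueAtMs: nowMs + delayMs });
  }

  /** Removes and returns every job whose due time has passed, earliest first. */
  flushDueBy(nowMs: number): PendingDelay<T>[] {
    const due = this.pending.filter((p) => p.dueAtMs <= nowMs);
    this.pending = this.pending.filter((p) => p.dueAtMs > nowMs);
    return due.sort((a, b) => a.dueAtMs - b.dueAtMs);
  }

  get size(): number {
    return this.pending.length;
  }
}
```

Advancing by 30 s would then release the SOS countdown while the 48 h quote expiry stays pending.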

3.4 Tenancy column migration

Prisma migration 20260420_add_test_tenant adds:

test_tenant UUID NULL column on: User, ProfessionalProfile, Booking, Quote, Review, Credential, ChatMessage, ChatThread, Dispatch, Notification, PushToken.
Index CREATE INDEX idx_<table>_test_tenant ON <table>(test_tenant) WHERE test_tenant IS NOT NULL; (partial index — prod rows have NULL, zero overhead).
NOT added to: Category, Service, I18nKey, other reference tables that are shared across tenants by design.

Prod rows = test_tenant IS NULL → invisible to tenant-scoped reads + safe from tenant-scoped deletes.

3.5 Tenant middleware

TenantMiddleware (Nest NestMiddleware) runs on every request. Behavior:

If X-Test-Tenant header missing → pass through (prod path).
If present → validate as UUID, validate TEST_HARNESS_SECRET signature header, push tenant into ClsModule store for the request lifecycle.
Prisma client extension reads from CLS store and injects test_tenant into create/update payloads, WHERE clauses on find/findMany, WHERE clauses on delete/deleteMany — for every tagged table.
External service calls (Stripe, Clerk, Novu) prefix resources with test_tenant:<uuid>: in metadata/subscriber IDs so /test/cleanup can sweep them.
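The header validation step could look roughly like the sketch below. The spec does not say how the header is signed, so HMAC-SHA256 over the tenant UUID, hex-encoded, is an assumption here, as are the function names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

/** Assumed scheme: hex HMAC-SHA256 of the tenant ID, keyed by TEST_HARNESS_SECRET. */
function signTenant(tenant: string, secret: string): string {
  return createHmac("sha256", secret).update(tenant).digest("hex");
}

function isValidTenantRequest(tenant: string, signature: string, secret: string): boolean {
  if (!UUID_V4.test(tenant)) return false;
  const expected = Buffer.from(signTenant(tenant, secret), "hex");
  const actual = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.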

3.6 Test-harness HTTP client — `apps/mobile/test/e2e-web/lib/test-api.ts`

Thin wrapper around the BE test endpoints used by Playwright specs + shared fixtures:

declare class TestApi {
  // Tenant lifecycle
  createTenant(): Promise<string>
  cleanupTenant(id: string): Promise<void>
  // Deterministic inputs: fixtures, clock, geo
  seedScenario(name: ScenarioName, tenant: string): Promise<SeedResult>
  advanceClock(ms: number): Promise<void>
  feedGeo(proId: string, lat: number, lng: number): Promise<void>
  // Sync primitives: polling for state, SSE tap for events
  getBookingState(id: string): Promise<BookingState>
  waitForBookingStatus(id: string, status: BookingStatus, opts?: { timeoutMs?: number }): Promise<void>
  waitForWsEvent(channel: string, predicate: (ev: unknown) => boolean, opts?: { timeoutMs?: number }): Promise<Event>
}

Exposes higher-level sync helpers (waitForBookingStatus, waitForWsEvent) implementing the research-doc §4.4 primitive split — polling for state convergence, SSE tap for event-fired assertions.
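The polling half of that split can be sketched as a generic helper. pollUntil and its option defaults are hypothetical; waitForBookingStatus would presumably pass a reader that calls GET /test/state/booking/:id and a predicate on the status field.

```typescript
interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

/** Re-reads a value until the predicate holds or the timeout elapses. */
async function pollUntil<T>(
  read: () => Promise<T>,
  done: (value: T) => boolean,
  { timeoutMs = 5_000, intervalMs = 100 }: PollOptions = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (done(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`pollUntil: condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Polling suits state convergence because a missed intermediate state does not matter; event-fired assertions still need the SSE tap, since a poll cannot prove that an event was emitted.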

3.7 Cleanup contract

Every scenario calls cleanupTenant(id) in afterAll. The endpoint performs:

DELETE FROM <tagged_table> WHERE test_tenant = $1 (11 tables, FK-safe order via Prisma cascade rules).
stripe.accounts.del(…) for every Connect account whose metadata.test_tenant === id.
clerk.users.deleteUser(…) for every user whose privateMetadata.test_tenant === id.
novu.subscribers.delete(…) for every subscriber ID prefixed test_tenant:<id>:.
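BullMQ: remove every delayed, waiting, or active job whose job.data.test_tenant === id. queue.clean(grace, limit, type) accepts one job type per call and cannot filter on job data, so the sweep lists candidates via queue.getJobs(['delayed', 'waiting', 'active']) and calls job.remove() on the matches (a job that is still locked by a worker may have to finish or fail before it can be removed).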

Sweep is idempotent — safe to call twice (e.g. after a crash).
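One way to keep the sweep idempotent and extensible is a registry of per-resource sweepers that each report how many artefacts they removed. The sketch below uses in-memory arrays as stand-ins for tagged tables; CleanupRegistry and tableSweeper are hypothetical names, and real sweepers for Stripe, Clerk, and Novu would be asynchronous.

```typescript
type Sweeper = (tenant: string) => number; // returns how many artefacts were removed

class CleanupRegistry {
  private sweepers: { name: string; sweep: Sweeper }[] = [];

  register(name: string, sweep: Sweeper): void {
    this.sweepers.push({ name, sweep });
  }

  /** Runs every sweeper in registration order (children before parents for FK safety). */
  sweep(tenant: string): Record<string, number> {
    const deleted: Record<string, number> = {};
    for (const { name, sweep } of this.sweepers) deleted[name] = sweep(tenant);
    return deleted;
  }
}

/** In-memory stand-in for DELETE FROM <table> WHERE test_tenant = $1. */
function tableSweeper(rows: { test_tenant: string | null }[]): Sweeper {
  return (tenant) => {
    const before = rows.length;
    // Mutate in place so the "table" keeps its identity across calls.
    for (let i = rows.length - 1; i >= 0; i--) {
      if (rows[i].test_tenant === tenant) rows.splice(i, 1);
    }
    return before - rows.length;
  };
}
```

A second call for the same tenant finds nothing to delete and reports zero counts, which is the property that makes a post-crash retry safe.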

4. Six Canonical Scenarios

Each scenario = 1 Playwright spec file + 1 named seed fixture. All 2-actor (consumer + pro). Runtime target ≤ 2 min each on M-series laptop.

4.1 Scenario 01 — Booking full cycle

File: apps/mobile/test/e2e-web/scenarios/01-booking-full-cycle.spec.ts
Seed: booking_full_cycle_rome, with 1 consumer, 1 pro (verified trust tier), 1 category (plumbing), pro at 41.9028, 12.4964 (central Rome).

Actors:

Consumer: logs in, searches “idraulico”, picks pro, books instant slot.
Pro: receives booking, accepts via dashboard, marks arrived, marks completed.

Happy path:

Consumer opens /, searches category → ProCard for seeded pro visible.
Consumer taps pro → /professional/[id] → “Prenota” → /book/[professionalId] → picks slot (next hour) → pays via Stripe test card 4242… → booking created in PENDING_ACCEPTANCE.
waitForWsEvent('booking:new', { bookingId }) fires on pro’s WS channel within 5s.
Pro dashboard → Requests tab → card shows → taps “Accetta” → confirmation bottom sheet → booking → ACCEPTED.
advanceClock(3600_000) → clock is now scheduled start time.
Pro taps “Sono arrivato” → booking IN_PROGRESS.
Pro taps “Completa” → booking COMPLETED, Stripe PaymentIntent captured.
Consumer sees receipt + review prompt within 3s.

Assertions:

Booking status transitions: CREATED → PENDING_ACCEPTANCE → ACCEPTED → IN_PROGRESS → COMPLETED in DB (verify via /test/state/booking/:id).
Stripe PaymentIntent state requires_capture → succeeded.
Novu event log contains booking.accepted + booking.completed for consumer subscriber.
Pro earnings tab shows the amount in “pending” bucket.
Cleanup sweeps all of the above.

Timing: 1 clock advance (+1h for scheduled start). No geo feed needed.
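The status-transition assertion can be checked against polled samples with a small helper. Polling may skip short-lived states, so this sketch (with a hypothetical function name) only verifies that the observed statuses never move backwards along the expected path and that the last sample is COMPLETED.

```typescript
type BookingStatus =
  | "CREATED"
  | "PENDING_ACCEPTANCE"
  | "ACCEPTED"
  | "IN_PROGRESS"
  | "COMPLETED";

const FULL_CYCLE: BookingStatus[] = [
  "CREATED",
  "PENDING_ACCEPTANCE",
  "ACCEPTED",
  "IN_PROGRESS",
  "COMPLETED",
];

/** True when samples only move forward along FULL_CYCLE and end in COMPLETED. */
function isForwardOnlyFullCycle(observed: BookingStatus[]): boolean {
  let last = -1;
  for (const status of observed) {
    const idx = FULL_CYCLE.indexOf(status);
    if (idx < last) return false; // status went backwards
    last = idx;
  }
  return last === FULL_CYCLE.length - 1;
}
```

Repeated samples of the same status are tolerated, which matters when the spec polls faster than the booking changes.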

4.2 Scenario 02 — SOS cascade dispatch

File: scenarios/02-sos-cascade.spec.ts
Seed: sos_burst_pipe_rome, with 1 consumer at 41.9028, 12.4964 and 3 pros (p1, p2, p3) at concentric distances (1 km, 3 km, 8 km), all plumbing-qualified, all online.

Actors:

Consumer: opens SOS flow, describes “tubo scoppiato”, confirms dispatch.
Pro1 (nearest): receives first offer, ignores (times out).
Pro2 (middle): receives cascaded offer, accepts.
Pro3 (farthest): never sees the offer.

Happy path:

Consumer: /(welcome) → / → holds SOS tab → /sos → describes problem → confirm.
feedGeo('p1', 41.9040, 12.4960) etc. — positions set via BE, dispatch matching picks p1 first.
waitForWsEvent('sos:offer', { proId: 'p1' }).
advanceClock(30_000) — p1 countdown expires; BullMQ sos-countdown job fires; dispatch cascades to p2.
waitForWsEvent('sos:offer', { proId: 'p2' }).
Pro2 (2nd Playwright context) taps “Accetta” on SOS screen.
waitForBookingStatus(booking, 'ACCEPTED').
Pro2 enters live-tracking flow — feedGeo called 5× over simulated 10min (clock advanced 2min per tick) to simulate travel.
Pro2 taps “Sono arrivato” → IN_PROGRESS.

Assertions:

Dispatch rows: p1 status OFFERED → EXPIRED, p2 status OFFERED → ACCEPTED, p3 status never created.
Booking status: CREATED → DISPATCHING → ACCEPTED → IN_PROGRESS.
Pro3 WebSocket never received sos:offer for this booking.
Consumer receives exactly one booking.accepted Novu notification (not two from cascade race).

Timing: 1 geo seed (3 points), 6 clock advances (30s expiry + 5× 2min travel). Heaviest scenario.
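The expected cascade order can be modelled without the backend, which is handy for the cascade-ordering assertion helper. The sketch below uses a haversine distance as a stand-in for the PostGIS matching the real dispatcher performs; runCascade and the reply map are hypothetical.

```typescript
interface GeoPoint { lat: number; lng: number; }
interface Pro extends GeoPoint { id: string; }
type Reply = "accept" | "timeout";

const EARTH_RADIUS_KM = 6371;

function haversineKm(a: GeoPoint, b: GeoPoint): number {
  const rad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLng = rad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

/** Offers nearest-first; stops at the first acceptance, so farther pros are never offered. */
function runCascade(
  consumer: GeoPoint,
  pros: Pro[],
  replies: Record<string, Reply>,
): { offered: string[]; acceptedBy: string | null } {
  const ordered = [...pros].sort(
    (x, y) => haversineKm(consumer, x) - haversineKm(consumer, y),
  );
  const offered: string[] = [];
  for (const pro of ordered) {
    offered.push(pro.id);
    if (replies[pro.id] === "accept") return { offered, acceptedBy: pro.id };
  }
  return { offered, acceptedBy: null };
}
```

For the seeded layout, p1 timing out and p2 accepting yields offers to p1 and p2 only, matching the negative assertion on p3.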

4.3 Scenario 03 — Consumer cancel with refund

File: scenarios/03-cancel-refund.spec.ts
Seed: Same as scenario 01 but booking pre-seeded in ACCEPTED state, scheduled 3h from test clock start.

Actors:

Consumer: opens booking detail, cancels.
Pro: receives cancellation notice, sees updated calendar.

Happy path:

Consumer /booking/[id] → “Annulla prenotazione” → confirmation sheet → confirm.
BE applies cancellation policy (> 2h notice = full refund).
Stripe refund created.
Pro’s WS booking:cancelled event fires.
Pro’s calendar slot freed.

Assertions:

Booking status: ACCEPTED → CANCELLED_BY_CONSUMER.
Stripe refund exists, amount = full booking price.
Pro Availability table shows the slot no longer blocked.
Both actors see identical cancellation reason + amount refunded.

Timing: No clock advance needed (policy check uses clock.now() which test harness sets relative to scheduled time via seed).
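The policy check this scenario exercises reduces to one comparison. The sketch below encodes only the rule stated above (more than 2h notice means a full refund); what happens at or below 2h is not specified here, so the function returns a boolean rather than an amount, and its name is hypothetical.

```typescript
const FULL_REFUND_NOTICE_MS = 2 * 60 * 60 * 1000; // "> 2h notice" from the policy step

/** True when the cancellation arrives strictly more than 2h before the scheduled start. */
function qualifiesForFullRefund(scheduledStart: Date, now: Date): boolean {
  return scheduledStart.getTime() - now.getTime() > FULL_REFUND_NOTICE_MS;
}
```

With the seed scheduling the booking 3h after the test clock start, the check passes without any clock advance; in the backend, now would come from clock.now().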

4.4 Scenario 04 — Chat delivery under disconnection

File: scenarios/04-chat-delivery.spec.ts
Seed: Booking in ACCEPTED state; chat thread auto-created.

Actors:

Consumer: sends messages, briefly disconnects, reconnects.
Pro: receives all messages in order.

Happy path:

Consumer sends M1 “A che ora arrivi?” → pro receives within 1s.
Pro sends M2 “Entro 15 min” → consumer receives within 1s.
Consumer closes browser tab (Playwright context.close()), opens fresh tab, re-auths.
While offline, pro sends M3 + M4.
Consumer reconnects → chat:sync loads M3 + M4 in chronological order.
Consumer sends M5 with optimistic UI → confirmed delivered within 1s.

Assertions:

DB ChatMessage rows: 5 total, chronological createdAt.
No duplicate messages (test the idempotency key).
Read receipts fire bidirectionally.
Both actors’ UI shows same last-message + unread counts.

Timing: No clock advance. Connection-level test.
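The no-duplicates and chronological-order assertions both depend on how a reconnecting client merges chat:sync results with what it already holds. A minimal sketch, assuming each message carries an idempotency key and an ISO 8601 createdAt (the field names are hypothetical):

```typescript
interface ChatMessage {
  idempotencyKey: string;
  body: string;
  createdAt: string; // ISO 8601 timestamp
}

/** Merges local and synced messages, dropping redeliveries and sorting oldest first. */
function mergeMessages(local: ChatMessage[], synced: ChatMessage[]): ChatMessage[] {
  const seen = new Set<string>();
  const merged: ChatMessage[] = [];
  for (const msg of [...local, ...synced]) {
    if (seen.has(msg.idempotencyKey)) continue; // duplicate delivery, keep the first copy
    seen.add(msg.idempotencyKey);
    merged.push(msg);
  }
  return merged.sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}
```

The same key also protects the optimistic M5 send: if the client retries after a dropped acknowledgement, the second copy is discarded.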

4.5 Scenario 05 — Credential submission → approval → trust tier ripple

File: scenarios/05-credential-approval.spec.ts
Seed: Pro with trustTier = BASIC (score 10), no credentials yet. One consumer browsing.

Actors:

Pro (actor 1): submits P_IVA + INSURANCE credentials via upload flow.
Admin (not an orchestrated browser, per Q10): approval runs as a single-actor step inside this scenario via a direct BE API call.
Consumer (actor 2): searches, sees pro, observes trust badge before + after approval.

Happy path:

Pro: /credentials → upload P_IVA → /credentials/me/:id/upload-url → S3 presigned PUT → status PENDING.
Pro uploads INSURANCE similarly.
Admin: BE direct call POST /admin/credentials/:id/approve for both, via test harness with admin service token.
Trust engine re-computes: P_IVA (30) + INSURANCE (25) + base (10) = 65 → tier VERIFIED.
Consumer (fresh tab, same tenant) searches category → pro card now shows VerifiedBadge.
Novu notification credential.approved delivered to pro’s subscriber.

Assertions:

Credential.status = APPROVED for both rows.
ProfessionalProfile.trustScore = 65, trustTier = VERIFIED.
Consumer search response payload includes trustTier: 'VERIFIED'.
Rate limit sanity: 3rd upload attempt within same day → 429 (Redis counter at rate:credential-upload:<proId>:<YYYY-MM-DD>).

Timing: No clock advance. Tests the trust-ripple path Q10 flagged as needing coverage.
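The score arithmetic and the rate-limit key from this scenario can be written down directly. The point values and key shape come from the steps above; the UTC day boundary, the limit of two uploads per day (inferred from the 3rd attempt returning 429), and all function names are assumptions. The tier threshold is not stated in this section, so no tier mapping is sketched.

```typescript
const BASE_SCORE = 10;
const CREDENTIAL_POINTS: Record<string, number> = { P_IVA: 30, INSURANCE: 25 };

/** Base score plus the points of every approved credential type. */
function trustScore(approved: string[]): number {
  return approved.reduce((sum, kind) => sum + (CREDENTIAL_POINTS[kind] ?? 0), BASE_SCORE);
}

/** Key shape: rate:credential-upload:<proId>:<YYYY-MM-DD>; UTC day is an assumption. */
function uploadRateKey(proId: string, day: Date): string {
  return `rate:credential-upload:${proId}:${day.toISOString().slice(0, 10)}`;
}

const DAILY_UPLOAD_LIMIT = 2; // inferred: the 3rd attempt in a day returns 429

/** In-memory stand-in for a Redis INCR on the key, returning the HTTP status. */
function uploadStatus(counters: Record<string, number>, key: string): 200 | 429 {
  const next = (counters[key] ?? 0) + 1;
  counters[key] = next;
  return next > DAILY_UPLOAD_LIMIT ? 429 : 200;
}
```

For the seeded pro, approving both credentials gives 10 + 30 + 25 = 65, the value the assertion on trustScore expects.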

4.6 Scenario 06 — Rating + review round-trip

File: scenarios/06-review-round-trip.spec.ts
Seed: Completed booking from scenario 01’s end state (can chain or re-seed).

Actors:

Consumer: submits 5-star review.
Pro: sees rating reflected on profile + dashboard.

Happy path:

Consumer: /review/[bookingId] → 5 stars → “Ottimo lavoro, puntuale.” → submit.
DB review row created; pro’s aggregate rating recomputed.
Pro dashboard polls → rating badge updates.
Consumer profile shows the review in their history.
advanceClock(604_800_000) (+7 days) → review “editable window” closes.
Consumer attempts to edit review → 403 forbidden.

Assertions:

Review.rating = 5, Review.comment matches.
ProfessionalProfile.ratingAvg + ratingCount updated atomically.
Review appears in public /professional/[id] page for a 3rd unauthenticated browser context (verifies cache invalidation).
Post-clock-advance edit returns 403 with i18n-keyed error message.

Timing: 1 clock advance (+7 days).
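Two pieces of logic sit behind these assertions: the edit-window check and the aggregate update. The sketch below assumes the editable window is exactly 7 days (consistent with the +7 day advance closing it) and that the aggregate is maintained incrementally; both are assumptions, as are the function names.

```typescript
const EDIT_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 604_800_000 ms, assumed window length

/** A review stays editable while strictly less than the window has elapsed. */
function canEditReview(createdAtMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs < EDIT_WINDOW_MS;
}

interface RatingAggregate {
  ratingAvg: number;
  ratingCount: number;
}

/** Folds one new rating into the running average without rescanning all reviews. */
function addRating(agg: RatingAggregate, rating: number): RatingAggregate {
  const ratingCount = agg.ratingCount + 1;
  const ratingAvg = (agg.ratingAvg * agg.ratingCount + rating) / ratingCount;
  return { ratingAvg, ratingCount };
}
```

In the backend the two aggregate fields would need to be written in one statement or transaction to satisfy the atomicity assertion.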

5. Milestones M-E1 … M-E7

All independently PR-mergeable. Dependency order strict. Each milestone targets ≤ 2 working days.

M-E1 — Test module scaffold + TenantMiddleware + migration (2 days)

Scope:

Prisma migration 20260420_add_test_tenant — column + partial indexes on 11 tables.
apps/api/src/modules/test/ — module, controller, service, guard (env + secret gate).
TestController routes: POST /test/tenant/create, POST /test/cleanup (rows only — external sweeps deferred to M-E6).
TenantMiddleware + ClsModule wiring + Prisma client extension for tenant scoping.
Env var ENABLE_TEST_ENDPOINTS + TEST_HARNESS_SECRET added to .env.example + Dokploy dev config.
Unit tests for middleware (tenant-scoped reads, tenant-tagged writes, prod-path bypass when header absent).

Exit criteria:

Migration applies cleanly on dev DB (verify partial-index presence).
With ENABLE_TEST_ENDPOINTS=false → module tree absent from the Nest registry; routes return 404.
With flag on → POST /test/tenant/create returns UUID; POST /test/cleanup deletes tenant-tagged rows only.
Tests green, no prod-path regression.

Deps: none (foundational).

M-E2 — ClockService + DelayService refactor (2 days)

Scope:

apps/api/src/common/clock/ — ClockService abstract, SystemClockService, FakeClockService.
apps/api/src/common/delay/ — DelayService.schedule() wrapper.
Refactor ~40 call sites across Booking, Quote, SOS, Availability, Review, Credentials + 4 BullMQ processors to use clock.now() + delay.schedule().
POST /test/advance-time?ms=N endpoint wired to FakeClockService.advance() + DelayService.flushDueBy(clock.now()).
Unit tests: FakeClock monotonicity, DelayService job promotion, production throw-on-advance.

Exit criteria:

Zero new Date() / Date.now() calls in business-logic paths (enforced via Biome custom rule OR ripgrep CI gate).
/test/advance-time fires due BullMQ jobs within same request cycle.
All 240+ existing BE tests still green.

Deps: M-E1 (test module for endpoint).

M-E3 — Geo feed + SSE event tap + seed fixtures (1.5 days)

Scope:

POST /test/geo-feed writes PostGIS point via ProfessionalService.updateLocation().
GET /test/ws-tap/:channel SSE endpoint — subscribes to Redis events:* pattern scoped by tenant, streams filtered events to Playwright.
6 scenario seed fixtures under apps/api/src/modules/test/fixtures/ — each idempotent, tenant-scoped.
POST /test/seed/:scenario?tenant=<uuid> dispatcher.

Exit criteria:

Manually: curl POST /test/geo-feed → SELECT ST_AsText(location) FROM professional_profile returns updated point.
curl SSE stream against tap receives Redis-published events in real time.
All 6 scenarios seed without FK violations; rerun on same tenant is idempotent.

Deps: M-E1, M-E2.

M-E4 — Playwright harness + test-api.ts + Scenario 01 (2 days)

Scope:

apps/mobile/test/e2e-web/ workspace — playwright.config.ts, package.json (new scripts test:e2e, test:e2e:full), tsconfig.json.
lib/test-api.ts HTTP client (§3.6).
lib/sync.ts — waitForBookingStatus (polling) + waitForWsEvent (SSE consumer).
lib/fixtures.ts — Playwright fixtures: testTenant, testApi, consumerPage, proPage.
lib/artifacts.ts — failure artefact bundler (screenshots + WS transcript + BE log snapshot).
scenarios/01-booking-full-cycle.spec.ts.
Husky pre-push does not run E2E (too slow; unit tests only per Q5).

Exit criteria:

pnpm test:e2e -- --grep "Scenario 01" passes on local Mac against dev BE.
Two browser contexts coordinate, test cleans up after itself (verify DB row count = 0 for tenant post-run).
Failure run bundles artefacts to apps/mobile/test/e2e-web/artifacts/<tenant-id>/.

Deps: M-E1, M-E2, M-E3.

M-E5 — Scenarios 02 (SOS) + 03 (cancel) + 04 (chat) (2 days)

Scope:

3 scenario specs + seeds.
lib/stripe-test-cards.ts helper.
Disconnection/reconnection utilities in lib/sync.ts.
SOS-specific: cascade-ordering assertion helper.

Exit criteria:

All 3 scenarios pass 10 consecutive runs locally with zero flakes (long-run target: flake rate < 1%).
SOS cascade: p3 never sees offer (negative assertion).

Deps: M-E4.

M-E6 — Scenarios 05 + 06 + external service cleanup (2 days)

Scope:

Scenarios 05 + 06 specs + seeds.
Extend /test/cleanup to sweep Stripe (accounts + customers), Clerk (users), Novu (subscribers) scoped by tenant metadata prefix.
Admin direct-API helper in lib/admin-api.ts (service-token auth for credential approve/reject).

Exit criteria:

Scenarios 05 + 06 green.
Post-cleanup verification: Stripe account list filtered by metadata.test_tenant=<id> returns zero; same for Clerk users; same for Novu subscribers.
Twilio spend-alert webhook configured (Q9 budget guard, €20/mo threshold).

Deps: M-E5.

M-E7 — Scripts + docs + pre-demo smoke (1 day)

Scope:

Root pnpm test:e2e → pnpm --filter @ideony/mobile-e2e-web test:e2e (bail on first fail per Q8).
Root pnpm test:e2e:full → same without bail.
apps/mobile/test/e2e-web/README.md — how to run, how to add a scenario, how to read failure artefacts.
scripts/e2e-smoke.sh — pre-demo invoker using :full variant + Slack webhook on completion.
CLAUDE.md update (Testing section) documenting the new commands.

Exit criteria:

pnpm test:e2e end-to-end green, wall time < 15 min for all 6 scenarios on laptop.
Smoke script integrates with demo-prep runbook.
Post-demo TODO ticket filed: “Add cron 0 3 * * * on Hetzner dev instance running /opt/ideony/scripts/e2e-smoke.sh w/ Slack alerts.”

Deps: M-E6.

Total: 7 milestones, ~12.5 working days.

6. Directory Structure

apps/api/
├── prisma/
│   └── migrations/
│       └── 20260420_add_test_tenant/migration.sql
├── src/
│   ├── common/
│   │   ├── clock/
│   │   │   ├── clock.service.ts            # abstract
│   │   │   ├── system-clock.service.ts     # prod impl
│   │   │   └── fake-clock.service.ts       # test impl
│   │   ├── delay/
│   │   │   └── delay.service.ts            # BullMQ wrapper
│   │   ├── prisma/
│   │   │   └── tenant-extension.ts         # client extension
│   │   └── tenant/
│   │       ├── tenant.cls-store.ts
│   │       └── tenant.middleware.ts
│   └── modules/
│       └── test/
│           ├── test.module.ts              # conditional import
│           ├── test.controller.ts
│           ├── test.service.ts
│           ├── test.guard.ts
│           └── fixtures/
│               ├── booking-full-cycle-rome.ts
│               ├── sos-burst-pipe-rome.ts
│               ├── cancel-with-refund.ts
│               ├── chat-thread.ts
│               ├── credential-trust-ripple.ts
│               └── review-round-trip.ts
└── test/
    └── integration/
        ├── admin/                          # single-actor admin tests (Q10)
        │   └── credential-approval.spec.ts
        └── multi-role/
            └── ws-multiclient.spec.ts      # research-doc §4.2 fast layer (optional mini-add)

apps/mobile/
└── test/
    └── e2e-web/                            # Playwright workspace (Q7 — web only)
        ├── package.json                    # name: @ideony/mobile-e2e-web
        ├── playwright.config.ts
        ├── tsconfig.json
        ├── README.md
        ├── lib/
        │   ├── test-api.ts                 # HTTP client to /test/*
        │   ├── admin-api.ts                # service-token admin calls
        │   ├── sync.ts                     # waitForBookingStatus / waitForWsEvent
        │   ├── fixtures.ts                 # Playwright fixtures
        │   ├── artifacts.ts                # failure artefact bundler
        │   └── stripe-test-cards.ts
        ├── scenarios/
        │   ├── 01-booking-full-cycle.spec.ts
        │   ├── 02-sos-cascade.spec.ts
        │   ├── 03-cancel-refund.spec.ts
        │   ├── 04-chat-delivery.spec.ts
        │   ├── 05-credential-approval.spec.ts
        │   └── 06-review-round-trip.spec.ts
        └── artifacts/                      # .gitignored — failure bundles

scripts/
└── e2e-smoke.sh                            # pre-demo invoker

Workspace packaging: apps/mobile/test/e2e-web is a distinct pnpm workspace package (not nested in apps/mobile’s package.json) so Playwright deps don’t inflate the mobile app bundle. Root pnpm-workspace.yaml adds apps/mobile/test/e2e-web.

7. CI + Local Integration

7.1 Local run (MVP 0 primary path per Q5)

# One-time setup
pnpm install
pnpm --filter @ideony/mobile-e2e-web exec playwright install chromium

# Run all 6 scenarios, bail on first fail
pnpm test:e2e

# Run all, report all (for pre-demo smoke)
pnpm test:e2e:full

# Run single scenario
pnpm test:e2e -- --grep "Scenario 01"

Target env: apps/mobile/test/e2e-web/.env.test points at dev BE:

# Named-tunnel URL; use the Quick Tunnel URL until the named tunnel exists
TEST_API_URL=https://api.ideony.is-a.dev
TEST_HARNESS_SECRET=<shared secret, rotated>
STRIPE_TEST_CARD=4242424242424242
CLERK_FRONTEND_API=humble-garfish-77.clerk.accounts.dev

Husky pre-push: does NOT run E2E (unit tests only). Multi-role suite is manually triggered pre-demo / pre-merge.

7.2 Post-demo cron (Q5 activation)

Target: 2026-04-21+ (after demo). Add on Hetzner dev instance (178.104.154.74):

0 3 * * *  /opt/ideony/scripts/e2e-smoke.sh >> /var/log/ideony-e2e.log 2>&1

Script body:

cd /opt/ideony && git pull --rebase
pnpm install --frozen-lockfile
pnpm test:e2e:full --reporter=json > /tmp/e2e-report.json
On fail: POST Slack webhook with failed scenarios + link to artefact tarball uploaded to R2.

7.3 GitHub Actions (deferred — not MVP 0)

Structure sketched but not implemented in Phase E. When activated post-revenue:

Self-hosted ARM64 runner (already exists for build/deploy).
Matrix over 6 scenarios, fail-fast: false, 15-min timeout, artefact upload.
Trigger: workflow_dispatch + pull_request for paths apps/api/src/modules/{booking,sos,credentials,chat,reviews,dispatch}/** + apps/mobile/app/**.

8. Risk List + Mitigations

#	Risk	Likelihood	Impact	Mitigation
R1	`test_tenant` column forgotten on a new table → cross-tenant leakage	Medium	High	Biome custom rule flagging Prisma models missing `test_tenant`; PR checklist item; M-E1 includes snapshot test of tagged-table list
R2	`ClockService` refactor misses a call site → flaky timing test	Medium	Medium	Biome rule banning `new Date()` + `Date.now()` in `apps/api/src/modules/**` w/ exceptions list; M-E2 exit criteria enforces zero
R3	External service (Stripe/Clerk/Novu) rate limit hit during test run	Low	Medium	Tenant isolation spreads creations; Twilio €20/mo budget alert (Q9); retry-with-backoff in `test-api.ts`
R4	Dev DB schema drift between migrations and E2E scenario seeds	Medium	Medium	Seed fixtures import Prisma client types directly (compile-time guarantee); CI job runs `pnpm prisma migrate deploy` before E2E
R5	Playwright browser context auth races (both actors using same Clerk session)	Medium	High	`lib/fixtures.ts` creates two Clerk users per test, uses BAPI session creation pattern (`reference_clerk_e2e` memory) for independent JWTs
R6	SSE tap drops events on reconnection → `waitForWsEvent` hangs	Low	Medium	Server-side buffer last 50 events per tenant in Redis; tap replays on connect
R7	`FakeClock.advance()` + BullMQ job firing race condition	Medium	High	`DelayService.flushDueBy()` awaits all promoted jobs’ `completed` event before resolving; scenario tests add assertion after advance
R8	Cleanup doesn’t sweep new external resources added in future modules	Medium	Medium	Cleanup service uses reflect-metadata-driven registry — any module that adds external resources must register a sweep callback; lint rule enforces
R9	Test run on local machine blocks dev (port conflicts, Clerk rate)	Low	Low	Playwright uses dev BE (not local); no local BE needed; rate-limit risk covered in R3
R10	Scenarios depend on seed data that conflicts with each other in parallel runs	Medium	Medium	Every scenario allocates its own `test_tenant` UUID — no shared seeds; parallel safe by design
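The R6 mitigation (buffer the last 50 events per tenant, replay on connect) can be sketched with a bounded buffer keyed by a monotonically increasing event ID. This in-memory version is only a stand-in for the Redis-backed buffer the table describes, and the class and method names are hypothetical; a reconnecting SSE client would supply the last ID it saw.

```typescript
interface TapEvent {
  id: number;
  channel: string;
  payload: unknown;
}

class ReplayBuffer {
  private events: TapEvent[] = [];
  private nextId = 1;
  private readonly capacity: number;

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  /** Stores an event, evicting the oldest once the buffer is full. */
  push(channel: string, payload: unknown): TapEvent {
    const event: TapEvent = { id: this.nextId++, channel, payload };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  /** Events newer than lastSeenId, i.e. what a reconnecting client missed. */
  replayAfter(lastSeenId: number): TapEvent[] {
    return this.events.filter((e) => e.id > lastSeenId);
  }
}
```

Events older than the buffer's capacity are gone for good, so a client that was disconnected for longer than 50 events still needs the polling primitive as a fallback.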

9. NOT In Scope

Explicit exclusions — do not implement in Phase E. Add to post-MVP 0 backlog if/when justified:

3-actor scenarios (consumer + pro + admin-live). Q10 deferred — admin flows covered via single-actor integration tests in apps/api/test/integration/admin/. Revisit when admin surface grows (disputes, moderation, SOS override).
Mobile-native E2E (iOS/Android Maestro). Q7 deferred — Expo web covers 95% of mobile UI logic; native-specific bugs (push tokens, deep links, file-picker) caught manually via EAS preview builds + TestFlight. Add Maestro when mobile traction + revenue justify.
Visual regression inside multi-role flows. Q6 deferred — separation of signal mandate; multi-role is async + flake-prone, visual snapshots amplify flake. Dedicated single-actor visual suite post-MVP 0 once 10+ visual bugs surface.
TestRigor / other AI-authored DSL. Q4 rejected — €300+/mo pre-revenue + lock-in. Revisit in 6mo if cofounders explicitly blocked from authoring PR reviews on Playwright TS.
City simulator / algorithm validation (Bolt/Glovo style). Research §1.4 — premature; no historical data yet. Phase F+ concern.
Synthetic canaries in prod (Checkly style). Research §1.6 — post-v1 only; needs prod env first.
GitHub Actions CI integration. Local-only per Q5 — add in post-revenue infra hardening phase.
WebSocket multi-client integration tests (research §4.2 fast layer). Optional — M-E1 directory structure reserves apps/api/test/integration/multi-role/ for future adds, but not shipped in Phase E.

10. Change Log

2026-04-20 — Spec created. All 10 locked decisions consolidated. 6 canonical scenarios detailed. 7 milestones scoped (~12.5 days). 10 risks catalogued. Ready for M-E1 kickoff.
