Phase E — Multi-Role E2E Test Harness Spec

Date: 2026-04-20

Status: Superseded by docs/specs/2026-04-21-e2e-strategy.md. The multi-role harness (Q1–Q10 locked decisions, 6 canonical scenarios, TestModule design, milestones M-E1–M-E7) is carried forward verbatim as the M4 track of the new spec. Read this doc for deep architectural rationale; read the 2026-04-21 spec for the full-stack strategy (web + iOS + Android + env matrix + CI cadences).

Superseded-by: docs/specs/2026-04-21-e2e-strategy.md

Scope: Build the automated multi-role (consumer ↔ professional) E2E harness for Ideony MVP 0: deterministic tenancy, clock control, GPS injection, 6 canonical scenarios on Playwright web + a direct Socket.IO integration layer.

Phase mapping: Phase E of the Ideony MVP 0 blueprint; unblocks confident pre-demo regression + post-demo nightly gate.

Related docs:

  • /Users/acidrums7/.claude/projects/-Users-acidrums7-Documents-Coding-Lavoro-Projects-Ideony/memory/project_multi_role_e2e_decisions.md (locked Q1–Q10)
  • plans/research/2026-04-19-multi-role-e2e-deep-dive.md (5565-word SOTA deep-dive — rationale only)
  • plans/specs/2026-04-20-ux-phase-c-design.md, plans/specs/2026-04-20-ux-phase-d-design.md (style reference)
  • /Users/acidrums7/Documents/Coding/Lavoro/Projects/Ideony/CLAUDE.md (monorepo + conventions)

  1. Executive Summary
  2. Locked Architecture (Q1–Q10)
  3. Test Harness Components
  4. Six Canonical Scenarios
  5. Milestones M-E1 … M-E7
  6. Directory Structure
  7. CI + Local Integration
  8. Risk List + Mitigations
  9. NOT In Scope
  10. Change Log

1. Executive Summary

Phase E delivers an automated multi-role E2E harness that catches consumer ↔ pro interaction bugs (booking state races, SOS cascade dispatch ordering, chat delivery across WebSocket, credential-trust ripple) that single-actor tests structurally miss. The harness uses Playwright TypeScript driving two browser contexts against a shared NestJS backend on the Hetzner dev instance, coordinated by a test-only module that exposes deterministic seeding, test_tenant isolation, FakeClock time control, and PostGIS geo injection. Six canonical 2-actor scenarios cover the marketplace’s critical interaction surface. The suite is local-only for MVP 0 (pre-demo); a nightly cron activates post-demo on the same Hetzner dev box. Zero ongoing cost (Playwright OSS, self-hosted runner, dev Clerk/Stripe/Novu already configured).

2. Locked Architecture (Q1–Q10)

Every decision below is locked. Full rationale lives in the memory file; do not reopen during implementation.

| # | Question | Choice | One-line rationale |
|---|----------|--------|--------------------|
| Q1 | Tenancy model | test_tenant UUID column + TenantMiddleware on shared dev DB | Avoids ephemeral-DB 30s startup penalty; Hetzner prod is repurposed as dev → no extra infra |
| Q2 | GPS injection | Hybrid: BE POST /test/geo-feed default + 3 device-level smoke tests (web/iOS/Android) | Fast + deterministic for dispatch matching; 3 tests preserve client GPS upload coverage |
| Q3 | Clock control | ClockService DI + DelayService BullMQ wrapper + POST /test/advance-time | SOS 30s countdown runs in BullMQ; per-request header alone misses cron/WS emitters |
| Q4 | Test DSL | Maestro (mobile, deferred) + Playwright (web); no TestRigor | €300/mo pre-revenue unjustified; lock-in risk; Maestro YAML already ~70% English-readable |
| Q5 | Execution location | Local-only MVP 0; cron nightly post-demo on dev instance | Fast-dev philosophy; Mac M-series handles 4 pairs parallel; €0 |
| Q6 | Visual regression | Dedicated single-actor suite, deferred post-MVP 0 | Separation of signal: multi-role snapshots flake; 44 screens baseline cost high |
| Q7 | Mobile E2E orchestration | Playwright web only for MVP 0; Maestro added on mobile traction | Expo web renders 95% of mobile logic; Maestro Cloud €39–449/mo unjustified |
| Q8 | Fail-fast vs run-all | Hybrid: --bail=1 default, --no-bail via :full script | Dev iteration wants fast feedback; nightly + pre-demo want full triage |
| Q9 | External services | All real dev envs (Stripe test, Clerk dev, Novu dev) | Mock drift is real maintenance tax; real integrations catch webhook-signature/JWT-claim/template drift |
| Q10 | 3-actor scenarios | Defer: 2-actor + single-actor admin integration tests | MVP 0 admin surface small (credentials only); 3-actor orchestration complexity > value |

Infra side-effect from Q1: Hetzner 178.104.154.74 CAX11 is now the dev environment. Rename Dokploy project env production → development. Prod spins up fresh when revenue exists.

3. Test Harness Components

3.1 Backend test module — apps/api/src/modules/test/

Gated entirely by the env flag ENABLE_TEST_ENDPOINTS=true. The module is conditionally imported in AppModule; in production the entire module tree is absent from the bundle. Double-gate with a TestGuard that re-checks the env flag and requires an X-Test-Tenant header signed with a shared secret (TEST_HARNESS_SECRET, rotated per env).

Routes (all prefix /test):

| Route | Method | Body / Query | Response | Purpose |
|-------|--------|--------------|----------|---------|
| /test/tenant/create | POST | {} | { test_tenant: string } (UUID v4) | Allocate a fresh tenant namespace for a scenario run |
| /test/cleanup | POST | ?tenant=<uuid> | { deleted: { users, bookings, quotes, reviews, credentials, stripe_accounts, clerk_users, novu_subscribers } } | Sweep all rows + external-service artefacts tagged with the tenant |
| /test/geo-feed | POST | { professionalId, lat, lng } | { ok: true, updatedAt } | Write PostGIS point into ProfessionalProfile.location for dispatch/matching |
| /test/advance-time | POST | ?ms=<int> | { now: ISO8601, jobsFired: int } | Advance FakeClock by ms, trigger every BullMQ job whose runAt ≤ now |
| /test/seed/:scenario | POST | ?tenant=<uuid> | { seeded: object } | Materialise a named scenario fixture (6 built-ins, see §4) |
| /test/state/booking/:id | GET | - | { status, … } | Polling endpoint for state convergence (sync primitive B from research §4.4) |
| /test/ws-tap/:channel | GET | - | SSE stream | Proxy Redis pub/sub events for a tenant so Playwright can assert on events without WS client setup |

Request chain: TenantMiddleware → TestGuard → handler (Nest runs middleware before guards). TenantMiddleware rejects any request without an X-Test-Tenant header on /test/* routes. On non-test routes, if the header is present, the middleware scopes all Prisma queries via a ClsModule-stashed tenant ID; Prisma client extensions inject WHERE test_tenant = $1 OR test_tenant IS NULL into reads, and set test_tenant = $1 on writes for tagged tables.
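
The read/write scoping described above can be sketched as two pure functions over query arguments. This is a minimal illustration, not the actual extension: `Where`, `Data`, `scopeRead`, and `scopeWrite` are hypothetical names, and the plain objects only stand in for Prisma's `where` and `data` arguments.

```typescript
// Hypothetical, simplified stand-ins for Prisma `where` / `data` arguments.
type Where = Record<string, unknown>;
type Data = Record<string, unknown>;

/** Reads see the tenant's rows plus untagged (NULL) rows. */
function scopeRead(where: Where | undefined, tenant: string): Where {
  const tenantFilter: Where = {
    OR: [{ test_tenant: tenant }, { test_tenant: null }],
  };
  // AND-wrap so a caller-supplied OR cannot widen the tenant filter.
  return where === undefined ? tenantFilter : { AND: [where, tenantFilter] };
}

/** Writes are tagged with the tenant; a caller-supplied test_tenant is overwritten. */
function scopeWrite(data: Data, tenant: string): Data {
  return { ...data, test_tenant: tenant };
}
```

Wrapping the caller's filter in `AND` rather than merging keys is a deliberate choice in this sketch: it keeps the tenant condition intact even when the original query already contains its own `OR`.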

Files to add:

  • apps/api/src/modules/test/test.module.ts
  • apps/api/src/modules/test/test.controller.ts
  • apps/api/src/modules/test/test.service.ts
  • apps/api/src/modules/test/test.guard.ts
  • apps/api/src/modules/test/fixtures/*.ts (6 scenario seeders)
  • apps/api/src/common/tenant/tenant.middleware.ts
  • apps/api/src/common/tenant/tenant.cls-store.ts (ClsModule-backed)
  • apps/api/src/common/prisma/tenant-extension.ts (Prisma client extension scoping reads/writes)

3.2 Clock abstraction — apps/api/src/common/clock/

ClockService wraps every new Date() / Date.now() in business logic. Two implementations:

  • SystemClockService (prod, default) — returns new Date() verbatim; advance() throws ForbiddenException.
  • FakeClockService (test, activated by ENABLE_TEST_ENDPOINTS=true) — holds an internal offset + mutex, now() returns new Date(Date.now() + offsetMs), advance(ms) bumps the offset + emits an internal ClockAdvanced event.

Refactor surface (~40 call sites): BookingService, QuoteService, SOSService, AvailabilityService, ReviewService, CredentialsService, BullMQ processors. Replace every new Date() + Date.now() with this.clock.now() / this.clock.ms().
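
A minimal sketch of the two clock implementations, assuming a small `Clock` interface with `now()`, `ms()`, and `advance()`. The mutex and the ClockAdvanced event are omitted, a plain `Error` stands in for Nest's ForbiddenException, and the injectable `wallClock` parameter is an addition made only so the sketch can be exercised deterministically.

```typescript
interface Clock {
  now(): Date;
  ms(): number;
  advance(ms: number): void;
}

class SystemClock implements Clock {
  now(): Date {
    return new Date();
  }
  ms(): number {
    return Date.now();
  }
  advance(_ms: number): void {
    // The spec throws ForbiddenException here; Error keeps the sketch dependency-free.
    throw new Error("advance() is not available on the system clock");
  }
}

class FakeClock implements Clock {
  private offsetMs = 0;
  private readonly wallClock: () => number;

  // `wallClock` is a hypothetical injection point, defaulting to the real time source.
  constructor(wallClock: () => number = () => Date.now()) {
    this.wallClock = wallClock;
  }
  ms(): number {
    return this.wallClock() + this.offsetMs;
  }
  now(): Date {
    return new Date(this.ms());
  }
  advance(ms: number): void {
    if (!Number.isFinite(ms) || ms < 0) {
      throw new RangeError("advance(ms) expects a non-negative number");
    }
    this.offsetMs += ms; // time only moves forward, so now() stays monotonic
  }
}
```

Rejecting negative values in `advance()` is one way to get the monotonicity that the M-E2 unit tests check for.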

3.3 Delay abstraction — apps/api/src/common/delay/

DelayService.schedule(queue, jobData, delayMs) wraps queue.add(…, { delay }) calls. In test mode, registers each pending delay against FakeClockService — when /test/advance-time fires, DelayService promotes every delay whose dueAt ≤ clock.now() into an immediate queue.add(…, { delay: 0 }). This covers:

  • SOS 30s countdown (sos-countdown queue)
  • Booking reminders (booking-reminders queue, 24h + 1h before)
  • Quote expiry (quote-expiry queue, 48h TTL)
  • Review prompts (review-prompts queue, post-completion)

Key invariant: production SystemClockService + direct queue.add path is unchanged; DelayService becomes a pure pass-through in prod (no test hooks loaded).
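
The test-mode bookkeeping can be pictured as an in-memory registry of pending delays. The sketch below is illustrative only: `TestDelayRegistry` and `PendingDelay` are hypothetical names, and the actual promotion step (calling `queue.add(…, { delay: 0 })` for each returned entry) is left to the caller so the example has no BullMQ dependency.

```typescript
interface PendingDelay<T> {
  queue: string;
  jobData: T;
  dueAt: number; // epoch ms on the fake clock
}

class TestDelayRegistry<T> {
  private pending: PendingDelay<T>[] = [];

  /** Record a delayed job instead of handing the delay to the queue. */
  schedule(queue: string, jobData: T, delayMs: number, nowMs: number): void {
    this.pending.push({ queue, jobData, dueAt: nowMs + delayMs });
  }

  /** Remove and return every delay with dueAt <= nowMs, earliest first. */
  flushDueBy(nowMs: number): PendingDelay<T>[] {
    const due = this.pending
      .filter((d) => d.dueAt <= nowMs)
      .sort((a, b) => a.dueAt - b.dueAt);
    this.pending = this.pending.filter((d) => d.dueAt > nowMs);
    return due;
  }
}
```

Because flushed entries are removed from the registry, a second `/test/advance-time` call with the same target time promotes nothing twice.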

Prisma migration 20260420_add_test_tenant adds:

  • test_tenant UUID NULL column on: User, ProfessionalProfile, Booking, Quote, Review, Credential, ChatMessage, ChatThread, Dispatch, Notification, PushToken.
  • Index CREATE INDEX idx_<table>_test_tenant ON <table>(test_tenant) WHERE test_tenant IS NOT NULL; (partial index — prod rows have NULL, zero overhead).
  • NOT added to: Category, Service, I18nKey, other reference tables that are shared across tenants by design.

Prod rows = test_tenant IS NULL → invisible to tenant-scoped reads + safe from tenant-scoped deletes.

TenantMiddleware (Nest NestMiddleware) runs on every request. Behavior:

  1. If X-Test-Tenant header missing → pass through (prod path).
  2. If present → validate as UUID, validate TEST_HARNESS_SECRET signature header, push tenant into ClsModule store for the request lifecycle.
  3. Prisma client extension reads from CLS store and injects test_tenant into create/update payloads, WHERE clauses on find/findMany, WHERE clauses on delete/deleteMany — for every tagged table.
  4. External service calls (Stripe, Clerk, Novu) prefix resources with test_tenant:<uuid>: in metadata/subscriber IDs so /test/cleanup can sweep them.
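
Steps 1 and 2 amount to a three-way decision per request. A minimal sketch, assuming lowercased header names (as Node's HTTP layer provides them), a hypothetical `x-test-signature` header, and an injected `signatureIsValid` callback standing in for the TEST_HARNESS_SECRET check:

```typescript
type TenantDecision =
  | { kind: "prod" }
  | { kind: "tenant"; tenant: string }
  | { kind: "reject"; reason: string };

// Matches RFC 4122-style UUIDs, versions 1 to 8.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function resolveTenant(
  headers: Record<string, string | undefined>,
  signatureIsValid: (tenant: string, signature: string) => boolean,
): TenantDecision {
  const tenant = headers["x-test-tenant"];
  if (tenant === undefined) return { kind: "prod" }; // step 1: pass through
  if (!UUID_RE.test(tenant)) {
    return { kind: "reject", reason: "X-Test-Tenant is not a UUID" };
  }
  const signature = headers["x-test-signature"]; // hypothetical header name
  if (signature === undefined || !signatureIsValid(tenant, signature)) {
    return { kind: "reject", reason: "missing or invalid signature" };
  }
  return { kind: "tenant", tenant }; // step 2: caller stores this in CLS
}
```

Returning a decision value instead of throwing keeps the function easy to unit-test for the three paths listed in M-E1 (tenant-scoped, rejected, and prod-path bypass).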

3.6 Test-harness HTTP client — apps/mobile/test/e2e-web/lib/test-api.ts

Thin wrapper around the BE test endpoints used by Playwright specs + shared fixtures:

declare class TestApi {
  createTenant(): Promise<string>;                  // POST /test/tenant/create
  cleanupTenant(id: string): Promise<void>;         // POST /test/cleanup
  seedScenario(name: ScenarioName, tenant: string): Promise<SeedResult>;
  advanceClock(ms: number): Promise<void>;          // POST /test/advance-time
  feedGeo(proId: string, lat: number, lng: number): Promise<void>;
  getBookingState(id: string): Promise<BookingState>;
  waitForBookingStatus(id: string, status: BookingStatus, opts?: { timeoutMs?: number }): Promise<void>;
  waitForWsEvent(channel: string, predicate: (ev: Event) => boolean, opts?: { timeoutMs?: number }): Promise<Event>;
}

Exposes higher-level sync helpers (waitForBookingStatus, waitForWsEvent) implementing the research-doc §4.4 primitive split — polling for state convergence, SSE tap for event-fired assertions.
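
The polling half of that split could be built on a generic helper along these lines. `pollUntil` and `PollOptions` are hypothetical names and the default timeout and interval are placeholders, not values taken from the spec. The deadline uses real wall-clock time on purpose, since it bounds how long the harness waits rather than modelling backend time.

```typescript
interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

/** Call `read` repeatedly until `done` accepts the value or the timeout elapses. */
async function pollUntil<T>(
  read: () => Promise<T>,
  done: (value: T) => boolean,
  opts: PollOptions = {},
): Promise<T> {
  const timeoutMs = opts.timeoutMs ?? 5_000; // placeholder default
  const intervalMs = opts.intervalMs ?? 100; // placeholder default
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (done(value)) return value;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Under these assumptions, `waitForBookingStatus(id, status)` would reduce to `pollUntil(() => api.getBookingState(id), (s) => s.status === status)`.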

Every scenario calls cleanupTenant(id) in afterAll. The endpoint performs:

  1. DELETE FROM <tagged_table> WHERE test_tenant = $1 (11 tables, FK-safe order via Prisma cascade rules).
  2. stripe.accounts.del(…) for every Connect account whose metadata.test_tenant === id.
  3. clerk.users.deleteUser(…) for every user whose privateMetadata.test_tenant === id.
  4. novu.subscribers.delete(…) for every subscriber ID prefixed test_tenant:<id>:.
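  5. BullMQ: remove delayed, waiting, and active jobs whose job.data.test_tenant === id. queue.clean(grace, limit, type) takes a single job state and cannot filter by job data, so enumerate with queue.getJobs(['delayed', 'waiting', 'active']) and call job.remove() on each match.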

Sweep is idempotent — safe to call twice (e.g. after a crash).

4. Six Canonical Scenarios

Each scenario is one Playwright spec file plus one named seed fixture. All are 2-actor (consumer + pro). Runtime target: ≤ 2 min each on an M-series laptop.

File: apps/mobile/test/e2e-web/scenarios/01-booking-full-cycle.spec.ts
Seed: booking_full_cycle_rome, with 1 consumer, 1 pro (verified trust tier), 1 category (plumbing), and the pro at 41.9028, 12.4964 (central Rome).

Actors:

  • Consumer: logs in, searches “idraulico”, picks pro, books instant slot.
  • Pro: receives booking, accepts via dashboard, marks arrived, marks completed.

Happy path:

  1. Consumer opens /, searches category → ProCard for seeded pro visible.
  2. Consumer taps pro → /professional/[id] → “Prenota” → /book/[professionalId] → picks slot (next hour) → pays via Stripe test card 4242… → booking created in PENDING_ACCEPTANCE.
  3. waitForWsEvent('booking:new', { bookingId }) fires on pro’s WS channel within 5s.
  4. Pro dashboard → Requests tab → card shows → taps “Accetta” → confirmation bottom sheet → booking → ACCEPTED.
  5. advanceClock(3600_000) → clock is now scheduled start time.
  6. Pro taps “Sono arrivato” → booking IN_PROGRESS.
  7. Pro taps “Completa” → booking COMPLETED, Stripe PaymentIntent captured.
  8. Consumer sees receipt + review prompt within 3s.

Assertions:

  • Booking status transitions: CREATED → PENDING_ACCEPTANCE → ACCEPTED → IN_PROGRESS → COMPLETED in DB (verify via /test/state/booking/:id).
  • Stripe PaymentIntent state requires_capture → succeeded.
  • Novu event log contains booking.accepted + booking.completed for consumer subscriber.
  • Pro earnings tab shows the amount in “pending” bucket.
  • Cleanup sweeps all of the above.

Timing: 1 clock advance (+1h for scheduled start). No geo feed needed.
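
The status-transition assertion can be expressed as a check over the observed sequence. In this sketch the `ALLOWED` table contains only the transitions named in the scenarios of this spec; the real booking state machine may permit more, and `firstIllegalStep` is a hypothetical helper name.

```typescript
type BookingStatus =
  | "CREATED"
  | "PENDING_ACCEPTANCE"
  | "DISPATCHING"
  | "ACCEPTED"
  | "IN_PROGRESS"
  | "COMPLETED"
  | "CANCELLED_BY_CONSUMER";

// Only the transitions the scenarios in this spec exercise.
const ALLOWED: Record<BookingStatus, readonly BookingStatus[]> = {
  CREATED: ["PENDING_ACCEPTANCE", "DISPATCHING"],
  PENDING_ACCEPTANCE: ["ACCEPTED"],
  DISPATCHING: ["ACCEPTED"],
  ACCEPTED: ["IN_PROGRESS", "CANCELLED_BY_CONSUMER"],
  IN_PROGRESS: ["COMPLETED"],
  COMPLETED: [],
  CANCELLED_BY_CONSUMER: [],
};

/** Index of the first step that is not an allowed transition, or -1 if all are legal. */
function firstIllegalStep(observed: readonly BookingStatus[]): number {
  for (let i = 1; i < observed.length; i++) {
    const from = observed[i - 1];
    const to = observed[i];
    if (from === undefined || to === undefined) return i;
    if (ALLOWED[from].indexOf(to) === -1) return i;
  }
  return -1;
}
```

Returning the failing index rather than a boolean makes the failure artefact point at the exact transition that went wrong.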

File: scenarios/02-sos-cascade.spec.ts
Seed: sos_burst_pipe_rome, with 1 consumer at 41.9028, 12.4964 and 3 pros (p1, p2, p3) at concentric distances (1 km, 3 km, 8 km), all plumbing-qualified, all online.

Actors:

  • Consumer: opens SOS flow, describes “tubo scoppiato”, confirms dispatch.
  • Pro1 (nearest): receives first offer, ignores (times out).
  • Pro2 (middle): receives cascaded offer, accepts.
  • Pro3 (farthest): never sees the offer.

Happy path:

  1. Consumer: /(welcome)/ → holds SOS tab → /sos → describes problem → confirm.
  2. feedGeo('p1', 41.9040, 12.4960) etc. — positions set via BE, dispatch matching picks p1 first.
  3. waitForWsEvent('sos:offer', { proId: 'p1' }).
  4. advanceClock(30_000) — p1 countdown expires; BullMQ sos-countdown job fires; dispatch cascades to p2.
  5. waitForWsEvent('sos:offer', { proId: 'p2' }).
  6. Pro2 (2nd Playwright context) taps “Accetta” on SOS screen.
  7. waitForBookingStatus(booking, 'ACCEPTED').
  8. Pro2 enters live-tracking flow — feedGeo called 5× over simulated 10min (clock advanced 2min per tick) to simulate travel.
  9. Pro2 taps “Sono arrivato” → IN_PROGRESS.

Assertions:

  • Dispatch rows: p1 status OFFERED → EXPIRED, p2 status OFFERED → ACCEPTED, p3 status never created.
  • Booking status: CREATED → DISPATCHING → ACCEPTED → IN_PROGRESS.
  • Pro3 WebSocket never received sos:offer for this booking.
  • Consumer receives exactly one booking.accepted Novu notification (not two from cascade race).

Timing: 1 geo seed (3 points), 6 clock advances (30s expiry + 5× 2min travel). Heaviest scenario.
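
The cascade ordering this scenario asserts can be modelled in a few lines. This is a toy model under stated assumptions, not the dispatch service: `runCascade` is a hypothetical name, distances are given directly in km instead of being computed by PostGIS, and each pro's response is supplied by a callback.

```typescript
interface ProCandidate {
  id: string;
  distanceKm: number;
}

type OfferOutcome = "ACCEPTED" | "EXPIRED";

interface DispatchRow {
  proId: string;
  status: OfferOutcome;
}

/** Offer to the nearest pro first; on expiry move to the next; stop at the first acceptance. */
function runCascade(
  pros: readonly ProCandidate[],
  respond: (proId: string) => OfferOutcome,
): DispatchRow[] {
  const rows: DispatchRow[] = [];
  const byDistance = pros.slice().sort((a, b) => a.distanceKm - b.distanceKm);
  for (const pro of byDistance) {
    const status = respond(pro.id);
    rows.push({ proId: pro.id, status });
    if (status === "ACCEPTED") break; // farther pros never get a row
  }
  return rows;
}
```

With the seeded distances (1 km, 3 km, 8 km), p1 expiring and p2 accepting, the model yields two rows and none for p3, which mirrors the negative assertion above.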

4.3 Scenario 03 — Consumer cancel with refund

File: scenarios/03-cancel-refund.spec.ts
Seed: Same as scenario 01 but booking pre-seeded in ACCEPTED state, scheduled 3h from test clock start.

Actors:

  • Consumer: opens booking detail, cancels.
  • Pro: receives cancellation notice, sees updated calendar.

Happy path:

  1. Consumer /booking/[id] → “Annulla prenotazione” → confirmation sheet → confirm.
  2. BE applies cancellation policy (> 2h notice = full refund).
  3. Stripe refund created.
  4. Pro’s WS booking:cancelled event fires.
  5. Pro’s calendar slot freed.

Assertions:

  • Booking status: ACCEPTED → CANCELLED_BY_CONSUMER.
  • Stripe refund exists, amount = full booking price.
  • Pro Availability table shows the slot no longer blocked.
  • Both actors see identical cancellation reason + amount refunded.

Timing: No clock advance needed (policy check uses clock.now() which test harness sets relative to scheduled time via seed).
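
The policy check this scenario relies on is a single comparison against the clock. A minimal sketch, assuming the rule is strictly "more than 2h of notice"; the spec does not say what happens at or below the threshold, so only the full-refund predicate is modelled, and the function name is hypothetical.

```typescript
const FULL_REFUND_NOTICE_MS = 2 * 60 * 60 * 1000; // "> 2h notice" from the policy

/** True when the cancellation arrives more than 2h before the scheduled start. */
function qualifiesForFullRefund(nowMs: number, scheduledStartMs: number): boolean {
  return scheduledStartMs - nowMs > FULL_REFUND_NOTICE_MS;
}
```

Because the seed schedules the booking 3h after the test clock's start, the predicate holds without any call to advanceClock.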

4.4 Scenario 04 — Chat delivery under disconnection

File: scenarios/04-chat-delivery.spec.ts
Seed: Booking in ACCEPTED state; chat thread auto-created.

Actors:

  • Consumer: sends messages, briefly disconnects, reconnects.
  • Pro: receives all messages in order.

Happy path:

  1. Consumer sends M1 “A che ora arrivi?” → pro receives within 1s.
  2. Pro sends M2 “Entro 15 min” → consumer receives within 1s.
  3. Consumer closes browser tab (Playwright context.close()), opens fresh tab, re-auths.
  4. While offline, pro sends M3 + M4.
  5. Consumer reconnects → chat:sync loads M3 + M4 in chronological order.
  6. Consumer sends M5 with optimistic UI → confirmed delivered within 1s.

Assertions:

  • DB ChatMessage rows: 5 total, chronological createdAt.
  • No duplicate messages (test the idempotency key).
  • Read receipts fire bidirectionally.
  • Both actors’ UI shows same last-message + unread counts.

Timing: No clock advance. Connection-level test.
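
The no-duplicates and chronological-order assertions correspond to a merge step like the one below. It is a sketch with assumed field names (`idempotencyKey`, `createdAt` as epoch ms); `ChatMsg` and `mergeMessages` are hypothetical and do not describe the app's actual sync code.

```typescript
interface ChatMsg {
  idempotencyKey: string;
  createdAt: number; // epoch ms
  body: string;
}

/** Merge rendered messages with a sync payload: drop duplicates by key, order by createdAt. */
function mergeMessages(
  local: readonly ChatMsg[],
  synced: readonly ChatMsg[],
): ChatMsg[] {
  const byKey = new Map<string, ChatMsg>();
  for (const msg of local.concat(synced)) {
    // First occurrence wins, so a re-delivered message never replaces the rendered one.
    if (!byKey.has(msg.idempotencyKey)) byKey.set(msg.idempotencyKey, msg);
  }
  return Array.from(byKey.values()).sort((a, b) => a.createdAt - b.createdAt);
}
```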

4.5 Scenario 05 — Credential submission → approval → trust tier ripple

File: scenarios/05-credential-approval.spec.ts
Seed: Pro with trustTier = BASIC (score 10), no credentials yet. One consumer browsing.

Actors:

  • Pro (actor 1): submits P_IVA + INSURANCE credentials via upload flow.
  • Single-actor admin integration test runs as part of this scenario via BE direct API call (per Q10 — admin is NOT a 2nd orchestrated browser).
  • Consumer (actor 2): searches, sees pro, observes trust badge before + after approval.

Happy path:

  1. Pro: /credentials → upload P_IVA → /credentials/me/:id/upload-url → S3 presigned PUT → status PENDING.
  2. Pro uploads INSURANCE similarly.
  3. Admin: BE direct call POST /admin/credentials/:id/approve for both, via test harness with admin service token.
  4. Trust engine re-computes: P_IVA (30) + INSURANCE (25) + base (10) = 65 → tier VERIFIED.
  5. Consumer (fresh tab, same tenant) searches category → pro card now shows VerifiedBadge.
  6. Novu notification credential.approved delivered to pro’s subscriber.

Assertions:

  • Credential.status = APPROVED for both rows.
  • ProfessionalProfile.trustScore = 65, trustTier = VERIFIED.
  • Consumer search response payload includes trustTier: 'VERIFIED'.
  • Rate limit sanity: 3rd upload attempt within same day → 429 (Redis counter at rate:credential-upload:<proId>:<YYYY-MM-DD>).

Timing: No clock advance. Tests the trust-ripple path Q10 flagged as needing coverage.
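
The score arithmetic in step 4 and the rate-limit key format can be checked with two small helpers. The point values come from this scenario; the tier thresholds are not stated in the spec, so only the score is computed. Counting duplicate credential types once and using the UTC date in the key are assumptions of this sketch.

```typescript
type CredentialType = "P_IVA" | "INSURANCE";

const BASE_SCORE = 10;
const CREDENTIAL_POINTS: Record<CredentialType, number> = {
  P_IVA: 30,
  INSURANCE: 25,
};

/** Base score plus points for each distinct approved credential type. */
function trustScore(approved: readonly CredentialType[]): number {
  const distinct = new Set<CredentialType>(approved); // assumption: duplicates count once
  let score = BASE_SCORE;
  distinct.forEach((c) => {
    score += CREDENTIAL_POINTS[c];
  });
  return score;
}

/** Redis key in the rate:credential-upload:<proId>:<YYYY-MM-DD> shape (UTC date assumed). */
function uploadRateKey(proId: string, day: Date): string {
  return `rate:credential-upload:${proId}:${day.toISOString().slice(0, 10)}`;
}
```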

4.6 Scenario 06 — Rating + review round-trip

File: scenarios/06-review-round-trip.spec.ts
Seed: Completed booking from scenario 01’s end state (can chain or re-seed).

Actors:

  • Consumer: submits 5-star review.
  • Pro: sees rating reflected on profile + dashboard.

Happy path:

  1. Consumer: /review/[bookingId] → 5 stars → “Ottimo lavoro, puntuale.” → submit.
  2. DB review row created; pro’s aggregate rating recomputed.
  3. Pro dashboard polls → rating badge updates.
  4. Consumer profile shows the review in their history.
  5. advanceClock(604_800_000) (+7 days) → review “editable window” closes.
  6. Consumer attempts to edit review → 403 forbidden.

Assertions:

  • Review.rating = 5, Review.comment matches.
  • ProfessionalProfile.ratingAvg + ratingCount updated atomically.
  • Review appears in public /professional/[id] page for a 3rd unauthenticated browser context (verifies cache invalidation).
  • Post-clock-advance edit returns 403 with i18n-keyed error message.

Timing: 1 clock advance (+7 days).
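
Two pieces of this scenario reduce to simple arithmetic: the 7-day edit window and the aggregate rating update. The sketch below assumes the window closes exactly at 7 days (the boundary is not specified) and that the average is updated incrementally; the function names are hypothetical.

```typescript
const EDIT_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 604_800_000, the value passed to advanceClock

/** True while the review is still inside its editable window. */
function canEditReview(createdAtMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs < EDIT_WINDOW_MS;
}

/** Fold one new rating into an existing average and count. */
function addRating(
  avg: number,
  count: number,
  rating: number,
): { ratingAvg: number; ratingCount: number } {
  const ratingCount = count + 1;
  return { ratingAvg: (avg * count + rating) / ratingCount, ratingCount };
}
```

Computing both fields from one input pair is what lets the backend write ratingAvg and ratingCount in a single statement, in line with the atomicity assertion.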


5. Milestones M-E1 … M-E7

All milestones are independently PR-mergeable, in strict dependency order. Each targets ≤ 2 working days.

M-E1 — Test module scaffold + TenantMiddleware + migration (2 days)

Scope:

  • Prisma migration 20260420_add_test_tenant — column + partial indexes on 11 tables.
  • apps/api/src/modules/test/ — module, controller, service, guard (env + secret gate).
  • TestController routes: POST /test/tenant/create, POST /test/cleanup (rows only — external sweeps deferred to M-E6).
  • TenantMiddleware + ClsModule wiring + Prisma client extension for tenant scoping.
  • Env var ENABLE_TEST_ENDPOINTS + TEST_HARNESS_SECRET added to .env.example + Dokploy dev config.
  • Unit tests for middleware (tenant-scoped reads, tenant-tagged writes, prod-path bypass when header absent).

Exit criteria:

  • Migration applies cleanly on dev DB (verify partial-index presence).
  • With ENABLE_TEST_ENDPOINTS=false → module tree absent from the Nest registry; routes return 404.
  • With flag on → POST /test/tenant/create returns UUID; POST /test/cleanup deletes tenant-tagged rows only.
  • Tests green, no prod-path regression.

Deps: none (foundational).

M-E2 — ClockService + DelayService refactor (2 days)

Scope:

  • apps/api/src/common/clock/: ClockService (abstract), SystemClockService, FakeClockService.
  • apps/api/src/common/delay/: DelayService.schedule() wrapper.
  • Refactor ~40 call sites across Booking, Quote, SOS, Availability, Review, Credentials + 4 BullMQ processors to use clock.now() + delay.schedule().
  • POST /test/advance-time?ms=N endpoint wired to FakeClockService.advance() + DelayService.flushDueBy(clock.now()).
  • Unit tests: FakeClock monotonicity, DelayService job promotion, production throw-on-advance.

Exit criteria:

  • Zero new Date() / Date.now() calls in business-logic paths (enforced via Biome custom rule OR ripgrep CI gate).
  • /test/advance-time fires due BullMQ jobs within same request cycle.
  • All 240+ existing BE tests still green.

Deps: M-E1 (test module for endpoint).

M-E3 — Geo feed + SSE event tap + seed fixtures (1.5 days)

Scope:

  • POST /test/geo-feed writes PostGIS point via ProfessionalService.updateLocation().
  • GET /test/ws-tap/:channel SSE endpoint — subscribes to Redis events:* pattern scoped by tenant, streams filtered events to Playwright.
  • 6 scenario seed fixtures under apps/api/src/modules/test/fixtures/ — each idempotent, tenant-scoped.
  • POST /test/seed/:scenario?tenant=<uuid> dispatcher.

Exit criteria:

  • Manually: curl POST /test/geo-feed → SELECT ST_AsText(location) FROM professional_profile returns updated point.
  • curl SSE stream against tap receives Redis-published events in real time.
  • All 6 scenarios seed without FK violations; rerun on same tenant is idempotent.

Deps: M-E1, M-E2.

M-E4 — Playwright harness + test-api.ts + Scenario 01 (2 days)

Scope:

  • apps/mobile/test/e2e-web/ workspace — playwright.config.ts, package.json (new scripts test:e2e, test:e2e:full), tsconfig.json.
  • lib/test-api.ts HTTP client (§3.6).
  • lib/sync.ts: waitForBookingStatus (polling) + waitForWsEvent (SSE consumer).
  • lib/fixtures.ts — Playwright fixtures: testTenant, testApi, consumerPage, proPage.
  • lib/artifacts.ts — failure artefact bundler (screenshots + WS transcript + BE log snapshot).
  • scenarios/01-booking-full-cycle.spec.ts.
  • Husky pre-push does not run E2E (too slow; unit tests only per Q5).

Exit criteria:

  • pnpm test:e2e -- --scenario 01 passes on local Mac against dev BE.
  • Two browser contexts coordinate, test cleans up after itself (verify DB row count = 0 for tenant post-run).
  • Failure run bundles artefacts to apps/mobile/test/e2e-web/artifacts/<tenant-id>/.

Deps: M-E1, M-E2, M-E3.

M-E5 — Scenarios 02 (SOS) + 03 (cancel) + 04 (chat) (2 days)

Scope:

  • 3 scenario specs + seeds.
  • lib/stripe-test-cards.ts helper.
  • Disconnection/reconnection utilities in lib/sync.ts.
  • SOS-specific: cascade-ordering assertion helper.

Exit criteria:

  • All 3 scenarios pass 10 consecutive runs locally (target flake rate < 1%).
  • SOS cascade: p3 never sees offer (negative assertion).

Deps: M-E4.

M-E6 — Scenarios 05 + 06 + external service cleanup (2 days)

Scope:

  • Scenarios 05 + 06 specs + seeds.
  • Extend /test/cleanup to sweep Stripe (accounts + customers), Clerk (users), Novu (subscribers) scoped by tenant metadata prefix.
  • Admin direct-API helper in lib/admin-api.ts (service-token auth for credential approve/reject).

Exit criteria:

  • Scenarios 05 + 06 green.
  • Post-cleanup verification: Stripe account list filtered by metadata.test_tenant=<id> returns zero; same for Clerk users; same for Novu subscribers.
  • Twilio spend-alert webhook configured (Q9 budget guard, €20/mo threshold).

Deps: M-E5.

M-E7 — Scripts + docs + pre-demo smoke (1 day)

Scope:

  • Root pnpm test:e2e → pnpm --filter @ideony/mobile-e2e-web test:e2e (bail on first fail per Q8).
  • Root pnpm test:e2e:full → same without bail.
  • apps/mobile/test/e2e-web/README.md — how to run, how to add a scenario, how to read failure artefacts.
  • scripts/e2e-smoke.sh — pre-demo invoker using :full variant + Slack webhook on completion.
  • CLAUDE.md update (Testing section) documenting the new commands.

Exit criteria:

  • pnpm test:e2e end-to-end green, wall time < 15 min for all 6 scenarios on laptop.
  • Smoke script integrates with demo-prep runbook.
  • Post-demo TODO ticket filed: “Add cron 0 3 * * * on Hetzner dev instance running /opt/ideony/scripts/e2e-smoke.sh w/ Slack alerts.”

Deps: M-E6.

Total: 7 milestones, ~12.5 working days.

6. Directory Structure

apps/api/
├── prisma/
│   └── migrations/
│       └── 20260420_add_test_tenant/migration.sql
├── src/
│   ├── common/
│   │   ├── clock/
│   │   │   ├── clock.service.ts          # abstract
│   │   │   ├── system-clock.service.ts   # prod impl
│   │   │   └── fake-clock.service.ts     # test impl
│   │   ├── delay/
│   │   │   └── delay.service.ts          # BullMQ wrapper
│   │   ├── prisma/
│   │   │   └── tenant-extension.ts       # client extension
│   │   └── tenant/
│   │       ├── tenant.cls-store.ts
│   │       └── tenant.middleware.ts
│   └── modules/
│       └── test/
│           ├── test.module.ts            # conditional import
│           ├── test.controller.ts
│           ├── test.service.ts
│           ├── test.guard.ts
│           └── fixtures/
│               ├── booking-full-cycle-rome.ts
│               ├── sos-burst-pipe-rome.ts
│               ├── cancel-with-refund.ts
│               ├── chat-thread.ts
│               ├── credential-trust-ripple.ts
│               └── review-round-trip.ts
└── test/
    └── integration/
        ├── admin/                        # single-actor admin tests (Q10)
        │   └── credential-approval.spec.ts
        └── multi-role/
            └── ws-multiclient.spec.ts    # research-doc §4.2 fast layer (optional mini-add)
apps/mobile/
└── test/
    └── e2e-web/                          # Playwright workspace (Q7: web only)
        ├── package.json                  # name: @ideony/mobile-e2e-web
        ├── playwright.config.ts
        ├── tsconfig.json
        ├── README.md
        ├── lib/
        │   ├── test-api.ts               # HTTP client to /test/*
        │   ├── admin-api.ts              # service-token admin calls
        │   ├── sync.ts                   # waitForBookingStatus / waitForWsEvent
        │   ├── fixtures.ts               # Playwright fixtures
        │   ├── artifacts.ts              # failure artefact bundler
        │   └── stripe-test-cards.ts
        ├── scenarios/
        │   ├── 01-booking-full-cycle.spec.ts
        │   ├── 02-sos-cascade.spec.ts
        │   ├── 03-cancel-refund.spec.ts
        │   ├── 04-chat-delivery.spec.ts
        │   ├── 05-credential-approval.spec.ts
        │   └── 06-review-round-trip.spec.ts
        └── artifacts/                    # .gitignored, failure bundles
scripts/
└── e2e-smoke.sh                          # pre-demo invoker

Workspace packaging: apps/mobile/test/e2e-web is a distinct pnpm workspace package (not nested in apps/mobile’s package.json) so Playwright deps don’t inflate the mobile app bundle. Root pnpm-workspace.yaml adds apps/mobile/test/e2e-web.

7. CI + Local Integration

# One-time setup
pnpm install
pnpm --filter @ideony/mobile-e2e-web exec playwright install chromium
# Run all 6 scenarios, bail on first fail
pnpm test:e2e
# Run all, report all (for pre-demo smoke)
pnpm test:e2e:full
# Run single scenario
pnpm test:e2e -- --grep "Scenario 01"

Target env: apps/mobile/test/e2e-web/.env.test points at dev BE:

  • TEST_API_URL=https://api.ideony.is-a.dev (post-named-tunnel) or Quick Tunnel URL
  • TEST_HARNESS_SECRET=<shared secret, rotated>
  • STRIPE_TEST_CARD=4242424242424242
  • CLERK_FRONTEND_API=humble-garfish-77.clerk.accounts.dev

Husky pre-push: does NOT run E2E (unit tests only). Multi-role suite is manually triggered pre-demo / pre-merge.

Target: 2026-04-21+ (after demo). Add on Hetzner dev instance (178.104.154.74):

0 3 * * * /opt/ideony/scripts/e2e-smoke.sh >> /var/log/ideony-e2e.log 2>&1

Script body:

  1. cd /opt/ideony && git pull --rebase
  2. pnpm install --frozen-lockfile
  3. pnpm test:e2e:full --reporter=json > /tmp/e2e-report.json
  4. On fail: POST Slack webhook with failed scenarios + link to artefact tarball uploaded to R2.

7.3 GitHub Actions (deferred — not MVP 0)

Structure sketched but not implemented in Phase E. When activated post-revenue:

  • Self-hosted ARM64 runner (already exists for build/deploy).
  • Matrix over 6 scenarios, fail-fast: false, 15-min timeout, artefact upload.
  • Trigger: workflow_dispatch + pull_request for paths apps/api/src/modules/{booking,sos,credentials,chat,reviews,dispatch}/** + apps/mobile/app/**.

8. Risk List + Mitigations

| # | Risk | Likelihood | Impact | Mitigation |
|---|------|------------|--------|------------|
| R1 | test_tenant column forgotten on a new table → cross-tenant leakage | Medium | High | Biome custom rule flagging Prisma models missing test_tenant; PR checklist item; M-E1 includes snapshot test of tagged-table list |
| R2 | ClockService refactor misses a call site → flaky timing test | Medium | Medium | Biome rule banning new Date() + Date.now() in apps/api/src/modules/** w/ exceptions list; M-E2 exit criteria enforces zero |
| R3 | External service (Stripe/Clerk/Novu) rate limit hit during test run | Low | Medium | Tenant isolation spreads creations; Twilio €20/mo budget alert (Q9); retry-with-backoff in test-api.ts |
| R4 | Dev DB schema drift between migrations and E2E scenario seeds | Medium | Medium | Seed fixtures import Prisma client types directly (compile-time guarantee); CI job runs pnpm prisma migrate deploy before E2E |
| R5 | Playwright browser context auth races (both actors using same Clerk session) | Medium | High | lib/fixtures.ts creates two Clerk users per test, uses BAPI session creation pattern (reference_clerk_e2e memory) for independent JWTs |
| R6 | SSE tap drops events on reconnection → waitForWsEvent hangs | Low | Medium | Server-side buffer last 50 events per tenant in Redis; tap replays on connect |
| R7 | FakeClock.advance() + BullMQ job firing race condition | Medium | High | DelayService.flushDueBy() awaits all promoted jobs’ completed event before resolving; scenario tests add assertion after advance |
| R8 | Cleanup doesn’t sweep new external resources added in future modules | Medium | Medium | Cleanup service uses reflect-metadata-driven registry; any module that adds external resources must register a sweep callback; lint rule enforces |
| R9 | Test run on local machine blocks dev (port conflicts, Clerk rate) | Low | Low | Playwright uses dev BE (not local); no local BE needed; rate-limit risk covered in R3 |
| R10 | Scenarios depend on seed data that conflicts with each other in parallel runs | Medium | Medium | Every scenario allocates its own test_tenant UUID; no shared seeds; parallel safe by design |
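
The R6 mitigation (buffer the last 50 events, replay on connect) can be sketched with an in-memory bounded buffer. This stands in for the Redis-backed buffer and is not its implementation: `ReplayBuffer`, `TapEvent`, and the per-event sequence number are assumptions made for illustration.

```typescript
interface TapEvent {
  seq: number; // monotonically increasing, assigned on push
  channel: string;
  payload: unknown;
}

class ReplayBuffer {
  private events: TapEvent[] = [];
  private nextSeq = 1;
  private readonly capacity: number;

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  /** Append an event, evicting the oldest one once capacity is exceeded. */
  push(channel: string, payload: unknown): TapEvent {
    const ev: TapEvent = { seq: this.nextSeq++, channel, payload };
    this.events.push(ev);
    if (this.events.length > this.capacity) this.events.shift();
    return ev;
  }

  /** Events a reconnecting client has not seen yet (seq > lastSeenSeq). */
  replayAfter(lastSeenSeq: number): TapEvent[] {
    return this.events.filter((e) => e.seq > lastSeenSeq);
  }
}
```

A reconnecting SSE client could report the last sequence number it saw (for example through the standard Last-Event-ID header) so that only missed events are replayed.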

9. NOT In Scope

Explicit exclusions: do not implement in Phase E. Add to the post-MVP 0 backlog if and when justified:

  • 3-actor scenarios (consumer + pro + admin-live). Q10 deferred — admin flows covered via single-actor integration tests in apps/api/test/integration/admin/. Revisit when admin surface grows (disputes, moderation, SOS override).
  • Mobile-native E2E (iOS/Android Maestro). Q7 deferred — Expo web covers 95% of mobile UI logic; native-specific bugs (push tokens, deep links, file-picker) caught manually via EAS preview builds + TestFlight. Add Maestro when mobile traction + revenue justify.
  • Visual regression inside multi-role flows. Q6 deferred — separation of signal mandate; multi-role is async + flake-prone, visual snapshots amplify flake. Dedicated single-actor visual suite post-MVP 0 once 10+ visual bugs surface.
  • TestRigor / other AI-authored DSL. Q4 rejected — €300+/mo pre-revenue + lock-in. Revisit in 6mo if cofounders explicitly blocked from authoring PR reviews on Playwright TS.
  • City simulator / algorithm validation (Bolt/Glovo style). Research §1.4 — premature; no historical data yet. Phase F+ concern.
  • Synthetic canaries in prod (Checkly style). Research §1.6 — post-v1 only; needs prod env first.
  • GitHub Actions CI integration. Local-only per Q5 — add in post-revenue infra hardening phase.
  • WebSocket multi-client integration tests (research §4.2 fast layer). Optional — M-E1 directory structure reserves apps/api/test/integration/multi-role/ for future adds, but not shipped in Phase E.

10. Change Log

  • 2026-04-20: Spec created. All 10 locked decisions consolidated. 6 canonical scenarios detailed. 7 milestones scoped (~12.5 days). 10 risks catalogued. Ready for M-E1 kickoff.