Rock-solid FE+BE — design

Date: 2026-04-21
Trigger: user-surfaced screenshots ahead of tonight's demo (cloud Unmatched Route, local stuck on Loading, many COEP/CORS errors, font 404s).
Status: brainstorm, awaiting user approval before any implementation.
Supersedes: nothing; layers on top of existing plans and the SOTA research below.

Prior work to consult (not to duplicate):

docs/testing.md — current test pyramid (354 mobile unit + 195 API unit + 45 API E2E + 122 Playwright) + Phase E (multi-role)
docs/specs/2026-04-20-phase-e-multi-role-e2e-spec.md — 10 locked decisions for multi-role harness (tenant col, FakeClock, GPS inject, Maestro+Playwright)
docs/research/2026-04-19-multi-role-e2e-deep-dive.md — Uber CTF, Airbnb Happo, DoorDash multi-tenancy, Bolt city sim SOTA deep-dive
docs/plans/2026-04-21-uxui-realignment.md — current sprint (TabBar, i18n, fonts, slugs, icons)
docs/infrastructure.md — hosts, DNS, CI, secrets

1. Executive summary — user premise accepted

User’s premise (2026-04-21): “if it works local, apart from env or infra issues, it should work also on cloud.” — CORRECT. Every demo-blocker traces to one of three buckets:

Latent app bugs visible on local once you actually walk the flow (e.g. the post-auth redirect to /(consumer)/home, which lands on Unmatched Route)
Env/infra drift (Metro cache staleness, missing Dockerfile ARG, Sentry DSN swap, HTTP vs HTTPS, stale Cloudflare tunnel URL)
Low test signal (354 mocked-tautology unit tests, 116 Playwright specs that never run, dead tour.spec.ts config, web-build smoke missing)

The strategy is therefore: harden local to zero known defects, so that cloud becomes a deploy exercise, not a debug exercise.

2. Root-cause table (from 5-agent investigation)

#	Bug	Evidence	Scope
B1	`router.replace("/(consumer)/home" as never)` after email-OTP + OAuth success	`lib/auth/use-continue-flow.ts:74,97` — `/(consumer)/home` doesn’t exist; every fresh-auth user hits Unmatched Route. `as never` cast suppressed typed-routes guard.	app — latent bug, reaches users
B2	`build:web` script missing `--clear` → Metro cache leaks stale `EXPO_PUBLIC_*` values across rebuilds	Documented in memory file `feedback_metro_cache_env.md`; agent probe confirmed `localhost:3000` baked into the bundle served from `http://178.104.154.74`. Fix is on disk; commit pending.	build
B3	`apps/mobile/Dockerfile` missing `ARG EXPO_PUBLIC_MAPBOX_ACCESS_TOKEN` + Dokploy buildArgs missing same	Any map screen 401s against Mapbox on cloud → blank maps	infra — demo cosmetic blocker
B4	Dokploy web buildArgs ship BE Sentry DSN (`…618666576`) as `EXPO_PUBLIC_SENTRY_DSN` (should be `…624302672`)	Agent grep of env vs Dokploy	infra — observability wrong
B5	`.dockerignore` whitelists `!.env.example` → risks re-leaking `localhost:3000` even with `--clear`	`.dockerignore:12-13`; `apps/mobile/.env.example:2`	build — latent re-breakage risk
B6	`apps/mobile/playwright.config.ts` points at missing `scripts/tour.spec.ts`	File deleted during UX overhaul, config not updated	tests — dead config
B7	132 Playwright E2E specs never run (ECONNREFUSED :3001; no `webServer` in config; `verify-local.sh` excludes `@ideony/e2e`)	Agent `pnpm test` output	tests — 100% of E2E investment invisible
B8	No web-build smoke gate → `import.meta` crash class ships	Today’s commit a6890f0 landed after prod white-screen	tests
B9	`@ideony/mobile` Jest “worker failed to exit gracefully” warning	Leaked timer in reanimated or socket.io mock	tests — future flake
B10	No coverage measurement; no incidents-per-deploy tracking	`apps/api` has an orphaned `test:cov` script; `apps/mobile` has no coverage config	tests
B11	Sentry Session Replay + Performance not enabled	`@sentry/react-native` + `@sentry/nestjs` installed, flags off	observability
B12	No OpenAPI contract drift gate	BE ships schema change → FE fails silently until runtime	tests
B13	No visual regression	UI drift lands undetected	tests

3. Phased plan

Phase 0 — Demo tonight (P0 blockers, ~90 min)

Fix what the cofounders WILL see on 2026-04-21 evening. Nothing speculative.

Step	Change	Files	Est
P0.1	Fix B1 — change `router.replace("/(consumer)/home" as never)` → `router.replace("/(consumer)")` at both call sites	`lib/auth/use-continue-flow.ts:74,97`	5 min
P0.2	Commit B2 fix — `build:web` → `expo export --platform web --clear` (already on disk; needs commit + CHANGELOG + status.md)	`apps/mobile/package.json`, `CHANGELOG.md`, `docs/status.md`	5 min
P0.3	Fix B3 — add `ARG EXPO_PUBLIC_MAPBOX_ACCESS_TOKEN` + `ENV` line to `apps/mobile/Dockerfile`; add buildArg on Dokploy web app via MCP	`apps/mobile/Dockerfile`, Dokploy	15 min
P0.4	Fix B4 — swap Dokploy web `EXPO_PUBLIC_SENTRY_DSN` to FE project DSN	Dokploy MCP	5 min
P0.5	Fix B5 — remove `!.env.example` whitelist in `.dockerignore` (belt-and-suspenders after B2)	`.dockerignore`	2 min
P0.6	Redeploy FE + verify: `./scripts/deploy.sh --fe-only`, then probe the bundle for `localhost` and the Sentry DSN, check that Mapbox renders, and confirm Clerk signup lands on `/(consumer)`, not Unmatched Route	shell	15 min
P0.7	Walk consumer flow manually via Playwright headed against LOCAL first: welcome → signup → home → triage → results → pro-detail → book → chat → review → SOS. Capture any new breakage.	ad-hoc	30 min
P0.8	Fix anything P0.7 surfaces. Stop at first pass.	—	buffer
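
The bundle probe in P0.6 can be scripted rather than eyeballed. A minimal sketch, assuming the exported JavaScript has already been read into a string; `probeBundle` and `bundleIsDeployable` are hypothetical names, and the DSN check matches only the project-ID suffixes quoted in B4.

```typescript
// What one exported web bundle reveals about env wiring (P0.6).
interface BundleProbe {
  localhostLeaks: string[]; // distinct loopback host:port references found
  hasFeSentryDsn: boolean;  // FE project suffix present
  hasBeSentryDsn: boolean;  // BE project suffix present (the B4 regression)
}

// Suffixes exactly as quoted in B4; the full DSNs are not reproduced here.
const FE_DSN_SUFFIX = "624302672";
const BE_DSN_SUFFIX = "618666576";

// Loopback host plus port, e.g. the `localhost:3000` that B2 baked into the bundle.
const LOOPBACK = /\b(?:localhost|127\.0\.0\.1):\d+/g;

function probeBundle(js: string): BundleProbe {
  const leaks = new Set(js.match(LOOPBACK) ?? []);
  return {
    localhostLeaks: Array.from(leaks),
    hasFeSentryDsn: js.includes(FE_DSN_SUFFIX),
    hasBeSentryDsn: js.includes(BE_DSN_SUFFIX),
  };
}

// Deployable only if nothing points at loopback and Sentry targets the FE project.
function bundleIsDeployable(probe: BundleProbe): boolean {
  return probe.localhostLeaks.length === 0 && probe.hasFeSentryDsn && !probe.hasBeSentryDsn;
}
```

In practice this would run over every `.js` file that `expo export --platform web --clear` writes under `dist/` (the exact subdirectory is worth confirming against a local export), failing the deploy on the first bad result. Third-party code can legitimately contain `localhost:<port>` strings, so a small allowlist may be needed.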

Exit criteria Phase 0:

Cloud FE → prod BE (no localhost leak)
Signup/signin lands on /(consumer), not Unmatched Route
Mapbox tiles render
Sentry receives FE events in correct project
One complete consumer flow green on local AND cloud

Phase 1 — This week (after demo, before next sprint)

Eliminate the class of bugs Phase 0 fixed.

Step	Change	Why
P1.1	Delete `apps/mobile/playwright.config.ts` (dead) OR repoint at `../../e2e/web/smoke.spec.ts`	B6 — single source of Playwright config in `e2e/web/`
P1.2	Wire web-build smoke into `verify-local.sh`: `expo export --platform web --clear` → `npx serve dist -p 8081` → `playwright test smoke.spec.ts --project=chromium` → fail on any `page.on('pageerror')`. ~45 sec gate.	B8 — catches `import.meta` class permanently
P1.3	Start API in Playwright `webServer` so `@ideony/e2e` can actually run. Add `docker:up` precondition. Drop `--filter='!@ideony/e2e'` from `verify-local.sh`. Run subset (`smoke` + `scenario-0[1-3]`) in gate; full suite nightly.	B7 — unlocks 132 specs
P1.4	Delete 3 low-signal test clusters (per test-audit agent): `test/app/(consumer)/_layout.test.tsx`, `test/lib/hooks/use-theme-colors.test.ts`, and the presentational-smoke bundle `test/components/chrome/{TabBar,SkeletonLoader,EmptyState}.test.tsx`. ~15 specs removed.	B10 — mocked tautologies
P1.5	Add 3 HIGH-value tests: (a) real Stripe Connect onboarding E2E (BAPI user + webhook), (b) SOS dispatch multi-actor (uses existing `scenario-02` stub), (c) Clerk svix webhook with real signature.	Critical paths currently have 0% test coverage
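P1.6	Enable Sentry Session Replay + Performance on `@sentry/react-native` (FE) and Performance tracing on `@sentry/nestjs` (BE; Session Replay is a client-side feature). Set `tracesSampleRate: 0.2` and start `replaysOnErrorSampleRate` at 0.2 per R4, tuning after the first week.	B11 — free RUM from installed dep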
P1.7	Add `openapi-diff` CI step (once CI is restored): after the BE build, diff `packages/api-client/src/generated/openapi.json` against the version on `main`. Fail on breaking change. ~1 hr to wire.	B12 — cheap SOTA win
P1.8	Fix B9 — instrument Jest teardown, find the leaked timer (likely in reanimated or socket.io mock).	B9
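
The core of the P1.2 gate is "fail on any `pageerror`". A minimal sketch, assuming a hypothetical `collectPageErrors` helper typed against a structural slice of Playwright's `Page`, so the helper can be unit-tested without a browser:

```typescript
// Structural slice of Playwright's Page: only the 'pageerror' listener is needed.
interface PageErrorSource {
  on(event: "pageerror", listener: (error: Error) => void): unknown;
}

// Subscribes immediately and returns a live array of uncaught page errors.
function collectPageErrors(page: PageErrorSource): Error[] {
  const errors: Error[] = [];
  page.on("pageerror", (error) => errors.push(error));
  return errors;
}

// One failure message listing every collected error (empty string when clean).
function describePageErrors(errors: readonly Error[]): string {
  return errors.map((e, i) => `#${i + 1} ${e.name}: ${e.message}`).join("\n");
}
```

In a hypothetical `smoke.spec.ts`, call `const errors = collectPageErrors(page)` before `page.goto(...)`, then `expect(errors, describePageErrors(errors)).toHaveLength(0)`. Subscribing before navigation matters, because an `import.meta`-style crash fires while the bundle is first evaluated.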

Phase 2 — Post-MVP-0 (observability + SOTA uplift)

Step	Change	Cost
P2.1	Maestro for native iOS+Android; Expo itself is migrating from Detox to Maestro. Start with 3 flows (auth, booking, SOS). YAML, <1% flakiness (vs 2% for Detox).	~3 days
P2.2	Visual regression — Playwright `toHaveScreenshot()` on 7 hero screens. Baseline commit per release tag. Free, already installed.	~1 day
P2.3	Synthetic monitoring — Checkly w/ existing Playwright scripts. 3 critical journeys every 5 min from EU. Free tier fits MVP 0.	~½ day
P2.4	Flaky quarantine: `@flaky` Playwright tag + separate non-blocking job; cap at 5 entries; weekly auto-retire of any spec that has passed 3 consecutive runs.	~½ day
P2.5	Switch metric — coverage % → incidents/deploy + MTTR on SOS + payment modules (per SOTA — Airbnb/Uber/Faire all track outcomes, not %)	—
P2.6	Migrate to named Cloudflare tunnel hostnames `api.ideony.is-a.dev` + `app.ideony.is-a.dev` (blocked on `is-a.dev/register#36614`, per docs/infrastructure.md) → stable HTTPS → no more rebuilds on tunnel-URL rotation → Clerk/Stripe/Mapbox secure-context warnings gone	blocked on PR
P2.7	Resume Phase E multi-role E2E per existing spec. Blueprint already locked.	12.5 days est
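
P2.4's quarantine rules (cap of 5, retire after 3 consecutive passes) are small enough to encode directly. A minimal sketch, assuming per-spec pass/fail history is already stored somewhere; the `QuarantineEntry` shape and the function names are hypothetical:

```typescript
// One quarantined spec and its recent results in the non-blocking job.
interface QuarantineEntry {
  spec: string;             // hypothetical identifier, e.g. the test title carrying @flaky
  recentResults: boolean[]; // oldest first; true = passed
}

const QUARANTINE_CAP = 5;      // P2.4: at most 5 quarantined specs
const RETIRE_AFTER_PASSES = 3; // P2.4: retire after 3 consecutive passes

// True when the most recent `n` results exist and are all passes.
function passedConsecutively(results: readonly boolean[], n: number): boolean {
  return results.length >= n && results.slice(-n).every((passed) => passed);
}

interface WeeklyReview {
  keep: QuarantineEntry[]; // still flaky, stay quarantined
  retired: string[];       // stable again, move back to the blocking suite
  overCap: boolean;        // true if the quarantine still exceeds the cap
}

// Weekly job: retire stable specs, then check the cap on what remains.
function weeklyReview(entries: readonly QuarantineEntry[]): WeeklyReview {
  const keep: QuarantineEntry[] = [];
  const retired: string[] = [];
  for (const entry of entries) {
    if (passedConsecutively(entry.recentResults, RETIRE_AFTER_PASSES)) {
      retired.push(entry.spec);
    } else {
      keep.push(entry);
    }
  }
  return { keep, retired, overCap: keep.length > QUARANTINE_CAP };
}

// Admission check: refuse a new @flaky tag once the cap is reached.
function canQuarantine(entries: readonly QuarantineEntry[]): boolean {
  return entries.length < QUARANTINE_CAP;
}
```

Playwright's `--grep @flaky` and `--grep-invert @flaky` flags are one way to route quarantined specs into the non-blocking job and keep them out of the blocking one.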

4. Out of scope

Rewriting test suite from scratch — 195 API unit tests are mostly HIGH value; don’t throw out the baby with the bathwater. Only touch the 3 low-value clusters listed in P1.4.
Chromatic/Percy visual regression SaaS — Playwright built-in covers MVP 0 free; revisit if Storybook gets heavy.
Contract tests via Pact bidirectional — openapi-diff is 80% of the value at 20% of the cost; revisit Pact post-PMF.
Bolt/Glovo-style city simulator — pre-revenue, no historical data to seed with. Phase F+.
Multi-tenancy in prod (DoorDash pattern) — overkill for MVP 0; test_tenant column per Phase E spec is already the right scope.
Switching off Playwright in favour of Cypress — no benefit; cost is real.

5. Risks

#	Risk	Likelihood	Impact	Mitigation
R1	B1 fix breaks existing E2E that happened to land on Unmatched Route and still asserted some element	L	M	grep E2E for `/(consumer)/home`; update specs same commit
R2	Removing `!.env.example` whitelist breaks a dev who clones fresh and needs the example for reference	L	L	Keep `.env.example` in repo (it’s in git); only removed from docker build context
R3	Adding Playwright `webServer` slows `verify-local.sh` from 30s → 90s	M	L	Run smoke only locally (3 specs); full suite nightly
R4	Sentry Session Replay at 100% on-error overwhelms free tier	L	M	Start at 0.2; tune after first week
R5	openapi-diff false positives from hey-api version bumps	M	L	Pin hey-api version; allowlist schema version bumps
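
R5's mitigation hinges on what counts as "breaking". As a simplified stand-in for `openapi-diff` (not its actual algorithm or API), a sketch that flags only operations removed between the `main` spec and the branch spec, and never compares `info.version`, so a version-only bump cannot fail the gate; `OpenApiDoc`, `operations`, and `removedOperations` are hypothetical names:

```typescript
// Minimal slice of an OpenAPI 3 document: version info plus path items.
interface OpenApiDoc {
  info: { version: string };
  paths: Record<string, Record<string, unknown>>;
}

// Path-item keys that denote operations; others (parameters, summary, servers) are skipped.
const HTTP_METHODS = new Set(["get", "put", "post", "delete", "options", "head", "patch", "trace"]);

// Every operation in the document as "METHOD /path".
function operations(doc: OpenApiDoc): Set<string> {
  const ops = new Set<string>();
  for (const [path, item] of Object.entries(doc.paths)) {
    for (const key of Object.keys(item)) {
      if (HTTP_METHODS.has(key)) ops.add(`${key.toUpperCase()} ${path}`);
    }
  }
  return ops;
}

// Operations present on `base` (main) but missing from `head` (the branch).
// info.version is never read, so a version-only bump cannot fail the gate (R5).
function removedOperations(base: OpenApiDoc, head: OpenApiDoc): string[] {
  const headOps = operations(head);
  return Array.from(operations(base)).filter((op) => !headOps.has(op)).sort();
}
```

Real diff tools also compare request and response schemas, which is why P1.7 reaches for `openapi-diff` rather than a hand-rolled check like this one.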

6. Success criteria

Phase 0: 1 full consumer flow green on local AND cloud by 2026-04-21 18:00
Phase 1: verify-local.sh runs web-build smoke + E2E smoke; ≤90 sec total; blocks commit on pageerror
Phase 2: Sentry shows FE session replays on error; Checkly pages on critical path regression within 5 min

7. Decisions needed from user

Q1 — Proceed with Phase 0 now (all 8 steps before demo)? Or only P0.1–P0.4 (app + infra essentials), skipping the `.dockerignore` hardening, redeploy verification, and manual walk-through?
Q2 — Phase 1 post-demo: scope as above (~1 day of work) or trim? Specifically: does P1.3 (unblock 132 Playwright specs) feel worth 2-3 hrs today?
Q3 — Phase 2 Maestro + Checkly + visual regression — approve as post-MVP-0 sprint target, or defer?
Q4 — Do I write this spec to docs/specs/2026-04-21-rock-solid-fe-be-design.md (current location, matches repo convention) or move to docs/superpowers/specs/ (superpowers default)?

Awaiting confirmation — no implementation until user says “proceed” (or proceed P0 only / modify: …).