Rock-solid FE+BE — design
Date: 2026-04-21
Trigger: user-surfaced screenshots (cloud Unmatched Route, local stuck on Loading, many COEP/CORS errors, font 404s) ahead of tonight's demo.
Status: brainstorm; awaiting user approval before any implementation.
Supersedes: nothing; layers on top of the existing plans and SOTA research listed below.
Prior work to consult (not to duplicate):
- docs/testing.md: current test pyramid (354 mobile unit + 195 API unit + 45 API E2E + 122 Playwright) + Phase E (multi-role)
- docs/specs/2026-04-20-phase-e-multi-role-e2e-spec.md: 10 locked decisions for the multi-role harness (tenant col, FakeClock, GPS inject, Maestro+Playwright)
- docs/research/2026-04-19-multi-role-e2e-deep-dive.md: SOTA deep-dive (Uber CTF, Airbnb Happo, DoorDash multi-tenancy, Bolt city sim)
- docs/plans/2026-04-21-uxui-realignment.md: current sprint (TabBar, i18n, fonts, slugs, icons)
- docs/infrastructure.md: hosts, DNS, CI, secrets
1. Executive summary — user premise accepted
User’s premise (2026-04-21): “if it works local, apart from env or infra issues, it should work also on cloud.” The premise is correct. Every demo blocker traces to one of three buckets:
- Latent app bugs that show up on local once you actually walk the flow (e.g. the /(consumer)/home post-auth crash)
- Env/infra drift (Metro cache staleness, missing Dockerfile ARG, Sentry DSN swap, HTTP vs HTTPS, stale Cloudflare tunnel URL)
- Low test signal (354 mocked-tautology unit tests, 116 Playwright specs that never run, dead tour.spec.ts config, no web-build smoke test)
The strategy is therefore: harden local to zero defects, so that cloud becomes a deploy exercise, not a debug exercise.
2. Root-cause table (from 5-agent investigation)
| # | Bug | Evidence | Scope |
|---|---|---|---|
| B1 | router.replace("/(consumer)/home" as never) after email-OTP + OAuth success | lib/auth/use-continue-flow.ts:74,97: /(consumer)/home doesn’t exist, so every freshly authenticated user hits Unmatched Route. The as never cast suppressed the typed-routes check. | app: latent bug, reaches users |
| B2 | build:web script missing --clear → Metro cache leaks stale EXPO_PUBLIC_* values across rebuilds | Documented in memory note feedback_metro_cache_env.md; an agent probe confirmed localhost:3000 baked into the bundle served at http://178.104.154.74. Already fixed on disk; commit pending. | build |
| B3 | apps/mobile/Dockerfile missing ARG EXPO_PUBLIC_MAPBOX_ACCESS_TOKEN; Dokploy buildArgs missing the same | Any map screen 401s against Mapbox on cloud → blank maps | infra: cosmetic demo blocker |
| B4 | Dokploy web buildArgs ship the BE Sentry DSN (…618666576) as EXPO_PUBLIC_SENTRY_DSN (should be the FE DSN …624302672) | Agent grep of env vs Dokploy | infra: FE events land in the BE Sentry project |
| B5 | .dockerignore whitelists !.env.example → risks re-leaking localhost:3000 even with --clear | .dockerignore:12-13; apps/mobile/.env.example:2 | build: latent re-breakage risk |
| B6 | apps/mobile/playwright.config.ts points at missing scripts/tour.spec.ts | File deleted during UX overhaul; config not updated | tests: dead config |
| B7 | 132 Playwright E2E specs never run (ECONNREFUSED :3001; no webServer in config; verify-local.sh excludes @ideony/e2e) | Agent pnpm test output | tests: 100% of E2E investment invisible |
| B8 | No web-build smoke gate → the import.meta crash class ships | Today’s commit a6890f0 only landed after a prod white-screen | tests |
| B9 | @ideony/mobile Jest “worker failed to exit gracefully” warning | Likely a leaked timer in the reanimated or socket.io mock | tests: future flake |
| B10 | No coverage measurement; no incidents-per-deploy tracking | apps/api has an orphaned test:cov script; apps/mobile has no coverage config | tests |
| B11 | Sentry Session Replay + Performance not enabled | @sentry/react-native + @sentry/nestjs installed, flags off | observability |
| B12 | No OpenAPI contract drift gate | BE ships a schema change → FE fails silently until runtime | tests |
| B13 | No visual regression | UI drift lands undetected | tests |
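
B1 slipped through because the as never cast silenced Expo Router's typed-routes check. One cheap regression guard is a grep-style scan that flags router.push/replace/navigate calls whose argument is cast with as never. Below is a minimal sketch. The helper name and regex are hypothetical and are not an existing lint rule. A real setup might prefer an ESLint rule, or simply rely on typed routes once the casts are gone.

```typescript
// Hypothetical guard (not an existing lint rule): flags router.push/replace/navigate
// calls cast with `as never`, the cast that disabled Expo Router's typed-routes check in B1.
// Line-based on purpose: a cheap grep-style check, not a parser, so multi-line calls are missed.
const ROUTER_CAST = /\brouter\.(push|replace|navigate)\(.*\bas\s+never\b/;

interface Finding {
  line: number; // 1-based line number in the scanned source
  text: string; // the offending source line, trimmed
}

function findUntypedRouteCasts(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    if (ROUTER_CAST.test(text)) findings.push({ line: i + 1, text: text.trim() });
  });
  return findings;
}
```

Run over lib/ and app/ from verify-local.sh. Once B1 is fixed, any new finding would point at a route that bypasses the typed-routes check.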
3. Phased plan
Phase 0 — Demo tonight (P0 blockers, ~90 min)
Fix what the cofounders WILL see on the evening of 2026-04-21. Nothing speculative.
| Step | Change | Files | Est |
|---|---|---|---|
| P0.1 | Fix B1 — change router.replace("/(consumer)/home" as never) → router.replace("/(consumer)") at both call sites | lib/auth/use-continue-flow.ts:74,97 | 5 min |
| P0.2 | Commit B2 fix — build:web → expo export --platform web --clear (already on disk; needs commit + CHANGELOG + status.md) | apps/mobile/package.json, CHANGELOG.md, docs/status.md | 5 min |
| P0.3 | Fix B3 — add ARG EXPO_PUBLIC_MAPBOX_ACCESS_TOKEN + ENV line to apps/mobile/Dockerfile; add buildArg on Dokploy web app via MCP | apps/mobile/Dockerfile, Dokploy | 15 min |
| P0.4 | Fix B4 — swap Dokploy web EXPO_PUBLIC_SENTRY_DSN to FE project DSN | Dokploy MCP | 5 min |
| P0.5 | Fix B5 — remove !.env.example whitelist in .dockerignore (belt-and-suspenders after B2) | .dockerignore | 2 min |
| P0.6 | Redeploy FE + verify — ./scripts/deploy.sh --fe-only → probe bundle for localhost, Mapbox render, Sentry DSN, Clerk signup success → land on /(consumer) not Unmatched | shell | 15 min |
| P0.7 | Walk consumer flow manually via Playwright headed against LOCAL first: welcome → signup → home → triage → results → pro-detail → book → chat → review → SOS. Capture any new breakage. | ad-hoc | 30 min |
| P0.8 | Fix anything P0.7 surfaces. Stop at first pass. | — | buffer |
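
The "probe bundle for localhost" part of P0.6 can be scripted so that it repeats identically after every deploy. Below is a minimal sketch. It assumes the web export lands in a local dist directory and that the leaked dev origin is localhost:3000 (as in B2/B5). The forbidden-pattern list and function names are hypothetical.

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Strings that must never survive into a production web bundle (assumed list).
const FORBIDDEN: RegExp[] = [/localhost:3000/, /127\.0\.0\.1:3000/];

// Returns one "file: pattern" entry per forbidden pattern found in `text`.
function findLeaks(file: string, text: string): string[] {
  return FORBIDDEN.filter((re) => re.test(text)).map((re) => `${file}: ${re.source}`);
}

// Recursively scans every .js file under `dir` (e.g. the output of `expo export --platform web`).
function scanDist(dir: string): string[] {
  const hits: string[] = [];
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      hits.push(...scanDist(path));
    } else if (entry.endsWith(".js")) {
      hits.push(...findLeaks(path, readFileSync(path, "utf8")));
    }
  }
  return hits;
}
```

A deploy script could call scanDist on the exported directory and abort the deploy if the result is non-empty. Probing the live bundle instead would need the served bundle URL, which this plan does not pin down.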
Exit criteria Phase 0:
- Cloud FE → prod BE (no localhost leak)
- Signup/signin lands on /(consumer), not Unmatched Route
- Mapbox tiles render
- Sentry receives FE events in correct project
- One complete consumer flow green on local AND cloud
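
Three of these criteria (no localhost leak, Mapbox tiles, FE Sentry project) depend on values that Metro bakes in at export time. They can therefore also be checked before the cloud build starts. Below is a minimal sketch of such a guard, meant for the cloud build only, because local dev legitimately points at localhost. EXPO_PUBLIC_API_URL is an assumed variable name. Passing the BE DSN in for comparison is likewise a hypothetical wiring choice.

```typescript
type Env = Record<string, string | undefined>;

// Variables the cloud web build is assumed to need (EXPO_PUBLIC_API_URL is a hypothetical name).
const REQUIRED = [
  "EXPO_PUBLIC_API_URL",
  "EXPO_PUBLIC_MAPBOX_ACCESS_TOKEN",
  "EXPO_PUBLIC_SENTRY_DSN",
] as const;

// Returns a list of problems; an empty list means the env looks safe for a cloud build.
function checkPublicEnv(env: Env, backendSentryDsn?: string): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED) {
    if (!env[key]) problems.push(`${key} is missing or empty`);
  }
  // B2/B5: a local API origin must never be baked into the cloud bundle.
  const apiUrl = env.EXPO_PUBLIC_API_URL ?? "";
  if (/localhost|127\.0\.0\.1/.test(apiUrl)) {
    problems.push(`EXPO_PUBLIC_API_URL points at a local host: ${apiUrl}`);
  }
  // B4: the web build must not report into the backend's Sentry project.
  if (backendSentryDsn && env.EXPO_PUBLIC_SENTRY_DSN === backendSentryDsn) {
    problems.push("EXPO_PUBLIC_SENTRY_DSN equals the backend DSN");
  }
  return problems;
}
```

The Docker build could run this check ahead of expo export and throw with the joined problem list if it is non-empty. The build would then fail loudly instead of shipping a blank map or a misrouted DSN.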
Phase 1 — This week (after demo, before next sprint)
Eliminate the classes of bugs that Phase 0 fixed.
| Step | Change | Why |
|---|---|---|
| P1.1 | Delete apps/mobile/playwright.config.ts (dead) OR repoint at ../../e2e/web/smoke.spec.ts | B6 — single source of Playwright config in e2e/web/ |
| P1.2 | Wire web-build smoke into verify-local.sh: expo export --platform web --clear → npx serve dist -p 8081 → playwright test smoke.spec.ts --project=chromium → fail on any page.on('pageerror'). ~45 sec gate. | B8 — catches import.meta class permanently |
| P1.3 | Start API in Playwright webServer so @ideony/e2e can actually run. Add docker:up precondition. Drop --filter='!@ideony/e2e' from verify-local.sh. Run subset (smoke + scenario-0[1-3]) in gate; full suite nightly. | B7 — unlocks 132 specs |
| P1.4 | Delete 3 low-signal test clusters (per test-audit agent): test/app/(consumer)/_layout.test.tsx, test/lib/hooks/use-theme-colors.test.ts, and the presentational-smoke bundle test/components/chrome/{TabBar,SkeletonLoader,EmptyState}.test.tsx. ~15 specs removed. | Low test signal (section 1): mocked tautologies |
| P1.5 | Add 3 HIGH-value tests: (a) real Stripe Connect onboarding E2E (BAPI user + webhook), (b) SOS dispatch multi-actor (uses the existing scenario-02 stub), (c) Clerk svix webhook with a real signature. | Critical paths currently have 0% test coverage |
| P1.6 | Enable Sentry Session Replay + Performance on @sentry/react-native (FE) + @sentry/nestjs (BE). Set replaysOnErrorSampleRate: 1.0, tracesSampleRate: 0.2. | B11 — free RUM from installed dep |
| P1.7 | Add openapi-diff CI step (when CI restored): after BE build, diff packages/api-client/src/generated/openapi.json vs main. Fail on breaking change. ~1 hr to wire. | B12 — cheap SOTA win |
| P1.8 | Fix B9 — instrument Jest teardown, find the leaked timer (likely in reanimated or socket.io mock). | B9 |
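
P1.1 to P1.3 converge on a single Playwright config in e2e/web/ that boots its own servers, which is exactly what B7 lacks. Below is a sketch using Playwright's webServer option, which also accepts an array when several servers are needed. Port 8081 and npx serve dist come from P1.2, and port 3001 comes from B7. The API start command, the /health endpoint and the relative cwd are assumptions, not taken from the repo.

```typescript
// e2e/web/playwright.config.ts: sketch only; the API command, health URL and cwd are assumptions.
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: ".",
  use: { baseURL: "http://localhost:8081" },
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
  // Playwright starts each entry before the tests and waits until its URL responds.
  webServer: [
    {
      // Hypothetical API start script; assumes the docker:up precondition (P1.3) already ran.
      command: "pnpm --filter @ideony/api start",
      url: "http://localhost:3001/health",
      reuseExistingServer: !process.env.CI,
      timeout: 120_000,
    },
    {
      // Serves the output of `expo export --platform web --clear` (P1.2).
      command: "npx serve dist -p 8081",
      cwd: "../../apps/mobile",
      url: "http://localhost:8081",
      reuseExistingServer: !process.env.CI,
    },
  ],
});
```

With reuseExistingServer set only outside CI, a developer who already has the API running locally skips the boot step. CI always starts fresh servers.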
Phase 2 — Post-MVP-0 (observability + SOTA uplift)
| Step | Change | Cost |
|---|---|---|
| P2.1 | Maestro for native iOS+Android; Expo itself is migrating from Detox to Maestro. Start with 3 flows (auth, booking, SOS). YAML flows, <1% flakiness (vs 2% for Detox). | ~3 days |
| P2.2 | Visual regression: Playwright toHaveScreenshot() on 7 hero screens. Baseline commit per release tag. Free, already installed. | ~1 day |
| P2.3 | Synthetic monitoring: Checkly with the existing Playwright scripts. 3 critical journeys every 5 min from EU. Free tier fits MVP 0. | ~½ day |
| P2.4 | Flaky quarantine: @flaky Playwright tag + separate non-blocking job; cap at 5 entries; weekly auto-retire of specs that pass 3 consecutive runs. | ~½ day |
| P2.5 | Switch metric: coverage % → incidents/deploy + MTTR on the SOS + payment modules (per SOTA: Airbnb/Uber/Faire all track outcomes, not %) | n/a |
| P2.6 | Migrate to a named Cloudflare tunnel for api.ideony.is-a.dev + app.ideony.is-a.dev (infra.md blocker: is-a.dev/register#36614). Stable HTTPS means no more rebuilds on tunnel-URL rotation and no more Clerk/Stripe/Mapbox secure-context warnings. | blocked on PR |
| P2.7 | Resume Phase E multi-role E2E per the existing spec. Blueprint already locked. | 12.5 days est |
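
The P2.4 policy (cap of 5, retire after 3 consecutive passes, weekly sweep) is small enough to pin down as code before it gets wired into CI. Below is a hypothetical model of that policy. The function names are illustrative. Persisting the list between runs, for example as a JSON file, is left open.

```typescript
// Hypothetical model of P2.4: at most 5 quarantined specs; a spec is retired
// (its @flaky tag removed) once it has passed 3 consecutive runs.
const MAX_ENTRIES = 5;
const RETIRE_AFTER_PASSES = 3;

interface QuarantineEntry {
  spec: string;
  consecutivePasses: number;
}

// Adds a spec, refusing once the cap is reached so the quarantine cannot grow unbounded.
function quarantine(entries: QuarantineEntry[], spec: string): QuarantineEntry[] {
  if (entries.some((e) => e.spec === spec)) return entries;
  if (entries.length >= MAX_ENTRIES) {
    throw new Error(`Quarantine full (${MAX_ENTRIES}); fix an existing flaky spec before adding ${spec}`);
  }
  return [...entries, { spec, consecutivePasses: 0 }];
}

// Records one run of the non-blocking job: a pass extends the streak, a failure resets it,
// and a spec missing from `results` keeps its current streak.
function recordRun(entries: QuarantineEntry[], results: Map<string, boolean>): QuarantineEntry[] {
  return entries.map((e) => {
    const passed = results.get(e.spec);
    if (passed === undefined) return e;
    return { ...e, consecutivePasses: passed ? e.consecutivePasses + 1 : 0 };
  });
}

// Weekly sweep: splits the list into specs that stay quarantined and specs to retire.
function retireStable(entries: QuarantineEntry[]): { kept: QuarantineEntry[]; retired: string[] } {
  const kept = entries.filter((e) => e.consecutivePasses < RETIRE_AFTER_PASSES);
  const retired = entries.filter((e) => e.consecutivePasses >= RETIRE_AFTER_PASSES).map((e) => e.spec);
  return { kept, retired };
}
```

Throwing on a full quarantine keeps the cap a hard limit. Adding a sixth flaky spec forces someone to fix an existing one first.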
4. Out of scope
- Rewriting the test suite from scratch: the 195 API unit tests are mostly HIGH value; don’t throw the baby out with the bathwater. Only touch the 3 low-value clusters listed in P1.4.
- Chromatic/Percy visual regression SaaS: Playwright's built-in toHaveScreenshot() covers MVP 0 for free; revisit if Storybook gets heavy.
- Contract tests via Pact (bidirectional): openapi-diff gives ~80% of the value at ~20% of the cost; revisit Pact post-PMF.
- Bolt/Glovo-style city simulator: pre-revenue, with no historical data to seed it. Phase F+.
- Multi-tenancy in prod (DoorDash pattern): overkill for MVP 0; the test_tenant column from the Phase E spec is already the right scope.
- Switching from Playwright to Cypress: no benefit, and the migration cost is real.
5. Risks
| # | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R1 | B1 fix breaks existing E2E that happened to land on Unmatched Route and still asserted some element | L | M | grep E2E for /(consumer)/home; update specs same commit |
| R2 | Removing !.env.example whitelist breaks a dev who clones fresh and needs the example for reference | L | L | Keep .env.example in repo (it’s in git); only removed from docker build context |
| R3 | Adding Playwright webServer slows verify-local.sh from 30s → 90s | M | L | Run smoke only locally (3 specs); full suite nightly |
| R4 | Sentry Session Replay at 100% on-error overwhelms free tier | L | M | Start at 0.2; tune after first week |
| R5 | openapi-diff false positives from hey-api version bumps | M | L | Pin hey-api version; allowlist schema version bumps |
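
The P1.7 gate and the R5 mitigation can be prototyped before the real openapi-diff tool is wired in. The sketch below is a deliberately tiny stand-in, not openapi-diff. It treats only removed paths or removed HTTP operations as breaking. It never compares info.version, so a version bump alone cannot fail the gate. Real breaking-change detection, such as changed request/response schemas or newly required parameters, still needs the full tool.

```typescript
// Minimal shape of an OpenAPI document for this check.
interface OpenApiDoc {
  info?: { version?: string };
  paths: Record<string, Record<string, unknown>>;
}

const METHODS = ["get", "put", "post", "delete", "patch", "options", "head", "trace"];

// Lists operations present in `base` (e.g. main) but missing from `head` (the branch).
// Added paths/operations are non-breaking and ignored; info.version is never compared (R5).
function removedOperations(base: OpenApiDoc, head: OpenApiDoc): string[] {
  const removed: string[] = [];
  for (const [path, item] of Object.entries(base.paths)) {
    const headItem = head.paths[path];
    for (const method of METHODS) {
      if (!(method in item)) continue;
      if (!headItem || !(method in headItem)) removed.push(`${method.toUpperCase()} ${path}`);
    }
  }
  return removed;
}
```

The main-branch copy of packages/api-client/src/generated/openapi.json can be read with git show main:<path>. A non-empty result would fail the step.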
6. Success criteria
- Phase 0: 1 full consumer flow green on local AND cloud by 2026-04-21 18:00
- Phase 1: verify-local.sh runs web-build smoke + E2E smoke; ≤90 sec total; blocks commit on pageerror
- Phase 2: Sentry shows FE session replays on error; Checkly pages on critical-path regression within 5 min
7. Decisions needed from user
- Q1: Proceed with Phase 0 now (all 8 steps before the demo)? Or only P0.1–P0.4 (the app + infra fixes), skipping P0.5–P0.8 (.dockerignore cleanup, redeploy + verify, manual flow walk and follow-up fixes)?
- Q2: Phase 1 post-demo: keep the scope above (~1 day of work) or trim it? Specifically, is P1.3 (unblock the 132 Playwright specs) worth 2-3 hrs today?
- Q3: Phase 2 (Maestro + Checkly + visual regression): approve as the post-MVP-0 sprint target, or defer?
- Q4: Should this spec stay at docs/specs/2026-04-21-rock-solid-fe-be-design.md (current location, matches repo convention) or move to docs/superpowers/specs/ (superpowers default)?
Awaiting confirmation — no implementation until user says “proceed” (or proceed P0 only / modify: …).