-> AuthN/AuthZ (JWT + API keys, RBAC, audit, rate limiting)
+
+Control PostgreSQL (:5433; jobs, users, audit logs, API keys, dedup)
+<-> FastAPI Control API
+<-> Worker Service
+<-> gRPC Replayer (dedup state)
+
+Frontend Dashboard (:3000)
+-> Control API (REST + WebSocket)
+-> Metrics proxy UI (/api/metrics)
+
+Prometheus (:9099)
+-> Scrapes ingestor/control/replayer metrics
+-> Alertmanager (:9093)
+-> Grafana (:3001)
```

-## Architectural Decisions (As of February 15, 2026)
+## Architectural Decisions

-- **FastAPI control plane over Django**: Legacy Django scaffolding was removed and all control-plane features live in `control/app/`.
 - **Centralized dedup in Replayer**: Workers do not maintain local dedup state; idempotency is enforced in `replayer/server.py`.
 - **At-least-once replay semantics**: Replayer uses `exists -> apply -> mark_processed` so failed applies are retried instead of being pre-marked as duplicates.
 - **Fail-fast job progression on unreplayable events**: Workers retry each event up to `MAX_EVENT_RETRIES` times after the initial attempt; if still failing, the job is marked `FAILED` without advancing checkpoint beyond the failed event.
 - **Dual replay source strategy**: Replay router serves recent/small windows from Redis and historical/large windows from Kafka.
 - **Source-pinned resume behavior**: Resumed jobs reuse their original `replay_source` to avoid cross-source checkpoint mismatches (for example Redis stream IDs interpreted as Kafka offsets).
 - **Per-partition Kafka checkpoints**: Kafka progress is stored in JSON (`checkpoint`) keyed by partition, while `last_processed_id` is retained as a fallback string checkpoint.
 - **Safe Kafka partition seek semantics**: On resume, uncheckpointed partitions seek to `start_ms`; if no offset exists at/after `start_ms`, they seek to partition end to avoid replaying out-of-window historical data.
+- **Cross-partition Kafka end-boundary safety**: Replay does not stop globally on the first out-of-range message; partitions are completed independently to avoid dropping in-range records from other partitions.
 - **Lease-based worker ownership**: Job execution uses DB-backed leases plus renewal to prevent concurrent workers from processing the same job.
+- **Lease expiry recovery for orphaned jobs**: Workers can reclaim expired-lease `RUNNING` jobs after crashes, not just `QUEUED` jobs.
+- **Read/write session separation by path**: Auth and health/readiness/event-stream read paths use read sessions; API key `last_used_at` updates are handled with explicit write context.
+- **Control-plane operational gauges**: `walstream_jobs_active` and `walstream_job_lease_expired` are exported for replay stall and lease-integrity alerting.
 - **Async I/O across services**: `aiokafka`, `redis.asyncio`, `grpc.aio`, and async SQLAlchemy are used to keep ingestion and replay non-blocking.
-- **JWT auth for REST and WebSocket**: API routes and WebSocket stream are token-authenticated to keep monitoring and control endpoints private.
+- **Hybrid auth model**: User-facing clients use JWT; service clients can authenticate with `X-API-Key`; health probes remain public.
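
For reviewers, the at-least-once and fail-fast decisions listed above can be illustrated with a minimal, synchronous sketch. It is not the code in `replayer/server.py` or the worker: `InMemoryReplayer`, `JobResult`, `run_job`, and the value chosen for `MAX_EVENT_RETRIES` are hypothetical stand-ins, and an in-memory set takes the place of the PostgreSQL-backed dedup state. It only shows why marking an event as processed after a successful apply keeps retries safe, and how a job could stop without moving its checkpoint past a failing event.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional, Set

# Assumed value: the README names the setting but this diff does not show its default.
MAX_EVENT_RETRIES = 3


@dataclass
class InMemoryReplayer:
    """Hypothetical stand-in for the centralized dedup logic in the replayer."""

    apply_fn: Callable[[str], None]
    processed: Set[str] = field(default_factory=set)

    def replay(self, event_id: str) -> str:
        # 1. exists: skip events that were already applied successfully.
        if event_id in self.processed:
            return "duplicate"
        # 2. apply: may raise. Nothing is marked yet, so a retry is not
        #    mistaken for a duplicate.
        self.apply_fn(event_id)
        # 3. mark_processed: only reached after a successful apply.
        self.processed.add(event_id)
        return "applied"


@dataclass
class JobResult:
    status: str                # "COMPLETED" or "FAILED"
    checkpoint: Optional[str]  # last event id that was applied or deduplicated


def run_job(
    event_ids: Iterable[str],
    replayer: InMemoryReplayer,
    max_retries: int = MAX_EVENT_RETRIES,
) -> JobResult:
    checkpoint: Optional[str] = None
    for event_id in event_ids:
        # One initial attempt plus up to max_retries retries.
        for _attempt in range(1 + max_retries):
            try:
                replayer.replay(event_id)
                break
            except Exception:
                continue
        else:
            # Every attempt failed: fail fast and leave the checkpoint
            # at the last event that succeeded.
            return JobResult("FAILED", checkpoint)
        checkpoint = event_id
    return JobResult("COMPLETED", checkpoint)
```

With an `apply_fn` that always raises for one event, `run_job` returns `FAILED` with the checkpoint still pointing at the preceding event, so a resumed job would start again at the failing event rather than after it.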
- [ ] Stale-cookie session handling UX alignment between middleware guards and API 401 handling
-- [ ] Degraded readiness/503 UI states for dashboard health cards and overview widgets
-- [ ] **Dark/light mode toggle** (`ThemeToggle.tsx`) — Tailwind dark mode is configured but no toggle UI
-- [ ] **Mobile responsive sidebar** — no hamburger menu for small screens
-- [ ] **Extracted dashboard components** — `OverviewCards`, `RecentJobs`, `HealthStatus`, `QuickActions` are inline in `page.tsx`, not reusable components
-- [ ] **UI store** (`stores/uiStore.ts`) — sidebar collapse, theme state