feat(conversations): status state machine + escalation latch (US-067) #29
Merged
Enforce the conversation status latch active -> escalated -> resolved at the
DB level on top of US-066's conversations table, so deflection is derivable
without a resolved_by_bot/resolved_by_human state explosion (ADR-0004).
- CHECK constraint status in ('active','escalated','resolved'), added
idempotently so it coexists with whatever US-066 defines and exists exactly once.
- BEFORE INSERT OR UPDATE trigger conversations_status_guard makes escalated_at
trigger-owned on every write path:
* UPDATE: rejects escalated -> active (no de-escalation) and any
resolved -> * (resolved is terminal); same-status no-ops allowed.
escalated_at is stamped now() on the first escalate and preserved
verbatim thereafter (set-once latch); callers can never move or clear it.
* INSERT: a row born 'escalated' is latched now(); any other birth status
(incl. 'resolved') gets a null latch, ignoring caller-supplied
escalated_at. This closes the INSERT-path gap (US-066 RLS lets any
workspace member INSERT) so the derived deflection metric cannot be
corrupted by a stray create.
Uses create or replace trigger so the migration is uniformly re-runnable.
- Deflection is derivable:
resolved AND escalated_at IS NULL => deflected
resolved AND escalated_at IS NOT NULL => human-handled
- Documented deflection-rate SQL snippet (migration comment + AGENTS.md).
- DB-level integration test covering the UPDATE state machine, the set-once
latch, terminal/backward rejections, and the INSERT-path latch.
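The trigger rules above can be modelled as a pure-Python sketch, which is handy for reasoning about transitions without a database. This is not the migration's PL/pgSQL. The names `Row`, `guard_insert`, `guard_update`, `TransitionRejected` and `CheckViolation` are hypothetical. The sketch also assumes that the status CHECK is evaluated after the BEFORE trigger, which is the order Postgres uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumed to mirror the CHECK constraint's allowed values.
VALID_STATUSES = frozenset({"active", "escalated", "resolved"})


class TransitionRejected(Exception):
    """Hypothetical stand-in for the trigger's RAISE (asyncpg surfaces it as RaiseError)."""


class CheckViolation(Exception):
    """Hypothetical stand-in for the status CHECK constraint failing."""


@dataclass(frozen=True)
class Row:
    status: str
    escalated_at: Optional[datetime] = None


def _check(row: Row) -> Row:
    # Postgres evaluates CHECK constraints after BEFORE row triggers, so this runs last.
    if row.status not in VALID_STATUSES:
        raise CheckViolation(f"status {row.status!r} not allowed")
    return row


def guard_insert(new: Row) -> Row:
    # INSERT path: any caller-supplied escalated_at is ignored.
    latch = datetime.now(timezone.utc) if new.status == "escalated" else None
    return _check(Row(new.status, latch))


def guard_update(old: Row, new: Row) -> Row:
    if old.status == "resolved" and new.status != "resolved":
        raise TransitionRejected("resolved is terminal")
    if old.status == "escalated" and new.status == "active":
        raise TransitionRejected("no de-escalation")
    if old.escalated_at is not None:
        latch = old.escalated_at  # set-once: preserved verbatim
    elif new.status == "escalated":
        latch = datetime.now(timezone.utc)  # first escalate stamps the latch
    else:
        latch = None  # never escalated: the caller cannot plant a value
    return _check(Row(new.status, latch))
```

The new latch is derived from the old row rather than from the caller's value, and that is what makes it set-once. On UPDATE, the only write that produces a non-null `escalated_at` is the first move into `escalated`.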
The runtime "first escalating turn stops the bot pipeline" behaviour is wired
separately in US-080; this is only the DB-level latch it relies on.
Retrieval eval — PR vs
| Mode | recall@5 | MRR | nDCG@5 |
|---|---|---|---|
| vector | 0.860 (±0.000) | 0.772 (±0.000) | 0.779 (±0.000) |
| keyword | 0.110 (±0.000) | 0.120 (±0.000) | 0.112 (±0.000) |
| hybrid | 0.860 (±0.000) | 0.759 (±0.000) | 0.769 (±0.000) |
Per-category recall@5
| Mode | single_chunk | multi_hop | adversarial | paraphrase |
|---|---|---|---|---|
| vector | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
| keyword | 0.250 (±0.000) | 0.033 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
| hybrid | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
Comment is updated in place on each push by .github/workflows/retrieval-eval.yml (US-035). Comment-only — never blocks the build.
Intent
Implement US-067: conversation status state machine + escalation latch + derivable deflection, as a new migration (supabase/migrations/20260623130000_conversation_status_machine.sql) plus a DB-level test, stacked on US-066's conversations table (merged to main as PR #28, migration 20260623120000_init_conversations.sql). Rebased onto current main, which also includes PR #27 (escalation/deflection runtime + E7 eval); US-067 is the DB status machine only and does not touch #27's runtime code.
Goal: enforce active -> escalated -> resolved as a one-way latch with escalated_at set exactly once, so deflection is DERIVABLE (resolved AND escalated_at IS NULL = deflected; IS NOT NULL = human-handled) WITHOUT a resolved_by_bot/resolved_by_human state explosion (ADR-0004).
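The latch can only be set once and never cleared. That lets the two outcomes a resolved_by_bot/resolved_by_human split would otherwise store be read from two columns. A minimal Python sketch of that derivation follows. The function name `resolution_outcome` is hypothetical, since the PR does the derivation in SQL.

```python
from datetime import datetime, timezone
from typing import Optional


def resolution_outcome(status: str, escalated_at: Optional[datetime]) -> Optional[str]:
    """Derive who handled a conversation from (status, escalated_at) alone."""
    if status != "resolved":
        return None  # still open: neither deflected nor human-handled yet
    # The latch is the only discriminator needed; no resolved_by_* column.
    return "deflected" if escalated_at is None else "human-handled"
```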
Deliberate decisions and tradeoffs (so they are not mistaken for errors):
Validation done locally on the current-main base: supabase db reset applied all migrations from scratch (including US-066 init_conversations with F1 ON DELETE SET NULL, then this status_machine) cleanly, and the test passes (UPDATE + INSERT paths).
What Changed
- Adds the migration `20260623130000_conversation_status_machine.sql`, which pins `conversations.status` to `('active','escalated','resolved')` with an idempotent CHECK (guarded against US-066's own constraint) and installs a `create or replace` BEFORE INSERT OR UPDATE trigger (`conversations_status_guard`) enforcing a one-way latch: it rejects `escalated -> active` and any transition out of terminal `resolved`, while allowing same-status no-ops.
- Makes `escalated_at` a trigger-owned set-once latch on both the INSERT and UPDATE paths (stamped `now()` on the first move into `escalated`, preserved verbatim afterward, and forced to null otherwise so a stray member INSERT can't plant a misleading timestamp), so deflection is derivable (`resolved AND escalated_at IS NULL` = bot-handled) without a stored `resolved_by_*` flag; ships the divide-by-zero-guarded deflection-rate query as a migration comment and in AGENTS.md.
- Adds `backend/test_conversation_status_machine.py`, which exercises the trigger as a superuser (triggers aren't bypassed) and covers the UPDATE/INSERT state machine, set-once preservation against a forced overwrite, de-escalation and terminal-transition rejection, the derivable-deflection corollary, and the deflection-rate snippet, skipping cleanly when the DB/table is absent.
Risk Assessment
✅ Low: A single additive, idempotent DB migration plus an isolated integration test — the state-machine and set-once-latch logic traces correctly across all transition paths and is race-safe, with no schema conflicts or behavioral risk; the only finding is trivial dead code in the test.
Testing
Reset the local Supabase DB to apply every migration from scratch (ending with the US-067 status-machine migration), then ran the DB-level asyncpg integration test as the postgres superuser so the BEFORE INSERT OR UPDATE trigger fires exactly as in production. It passed, confirming the set-once escalated_at latch, rejection of de-escalation and resolved-terminal transitions, the status CHECK, the INSERT-path latch, and derivable deflection. I also confirmed migration idempotency by re-applying the file (no duplicate constraint) and captured a reviewer-visible psql transcript of three realistic conversation lifecycles in which the DB rejects illegal writes and the documented production deflection-rate query derives 0.50 from persisted state. No UI surface exists for this DB-only migration, so the evidence is CLI transcripts plus persisted DB state rather than screenshots. All checks pass with no findings; demo rows were cleaned up and the worktree is clean.
Evidence: Status-machine integration test output (fresh DB)
Evidence: End-to-end deflection scenario transcript (persisted DB state + production query)
Evidence: Demo SQL script driving the scenario through the production trigger/CHECK
Evidence: Derived deflection-rate result (1 bot-only of 2 resolved = 0.5000)
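The derived rate above (1 bot-only of 2 resolved = 0.5000) can be reproduced with a small Python sketch of the deflection-rate computation. The function name `deflection_rate` and the demo rows are hypothetical. In particular, the third row is not the actual third lifecycle from the transcript; it is only shaped to match the "1 of 2 resolved" result. Returning `None` when nothing is resolved is one way to mirror a divide-by-zero guard such as SQL's `NULLIF(count, 0)`, and that is an assumption about the real query's shape.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

Row = Tuple[str, Optional[datetime]]  # (status, escalated_at)


def deflection_rate(rows: Iterable[Row]) -> Optional[float]:
    """Share of resolved conversations that were never escalated."""
    resolved = [escalated_at for status, escalated_at in rows if status == "resolved"]
    if not resolved:
        # No resolved rows: return None (like SQL NULL) instead of dividing by zero.
        return None
    deflected = sum(1 for escalated_at in resolved if escalated_at is None)
    return deflected / len(resolved)


t = datetime(2026, 1, 1, tzinfo=timezone.utc)
# Hypothetical rows matching the "1 bot-only of 2 resolved" evidence.
demo = [
    ("resolved", None),   # bot-only: deflected
    ("resolved", t),      # escalated before resolving: human-handled
    ("escalated", t),     # still open: excluded from the denominator
]
print(f"{deflection_rate(demo):.4f}")  # 0.5000
```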
Pipeline
Updates from git push no-mistakes
✅ **intent** - passed
✅ No issues found.
✅ **Rebase** - passed
✅ No issues found.
`backend/test_conversation_status_machine.py:51`: Dead code: `ROOT = Path(...).parents[1]` and `sys.path.insert(0, str(ROOT / "backend"))` (lines 50-51), plus the `from pathlib import Path` (line 45) and `import sys` (line 43) that exist only to support them, are unused; the test imports nothing local from `backend` (only asyncio/os/uuid/asyncpg are actually used). This is copied boilerplate from tests that do need a local import; here it can be removed.
✅ **Test** - passed
✅ No issues found.
- `supabase db reset --local`: all 36 migrations applied from scratch, ending with 20260623130000_conversation_status_machine.sql (clean, exit 0)
- `python -m backend.test_conversation_status_machine`: DB-level asyncpg test passes against the freshly reset DB (set-once latch, de-escalation + terminal rejections as RaiseError, CHECK as CheckViolationError, INSERT-path latch, deflection rate 0.5)
- Re-applied the migration file via `psql -f supabase/migrations/20260623130000_conversation_status_machine.sql` to confirm idempotency (exit 0; status CHECK constraint count stays exactly 1, no duplicate)
- Verified live DB objects: conversations_status_check CHECK, conversations_status_guard BEFORE INSERT OR UPDATE trigger, and _conversations_status_guard function present; migration 20260623130000 is the latest applied
- Ran a realistic-scenario psql transcript (3 conversation lifecycles) showing latch-derived deflection, the DB rejecting de-escalation/resolved-revival/out-of-set writes with rows intact, and the documented deflection-rate query returning 0.5000
- Confirmed cleanup: 0 leftover demo rows in public.conversations and a clean `git status --porcelain`

`AGENTS.md:25`: AGENTS.md "Conversation status machine + derivable deflection (US-067, ADR-0004)" and the migration's header comment point to ADR-0004, which is not committed under docs/adr/ (only 0001, 0002, 0007 exist). This is pre-existing and systemic: 6 of the 9 referenced ADRs (0003/0004/0005/0006/0008/0009) live only in the PRD task files, and AGENTS.md already referenced ADR-0004 in the prior US-066 section before this change. Authoring ADR-0004 is a deliberate non-goal of this DB-only story (it would require fabricating decision-record content), so it is left as-is and surfaced as a human judgment call on whether and when to backfill the deferred ADRs.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.