feat(conversations): status state machine + escalation latch (US-067)#29

Merged
hcho22 merged 1 commit into main from fm/us067-status-m3
Jun 24, 2026
@hcho22 hcho22 commented Jun 24, 2026

Owner

Intent

Implement US-067: conversation status state machine + escalation latch + derivable deflection, as a new migration (supabase/migrations/20260623130000_conversation_status_machine.sql) plus a DB-level test, stacked on US-066's conversations table (merged to main as PR #28, migration 20260623120000_init_conversations.sql). Rebased onto current main which also includes PR #27 (escalation/deflection runtime + E7 eval); US-067 is the DB status machine only and does not touch #27's runtime code.

Goal: enforce active -> escalated -> resolved as a one-way latch with escalated_at set exactly once, so deflection is DERIVABLE (resolved AND escalated_at IS NULL = deflected; IS NOT NULL = human-handled) WITHOUT a resolved_by_bot/resolved_by_human state explosion (ADR-0004).
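
The derivation rule above can be sketched in a few lines of Python. This is an illustrative model only: the function name `classify_outcome` and the outcome labels are assumptions made for this example, not identifiers from the migration.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_outcome(status: str, escalated_at: Optional[datetime]) -> str:
    """Derive a conversation's outcome from its persisted state alone."""
    if status != "resolved":
        # Only resolved conversations count toward deflection.
        return "open"
    # A null latch means no human was ever involved.
    return "deflected" if escalated_at is None else "human-handled"


stamp = datetime(2026, 6, 24, tzinfo=timezone.utc)  # arbitrary example timestamp
print(classify_outcome("resolved", None))
print(classify_outcome("resolved", stamp))
```

Because the outcome is a pure function of `status` and `escalated_at`, no extra resolved_by_* column is needed, provided the latch itself can be trusted.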

Deliberate decisions and tradeoffs (so they are not mistaken for errors):

  • status CHECK constraint status in ('active','escalated','resolved') is added IDEMPOTENTLY via a DO block that first checks pg_constraint, because US-066 owns the same table and may also constrain status; the guard ensures the constraint exists exactly once with no duplicate.
  • The transition guard is a BEFORE INSERT OR UPDATE trigger (conversations_status_guard, via create or replace trigger so the migration is uniformly re-runnable), making the DB the source of truth on EVERY write path.
  • escalated_at is trigger-owned on both paths; callers never set it:
      • UPDATE: rejects escalated -> active (no de-escalation) and any resolved -> * (resolved is terminal); same-status no-ops (incl. resolved->resolved) allowed. escalated_at is stamped now() on the first transition into escalated and preserved verbatim on every later write (set-once latch, never moved or cleared). On an un-latched row any caller-supplied escalated_at is forced back to null unless this write is the first escalate.
      • INSERT: a row born 'escalated' is latched now(); any other birth status (including 'resolved') gets a null latch, ignoring any caller-supplied escalated_at. This is INTENTIONAL and closes a real gap (US-066 RLS lets any workspace member INSERT a conversation): without it a stray INSERT could plant status='escalated' with null escalated_at (later miscounted as deflected) or status='resolved' with a stray timestamp (miscounted as human-handled), corrupting the derived deflection metric. Forcing escalated_at on the create path is therefore correct hardening, not a bug.
  • The function is plpgsql with search_path pinned to public, pg_temp; it is NOT security definer (it only mutates NEW and raises), matching least-privilege.
  • A documented deflection-rate SQL snippet (divide-by-zero guarded with nullif) is included both as a migration comment and in AGENTS.md.
  • Scope discipline: ONLY the DB-level latch is implemented. The runtime 'first escalating turn stops the bot pipeline' behaviour is US-080 and is intentionally NOT wired here. ADR-0004/0003 are referenced by the PRD but not yet committed under docs/adr/ (pre-existing, systemic across the repo); US-067 follows the AC verbatim and does not fabricate an ADR.
  • Test (backend/test_conversation_status_machine.py) is a DB-level asyncpg integration test mirroring the PRD US-067 Validation Test. It connects as the postgres superuser on purpose because triggers (unlike RLS) are not bypassed by superusers, so the guard is exercised exactly as in production. It asserts specific exception classes (RaiseError for trigger guards, CheckViolationError for the constraint) and verifies the UPDATE state machine, set-once preservation against a forced overwrite, de-escalation rejection, terminal-transition rejection, the derivable-deflection corollary (direct active->resolved keeps escalated_at null), the INSERT-path latch (born escalated stamps; born active/resolved and stray-timestamp inserts forced null), and the deflection-rate snippet. It skips cleanly when the DB/table is absent, matching test_permissions.py / test_au4_auth_attacks.py.
  • AGENTS.md keeps BOTH US-066's 'two trust models' section and this US-067 status-machine section.
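
For reviewers who prefer to read the rules as code, the transition guard described above can be modeled in plain Python. This is a sketch of the documented behaviour, not the plpgsql function itself: `Row`, `guard_insert`, `guard_update`, and `IllegalTransition` are hypothetical names, and the CHECK constraint is approximated with a `ValueError`. One case the description does not spell out (a pre-existing 'escalated' row with a null latch) is left un-latched here, which is an assumption.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional

VALID_STATUSES = {"active", "escalated", "resolved"}


class IllegalTransition(Exception):
    """Stands in for the error the trigger raises on a rejected write."""


@dataclass(frozen=True)
class Row:
    status: str
    escalated_at: Optional[datetime] = None


def _check_status(status: str) -> None:
    # Approximates the status CHECK constraint.
    if status not in VALID_STATUSES:
        raise ValueError(f"status {status!r} violates the CHECK constraint")


def guard_insert(new: Row, now: datetime) -> Row:
    """INSERT path: the latch is derived from the birth status only."""
    _check_status(new.status)
    # Any caller-supplied escalated_at is ignored.
    return replace(new, escalated_at=now if new.status == "escalated" else None)


def guard_update(old: Row, new: Row, now: datetime) -> Row:
    """UPDATE path: one-way transitions plus a set-once latch."""
    _check_status(new.status)
    if old.status == "resolved" and new.status != "resolved":
        raise IllegalTransition(f"resolved -> {new.status} (resolved is terminal)")
    if old.status == "escalated" and new.status == "active":
        raise IllegalTransition("escalated -> active (no de-escalation)")
    if old.escalated_at is not None:
        latch = old.escalated_at  # already latched: preserve verbatim
    elif new.status == "escalated" and old.status != "escalated":
        latch = now  # first transition into escalated
    else:
        latch = None  # un-latched row: discard any caller-supplied value
    return replace(new, escalated_at=latch)
```

A caller would thread each write through the guard, for example `guard_update(current, Row("escalated"), now)`, and persist whatever row comes back, so the returned `escalated_at` always wins over the supplied one.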

Validation done locally on the current-main base: supabase db reset applied all migrations from scratch (including US-066 init_conversations with F1 ON DELETE SET NULL, then this status_machine) cleanly, and the test passes (UPDATE + INSERT paths).

What Changed

  • Add migration 20260623130000_conversation_status_machine.sql that pins conversations.status to ('active','escalated','resolved') with an idempotent CHECK (guarded against US-066's own constraint) and installs a create or replace BEFORE INSERT OR UPDATE trigger (conversations_status_guard) enforcing a one-way latch: it rejects escalated -> active and any transition out of terminal resolved, while allowing same-status no-ops.
  • Make escalated_at a trigger-owned set-once latch on both the INSERT and UPDATE paths (stamped now() on the first move into escalated, preserved verbatim afterward, and forced to null otherwise so a stray member INSERT can't plant a misleading timestamp), making deflection derivable (resolved AND escalated_at IS NULL = bot-handled) without a stored resolved_by_* flag; ships the divide-by-zero-guarded deflection-rate query as a migration comment and in AGENTS.md.
  • Add DB-level asyncpg integration test backend/test_conversation_status_machine.py exercising the trigger as a superuser (triggers aren't bypassed), covering the UPDATE/INSERT state machine, set-once preservation against a forced overwrite, de-escalation and terminal-transition rejection, the derivable-deflection corollary, and the deflection-rate snippet, skipping cleanly when the DB/table is absent.

Risk Assessment

✅ Low: A single additive, idempotent DB migration plus an isolated integration test — the state-machine and set-once-latch logic traces correctly across all transition paths and is race-safe, with no schema conflicts or behavioral risk; the only finding is trivial dead code in the test.

Testing

Reset the local Supabase DB to apply every migration from scratch (ending with the US-067 status-machine migration), then ran the DB-level asyncpg integration test as the postgres superuser so the BEFORE INSERT OR UPDATE trigger fires exactly as in production - it passed, confirming the set-once escalated_at latch, rejection of de-escalation and resolved-terminal transitions, the status CHECK, the INSERT-path latch, and derivable deflection. I also confirmed migration idempotency by re-applying the file (no duplicate constraint) and captured a reviewer-visible psql transcript of three realistic conversation lifecycles where the DB rejects illegal writes and the documented production deflection-rate query derives 0.50 from persisted state. No UI surface exists for this DB-only migration, so evidence is CLI transcripts plus persisted DB state rather than screenshots. All checks pass with no findings; demo rows were cleaned up and the worktree is clean.

Evidence: Status-machine integration test output (fresh DB)
active -> escalated latched escalated_at = 2026-06-24 03:59:15.494132+00:00
second escalate write did NOT move escalated_at (set-once latch holds)
rejected as expected: escalated -> active (RaiseError)
rejected as expected: resolved -> escalated (RaiseError)
rejected as expected: resolved -> active (RaiseError)
derived deflection rate over test rows = 0.50000000000000000000
INSERT-path latch holds (escalated stamps, active/resolved forced null)
rejected as expected: status = 'bogus' (CHECK constraint) (CheckViolationError)
OK: status machine verified - active->escalated->resolved latch, set-once escalated_at on INSERT + UPDATE, de-escalation + terminal transitions rejected, deflection derivable, status CHECK enforced
Evidence: End-to-end deflection scenario transcript (persisted DB state + production query)
Border style is 2.
INSERT 0 1
INSERT 0 1
INSERT 0 1
=== STEP 1: three conversations born active, escalated_at all NULL (trigger owns the latch) ===
+-----------------------------+--------+--------------+
|       customer_email        | status | escalated_at |
+-----------------------------+--------+--------------+
| us067-demo:faq@acme.test    | active |              |
| us067-demo:open@acme.test   | active |              |
| us067-demo:refund@acme.test | active |              |
+-----------------------------+--------+--------------+
(3 rows)

UPDATE 1
UPDATE 1
UPDATE 1

=== STEP 2: after lifecycle writes - C1 deflected (latch NULL), C2 human-handled (latch set), C3 still open ===
+-----------------------------+----------+----------------+-------------------------------+
|       customer_email        |  status  | ever_escalated |         escalated_at          |
+-----------------------------+----------+----------------+-------------------------------+
| us067-demo:faq@acme.test    | resolved | f              |                               |
| us067-demo:open@acme.test   | active   | f              |                               |
| us067-demo:refund@acme.test | resolved | t              | 2026-06-24 04:00:14.254895+00 |
+-----------------------------+----------+----------------+-------------------------------+
(3 rows)


=== STEP 3: DB rejects illegal writes from ANY caller (these RAISE, row is unchanged) ===
-- (a) de-escalation escalated->active on a still-open escalated conv:
UPDATE 1
-- attempting escalated -> active ...
psql:/var/folders/9t/k_yy9fqs5vd27rf12jx_rzqh0000gn/T/no-mistakes-evidence/01KVVW5MFD7X5CGQEQ4ERX28EM/deflection_demo.sql:46: ERROR:  illegal conversation status transition: escalated -> active (no de-escalation)
CONTEXT:  PL/pgSQL function _conversations_status_guard() line 32 at RAISE
-- (b) reviving a resolved conversation resolved->escalated:
psql:/var/folders/9t/k_yy9fqs5vd27rf12jx_rzqh0000gn/T/no-mistakes-evidence/01KVVW5MFD7X5CGQEQ4ERX28EM/deflection_demo.sql:48: ERROR:  illegal conversation status transition: resolved -> escalated (resolved is terminal)
CONTEXT:  PL/pgSQL function _conversations_status_guard() line 37 at RAISE
-- (c) an out-of-set status value:
psql:/var/folders/9t/k_yy9fqs5vd27rf12jx_rzqh0000gn/T/no-mistakes-evidence/01KVVW5MFD7X5CGQEQ4ERX28EM/deflection_demo.sql:50: ERROR:  new row for relation "conversations" violates check constraint "conversations_status_check"
DETAIL:  Failing row contains (aaaa1111-0000-0000-0000-000000000003, 00000000-0000-0000-0000-0000000000d0, null, us067-demo:open@acme.test, spam, 2026-06-24 04:00:14.258336+00, widget, null, null, 2026-06-24 04:00:14.251087+00).

=== STEP 4: rows survived every rejected write intact ===
+-----------------------------+-----------+----------------+
|       customer_email        |  status   | ever_escalated |
+-----------------------------+-----------+----------------+
| us067-demo:faq@acme.test    | resolved  | f              |
| us067-demo:open@acme.test   | escalated | t              |
| us067-demo:refund@acme.test | resolved  | t              |
+-----------------------------+-----------+----------------+
(3 rows)


=== STEP 5: the production DERIVED deflection-rate query (1 of 2 resolved was bot-only = 0.50) ===
+------------------+---------------+----------------+-----------------+
| deflected_by_bot | human_handled | resolved_total | deflection_rate |
+------------------+---------------+----------------+-----------------+
|                1 |             1 |              2 |          0.5000 |
+------------------+---------------+----------------+-----------------+
(1 row)

DELETE 3

=== cleaned up demo rows ===
Evidence: Demo SQL script driving the scenario through the production trigger/CHECK
-- US-067 end-to-end demo: status latch + DERIVABLE deflection over a realistic
-- support queue. Every write below goes through the production trigger/CHECK
-- installed by 20260623130000_conversation_status_machine.sql. Marker emails let
-- us clean up afterwards. WS = Default Workspace.
\set ws '00000000-0000-0000-0000-0000000000d0'
\set ECHO none
\pset border 2

-- Three real conversations, three real support outcomes -----------------------
-- C1: customer asks a FAQ, the bot answers, customer satisfied -> resolved.
insert into public.conversations (id, workspace_id, customer_email, status)
  values ('aaaa1111-0000-0000-0000-000000000001', :'ws', 'us067-demo:faq@acme.test', 'active');
-- C2: bot can't help, escalates to a human, agent resolves it.
insert into public.conversations (id, workspace_id, customer_email, status)
  values ('aaaa1111-0000-0000-0000-000000000002', :'ws', 'us067-demo:refund@acme.test', 'active');
-- C3: still being handled by the bot right now (open).
insert into public.conversations (id, workspace_id, customer_email, status)
  values ('aaaa1111-0000-0000-0000-000000000003', :'ws', 'us067-demo:open@acme.test', 'active');

\echo '=== STEP 1: three conversations born active, escalated_at all NULL (trigger owns the latch) ==='
select customer_email, status, escalated_at
  from public.conversations where customer_email like 'us067-demo:%' order by customer_email;

-- C1: bot deflects -> resolve directly (never escalated).
update public.conversations set status='resolved'
  where id='aaaa1111-0000-0000-0000-000000000001';
-- C2: escalate (latch stamps now()) ...
update public.conversations set status='escalated'
  where id='aaaa1111-0000-0000-0000-000000000002';
-- ... then human resolves it. Latch must survive the escalated->resolved hop.
update public.conversations set status='resolved'
  where id='aaaa1111-0000-0000-0000-000000000002';

\echo ''
\echo '=== STEP 2: after lifecycle writes - C1 deflected (latch NULL), C2 human-handled (latch set), C3 still open ==='
select customer_email, status,
       (escalated_at is not null) as ever_escalated, escalated_at
  from public.conversations where customer_email like 'us067-demo:%' order by customer_email;

\echo ''
\echo '=== STEP 3: DB rejects illegal writes from ANY caller (these RAISE, row is unchanged) ==='
\echo '-- (a) de-escalation escalated->active on a still-open escalated conv:'
update public.conversations set status='escalated' where id='aaaa1111-0000-0000-0000-000000000003';
\set ON_ERROR_STOP off
\echo '-- attempting escalated -> active ...'
update public.conversations set status='active' where id='aaaa1111-0000-0000-0000-000000000003';
\echo '-- (b) reviving a resolved conversation resolved->escalated:'
update public.conversations set status='escalated' where id='aaaa1111-0000-0000-0000-000000000002';
\echo '-- (c) an out-of-set status value:'
update public.conversations set status='spam' where id='aaaa1111-0000-0000-0000-000000000003';
\set ON_ERROR_STOP on

\echo ''
\echo '=== STEP 4: rows survived every rejected write intact ==='
select customer_email, status, (escalated_at is not null) as ever_escalated
  from public.conversations where customer_email like 'us067-demo:%' order by customer_email;

\echo ''
\echo '=== STEP 5: the production DERIVED deflection-rate query (1 of 2 resolved was bot-only = 0.50) ==='
select
  count(*) filter (where status='resolved' and escalated_at is null)        as deflected_by_bot,
  count(*) filter (where status='resolved' and escalated_at is not null)    as human_handled,
  count(*) filter (where status='resolved')                                 as resolved_total,
  round(
    count(*) filter (where status='resolved' and escalated_at is null)::numeric
    / nullif(count(*) filter (where status='resolved'), 0), 4)              as deflection_rate
  from public.conversations where customer_email like 'us067-demo:%';

-- cleanup
delete from public.conversations where customer_email like 'us067-demo:%';
\echo ''
\echo '=== cleaned up demo rows ==='
Evidence: Derived deflection-rate result (1 bot-only of 2 resolved = 0.5000)
+------------------+---------------+----------------+-----------------+
| deflected_by_bot | human_handled | resolved_total | deflection_rate |
+------------------+---------------+----------------+-----------------+
|                1 |             1 |              2 |          0.5000 |
+------------------+---------------+----------------+-----------------+
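
The arithmetic behind this result can be mirrored in plain Python as a cross-check. This is a sketch rather than the shipped query: `deflection_rate` is a hypothetical helper, rows are reduced to `(status, escalated_at)` pairs, and the timestamp on the still-open row is a placeholder, not a value taken from the transcript.

```python
from typing import Iterable, Optional, Tuple

# (status, escalated_at); the timestamp is kept as text for brevity.
Row = Tuple[str, Optional[str]]


def deflection_rate(rows: Iterable[Row]) -> Optional[float]:
    """Share of resolved conversations that were never escalated."""
    resolved = [escalated_at for status, escalated_at in rows if status == "resolved"]
    if not resolved:
        # Mirrors nullif(count, 0): no resolved rows yields NULL, not a division error.
        return None
    deflected = sum(1 for escalated_at in resolved if escalated_at is None)
    return round(deflected / len(resolved), 4)


demo = [
    ("resolved", None),                             # faq: resolved by the bot alone
    ("resolved", "2026-06-24 04:00:14.254895+00"),  # refund: escalated, then resolved
    ("escalated", "placeholder-timestamp"),         # open: escalated, not yet resolved
]
print(deflection_rate(demo))  # 0.5
print(deflection_rate([]))    # None
```

Unresolved rows are excluded from both numerator and denominator, which is why the open conversation does not drag the rate down.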

Pipeline

Updates from git push no-mistakes

✅ **intent** - passed

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

⚠️ **Review** - 1 info
  • ℹ️ backend/test_conversation_status_machine.py:51 - Dead code: ROOT = Path(...).parents[1] and sys.path.insert(0, str(ROOT / "backend")) (lines 50-51), plus the from pathlib import Path (line 45) and import sys (line 43) that exist only to support them, are unused — the test imports nothing local from backend (only asyncio/os/uuid/asyncpg are actually used). This is copied boilerplate from tests that do need a local import; here it can be removed.
✅ **Test** - passed

✅ No issues found.

  • supabase db reset --local - all 36 migrations applied from scratch, ending with 20260623130000_conversation_status_machine.sql (clean, exit 0)
  • python -m backend.test_conversation_status_machine - DB-level asyncpg test passes against the freshly reset DB (set-once latch, de-escalation + terminal rejections as RaiseError, CHECK as CheckViolationError, INSERT-path latch, deflection rate 0.5)
  • Re-applied the migration file via psql -f supabase/migrations/20260623130000_conversation_status_machine.sql to confirm idempotency (exit 0; status CHECK constraint count stays exactly 1, no duplicate)
  • Verified live DB objects: conversations_status_check CHECK, conversations_status_guard BEFORE INSERT OR UPDATE trigger, _conversations_status_guard function present; migration 20260623130000 is the latest applied
  • Ran a realistic-scenario psql transcript (3 conversation lifecycles) showing latch-derived deflection, DB rejecting de-escalation/resolved-revival/out-of-set writes with rows intact, and the documented deflection-rate query returning 0.5000
  • Confirmed cleanup: 0 leftover demo rows in public.conversations and a clean git status --porcelain
⚠️ **Document** - 1 info
  • ℹ️ AGENTS.md:25 - AGENTS.md "Conversation status machine + derivable deflection (US-067, ADR-0004)" and the migration's header comment point to ADR-0004, which is not committed under docs/adr/ (only 0001, 0002, 0007 exist). This is pre-existing and systemic - 6 of the 9 referenced ADRs (0003/0004/0005/0006/0008/0009) live only in the PRD task files, and AGENTS.md already referenced ADR-0004 in the prior US-066 section before this change. Authoring ADR-0004 is a deliberate non-goal of this DB-only story (would require fabricating decision-record content), so it is left as-is; surfaced as a human judgment call on whether/when to backfill the deferred ADRs.
✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

Enforce the conversation status latch active -> escalated -> resolved at the
DB level on top of US-066's conversations table, so deflection is derivable
without a resolved_by_bot/resolved_by_human state explosion (ADR-0004).

- CHECK constraint status in ('active','escalated','resolved'), added
  idempotently so it coexists with whatever US-066 defines (exactly once).
- BEFORE INSERT OR UPDATE trigger conversations_status_guard makes escalated_at
  trigger-owned on every write path:
    * UPDATE: rejects escalated -> active (no de-escalation) and any
      resolved -> * (resolved is terminal); same-status no-ops allowed.
      escalated_at is stamped now() on the first escalate and preserved
      verbatim thereafter (set-once latch); callers can never move or clear it.
    * INSERT: a row born 'escalated' is latched now(); any other birth status
      (incl. 'resolved') gets a null latch, ignoring caller-supplied
      escalated_at. This closes the INSERT-path gap (US-066 RLS lets any
      workspace member INSERT) so the derived deflection metric cannot be
      corrupted by a stray create.
  Uses create or replace trigger so the migration is uniformly re-runnable.
- Deflection is derivable:
    resolved AND escalated_at IS NULL     => deflected
    resolved AND escalated_at IS NOT NULL => human-handled
- Documented deflection-rate SQL snippet (migration comment + AGENTS.md).
- DB-level integration test covering the UPDATE state machine, the set-once
  latch, terminal/backward rejections, and the INSERT-path latch.

The runtime "first escalating turn stops the bot pipeline" behaviour is wired
separately in US-080; this is only the DB-level latch it relies on.

vercel Bot commented Jun 24, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agentic-rag | Ready | Preview, Comment | Jun 24, 2026 4:08am |

@github-actions

Contributor

Retrieval eval — PR vs main

n = 50 questions × 3 modes (vector, keyword, hybrid) on a 14-chunk corpus. PR ran in 70.07s; main in 68.7s.

Headline (each cell: PR value, Δ vs main)

| Mode | recall@5 | MRR | nDCG@5 |
| --- | --- | --- | --- |
| vector | 0.860 (±0.000) | 0.772 (±0.000) | 0.779 (±0.000) |
| keyword | 0.110 (±0.000) | 0.120 (±0.000) | 0.112 (±0.000) |
| hybrid | 0.860 (±0.000) | 0.759 (±0.000) | 0.769 (±0.000) |

Per-category recall@5

| Mode | single_chunk | multi_hop | adversarial | paraphrase |
| --- | --- | --- | --- | --- |
| vector | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
| keyword | 0.250 (±0.000) | 0.033 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
| hybrid | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |

Comment is updated in place on each push by .github/workflows/retrieval-eval.yml (US-035). Comment-only — never blocks the build.

@hcho22 hcho22 merged commit 32428dd into main Jun 24, 2026
3 checks passed
@hcho22 hcho22 deleted the fm/us067-status-m3 branch June 24, 2026 12:52