feat(support): support-bot per-turn retrieval seam + non-security workspace filter (US-070) #33
Merged
…ty workspace filter (US-070)

Each customer turn the deflection pipeline now retrieves AS the per-workspace support bot, not as a privileged reader, so the bot answers only from share-to-bot documents (ADR-0008; no new content role, no new enforcement path).

- backend/support_bot.py (new): run_bot_deflection_turn mints a ~60s role=authenticated bot JWT (sub=bot_user_id) via an injected minter (US-068 mint_supabase_jwt, dependency-injected to avoid a main.py import cycle and keep the module pure/testable), runs the ADR-0003 deflection pipeline with that JWT in the Supabase headers so match_chunks/keyword_search resolve auth.uid() to the bot, then discards the token (no cross-turn cache). The bearer is confined to the Supabase headers and never reaches an SSE/response/log surface; only DeflectionResult.customer_message is client-safe.
- migrations 20260624150000 / 20260624150100: add filter_workspace_id uuid default null to match_chunks / keyword_search as an ORDINARY non-security narrowing filter (the extension point the US-007 note + documents_workspace_id_idx reserved). AND-ed beside filter_topics, so it can only subtract within what the auth.uid()-resolved membership clause already allows; null is a no-op, so /api/chat and E4/E6 are byte-identical. The trust boundary stays the membership EXISTS clause, never this param. Applied to both hybrid legs for coherence.
- retrieval.py / escalation.run_deflection_pipeline: thread an optional workspace_id=None through search_documents/keyword_search/keyword_only_search/hybrid_search and the pipeline; default None leaves every existing caller unchanged.

Tests: test_us070_bot_retrieval.py (no DB/LLM/secret) pins mint-per-turn, bot bearer + filter on both legs, no caching, token absent from result/error. test_us070_bot_retrieval_integration.py (live Postgres+PostgREST under a self-minted bot JWT) pins the PRD validation: bot retrieves only the share-to-bot doc and 0 rows from a non-shared doc; filter_workspace_id narrows non-securely. AGENTS.md documents the seam and the sharp edge.
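The per-turn seam described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `MintToken` and `Pipeline` signatures, the fields of `DeflectionResult` other than `customer_message`, and the header layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical shapes: the real MintToken, DeflectionResult, and pipeline
# signatures are not shown in this PR description.
MintToken = Callable[[str, int], str]  # (sub, ttl_seconds) -> JWT string


@dataclass(frozen=True)
class DeflectionResult:
    customer_message: str  # the only client-safe field
    escalated: bool        # assumed field, for illustration only


Pipeline = Callable[[str, Dict[str, str], Optional[str]], DeflectionResult]


def run_bot_deflection_turn(
    question: str,
    *,
    bot_user_id: str,
    workspace_id: Optional[str],
    mint_token: MintToken,
    pipeline: Pipeline,
    ttl_seconds: int = 60,
) -> DeflectionResult:
    """Mint a fresh bot token, run one turn as the bot, return only the result."""
    token = mint_token(bot_user_id, ttl_seconds)    # minted on every turn
    headers = {"Authorization": f"Bearer {token}"}  # the token lives only here
    # Nothing outside this call frame holds the token, so it cannot be reused
    # on a later turn: the no-cache guarantee is structural, not a scrub step.
    return pipeline(question, headers, workspace_id)
```

Injecting both the minter and the pipeline as callables keeps the function free of imports from the application entry point, which is what lets a unit test substitute fakes and count mints without a database or secret.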
Retrieval eval — PR vs
| Mode | recall@5 | MRR | nDCG@5 |
|---|---|---|---|
| vector | 0.860 (±0.000) | 0.772 (±0.000) | 0.779 (±0.000) |
| keyword | 0.110 (±0.000) | 0.120 (±0.000) | 0.112 (±0.000) |
| hybrid | 0.860 (±0.000) | 0.759 (±0.000) | 0.769 (±0.000) |
Per-category recall@5
| Mode | single_chunk | multi_hop | adversarial | paraphrase |
|---|---|---|---|---|
| vector | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
| keyword | 0.250 (±0.000) | 0.033 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
| hybrid | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
Comment is updated in place on each push by .github/workflows/retrieval-eval.yml (US-035). Comment-only — never blocks the build.
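
For readers unfamiliar with the three columns, the sketch below shows the standard binary-relevance definitions of recall@k, reciprocal rank (averaged over queries to give MRR), and nDCG@k. The eval harness behind this comment is not shown here, so treat these as the textbook forms and an assumption about what the workflow computes; the function names are illustrative.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant id, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Binary-gain DCG over the top k, normalised by the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Under these definitions, two modes can share the same recall@5 while differing in MRR and nDCG@5, because recall only asks whether a relevant chunk is in the top five whereas the other two reward placing it nearer the top.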
Intent
Implement US-070 (Phase-2 PRD): the support-bot per-turn retrieval seam, now rebased onto main after US-068 (#31) and US-069 (#32) landed. Each customer turn mints a ~60s Supabase-compatible bot JWT (sub=bot_user_id, via US-068 mint_supabase_jwt), calls match_chunks AS the bot, then discards the token (no cross-turn cache), so the bot answers only from chunk_acl share-to-bot docs.

Key points a reviewer should know:

1. US-069's PR #32 created backend/support_bot.py for bot PROVISIONING (provision_workspace_bot); US-070 adds the RETRIEVAL seam (run_bot_deflection_turn). During rebase I intentionally MERGED both into one support_bot.py module under a unified docstring - they are two phases of one support-bot lifecycle meeting at conversations.bot_user_id - preserving both sides' code verbatim.
2. The US-068 minter is DEPENDENCY-INJECTED (MintToken callable) into the retrieval path to avoid a main.py import cycle and keep it pure/testable; the endpoint that wires it is US-080.
3. filter_workspace_id is added to BOTH match_chunks and keyword_search as an ORDINARY NON-SECURITY narrowing filter (null default = no-op; trust boundary stays the auth.uid() workspace_membership clause, untouched) because hybrid fuses both legs; E4/E6/permissions stay byte-identical.
4. An earlier review correctly flagged a no-op bearer-scrub in a finally block; I removed it and the comment now states the real structural no-cache guarantee honestly.

Tests: unit (no DB/secret), live RLS integration (bot sees only the shared doc, 0 from non-shared; filter narrows non-securely), and US-069's provisioning test all pass. The bot bearer token must never reach an SSE/response/log surface; only DeflectionResult.customer_message is client-safe.
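
To make the short-lived bot token concrete, here is a standard-library sketch of minting an HS256 JWT with `sub`, `role`, and a ~60s expiry. It is not the US-068 `mint_supabase_jwt` (whose signature is not shown in this PR); the function name `mint_bot_jwt`, the `aud` claim value, and the assumption that the project signs with a shared HS256 secret are all illustrative.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url without padding, as JWTs require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def mint_bot_jwt(
    secret: str,
    bot_user_id: str,
    ttl_seconds: int = 60,
    now: Optional[int] = None,
) -> str:
    """Hypothetical minter: sign a short-lived HS256 token for the bot principal."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "sub": bot_user_id,        # the id auth.uid() is meant to resolve to
        "role": "authenticated",
        "aud": "authenticated",    # assumed audience value
        "iat": issued,
        "exp": issued + ttl_seconds,
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return signing_input + "." + _b64url(signature)
```

Because the expiry is baked into the signed claims, a token that did leak would stop working about a minute after minting, which complements the rule that the token is never cached across turns.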
What Changed
- Adds `run_bot_deflection_turn` to `backend/support_bot.py` (merged with US-069's `provision_workspace_bot` into one support-bot lifecycle module): each customer turn mints a ~60s `sub=bot_user_id` Supabase JWT via a dependency-injected `mint_supabase_jwt` (avoids a `main.py` import cycle), runs the deflection pipeline as the bot principal, then discards the token with no cross-turn cache, so the bot answers only from `chunk_acl` share-to-bot documents; `escalation.py`/`retrieval.py` are updated to plumb `workspace_id` through.
- Adds `filter_workspace_id uuid default null` to both `match_chunks` and `keyword_search` (migrations `20260624150000`/`20260624150100`) as an ordinary non-security narrowing filter AND-ed beside `filter_topics` on both hybrid legs; `null` is a no-op so `/api/chat` and the E4/E6 paths stay byte-identical, and the trust boundary remains the `auth.uid()` `workspace_membership` clause.
- Documents the seam in `AGENTS.md` and adds a `CONTEXT.md` glossary sync; note `run_bot_deflection_turn` has no production caller yet (the wiring endpoint is US-080).

Risk Assessment
✅ Low: The change is well-bounded and backward-compatible: migration bodies are byte-identical to their predecessors except an optional null-default narrowing param AND-ed under the untouched membership boundary, every new Python param defaults to None (no-op for existing callers), and the new bot-retrieval seam is covered by both unit and live RLS integration tests with no bearer-leak path.
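
The backward-compatibility claim (every new Python parameter defaults to `None` and is a no-op for existing callers) can be illustrated with a small parameter builder. This is a sketch under assumptions: `build_match_chunks_params`, `query_embedding`, and `match_count` are hypothetical names; only `filter_topics` and `filter_workspace_id` appear in the PR text.

```python
from typing import Any, Dict, List, Optional


def build_match_chunks_params(
    query_embedding: List[float],
    match_count: int = 5,
    filter_topics: Optional[List[str]] = None,
    workspace_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an RPC payload; the workspace filter is added only when supplied."""
    params: Dict[str, Any] = {
        "query_embedding": query_embedding,  # hypothetical parameter name
        "match_count": match_count,          # hypothetical parameter name
        "filter_topics": filter_topics,
    }
    if workspace_id is not None:
        # Omitting the key lets the SQL default (null = no-op) apply, so
        # callers that never pass workspace_id send the same payload as before.
        params["filter_workspace_id"] = workspace_id
    return params
```

A caller that predates the change produces exactly the payload it always did, while the bot path passes `workspace_id` and gets the extra narrowing key.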
Testing
Ran the US-070 unit seam test (6 groups, no DB/secret) and the live RLS integration test (5 checks, the PRD US-070 validation) against the running local Supabase with the migrations confirmed applied; both pass. Verified no regression in the touched paths via the deflection-pipeline, US-069 provisioning, and permissions/RLS suites. As reviewer-visible evidence I captured the actual raw PostgREST match_chunks/keyword_search responses under a self-minted bot JWT: the bot retrieves only the share-to-bot doc D (non-shared E and other-workspace F return zero rows), the owner positive control sees D+E (proving the data is retrievable), and filter_workspace_id only subtracts within membership - demonstrating the auth.uid() membership clause stays the trust boundary. This is backend security work with no UI surface, so the evidence is API-response transcripts rather than screenshots. Overall: all green, no findings.
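
The boundary described above can be modelled in a few lines of plain Python. This is a deliberately simplified stand-in for the SQL: the `Doc` type, the `readers` set (collapsing workspace membership and `chunk_acl` shares into one check), and `visible_docs` are all hypothetical, and the documents D, E, F mirror the fixture names used in the evidence.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Doc:
    name: str
    workspace_id: str
    readers: FrozenSet[str]  # principals the trust boundary admits (simplified)


def visible_docs(
    docs: Iterable[Doc],
    principal: str,
    filter_workspace_id: Optional[str] = None,
) -> Set[str]:
    """Access check first; the workspace filter is AND-ed and can only subtract."""
    return {
        d.name
        for d in docs
        if principal in d.readers  # trust boundary (stand-in for auth.uid() clause)
        and (filter_workspace_id is None or d.workspace_id == filter_workspace_id)
    }


DOCS = [
    Doc("D", "W", frozenset({"owner", "bot"})),  # shared to the bot
    Doc("E", "W", frozenset({"owner"})),         # same workspace, not shared
    Doc("F", "W2", frozenset({"owner"})),        # other workspace
]
```

In this model, passing a filter for a workspace the principal cannot read never widens the result, because the filter is evaluated only on rows the access check has already admitted.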
Evidence: Raw PostgREST RPC responses (bot vs owner) proving the US-070 retrieval boundary
[2] match_chunks AS BOT B, filter_workspace_id=W <<< PRD US-070 VALIDATION
HTTP 200 -> ['D (in W, SHARED-to-bot via chunk_acl)']
raw rows: [{"document_id": "f1c9c0dc-...", "content": "shared answer about returns policy", "granting_principal_display": "bot-a441a6a8@test.local"}]
=> bot retrieves ONLY the share-to-bot doc D; NON-shared E -> 0 rows (zero-leak)
[4] match_chunks AS OWNER U: filter_workspace_id is NON-SECURITY narrowing
no filter -> {D,E,F}; filter=W -> {D,E}; filter=W2 -> {F}
Evidence: Evidence capture script (reusable; drives real PostgREST RPCs as bot/owner)
Evidence: Live integration test transcript (PRD validation, 5 checks)
Pipeline
Updates from git push no-mistakes
✅ **intent** - passed
✅ No issues found.
✅ **Rebase** - passed
✅ No issues found.
backend/support_bot.py:372 - run_bot_deflection_turn (and the workspace_id plumbing into run_deflection_pipeline) has no production caller yet; only the two new tests exercise it. This is intentional per the US-070 scope (the endpoint that wires it is US-080), so it is not dead code to remove; flagging only so it isn't mistaken for an oversight during review.
✅ **Test** - passed
✅ No issues found.
- `python -m backend.test_us070_bot_retrieval` (unit, no DB/secret): per-turn mint sub=bot_user_id+60s TTL is the Bearer on both legs, filter_workspace_id on both legs, no cross-turn token cache, bot token absent from DeflectionResult and from propagated errors, no-visible-chunks => generic-deferral escalate
- `python -m backend.test_us070_bot_retrieval_integration` against live Supabase (DATABASE_URL=postgresql://postgres:postgres@127.0.0.1:54322/postgres, SUPABASE_URL=http://127.0.0.1:54321): bot sees EXACTLY {D} (share-to-bot), 0 from non-shared E and other-workspace F; owner positive control sees D+E; filter_workspace_id narrows non-securely on both vector and keyword legs
- Confirmed US-070 migrations applied live: `match_chunks`/`keyword_search` both carry `filter_workspace_id uuid` (queried pg_proc identity args)
- Regression checks (no boundary/byte-identical change): `python -m backend.test_deflection_pipeline`, `python -m backend.test_us069_bot_provisioning` (module-merge intact), `python -m backend.test_permissions`
- Evidence capture script driving the real PostgREST RPCs as the bot vs owner principals, writing raw HTTP responses to the evidence dir
✅ **Document** - passed
✅ No issues found.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.