Agentic RAG

A production-shaped Retrieval-Augmented Generation app where per-document sharing is a first-class part of the retrieval predicate, not a post-hoc filter. Multi-user from day one — every chunk carries an ACL, every retrieval call runs under the viewer's JWT, every tool-call attribution in the chat UI surfaces why the viewer can see a chunk.

Raw OpenAI SDK + Pydantic (no LLM frameworks), FastAPI backend, React/Vite/Tailwind frontend, Supabase (Postgres + pgvector + Auth + Storage + Realtime), LangSmith observability.

Tool-call attribution renders a per-chunk badge — "via owner" / "via direct grant" / "via {group}" — so the viewer can see exactly which ACL rule granted them access to each retrieved chunk.
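
As a rough illustration of how such a badge could be derived, the sketch below resolves the granting rule for one chunk. The `ChunkAcl` shape, the `access_badge` name, and the precedence order (owner, then direct grant, then group) are assumptions made for the example; the real system resolves access in SQL and its data model is described in docs/permissions-aware-rag.md.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChunkAcl:
    """Hypothetical in-memory view of one chunk's ACL (not the real schema)."""
    owner_id: str
    direct_grants: set[str] = field(default_factory=set)
    # group name -> ids of the users who belong to that group
    group_grants: dict[str, set[str]] = field(default_factory=dict)


def access_badge(acl: ChunkAcl, viewer_id: str) -> Optional[str]:
    """Return the badge text for a visible chunk, or None if the viewer has no access.

    The precedence order here is an assumption for illustration.
    """
    if viewer_id == acl.owner_id:
        return "via owner"
    if viewer_id in acl.direct_grants:
        return "via direct grant"
    # Sort group names so the chosen badge is deterministic.
    for group in sorted(acl.group_grants):
        if viewer_id in acl.group_grants[group]:
            return f"via {group}"
    return None
```

A viewer with no matching rule gets `None`, which in this sketch means the chunk should never have been retrieved for them in the first place.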

The permissions story, in numbers

The retrieval path is evaluated in two cuts: a correctness eval that proves the security property holds at small scale, and a scale benchmark that characterises the recall curve as the visible set shrinks.

Security — fraction of no_access runs that returned zero gold chunks (50 questions × 3 modes × 3 viewer setups, 14-chunk Acme corpus):

Mode	Pre-filter	Post-filter
vector	1.000	1.000
keyword	1.000	1.000
hybrid	1.000	1.000

Pre-filter is the load-bearing column: security is enforced in the SQL predicate, not by a Python drop after the fact. Post-filter passes too, but could in principle leak via timing or payload size.

Recall@5 across viewers, ef_search × selectivity sweep (15 multi-hop queries against a synthetic Wikipedia 10k-chunk corpus; gold = top-5 at the most exhaustive setting, ef_search=500):

Viewer	Visible chunks	Selectivity	ef_search=40	ef_search=80	ef_search=200	ef_search=500 (gold)
viewer_50pct	5,000	50.0%	1.000	1.000	1.000	1.000
viewer_10pct	1,000	10.0%	1.000	1.000	1.000	1.000
viewer_1pct	100	1.0%	1.000	1.000	1.000	1.000

Every cell is 1.000 because at 10k chunks the Postgres planner sidesteps HNSW entirely — it bitmap-scans chunk_acl, index-scans the visible chunks, sorts exactly by embedding distance, and takes top-5. EXPLAIN ANALYZE confirms; ef_search is a no-op in that plan. The eval infrastructure (10k seed, viewer ACL setup, sweep, regression alarm) is shipped; the recall curve surfaces at the corpus size where exact NN over the filtered set becomes more expensive than HNSW + post-filter (tens to hundreds of thousands of visible chunks per query). The nightly workflow fails loudly if the configured recall floor is breached. See docs/permissions-aware-rag.md §5b for the full plan output.
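
The plan described above (restrict to the visible set, sort exactly by distance, take the top 5) can be mimicked in plain Python to show why recall is 1.000 by construction. This is a toy sketch, not the project's SQL: `exact_top_k` and `recall_at_k` are hypothetical names, embeddings are short lists of floats, and vectors are assumed to be non-zero.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_top_k(query: list[float], chunks: dict[str, list[float]],
                visible_ids: set[str], k: int = 5) -> list[str]:
    """Filter to the viewer's visible chunks first, then rank exactly by distance."""
    visible = [(cid, emb) for cid, emb in chunks.items() if cid in visible_ids]
    # Tie-break on chunk id so the ordering is deterministic.
    visible.sort(key=lambda item: (cosine_distance(query, item[1]), item[0]))
    return [cid for cid, _ in visible[:k]]


def recall_at_k(retrieved: list[str], gold: list[str], k: int = 5) -> float:
    """Fraction of gold chunk ids that appear in the first k retrieved ids."""
    if not gold:
        return 0.0
    return len(set(retrieved[:k]) & set(gold)) / len(gold)
```

Because an exact sort over the visible set has no approximation step, it returns the same ids as the gold run, so a knob such as ef_search has nothing to influence until the planner switches to the HNSW index.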

Why this is hard

The naive approach to per-document sharing in a RAG retriever is to leave the vector search alone and post-filter the results: pull top-k chunks by similarity, then drop the ones the viewer can't see. This fails on selective ACLs in a way that's easy to miss. The math: if a viewer can see 5% of the corpus and we ask for top-10, the expected number of visible chunks in that result is k × selectivity = 10 × 0.05 = 0.5 — half a chunk on average. The viewer most often sees zero relevant chunks; multi-hop questions that need two chunks become unanswerable. "Fetch more candidates and post-filter harder" doesn't rescue it — at 5% selectivity you'd need top-100 to expect five visible chunks, and post-filtering top-100 means embedding distance is no longer ranking the visible chunks against each other. The fix is to push the ACL check into the SQL predicate so the planner is choosing among visible candidates from the start — which then opens a second gotcha around HNSW behaviour under selective filters. The full write-up is in docs/permissions-aware-rag.md.
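
The arithmetic above is easy to check. The sketch below reproduces the two figures from the paragraph and adds one that is not in the original: the probability of seeing no visible chunk at all, under the simplifying assumption that visibility is independent of similarity rank. The function names are hypothetical.

```python
import math


def expected_visible(k: int, selectivity: float) -> float:
    """Expected number of visible chunks in an unfiltered top-k."""
    return k * selectivity


def prob_zero_visible(k: int, selectivity: float) -> float:
    """Chance that none of the top-k is visible.

    Assumes each candidate is visible independently with probability
    `selectivity`, which is a simplification.
    """
    return (1.0 - selectivity) ** k


def candidates_needed(target_visible: int, selectivity: float) -> int:
    """Smallest k whose expected visible count reaches the target."""
    # The small tolerance guards against floating-point rounding in the division.
    return math.ceil(target_visible / selectivity - 1e-9)
```

With k = 10 and 5% selectivity, `expected_visible` gives 0.5 and `prob_zero_visible` gives roughly 0.60, which is consistent with the viewer most often seeing nothing; `candidates_needed(5, 0.05)` gives the top-100 figure quoted above.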

What else is in the box

Chat with streaming — OpenAI Responses or Chat Completions API, configurable per-request, streamed token-by-token to the UI. Tool calls and results persist alongside messages.
Drag-and-drop ingestion — .txt / .md / .pdf / .docx / .html parsed via docling, chunked, embedded, indexed. Live status updates via Supabase Realtime. Document-level metadata (title, authors, topics, dates) extracted via LLM structured outputs.
Hybrid retrieval — vector (pgvector HNSW) + keyword (Postgres full-text) fused via Reciprocal Rank Fusion. Optional reranker layer: Cohere, Voyage, or LLM-as-judge. All retrieval runs under user JWT — RLS enforces per-user visibility.
Per-document sharing — share documents with individual users or groups via the per-chunk ACL system. Share dialog in the ingestion UI. Per-chunk badges in chat tool attribution show why the viewer can see each chunk.
Workspace tenant isolation — a hard tenant boundary above per-document sharing: a chunk is visible only if the viewer is a member of its document's workspace, AND-ed into the same SECURITY INVOKER retrieval predicate (resolved from the viewer's JWT, never a backend-passed tenant id) and mirrored in the table RLS. Existing data lives in one operator-managed Default Workspace; the boundary bites once a second workspace exists. See docs/adr/0002-workspace-tenant-isolation.md.
Structured RAG (text-to-SQL) — query_database tool over an allowlisted read-only schema, with a semantic-layer-aware compiler so the LLM doesn't have to know table internals.
Web search fallback — web_search tool when local retrieval is insufficient.
Sub-agents — spawn_document_agent launches a sub-agent with isolated context and purpose-specific tools.
Retrieval eval suite — 50-question golden set, runner that exercises vector / keyword / hybrid against the real backend functions, recall@k / MRR / nDCG@5 metrics, optional generation + LLM-judge step. PR CI posts a delta-vs-main comment; nightly publishes snapshots to docs/nightly/.
RAGAS metrics — the four canonical RAG-eval scores (Faithfulness, Answer Relevancy, Context Precision, Context Recall) computed weekly alongside the custom Claude judge and published to docs/ragas-weekly/.
Permissions scale benchmark — Wikipedia 10k synthetic corpus, ef_search sweep across three permission selectivities, nightly workflow with regression alarm.
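
For readers unfamiliar with the fusion step in the hybrid retrieval item, here is a minimal sketch of Reciprocal Rank Fusion. It is a generic textbook version, not the project's code: `reciprocal_rank_fusion` is a hypothetical name, and the default of 60 simply mirrors the documented `HYBRID_RRF_K` default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into one.

    Each list contributes 1 / (k + rank) for every id it contains, with
    ranks starting at 1. A larger k flattens the gap between top and
    lower ranks.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; tie-break on id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

A chunk that appears in both the vector and the keyword list accumulates two contributions, so it tends to outrank a chunk that is first in only one of them.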

Documentation

Long-form writeups for the parts of the system that benefit from prose explanation — the kind of context a code review won't recover:

Doc	What it covers
`docs/permissions-aware-rag.md`	The post-filter recall problem, the four-table data model, the SQL predicate, the HNSW interaction, the eval tables, deliberate v0 scope cuts (group nesting, write-vs-read tiers).
`docs/adr/0002-workspace-tenant-isolation.md`	Phase 2 — the Workspace tenant boundary layered above owner-OR-ACL: where the boundary is enforced (membership clause inside the retrieval predicate, never a backend-passed tenant id), how existing data migrates into a Default Workspace, the alternatives rejected, and the Identity Boundary (AU3) — what an integrator may swap in the auth stack (federation-edge only) versus the welded Supabase-JWT pass-through floor.
`docs/evals.md`	Corpus design, the 50-question golden set, what each metric measures and what it doesn't, a worked example of CI catching a regression (Δ -0.510 on `recall@5` from a one-line chunk-size change), a frank list of the eval's limitations, and the E7 escalation eval (§6) - the deflection-pipeline golden set, why its deterministic legs gate per-PR while the LLM-judged legs run weekly, and the false-resolve ceiling as a pinned safety invariant.
`docs/structured-rag.md`	The semantic-layer-aware text-to-SQL compiler, allowlisted schemas, the read-only role boundary.
`docs/ingestion-parser-adapters.md`	Write your own `DocumentParser` — the load-bearing markdown-string contract, the edits to add one (subclass + `PARSER` validation + `build_parser`), `PARSER` selection, proving the round-trip, and Unstructured.io as the canonical buyer-written adapter.

The eval tables in docs/permissions-aware-rag.md are auto-embedded from the runner-generated summary.md files via marker comments:

python -m evals.retrieval.runner          # populates evals/retrieval/summary.md
python -m evals.permissions_scale.runner  # populates evals/permissions_scale/summary.md (after wikipedia_seed)
python -m docs._embed_eval_summaries      # injects into docs/permissions-aware-rag.md

Repository layout

backend/                FastAPI service (Dockerfile, railway.toml, fly.toml)
frontend/               React + Vite + Tailwind (vercel.json)
supabase/               Migrations + local CLI config
evals/retrieval/        50-question golden set + E7 escalation golden set + runners + CI workflow integration
evals/permissions_scale/ Wikipedia 10k corpus benchmark + nightly workflow
evals/structured_rag/   Text-to-SQL eval
db_seed/                Deterministic seeders for the eval corpora
docs/                   Long-form writeups (evals, structured RAG, permissions-aware RAG)
.github/workflows/      PR + nightly eval workflows
.claude/                Agent task specs (not needed to run the app)

Local development

Prerequisites: Node 20+, Python 3.11+, Docker Desktop (for local Supabase), Supabase CLI, OpenAI API key.

# 1. Start the local Supabase stack (Postgres + pgvector + GoTrue + Storage + Studio)
#    Brings up Docker containers and applies all migrations in supabase/migrations/.
supabase start
supabase status                # note API_URL, SERVICE_ROLE_KEY, DB_URL for env files

# 2. Backend
cd backend
cp .env.example .env           # fill in the values below
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

# 3. Frontend
cd ../frontend
cp .env.example .env           # fill in VITE_SUPABASE_* + VITE_BACKEND_URL
npm install
npm run dev                    # http://localhost:5173

To run against hosted Supabase instead of local, push migrations with supabase db push --linked and point SUPABASE_URL / VITE_SUPABASE_URL at the hosted project URL — no other code changes.

Environment variables

Backend (`backend/.env`)

Var	Required	Notes
`SUPABASE_URL`	yes	`https://<project>.supabase.co` (hosted) or `http://127.0.0.1:54321` (local)
`SUPABASE_ANON_KEY`	yes	Used to call GoTrue for JWT validation
`SUPABASE_SERVICE_ROLE_KEY`	yes	Reserved for system-level ops: share API owner-lookup; ingestion; support-bot provisioning via the GoTrue admin API (US-069, `backend/support_bot.py`); the backend-mediated conversation-token surface, i.e. issuance plus the `resume_conversation` RPC (US-071, `backend/conversation_tokens.py`); and the anonymous public widget-key resolution gate (US-072, `backend/widget_keys.py`). Never used to touch user data on the retrieval path (RLS is enforced via the user JWT). The public widget endpoints fail closed with a 503 when it is unset
`SUPABASE_JWT_SECRET`	only for support bot	The project JWT secret GoTrue signs with. The support bot self-signs its short-lived bot token with it so `auth.uid()`/RLS resolve it natively (US-068, `backend/supabase_jwt.py`); a knowledge-assistant-only deploy leaves it blank. NEW signing surface - keep server-side only, never embed client-side
`SUPPORT_BOT_EMAIL_DOMAIN`	no	Internal, non-routable email domain for the per-workspace support bot's `auth.users` row (US-069, `backend/support_bot.py`). Default `bots.support.internal`. The bot row is admin-created with `email_confirm=true` and no password, so the address never logs in or receives mail
`OPENAI_API_KEY`	yes
`OPENAI_MODEL`	no	Default `gpt-4o-mini`
`OPENAI_VECTOR_STORE_ID`	no	Enables `file_search` retrieval when set
`PARSER`	no	Ingestion parser: `docling` (default) / `llamaparse` / `unstructured`. Invalid value fails fast at startup. To add your own, see docs/ingestion-parser-adapters.md
`LLAMA_CLOUD_API_KEY`	only if `PARSER=llamaparse`	LlamaParse cloud key; checked at startup, not first ingest
`FRONTEND_ORIGIN`	yes (prod)	Comma-separated list of allowed CORS origins for the authenticated app surface (`/api/`, `/healthz`). Defaults to `http://localhost:5173` for dev. The public widget surface (`/widget/`) does NOT use this - it has its own posture keyed off each active widget key's registered origins (US-074)
`WIDGET_CORS_ORIGIN_CACHE_TTL`	no	Seconds the public-widget CORS layer caches the union of active-key registered origins before re-reading under the service role. Default 30; must be `> 0`. Issuing/revoking a key invalidates the cache immediately on that instance; the TTL is the cross-instance backstop (US-074)
`RATE_LIMITER`	no	Backend for the public-widget abuse/cost-DoS rate limiter (US-075 seam, `backend/rate_limiting.py`). `postgres` (default - durable counter rows reached over PostgREST via service-role RPCs) or `redis`. No in-memory backend by design (it would under-count per replica and reset on restart). Fails closed at startup on a misconfigured backend; the limiter is only built when support is configured (`SUPABASE_SERVICE_ROLE_KEY` set)
`REDIS_URL`	only if `RATE_LIMITER=redis`	Redis connection URL for the Redis limiter backend. The `redis` package is an optional dependency (not in `requirements.txt`; `pip install redis`). Checked at startup
`WIDGET_RATE_LIMIT_WINDOW_SECONDS`	no	Sliding-window length (seconds) for the public-widget per-key + per-session/IP rate limits. Default 60; must be `> 0` (US-076)
`WIDGET_RATE_LIMIT_PER_KEY`	no	Max requests per `public_key` per window, aggregated across every session/IP. Default 300; must be `> 0`. A breach refuses with a 429 + `Retry-After`, having done no retrieval/LLM work (US-076)
`WIDGET_RATE_LIMIT_PER_SESSION`	no	Max requests per session/IP (best-effort left-most `X-Forwarded-For` hop) per window, across every key. Default 30; must be `> 0`. Defense-in-depth - the per-key window and an edge/WAF limiter (P5) are the harder bounds (US-076)
`CHAT_MODE_DEFAULT`	no	`responses` or `completions`. Defaults to `responses` on an `openai` answerer, `completions` on any other provider. `responses` is OpenAI-only and fails closed at startup on a non-`openai` answerer — see docs/model-surface.md
`CHAT_HISTORY_MAX_TURNS`	no	Default 10
`RETRIEVAL_MODE`	no	`hybrid` (default) / `vector` / `keyword`. Safety escape hatch — production uses hybrid
`SEARCH_SIMILARITY_THRESHOLD`	no	Cosine threshold for `match_chunks` filter. Default 0.3
`HYBRID_RRF_K`	no	RRF damping constant. Default 60
`RERANKER`	no	`none` (default) / `cohere` / `voyage` / `llm`
`COHERE_API_KEY`	only if `RERANKER=cohere`
`VOYAGE_API_KEY`	only if `RERANKER=voyage`
`RERANK_INPUT_K`	no	Pool size fed into the reranker. Default 20
`LANGSMITH_API_KEY`	no	When set, traces ship to LangSmith
`LANGSMITH_PROJECT`	no	Default `agentic-rag`
`LANGSMITH_TRACING`	no	`true`/`false`; auto-set based on API key presence
`PORT`	no	Injected by Railway/Fly at runtime
`ANALYTICS_DATABASE_URL`	no	Postgres URL for the `analytics_readonly` role used by the text-to-SQL baseline
`CRM_DATABASE_URL`	no	Postgres URL for the `crm_readonly` role used by the semantic-layer-aware SQL search. Falls back to `ANALYTICS_DATABASE_URL`
`CRM_SEED_DATABASE_URL`	no	Writable Postgres URL used only by `python -m db_seed.crm_seed`. Falls back to `DATABASE_URL`
`ALLOWED_SQL_SCHEMAS`	no	Comma-separated schema allowlist for SQL tools. Default `analytics,crm`
`SQL_QUERY_TIMEOUT_MS`	no	Statement timeout for SQL tools. Default 10000
`ANTHROPIC_API_KEY`	only for eval generation	Required by `evals/retrieval/runner.py --include-generation` (the LLM judge runs Claude). Never read by the live backend
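
Several rows above describe values that are validated at startup rather than on first use. The sketch below shows one way such a fail-fast check could look using only the standard library. It is not the backend's implementation: `validate_env` is a hypothetical name, and checking the reranker keys at startup is an assumption (the table only states this explicitly for `PARSER` and `LLAMA_CLOUD_API_KEY`).

```python
from typing import Mapping

# Allowed values per variable; the first entry is the documented default.
ALLOWED_VALUES = {
    "PARSER": ("docling", "llamaparse", "unstructured"),
    "RERANKER": ("none", "cohere", "voyage", "llm"),
    "RETRIEVAL_MODE": ("hybrid", "vector", "keyword"),
}

# Secrets that become required once a particular value is selected.
REQUIRED_SECRET = {
    ("PARSER", "llamaparse"): "LLAMA_CLOUD_API_KEY",
    ("RERANKER", "cohere"): "COHERE_API_KEY",
    ("RERANKER", "voyage"): "VOYAGE_API_KEY",
}


def validate_env(env: Mapping[str, str]) -> dict[str, str]:
    """Resolve the selector variables or raise ValueError on bad configuration."""
    resolved: dict[str, str] = {}
    for name, choices in ALLOWED_VALUES.items():
        value = env.get(name) or choices[0]
        if value not in choices:
            raise ValueError(f"{name}={value!r} is not one of {choices}")
        secret = REQUIRED_SECRET.get((name, value))
        if secret and not env.get(secret):
            raise ValueError(f"{name}={value} requires {secret} to be set")
        resolved[name] = value
    return resolved
```

Calling `validate_env(os.environ)` once during application startup would surface a typo such as `PARSER=doclin` before the first request rather than at the first ingest.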

Model surface (provider / model selection)

Bring your own model host. Provider binds per role (answerer / embedder / judge); model binds per call-site. Two targets are tested — openai and azure — and openai accepts a base_url for any OpenAI-compatible endpoint. The embedder/judge inherit the answerer config unless overridden, so a single-provider deploy sets only the answerer (bare) vars. Full reference, role-fallback precedence, worked Azure example, capability matrix, and the embedder re-index procedure: docs/model-surface.md.

Var	Required	Notes
`LLM_PROVIDER`	no	Answerer provider: `openai` (default) or `azure`
`OPENAI_BASE_URL`	no	Any OpenAI-compatible endpoint (supported-but-untested)
`AZURE_OPENAI_ENDPOINT` / `AZURE_OPENAI_API_VERSION` / `AZURE_OPENAI_API_KEY`	only if `provider=azure`	All three required — `provider=azure` fails closed at startup if any is missing
`AZURE_OPENAI_DEPLOYMENT`	no	Azure deployment name (≠ model id); unset → per-call model id is the deployment
`EMBEDDER_PROVIDER` / `EMBEDDER_API_KEY` / `EMBEDDER_BASE_URL` / `EMBEDDER_AZURE_OPENAI_*`	no	Embedder-role overrides; fall back to the answerer config (deployment is per-role, not inherited)
`JUDGE_PROVIDER` / `JUDGE_API_KEY` / `JUDGE_BASE_URL` / `JUDGE_AZURE_OPENAI_*`	no	Runtime-judge-role overrides; same fallback rules as the embedder
`EMBEDDER_MODEL`	no	Embedder model. Falls back to `EMBEDDING_MODEL` → `text-embedding-3-small`
`METADATA_MODEL` / `OPENAI_PLANNER_MODEL` / `OPENAI_SQL_MODEL` / `OPENAI_SUBAGENT_MODEL` / `OPENAI_RERANK_MODEL`	no	Per-call-site model selectors within the answerer provider; each falls back to `OPENAI_MODEL`
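
The fallback chains in this table can be read as "first variable that is set wins". The sketch below encodes only the chains stated above; it is a simplification with hypothetical function names, and the authoritative precedence (including the Azure-specific variables) is in docs/model-surface.md.

```python
from typing import Mapping, Optional


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the value of the first variable that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_provider(env: Mapping[str, str], role: str) -> str:
    """Provider for 'answerer', 'embedder', or 'judge'; roles inherit the answerer."""
    override = {"embedder": "EMBEDDER_PROVIDER", "judge": "JUDGE_PROVIDER"}.get(role)
    names = ([override] if override else []) + ["LLM_PROVIDER"]
    return first_set(env, *names) or "openai"


def resolve_embedder_model(env: Mapping[str, str]) -> str:
    """EMBEDDER_MODEL, then EMBEDDING_MODEL, then the documented default."""
    return first_set(env, "EMBEDDER_MODEL", "EMBEDDING_MODEL") or "text-embedding-3-small"


def resolve_call_site_model(env: Mapping[str, str], selector: str) -> str:
    """Per-call-site selector (e.g. OPENAI_SQL_MODEL), then OPENAI_MODEL, then its default."""
    return first_set(env, selector, "OPENAI_MODEL") or "gpt-4o-mini"
```

Under this reading, a single-provider deploy that sets only `LLM_PROVIDER` and `OPENAI_MODEL` gets the same provider for all three roles and the same model at every call site.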

Frontend (`frontend/.env`)

Var	Required	Notes
`VITE_SUPABASE_URL`	yes	Same as backend `SUPABASE_URL`
`VITE_SUPABASE_ANON_KEY`	yes	Same as backend `SUPABASE_ANON_KEY`
`VITE_BACKEND_URL`	yes	Backend origin — `http://localhost:8000` for dev, your Railway/Fly URL in prod

API surface

The backend exposes:

Method	Path	Purpose
`POST`	`/api/chat`	Streaming chat, tool-using agent loop
`GET`	`/api/config`	Frontend bootstrap (chat mode default, etc.)
`POST`	`/api/documents/{id}/ingest`	Trigger / re-trigger ingestion for an uploaded document
`POST`	`/api/documents/{id}/share`	Grant a user or group access to a document
`GET`	`/api/documents/{id}/shares`	List existing grants (owner-only)
`DELETE`	`/api/documents/{id}/shares/{principal_id}`	Revoke a grant
`POST`	`/api/search` `/api/search/keyword` `/api/search/hybrid` `/api/search/rerank`	Direct retrieval probes (debugging / eval)
`POST`	`/api/sql`	Text-to-SQL via the semantic-layer compiler
`POST`	`/api/web-search`	Web fallback
`POST`	`/api/subagent`	Spawn a document sub-agent
`POST`	`/api/support/widget-keys`	Admin: issue a new (non-secret) widget public key for a workspace you administer; the first key issued lazily provisions the workspace support bot. Rejects an empty/blank `allowed_origins` with a 400 - a key with no registered origin is inactive under the US-073 fail-closed gate, so it is refused at creation rather than minted dead (US-072/US-073)
`GET`	`/api/support/widget-keys?workspace_id=…`	Admin: list a workspace's widget keys (active + revoked) for the `/support/settings` UI (US-072)
`POST`	`/api/support/widget-keys/{key_id}/revoke`	Admin: revoke a widget key - a one-way `revoked_at` latch that blocks new conversations but never terminates a live one (US-072)
`POST`	`/widget/keys/resolve`	Public widget: resolve a non-secret `public_key` on open, gating on not-revoked then the per-key registered-origin allowlist (fail-closed - an empty allowlist or a missing/unlisted request `Origin` is refused with the same opaque 404) under the service role; returns `{"active": true}` or 404 and leaks no workspace topology. Rate-limited per-session/IP (charged first, before the resolve) and per-key (after the resolve), 429 + `Retry-After` on breach (US-072/US-073/US-076)
`POST`	`/widget/conversations/resume`	Public widget: resume an anonymous support conversation from its opaque per-conversation token (`X-Conversation-Token`, not a JWT) - slides the 24h window (US-071)
`GET`	`/widget/conversations/{id}/transcript`	Public widget: fetch a conversation's transcript, authorized by the same opaque token; read-only, never slides the window (US-071)
`GET`	`/healthz`	Liveness check
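
The gating order described for `/widget/keys/resolve` (not revoked, then the registered-origin allowlist, with every refusal looking identical) can be sketched as a pure function. The `WidgetKey` shape and the `resolve_widget_key` name are hypothetical, the 200 status on success is an assumption, and rate limiting is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class WidgetKey:
    """Hypothetical shape of a stored widget key row."""
    public_key: str
    allowed_origins: tuple[str, ...]
    revoked_at: Optional[datetime] = None


def resolve_widget_key(key: Optional[WidgetKey],
                       origin: Optional[str]) -> tuple[int, Optional[dict]]:
    """Return (status, body) for a resolve request.

    Every refusal is the same bodiless 404, so a caller cannot tell an
    unknown key from a revoked one or from an unlisted origin.
    """
    refused = (404, None)
    if key is None or key.revoked_at is not None:
        return refused
    if not key.allowed_origins:
        # Fail closed: a key with no registered origin is never active.
        return refused
    if not origin or origin not in key.allowed_origins:
        return refused
    return (200, {"active": True})
```

Keeping the refusal value in one place makes it harder to accidentally introduce a distinguishable error path later.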

Eval suite

The CI workflows wrap the eval runners:

.github/workflows/retrieval-eval.yml — runs on PRs that touch retrieval / chunking / embeddings / escalation / migrations / the runner itself. Executes the 50-question golden set against PR head AND main, posts a delta-vs-main comment. The delta comment is advisory — it never fails the build. The PR run additionally executes two hard gates: the E6 second-workspace zero-leak eval (--include-e6) — a detected cross-workspace leak (or a structurally blind positive control) fails the build — and the E7 escalation tripwire (e7_runner --include-p1b, US-059): the deterministic deflection legs (P1a/P1b retrieval-gate decisions + the P1b non-disclosure byte-equality assertion, no LLM), where a P1a/P1b gate clear or a non-disclosure mismatch fails the build. Both are deterministic, so a real verdict can't flake; a transient E6 execution error is surfaced loudly but stays non-blocking.
.github/workflows/escalation-eval-weekly.yml — Sundays 06:00 UTC + manual workflow_dispatch. Runs the full E7 deflection sweep including the LLM-judged P2/P3 legs + the knob sweep; publishes to docs/escalation-weekly/<DATE>.md + .json. A measured false-resolve rate above the buyer's ceiling (the pinned safety number) fails the scheduled workflow and files an issue — it never blocks a merge (a judge wobble must not red-bar a PR; US-059).
.github/workflows/retrieval-eval-ragas-weekly.yml — Sundays 04:00 UTC + manual workflow_dispatch. Scores the four canonical RAGAS metrics weekly; publishes to docs/ragas-weekly/<DATE>.md; files an issue on a red gate finding.
.github/workflows/retrieval-eval-nightly.yml — daily 02:00 UTC. Publishes snapshots to docs/nightly/<DATE>.md + .json.
.github/workflows/permissions-scale-eval.yml — daily 03:00 UTC + manual workflow_dispatch. Runs the Wikipedia 10k seed + ef_search sweep; publishes to docs/permissions-scale-nightly/<DATE>.md. Fails the workflow if the configured recall floor is breached — this is the regression alarm for the day the planner flips to HNSW for some workload.
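
The two kinds of check described above, an advisory delta against main and a hard floor that fails the workflow, could look roughly like the sketch below. The function names and the dictionary layout are hypothetical; the actual runners and their output format live in the evals/ directory.

```python
def metric_deltas(head: dict[str, float], main: dict[str, float]) -> dict[str, float]:
    """Per-metric difference (head minus main) for metrics present in both runs."""
    return {name: round(head[name] - main[name], 3) for name in head if name in main}


def recall_floor_exit_code(recalls: dict[str, float], floor: float) -> int:
    """Return 1 if any measured recall is below the floor, else 0.

    `recalls` maps a sweep cell label to its recall@5; the label format
    is an assumption for this example.
    """
    breaches = {cell: value for cell, value in recalls.items() if value < floor}
    for cell, value in sorted(breaches.items()):
        print(f"recall floor breached: {cell} = {value:.3f} < {floor:.3f}")
    return 1 if breaches else 0
```

A nightly job would pass the result of `recall_floor_exit_code` to `sys.exit`, while the PR job would only format `metric_deltas` into a comment and exit 0 regardless.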

To run the eval locally:

# One-time corpus seed
export CORPUS_SEED_DATABASE_URL=postgresql://postgres:postgres@localhost:54322/postgres
export SUPABASE_URL=http://127.0.0.1:54321
export SUPABASE_SERVICE_ROLE_KEY=<from `supabase status`>
export OPENAI_API_KEY=sk-...
python -m db_seed.corpus_seed

# Eval runs
python -m evals.retrieval.runner                      # all three modes
python -m evals.retrieval.runner --mode vector        # single mode (faster)
python -m evals.retrieval.runner --include-generation # adds LLM-judge faithfulness/helpfulness (needs ANTHROPIC_API_KEY)
python -m evals.retrieval.runner --include-e6         # adds the E6 second-workspace zero-leak gate (exits 1 on a cross-workspace leak)
# E7 escalation tripwire: the deterministic per-PR gate (P1a/P1b retrieval gate + non-disclosure byte-equality, no LLM).
# Exits 1 on a gate clear or a non-disclosure mismatch. The P1b leg also needs DATABASE_URL set.
# Add --include-p2 --include-p3 --sweep for the weekly LLM-judged legs (needs ANTHROPIC_API_KEY).
python -m evals.retrieval.e7_runner --include-p1b

Deploy

The app deploys to Vercel (frontend) + Railway or Fly (backend) + Supabase (DB/Auth/Storage). No code changes required — only env vars.

1. Supabase

Create a project at supabase.com.

Link and push the schema:

cd supabase
supabase link --project-ref <your-ref>
supabase db push

Enable Google and GitHub OAuth providers in Authentication → Providers.
Grab SUPABASE_URL, anon key, and service_role key from Settings → API.

2. Backend — Railway (recommended)

Push the repo to GitHub.
Create a Railway project → New Service → Deploy from GitHub repo.
Set Service Root Directory to backend/. Railway picks up backend/Dockerfile and backend/railway.toml automatically.
Under Variables, set: SUPABASE_URL, SUPABASE_ANON_KEY, SUPABASE_SERVICE_ROLE_KEY, OPENAI_API_KEY, OPENAI_MODEL, OPENAI_VECTOR_STORE_ID, FRONTEND_ORIGIN, LANGSMITH_API_KEY, LANGSMITH_PROJECT. Add RERANKER + the matching API key if you want a reranker on by default.
Deploy. Note the generated *.up.railway.app URL — that's your VITE_BACKEND_URL.
Hit /healthz to confirm the service is up.

2b. Backend — Fly.io (alternative)

cd backend
fly launch --copy-config --no-deploy        # picks up fly.toml + Dockerfile
fly secrets set \
  SUPABASE_URL=... SUPABASE_ANON_KEY=... SUPABASE_SERVICE_ROLE_KEY=... \
  OPENAI_API_KEY=... OPENAI_VECTOR_STORE_ID=... \
  FRONTEND_ORIGIN=https://<your-vercel-url> \
  LANGSMITH_API_KEY=...
fly deploy

3. Frontend — Vercel

Add New Project → import the GitHub repo.
Set Root Directory to frontend/. Vercel picks up frontend/vercel.json (Vite preset, SPA rewrites).
Set env vars: VITE_SUPABASE_URL, VITE_SUPABASE_ANON_KEY, VITE_BACKEND_URL (← your Railway/Fly URL).
Deploy. Copy the production URL back into the backend's FRONTEND_ORIGIN and redeploy the backend so CORS allows it.

4. Verify

Open the Vercel URL, sign up, create a thread, send a message. The response should stream token-by-token, and a trace should appear in LangSmith tagged with your user_id and thread_id. Upload a document at /ingestion, watch it transition pending → processing → ready, then ask the chat about its contents.

How it was built

The system landed in 11 progressive modules; the full plan + per-story acceptance criteria live in .claude/agent/tasks/prd-agentic-rag.md.

Module	What landed
1	App shell, auth, threads, streaming chat, LangSmith
2	BYO retrieval (vector via match_chunks RPC), per-thread memory
3	Content-hashing dedup on documents and chunks
4	LLM structured-output metadata extraction at ingestion
5	Multi-format ingestion (txt/md/pdf/docx/html via docling)
6	Hybrid retrieval (RRF) + reranker layer (cohere / voyage / llm)
7	Additional tools — `query_database`, `web_search`
8	Sub-agents — `spawn_document_agent`
9	Structured RAG with semantic-layer-aware text-to-SQL
10	Retrieval eval suite (golden set, metrics, PR CI delta, nightly)
11	Permission-aware retrieval (per-chunk ACLs, share dialog, granting-principal badges)
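
As an illustration of the content-hashing dedup in module 3, the sketch below hashes chunk text and keeps the first occurrence of each digest. The whitespace normalisation and the choice of SHA-256 are assumptions for the example; the README does not say which hash or normalisation the project uses.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 hex digest of whitespace-normalised text (normalisation is assumed)."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Keep the first chunk for each distinct content hash, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        digest = content_hash(chunk)
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

Storing the digest alongside each document and chunk would let a re-upload of unchanged content skip embedding entirely.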
