A reference architecture for AI agents that participate in operational systems
without becoming the system of record.
Why • Pattern • Validation Moment • Where Is the LLM • Validators • When Not To Use • Stack • Quickstart • Limits • Roadmap • Prior Art
On the first end-to-end run, the validator caught the LLM silently dropping 4 of 8 active projects from a clean-looking digest. The Warn badge isn't a bug. It's the architecture doing its job.
This is a reference build, not a product. It shipped in roughly four hours as a deliberate exercise in one pattern: an LLM agent operating on data it doesn't own, bounded by deterministic checks on both sides, with every decision logged to an append-only ledger enforced at the database layer.
The pattern matters because most "AI in operations" projects fail the same way. The LLM works; the integration into the system humans actually trust does not. Either the agent silently becomes the new source of truth (bad), or its outputs vanish without an audit trail (worse), or its judgment goes unchecked (worst). Project Pulse is the smallest possible artifact that demonstrates the alternative.
The flagship workflow is a weekly project digest tool. Notion holds the operational data. A Supabase Edge Function pulls it, hands it to Claude Sonnet 4.6 for narrative synthesis, runs four deterministic validators on the output, and writes every decision (passed, warned, or failed) to an append-only digest_ledger table whose RLS policy makes tampering structurally impossible from the client. The Lovable frontend renders the result. Nothing about the agent owns the operational state; everything about the agent is auditable.
Notion is canonical. The agent participates. The validators verify. The ledger remembers.
Four layers, each doing one job:

    Notion                          →  canonical project data (operational truth)
       ↓
    Supabase Edge Function          →  orchestrator (fetch, hash, call, log)
       ↓
    Anthropic Claude Sonnet 4.6     →  narrative synthesis
       ↓
    Four Deterministic Validators   →  coverage, length, format, accuracy
       ↓
    Supabase digest_ledger          →  append-only via Postgres RLS
       ↓
    Lovable                         →  presentation layer only

The architectural primitives this enforces:
- The agent does not own state. Notion stays the system of record. If the LLM disappeared tomorrow, the operational data would still be there, unchanged. The agent contributes structured records back into a separate audit store rather than rewriting the source.
- The LLM is bounded on both sides. Input is fetched and hashed deterministically before the model is called. Output is validated deterministically before it is surfaced. The LLM does what LLMs are good at (summarization, prioritization, narrative); the system does not depend on it for correctness on anything checkable.
- Audit is enforced at the database, not in application code. The `digest_ledger` table has RLS configured so anon and authenticated users have `SELECT` only. The Edge Function writes via service role, which bypasses RLS. There is no application-layer "please don't mutate this." With RLS enabled and no INSERT, UPDATE, or DELETE policy defined, mutation is structurally impossible from the client side. Same primitive as the SoD pattern in FinAgent OS.
- Every call is live, every decision is logged. No caching. Every click of Generate Weekly Digest is a fresh Notion fetch, a fresh Anthropic call, a fresh validator pass, a fresh ledger row. Average cost per digest is roughly $0.01 (Sonnet 4.6 at ~600 input + ~300 output tokens). Token cost is metered to 6 decimals and stored alongside the ledger row. UUIDs are generated per decision. The audit trail is unconditional. Failed digests still log, with their failed checks named.
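The "fetched and hashed deterministically" step can be illustrated with a minimal sketch. It assumes a hypothetical `ProjectRow` shape and a sort-then-serialize canonical form; neither is taken from the Edge Function source, which may canonicalize differently. Only the use of SHA-256 through the Web Crypto API comes from the stack described here.

```typescript
// Hypothetical row shape for illustration; the real Notion payload has more fields.
interface ProjectRow {
  name: string;
  status: string;
  owner: string;
}

// Serialize each row, then sort the serialized rows, so the order in which
// the API returns rows does not change the hash. (This canonical form is an
// assumption of the sketch, not the repo's documented behavior.)
function canonicalize(rows: ProjectRow[]): string {
  const lines = rows.map((r) => JSON.stringify([r.name, r.status, r.owner]));
  lines.sort();
  return "[" + lines.join(",") + "]";
}

// SHA-256 via the Web Crypto API, returned as a lowercase hex string.
async function hashInput(rows: ProjectRow[]): Promise<string> {
  const bytes = new TextEncoder().encode(canonicalize(rows));
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

With a canonical form like this, two fetches of an unchanged Notion database produce the same hash even if the row order differs, which is what lets a reviewer tie a ledger row back to a specific input state.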
The badge in the hero above deserves the full story.
On the first end-to-end run, the LLM produced a perfectly readable weekly digest. It named the top 2 risks (Vendor Contract Renewals blocked on legal review, Quarterly Compliance Report with outstanding control tests), the top 2 wins (Support Tier Restructure shipped on time, Customer Survey Q2 closed at NPS 47), and asked an open question for leadership.
The badge read Warn.
The coverage validator caught that the LLM had named only 4 of the 8 in-progress-or-blocked projects. The model had editorialized, focused on the projects it deemed most important, and silently dropped four others (Annual Insurance Renewal, Sales Pipeline Reporting, Customer Onboarding v2, Q3 Marketing Site Refresh) from the digest narrative. The validators caught it; the human reviewer was told; the audit ledger preserved both the digest and the failed check.
This is the architectural pattern, visible in 30 seconds of looking. The LLM made a defensible editorial choice (focus on what matters most). The deterministic validator made that choice visible to the human reviewer (here is what the AI chose not to mention). The LLM has no authority over what gets surfaced as "passed."
The Warn re-fires on every subsequent run with the same input, because the gap is structural. The prompt asks for top-2 selectivity while the validator requires 80% coverage. Those requirements are incompatible by design. A passed digest would mean the system was either tuned to ignore the gap or never actually validating anything. The Warn is the feature.
The full ledger row for this digest is preserved at docs/evidence/first-warn.json: input hash, token cost, validation status, failed checks array, full output text, all stored verbatim.
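As a rough illustration of what such a row carries, the sketch below assembles one and meters token cost to 6 decimals. The field names, the `Rates` type, and any per-million-token prices passed in are assumptions for illustration; they are not read from `first-warn.json` or from Anthropic's price list. The status rule (0 failed checks is passed, 1 is warn, 2 or more is failed) is the one this README states.

```typescript
// Hypothetical per-million-token prices in USD, supplied by the caller.
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
}

// tokens * (USD per million tokens) yields micro-dollars, so rounding the
// sum to an integer and dividing by 1e6 gives USD rounded to 6 decimals.
function tokenCost(inputTokens: number, outputTokens: number, rates: Rates): number {
  const microUsd = inputTokens * rates.inputPerMTok + outputTokens * rates.outputPerMTok;
  return Math.round(microUsd) / 1e6;
}

// Hypothetical column names; check the migration for the real schema.
interface LedgerRow {
  id: string;
  input_hash: string;
  model: string;
  token_cost: number;
  validation_status: "passed" | "warn" | "failed";
  failed_checks: string[];
  output_text: string;
}

function buildLedgerRow(args: {
  id: string; // a per-decision UUID, generated by the caller
  inputHash: string;
  model: string;
  usage: { input: number; output: number };
  rates: Rates;
  digest: string;
  failedChecks: string[];
}): LedgerRow {
  const n = args.failedChecks.length;
  return {
    id: args.id,
    input_hash: args.inputHash,
    model: args.model,
    token_cost: tokenCost(args.usage.input, args.usage.output, args.rates),
    validation_status: n === 0 ? "passed" : n === 1 ? "warn" : "failed",
    failed_checks: args.failedChecks,
    output_text: args.digest, // stored verbatim, even when checks fail
  };
}
```

Note that the row is built the same way whether or not checks failed; nothing in the builder lets a failed digest skip the ledger.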
The table below borrows its enumeration discipline from FinAgent OS's SOX-MAPPING.md: every decision point in the pipeline is classified as deterministic or LLM-driven, so a reviewer can trace AI involvement without reading code.
| Step | Who | Why |
|---|---|---|
| Fetch projects from Notion | Deterministic code | Source of truth, no transformation |
| Hash input state (SHA-256) | Deterministic code | Reproducibility; same state always produces same hash |
| Generate digest narrative | Claude Sonnet 4.6 | Narrative synthesis, appropriate LLM use |
| Validate coverage (≥80% of in-progress + blocked projects named) | Deterministic code | Hallucination detection by omission |
| Validate length bounds (150–400 words) | Deterministic code | Output shape gate |
| Validate question presence (≥1 `?` in output) | Deterministic code | Format check |
| Validate blocked-count accuracy | Deterministic code | Numerical claim verification |
| Write to `digest_ledger` via service role | Deterministic code | Audit invariant |
| Surface to UI | Deterministic code | Lovable receives the response; no LLM in render path |
The LLM has zero authority over what gets written to the ledger or what gets surfaced as "passed." Those decisions are made by the validators, which read like any other piece of code.
The empirical proof: if you replace the Anthropic call with a static fixture, every other step in the pipeline produces identical outputs. The deterministic spine is reproducible by reading the Edge Function source.
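One way to picture the fixture swap is an orchestrator whose steps are injected, so the LLM call is just one replaceable function. This is a sketch under assumptions: `PipelineDeps`, `runPipeline`, and the entry shape are hypothetical names, the hashing step is omitted for brevity, and the real Edge Function is not necessarily structured this way.

```typescript
interface Project {
  name: string;
  status: string;
}

interface Validation {
  status: "passed" | "warn" | "failed";
  failed_checks: string[];
}

interface LedgerEntry {
  digest: string;
  validation: Validation;
}

// Every step is injected. `generate` is the only LLM-driven one, and it
// can be replaced by a static fixture without touching the other steps.
interface PipelineDeps {
  fetchProjects: () => Promise<Project[]>;
  generate: (projects: Project[]) => Promise<string>;
  validate: (digest: string, projects: Project[]) => Validation;
  writeLedger: (entry: LedgerEntry) => Promise<void>;
}

async function runPipeline(deps: PipelineDeps): Promise<LedgerEntry> {
  const projects = await deps.fetchProjects();
  const digest = await deps.generate(projects);
  // The verdict comes from deterministic code, never from the model.
  const validation = deps.validate(digest, projects);
  const entry: LedgerEntry = { digest, validation };
  // The ledger write is unconditional: failed digests are logged too.
  await deps.writeLedger(entry);
  return entry;
}
```

Passing `generate: async () => "static fixture text"` exercises fetch, validate, and ledger write exactly as a live run would, which is the sense in which the deterministic spine is reproducible.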
Every digest is gated on these four checks. Failed checks are surfaced as failed_checks in the response and stored in the ledger.
- Project name coverage. The digest must name at least 80% of projects whose status is `In Progress` or `Blocked` (the "needs attention" subset). Done and Not Started projects are excluded: Done is celebration territory, and Not Started has nothing to surveil. The coverage check is the architecture's most aggressive guard against silent omission.
- Length bounds. Between 150 and 400 words. Below 150, the digest is too thin to be actionable. Above 400, it's rambling and won't be read. Outside these bounds is a Warn.
- Question presence. The digest must contain at least one `?` character. The prompt asks for an open question for the team, and the validator confirms it's actually there. This forces the LLM to leave room for human input rather than closing the narrative.
- Blocked-count accuracy. If the digest claims "N projects are blocked" (or "N blocked"), N must match the actual count in the input. Catches a specific class of numerical hallucination.
Status grading is based on the number of failed checks: zero failed checks = Passed, one failed check = Warn, two or more = Failed. Failed digests still log to the ledger. The audit trail is unconditional.
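The four checks and the grading rule can be sketched as pure functions of the digest text and the project list. The function names, the case-insensitive substring match for project names, and the regular expression for blocked-count claims are assumptions of this sketch; the validators in `generate-digest/index.ts` may match differently.

```typescript
type Status = "Not Started" | "In Progress" | "Blocked" | "Done";

interface Project {
  name: string;
  status: Status;
}

interface Validation {
  status: "passed" | "warn" | "failed";
  failed_checks: string[];
}

// Coverage: at least 80% of In Progress + Blocked projects are named.
function coverageOk(digest: string, projects: Project[]): boolean {
  const needsAttention = projects.filter(
    (p) => p.status === "In Progress" || p.status === "Blocked",
  );
  if (needsAttention.length === 0) return true; // nothing to cover
  const text = digest.toLowerCase();
  const named = needsAttention.filter((p) => text.includes(p.name.toLowerCase()));
  return named.length / needsAttention.length >= 0.8;
}

// Length: between 150 and 400 whitespace-separated words, inclusive.
function lengthOk(digest: string): boolean {
  const words = digest.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return words >= 150 && words <= 400;
}

// Question presence: at least one "?" character.
function questionOk(digest: string): boolean {
  return digest.includes("?");
}

// Blocked count: every "N blocked" / "N projects are blocked" claim must
// equal the real count. A digest that makes no such claim passes.
function blockedCountOk(digest: string, projects: Project[]): boolean {
  const actual = projects.filter((p) => p.status === "Blocked").length;
  const claim = /\b(\d+)\s+(?:projects?\s+)?(?:(?:are|is)\s+)?blocked\b/gi;
  let match: RegExpExecArray | null;
  while ((match = claim.exec(digest)) !== null) {
    if (Number(match[1]) !== actual) return false;
  }
  return true;
}

// Grading: 0 failed checks = passed, 1 = warn, 2 or more = failed.
function validate(digest: string, projects: Project[]): Validation {
  const failed: string[] = [];
  if (!coverageOk(digest, projects)) failed.push("coverage");
  if (!lengthOk(digest)) failed.push("length");
  if (!questionOk(digest)) failed.push("question");
  if (!blockedCountOk(digest, projects)) failed.push("blocked_count");
  const status = failed.length === 0 ? "passed" : failed.length === 1 ? "warn" : "failed";
  return { status, failed_checks: failed };
}
```

Because none of these functions call the model or touch the network, the same digest and project list always produce the same verdict.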
The pattern fits where AI output is consequential, auditable, and human-reviewed. It doesn't fit everywhere.
- Latency-sensitive consumer chat. The four-validator pass plus ledger write adds roughly 50–100 ms per response. For a conversational assistant where the user is actively waiting, that's friction without payoff.
- Contexts where the LLM is supposed to own state. A calendar bot, a persistent memory agent, a long-running task executor. Project Pulse's "agent participates but doesn't own" is the wrong frame when the agent's whole job is to own state.
- Low-stakes summarization where audit isn't the point. If the output is throwaway (drafting a reply to a friend's email, summarizing an article for personal reading), the audit ledger is overhead, not insurance.
- Fully closed-loop automation with no human reviewer. If you're generating prompts and consuming outputs entirely inside the same automated pipeline, the validator layer becomes a redundant check on yourself. Reserve this pattern for cases where a human reviewer (or a regulator) needs to verify the LLM's work after the fact.
This pattern is for AI outputs that someone will read, trust, and act on, in contexts where the cost of being wrong is real.
| Layer | Tool | Why |
|---|---|---|
| Source of truth | Notion | Operational data lives where humans already trust it |
| Bridge / orchestrator | Supabase Edge Functions (TypeScript / Deno) | Server-side, secret-managed, serverless |
| LLM | Anthropic Claude Sonnet 4.6 (alias-pinned) | Strong narrative synthesis, transparent token pricing |
| Audit ledger | Supabase Postgres + RLS | Append-only enforced at the data layer |
| Frontend | Lovable + Vite + React + Tailwind | Internal-tool aesthetic, fast iteration |
| Hashing | SHA-256 via Web Crypto API | Standard, deterministic, no dependencies |
The two Edge Functions are roughly 150 lines of TypeScript each. The validators are pure functions of (digest_text, project_list). The frontend is three tabs and zero auth.
You'll need: a Notion account, a Supabase project, an Anthropic API key, and a Lovable workspace (or any static host if you don't use Lovable). Total time is roughly 30 minutes.
Create a database called Projects with these columns:
| Column | Type |
|---|---|
| Name | Title |
| Owner | Text or Person |
| Status | Select (Not Started, In Progress, Blocked, Done) |
| Due Date | Date |
| Last Update | Date |
| Notes | Text |
Populate with at least 8 sample rows across all four statuses.
Create a Notion integration at notion.so/profile/integrations with Read content capability only (least privilege). Connect it to your database via the database's ··· menu, then Connect to, then your integration.
Copy two values: the integration token (starts with ntn_) and the database ID (the 32-character hex string in the database URL).
Create a new Supabase project. In the SQL Editor, run supabase/migrations/0001_init.sql to create the digest_ledger table with its RLS policies. Verify in the Policies tab that the table has SELECT for anon and authenticated, and no INSERT/UPDATE/DELETE policies (writes happen via service role only).
In Project Settings → Edge Functions → Secrets, set three secrets:
| Secret | Value |
|---|---|
| `NOTION_TOKEN` | Your Notion integration token |
| `NOTION_PROJECTS_DB_ID` | Your Notion database ID |
| `ANTHROPIC_API_KEY` | Your Anthropic API key |
Deploy both Edge Functions from supabase/functions/ via the dashboard (Functions → Deploy a new function → paste the code). Set Verify JWT to OFF on both, since this reference build has no auth.
Test from the Supabase dashboard:
- `fetch-projects` should return your project rows as JSON.
- `generate-digest` should return `{ digest, validation, ledger_id, token_cost, model }` and write a row to `digest_ledger`.
Create a new Lovable project. Use the prompt in lovable-prompt.md to scaffold the three-tab frontend (Projects, Digest, Ledger).
The frontend calls the Edge Functions via direct fetch() against the function URLs rather than the supabase.functions.invoke() SDK, so substitute your project ref and anon key into the placeholders the prompt leaves for you:

    const SUPABASE_URL = 'https://<YOUR-PROJECT-REF>.supabase.co'
    const SUPABASE_ANON_KEY = '<YOUR-PUBLISHABLE-ANON-KEY>'

Run the Lovable preview. Click Generate Weekly Digest. You should see a digest render with a validation badge, and a new row should appear in `digest_ledger` in the Supabase Table Editor.
That's it. Every click after this is a fresh end-to-end run.
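For readers wiring the frontend by hand, the direct call might look like the sketch below. Assumptions, all hypothetical: the function is invoked with POST, the anon key is sent in both `Authorization` and `apikey` headers (possibly unnecessary with Verify JWT off), and the helper takes the fetch implementation as a parameter so it can be exercised without a network. The `/functions/v1/<name>` path is Supabase's standard Edge Function URL; the response fields are the ones listed in the dashboard test step.

```typescript
interface DigestResponse {
  digest: string;
  validation: unknown;
  ledger_id: string;
  token_cost: number;
  model: string;
}

// Minimal structural type so a stub can stand in for the global fetch.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function generateDigest(
  fetchImpl: FetchLike,
  supabaseUrl: string,
  anonKey: string,
): Promise<DigestResponse> {
  const res = await fetchImpl(`${supabaseUrl}/functions/v1/generate-digest`, {
    method: "POST", // assumption: the function may also accept GET
    headers: {
      Authorization: `Bearer ${anonKey}`,
      apikey: anonKey,
      "Content-Type": "application/json",
    },
  });
  if (!res.ok) {
    throw new Error(`generate-digest failed with HTTP ${res.status}`);
  }
  const raw = await res.json();
  const body = (raw && typeof raw === "object" ? raw : {}) as Partial<DigestResponse>;
  // Fail loudly if the response does not have the documented shape.
  const required = ["digest", "validation", "ledger_id", "token_cost", "model"] as const;
  for (const key of required) {
    if (body[key] === undefined) {
      throw new Error(`response is missing "${key}"`);
    }
  }
  return body as DigestResponse;
}
```

In the browser this would be called as `generateDigest(fetch, SUPABASE_URL, SUPABASE_ANON_KEY)`; each call triggers a fresh end-to-end run and a new ledger row.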

    project-pulse/
    ├── README.md                          ← this file
    ├── assets/
    │   ├── banner.png                     ← header graphic (rendered)
    │   └── banner.svg                     ← header graphic (source)
    ├── supabase/
    │   ├── functions/
    │   │   ├── fetch-projects/index.ts    ← Notion → JSON
    │   │   └── generate-digest/index.ts   ← orchestrator: fetch → call → validate → log
    │   └── migrations/
    │       └── 0001_init.sql              ← digest_ledger table + RLS
    ├── src/                               ← Lovable-generated frontend (Vite/React)
    ├── lovable-prompt.md                  ← prompt to scaffold the frontend
    └── docs/
        ├── screenshots/                   ← demo evidence
        └── evidence/
            └── first-warn.json            ← ledger row for first production Warn

The Edge Function code is the readable core of the architecture. Open supabase/functions/generate-digest/index.ts and you can trace the entire pipeline top to bottom: fetch, hash, prompt, call, validate, write, return.
The gaps a reviewer would find in v0.1 are surfaced below. Most are deliberate scope choices; some are real and roadmapped.
These are properties of v0.1's scope, not bugs.
- Count-accuracy is scoped to "blocked". The validator checks `N projects blocked` claims against the actual count, but does not generalize to `N projects done`, `N not started`, etc. The first production run surfaced this gap immediately. The digest correctly stated "3 blocked" but claimed "3 not started" when the actual count was 2. The narrower check passed; the looser claim slipped through. v0.2 generalizes count-accuracy to all status categories.
- Hallucination detection is coverage-based, not name-based. The validator catches projects that should be named but weren't. It does not catch project names that were added by the LLM but don't exist in the input (e.g., "Atlas Migration" appearing in a digest when no such project is in Notion). v0.2 adds a set-membership check on extracted proper-noun phrases against the input project list.
- No auth. The frontend has no login. The Supabase anon key is exposed to the browser, and the Edge Functions have `Verify JWT = OFF`. This is appropriate for a single-operator internal-tool reference build. Production deployment would require Supabase Auth, JWT verification on the Edge Functions, and per-user RLS scoping on the ledger.
- Notion is polled per request. Every Generate Weekly Digest click fetches the entire Notion database. At ~12 rows this is fine; at thousands of rows you would materialize Notion into Supabase with a periodic sync and query the Supabase view from the frontend.
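The count-accuracy gap in the first bullet can be made concrete with a sketch of a generalized check. This is not the v0.2 implementation; `countMismatches`, the `Mismatch` shape, and the regular expression are hypothetical, and the pattern only recognizes phrasings like "N blocked", "N projects are done", or "N not started".

```typescript
type Status = "Not Started" | "In Progress" | "Blocked" | "Done";

interface Mismatch {
  status: Status;
  claimed: number;
  actual: number;
}

const STATUSES: Status[] = ["Not Started", "In Progress", "Blocked", "Done"];

// Returns every numeric status claim in the digest that disagrees with
// the real per-status counts. An empty array means no contradiction found.
function countMismatches(digest: string, actual: Record<Status, number>): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const status of STATUSES) {
    // "Not Started" becomes "not\s+started" so any whitespace run matches.
    const label = status.toLowerCase().split(" ").join("\\s+");
    const claim = new RegExp(
      `\\b(\\d+)\\s+(?:projects?\\s+)?(?:(?:are|is)\\s+)?${label}\\b`,
      "gi",
    );
    let match: RegExpExecArray | null;
    while ((match = claim.exec(digest)) !== null) {
      const claimed = Number(match[1]);
      if (claimed !== actual[status]) {
        mismatches.push({ status, claimed, actual: actual[status] });
      }
    }
  }
  return mismatches;
}
```

Fed a digest that says "3 blocked" and "3 not started" against real counts of 3 and 2, a check of this shape would report the "not started" claim while leaving the "blocked" claim alone, which is the case the v0.1 validator let through.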
- v0.2: Generalize count-accuracy validator. Extend the regex-based pattern matcher to check `done`, `blocked`, `in progress`, `not started` claims against actual counts. Pure validator change, no LLM or schema impact.
- v0.2: Set-membership name validation. Extract capitalized 2+ word phrases from the digest, verify each exists in the input project list. Catches the inverse-of-coverage hallucination class.
- v0.3: Auth. Supabase Auth + per-user RLS on `digest_ledger`. Multi-tenant ready.
- v0.3: Notion → Supabase materialization. Periodic sync of the Notion database into a Supabase table for cheap reads at scale.
- v0.4: Multiple digest modes. Same architecture, different prompts. For example, a month-end variance digest mode for finance teams, a blockers-only mode for daily standups. The validator scaffolding is mode-agnostic.
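The set-membership item above can be sketched as follows. This is a guess at one workable shape, not the planned implementation: `unknownNames` is a hypothetical helper, "capitalized 2+ word phrase" is approximated by a simple regular expression, and a phrase counts as known if it contains, or is contained in, a real project name (so "Customer Onboarding" is accepted for a project called "Customer Onboarding v2").

```typescript
// Returns capitalized multi-word phrases in the digest that do not
// correspond to any project name in the input list.
function unknownNames(digest: string, projectNames: string[]): string[] {
  // Two or more consecutive words, each starting with a capital letter
  // (or a digit after the first word), separated by whitespace.
  const phrase = /\b[A-Z][A-Za-z0-9]*(?:\s+[A-Z0-9][A-Za-z0-9]*)+/g;
  const candidates = digest.match(phrase) || [];
  const known = projectNames.map((n) => n.toLowerCase());
  const unknown: string[] = [];
  for (const candidate of candidates) {
    const c = candidate.toLowerCase();
    // Accept containment in either direction to tolerate leading words
    // such as "The" and trailing lowercase tokens such as "v2".
    const matchesKnown = known.some((k) => k.includes(c) || c.includes(k));
    if (!matchesKnown && unknown.indexOf(candidate) === -1) {
      unknown.push(candidate);
    }
  }
  return unknown;
}
```

A heuristic like this will flag legitimate capitalized phrases that are not project names (section titles, people, product names), so it would more plausibly feed a Warn than a hard failure, and would likely need an allowlist in practice.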
The pattern in this repo is informed by my other builds in the same architectural family:
- FinAgent OS. The governance-first AI agent platform calibrated for SOX-regulated crypto finance. The "Where is the AI?" enumeration discipline, the SoD-enforced-at-the-database pattern, the Shadow Ledger primitive, and the principle that audit-grade properties must be structurally true (not just policy-true) all originate there. Project Pulse is a 4-hour distillation of that posture into a single reference workflow.
- regulated-rag. Citation-grounded, refusal-bounded retrieval over the Fair Debt Collection Practices Act. Same architectural philosophy: deterministic checks on both sides of the LLM, every retrieval and generation logged with a `request_id`, refusal is a first-class output. The validator-as-quality-gate pattern adapted here was first formalized there.
- JobSignal Engine. Open-source job discovery pipeline with eval-driven scoring rubric. Same eval-as-architecture discipline, different domain.
The shared thesis across all three: AI systems in regulated or audit-adjacent contexts need both deterministic spines and immutable evidence, not just outputs. Project Pulse is the smallest possible artifact that makes that thesis visible end-to-end.
MIT: see LICENSE. Use this, fork it, adapt the validator-and-ledger pattern into your own work. The acknowledgement is a star and a link back; the value to me is the pattern reaching more agent-in-operations builds.
Rizwan Ahmed, ACCA Founder, Velocyt Consulting · Mississauga, ON
- Website: velocyt.ca
- GitHub: @RZ-Logic
Also built: FinAgent OS (SOX-aligned agent platform) · regulated-rag (citation-grounded RAG) · Immi-OS (immigration automation, real clients) · JobSignal Engine (job tracker, $0–12/mo)
If you find this useful, a star helps others find it.
Built in Toronto by an audit-trained ACCA who keeps building systems where the AI participates without owning state.
