Assemble payer-ready prior authorization packets from FHIR R4 patient records, with every clinical claim cited back to the source resource and every model decision logged for audit.
Built to demonstrate the architecture pattern that survives both the CMS-0057-F 72-hour clock and a Lokken-style discovery order: AI prepares, humans decide, audit trail proves it.
CMS-0057-F took effect January 1, 2026. Payers must now decide urgent prior authorizations in 72 hours and standard ones in 7 days, with structured denial reasons attached. The volume is impossible without AI in the loop.
At the same time, the Estate of Lokken v. UnitedHealth Group litigation has forced public disclosure of how AI is actually used in coverage decisions. The architecture choices that produced 1.2-second-per-claim Cigna-PXDX-style outcomes are now the architecture choices that get subpoenaed.
This repo is the alternative: an assistive system that pulls clinical evidence via FHIR, maps it to specific payer policy criteria, generates a structured packet for a human reviewer, and logs every input, prompt, model version, and decision against a single decision_id.
flowchart LR
A[FHIR Bundle<br/>Patient + Encounter +<br/>Condition + Procedure] --> B[Resource Extractor]
P[Payer Policy JSON<br/>criteria + required evidence] --> C[Criterion Matcher]
B --> C
C -->|prompt cached:<br/>policy criteria| D[Claude Sonnet 4.6<br/>citation generator]
D --> E[Structured PA Packet<br/>criterion -> evidence -> citation]
D --> F[Audit Log<br/>decision_id, model_version,<br/>prompt_hash, retrieved_docs]
E --> R[Human Reviewer<br/>approves / edits / denies]
R --> F
- Loads a FHIR R4 Bundle from JSON (Synthea-compatible)
- Loads a payer policy ruleset (JSON, criterion-by-criterion)
- For each criterion, retrieves the matching FHIR resources
- Calls Claude Sonnet 4.6 with prompt caching on the policy block
- Produces a structured PA packet where each clinical claim cites its source FHIR resource by
resourceType/id - Writes a complete audit log per
decision_idincluding model version, prompt hash, retrieved resource IDs, raw model output, and final structured packet
- Does not make the final approval/denial decision. That stays with the human reviewer
- Does not generate freeform denial language. Denial reasons map to specific policy criteria; the model cites evidence, the rules engine resolves the criterion
- Does not chain into automated EHR write-back. The packet is a review artifact, not an action
git clone https://github.com/devjothish/fhir-prior-auth-assembler.git
cd fhir-prior-auth-assembler
pip install -r requirements.txt
cp .env.example .env # add your ANTHROPIC_API_KEY
python -m src.assembler --patient data/patients/sample_skilled_nursing.json --policy data/policies/skilled_nursing_facility.jsonOutput: a structured packet.json and an audit/<decision_id>.json file.
For a synthetic patient with a stroke diagnosis being considered for skilled nursing facility admission, the assembler produces:
{
"decision_id": "a4f2...",
"criteria_assessed": [
{
"criterion_id": "snf-medical-necessity-1",
"criterion_text": "Patient requires skilled nursing or rehab services 5+ days/week",
"evidence_found": true,
"citations": [
{
"claim": "Patient requires daily PT and OT post-stroke",
"source": "Condition/cond-001",
"source_text": "Acute ischemic stroke, sequelae"
},
{
"claim": "Inpatient PT ordered 5 sessions/week",
"source": "ServiceRequest/svc-014",
"source_text": "Physical therapy, frequency: 5 times weekly"
}
],
"recommendation": "MEETS_CRITERION",
"confidence": "high"
}
]
}The recommendation is for human review, not auto-action.
python -m eval.run_eval --cases eval/cases/Runs the assembler against synthetic cases with known ground truth (meets / does not meet criterion) and reports:
- Citation accuracy (does cited evidence actually appear in the source resource?)
- Criterion assessment agreement vs ground truth
- Low-confidence routing rate
- Per-criterion bias check across synthetic demographic groups
- Citation is structured, not generated. The model identifies a
resourceType/idand quotes a span, but the citation linkage is enforced by post-processing against the actual FHIR bundle. If the model invents aresourceType/idthat does not exist, the packet rejects it. - Prompt caching on the policy block. Payer policies are stable; patient bundles change every call. Caching the policy block per Anthropic's prompt caching API keeps cost predictable at scale.
- Audit log is append-only and joinable. Every entry includes
decision_id,model(with version),prompt_hash,retrieved_resource_ids,raw_response,structured_output,human_decision(when added),timestamp. A single decision_id can reproduce the entire context that produced the packet. - Low-confidence cases route to humans with no recommendation attached. A nudged reviewer is a worse decision-maker than an unprimed one. If confidence is below threshold, the packet shows evidence but suppresses the model's recommendation field.
- No clinician name appears on outputs the clinician did not see. Every signed artifact requires an explicit human review event in the audit log.
- CMS Interoperability and Prior Authorization Final Rule (CMS-0057-F)
- Estate of Lokken v. UnitedHealth Group
- HL7 Da Vinci Prior Authorization Support IG
- Companion newsletter: 72 Hours and 1.2 Seconds: The Architecture Problem CMS Just Handed Every Payer (link forthcoming)
Jothiswaran Arumugam - AI Engineer building production agent systems for regulated industries
- 6x AWS + 7x GCP Certified
- Jo's Cloud AI Hub Newsletter
Status: v0 scaffold, active development. See NEXT_STEPS.md for the 2-week build roadmap.