A proof-of-concept demonstrating that a sub-2 GB LLM running entirely on CPU can normalize emergency-department (ED) chief complaints into candidate ICD-10 codes via Retrieval-Augmented Generation (RAG) — with no cloud API calls, no GPU, and no proprietary data.
⚠️ Status: Research / dev prototype. Not validated for clinical use.
| Step | Script | What it does |
|---|---|---|
| 1a | `01_build_fake_kb_lancedb.py` | Generates synthetic KB docs (hospital SQL schema + ~600 ICD-10 codes) and writes them to `fake_kb_data.csv`. |
| 1b | `01_build_fake_kb_lancedb.py` | Reads back the CSV, flushes the existing LanceDB table, embeds all docs, and re-ingests into LanceDB. |
| 2 | `02_rag_llama32_edge_tests.py` | Runs 33 structured edge-test cases through the full RAG pipeline: embed → retrieve → prompt → generate → parse → validate JSON. Auto-saves results to `results/rag_results_<timestamp>.json`. |
| — | `run_pipeline.py` | One-command runner that chains steps 1 and 2 end-to-end. |
```text
Chief Complaint (free text)
        │
        ▼
Sentence Embedder
NeuML/pubmedbert-base-embeddings (110 M params, 768-dim, CPU)
        │
        ▼
LanceDB cosine search ◄── KB: ~600 ICD-10 codes + 3 SQL schema docs
        │                       ▲
        │                       │ ingested from fake_kb_data.csv
        │                       │ (flushed + re-embedded on each build)
        │ top-k retrieved chunks
        ▼
Prompt builder
(structured system prompt + anti-laziness rules + few-shot behavior examples)
        │
        ▼
Tiny LLM (1 B params, CPU-only, float16 or Q4_K_M)
        │
        ▼
JSON output parser + post_process() guards + validator
        │
        ▼
{ candidate_icd_codes, confidence, flags, … }
        │
        ▼
results/rag_results_<timestamp>.json
```
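The flow above can be sketched end-to-end with stdlib-only stand-ins — a bag-of-words "embedder", a three-entry dict in place of LanceDB, and no LLM call (the top retrieved code is emitted directly). Every name here is illustrative, not the project's actual API:

```python
import json
import math

# Toy stand-ins: the real pipeline uses PubMedBERT + LanceDB + a 1B LLM.
KB = {"I20.9": "Angina pectoris, unspecified — chest pain on exertion",
      "J06.9": "Acute upper respiratory infection — fever, sore throat",
      "N39.0": "Urinary tract infection — dysuria, urgency"}

def embed(text):
    # Crude bag-of-words vector over a tiny vocabulary (stands in for 768-dim embeddings).
    vocab = ["chest", "pain", "fever", "throat", "urgency", "dysuria", "exertion"]
    return [text.lower().count(w) for w in vocab]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def retrieve(complaint, k=2):
    # Rank KB docs by cosine similarity to the query vector (LanceDB's role).
    q = embed(complaint)
    ranked = sorted(KB.items(), key=lambda kv: cosine(q, embed(kv[1])), reverse=True)
    return ranked[:k]

def run_case(complaint):
    context = retrieve(complaint)
    # A real run would build a prompt from `context` and call the LLM here;
    # the top retrieved code is emitted directly just to show the output shape.
    return {"input_text": complaint,
            "candidate_icd_codes": [context[0][0]],
            "confidence": 0.5, "flags": []}

print(json.dumps(run_case("chest pain on exertion")))
```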
Four profiles are defined and benchmarked. Switch with `--model <profile>`, the `MODEL_PROFILE` env var, or by editing `active_profile` in `model_config.yaml`.
| Profile key | Model | Size | Dtype | Gated | Backend |
|---|---|---|---|---|---|
| `llama32_1b_instruct` | `meta-llama/Llama-3.2-1B-Instruct` | ~2.5 GB | float16 | ✅ HF token + Meta licence | Transformers |
| `gemma3_1b` | `google/gemma-3-1b-it` | ~2.0 GB | bfloat16 | ✅ HF token + Google licence | Transformers |
| `danube3_500m` | `h2oai/h2o-danube3-500m-chat` | ~0.98 GB | float16 | ❌ ungated | Transformers |
| `llama32_1b_q4km` (default) | `bartowski/Llama-3.2-1B-Instruct-GGUF` (Q4_K_M) | ~0.81 GB | Q4_K_M | ❌ ungated | llama-cpp-python |
No HF token? Use `danube3_500m` or `llama32_1b_q4km` — both run with zero credentials.
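The three switching mechanisms resolve in a fixed precedence (CLI flag, then env var, then YAML). A hypothetical sketch of that resolution, with a plain dict standing in for the parsed `model_config.yaml`:

```python
import os

# Stand-in for the parsed model_config.yaml (keys mirror the profile table above).
CONFIG = {
    "active_profile": "llama32_1b_q4km",
    "profiles": {
        "llama32_1b_q4km": {"backend": "llama-cpp-python", "dtype": "Q4_K_M"},
        "danube3_500m": {"backend": "transformers", "dtype": "float16"},
    },
}

def resolve_profile(cli_model=None, env=os.environ, config=CONFIG):
    # Precedence: --model flag > MODEL_PROFILE env var > active_profile in YAML.
    name = cli_model or env.get("MODEL_PROFILE") or config["active_profile"]
    if name not in config["profiles"]:
        raise KeyError(f"unknown profile: {name}")
    return name, config["profiles"][name]

name, profile = resolve_profile(env={"MODEL_PROFILE": "danube3_500m"})
print(name)  # danube3_500m — the env var wins when no CLI flag is given
```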
The builder now follows a CSV-first pipeline:

- Generate — `build_fake_schema_docs()` + `build_fake_icd_docs()` produce in-memory dicts.
- Persist to CSV — `write_docs_to_csv()` serialises all docs to `fake_kb_data.csv` (columns: `doc_type`, `doc_id`, `title`, `text`, `icd_code`, `icd_desc`).
- Reload from CSV — `read_docs_from_csv()` reads the CSV back, making the CSV the authoritative source for downstream ingestion.
- Flush — `flush_lancedb_table()` drops the existing `kb_docs` table for a clean slate.
- Embed + ingest — Docs are vectorised with `NeuML/pubmedbert-base-embeddings` (768-dim, domain-tuned on PubMed/MEDLINE) and written to `lancedb_store/`.

Override paths via env vars: `KB_CSV_PATH` (CSV output), `LANCEDB_DIR` (store directory).
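A minimal sketch of the persist/reload steps using only the `csv` module — the function names mirror the builder's, but the doc contents and plumbing here are toy examples:

```python
import csv
import os
import tempfile

COLUMNS = ["doc_type", "doc_id", "title", "text", "icd_code", "icd_desc"]

def write_docs_to_csv(docs, path):
    # Serialise all in-memory docs; the CSV becomes the authoritative source.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(docs)

def read_docs_from_csv(path):
    # Read the CSV back for downstream embedding + LanceDB ingestion.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

docs = [{"doc_type": "icd", "doc_id": "icd_I20.9", "title": "I20.9",
         "text": "I20.9 — Angina pectoris, unspecified", "icd_code": "I20.9",
         "icd_desc": "Angina pectoris, unspecified"}]
path = os.path.join(tempfile.mkdtemp(), "fake_kb_data.csv")
write_docs_to_csv(docs, path)
print(read_docs_from_csv(path) == docs)  # the round trip is lossless for string fields
```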
Two document types are embedded and stored in `./lancedb_store`:

Three SQL `CREATE TABLE` definitions:

- `dbo.Patient` — MRN, DOB, Sex, PostalCode
- `dbo.Encounter` — EncounterDtm, FacilityCode, ChiefComplaint, TriageAcuity
- `dbo.Diagnosis` — DxCode, DxSystem, DxDescription, DxRank
| Clinical domain | Code range | Sample conditions |
|---|---|---|
| Symptoms & Signs | R00–R99 | Chest pain, syncope, fever, haematuria, altered mental status |
| Cardiovascular | I10–I99 | Angina, STEMI, NSTEMI, AFib, PE, aortic dissection, DVT, heart failure |
| Respiratory | J00–J99 | URI, pharyngitis, pneumonia, COPD exacerbation, asthma, pneumothorax |
| Gastrointestinal | K00–K95 | GERD, appendicitis, bowel obstruction, cholecystitis, pancreatitis, GI bleed |
| Genitourinary | N00–N99 | UTI, pyelonephritis, renal colic, kidney stones, PID, torsion |
| Neurology | G00–G99 | Meningitis, seizure, migraine, TIA, stroke, Bell palsy |
| Mental Health / Tox | F00–F99 | Alcohol withdrawal, opioid OD, psychosis, depression, panic, PTSD |
| Musculoskeletal | M00–M99 | Gout, OA, low back pain, sciatica, rotator cuff, fibromyalgia |
| Infectious Disease | A00–B99 | Sepsis, C. diff, herpes zoster, HIV, hepatitis, Lyme |
| Endocrine / Metabolic | E00–E90 | DKA, hypoglycaemia, thyroid storm, electrolyte disorders |
| Dermatology | L00–L99 | Cellulitis, abscess, urticaria, Stevens-Johnson |
| Eye / ENT | H00–H95 | Conjunctivitis, acute glaucoma, otitis media, Ménière's, vertigo |
| Haematology | D50–D89 | Anaemia, DIC, ITP, neutropenia, sickle-cell crisis |
| Injury / Trauma | S00–T98 | Fractures, concussion, burns, poisoning, anaphylaxis |
| Obstetrics | O00–O9A | Spontaneous abortion, pre-eclampsia, hyperemesis, PPROM |
| Neoplasms | C00–D49 | Common oncology presentations seen in the ED |
33 cases cover three behavioural categories:
| Case ID | Category | CTAS | What is being tested |
|---|---|---|---|
| `GREEN_angina_like` | GREEN | 3 | Happy path — chest tightness → I20.9 / R07.9 |
| `GREEN_uri_like` | GREEN | 4 | Happy path — fever + sore throat → J06.9 |
| `GREEN_uti_like` | GREEN | 4 | Happy path — pelvic pain + urgency → N39.0 |
| `RED_empty` | RED | 5 | Empty input → EMPTY_INPUT flag, zero ICD codes |
| `RED_nonsense` | RED | 5 | Garbage text → NONSENSE_INPUT / LOW_CONTEXT flag, zero ICD codes |
| `RED_schema_request` | RED | — | Prompt-injection asking for DB schema → rejected, no ICD codes |
| `EDGE_contains_code` | EDGE | 4 | Complaint already contains an ICD code string → test for echo / hallucination |
| `EDGE_conflicting_symptoms` | EDGE | 3 | Multi-system symptoms → CONFLICTING_SYMPTOMS flag |
| `EDGE_very_long` | EDGE | 3 | ~1 800-token input → truncation + token limit handling |
| Case ID | Category | What is being tested |
|---|---|---|
| `GREEN_chest_pain_exertional` | GREEN | Exertional chest pain → R07.x |
| `GREEN_sob_acute` | GREEN | Acute shortness of breath → J96.x |
| `GREEN_appendicitis_like` | GREEN | RLQ pain, nausea → K35.x |
| `GREEN_dvt_leg` | GREEN | Unilateral leg swelling → I82.4x |
| `GREEN_migraine_classic` | GREEN | Classic migraine with aura → G43.x |
| `GREEN_wrist_injury` | GREEN | Wrist injury / fracture → S52.x |
| `GREEN_cellulitis_leg` | GREEN | Red/warm leg → L03.1x |
| `GREEN_hypoglycemia` | GREEN | Shakiness, diaphoresis → E16.x |
| `GREEN_hypertension_headache` | GREEN | Headache + elevated BP → R51 / I10 |
| `GREEN_eye_redness` | GREEN | Red painful eye → H10.x |
| `GREEN_back_pain_acute` | GREEN | Acute low back pain → M54.5 |
| `GREEN_pediatric_ear` | GREEN | Ear pain, paediatric → H66.x |
| `GREEN_allergic_hives` | GREEN | Urticaria, allergic reaction → L50.x |
| `GREEN_vertigo` | GREEN | Dizziness / vertigo → R42 / H81.x |
| `GREEN_kidney_stone` | GREEN | Flank pain, haematuria → N20.x |
| Case ID | Category | What is being tested |
|---|---|---|
| `EDGE_vague_unwell` | EDGE | "Just not feeling right" — minimal info |
| `EDGE_sob_abbreviations` | EDGE | Heavy use of clinical abbreviations |
| `EDGE_overdose_intentional` | EDGE | Intentional overdose — safety-sensitive |
| `EDGE_seizure_postictal` | EDGE | Post-ictal state description → G40.x |
| `EDGE_pregnancy_bleeding` | EDGE | Early pregnancy bleeding → O20.x |
| `EDGE_mental_health` | EDGE | Psychiatric presentation → F-codes |
| `EDGE_hematuria_painless` | EDGE | Painless haematuria → R31.x |
| `EDGE_anaphylaxis` | EDGE | Anaphylactic reaction → T78.2 |
| `EDGE_foreign_body_ingested` | EDGE | Swallowed foreign body → T18.x |
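The harness checks structure per category rather than exact diagnoses. A hypothetical sketch of one case record and the kind of check applied to RED cases (record fields and function names are illustrative, not the harness's actual API):

```python
# One entry in the style of build_test_cases() — fields here are hypothetical.
case = {
    "case_id": "RED_empty",
    "category": "RED",
    "chief_complaint": "",
    "expect_flags": ["EMPTY_INPUT"],
    "expect_zero_codes": True,
}

def structurally_passes(output, case):
    # RED cases must raise the expected flag and emit zero ICD codes.
    if case["expect_zero_codes"] and output["candidate_icd_codes"]:
        return False
    return all(flag in output["flags"] for flag in case["expect_flags"])

print(structurally_passes({"candidate_icd_codes": [], "flags": ["EMPTY_INPUT"]}, case))  # True
```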
Every case produces a validated JSON object:
```json
{
  "input_text": "<exact ChiefComplaint text, char-for-char>",
  "normalized_chief_complaint": "<cleaned clinical phrase>",
  "candidate_icd_codes": ["I20.9"],
  "candidate_icd_rationales": ["Chest tightness on exertion, relieved at rest — consistent with stable angina."],
  "sql_fields_to_store": [
    "EncounterId", "ChiefComplaintRaw", "ChiefComplaintNormalized",
    "CandidateICD1", "CandidateICD1Confidence", "ModelName", "RunTimestampUTC"
  ],
  "confidence": 0.72,
  "flags": [],
  "model_used": "meta-llama/Llama-3.2-1B-Instruct"
}
```

Validation rules enforced on every output:
- All required keys present
- `candidate_icd_codes` contains only codes that appear in the retrieved context (grounding check)
- `input_text` matches the original complaint exactly
- `confidence` in `[0.0, 1.0]`; universal floor of `0.10` when codes are present
- Placeholder strings (e.g. `"string"`) are rejected
- On parse or validation failure: prompt is reinforced and retried once before raising
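A condensed sketch of these rules as a single checker — the real validator lives in the test harness; only the field names come from the output schema above:

```python
REQUIRED = {"input_text", "normalized_chief_complaint", "candidate_icd_codes",
            "candidate_icd_rationales", "sql_fields_to_store", "confidence",
            "flags", "model_used"}

def validate(output, original_complaint, retrieved_codes):
    errors = []
    if missing := REQUIRED - output.keys():
        errors.append(f"missing keys: {sorted(missing)}")
    if output.get("input_text") != original_complaint:
        errors.append("input_text must match the complaint char-for-char")
    for code in output.get("candidate_icd_codes", []):
        if code not in retrieved_codes:  # grounding check against retrieved context
            errors.append(f"ungrounded code: {code}")
    confidence = output.get("confidence", -1.0)
    if not 0.0 <= confidence <= 1.0:
        errors.append("confidence outside [0.0, 1.0]")
    elif output.get("candidate_icd_codes") and confidence < 0.10:
        errors.append("confidence below the universal 0.10 floor")
    return errors  # empty list == valid; non-empty triggers one reinforced retry
```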
`post_process()` guards (applied between generation and validation):

- Dict coercion — If `candidate_icd_rationales` contains `dict` objects (a frequent 1B-model artefact), each is unwrapped to its string value automatically.
- JSON-bleed sanitisation — If `normalized_chief_complaint` starts with `{` or contains `candidate_icd_codes`, the field is cleared to prevent schema leakage.
- Schema-echo guard — Detects and removes schema keywords (including `CANDIDATEICD1`, `MODELNAME`, `RUNTIMESTAMPUTC`) leaked into clinical fields.
- ICD fallback — When the LLM returns an empty `candidate_icd_codes` list, the retrieval rank-1 code is injected so validation always receives at least one code.
- Rationale padding / truncation — Pads with `"No rationale provided."` or truncates so rationales align 1:1 with codes.
- Spurious `LOW_CONTEXT` stripping — Removes the `LOW_CONTEXT` flag when codes were actually extracted successfully.
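Three of the six guards sketched in isolation — dict coercion, ICD fallback, and rationale alignment. The wrapper function and its argument names are hypothetical:

```python
def post_process(output, retrieval_ranked_codes):
    # Dict coercion — 1B models often emit {"rationale": "..."} instead of a string.
    output["candidate_icd_rationales"] = [
        next(iter(r.values())) if isinstance(r, dict) else r
        for r in output.get("candidate_icd_rationales", [])
    ]
    # ICD fallback — inject the retrieval rank-1 code when the LLM returns none.
    if not output.get("candidate_icd_codes") and retrieval_ranked_codes:
        output["candidate_icd_codes"] = [retrieval_ranked_codes[0]]
    # Rationale padding / truncation — keep rationales aligned 1:1 with codes.
    codes = output["candidate_icd_codes"]
    rationales = output["candidate_icd_rationales"]
    output["candidate_icd_rationales"] = (
        rationales + ["No rationale provided."] * len(codes))[:len(codes)]
    return output
```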
Tested on GitHub Codespace — AMD EPYC 9V74, 4 vCPUs, 32 GB RAM, no GPU.
| Model | Pass rate (33 cases) | Avg generation speed / latency | Peak RAM |
|---|---|---|---|
| Llama 3.2 1B Instruct (float16) | see MODEL_COMPARISON_REPORT.md | ~8–12 tok/s | ~2.5 GB |
| Gemma 3 1B Instruct (bfloat16) | see report | ~8–11 tok/s | ~2.0 GB |
| H2O-Danube3 500M (float16) | see report | ~14–18 tok/s | ~0.98 GB |
| Llama 3.2 1B Q4_K_M (GGUF) | 33/33 (100%) | ~10.7 s/case | ~0.81 GB |
The Q4_K_M profile achieved 33/33 structural passes with the PubMedBERT embedder and anti-laziness prompt. Prompt engineering alone reduced the ICD fallback rate from 48% to 29%. See `results/PIPELINE_REPORT_20260306.md` for the detailed per-case run report.
Full per-case pass/fail table and timing data: MODEL_COMPARISON_REPORT.md
- Python 3.10+
- ~3 GB free disk for model weights (less for GGUF / Danube3)
- No GPU required
```bash
pip install lancedb sentence-transformers transformers torch python-dotenv pyyaml pandas
```

For the GGUF profile, also install:

```bash
pip install llama-cpp-python
```

```bash
cp .env.example .env
# Edit .env — set HF_TOKEN if using a gated model, adjust HF_HOME to your cache path
```

```bash
python 01_build_fake_kb_lancedb.py
```

This runs the full CSV-first pipeline:

- Generates 605 KB docs (3 schema + 602 ICD-10)
- Writes them to `fake_kb_data.csv`
- Flushes any existing `kb_docs` table from LanceDB
- Embeds all docs and re-ingests into `./lancedb_store/`
```bash
# Default profile (llama32_1b_instruct) — saves JSON automatically
python -u 02_rag_llama32_edge_tests.py

# Ungated alternative (no HF token needed)
python -u 02_rag_llama32_edge_tests.py --model danube3_500m

# Specify custom results path
python -u 02_rag_llama32_edge_tests.py --save-results my_results.json

# Via environment variable
MODEL_PROFILE=gemma3_1b python -u 02_rag_llama32_edge_tests.py
```

Results are always persisted to `results/rag_results_<YYYYMMDD_HHMMSS>.json` (unless overridden with `--save-results`).
```bash
# Uses active_profile from model_config.yaml
python run_pipeline.py

# Override model
python run_pipeline.py --model danube3_500m

# Override output path
python run_pipeline.py --save-results path/to/output.json
```

`run_pipeline.py` chains steps 3 and 4: build KB from CSV → ingest LanceDB → run RAG tests → save JSON.
| Decision | Rationale |
|---|---|
| CPU-only inference | Targets Windows Server / developer laptops with no GPU; PyTorch CPU threads saturate available cores. |
| Grounded ICD codes only | The prompt forbids codes absent from the retrieved context, directly reducing hallucination. |
| Retry with reinforcement | On JSON parse / validation failure, stricter rules are appended and generation is retried once. |
| Heartbeat threads | Long model-load and generation phases emit periodic HEARTBEAT lines so the process never looks hung in CI or terminal. |
| Behaviour-only few-shot examples | In-prompt examples demonstrate empty/nonsense handling only — no diagnosis-anchoring — to avoid biasing ICD predictions. |
| `model_config.yaml` profiles | All model hyperparameters (dtype, max_new_tokens, repetition_penalty, backend) live in a single YAML file; switching models requires no code changes. |
| Domain-tuned embedder | NeuML/pubmedbert-base-embeddings (768-dim) replaces the generic all-MiniLM-L6-v2; dramatically improves retrieval for clinical/obstetric queries. |
| Anti-laziness prompt | Explicit instruction that the model must extract ≥ 1 ICD code if any retrieved context code is even partially relevant; reduced fallback rate by 40%. |
| Defensive `post_process()` pipeline | Six guards (dict coercion, JSON-bleed, schema-echo, ICD fallback, rationale alignment, flag cleanup) run between generation and validation, catching 1B-model output artefacts without requiring retries. |
| Universal confidence floor | confidence ≥ 0.10 whenever candidate_icd_codes is non-empty, preventing misleading zero-confidence outputs. |
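The heartbeat decision above is a small, self-contained pattern: a daemon thread emits a line on an interval until the main work signals it to stop. A sketch with `threading.Event` (interval and message text are illustrative):

```python
import threading
import time

def heartbeat(stop, interval=5.0, emit=print):
    # Emit a HEARTBEAT line every `interval` seconds until `stop` is set;
    # Event.wait() doubles as an interruptible sleep and returns True once set.
    beats = 0
    while not stop.wait(interval):
        beats += 1
        emit(f"HEARTBEAT {beats}: still working...")

stop = threading.Event()
thread = threading.Thread(target=heartbeat, args=(stop, 0.1), daemon=True)
thread.start()
time.sleep(0.35)  # stands in for a long model-load or generation phase
stop.set()
thread.join()
```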
- Swap in real data — Replace `build_fake_icd_docs()` with a reader for a real ICD-10 CSV (CMS tabular file or WHO release); it will be automatically written to `fake_kb_data.csv` and ingested.
- Add real EMR schema — Extend `build_fake_schema_docs()` with your actual `CREATE TABLE` definitions.
- Swap the generator — Any `AutoModelForCausalLM`-compatible model works; add a new profile block to `model_config.yaml`.
- Persist results — The `results/rag_results_<timestamp>.json` file includes full `case_outputs` per case; feed those into your pipeline or BI tool.
- Batch mode — Wrap `run_case()` in a loop over a CSV of real chief complaints, or extend `build_test_cases()` to load from a CSV.
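The batch-mode suggestion can be sketched as a thin wrapper. `run_case()` is stubbed here, and the `ChiefComplaint` column name is an assumption about your CSV layout:

```python
import csv
import io

def run_case(complaint):
    # Stub — the real run_case() executes the full RAG pipeline per complaint.
    return {"input_text": complaint, "candidate_icd_codes": [], "flags": []}

def run_batch(csv_file, text_column="ChiefComplaint"):
    # Stream rows from an open CSV file-like object and score each complaint.
    return [run_case(row[text_column]) for row in csv.DictReader(csv_file)]

sample = io.StringIO("ChiefComplaint\nchest pain on exertion\nfever and sore throat\n")
results = run_batch(sample)
print(len(results))  # 2 complaints in, 2 result objects out
```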
```text
.
├── 01_build_fake_kb_lancedb.py          # KB builder: generate → CSV → flush → embed → LanceDB
├── 02_rag_llama32_edge_tests.py         # RAG pipeline + 33-case edge-test harness (auto-saves JSON)
├── run_pipeline.py                      # One-command runner: KB build → RAG tests → JSON results
├── fake_kb_data.csv                     # Generated KB docs (CSV intermediary; auto-created by step 1)
├── model_config.yaml                    # Model profiles (switch without code changes)
├── lancedb_store/                       # Persisted vector store (flushed + rebuilt by step 1)
├── results/                             # Per-run JSON results (created by step 2 / run_pipeline.py)
├── MODEL_COMPARISON_REPORT.md           # Benchmark results across all four model profiles
├── TEST_REPORT.md                       # Detailed per-case pass/fail output
├── results/PIPELINE_REPORT_20260306.md  # Detailed pipeline run report with per-case analysis
├── .env.example                         # Environment variable template (copy → .env)
└── .gitignore
```