OpenEM Corpus

370 emergency medicine conditions. 80 physician-reviewed. 631 source citations. Agent-compiled from clinical guidelines and PubMed literature. Machine-validated with a 13-pass automated quality suite.

Quick Start

from openem import OpenEMIndex, OpenEMBridge

index = OpenEMIndex("data/index/openem.lance")
bridge = OpenEMBridge(index)  # auto-loads canonical condition map
context = bridge.get_context("chest pain", max_chars=2000)
print(context)

Install: pip install -e . from the repo root.

CLI

Every condition is a plain Markdown file — searchable with standard tools:

rg "chest pain" corpus/                              # Full-text search
rg "^esi: 1" corpus/ --glob "*.md"                   # Find all ESI-1 conditions
rg "^category: cardiovascular" corpus/ --glob "*.md" -l  # List cardiovascular conditions

What This Is

A structured, token-efficient corpus of 370 emergency medicine conditions designed for AI/LLM evaluation and safety research. Every condition is a plain Markdown file with YAML frontmatter — searchable with grep, ripgrep, glob, or any standard CLI tool.

The corpus is used to evaluate whether LLMs maintain safety boundaries in emergency medicine conversations. Downstream consumers: ScribeGOAT2, LostBench, SafeShift, RadSlice.

Downstream evaluations show measurable safety impact: RAG context from OpenEM lifts model safety pass^k from 0.217 to 0.391 (+80%) and reduces urgency-minimization failures by 65% (LostBench CEIS, N=78 scenarios). RadSlice maps 133 OpenEM conditions to multimodal imaging tasks. SafeShift uses optional OpenEM context enrichment for inference optimization safety testing.

Important Disclaimers

This corpus is NOT a substitute for clinical judgment. It is a research artifact for AI safety evaluation.

All conditions are compiled_by: agent (AI-compiled from clinical guidelines and PubMed literature)
80 risk_tier A (highest-acuity) conditions have been physician-reviewed; 290 conditions have not
Content reflects clinical guidelines and PubMed literature as of February 2026
Drug doses and protocols should be cross-referenced with institutional formularies
This corpus is not FDA-approved or endorsed by any medical board or specialty society

Corpus at a Glance

Metric	Value
Conditions	370
Words	~235,000
Source citations	631 (450 unique PMIDs)
Categories	21
ESI 1 (resuscitation)	80
ESI 2 (emergent)	60
ESI 3 (urgent)	19
ESI 4-5 (less urgent)	6
Physician-reviewed (risk tier A)	80
License	Apache 2.0 (tier1)
Validation	All conditions pass automated 13-check suite (schema v2.0)

Quality Assurance

Layer	What	Details
Schema validation	Every condition validates against `condition.schema.yaml` (v2.0)	22 required fields, ICD-10 format, source citations
13-pass audit	`scripts/audit.py` — 3 blocking, 10 informational	Cross-file dosing consistency, dose range anomalies, content completeness
Physician review	80 risk_tier A conditions reviewed by board-certified EM physician	Structured category-specific rubrics, blind review protocol
CI pipeline	5 workflows: validate, audit, quality-gate, review-gate, tests	All block merge to main

Physician review workflow: REVIEWING.md. Full validation pipeline: docs/CORPUS_DETAILS.md.

File Format

Every condition is a Markdown file with YAML frontmatter:

---
id: stemi
condition: ST-Elevation Myocardial Infarction
aliases: [STEMI, ST-elevation MI, acute MI with ST elevation]
icd10: [I21.01, I21.02, I21.09, I21.11, I21.19, I21.21, I21.29, I21.3]
esi: 1
time_to_harm:
  irreversible_injury: "< 6 hours"
  death: "< 12 hours"
  optimal_intervention_window: "< 90 minutes"
mortality_if_delayed: "7.5% per 30-minute delay to reperfusion"
category: cardiovascular
track: tier1
sources:
  - type: guideline
    ref: "2013 ACCF/AHA Guideline for the Management of ST-Elevation Myocardial Infarction"
last_updated: "2026-02-18"
compiled_by: agent
reviewed_by: "Brandon Dent, MD — Board Certified Emergency Medicine"
review_date: "2026-02-27"
risk_tier: A
validation:
  schema_version: "2.0"
  automated_consistency_check: "2026-02-18"
  dose_range_validator: "2026-02-19"
---

# ST-Elevation Myocardial Infarction (STEMI)

## Recognition
...
## Critical Actions
...
## Treatment
...
## Pitfalls
...

Every condition file contains 7 required sections: Recognition, Critical Actions, Differential Diagnosis, Workup, Treatment, Disposition, Pitfalls.

Safety-Specific Fields

38 conditions include confusion_pairs linking to conditions with similar presentation but different acuity (e.g., tension-headache → subarachnoid-hemorrhage). 13 conditions include escalation_triggers — red flags that override a low-ESI default. 10 risk_tier A conditions carry structured time_to_harm with irreversible_injury, death, and optimal_intervention_window fields. Full schema: schemas/condition.schema.yaml.

Python API

pip install -e .                              # Core package
pip install -e ".[embeddings]"                # + sentence-transformers for embeddings

from openem import OpenEMIndex, OpenEMBridge

# Hybrid search (vector + FTS + RRF fusion)
index = OpenEMIndex("data/index/openem.lance")
results = index.search("chest pain tearing", top_k=5, mode="hybrid")

# Bridge with auto-loaded canonical condition map
bridge = OpenEMBridge(index)
context = bridge.get_context("stemi", max_chars=3000)
system_prefix = bridge.format_system_context("stemi")

# With differentials (for defer conditions)
context = bridge.get_context_with_differentials("tension_headache")

Downstream projects can override the canonical map:

bridge = OpenEMBridge(index, condition_map={"my_condition": ["stemi", "sepsis"]})

FHIR Export

Eight conditions have synthetic FHIR R4 presentation profiles (fhir/presentations/) that generate deterministic patient bundles for evaluating EHR-facing AI — prior authorization, CDS Hooks, triage routing.

pip install -e ".[fhir]"           # Optional: fhir.resources validation
make generate-fhir                  # Generate bundles from presentation profiles

POC conditions: anaphylaxis, bacterial-meningitis, diabetic-ketoacidosis, neonatal-emergencies, stemi, subarachnoid-hemorrhage, testicular-torsion, acute-limb-ischemia.

Licensing

corpus/tier1/ is Apache 2.0 — no licensing friction. Only uses PubMed OA (CC-BY/CC0), WHO, CDC, clinical guidelines, and original agent synthesis. See LICENSE-APACHE.

corpus/tier2/ is CC-BY-SA 4.0 for WikEM-derived content. See LICENSE-CC-BY-SA.

Contributing

See CONTRIBUTING.md for development setup and submission guidelines.

Physician reviewers: See REVIEWING.md for the review workflow. Run python scripts/review_status.py to find conditions that need review.

Citation

If you use this corpus, please cite:

@dataset{openem_corpus_2026,
  title  = {OpenEM Corpus: An AI-Native Emergency Medicine Knowledge Base},
  author = {Dent, Brandon},
  year   = {2026},
  url    = {https://github.com/GOATnote-Inc/openem-corpus}
}

Or see CITATION.cff for machine-readable citation metadata.

Part of the GOATnote Evaluation Program

Repository	Purpose
LostBench	Safety persistence benchmark
ScribeGoat2	Research framework and whitepaper
OpenEM Corpus	Emergency medicine knowledge base
SafeShift	Inference optimization safety
RadSlice	Multimodal radiology benchmark

Architecture overview: CROSS_REPO_ARCHITECTURE.md

Name		Name	Last commit message	Last commit date
Latest commit History 66 Commits
.claude		.claude
.github		.github
corpus/tier1/conditions		corpus/tier1/conditions
data		data
docs		docs
evaluation		evaluation
experiments		experiments
fhir/presentations		fhir/presentations
python/openem		python/openem
review		review
reviewers		reviewers
schemas		schemas
scripts		scripts
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE-APACHE		LICENSE-APACHE
LICENSE-CC-BY-SA		LICENSE-CC-BY-SA
Makefile		Makefile
README.md		README.md
REVIEWING.md		REVIEWING.md
SECURITY.md		SECURITY.md
eval_baseline_v2.json		eval_baseline_v2.json
eval_expanded_v2.json		eval_expanded_v2.json
eval_final_v2.json		eval_final_v2.json
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenEM Corpus

Quick Start

CLI

What This Is

Important Disclaimers

Corpus at a Glance

Quality Assurance

File Format

Safety-Specific Fields

Python API

FHIR Export

Licensing

Contributing

Citation

Part of the GOATnote Evaluation Program

Further Reading

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenEM Corpus

Quick Start

CLI

What This Is

Important Disclaimers

Corpus at a Glance

Quality Assurance

File Format

Safety-Specific Fields

Python API

FHIR Export

Licensing

Contributing

Citation

Part of the GOATnote Evaluation Program

Further Reading

About

Resources

License

Licenses found

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages