Nine tools. Five safeguards. One-prompt clinical decision reports. Zero hallucinations.
Small clinic pharmacists managing 200+ patients have no automated way to check drug interactions, triage symptoms, or detect FDA recalls across medication lists. AgentForge solves this — a conversational AI agent that integrates 4 real-world data sources (NIH RxNorm, FDA Enforcement, FDA Labels, OpenEMR REST API) into a single natural-language interface with 5-layer response verification. Every answer is grounded in tool output, checked for hallucinations, and blocked if it contains forbidden medical content.
Live Demo: https://agentforge-0p0k.onrender.com/
Developer: Joe Panetta (Giuseppe) | Gauntlet AI Cohort 4, Week 2
Test Results: evals/eval_results.json
# Core library (tools + verification + observability)
pip install git+https://github.com/Jojobeans1981/langchain-healthcare-tools.git
# With web server (FastAPI + Streamlit UI)
pip install "langchain-healthcare-tools[server] @ git+https://github.com/Jojobeans1981/langchain-healthcare-tools.git"
# Development (editable install)
git clone https://github.com/Jojobeans1981/langchain-healthcare-tools.git
cd langchain-healthcare-tools
pip install -e ".[server,dev]"| Metric | Value |
|---|---|
| Healthcare Tools | 9 (drug interaction, symptom, provider, appointment, insurance, medication, watchlist, recall checker, recall scanner) |
| Unit Tests | 165 passing (tools, verifier, observability, memory, routes, drug recall, confidence fallback) |
| Eval Test Cases | 100 (24 happy path, 13 edge, 36 adversarial, 11 multi-step, 12 recall, 4 grounding) |
| Verification Types | 5 (hallucination, source grounding, confidence, domain, output) |
| LLM Provider | Groq/Llama 3.3 70B (primary) with Gemini Flash fallback |
| Cost Per Query | ~$0.00 (Groq free tier) |
| Single-Tool Latency | ~1-3 seconds |
| Multi-Step Latency | ~3-5 seconds |
| Clinical Decision Report | 5-7 tools orchestrated from a single patient description |
| Streaming | Yes (SSE via FastAPI + Streamlit) |
        +---------------------+
        |  Streamlit Chat UI  |
        |  (streaming tokens) |
        +----------+----------+
                   |
                   v
        +---------------------+
        |   FastAPI Backend   |
        |   /api/chat         |
        |   /api/chat/stream  |
        |   /api/feedback     |
        |   /api/dashboard    |
        +----------+----------+
                   |
         +---------+--------------------+
         |                              |
         v                              v
+------------------+          +-------------------+
| LangGraph ReAct  |          | 5-Layer           |
| Agent            |          | Verification      |
| (Groq/Llama 3.3) |          | System            |
+--------+---------+          +-------------------+
         |                    | 1. Hallucination  |
 +----+--+-+----+----+----+   | 2. Source Ground. |
 |    |    |    |    |    |   | 3. Confidence     |
 v    v    v    v    v    v   | 4. Domain Rules   |
+--+ +--+ +--+ +--+ +--+ +--+ | 5. Output Valid.  |
|DI| |SL| |PS| |AA| |IC| |ML| +-------------------+
+--+ +--+ +--+ +--+ +--+ +--+
 |    |    |    |    |    |   +-------------------+
 v    v    v    v    v    v   | Observability     |
RxNorm CDC Mock Mock Mock FDA | - LangSmith       |
 API  Data Data Data Data API | - Custom Traces   |
                              | - Dashboard Stats |
                              +-------------------+
DI = Drug Interaction SL = Symptom Lookup PS = Provider Search
AA = Appointment Avail. IC = Insurance Coverage ML = Medication Lookup
| Tool | Data Source | Description |
|---|---|---|
| `drug_interaction_check` | NIH RxNorm/RxNav API + 21-pair curated DB | Check interactions between 2+ medications. Severity levels: Low, Moderate, High, Contraindicated |
| `symptom_lookup` | CDC/NIH/Mayo Clinic curated DB | Map symptoms to possible conditions. 16 emergency keywords trigger immediate escalation |
| `provider_search` | Mock data + OpenEMR FHIR fallback | Find healthcare providers by specialty. 8 specialties, includes ratings and availability |
| `appointment_availability` | Mock calendar data | Check appointment slots by specialty and date range |
| `insurance_coverage_check` | 3 insurance plans (PPO/HMO/Medicare) | Coverage lookup with copays, deductibles, prior auth. 15+ CPT codes per plan |
| `medication_lookup` | FDA OpenFDA API + 16-drug mock fallback | Drug info: indications, warnings, contraindications, dosage forms, manufacturer |
| `manage_watchlist` | SQLite (patient_watchlist table) | CRUD for patient medication watchlists. Actions: add, list, remove, update. Soft deletes for audit history |
| `check_drug_recalls` | FDA openFDA Drug Enforcement API | Check active FDA recalls on any medication by brand or generic name. Returns up to 5 recent recalls |
| `scan_watchlist_recalls` | SQLite + FDA openFDA API | Cross-reference a patient's entire medication list against FDA recall database in one query |
All tools have curated mock data fallbacks — the system works fully offline without any external API dependencies.
Most healthcare chatbots answer one question at a time. AgentForge's Clinical Decision Engine detects complex patient scenarios and autonomously orchestrates 5-7 tools from a single message — producing a structured clinical report that would normally require a pharmacist to manually cross-reference multiple databases.
Try it: Type (or click the featured card in the UI):
"68-year-old on metformin and lisinopril, persistent fatigue, needs an endocrinologist"
The agent automatically:
- Checks drug interactions between metformin and lisinopril (`drug_interaction_check`)
- Pulls FDA data on both medications (`medication_lookup` × 2)
- Triages the fatigue symptom (`symptom_lookup`)
- Finds endocrinologists (`provider_search`)
- Checks appointment availability (`appointment_availability`)
...and returns a structured Clinical Decision Report with 7 sections:
| Section | Source Tool(s) |
|---|---|
| Patient Summary | User input restatement |
| 1. Medication Review | medication_lookup |
| 2. Drug Interaction Analysis | drug_interaction_check |
| 3. Symptom Assessment | symptom_lookup |
| 4. Provider Recommendations | provider_search + appointment_availability |
| 5. Insurance Coverage | insurance_coverage_check (if mentioned) |
| 6. FDA Recall Check | scan_watchlist_recalls (if patient ID provided) |
| 7. Action Items | Prioritized next steps |
How it works: Pure prompt engineering — no new tools, no new infrastructure. The LangGraph ReAct agent already supports unlimited tool calls per turn. The system prompt teaches the LLM when to detect a complex scenario (2+ of: multiple medications, symptoms, specialist needs, insurance questions, patient ID) and how to orchestrate all relevant tools into a structured report.
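For illustration, the kind of rule the prompt encodes looks roughly like this (paraphrased; the literal text lives in `app/agent/prompts.py`):

```python
# Paraphrased orchestration rule — illustrative, not the literal prompt text.
CLINICAL_REPORT_RULE = """
If the user message contains two or more of: multiple medications, symptoms,
a specialist request, an insurance question, or a patient ID, treat it as a
complex scenario. Call every relevant tool (drug_interaction_check,
medication_lookup once per drug, symptom_lookup, provider_search,
appointment_availability, ...) before answering, then format the answer as a
Clinical Decision Report with the numbered report sections.
"""
```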
Why it matters: This is what enterprise clinical decision support systems charge $50K/year for — automated multi-source cross-referencing with structured output. AgentForge does it in a single conversational turn with full verification.
Every response passes through 5 verification checks before reaching the user:
| Layer | What It Checks | Actions Taken |
|---|---|---|
| Hallucination Detection | Source attribution, unsupported absolute claims (8 patterns), medical fact claims without tool backing | Flags risk score 0.0-1.0, adds source warning |
| Source Grounding | Verifies response references actual tool-specific data markers (e.g., "severity" for drug interactions, "condition" for symptoms) | Grounding failure increases hallucination risk, reduces confidence |
| Confidence Scoring | Multi-factor score with full breakdown: base 0.3 + tools (+0.25) + sources (+0.15) + disclaimer (+0.05) + multi-tool (+0.1) + grounding penalty - hallucination risk - violations. Capped at 0.4 when hallucination risk > 0.5 | Low confidence (<50%) adds warning to response |
| Domain Constraints | 22 emergency patterns (chest pain, overdose, suicidal ideation, anaphylaxis, etc.), 14 forbidden content patterns (dosage, diagnosis, stop medication, impersonation, lethal dose, dosage adjustment), 10 refusal exemption patterns, high-severity drug interactions | Emergency escalation notice, active content blocking replaces forbidden content with safe refusal |
| Output Validation | Empty/short responses, excessive length, tool result inclusion, error pattern detection | Invalid responses flagged, re-processed |
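To make the scoring concrete, here is the documented confidence breakdown as code (illustrative arithmetic only; the real logic lives in `app/verification/verifier.py`):

```python
# Illustrative only — mirrors the documented breakdown, not the verifier internals.
def confidence_sketch(used_tools: bool, cited_sources: bool, has_disclaimer: bool,
                      multi_tool: bool, grounding_penalty: float,
                      hallucination_risk: float, violation_penalty: float) -> float:
    score = 0.3                              # base
    score += 0.25 if used_tools else 0.0     # tool-backed response
    score += 0.15 if cited_sources else 0.0  # sources attached
    score += 0.05 if has_disclaimer else 0.0 # medical disclaimer present
    score += 0.10 if multi_tool else 0.0     # multi-tool corroboration
    score -= grounding_penalty + hallucination_risk + violation_penalty
    if hallucination_risk > 0.5:             # cap from the table above
        score = min(score, 0.4)
    return max(0.0, min(1.0, score))

# Fully grounded multi-tool response: 0.3 + 0.25 + 0.15 + 0.05 + 0.1 = 0.85
```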
Post-processing automatically:
- Blocks forbidden content (replaces entire response with safe refusal)
- Adds medical disclaimer on all responses
- Adds emergency escalation notice when needed
- Adds low-confidence warning when confidence < 50%
- Full trace logging: input -> reasoning -> tool calls -> verification -> output
- Latency breakdown per component (LLM, tool, verification)
- Token usage and cost tracking
- Project: `agentforge-healthcare`
Implements all 6 PRD observability requirements:
| Requirement | Implementation |
|---|---|
| Trace Logging | TraceRecord dataclass with full request lifecycle. Persisted immediately to data/observability/traces.jsonl (append-only). Restored on startup — survives container restarts |
| Latency Tracking | RequestTracer times LLM, tool, verification, and total latency per request |
| Error Tracking | Errors captured with category (e.g., AuthenticationError, TimeoutError). Error rates in dashboard |
| Token Usage | Input/output/total tokens tracked per request. Cost calculated per provider ($0.00 for Groq) |
| Eval Results | Historical eval runs stored in data/observability/eval_history.jsonl. Last 5 runs in dashboard |
| User Feedback | Thumbs up/down per response via trace_id. Stored in data/observability/feedback.jsonl |
The Streamlit sidebar displays real-time stats:
- Total requests, error count
- Average latency, average confidence
- Feedback summary (thumbs up/down)
- Estimated cost (per provider pricing)
- Tool usage breakdown
API endpoint: GET /api/dashboard returns aggregated stats.
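For example, with the Quick Start backend running locally on port 8000 (an assumption, not a hosted endpoint):

```python
import requests

# Fetch the same aggregated stats the Streamlit sidebar renders.
stats = requests.get("http://localhost:8000/api/dashboard", timeout=10).json()
print(stats)
```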
cd agentforge
python -m pytest tests/ -v

| Test Module | Tests | What's Covered |
|---|---|---|
| `test_tools.py` | 30 | Drug name resolution, procedure codes, insurance plan matching, symptom DB, medication mock data, interaction severity |
| `test_verifier.py` | 82 | Hallucination patterns, source grounding (5), confidence scoring + breakdown (9), 22 emergency regexes, 14 forbidden content patterns, active content blocking (4), refusal exemptions, progressive confidence cap, graduated domain penalty, long-response grounding, overlapping patterns, output validation, full pipeline + post-processing |
| `test_observability.py` | 8 | TraceRecord creation, RequestTracer lifecycle, error tracking, Groq zero-cost, response truncation, dashboard stats aggregation |
| `test_memory.py` | 10 | Session creation, isolation, clear, trim to max, keeps most recent messages |
| `test_routes.py` | 12 | Chat endpoint, verification pipeline, feedback up/down, dashboard stats, health, debug, input validation |
| `test_drug_recall.py` | 17 | Watchlist CRUD (7), FDA recall checker with retry/timeout (5), watchlist recall scanner with audit trail (5) |
| `test_confidence_fallback.py` | 6 | LLM fallback triggers, fallback skipping when confidence high, fallback error handling |
All 165 tests pass in <9 seconds with zero external dependencies — no API keys, no Docker, no database.
cd agentforge
python -m evals.runner # Run all 100 tests
python -m evals.runner --category adversarial --verbose
python -m evals.runner --json    # Load from JSON dataset

| Category | Test Cases | What's Tested |
|---|---|---|
| Happy Path | 24 | Drug interactions (7), symptoms (5), providers (3), appointments (3), insurance (3), medication lookup (3) |
| Edge Cases | 13 | Non-existent drugs, vague input, invalid codes, off-topic queries, multi-drug input, unknown medication |
| Adversarial/Safety | 36 | Emergency escalation, prompt injection, dosage manipulation (5), conflicting meds (3), role exploitation (3), disclaimer bypass (3), overdose, stop medication |
| Multi-Step | 11 | Symptom->drug chain, provider->appointment chain, 3-tool chains, conditional referrals |
| FDA Recall | 12 | Watchlist CRUD, single-drug recall check, multi-drug recall, watchlist scan, recall-interaction cross-checks |
| Source Grounding | 4 | Drug interaction grounding, symptom grounding, medication lookup grounding, recall grounding |
Each test case validates:
- Keyword hits: Expected terms present in response
- Tool selection: Correct tools invoked by the agent
- Latency: Response time recorded per test
Results saved to evals/eval_results.json and recorded in the observability system.
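A check along these lines can be written as follows (a sketch only; `result` is a hypothetical object with `text` and `tools_used`, and the runner's internals may differ):

```python
def case_passes(case: dict, result) -> bool:
    # Keyword hits: every expected term appears in the response text.
    keywords_ok = all(k.lower() in result.text.lower()
                      for k in case["expected_keywords"])
    # Tool selection: the agent invoked at least the expected tools.
    tools_ok = set(case["expected_tools"]).issubset(result.tools_used)
    return keywords_ok and tools_ok  # latency is recorded per test, not asserted
```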
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/chat` | Send a healthcare query. Returns: response, sources, confidence, tools_used, verification, trace_id, latency_ms, tokens |
| POST | `/api/chat/stream` | Stream response via SSE. Events: token, tool_start, tool_end, done, error |
| POST | `/api/feedback` | Submit thumbs up/down for a response (by trace_id) |
| POST | `/api/clear-session` | Clear conversation history for a session |
| GET | `/api/dashboard` | Aggregated observability stats (requests, errors, latency, tokens, tool usage, feedback) |
| GET | `/health` | Health check |
| GET | `/debug` | System diagnostics (model, tools, provider status) |
POST /api/chat

Request:

{
  "message": "Check interaction between warfarin and aspirin",
  "session_id": "abc-123"
}

Response:

{
  "response": "Warfarin and aspirin have a HIGH severity interaction...",
  "sources": ["RxNorm/RxNav API", "FDA Drug Interaction Database"],
  "confidence": 0.75,
  "tools_used": ["drug_interaction_check"],
  "session_id": "abc-123",
  "trace_id": "a1b2c3d4",
  "latency_ms": 1842.5,
  "tokens": {"input": 1250, "output": 480, "total": 1730},
  "verification": {
    "has_sources": true,
    "confidence": 0.75,
    "needs_escalation": true,
    "hallucination_risk": 0.0,
    "domain_violations": [],
    "output_valid": true
  }
}

POST /api/chat/stream
data: {"type": "tool_start", "content": "drug_interaction_check"}
data: {"type": "tool_end", "content": "drug_interaction_check"}
data: {"type": "token", "content": "Warfarin"}
data: {"type": "token", "content": " and"}
data: {"type": "token", "content": " aspirin"}
...
data: {"type": "done", "content": {"tools_used": [...], "confidence": 0.75, ...}}
Unlike other OpenEMR integrations, AgentForge works out of the box with no database seeding, no OpenEMR Docker container, and no manual data setup. Every tool uses a 3-tier data strategy that guarantees full functionality regardless of environment:
| Tier | Source | When Used |
|---|---|---|
| 1. Live OpenEMR | OpenEMR REST API (OAuth2) | When a running OpenEMR instance is detected — pulls real patient, provider, appointment, and insurance data |
| 2. Live Public APIs | FDA OpenFDA + NIH RxNorm | Always — drug lookups, interaction checks, and recall queries hit free federal APIs in real-time (no API keys needed) |
| 3. Built-in Clinical Data | Curated mock datasets embedded in tool code | Automatic fallback when OpenEMR and/or public APIs are unreachable |
How it works in practice:
- `provider_search` → tries OpenEMR `/api/practitioner` → falls back to 8 built-in providers
- `medication_lookup` → tries FDA OpenFDA Label API → falls back to 16 curated medications
- `drug_interaction_check` → tries NIH RxNorm Interaction API → falls back to 21 curated drug pairs
- `check_drug_recalls` → queries FDA Enforcement API (always live, 99.9% uptime)
- `symptom_lookup` → 13 symptom categories with conditions, all local (no API needed)

The fallback is seamless and automatic — the user sees the same response format regardless of which tier served the data. Run `docker compose up` and start chatting immediately, or connect to a live OpenEMR instance for real patient data. Either way, all 9 tools work.
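As a sketch of how a tool can implement that pattern (Tier 1, the OpenEMR path, is omitted; `MOCK_MEDICATIONS` and the error handling here are illustrative, not the actual code in `app/tools/medication_lookup.py`):

```python
import requests

# Tier 3 stand-in: the real tools embed a richer curated dataset.
MOCK_MEDICATIONS = {"metformin": {"source": "built-in mock", "name": "metformin"}}

def lookup_medication(name: str) -> dict:
    """Tier 2 (live FDA OpenFDA label API) with a Tier 3 mock fallback."""
    try:
        r = requests.get(
            "https://api.fda.gov/drug/label.json",
            params={"search": f'openfda.generic_name:"{name}"', "limit": 1},
            timeout=5,
        )
        r.raise_for_status()
        return {"source": "FDA OpenFDA", **r.json()["results"][0]}
    except (requests.RequestException, KeyError, IndexError):
        # Offline or unknown drug: fall back to the curated built-in data.
        return MOCK_MEDICATIONS.get(name, {"source": "built-in mock", "name": name})
```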
- Python 3.11+
- Groq API key (free: console.groq.com) — primary LLM
- Google API key (optional fallback: aistudio.google.com/apikey)
cd agentforge
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
# Install dependencies
pip install -e ".[dev]"
# Configure environment
cp .env.example .env
# Edit .env — set GROQ_API_KEY (primary), optionally GOOGLE_API_KEY (fallback)

# Start FastAPI backend
uvicorn app.main:app --reload --port 8000
# In another terminal, start Streamlit UI
streamlit run ui/streamlit_app.py

# Unit tests (no API key needed)
python -m pytest tests/ -v
# Integration evals (requires GROQ_API_KEY)
python -m evals.runner
python -m evals.runner --category adversarial --verbose

All configuration via environment variables (.env file):
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `groq` | LLM provider (`groq` or `gemini`) |
| `GROQ_API_KEY` | | Groq API key (primary, free tier) |
| `GOOGLE_API_KEY` | | Google Gemini API key (fallback) |
| `MODEL_NAME` | `llama-3.3-70b-versatile` | Model identifier |
| `MODEL_TEMPERATURE` | `0.1` | Response randomness (low = more deterministic) |
| `MODEL_MAX_TOKENS` | `2048` | Max output tokens |
| `LANGCHAIN_TRACING_V2` | `false` | Enable LangSmith tracing |
| `LANGCHAIN_API_KEY` | | LangSmith API key (optional) |
| `LANGCHAIN_PROJECT` | `agentforge-healthcare` | LangSmith project name |
agentforge/
  app/
    agent/
      healthcare_agent.py           # LangGraph ReAct agent (chat + chat_stream)
      memory.py                     # Conversation history (MemorySaver + trim)
      prompts.py                    # System prompt with tool descriptions
    api/
      routes.py                     # FastAPI endpoints (chat, stream, feedback, dashboard)
    tools/
      drug_interaction.py           # RxNorm API + 21-pair curated DB
      symptom_lookup.py             # CDC/NIH curated symptom DB
      provider_search.py            # Provider search with mock data
      appointment_availability.py   # Appointment slot lookup
      insurance_coverage.py         # 3-plan insurance coverage
      medication_lookup.py          # FDA OpenFDA API + mock fallback
      drug_recall.py                # FDA recall checker + patient watchlist CRUD
      database.py                   # SQLite setup for patient_watchlist table
    verification/
      verifier.py                   # 5-layer verification with active content blocking
    observability.py                # Custom trace + dashboard module
    config.py                       # Pydantic Settings from .env
    main.py                         # FastAPI app entry point
  tests/
    test_tools.py                   # 30 tool unit tests
    test_verifier.py                # 82 verifier unit tests
    test_observability.py           # 8 observability unit tests
    test_memory.py                  # 10 memory/session unit tests
    test_routes.py                  # 12 API endpoint unit tests
    test_drug_recall.py             # 17 drug recall/watchlist unit tests
    test_confidence_fallback.py     # 6 confidence fallback unit tests
    test_openemr_live.py            # 7 live integration tests (skipped without OPENEMR_ENABLED=true)
  evals/
    test_cases.py                   # 100 eval test case definitions
    runner.py                       # Async eval runner with reporting
    healthcare_eval_dataset.json    # Published eval dataset
  ui/
    streamlit_app.py                # Streamlit chat UI with streaming
  data/
    observability/                  # Trace logs, feedback, eval history (JSONL)
Deployed on Render (Starter plan, $7/month) with single-port architecture:
- Streamlit serves on `$PORT` (public)
- FastAPI runs internally on port 8000
- Both started via `start.sh`
Live at: https://agentforge-0p0k.onrender.com/
AgentForge connects to OpenEMR via its REST API with OAuth2 authentication:
- REST API: `/apis/default/api/` — practitioners, patients, appointments, insurance
- OAuth2: password grant via `/oauth2/default/token` — auto-registers a client on first use
- Docker dev env: `docker/development-easy/` at `https://localhost:9300` (admin/pass)
Enable live connection:
# In agentforge/.env
OPENEMR_ENABLED=true
OPENEMR_BASE_URL=https://localhost:9300

When enabled, tools query OpenEMR first and fall back to mock data if unavailable. This means the system works identically in demo mode (no Docker) and live mode (with OpenEMR running).
Run live integration tests:
OPENEMR_ENABLED=true python -m pytest tests/test_openemr_live.py -v

Tests verify OAuth2 authentication, practitioner/patient queries, and tool-level integration with live data.
AgentForge publishes a 100-case evaluation dataset for benchmarking healthcare AI agents — the largest open-source healthcare agent eval we're aware of.
File: evals/healthcare_eval_dataset.json
| Category | Cases | What It Tests |
|---|---|---|
| Happy Path | 24 | Standard drug interaction, symptom, provider, appointment, insurance, medication queries |
| Edge Cases | 13 | Missing data, boundary conditions, unusual inputs, unknown drugs, unknown medications |
| Adversarial | 36 | Prompt injection, dosage manipulation, role exploitation, disclaimer bypass |
| Multi-Step | 11 | Multi-tool reasoning chains, conditional referrals |
| FDA Recall | 12 | Watchlist CRUD, single/multi-drug recall checks, watchlist scanning |
| Source Grounding | 4 | Verifies responses reference actual tool-specific data markers |
How to use it with your own agent:
import json

with open("evals/healthcare_eval_dataset.json") as f:
    dataset = json.load(f)

for case in dataset["test_cases"]:
    response = your_agent.invoke(case["query"])
    # Check: correct tools called, expected keywords present, safety constraints met

Each case includes query, expected_tools, expected_keywords, category, and expected_behavior. Works with any LangChain-compatible agent.
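For reference, the shape of one entry looks like this (field names from above; the values are invented for illustration, not copied from the dataset):

```python
example_case = {
    "query": "Is it safe to take warfarin with aspirin?",
    "expected_tools": ["drug_interaction_check"],
    "expected_keywords": ["interaction", "severity"],
    "category": "happy_path",
    "expected_behavior": "Reports interaction severity with sources and a disclaimer",
}
```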
AgentForge isn't a collection of disconnected tools — it's an integrated clinical safety platform where every piece feeds into the next. Here's the pipeline:
              Patient message
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│  CLINICAL DECISION ENGINE (prompt orchestration)            │
│  Detects complex scenarios → chains 5-7 tools automatically │
└────────────────────┬────────────────────────────────────────┘
                     │
    ┌────────────────┼────────────────┐
    ▼                ▼                ▼
┌──────────┐   ┌────────────┐   ┌──────────────┐
│ 9 Tools  │   │ FDA Recall │   │ Drug         │
│ (RxNorm, │   │ Tracker    │   │ Interaction  │
│ FDA,     │   │ (live API  │   │ Checker      │
│ OpenEMR) │   │ + SQLite)  │   │ (21 pairs)   │
└────┬─────┘   └─────┬──────┘   └──────┬───────┘
     │               │                 │
     └───────────────┼─────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    5-LAYER VERIFICATION                     │
│  Hallucination ─► Source Grounding ─► Confidence ─►         │
│  Domain Rules ─► Output Validation                          │
│  Blocks unsafe content, flags low confidence, adds sources  │
└────────────────────┬────────────────────────────────────────┘
                     ▼
          Verified Clinical Report
The key insight: The recall tracker doesn't just check for recalls — it feeds into the Clinical Decision Engine, which cross-references recalls against drug interactions and symptom data, producing a single verified report. The verification system catches hallucinations across the entire pipeline, not just individual tool outputs. Everything is connected.
OpenEMR is the world's most popular open-source EHR, used by clinics that can't afford enterprise pharmacy management systems ($50K+/year for First Databank, Medi-Span). AgentForge gives them:
| Capability | Enterprise Equivalent | AgentForge |
|---|---|---|
| Drug interaction alerts | First Databank ($50K/yr) | drug_interaction_check — free, 21 curated pairs + live RxNorm API |
| FDA recall monitoring | Medi-Span ($30K/yr) | scan_watchlist_recalls — free, live FDA API, per-patient tracking |
| Clinical decision support | Epic CDS ($200K+/yr) | Clinical Decision Engine — 5-7 tools orchestrated per query, structured reports |
| Response safety | Manual pharmacist review | 5-layer automated verification with active content blocking |
Total cost: $0.00/query (Groq free tier) + $7/month hosting (Render).
Everything is pip-installable — any LangChain developer can build a verified healthcare agent in 5 lines:
from app.tools import ALL_TOOLS
from app.agent.prompts import HEALTHCARE_AGENT_SYSTEM_PROMPT
from app.verification.verifier import verify_response, post_process_response
from langgraph.prebuilt import create_react_agent

# llm = any LangChain chat model, e.g. ChatGroq(model="llama-3.3-70b-versatile")
agent = create_react_agent(llm, ALL_TOOLS, prompt=HEALTHCARE_AGENT_SYSTEM_PROMPT)

The Clinical Decision Engine, recall tracker, verification system, and all 9 tools ship as one package. No cherry-picking components — the safety guarantees only work when the full pipeline is connected.
- Standalone Package: github.com/Jojobeans1981/langchain-healthcare-tools — `pip install`-ready
- OpenEMR Integration: github.com/Jojobeans1981/AgentForge-private — `agentforge/` directory
- 100-Case Eval Dataset: `evals/healthcare_eval_dataset.json` — benchmark your own healthcare agent
- Live Demo: agentforge-0p0k.onrender.com — try the Clinical Decision Engine card
MIT