I build AI systems that verify before they generate.
Applied AI engineer working on LLM systems for healthcare — verification, regulatory review, patient communication. Particularly interested in constraining model behaviour to reduce harm in high-stakes domains (the subject of my recent publication, below). Founder of PharmaTools.AI, a suite of production AI tools used by clinicians, medical writers, and patients.
- Retrieve evidence rather than invent it
- Expose uncertainty rather than conceal it
- Constrain capability where consequences are high
- Help humans audit reasoning, not replace judgement
LLM systems: Claude API · MCP · RAG (Pinecone · FAISS) · multimodal · structured outputs · evals
Application: Node.js · Python · Swift / SwiftUI · Firebase · Postgres
Lamb NJ. Translation, not Interpretation: Rethinking Language Model Design for Healthcare. SN Comprehensive Clinical Medicine. 2026;8:71.
Argues that LLMs in healthcare should be constrained to translational tasks — restructuring information across clinical, scientific, regulatory and patient-facing domains — rather than performing open-ended interpretation. A scoping argument aligned with capability-control approaches to AI safety: narrower model surface, clearer accountability, lower harm ceiling.
Lamb NJ. Validation of an AI-powered mobile application for personalizing medical note explanations. medRxiv, 2025.
A three-phase validation of Patiently AI: computational readability metrics across 210 outputs, expert review by 15 clinicians, and a 54-patient survey. It found 87.3% of outputs rated clinically safe, 70% patient preference over standard notes, and a reduction of about 3 Flesch–Kincaid grade levels. Empirical evidence for the "translation, not interpretation" thesis above, applied in a shipped product.
Patiently AI — Patient Communication
Transforms complex medical notes into clear, patient-friendly language. Approach: constrained simplification to a target audience and reading level, gated so the model stays in translation — restating what the note already says, never crossing into diagnosis or new clinical interpretation.
flowchart TD
A["Medical note<br/>(typed, uploaded, scanned or dictated)"] --> T
U["User preferences<br/>Audience + reading level + language + tone"] --> T
T["AI transformation<br/>Simplifies and explains medical language<br/>while preserving clinical meaning"] --> S
S{"Safety guardrails<br/>No diagnosis, no new clinical facts,<br/>no medical interpretation added"}
S --> O["Clear, audience-appropriate explanation<br/>Same clinical information, easier to understand"]
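One narrow slice of the "no new clinical facts" guardrail can be sketched deterministically: any number in the explanation that never appears in the source note is suspicious. This is an illustrative assumption, not Patiently AI's actual guardrail; `new_numbers` and the sample strings are hypothetical.

```python
import re

# Matches integers and decimals, e.g. "500" or "7.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def new_numbers(note: str, explanation: str) -> set[str]:
    """Return numeric tokens that appear in the explanation but not in the note."""
    return set(NUMBER.findall(explanation)) - set(NUMBER.findall(note))


# Illustrative strings only, not real patient data.
note = "Commence metformin 500 mg BD. Review in 3 months."
faithful = "Start metformin 500 mg twice a day. We will review you in 3 months."
drifted = "Start metformin 850 mg twice a day. We will review you in 3 months."

print(new_numbers(note, faithful))  # set()
print(new_numbers(note, drifted))   # {'850'}
```

A check like this only catches numeric drift; it says nothing about numbers written as words or about added interpretation, so it would sit alongside other guardrails rather than replace them.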
5× Award Winner — PMEA 2025 (Innovation & Patient Education), Communiqué 2025 Progress Award, HTN AI & Data 2025 (Highly Commended), Best Mobile App Awards.
RefCheckr — Medical Writing
Verifies clinical claims against supporting references for medical writers and MLR reviewers. Approach: an LLM judges the user's draft claim against each reference and must cite verbatim passages; a post-hoc integrity check then rejects any quote that cannot be located in the source PDF and downgrades the verdict that relied on it. References can be uploaded PDFs (with OCR fallback) or fetched live from PubMed / ClinicalTrials.gov / DailyMed. Output is an annotated PDF with colour-coded highlights.
flowchart TD
A[Draft claim] --> V
B["References — uploads, library,<br/>or live search via PubCrawl MCP"] --> X[PDF text extraction<br/>OCR fallback for scans]
X --> V[Per-reference LLM verification<br/>judges claim against full source text]
V --> O[Verdict + confidence +<br/>quoted passages with PDF locations]
O --> G{Can each quote be matched<br/>in the source PDF?}
G -- no --> D[Downgrade verdict<br/>hallucinated quotes rejected]
G -- yes --> P[Annotated PDF<br/>colour-coded highlights]
D --> P
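A minimal sketch of the post-hoc quote check in the flow above, assuming the source PDF has already been extracted to plain text. The `Verdict` shape, the `"unverified"` label and the whitespace/case normalisation are assumptions for illustration, not RefCheckr's actual code.

```python
import re
from dataclasses import dataclass, field


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so PDF line breaks don't defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


@dataclass
class Verdict:
    label: str                 # e.g. "supported" (hypothetical label set)
    quotes: list[str]          # passages the model claims are verbatim
    rejected: list[str] = field(default_factory=list)


def check_quotes(verdict: Verdict, source_text: str) -> Verdict:
    """Keep only quotes found in the source; downgrade if any quote is missing."""
    haystack = normalise(source_text)
    kept = [q for q in verdict.quotes if normalise(q) in haystack]
    rejected = [q for q in verdict.quotes if normalise(q) not in haystack]
    # A verdict with a missing quote, or with no quotes at all, is not trusted.
    label = verdict.label if kept and not rejected else "unverified"
    return Verdict(label, kept, rejected)


# Illustrative text only.
source = "Treatment reduced HbA1c by 1.2%\ncompared with placebo (p<0.001)."
draft = Verdict("supported", [
    "reduced HbA1c by 1.2% compared with placebo",
    "was well tolerated in all patients",
])
result = check_quotes(draft, source)
print(result.label)     # unverified
print(result.rejected)  # ['was well tolerated in all patients']
```

The check is plain string matching, so it needs no model call and cannot itself hallucinate.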
MedCheckr — Regulatory Compliance
AI-powered regulatory review tool that checks promotional claims against the ABPI Code of Practice with clause-level transparency. Code Clarity Awards Winner, 2024. Approach: RAG over the ABPI Code corpus with Pinecone vector embeddings; every finding cites the specific clause(s) it relies on, so reviewers can audit each decision.
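The clause-level audit trail can be sketched as a simple invariant: a finding is only surfaced if every clause it cites was actually among the retrieved clauses. The `Finding` type, `split_auditable` and the placeholder clause IDs below are hypothetical and stand in for whatever MedCheckr uses internally; retrieval itself (Pinecone) is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    issue: str
    clause_ids: tuple[str, ...]  # clauses the model says it relied on


def split_auditable(findings, retrieved_ids):
    """Separate findings that cite only retrieved clauses from those that do not."""
    retrieved = set(retrieved_ids)
    ok, rejected = [], []
    for finding in findings:
        # Must cite at least one clause, and every cited clause must have been retrieved.
        grounded = bool(finding.clause_ids) and set(finding.clause_ids) <= retrieved
        (ok if grounded else rejected).append(finding)
    return ok, rejected


# Placeholder IDs, not real ABPI clause numbers.
retrieved = ["clause-A", "clause-B"]
findings = [
    Finding("Superlative claim lacks substantiation", ("clause-A",)),
    Finding("Cites a clause that was never retrieved", ("clause-Z",)),
    Finding("No clause cited at all", ()),
]
ok, rejected = split_auditable(findings, retrieved)
print(len(ok), len(rejected))  # 1 2
```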
PosterLens — Research
Captures scientific posters and generates instant AI summaries. Presented at ESMO AI & Digital Oncology Congress 2025. Approach: mobile vision capture → multimodal LLM extraction into a structured schema (study design, endpoints, results, limitations).
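As a rough sketch of the structured-schema step, the four fields named above can be expressed as a dataclass and the model's JSON output validated against it. The field names mirror the list above, but the types, `parse_summary` and the sample JSON are assumptions, and the multimodal model call is omitted.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class PosterSummary:
    study_design: str
    endpoints: list[str]
    results: str
    limitations: str


def parse_summary(raw: str) -> PosterSummary:
    """Parse model output as JSON and reject it if any schema field is missing."""
    data = json.loads(raw)
    names = [f.name for f in fields(PosterSummary)]
    missing = [name for name in names if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["endpoints"], list):
        raise ValueError("endpoints must be a list")
    # Ignore any extra keys the model may have added.
    return PosterSummary(**{name: data[name] for name in names})


# Illustrative output only, not from a real poster.
raw = json.dumps({
    "study_design": "Phase II, single-arm",
    "endpoints": ["Objective response rate"],
    "results": "Reported in the poster's results panel",
    "limitations": "Small sample, no comparator",
})
print(parse_summary(raw).endpoints)  # ['Objective response rate']
```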
BiomarkerFinder — Drug Discovery
AI-powered insights into complex biomarker data, explained in plain language. Winner at the Open Targets Hackathon. Approach: pulls structured biomarker associations from Open Targets and translates them into plain-language explanations with provenance back to the underlying datasets.
HushMap — Wellbeing
Helps neurodivergent individuals locate sensory-friendly places nearby. Community contributions and Apple Watch support.
PubCrawl — MCP server for biomedical literature
TypeScript MCP server giving LLM clients direct access to PubMed, ClinicalTrials.gov, FDA DailyMed (USPI), and the UK eMC (SmPC) — including a side-by-side US/UK label comparison tool. Built so models retrieve and reason over real biomedical literature rather than rely on parametric memory. Powers retrieval in RefCheckr; published to npm as @pharmatools/pubcrawl, and listed in the official MCP registry.
RSI Loop — Validated self-improving detector
A computer-vision pipeline that detects repetitive strain injury (RSI) risk and improves its own detection logic against a benchmark suite, but is gated by a separate regulatory Auditor that rejects mutations whose thresholds pass the tests yet are clinically implausible. A concrete miniature of specification-gaming / reward-hacking mitigation: optimise freely, but accept only iterations that are both accurate and within published clinical norms.
flowchart TD
A["Webcam pose + hand landmarks"] --> D["RSI-risk detector vN"]
D --> E["Score against benchmark suite"]
E --> I["Self-improvement proposes vN+1<br/>new thresholds / logic"]
I --> G{"Auditor: accurate on benchmark<br/>AND within published clinical norms?"}
G -- no --> R["Reject mutation<br/>specification-gaming blocked"]
R --> I
G -- yes --> N["Accept vN+1 as new baseline"]
N --> D
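The accept/reject gate in the loop above can be sketched in a few lines. Everything concrete here is a placeholder: the single `wrist_angle_deg` threshold, the `NORM_RANGE` bounds (not a real clinical norm) and the accuracy figures are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    wrist_angle_deg: float      # hypothetical detection threshold
    benchmark_accuracy: float   # score on the benchmark suite, 0..1


# Placeholder plausibility bounds; a real Auditor would take these from published norms.
NORM_RANGE = (10.0, 30.0)


def audit(candidate: Candidate, baseline: Candidate) -> bool:
    """Accept only if the candidate is no less accurate AND clinically plausible."""
    low, high = NORM_RANGE
    accurate = candidate.benchmark_accuracy >= baseline.benchmark_accuracy
    plausible = low <= candidate.wrist_angle_deg <= high
    return accurate and plausible


def improve(baseline: Candidate, proposals: list[Candidate]) -> Candidate:
    """Walk the proposals in order, promoting each one that passes the audit."""
    for candidate in proposals:
        if audit(candidate, baseline):
            baseline = candidate
    return baseline


v0 = Candidate(wrist_angle_deg=15.0, benchmark_accuracy=0.80)
proposals = [
    Candidate(85.0, 0.99),  # aces the benchmark with an implausible threshold
    Candidate(20.0, 0.85),  # modest gain, within bounds
]
print(improve(v0, proposals))
# Candidate(wrist_angle_deg=20.0, benchmark_accuracy=0.85)
```

The first proposal is the specification-gaming case: it scores highest but is rejected because accuracy alone is never sufficient for acceptance.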
LitRAG — Grounded RAG with a built-in citation-faithfulness eval
A small, readable RAG pipeline over PubMed abstracts that doesn't stop at "it retrieved something and answered": it checks whether each generated claim is actually supported by its cited source, and flags hallucinated or unsupported ones. A deterministic quote-locator catches fabricated citations without a model call; an LLM-as-judge then grades support level (supports / partial / contradicts / not-found) from the passage alone. Embedding and retrieval run fully locally (Hugging Face sentence-transformers + FAISS, no managed vector-DB key); only generation and the judge call an LLM. It's the citation-faithfulness pattern behind RefCheckr, distilled into an open reference implementation, with its corpus pulled via PubCrawl.
flowchart TD
Q[Question] --> R["Local retrieval<br/>sentence-transformers + FAISS"]
R --> G["LangChain RAG chain<br/>answer + verbatim cited quote per claim"]
G --> L{"Quote locatable in source?<br/>(deterministic — no model call)"}
L -- no --> H["Flag: hallucinated quote"]
L -- yes --> J{"LLM judge: does the passage<br/>support the claim?"}
J -- no --> F["Flag: unsupported / contradicted"]
J -- yes --> S["Grounded ✓"]
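The two-stage ordering in the flow above (cheap deterministic check first, model judge second) might look like the sketch below. The judge is passed in as a plain callable so the example runs without an API key; `grade_claim`, `stub_judge` and the returned flag strings are assumptions, not LitRAG's actual interface.

```python
from typing import Callable

# (claim, passage) -> one of "supports", "partial", "contradicts", "not-found"
Judge = Callable[[str, str], str]


def grade_claim(claim: str, quote: str, passage: str, judge: Judge) -> str:
    """Grade one claim: locate the quote first, and only then ask the judge."""
    # Stage 1: deterministic, no model call. Empty or unlocatable quotes stop here.
    if not quote.strip() or quote not in passage:
        return "hallucinated-quote"
    # Stage 2: the judge sees only the claim and the passage.
    label = judge(claim, passage)
    return "grounded" if label == "supports" else f"flagged:{label}"


calls = []


def stub_judge(claim: str, passage: str) -> str:
    """Stand-in for an LLM judge; records each call and always answers 'supports'."""
    calls.append(claim)
    return "supports"


# Illustrative passage only, not a real abstract.
passage = "Drug X lowered systolic blood pressure by 8 mmHg versus placebo."
print(grade_claim("Drug X lowers blood pressure.",
                  "lowered systolic blood pressure by 8 mmHg", passage, stub_judge))
# grounded
print(grade_claim("Drug X cures hypertension.",
                  "cured hypertension in all patients", passage, stub_judge))
# hallucinated-quote
print(len(calls))  # 1
```

Because the fabricated quote is caught in stage 1, the judge is called once rather than twice, which is why the deterministic check comes first.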


