feat: Dynamic AI Semantic Mapper — Universal Schema-less PDF Generation from Data Lake#386
Open
utkarshqz wants to merge 7 commits into fireform-core:main from
This was referenced Mar 30, 2026
Description
Building directly on the Master Incident Data Lake (PR #385), this PR introduces the Dynamic AI Semantic Mapper, the intelligent translation layer that makes FireForm truly universal.
The problem this solves: The Data Lake captures all spoken intelligence with dynamically invented keys. A transcript about "Jack Portman" stores "Speaker": "Jack Portman". But a Fire Department PDF demands "FullName". A Police form demands "OfficerNamePrint". An EMS record demands "RespondingOfficer". Standard Python dictionary matching silently drops all three, so zero fields are filled. This PR removes that failure mode for any PDF, from any agency.
Fixes #206
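To make the failure mode in the description concrete, here is a minimal sketch of exact-key matching. The helper `exact_key_fill` and the `pdf_fields` table are hypothetical names used only for illustration; the keys and values come from the example above.

```python
# Data Lake entry from the example in the description.
data_lake = {"Speaker": "Jack Portman"}

# Field names each agency's PDF expects (taken from the description).
pdf_fields = {
    "fire": ["FullName"],
    "police": ["OfficerNamePrint"],
    "ems": ["RespondingOfficer"],
}


def exact_key_fill(lake: dict, fields: list) -> dict:
    """Hypothetical helper: fill only fields whose names literally exist in the lake."""
    return {name: lake[name] for name in fields if name in lake}


filled = {agency: exact_key_fill(data_lake, fields) for agency, fields in pdf_fields.items()}
print(filled)  # {'fire': {}, 'police': {}, 'ems': {}}
```

No exception is raised, which is why the loss is silent: each form simply ends up with no filled fields.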
🎯 Overview
🚀 Key Changes
1. async_semantic_map (src/llm.py)
A new @staticmethod async method, the core of this PR.
At PDF-generation time it receives:
It sends Mistral a precision-engineered prompt:
Mistral returns a perfectly keyed JSON object whose keys match the PDF exactly.
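The mapping step might look roughly like the sketch below. It is not the PR's implementation: `build_mapper_payload` and `parse_mapper_response` are hypothetical helpers, and the prompt wording is invented for illustration. The payload keys (`model`, `prompt`, `format`, `stream`) follow Ollama's generate API.

```python
import json


def build_mapper_payload(lake: dict, pdf_fields: list, model: str = "mistral") -> dict:
    """Hypothetical helper: build an Ollama generate payload for the mapping call."""
    prompt = (
        "Map the incident data to the PDF field names.\n"
        f"Incident data: {json.dumps(lake)}\n"
        f"PDF fields: {json.dumps(pdf_fields)}\n"
        "Return a JSON object whose keys are taken only from the PDF fields."
    )
    # "format": "json" asks Ollama to constrain the model output to JSON.
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def parse_mapper_response(raw: str, pdf_fields: list) -> dict:
    """Hypothetical helper: keep only keys the PDF actually has; return {} on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    allowed = set(pdf_fields)
    return {key: value for key, value in data.items() if key in allowed}
```

Filtering the response against the PDF's own field list is a defensive choice in this sketch, so that a key the model invents never reaches the filler.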
src/filler.py receives this and fills the form without a single string comparison written by hand.
format: json is enforced on the Ollama payload to guarantee valid JSON output and prevent parse failures from verbose LLM responses.
2. Schema-less Extraction Upgrade (src/llm.py)
The extraction prompt now operates in two modes:
Template-guided + dynamic: When templates exist, Mistral maps the known fields and invents additional descriptive keys for any other critical details in the transcript ("VictimInjury", "WeaponType", "SuspectVehicle").
Pure schema-less: When no template is uploaded at all, Mistral invents every key from scratch:
This means FireForm can capture intelligence even before the relevant PDF template is registered.
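The two extraction modes could be selected with a small prompt builder like the one below. This is a sketch under assumptions: `build_extraction_prompt` is a hypothetical name and the prompt text is illustrative, not the wording used in src/llm.py.

```python
from typing import List, Optional


def build_extraction_prompt(transcript: str, template_fields: Optional[List[str]] = None) -> str:
    """Hypothetical helper: pick template-guided or pure schema-less instructions."""
    if template_fields:
        # Template-guided + dynamic: known fields first, then invented keys for the rest.
        guidance = (
            "Fill these known fields where the transcript supports them: "
            + ", ".join(template_fields)
            + ". Also invent descriptive keys for any other critical details."
        )
    else:
        # Pure schema-less: no template registered, so every key is invented.
        guidance = "No template is registered. Invent a descriptive key for every detail."
    return f"{guidance}\nReturn a JSON object.\nTranscript:\n{transcript}"


guided = build_extraction_prompt("Officer Jack Portman reporting.", ["FullName", "BadgeNumber"])
schemaless = build_extraction_prompt("Officer Jack Portman reporting.")
```

Treating an empty field list the same as no template keeps the caller simple: either case falls through to the schema-less branch.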
3. Dynamic Generate Endpoint (api/routes/incidents.py)
The POST /incidents/{incident_id}/generate/{template_id} endpoint is upgraded to async and now calls the Semantic Mapper before every PDF fill:
Two-layer resilience fallback (the PDF is ALWAYS generated):
{}
No LLM failure can produce a 500 error on PDF generation.
4. Test Coverage (tests/test_semantic_mapper.py)
10 new tests added. All Ollama calls are mocked, so no running instance is needed:
Unit tests (async_semantic_map):
"Speaker"→"FullName"), the core innovation
Returns {} gracefully on LLM connection failure
Integration tests (generate endpoint):
{}
404 for missing incident (unaffected by Mapper)
404 for missing template (unaffected by Mapper)
python -m pytest tests/test_semantic_mapper.py -v # 10 passed in 0.Xs
5. Documentation (docs/SETUP.md)
A full 🧠 Dynamic AI Semantic Mapper section added, covering:
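Returning to the two-layer fallback described for the generate endpoint, the sketch below shows one plausible reading of it. The assumption here is that layer 1 turns any mapper failure into {} and layer 2 falls back to the raw Data Lake data when the mapping is empty. All function names are hypothetical.

```python
import asyncio


async def failing_mapper(lake: dict, pdf_fields: list) -> dict:
    """Stand-in for a mapper call that cannot reach the LLM."""
    raise ConnectionError("Ollama unreachable")


async def safe_semantic_map(mapper, lake: dict, pdf_fields: list) -> dict:
    """Layer 1 (assumed): never let a mapper error escape; return {} instead."""
    try:
        result = await mapper(lake, pdf_fields)
    except Exception:
        return {}
    return result if isinstance(result, dict) else {}


async def data_for_pdf(mapper, lake: dict, pdf_fields: list) -> dict:
    """Layer 2 (assumed): if the mapping is empty, fall back to the raw lake data."""
    mapped = await safe_semantic_map(mapper, lake, pdf_fields)
    return mapped or lake


result = asyncio.run(data_for_pdf(failing_mapper, {"Speaker": "Jack Portman"}, ["FullName"]))
print(result)  # {'Speaker': 'Jack Portman'}
```

With this shape the endpoint always has a dictionary to hand to the filler, so an LLM outage degrades the output instead of raising an error.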
🛠 Technical Highlights
No if/else chains mapping field names anywhere in the codebase. All translation is delegated entirely to Mistral's linguistic understanding.
httpx.AsyncClient used throughout, so there is no event-loop blocking on slow local hardware.
format: json enforced: Eliminates unparsable LLM responses from the mapper call.
🔬 Live Demonstration: Collaborative Consensus Engine + Semantic Mapper
This demonstrates two features working together in a real run:
Speaker→FullName and fills all 7/8 fields
Before: First Officer Report (Jack Portman)
📋 Server Log
After: Second Officer Corrects Name (Portman Issac) — Same Incident ID
📋 Server Log
Type of change
How Has This Been Tested?
Automated (10 tests, no Ollama required):
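As an illustration of how tests like these can run without Ollama, the sketch below injects a mocked LLM call using `unittest.mock.AsyncMock`. The `semantic_map` function and both test names are hypothetical stand-ins, not the code in tests/test_semantic_mapper.py.

```python
import asyncio
from unittest.mock import AsyncMock


async def semantic_map(lake: dict, pdf_fields: list, ask_llm) -> dict:
    """Hypothetical mapper: ask_llm is an injected coroutine that returns a dict."""
    try:
        mapped = await ask_llm(lake, pdf_fields)
    except Exception:
        return {}
    return {key: value for key, value in mapped.items() if key in pdf_fields}


def test_maps_speaker_to_fullname():
    # The mock plays the role of the LLM, so no network call is made.
    fake_llm = AsyncMock(return_value={"FullName": "Jack Portman"})
    result = asyncio.run(semantic_map({"Speaker": "Jack Portman"}, ["FullName"], fake_llm))
    assert result == {"FullName": "Jack Portman"}
    fake_llm.assert_awaited_once()


def test_returns_empty_on_connection_failure():
    # side_effect makes the awaited mock raise, simulating an unreachable server.
    fake_llm = AsyncMock(side_effect=ConnectionError("no Ollama"))
    result = asyncio.run(semantic_map({"Speaker": "Jack Portman"}, ["FullName"], fake_llm))
    assert result == {}
```

Passing the LLM call in as a parameter is only one way to make the mapper testable; patching the HTTP client would work as well.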
Manual end-to-end verification:
Incident ID returned from POST /incidents/extract
FullName, BadgeNumber)
POST /incidents/{id}/generate/{template_id}
[SEMANTIC MAPPER] Mapping N lake fields to N PDF fields...
FullName filled correctly despite Data Lake storing Speaker
Test Configuration:
mistral (for manual verification)
OLLAMA_TIMEOUT=300 recommended for local hardware
Checklist: