feat: Dynamic AI Semantic Mapper — Universal Schema-less PDF Generation from Data Lake #386

Open
utkarshqz wants to merge 7 commits into fireform-core:main from utkarshqz:feat/ai-semantic-mapper

@utkarshqz

Description

Building directly on the Master Incident Data Lake (PR #385), this PR introduces the Dynamic AI Semantic Mapper: the translation layer that lets FireForm fill any PDF template, even when the template's field names differ from the keys stored in the Data Lake.

The problem this solves: The Data Lake captures all spoken intelligence with dynamically invented keys. A transcript about "Jack Portman" stores "Speaker": "Jack Portman". But a Fire Department PDF demands "FullName". A Police form demands "OfficerNamePrint". An EMS record demands "RespondingOfficer".

Exact-key Python dictionary matching silently drops all three, so zero fields are filled. This PR removes that failure mode by matching keys on meaning rather than on exact strings, for any PDF from any agency.

"Record Once. Fill Any Form. Anywhere."
One unstructured Data Lake record. Mistral understands that "Speaker" means "FullName". Zero if/else chains. Zero per-template hardcoding. Ever.

Fixes #206


🎯 Overview

Without Semantic Mapper (fragile exact-match):
  Data Lake: { "Speaker": "Jack", "Identity": "EMP-001" }
  PDF wants: { "FullName": "", "BadgeNumber": "" }
  Result:    { }  ← silent failure, zero fields filled ❌

With AI Semantic Mapper (intelligent translation):
  Data Lake: { "Speaker": "Jack", "Identity": "EMP-001" }
             ↓ Mistral understands semantics ↓
  PDF gets:  { "FullName": "Jack", "BadgeNumber": "EMP-001" } ✅

  Fire Dept PDF  → "FullName"           ← Mistral maps "Speaker"
  Police PDF     → "OfficerNamePrint"   ← Mistral maps "Speaker"
  EMS PDF        → "RespondingOfficer"  ← Mistral maps "Speaker"
  All from the same single Data Lake record. Zero new extractions.
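
The silent failure in the first case can be reproduced in a few lines. This is a minimal sketch, not code from the PR: `exact_match_fill` is a hypothetical helper that stands in for whatever exact-key lookup the filler performed before this change, and the sample values are taken from the overview above.

```python
# Data Lake record with dynamically invented keys (values from the overview).
data_lake = {"Speaker": "Jack", "Identity": "EMP-001"}

# Field names the target PDF expects.
pdf_fields = ["FullName", "BadgeNumber"]


def exact_match_fill(record: dict, fields: list[str]) -> dict:
    """Hypothetical helper: keep only PDF fields whose names appear verbatim in the record."""
    return {name: record[name] for name in fields if name in record}


filled = exact_match_fill(data_lake, pdf_fields)
print(filled)  # prints {} because no key matches, and no error is raised
```

No exception is raised and nothing is logged, which is why the mismatch is easy to miss without a semantic layer.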

🚀 Key Changes

1. async_semantic_map (src/llm.py)

A new @staticmethod async method — the core of this PR.

At PDF-generation time it receives:

  • The full Data Lake JSON (unstructured, arbitrary keys)
  • The target PDF's field name list (rigid, exact strings)

It sends Mistral a precision-engineered prompt:

"Map the available incident data into these specific PDF fields. Look for semantic synonyms — if the target is FullName, look for Speaker, ApplicantName, Officer, Applicant, etc. in the available data."

Mistral is prompted to return a JSON object whose keys match the PDF field names exactly. src/filler.py receives this object and fills the form without any hand-written field-name comparisons.

format: json is set on the Ollama payload so that the model's output is constrained to valid JSON, which prevents parse failures caused by verbose LLM responses.
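
As a rough illustration of the request and response handling described above, the sketch below builds a body for Ollama's /api/generate endpoint and defensively parses the model's reply. It is not the PR's implementation: `build_mapper_payload` and `parse_mapper_response` are hypothetical names, the prompt wording beyond the quoted sentence is a placeholder, and filtering the reply down to known PDF fields is an assumed safety step. The HTTP call itself is left out so the example stays standard-library only.

```python
import json


def build_mapper_payload(master_json: dict, target_pdf_fields: list[str],
                         model: str = "mistral") -> dict:
    """Build a request body for Ollama's /api/generate (hypothetical helper)."""
    prompt = (
        "Map the available incident data into these specific PDF fields. "
        "Look for semantic synonyms.\n"
        f"PDF fields: {json.dumps(target_pdf_fields)}\n"
        f"Available data: {json.dumps(master_json)}\n"
        "Return a JSON object keyed by the PDF field names."
    )
    # "format": "json" asks Ollama to constrain the output to valid JSON;
    # "stream": False returns a single response object instead of chunks.
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def parse_mapper_response(raw: str, target_pdf_fields: list[str]) -> dict:
    """Parse the model's text and keep only keys that are real PDF fields."""
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {}  # unparsable output is treated as "nothing mapped"
    if not isinstance(parsed, dict):
        return {}
    allowed = set(target_pdf_fields)
    return {key: value for key, value in parsed.items() if key in allowed}
```

Returning {} on any parse problem keeps the caller's logic simple: an empty mapping is the single signal that the exact-match fallback should run.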


2. Schema-less Extraction Upgrade (src/llm.py)

The extraction prompt now operates in two modes:

Template-guided + dynamic: When templates exist, Mistral maps the known fields and invents additional descriptive keys for any other critical details in the transcript ("VictimInjury", "WeaponType", "SuspectVehicle").

Pure schema-less: When no template is uploaded at all, Mistral invents every key from scratch:

if not self._target_fields:
    # PURE SCHEMA-LESS: No templates — fully ad-hoc extraction
    prompt = "Extract every meaningful piece of information... invent descriptive JSON keys..."

This means FireForm can capture intelligence even before the relevant PDF template is registered.
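
The two modes could be selected along these lines. This is only a sketch: `build_extraction_prompt` is a hypothetical function, and the instruction strings are placeholders rather than the prompts used in src/llm.py, whose full wording is not shown in this PR description.

```python
def build_extraction_prompt(transcript: str, target_fields: list[str]) -> str:
    """Pick the extraction mode based on whether any template fields are known."""
    if not target_fields:
        # Pure schema-less mode: no templates registered, so every key is invented.
        instructions = (
            "Extract every meaningful piece of information and "
            "invent descriptive JSON keys for it."
        )
    else:
        # Template-guided + dynamic mode: fill known fields, then add extra keys.
        instructions = (
            "Fill these known fields: " + ", ".join(target_fields) + ". "
            "Also invent descriptive JSON keys for any other critical details."
        )
    return f"{instructions}\n\nTranscript:\n{transcript}"
```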


3. Dynamic Generate Endpoint (api/routes/incidents.py)

The POST /incidents/{incident_id}/generate/{template_id} endpoint is upgraded to async and now calls the Semantic Mapper before every PDF fill:

# THE MAGIC BRIDGE
mapped_data = await LLM.async_semantic_map(
    master_json=master_data,
    target_pdf_fields=tpl_fields
)

Two-layer resilience fallback — PDF is ALWAYS generated:

| Scenario | Behaviour |
| --- | --- |
| Mapper succeeds | PDF filled via AI semantic understanding |
| Mapper returns {} | Falls back to exact-string matching from Data Lake |
| Mapper raises exception (timeout/crash) | Falls back to exact-string matching from Data Lake |

No LLM failure can produce a 500 error on PDF generation.
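
The fallback behaviour in the table can be sketched as a small async helper. The names `resolve_pdf_fields` and `exact_match` are hypothetical and the endpoint's real code may be structured differently; the point is only that both an empty mapping and an exception lead to the same exact-match path.

```python
import asyncio
from typing import Awaitable, Callable

# Any async callable taking (master_data, tpl_fields) and returning a dict.
Mapper = Callable[[dict, list[str]], Awaitable[dict]]


def exact_match(master_data: dict, tpl_fields: list[str]) -> dict:
    """Fallback: copy only the fields whose names match the Data Lake keys exactly."""
    return {name: master_data[name] for name in tpl_fields if name in master_data}


async def resolve_pdf_fields(master_data: dict, tpl_fields: list[str],
                             mapper: Mapper) -> dict:
    """Try the semantic mapper first; fall back to exact matching on {} or error."""
    try:
        mapped = await mapper(master_data, tpl_fields)
    except Exception:
        # Timeout, connection error, or crash: treat as "nothing mapped".
        mapped = {}
    if mapped:
        return mapped
    return exact_match(master_data, tpl_fields)
```

Because the exception is caught inside the helper, the route handler never sees an LLM error and can proceed to fill the PDF with whatever fields were resolved.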


4. Test Coverage (tests/test_semantic_mapper.py)

10 new tests added — all Ollama calls mocked, no running instance needed:

Unit tests (async_semantic_map):

  • ✅ Correctly maps exact-match keys
  • ✅ Resolves synonym mismatches ("Speaker" → "FullName"), the core innovation
  • ✅ Returns {} gracefully on LLM connection failure
  • ✅ Handles empty Data Lake JSON
  • ✅ Handles invalid/non-JSON LLM response

Integration tests (generate endpoint):

  • ✅ Uses Semantic Mapper output to fill PDF
  • ✅ Fallback triggers correctly when mapper returns {}
  • ✅ Fallback triggers correctly when mapper raises exception
  • 404 for missing incident (unaffected by Mapper)
  • 404 for missing template (unaffected by Mapper)

python -m pytest tests/test_semantic_mapper.py -v
# 10 passed in 0.Xs
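
For readers unfamiliar with mocking async LLM calls, the pattern looks roughly like this. It is an illustrative sketch, not an excerpt from tests/test_semantic_mapper.py: `safe_semantic_map` is a hypothetical stand-in for the code under test, and the tests use `asyncio.run` so that no pytest async plugin is assumed.

```python
import asyncio
from unittest.mock import AsyncMock


async def safe_semantic_map(post, master_json: dict, target_pdf_fields: list) -> dict:
    """Hypothetical wrapper: call the LLM via `post`; return {} on any failure."""
    try:
        result = await post(master_json, target_pdf_fields)
    except Exception:
        return {}
    return result if isinstance(result, dict) else {}


def test_returns_empty_dict_on_connection_failure():
    # The mocked LLM call raises, simulating an unreachable Ollama instance.
    post = AsyncMock(side_effect=ConnectionError("Ollama unreachable"))
    result = asyncio.run(safe_semantic_map(post, {"Speaker": "Jack"}, ["FullName"]))
    assert result == {}
    post.assert_awaited_once()


def test_resolves_synonym_mismatch():
    # The mocked LLM call returns the mapping a real model would be expected to give.
    post = AsyncMock(return_value={"FullName": "Jack"})
    result = asyncio.run(safe_semantic_map(post, {"Speaker": "Jack"}, ["FullName"]))
    assert result == {"FullName": "Jack"}
```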

5. Documentation (docs/SETUP.md)

A full 🧠 Dynamic AI Semantic Mapper section added, covering:

  • The problem (synonym mismatch, silent field drops)
  • Architecture diagram (Data Lake → Mistral → PDF)
  • Resilience & fallback table
  • Pure schema-less mode explanation
  • Test running instructions
  • Environment variable reference

🛠 Technical Highlights

  • Zero hardcoding: No if/else chains mapping field names anywhere in the codebase. All translation is delegated entirely to Mistral's linguistic understanding.
  • Truly universal: Any user, any department, any PDF uploaded anywhere — the Semantic Mapper handles the translation automatically with no human intervention.
  • Fully async: httpx.AsyncClient used throughout — no event-loop blocking on slow local hardware.
  • format: json enforced: Constrains the mapper call's output to valid JSON, which avoids unparsable LLM responses.
  • Graceful degradation: Two-layer fallback guarantees PDF generation even during complete LLM outages.

🔬 Live Demonstration — Collaborative Consensus Engine + Semantic Mapper

This demonstrates two features working together in a real run:

  1. AI Semantic Mapper: bridges Speaker → FullName and fills 7 of the 8 fields
  2. Collaborative Consensus Engine: Officer 2 updates the name; all other fields remain protected

Before: First Officer Report (Jack Portman)

📋 Server Log
[SEMANTIC MAPPER] Successfully mapped 7 out of 8 required PDF fields.
[DATA LAKE] Template needs 8 fields, Semantic Mapper produced 8 fields
[log extracted successfully] Found 8 fields mapped from Data Lake.
  [FILLER] Filling 'FullName'   = Jack Portman                                        → Jack Portman ✓
  [FILLER] Filling 'ID'         = EMP12388                                             → EMP12388 ✓
  [FILLER] Filling 'Gender'     = Male                                                 → /0 ✓
  [FILLER] Filling 'Married'    = Yes                                                  → /Yes ✓
  [FILLER] Filling 'City'       = Mumbai                                               → Mumbai ✓
  [FILLER] Filling 'Language'   = English                                              → English ✓
  [FILLER] Filling 'Notes'      = This is a test note using ai in extraction and mapping → This is a test note... ✓
[Screenshot 2026-03-29 181355]

After: Second Officer Corrects Name (Portman Issac) — Same Incident ID

[Screenshot 2026-03-29 182927]
📋 Server Log
[SEMANTIC MAPPER] Successfully mapped 7 out of 8 required PDF fields.
[DATA LAKE] Template needs 8 fields, Semantic Mapper produced 8 fields
[log extracted successfully] Found 8 fields mapped from Data Lake.
  [FILLER] Filling 'FullName'   = Portman Issac                                        → Portman Issac ✓  ← UPDATED
  [FILLER] Filling 'ID'         = EMP12388                                             → EMP12388 ✓       ← PROTECTED
  [FILLER] Filling 'Gender'     = Male                                                 → /0 ✓             ← PROTECTED
  [FILLER] Filling 'Married'    = Yes                                                  → /Yes ✓           ← PROTECTED
  [FILLER] Filling 'City'       = Mumbai                                               → Mumbai ✓         ← PROTECTED
  [FILLER] Filling 'Language'   = English                                              → English ✓        ← PROTECTED
  [FILLER] Filling 'Notes'      = This is a test note using ai in extraction and mapping → This is a test note... ✓
[Screenshot 2026-03-29 185934]

What this proves: The Collaborative Consensus Engine correctly updated only FullName while protecting all other fields. The Semantic Mapper successfully bridged unstructured Data Lake keys to the PDF's required field names — with zero hardcoded mapping logic.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Automated (10 tests, no Ollama required):

python -m pytest tests/test_semantic_mapper.py -v
[Screenshot: ai_mapper_testcases]

Manual end-to-end verification:

  1. Dictate transcript with mismatched field names (e.g. "I am Jack Portman")
  2. Note Incident ID returned from POST /incidents/extract
  3. Upload a PDF whose fields use different names (e.g. FullName, BadgeNumber)
  4. POST /incidents/{id}/generate/{template_id}
  5. Observe console: [SEMANTIC MAPPER] Mapping N lake fields to N PDF fields...
  6. Download PDF — FullName filled correctly despite Data Lake storing Speaker

Test Configuration:

  • Python 3.11+
  • Ollama running mistral (for manual verification)
  • OLLAMA_TIMEOUT=300 recommended for local hardware
  • SQLite (default) or PostgreSQL
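
One way the OLLAMA_TIMEOUT setting might be read is sketched below. `ollama_timeout_seconds` is a hypothetical helper, and using 300 as the default simply mirrors the recommendation above; the project's actual default is not stated in this PR description.

```python
import os


def ollama_timeout_seconds(default: float = 300.0) -> float:
    """Read OLLAMA_TIMEOUT from the environment, falling back to `default`."""
    raw = os.environ.get("OLLAMA_TIMEOUT")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Non-numeric values are ignored rather than crashing at startup.
        return default
    # Reject zero and negative timeouts, which would make every call fail.
    return value if value > 0 else default
```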

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Development

Successfully merging this pull request may close these issues.

[FEAT]: Department Profile System for Pre-Mapped PDF Templates
