feat: implement Master Incident Data Lake - supports File Once, Report Everywhere (and Report Once, File Everywhere) #385
Open: utkarshqz wants to merge 6 commits into fireform-core:main
Description
This PR implements the Master Incident Data Lake — a foundational architectural re-design that transforms FireForm from a single-shot form filler into a persistent incident intelligence system.
The core problem it solves: Every time a transcript was processed, FireForm extracted data against one specific PDF template, filled it, and silently discarded everything else. Any spoken detail that didn't match a field in that particular template was permanently lost. Officers were forced to re-dictate information for every agency form they needed to fill.
This PR introduces a clean separation of two concerns that were previously conflated: extracting structured data from a transcript, and filling a specific PDF template with that data.
The result delivers two complementary paradigms: File Once, Report Everywhere, and Report Once, File Everywhere.
Because all incident data is stored as structured JSON in a queryable database, the Data Lake also acts as a live intelligence backbone.
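For example, because each master record is just queryable JSON, downstream tooling can filter incidents without any LLM involvement. A minimal sketch using stdlib sqlite3 — the table name, column layout, and field names here are hypothetical, not the actual IncidentMasterDataModel schema:

```python
import json
import sqlite3

# Hypothetical stand-in for the Data Lake table: one JSON blob per incident.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incident_master (incident_id TEXT PRIMARY KEY, data TEXT)")
conn.execute(
    "INSERT INTO incident_master VALUES (?, ?)",
    ("inc-001", json.dumps({"incident_type": "structure fire", "units": "E41"})),
)
conn.execute(
    "INSERT INTO incident_master VALUES (?, ?)",
    ("inc-002", json.dumps({"incident_type": "medical", "units": "M12"})),
)

# SQLite's JSON1 functions let callers filter on any extracted field.
rows = conn.execute(
    "SELECT incident_id FROM incident_master "
    "WHERE json_extract(data, '$.incident_type') = 'structure fire'"
).fetchall()
print(rows)  # [('inc-001',)]
```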
Fixes #384
🎯 Overview
🚀 Key Changes
1. IncidentMasterDataModel (api/db/models.py)

Introduced a new SQLModel table that acts as the central Data Lake record. It is fully decoupled from `Template`: the Data Lake record exists independently of any PDF.

2. Collaborative Consensus Merge (api/db/repositories.py)

The most critical new capability: multiple officers can submit reports for the same `incident_id` over time, and the system intelligently merges them.

- An incoming `null` never overwrites a field that already has data.
- `Notes` and `Description` fields append new text with an `[UPDATE: timestamp]` tag.

This prevents the most dangerous failure mode in multi-officer reporting: a partial or hallucinated LLM response silently overwriting real, validated data.
3. Incident Endpoints (api/routes/incidents.py)

- `POST /incidents/extract`: accepts a raw transcript and an optional `incident_id`. Runs the LLM extraction batch and either creates a new Data Lake record or merges into an existing one.
- `POST /incidents/{incident_id}/generate/{template_id}`: generates a filled PDF for any registered agency template from the stored Data Lake, with zero new LLM calls. One incident. Any number of templates.
- `GET /incidents/{incident_id}`: returns the full raw master JSON for any stored incident; useful for debugging, auditing, or downstream integrations.
- `GET /incidents`: lists all stored incidents.
4. Schema-less LLM Extraction (src/llm.py)

The extraction prompt was updated to operate in two modes: the existing template-guided mode and a new schema-less mode. The Python-level field filter that previously stripped any key not present in the active template schema has been removed; all extracted keys are accepted and stored.
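A sketch of the two relevant behaviors: the request payload forces strict JSON output via `"format": "json"`, and the response parser no longer filters keys against the template schema. Function names here are illustrative, not the actual src/llm.py API:

```python
import json

def build_ollama_payload(model: str, prompt: str) -> dict:
    # "format": "json" makes Ollama return strict JSON, avoiding parse
    # failures from verbose free-text responses.
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}

def parse_extraction(raw: str) -> dict:
    data = json.loads(raw)
    # The old filter ({k: v for k, v in data.items() if k in template_schema})
    # is gone: every extracted key is kept and stored in the Data Lake.
    return data
```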
5. Frontend Integration (
frontend/index.html)6. Test Coverage (
tests/test_incidents.py)13 new tests added, covering:
Unit tests (no Ollama needed):
get_incidentretrieves the correct record by ID; returnsNonefor unknown IDs.nullvalues do not overwrite existing valid data.Notesfields append with[UPDATE]tags.Integration tests (LLM mocked):
POST /incidents/extract→ creates new incident, returns"status": "created".POST /incidents/extract(same ID) → merges, returns"status": "merged".GET /incidents/{id}→ returns stored master JSON.GET /incidents/{id}(unknown) →404.POST /incidents/{id}/generate/{template_id}(unknown incident) →404.POST /incidents/{id}/generate/{template_id}(unknown template) →404.GET /incidents→ returns list of all stored incidents.python -m pytest tests/test_incidents.py -v # 13 passed, 0 failed7. Documentation (
docs/SETUP.md)A full 🗄️ Master Incident Data Lake section was added to
SETUP.md, covering:OLLAMA_TIMEOUT)🛠 Technical Highlights
- Async Ollama client: `httpx.AsyncClient` with a configurable `OLLAMA_TIMEOUT` (default: 300s) to prevent event-loop blocking on high-latency local LLM hardware.
- `format: json` enforced: all Ollama API payloads pass `"format": "json"` to force strict JSON output, eliminating parse failures from verbose LLM responses.
- Transcript audit trail: every submitted transcript is preserved in `transcript_text`, creating an immutable history of all inputs for legal/compliance purposes.

Type of change

- New feature (the existing `/forms` endpoints are unaffected)

How Has This Been Tested?
Automated (13 tests, no Ollama required):
Manual end-to-end verification:
1. `ollama serve`
2. `uvicorn api.main:app --reload`
3. `POST /templates/upload`
4. `POST /incidents/extract?input_text=<text>` and note the returned `incident_id`
5. `POST /incidents/extract` again with the same `incident_id` and verify `"status": "merged"`
6. `GET /incidents/{incident_id}` and inspect the full master JSON
7. `POST /incidents/{incident_id}/generate/{template_id}` and download the filled PDF

Test Configuration:

- Model: `mistral` running locally

Checklist: