
[FEAT]: Department Profile System for Pre-Mapped PDF Templates #206

@utkarshqz

📝 Description

FireForm currently extracts PDF field names, which are often machine-generated identifiers (e.g. textbox_0_0, textbox_0_1). When these names are sent to Mistral for extraction, the model has no semantic context. It either returns null for every field or hallucinates a single value and repeats it across unrelated fields (see related Bug #173).

A Department Profile system would ship pre-built mappings between human-readable field labels and the internal PDF field identifiers for common agency forms used by fire departments, police, and EMS.

💡 Rationale

FireForm's mission is to serve real first responders out of the box. Currently:

  • A firefighter uploads a Cal Fire incident form
  • Mistral receives {"textbox_0_0": "", "textbox_0_1": ""}
  • It has no idea what these fields mean → returns null or wrong values
  • The filled PDF is blank or incorrect

With department profiles:

  • The profile provides {"Officer Name": "textbox_0_0", "Incident Location": "textbox_0_2"}
  • Mistral receives human-readable labels → extracts correctly
  • The filled PDF is accurate
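The flow above implies a translation step: the model answers in terms of labels, and the filler still needs values keyed by PDF field IDs. A minimal sketch of that step follows. The function name `labels_to_pdf_fields` is hypothetical (not existing FireForm code), and it assumes the model returns a flat JSON object keyed by the profile's labels, with null for values it could not find.

```python
# Hypothetical helper: name and signature are illustrative, not existing FireForm code.
def labels_to_pdf_fields(profile_fields: dict[str, str], extracted: dict[str, object]) -> dict[str, str]:
    """Translate label-keyed model output into PDF-field-ID-keyed values for the filler."""
    pdf_values: dict[str, str] = {}
    for label, field_id in profile_fields.items():
        value = extracted.get(label)
        # Leave the field blank rather than writing the string "None" into the PDF.
        pdf_values[field_id] = "" if value is None else str(value)
    return pdf_values


fields = {"Officer Name": "textbox_0_0", "Badge Number": "textbox_0_1", "Incident Location": "textbox_0_2"}
model_output = {"Officer Name": "Smith", "Badge Number": "4421", "Incident Location": None}
print(labels_to_pdf_fields(fields, model_output))
# {'textbox_0_0': 'Smith', 'textbox_0_1': '4421', 'textbox_0_2': ''}
```

Iterating over the profile (rather than over the model output) means any extra keys the model invents are simply ignored, and every mapped PDF field gets a value.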

This addresses the root cause of Issue #173 without changing the model or the rest of the extraction pipeline; only the prompt gains human-readable labels.

🛠️ Proposed Solution

  • Create a src/profiles/ directory containing JSON profile files
  • Each profile maps human-readable field labels → internal PDF field IDs
  • Add a profile selector to the frontend UI (dropdown by department type)
  • Pass the field label mapping to the LLM prompt during extraction
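The first and third steps can be sketched as below: read every JSON file in the profiles directory and turn it into options a dropdown could render. `load_profiles` and `dropdown_options` are hypothetical names, and keying profiles by file stem (e.g. "fire_department") is an assumption about how the frontend would identify a selection.

```python
import json
import tempfile
from pathlib import Path


def load_profiles(profile_dir: str = "src/profiles") -> dict[str, dict]:
    """Load every *.json profile in profile_dir, keyed by file stem (assumed convention)."""
    profiles: dict[str, dict] = {}
    for path in sorted(Path(profile_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            profiles[path.stem] = json.load(fh)
    return profiles


def dropdown_options(profiles: dict[str, dict]) -> list[tuple[str, str]]:
    """(value, display text) pairs a hypothetical frontend dropdown could render."""
    return [(key, f'{p["department"]}: {p["description"]}') for key, p in profiles.items()]


if __name__ == "__main__":
    # Demo against a throwaway directory so the sketch runs outside the repo.
    with tempfile.TemporaryDirectory() as tmp:
        sample = {
            "department": "Fire Department",
            "description": "Standard Cal Fire incident report",
            "fields": {"Officer Name": "textbox_0_0"},
        }
        Path(tmp, "fire_department.json").write_text(json.dumps(sample), encoding="utf-8")
        print(dropdown_options(load_profiles(tmp)))
```

Only `*.json` files are picked up, so README or notes files can live alongside the profiles without breaking loading.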

Profile schema:

{
  "department": "Fire Department",
  "description": "Standard Cal Fire incident report",
  "fields": {
    "Officer Name": "textbox_0_0",
    "Badge Number": "textbox_0_1",
    "Incident Location": "textbox_0_2",
    "Incident Date": "textbox_0_3",
    "Number of Victims": "textbox_0_4"
  },
  "example_transcript": "Officer Smith, badge 4421, responding to structure fire at 742 Evergreen Terrace on March 8th. Two victims on scene."
}
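A profile shaped like the example above could be checked at load time before it ever reaches the prompt. The sketch below is a hand-rolled check, not an official schema; it assumes department, description, and fields are required, treats example_transcript as optional, and flags two labels pointing at the same PDF field ID, since that would make the filler overwrite one value with another.

```python
def validate_profile(profile: dict) -> list[str]:
    """Return a list of problems; an empty list means the profile looks well-formed."""
    errors: list[str] = []
    # Assumed required keys; example_transcript is treated as optional.
    for key in ("department", "description", "fields"):
        if key not in profile:
            errors.append(f"missing required key: {key!r}")
    for key in ("department", "description", "example_transcript"):
        if key in profile and not isinstance(profile[key], str):
            errors.append(f"{key!r} must be a string")

    fields = profile.get("fields", {})
    if not isinstance(fields, dict) or not fields:
        errors.append("'fields' must be a non-empty object of label -> PDF field ID")
        return errors

    for label, field_id in fields.items():
        if not isinstance(field_id, str) or not field_id:
            errors.append(f"label {label!r} must map to a non-empty field ID string")

    # Two labels writing to one PDF field would silently overwrite each other.
    ids = [v for v in fields.values() if isinstance(v, str)]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        errors.append(f"field IDs mapped more than once: {duplicates}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a CI check report everything wrong with a profile in one run.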

Profiles to implement:

  • fire_department.json: Cal Fire incident report
  • police_report.json: Standard police incident form
  • ems_medical.json: EMS patient care report

Supporting changes:

  • Logic change in src/llm.py to use profile labels in the prompt
  • Frontend dropdown to select a department profile
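The src/llm.py change could be as small as building the prompt from the profile's labels instead of the raw field IDs. The sketch below is an assumption about prompt wording and structure; `build_extraction_prompt` is a hypothetical name, and the actual Mistral call is left out because it is unchanged by this proposal.

```python
import json


def build_extraction_prompt(profile: dict, transcript: str) -> str:
    """Build an extraction prompt that exposes only human-readable labels to the model."""
    labels = list(profile["fields"])  # keys only: the model never sees textbox_0_0 etc.
    # Same empty-value JSON shape Mistral receives today, but keyed by label.
    template = json.dumps({label: "" for label in labels}, indent=2)
    parts = [
        f"You are filling out a {profile['department']} form ({profile['description']}).",
        "Extract each field below from the transcript. Use null if a value is not stated.",
        "Return only a JSON object with exactly these keys:",
        template,
    ]
    if profile.get("example_transcript"):
        # Optional context showing the kind of speech this form type usually receives.
        parts += ["Example transcript for this form type:", profile["example_transcript"]]
    parts += ["Transcript:", transcript]
    return "\n\n".join(parts)
```

Because the model only ever sees labels, its answer stays keyed by label, and mapping back to PDF field IDs happens afterwards in code using the same profile.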

✅ Acceptance Criteria

  • At least 3 department profiles ship with the repo
  • Profile labels are injected into the Mistral prompt
  • Extraction accuracy improves for pre-mapped forms (fields stated in the transcript are no longer returned as null)
  • Feature works in Docker container
  • Documentation updated in docs/
  • JSON output validates against the schema
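The accuracy and output criteria could be partly automated with a check like the one below, which flags the two #173 symptoms: all-null output and one value repeated everywhere. `check_extraction` is a hypothetical helper, and the repeated-value rule is a heuristic; two fields can legitimately share a value, so a failure here means "inspect this result", not "definitely wrong".

```python
def check_extraction(profile_fields: dict[str, str], result: dict) -> list[str]:
    """Flag the #173 failure modes for one label-keyed extraction result (heuristic)."""
    problems: list[str] = []
    expected, got = set(profile_fields), set(result)
    if got != expected:
        problems.append(
            f"missing keys: {sorted(expected - got)}, unexpected keys: {sorted(got - expected)}"
        )

    values = [result.get(label) for label in profile_fields]
    filled = [v for v in values if v not in (None, "")]
    if not filled:
        problems.append("every field is null or empty")
    elif len(filled) > 1 and len(set(map(str, filled))) == 1:
        problems.append(f"the same value {filled[0]!r} was repeated across {len(filled)} fields")
    return problems
```

Run against each shipped profile's example_transcript, this gives a cheap regression test that could run inside the Docker image as part of CI.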

📌 Additional Context

Related bugs this directly addresses: #173 (PDF filler hallucinates repeating values)

Related features this complements: #111 (Field Mapping Wizard — for custom PDFs not covered by profiles)

This is especially important for FireForm's stated mission as a UN Digital Public Good — the system should work correctly for real first responders without requiring technical setup.
