feat: Implement robust report structure management class #401

Open
Calebnzm wants to merge 1 commit into fireform-core:main from Calebnzm:feat/report-schema-management

Summary

This PR introduces a Report Schema management system — a new abstraction layer that sits above individual form templates. It allows users to define a canonical schema that aggregates and unifies fields from multiple PDF templates into a single, structured report definition, enabling consistent data extraction and output generation across varied document formats.


Motivation

Previously, templates were managed in isolation. There was no way to:

  • Group related templates that represent the same logical report (e.g. different versions or agency variants of the same document type).
  • Define a canonical, unified field naming convention across templates.
  • Attach metadata constraints — data types, word limits, required flags, allowed values — to guide and validate LLM extraction.
  • Persist the mapping between canonical field names and PDF-specific field names so extraction results can be automatically distributed to the correct fields at fill time.

This PR introduces all of that as a first-class, database-backed system.


Workflow

Create ReportSchema → Attach Template(s) → Configure SchemaFields → Canonize → Fill
  1. Create a Report Schema — A schema has a name, description, and use_case. It represents the logical structure of a report type (e.g. "Accident Report" or "End-of-Month Financial Summary").

  2. Attach Templates — One or more PDF templates are linked via add_template_to_schema. This automatically creates a SchemaField stub for every PDF field in the template, pre-populated with the raw field name and source template reference.

  3. Configure Field Metadata — Each SchemaField is configured via update_schema_field. Users set the description (to guide the LLM), data_type, word_limit, required flag, and allowed_values. Metadata is schema-scoped: the same PDF field in two different schemas can carry different constraints.

  4. Canonization — Fields that represent the same logical concept across templates are assigned a shared canonical_name, forming the unified vocabulary the LLM extracts against.

  5. Mapping Persistence — update_template_mapping builds a field_mapping JSON object (canonical_name → PDF field name(s)) and persists it on the junction record, ready for use at fill time.

  6. Multi-Template Fill — The stored mappings route extracted values to the correct fields in each template variant independently, producing consistent output from a single extraction pass.
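
As a rough illustration of step 6, the sketch below routes one set of extracted values into two template variants using stored mappings. It is not the code in this PR: `route_values`, the template names, and the PDF field names are all hypothetical, and the mapping shape (canonical_name → list of PDF field names) is assumed from the description of field_mapping above.

```python
from typing import Any

# Assumed shape of a stored mapping: canonical_name -> PDF field name(s).
FieldMapping = dict[str, list[str]]


def route_values(extracted: dict[str, Any], field_mapping: FieldMapping) -> dict[str, Any]:
    """Distribute canonical values to the PDF fields of a single template."""
    filled: dict[str, Any] = {}
    for canonical_name, pdf_fields in field_mapping.items():
        if canonical_name not in extracted:
            continue  # nothing was extracted for this concept; leave the fields unset
        for pdf_field in pdf_fields:
            filled[pdf_field] = extracted[canonical_name]
    return filled


# One extraction pass, keyed by canonical names (illustrative values only).
extracted = {"incident_date": "2024-05-01", "reporter_name": "J. Doe"}

# Two hypothetical template variants that name the same concepts differently.
mappings: dict[str, FieldMapping] = {
    "agency_a": {"incident_date": ["Date"], "reporter_name": ["Name", "Signature_Name"]},
    "agency_b": {"incident_date": ["dt_incident"]},
}

filled_by_template = {name: route_values(extracted, m) for name, m in mappings.items()}
```

Each template is filled independently from the same `extracted` dict, which is the property the workflow relies on: the extraction step never needs to know the raw PDF field names.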


Key Advantages

| Advantage | Detail |
| --- | --- |
| Template reuse across variants | One schema unifies multiple document variants, avoiding duplicate extraction logic |
| LLM guidance via metadata | Field descriptions, word limits, and allowed values constrain extraction without hardcoding rules |
| Schema-scoped constraints | The same PDF field can carry different metadata in different schemas |
| Stable extraction prompts | The LLM works against canonical names, not raw PDF field names — unaffected by template updates |
| Cascade safety | Deleting a schema or removing a template cleans up all associated fields and junction records automatically |
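
To make the "LLM guidance via metadata" row concrete, here is a minimal sketch of how the per-field constraints could be checked against an extracted value. It is an assumption-laden illustration, not the validation shipped in this PR: `FieldSpec` and `validate_value` are hypothetical names, dates are assumed to be ISO formatted, and word limits are assumed to count whitespace-separated tokens.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Datatype(str, Enum):
    """Mirrors the four data types listed in this PR."""
    STRING = "string"
    INT = "int"
    DATE = "date"
    ENUM = "enum"


@dataclass
class FieldSpec:
    """Hypothetical in-memory stand-in for a SchemaField's metadata."""
    canonical_name: str
    data_type: Datatype = Datatype.STRING
    word_limit: Optional[int] = None
    required: bool = False
    allowed_values: Optional[list[str]] = None


def validate_value(spec: FieldSpec, value: Optional[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the value passes."""
    if value is None or value.strip() == "":
        return [f"{spec.canonical_name}: required value is missing"] if spec.required else []

    errors: list[str] = []
    if spec.data_type is Datatype.INT:
        try:
            int(value)
        except ValueError:
            errors.append(f"{spec.canonical_name}: expected an integer")
    elif spec.data_type is Datatype.DATE:
        try:
            date.fromisoformat(value)  # assumption: ISO dates such as 2024-05-01
        except ValueError:
            errors.append(f"{spec.canonical_name}: expected an ISO date")

    if spec.allowed_values is not None and value not in spec.allowed_values:
        errors.append(f"{spec.canonical_name}: value not in allowed_values")
    if spec.word_limit is not None and len(value.split()) > spec.word_limit:
        errors.append(f"{spec.canonical_name}: exceeds word limit of {spec.word_limit}")
    return errors


severity = FieldSpec("severity", Datatype.ENUM, required=True, allowed_values=["low", "high"])
summary = FieldSpec("summary", word_limit=3)
```

Because constraints are schema-scoped, two schemas could hold different `FieldSpec` values for the same underlying PDF field and validate the same extraction differently.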

Changes

api/db/models.py

  • ReportSchema — New table: name (unique), description, use_case, timestamp.
  • SchemaField — New table: per-field metadata (description, data_type, word_limit, required, allowed_values, canonical_name) scoped to a schema and source template.
  • ReportSchemaTemplate — New junction table linking schemas to templates with a field_mapping JSON column. A UniqueConstraint on (template_id, report_schema_id) prevents duplicate associations.
  • Datatype enum — string, int, date, enum; used to type-annotate fields and drive validation.
  • Template.name — Added unique=True to prevent duplicate template registrations.
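
The effect of the junction table's UniqueConstraint can be sketched with plain SQLite from the standard library. This is only an approximation of the SQLModel tables in the PR: the table and column names below are assumptions, and the cascade is expressed here as a database-level ON DELETE CASCADE, whereas the PR may well perform the cleanup in repository code instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript(
    """
    CREATE TABLE template (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE report_schema (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE report_schema_template (
        id INTEGER PRIMARY KEY,
        template_id INTEGER NOT NULL REFERENCES template(id) ON DELETE CASCADE,
        report_schema_id INTEGER NOT NULL REFERENCES report_schema(id) ON DELETE CASCADE,
        field_mapping TEXT NOT NULL DEFAULT '{}',
        UNIQUE (template_id, report_schema_id)
    );
    """
)

conn.execute("INSERT INTO template (id, name) VALUES (1, 'agency_a')")
conn.execute("INSERT INTO report_schema (id, name) VALUES (1, 'Accident Report')")
conn.execute("INSERT INTO report_schema_template (template_id, report_schema_id) VALUES (1, 1)")

# Attaching the same template to the same schema a second time violates the constraint.
try:
    conn.execute("INSERT INTO report_schema_template (template_id, report_schema_id) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the schema removes its junction rows (database-level cascade in this sketch).
conn.execute("DELETE FROM report_schema WHERE id = 1")
remaining_links = conn.execute("SELECT COUNT(*) FROM report_schema_template").fetchone()[0]
```

Enforcing uniqueness in the database rather than only in application code means a duplicate association is rejected even if two requests race each other.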

api/db/repositories.py

  • Template CRUD completed — Added get_template, update_template, delete_template (with cascade to junction records).
  • FormSubmission CRUD completed — Added get_form, update_form, delete_form.
  • ReportSchema CRUD — Full create / get / list / update / delete. Delete cascades through SchemaFields and junctions.
  • add_template_to_schema — Registers the association and auto-creates SchemaField stubs for all PDF fields in the template.
  • remove_template_from_schema — Removes the junction and all SchemaFields originating from that template.
  • get_schema_fields / get_schema_field / update_schema_field — Schema-scoped field metadata management; update_schema_field validates field ownership before applying changes.
  • update_template_mapping — Groups fields by canonical_name and persists the resulting mapping on the junction row.
  • get_field_mapping — Returns the stored canonical → PDF field mapping for a schema–template pair.
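
The grouping that update_template_mapping is described as performing might look roughly like the sketch below. It is an illustration under stated assumptions, not the repository function itself: `SchemaFieldRow` and `build_field_mapping` are hypothetical names, fields without a canonical_name are assumed to be skipped, and the JSON serialisation stands in for whatever the junction row's field_mapping column actually stores.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaFieldRow:
    """Hypothetical stand-in for a SchemaField record."""
    template_id: int
    field_name: str  # raw PDF field name
    canonical_name: Optional[str] = None


def build_field_mapping(fields: list[SchemaFieldRow], template_id: int) -> dict[str, list[str]]:
    """Group one template's PDF fields by canonical_name."""
    mapping: dict[str, list[str]] = defaultdict(list)
    for field in fields:
        if field.template_id != template_id:
            continue  # field originates from another template in the same schema
        if not field.canonical_name:
            continue  # assumption: fields not yet canonized are left out of the mapping
        mapping[field.canonical_name].append(field.field_name)
    return dict(mapping)


fields = [
    SchemaFieldRow(1, "Date", "incident_date"),
    SchemaFieldRow(1, "Name", "reporter_name"),
    SchemaFieldRow(1, "Signature_Name", "reporter_name"),
    SchemaFieldRow(1, "Internal_Ref"),  # not canonized yet
    SchemaFieldRow(2, "dt_incident", "incident_date"),
]

mapping_for_template_1 = build_field_mapping(fields, template_id=1)
# Serialised form, as it might be stored on the junction row.
mapping_json = json.dumps(mapping_for_template_1, sort_keys=True)
```

Storing a list per canonical name allows one extracted value to populate several PDF fields in the same template, such as a name that appears both in a header and next to a signature line.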

api/schemas/report_class.py (new)

Pydantic request/response models for ReportSchema, SchemaField, and ReportSchemaTemplate, ready for API route handlers.

api/db/database.py & api/db/init_db.py

Minor updates to register the new models with SQLModel metadata so tables are created on startup.

tests/unit/test_repositories.py

Full unit test coverage for all new repository functions: creation, retrieval, update, deletion, cascade behaviour, schema-scoped field ownership enforcement, and duplicate junction prevention.


Testing

pytest tests/unit/test_repositories.py

Related Issues

Closes / related to: #102, #111, #152, #196, #206, #255.
