[FEAT]: Batch LLM extraction for faster PDF processing #399

@Harshitha-arch

Description

Currently, FireForm issues a separate LLM call for each PDF field, one after another, so latency grows with the number of fields in a form.

Rationale

Batched processing can improve performance and scalability by sending all fields in a single structured prompt and mapping JSON output to PDF fields.

Proposed Solution

  • Change extraction logic to handle all fields in one request
  • Map JSON output to PDF fields
  • Update relevant tests and documentation
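
A minimal sketch of what the first two steps could look like, assuming the LLM is reachable through a plain text-in, text-out callable. The names `build_batch_prompt`, `map_response_to_fields`, `extract_fields_batched`, and `call_llm` are hypothetical and are not taken from FireForm's codebase; the prompt wording is illustrative only.

```python
import json
from typing import Callable, Dict, List, Optional


def build_batch_prompt(field_names: List[str], source_text: str) -> str:
    """Build one prompt that asks for every field at once."""
    # An empty JSON template shows the model exactly which keys to return.
    template = json.dumps({name: None for name in field_names}, indent=2)
    return (
        "Extract the following fields from the text below.\n"
        "Respond with a single JSON object using exactly these keys; "
        "use null for any value that is not present.\n\n"
        f"Keys:\n{template}\n\nText:\n{source_text}"
    )


def map_response_to_fields(
    raw_response: str, field_names: List[str]
) -> Dict[str, Optional[str]]:
    """Parse the model's JSON reply and keep only the requested fields."""
    data = json.loads(raw_response)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result: Dict[str, Optional[str]] = {}
    for name in field_names:
        value = data.get(name)  # missing keys become None
        result[name] = None if value is None else str(value)
    return result


def extract_fields_batched(
    field_names: List[str],
    source_text: str,
    call_llm: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> Dict[str, Optional[str]]:
    """Single LLM round trip for all fields instead of one call per field."""
    prompt = build_batch_prompt(field_names, source_text)
    return map_response_to_fields(call_llm(prompt), field_names)
```

Passing `call_llm` in as a parameter keeps the sketch independent of any particular model backend and lets tests substitute a stub that returns canned JSON.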

Acceptance Criteria

  • Latency for multi-field PDFs decreases relative to the current per-field extraction
  • JSON output matches the expected schema
  • Works in Docker container
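
One way the schema criterion could be checked in a test, sketched with the standard library only. It assumes the expected schema is "a JSON object with exactly one key per PDF field, each value a string or null"; that shape is an assumption, since the issue does not define the schema, and `schema_problems` is a hypothetical helper name.

```python
from typing import Any, List


def schema_problems(data: Any, field_names: List[str]) -> List[str]:
    """Return a list of human-readable mismatches; empty means the output conforms."""
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    problems: List[str] = []
    # Every requested field must be present and hold a string or null.
    for name in field_names:
        if name not in data:
            problems.append(f"missing key: {name}")
        elif data[name] is not None and not isinstance(data[name], str):
            problems.append(f"non-string value for key: {name}")
    # Keys the form does not define are reported rather than silently dropped.
    for name in sorted(set(data) - set(field_names)):
        problems.append(f"unexpected key: {name}")
    return problems
```

Returning a list of problems instead of raising on the first one makes test failures easier to read when the model gets several fields wrong at once.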

Additional Context

This aligns with my GSoC proposal focused on improving FireForm's pipeline efficiency.
