[Performance Optimization]: Decouple File Upload from Grading Pipeline #13

@RishiGoswami-code

Description

The current upload workflow blocks grading and extraction behind file storage and database operations.

Current Flow

User Uploads File
        ↓
Store File
        ↓
Create Database Records
        ↓
Extract Content
        ↓
Run OCR
        ↓
Run AI Grading
        ↓
Return Result

This causes unnecessary latency because grading-related work cannot begin until file storage and database operations have completed.

The workflow should be redesigned so that grading and extraction start immediately after the file is received, while storage-related operations execute independently in the background.


Scope of This Issue

This issue focuses on one specific optimization only:

Start OCR/Extraction Immediately After File Reception

When a question paper or answer sheet is uploaded:

  1. Receive file in memory or temporary storage.
  2. Trigger OCR/content extraction immediately.
  3. Continue file persistence and metadata storage in parallel.
  4. Merge results when both tasks complete.

Expected Flow

User Uploads File
        ↓
Receive File
        ↓
 ┌──────────────┬──────────────┐
 │              │              │
 ↓              ↓              ↓
OCR         File Storage    DB Operations
 │              │              │
 └──────────────┴──────────────┘
        ↓
Continue Processing
        ↓
Grading
        ↓
Response
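
A minimal sketch of the expected flow, assuming hypothetical coroutines run_ocr, store_file, create_records, and grade. None of these names come from the codebase, and the sleeps only stand in for real I/O:

```python
import asyncio

# Hypothetical stand-ins for the real pipeline steps; names and return
# shapes are assumptions for illustration only.
async def run_ocr(data: bytes) -> str:
    await asyncio.sleep(0.01)  # simulates OCR latency
    return data.decode("utf-8", errors="replace")

async def store_file(data: bytes) -> str:
    await asyncio.sleep(0.01)  # simulates an object-storage write
    return f"uploads/{len(data)}-bytes.bin"

async def create_records(filename: str) -> int:
    await asyncio.sleep(0.01)  # simulates a database insert
    return 1  # pretend primary key

async def grade(text: str) -> dict:
    return {"chars": len(text)}

async def handle_upload(filename: str, data: bytes) -> dict:
    # Fan out: all three branches start as soon as the bytes are in memory.
    text, path, record_id = await asyncio.gather(
        run_ocr(data), store_file(data), create_records(filename)
    )
    # Fan in: grading runs once every branch has completed.
    result = await grade(text)
    return {"record_id": record_id, "path": path, "grade": result}
```

If the database record needs the storage path, create_records would have to wait for store_file; the diagram treats them as independent, so the sketch does too.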

Why This Matters

For large PDFs and answer sheets, file uploads and database writes can consume a significant portion of request time before grading even starts.

By starting extraction immediately:

  • OCR begins sooner
  • End-to-end grading latency decreases
  • Users receive results faster
  • Time spent waiting on storage and database I/O overlaps with OCR instead of adding to it

Suggested Implementation

Potential approaches:

  • asyncio.create_task inside the FastAPI request handler
  • Background task execution (e.g. FastAPI BackgroundTasks, which run after the response is sent)
  • Async worker pattern
  • Parallel async coroutines using asyncio.gather

Example concept:

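import asyncio

async def process_upload(file, file_bytes):
    # extract_content and persist_file are placeholders for the project's own coroutines.
    # Both tasks are scheduled immediately and run concurrently.
    extraction_task = asyncio.create_task(
        extract_content(file_bytes)
    )

    storage_task = asyncio.create_task(
        persist_file(file)
    )

    # gather returns results in argument order and raises the first exception.
    extracted, stored = await asyncio.gather(
        extraction_task,
        storage_task
    )
    return extracted, stored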
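
Because the two tasks are independent, one can fail while the other succeeds. By default asyncio.gather raises the first exception and leaves the other awaitable running; passing return_exceptions=True lets the handler inspect both outcomes and decide on cleanup. A sketch with hypothetical stubs, where delete_file is an assumed cleanup helper and not an existing function:

```python
import asyncio

deleted: list[str] = []  # records cleanup calls, for illustration only

async def extract_content(data: bytes) -> str:
    # Hypothetical extraction that fails on empty input.
    if not data:
        raise ValueError("empty upload")
    return data.decode("utf-8")

async def persist_file(data: bytes) -> str:
    await asyncio.sleep(0)
    return "stored/key"  # hypothetical storage key

async def delete_file(key: str) -> None:
    deleted.append(key)  # hypothetical removal of an orphaned file

async def process(data: bytes) -> str:
    text, key = await asyncio.gather(
        extract_content(data), persist_file(data), return_exceptions=True
    )
    if isinstance(text, Exception):
        # Extraction failed: remove the stored file so no orphan remains.
        if not isinstance(key, Exception):
            await delete_file(key)
        raise text
    if isinstance(key, Exception):
        raise key
    return text
```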

Acceptance Criteria

  • OCR/extraction begins immediately after file reception.
  • File storage runs independently of extraction.
  • Database operations do not block OCR startup.
  • Existing grading functionality remains unchanged.
  • No data integrity issues introduced.
  • Performance improvement is measurable and documented.
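
To make the last criterion concrete, one option is to time the sequential and concurrent paths against the same stubs. The sketch below uses asyncio.sleep as a stand-in for real work, so it illustrates the measurement method only, not expected production gains; step and its delays are hypothetical:

```python
import asyncio
import time

async def step(delay: float) -> None:
    await asyncio.sleep(delay)  # stand-in for OCR, storage, or DB latency

async def sequential(delays: list[float]) -> float:
    # Runs each step one after another and returns elapsed seconds.
    start = time.perf_counter()
    for d in delays:
        await step(d)
    return time.perf_counter() - start

async def concurrent(delays: list[float]) -> float:
    # Runs all steps at once and returns elapsed seconds.
    start = time.perf_counter()
    await asyncio.gather(*(step(d) for d in delays))
    return time.perf_counter() - start

async def compare(delays: list[float]) -> tuple[float, float]:
    return await sequential(delays), await concurrent(delays)
```

With sleep-based stubs, the sequential time approaches the sum of the delays and the concurrent time approaches the longest single delay. Against the real pipeline, the same harness would wrap the actual OCR, storage, and database calls.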

Impact

This optimization is expected to reduce processing latency by removing the unnecessary dependency of extraction on storage operations, giving end users faster grading results.

Type: Performance Optimization
Difficulty: Intermediate
Labels: performance, backend, architecture, optimization, help wanted
