Description
The current upload workflow blocks grading and extraction behind file storage and database operations.
Current Flow
User Uploads File
↓
Store File
↓
Create Database Records
↓
Extract Content
↓
Run OCR
↓
Run AI Grading
↓
Return Result
This causes unnecessary latency because grading-related work cannot begin until file storage and database operations have completed.
The workflow should be redesigned so that grading and extraction start immediately after the file is received, while storage-related operations execute independently in the background.
Scope of This Issue
This issue focuses on one specific optimization only:
Start OCR/Extraction Immediately After File Reception
When a question paper or answer sheet is uploaded:
- Receive file in memory or temporary storage.
- Trigger OCR/content extraction immediately.
- Continue file persistence and metadata storage in parallel.
- Merge results when both tasks complete.
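The four steps above can be sketched with plain asyncio. This is a minimal sketch under stated assumptions, not the project's implementation: `extract_content`, `persist_file`, and `handle_upload` are hypothetical names, and the sleeps stand in for real OCR and storage calls.

```python
import asyncio


async def extract_content(file_bytes: bytes) -> str:
    # Hypothetical stand-in for OCR/content extraction.
    await asyncio.sleep(0.01)
    return file_bytes.decode("utf-8", errors="replace")


async def persist_file(filename: str, file_bytes: bytes) -> dict:
    # Hypothetical stand-in for file storage plus metadata writes.
    await asyncio.sleep(0.01)
    return {"filename": filename, "size": len(file_bytes)}


async def handle_upload(filename: str, file_bytes: bytes) -> dict:
    # Step 1: the file is already held in memory as file_bytes.
    # Step 2: trigger extraction immediately.
    extraction = asyncio.create_task(extract_content(file_bytes))
    # Step 3: persistence runs alongside extraction instead of ahead of it.
    storage = asyncio.create_task(persist_file(filename, file_bytes))
    # Step 4: merge once both tasks have completed.
    text, metadata = await asyncio.gather(extraction, storage)
    return {**metadata, "text": text}


result = asyncio.run(handle_upload("sheet.txt", b"Q1: 42"))
```

Here the merge step is a plain dict union; a real handler would presumably attach the extracted text to whatever record the storage branch created.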
Expected Flow
        User Uploads File
                ↓
          Receive File
                ↓
 ┌──────────────┬──────────────┐
 │              │              │
 ↓              ↓              ↓
OCR       File Storage   DB Operations
 │              │              │
 └──────────────┴──────────────┘
                ↓
       Continue Processing
                ↓
             Grading
                ↓
            Response
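One caveat for the fan-out above: if the OCR call is blocking or CPU-bound, wrapping it in an asyncio task will not overlap it with the storage branches, because it holds the event loop while it runs. A possible sketch that moves it to a worker thread, assuming Python 3.9+ for `asyncio.to_thread`; `run_ocr`, `store_file`, `write_records`, and `grade` are hypothetical placeholders.

```python
import asyncio
import time


def run_ocr(file_bytes: bytes) -> str:
    # Hypothetical blocking OCR call; time.sleep simulates the blocking work.
    time.sleep(0.05)
    return f"text from {len(file_bytes)} bytes"


async def store_file(file_bytes: bytes) -> str:
    await asyncio.sleep(0.05)  # simulated object-storage write
    return "stored"


async def write_records(filename: str) -> str:
    await asyncio.sleep(0.05)  # simulated database insert
    return f"records for {filename}"


def grade(text: str) -> str:
    # Placeholder for the grading step.
    return f"graded: {text}"


async def process(filename: str, file_bytes: bytes) -> dict:
    # Fan out: blocking OCR goes to a worker thread so the event loop
    # stays free to drive the two I/O branches.
    text, storage, records = await asyncio.gather(
        asyncio.to_thread(run_ocr, file_bytes),
        store_file(file_bytes),
        write_records(filename),
    )
    # Fan in: grading starts once all three branches have joined.
    return {"grade": grade(text), "storage": storage, "records": records}


outcome = asyncio.run(process("paper.pdf", b"abc"))
```

For heavily CPU-bound OCR, a process pool may be a better fit than a thread; that choice depends on the OCR library in use.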
Why This Matters
For large PDFs and answer sheets, file uploads and database writes can consume a significant portion of request time before grading even starts.
By starting extraction immediately:
- OCR begins sooner
- End-to-end grading latency decreases
- Users receive results faster
- Infrastructure resources are utilized more efficiently
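The latency argument can be illustrated with simulated durations: run sequentially, the steps cost roughly the sum of their times; run concurrently, roughly the time of the longest one. The durations below are made-up placeholders, not measurements from this project.

```python
import asyncio
import time

# Made-up step durations in seconds, for illustration only.
STORE, DB, OCR = 0.05, 0.03, 0.08


async def step(seconds: float) -> None:
    await asyncio.sleep(seconds)


async def sequential() -> float:
    # Current flow: each step waits for the previous one.
    start = time.perf_counter()
    await step(STORE)
    await step(DB)
    await step(OCR)
    return time.perf_counter() - start


async def concurrent() -> float:
    # Proposed flow: all three steps start together.
    start = time.perf_counter()
    await asyncio.gather(step(STORE), step(DB), step(OCR))
    return time.perf_counter() - start


sequential_time = asyncio.run(sequential())
concurrent_time = asyncio.run(concurrent())
print(f"sequential: {sequential_time:.3f}s, concurrent: {concurrent_time:.3f}s")
```

The actual gain depends on how storage and database time compare with extraction time for real uploads, which would need to be measured.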
Suggested Implementation
Potential approaches:
- FastAPI with asyncio.create_task
- Background task execution
- Async worker pattern
- Parallel async coroutines using asyncio.gather
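Of the approaches listed, the async worker pattern is probably the least self-explanatory, so here is one possible sketch using `asyncio.Queue`. The names `storage_worker` and `handle_upload`, and the in-memory `saved` dict, are hypothetical stand-ins for real storage code.

```python
import asyncio

saved = {}  # hypothetical stand-in for real file storage


async def storage_worker(queue: asyncio.Queue) -> None:
    # Drain persistence jobs in the background, independent of request handling.
    while True:
        filename, file_bytes = await queue.get()
        await asyncio.sleep(0.01)  # simulated write
        saved[filename] = file_bytes
        queue.task_done()


async def handle_upload(queue: asyncio.Queue, filename: str, file_bytes: bytes) -> str:
    # Hand persistence to the worker, then go straight to extraction.
    queue.put_nowait((filename, file_bytes))
    await asyncio.sleep(0.01)  # simulated extraction
    return file_bytes.decode()


async def main() -> str:
    queue = asyncio.Queue()
    worker = asyncio.create_task(storage_worker(queue))
    text = await handle_upload(queue, "sheet.txt", b"answer")
    await queue.join()  # wait until the queued write has finished
    worker.cancel()
    return text


text = asyncio.run(main())
```

Compared with spawning one task per upload, a queue makes it easier to bound how many writes are in flight, at the cost of an extra moving part.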
Example concept:
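import asyncio

# extract_content and persist_file are placeholder coroutines.
async def handle_upload(file, file_bytes):
    # Start both tasks as soon as the file has been received.
    extraction_task = asyncio.create_task(
        extract_content(file_bytes)
    )
    storage_task = asyncio.create_task(
        persist_file(file)
    )
    # gather returns results in the order the tasks were passed.
    extracted, stored = await asyncio.gather(
        extraction_task,
        storage_task,
    )
    return extracted, stored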
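One design question the concept above leaves open is what happens when one branch fails. By default `asyncio.gather` raises the first exception to the caller, which would throw away a finished extraction if storage fails. A hedged sketch using `return_exceptions=True`; the always-failing `persist_file` and the shape of the returned dict are illustrative assumptions.

```python
import asyncio


async def extract_content(file_bytes: bytes) -> str:
    await asyncio.sleep(0.01)  # simulated extraction
    return file_bytes.decode()


async def persist_file(file_bytes: bytes) -> str:
    await asyncio.sleep(0.01)
    raise OSError("disk full")  # simulated storage failure


async def handle_upload(file_bytes: bytes) -> dict:
    # return_exceptions=True delivers failures as values instead of raising.
    text, stored = await asyncio.gather(
        extract_content(file_bytes),
        persist_file(file_bytes),
        return_exceptions=True,
    )
    if isinstance(text, BaseException):
        raise text  # without extracted text there is nothing to grade
    # Keep the extraction result and report the storage outcome separately.
    return {"text": text, "stored": not isinstance(stored, BaseException)}


result = asyncio.run(handle_upload(b"Q1"))
```

Whether grading should proceed when persistence has failed is a product decision; the sketch only shows that the two outcomes can be inspected independently.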
Acceptance Criteria
Impact
This optimization reduces processing latency by removing unnecessary dependencies between storage operations and extraction, providing a faster grading experience for end users.
Type: Performance Optimization
Difficulty: Intermediate
Labels: performance, backend, architecture, optimization, help wanted