[Performance Optimization]: Decouple File Upload from Grading Pipeline

## Description

The current upload workflow blocks grading and extraction behind file storage and database operations.

### Current Flow

```text
User Uploads File
        ↓
Store File
        ↓
Create Database Records
        ↓
Extract Content
        ↓
Run OCR
        ↓
Run AI Grading
        ↓
Return Result
```

This causes unnecessary latency because grading-related work cannot begin until file storage and database operations have completed.

The workflow should be redesigned so that grading and extraction start immediately after the file is received, while storage-related operations execute independently in the background.

---

## Scope of This Issue

This issue focuses on **one specific optimization only**:

### Start OCR/Extraction Immediately After File Reception

When a question paper or answer sheet is uploaded:

1. Receive file in memory or temporary storage.
2. Trigger OCR/content extraction immediately.
3. Run file persistence and metadata storage in parallel.
4. Merge results when both tasks complete.

### Expected Flow

```text
User Uploads File
        ↓
Receive File
        ↓
 ┌──────────────┬──────────────┐
 │              │              │
 ↓              ↓              ↓
OCR         File Storage    DB Operations
 │              │              │
 └──────────────┴──────────────┘
        ↓
Continue Processing
        ↓
Grading
        ↓
Response
```
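The fan-out and fan-in in the diagram above could be sketched roughly as follows. Every function here (`run_ocr`, `store_file`, `create_records`, `grade`, `process_upload`) is a hypothetical stand-in rather than the project's actual code, and the sleeps only simulate I/O.

```python
import asyncio


async def run_ocr(file_bytes: bytes) -> str:
    # Hypothetical stand-in for the real OCR/extraction step.
    await asyncio.sleep(0.01)
    return file_bytes.decode("utf-8", errors="replace")


async def store_file(file_bytes: bytes) -> str:
    # Hypothetical stand-in for writing the file to disk or object storage.
    await asyncio.sleep(0.01)
    return f"uploads/{len(file_bytes)}-bytes.bin"


async def create_records(filename: str) -> int:
    # Hypothetical stand-in for the database inserts; returns a fake record id.
    await asyncio.sleep(0.01)
    return 1


async def grade(text: str) -> dict:
    # Hypothetical stand-in for the AI grading call.
    return {"score": len(text.split())}


async def process_upload(filename: str, file_bytes: bytes) -> dict:
    # Fan out: all three branches start as soon as the bytes are available.
    text, path, record_id = await asyncio.gather(
        run_ocr(file_bytes),
        store_file(file_bytes),
        create_records(filename),
    )
    # Fan in: grading runs once the three branches have finished.
    result = await grade(text)
    return {"record_id": record_id, "path": path, **result}


result = asyncio.run(process_upload("sheet.pdf", b"two words"))
```

One design point to check against the real schema: if the database record has to reference the stored file's path, that path would need to be generated before the branches start, since the DB branch no longer waits for storage.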

---

## Why This Matters

For large PDFs and answer sheets, file storage and database writes can consume a significant portion of request time before grading even starts.

By starting extraction immediately:

* OCR begins sooner
* End-to-end grading latency decreases
* Users receive results faster
* Storage I/O and extraction overlap instead of running back to back
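As a rough illustration of the latency argument: when stages run one after another, the total is about the sum of their durations; when they overlap, it is about the longest one. The sketch below simulates this with sleeps. The durations (0.2 s and 0.3 s) are made-up numbers, not measurements from this project.

```python
import asyncio
import time


async def step(seconds: float) -> None:
    # Stand-in for an I/O-bound stage such as storage or extraction.
    await asyncio.sleep(seconds)


async def sequential() -> float:
    start = time.perf_counter()
    await step(0.2)  # simulated file storage + DB writes
    await step(0.3)  # simulated extraction
    return time.perf_counter() - start


async def concurrent() -> float:
    start = time.perf_counter()
    # Both stages run on the event loop at the same time.
    await asyncio.gather(step(0.2), step(0.3))
    return time.perf_counter() - start


seq_time = asyncio.run(sequential())  # roughly 0.2 + 0.3 seconds
con_time = asyncio.run(concurrent())  # roughly max(0.2, 0.3) seconds
```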

---

## Suggested Implementation

Potential approaches:

* `asyncio.create_task` inside an async FastAPI endpoint
* Background task execution (for work the response does not depend on)
* Async worker pattern
* Concurrent coroutines using `asyncio.gather`

Example concept:

```python
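import asyncio


async def handle_upload(file, file_bytes):
    # extract_content and persist_file are placeholders for the
    # project's own coroutines.
    extraction_task = asyncio.create_task(
        extract_content(file_bytes)
    )

    storage_task = asyncio.create_task(
        persist_file(file)
    )

    # Wait for both; gather returns results in argument order.
    extracted, stored = await asyncio.gather(
        extraction_task,
        storage_task
    )
    return extracted, stored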
```
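One caveat worth checking: `asyncio.create_task` only overlaps work that yields to the event loop. If the OCR step turns out to be a synchronous, blocking call (an assumption; this issue does not say which OCR library is used), it would stall storage unless it is moved off the loop, for example with `asyncio.to_thread` (Python 3.9+). A minimal sketch, with `blocking_ocr`, `persist_file`, and `handle` as hypothetical names:

```python
import asyncio
import time


def blocking_ocr(file_bytes: bytes) -> str:
    # Hypothetical synchronous OCR call; time.sleep simulates blocking work.
    time.sleep(0.2)
    return f"{len(file_bytes)} bytes extracted"


async def persist_file(file_bytes: bytes) -> bool:
    # Hypothetical async storage write.
    await asyncio.sleep(0.2)
    return True


async def handle(file_bytes: bytes):
    start = time.perf_counter()
    text, stored = await asyncio.gather(
        # Runs in a worker thread so the event loop stays free for storage.
        asyncio.to_thread(blocking_ocr, file_bytes),
        persist_file(file_bytes),
    )
    return text, stored, time.perf_counter() - start


text, stored, elapsed = asyncio.run(handle(b"abc"))
```

Calling `blocking_ocr` directly inside a coroutine instead would block the loop for the full OCR duration, so the two steps would again run one after the other.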

---

## Acceptance Criteria

* [ ] OCR/extraction begins immediately after file reception.
* [ ] File storage runs independently of extraction.
* [ ] Database operations do not block OCR startup.
* [ ] Existing grading functionality remains unchanged.
* [ ] No data integrity issues introduced.
* [ ] Performance improvement is measurable and documented.
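The ordering criteria could be checked with a small test along these lines. It is only a sketch: `extract_content` and `persist_file` are fake stand-ins that record events, and a real test would patch the project's own functions instead.

```python
import asyncio

events: list[str] = []


async def extract_content(file_bytes: bytes) -> str:
    # Fake extraction that logs when it starts and ends.
    events.append("ocr:start")
    await asyncio.sleep(0.01)
    events.append("ocr:end")
    return "text"


async def persist_file(file_bytes: bytes) -> None:
    # Fake storage that is deliberately slower than extraction.
    events.append("store:start")
    await asyncio.sleep(0.05)
    events.append("store:end")


async def pipeline(file_bytes: bytes) -> str:
    text, _ = await asyncio.gather(
        extract_content(file_bytes), persist_file(file_bytes)
    )
    return text


text = asyncio.run(pipeline(b"data"))

# OCR must have started before storage finished.
assert events.index("ocr:start") < events.index("store:end")
```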

## Impact

This optimization reduces processing latency by removing unnecessary dependencies between storage operations and extraction, providing a faster grading experience for end users.

**Type:** Performance Optimization
**Difficulty:** Intermediate
**Labels:** `performance`, `backend`, `architecture`, `optimization`, `help wanted`

