⚡ Offload synchronous prediction to threadpool in async endpoint#311

Merged: lgcorzo merged 2 commits into `main` from `perf-unblock-predict-endpoint-9304074883955391764` on Apr 30, 2026
Conversation

@lgcorzo (Owner) commented Apr 30, 2026

The `/predict` endpoint now runs the synchronous prediction callback through `run_in_threadpool`, so it no longer blocks the FastAPI event loop. The change was verified with a benchmark script and ruff checks.


PR created automatically by Jules for task 9304074883955391764 started by @lgcorzo

The `/predict` endpoint is an `async def` function, but it was calling the synchronous `prediction_callback` directly. This blocks the FastAPI event loop, preventing other concurrent requests (like health checks) from being processed.

By using `fastapi.concurrency.run_in_threadpool`, we offload the synchronous prediction task to a separate thread, allowing the event loop to remain responsive and handle other tasks concurrently.

Benchmark results showed that concurrent async tasks are no longer blocked, reducing total processing time for concurrent operations from 4.5s to 2.5s in a simulated scenario.
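The effect can be reproduced in a small simulation. The sketch below is not the benchmark script from this PR; it uses only the standard library, with `asyncio.to_thread` swapped in for `run_in_threadpool`, and every name and duration in it is an assumption chosen for illustration.

```python
import asyncio
import time

WORK_SECONDS = 0.2  # hypothetical duration of each simulated task


def slow_predict() -> str:
    """Simulated synchronous prediction that blocks its thread."""
    time.sleep(WORK_SECONDS)
    return "done"


async def other_work() -> None:
    """Simulated concurrent async task, e.g. a health check."""
    await asyncio.sleep(WORK_SECONDS)


async def blocking_endpoint() -> str:
    # Calls the synchronous function directly and stalls the event loop.
    return slow_predict()


async def offloaded_endpoint() -> str:
    # Runs the synchronous function in a worker thread instead.
    return await asyncio.to_thread(slow_predict)


async def timed(endpoint) -> float:
    """Run an endpoint alongside other_work and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(endpoint(), other_work())
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"blocking:  {asyncio.run(timed(blocking_endpoint)):.2f}s")
    print(f"offloaded: {asyncio.run(timed(offloaded_endpoint)):.2f}s")
```

In the blocking case the two tasks run back to back, so the total is roughly the sum of their durations; in the offloaded case they overlap, so the total is roughly the longer of the two.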

Co-authored-by: lgcorzo <46710567+lgcorzo@users.noreply.github.com>
@google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

- Use `fastapi.concurrency.run_in_threadpool` to execute the synchronous `prediction_callback` in the `/predict` endpoint.
- This prevents the FastAPI event loop from being blocked by CPU-bound or blocking I/O tasks.
- Add a conceptual benchmark script in `tests/performance/benchmark_blocking.py` to demonstrate and verify the performance improvement.
- Ensure all new files are correctly formatted to comply with CI requirements.

Co-authored-by: lgcorzo <46710567+lgcorzo@users.noreply.github.com>
@lgcorzo lgcorzo merged commit 69a6d25 into main Apr 30, 2026
4 checks passed
@lgcorzo lgcorzo deleted the perf-unblock-predict-endpoint-9304074883955391764 branch April 30, 2026 20:48