[WIP] Implement request pausing#973

Draft
maxdebayser wants to merge 21 commits into torch-spyre:main from maxdebayser:decode_holdback

@maxdebayser
Collaborator

Description

vLLM supports preemption of requests when it runs out of KV cache blocks, but preempted requests must be recomputed from scratch. In sendnn-inference, other constraints are tighter than the number of blocks, so we want to "pause" a request: remove it from the currently executing batch without relinquishing the blocks that it is holding. This lets us schedule requests optimistically.
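
To illustrate the difference between preempting and pausing, here is a minimal toy sketch. It is not the code in this PR and not the vLLM or sendnn-inference API: `ToyScheduler`, `Request`, and every method and field name below are hypothetical and exist only to show the bookkeeping described above.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request record, for illustration only."""
    req_id: str
    num_computed_tokens: int = 0
    block_ids: list = field(default_factory=list)


class ToyScheduler:
    """Toy scheduler contrasting preemption with pausing (assumed design)."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))
        self.running = []  # requests in the currently executing batch
        self.paused = []   # out of the batch, still holding their blocks
        self.waiting = []  # holding nothing, must be recomputed

    def admit(self, req: Request, num_blocks: int) -> None:
        """Give the request KV cache blocks and add it to the batch."""
        if len(self.free_blocks) < num_blocks:
            raise RuntimeError("not enough free KV cache blocks")
        req.block_ids = [self.free_blocks.pop() for _ in range(num_blocks)]
        self.running.append(req)

    def preempt(self, req: Request) -> None:
        """Free the blocks; the request restarts from scratch later."""
        self.running.remove(req)
        self.free_blocks.extend(req.block_ids)
        req.block_ids = []
        req.num_computed_tokens = 0
        self.waiting.append(req)

    def pause(self, req: Request) -> None:
        """Leave the batch but keep the blocks and the progress made."""
        self.running.remove(req)
        self.paused.append(req)

    def resume(self, req: Request) -> None:
        """Rejoin the batch; nothing needs to be recomputed."""
        self.paused.remove(req)
        self.running.append(req)
```

In this sketch a paused request keeps `block_ids` and `num_computed_tokens` intact, so resuming it costs nothing, whereas a preempted request returns its blocks to the pool and loses its progress.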

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@github-actions

👋 Hi! Thank you for contributing.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@sducouedic sducouedic marked this pull request as draft May 21, 2026 18:32
maxdebayser and others added 10 commits May 21, 2026 14:33
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
…nto decode_holdback

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
@sducouedic sducouedic changed the title [WIP] Implement request hold back [WIP] Implement request pausing May 28, 2026
sducouedic added 10 commits May 29, 2026 10:13
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>

2 participants