Request-cost-aware router fairness priority#897
Request-cost-aware router fairness priority#897JagjeevanAK wants to merge 8 commits intovolcano-sh:mainfrom
Conversation
…dded the required V(4) enqueue debug log fields, and added targeted router unit tests for output-token parsing, weight application, historical-usage inclusion, request-size sensitivity, and log-gating behavior Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
|
Welcome @JagjeevanAK! It looks like this is your first PR to volcano-sh/kthena 🎉 |
There was a problem hiding this comment.
Code Review
This pull request updates the fairness scheduling logic by replacing generic token and request weights with specific input and output token weights. It introduces a new priority calculation that combines historical usage with an estimated request cost based on input tokens and requested output tokens. Comprehensive tests were added to verify these changes. Feedback highlights the need to handle multiple numeric types when parsing token fields and identifies inconsistencies between the new router logic and the existing datastore priority refresh mechanism.
There was a problem hiding this comment.
Pull request overview
This PR updates the router’s fairness enqueue priority to account for the estimated cost of the current request (input + requested output tokens) in addition to the user/model’s historical weighted usage, improving admission ordering for expensive requests while keeping the sliding-window fairness model.
Changes:
- Replace enqueue priority calculation with
historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens). - Add parsing for requested output token budget from
max_completion_tokens/max_tokens. - Add V(4) debug logging for enqueue priority components and introduce unit tests for the new behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/kthena-router/router/router.go | Implements cost-aware enqueue priority calculation and adds debug logging for priority components. |
| pkg/kthena-router/router/router_fairness_test.go | Adds unit tests covering output-budget parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
|
Addressed AI review findings in commit 68c4b4c on this PR branch:
Scope note for maintainers: for issue #757 this change intentionally uses Proposal context is documented on issue #757: #757 (comment) |
|
I would also like to see the proposal doc in this pr |
|
Sure, I have added Proposal in PR description. |
|
Adding label DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Pull request overview
This PR enhances the router’s fairness scheduling by making enqueue priority request-cost-aware (estimated input + requested output tokens) while preserving the existing sliding-window historical usage model.
Changes:
- Add request-cost components (input tokens + requested output token budget) into enqueue priority calculation and log detailed enqueue priority at klog V(4).
- Persist a per-request
PriorityOffsetso dequeue-time priority refresh can preserve request-size sensitivity. - Add unit tests covering output-budget parsing and priority composition behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/kthena-router/router/router.go | Computes request-cost-aware enqueue priority; adds V(4) enqueue debug log; stores per-request offset for refresh. |
| pkg/kthena-router/datastore/fairness_queue.go | Preserves request-size component across priority refresh/rebuild via PriorityOffset. |
| pkg/kthena-router/datastore/fairness_queue_test.go | Adds tests ensuring offset is preserved across refresh and heap rebuild. |
| pkg/kthena-router/router/router_fairness_test.go | Adds unit tests for output token parsing and request-cost-aware priority behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@JagjeevanAK Mind adding a doc under https://github.com/volcano-sh/kthena/tree/main/docs/proposal |
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
|
hey can you check now ? CC: @hzxuzhonghu |
There was a problem hiding this comment.
Pull request overview
Enhances Kthena router fairness scheduling so enqueue ordering is request-cost-aware by combining historical sliding-window usage with an estimated cost for the in-flight request, while preserving request-size sensitivity across queue priority refreshes.
Changes:
- Compute enqueue priority as
historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens)and add a V(4) debug log for the priority components. - Persist an enqueue-time
PriorityOffseton queued requests so dequeue-time refresh/rebuild keeps request-size sensitivity. - Add unit tests for output-budget parsing, priority calculation, request-size sensitivity, and offset preservation during refresh/rebuild.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/kthena-router/router/router.go | Implements request-cost-aware enqueue priority calculation, logging, and enqueues PriorityOffset. |
| pkg/kthena-router/datastore/fairness_queue.go | Preserves request-cost component during priority refresh/rebuild via PriorityOffset. |
| pkg/kthena-router/router/router_fairness_test.go | Adds unit tests for output token parsing, cost calculation, request-size sensitivity, and log gating. |
| pkg/kthena-router/datastore/fairness_queue_test.go | Adds tests asserting PriorityOffset is preserved across refresh and heap rebuild. |
| docs/proposal/request-cost-aware-fairness-priority.md | Adds an RFC-style proposal documenting the intended behavior and scope. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Donot contain merge-commits , i will first take a look at the proposal |
|
Okay sorry for that will keep in mind |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
Critical Review
Overall the design direction is sound — incorporating estimated request cost into enqueue priority is a clear improvement. However, there are a few issues ranging from a correctness bug to a silent behavioral regression that should be addressed before merge.
See inline comments for details.
hzxuzhonghu
left a comment
There was a problem hiding this comment.
Left a few comments inline. The overall direction makes sense to me, but there are a couple of things that need fixing before this can go in.
| _ = flag.Set("v", originalV) | ||
| }() | ||
|
|
||
| _ = flag.Set("v", "0") |
There was a problem hiding this comment.
Mutating the global klog verbosity flag will race if anyone adds t.Parallel() to this or a sibling test. Might be worth a comment saying this test must stay sequential, or just accept the coupling and skip the verbosity toggle.
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
|
Hey @hzxuzhonghu thanks for your review and I have addressed your concerns in recent commit. |
Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>
There was a problem hiding this comment.
Pull request overview
Enhances Kthena router fairness scheduling by making enqueue priority request-cost-aware, combining historical sliding-window usage with an estimate of the current request’s token cost, while preserving request-size sensitivity during priority refresh.
Changes:
- Extend router enqueue priority calculation to add estimated request cost (input tokens + requested output tokens) to historical fairness usage.
- Persist enqueue-time request cost as a
PriorityOffsetso dequeue-time refresh/rebuild keeps request-size sensitivity. - Add unit tests for output-token parsing, priority calculation, refresh/rebuild offset preservation, and log-gating behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/kthena-router/router/router.go | Adds request-cost-aware enqueue priority calculation, output-token parsing helpers, V(4) enqueue debug log, and propagates PriorityOffset into enqueued requests. |
| pkg/kthena-router/router/router_fairness_test.go | New unit tests for requested output token parsing, priority component math, request-size sensitivity, and verbosity-gated logging behavior. |
| pkg/kthena-router/datastore/fairness_queue.go | Adds PriorityOffset to queued requests and preserves it during priority refresh and heap rebuild. |
| pkg/kthena-router/datastore/fairness_queue_test.go | Adds coverage ensuring refresh/rebuild preserve PriorityOffset and avoid unnecessary reinserts when final priority is unchanged. |
| docs/proposal/request-cost-aware-fairness-priority.md | Documents the rationale, formula, configuration scope, refresh behavior, and test plan for the request-cost-aware priority change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR enhances Kthena router fairness scheduling by making enqueue priority “request-cost-aware”, combining historical weighted usage with an estimate of the current request’s cost (input tokens + requested output tokens), while preserving request-size sensitivity during dequeue-time priority refresh.
Changes:
- Add enqueue-time priority computation that sums historical fairness usage with estimated current request cost, and emit a V(4) debug log of the components.
- Persist enqueue-time request cost as a per-request
PriorityOffsetso priority refresh/rebuild keeps request-size sensitivity. - Update/extend unit tests around priority calculation/refresh, and harden e2e rate-limit + metrics tests to reduce flakiness from retries/reconciliation.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
pkg/kthena-router/router/router.go |
Computes request-cost-aware enqueue priority and logs priority components; stores request cost as PriorityOffset. |
pkg/kthena-router/datastore/fairness_queue.go |
Preserves enqueue-time PriorityOffset when refreshing/rebuilding priorities. |
pkg/kthena-router/datastore/fairness_queue_test.go |
Adds coverage ensuring PriorityOffset is preserved across refresh/rebuild paths. |
pkg/kthena-router/router/router_fairness_test.go |
New unit tests for output-token parsing, priority math, historical usage inclusion, request-size sensitivity, and log-gating invariance. |
test/e2e/router/shared.go |
Refactors rate-limit exhaustion logic and stabilizes metrics assertions by avoiding retrying calls for measured requests. |
docs/proposal/request-cost-aware-fairness-priority.md |
Adds an RFC-style proposal documenting the intended behavior and configuration scope. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fairnessTimeout: parseFairnessTimeout(), | ||
| inputTokenWeight: parseEnvFloat("FAIRNESS_INPUT_TOKEN_WEIGHT", 1.0), | ||
| outputTokenWeight: parseEnvFloat("FAIRNESS_OUTPUT_TOKEN_WEIGHT", 1.0), | ||
| priorityTokenWeight: parseEnvFloat("FAIRNESS_PRIORITY_TOKEN_WEIGHT", 1.0), | ||
| priorityRequestNumWeight: parseEnvFloat("FAIRNESS_PRIORITY_REQUEST_NUM_WEIGHT", 0.0), |
|
hey @hzxuzhonghu I think now all the test are passing and now issue as well |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
same critical question, how do you estimate the output token length?
We can implement after the proposal approved
What type of PR is this?
/kind enhancement
What this PR does / why we need it:
This PR makes router fairness enqueue priority request-cost-aware by combining historical weighted usage with the current request’s estimated cost. The goal is to keep the existing sliding-window fairness model while making enqueue ordering reflect both prior usage and the size of the request being admitted.
Which issue(s) this PR fixes:
Fixes #757
Special notes for your reviewer:
The attached RFC plan as posted on issue while workspace and the implementation scope is intentionally narrow: it reuses the existing fairness weights, avoids API/CRD changes, and adds a V(4) enqueue debug log plus unit coverage for output-token parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating behavior.
I used AI assistance while preparing this PR description and implementation summary.
Does this PR introduce a user-facing change?:
Yes.
Router fairness enqueue priority now includes estimated current request cost in addition to historical weighted usage, improving admission ordering for larger requests.
PROPOSAL
I drafted an RFC-style proposal for this issue focused on a narrow fairness improvement in router enqueue priority.
Summary:
This proposal keeps the existing sliding-window fairness model, and adds estimated current request cost at enqueue time so requests are not ranked by historical aggregate alone.
Proposed priority:
Key changes:
Configuration/logging scope:
Compatibility:
Test plan:
Non-goals for this issue: