Request-cost-aware router fairness priority by JagjeevanAK · Pull Request #897 · volcano-sh/kthena

JagjeevanAK · 2026-04-19T01:57:38Z

What type of PR is this?

/kind enhancement

What this PR does / why we need it:

This PR makes router fairness enqueue priority request-cost-aware by combining historical weighted usage with the current request’s estimated cost. The goal is to keep the existing sliding-window fairness model while making enqueue ordering reflect both prior usage and the size of the request being admitted.

Which issue(s) this PR fixes:
Fixes #757

Special notes for your reviewer:

The attached RFC plan as posted on issue while workspace and the implementation scope is intentionally narrow: it reuses the existing fairness weights, avoids API/CRD changes, and adds a V(4) enqueue debug log plus unit coverage for output-token parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating behavior.

I used AI assistance while preparing this PR description and implementation summary.

Does this PR introduce a user-facing change?:

Yes.

Router fairness enqueue priority now includes estimated current request cost in addition to historical weighted usage, improving admission ordering for larger requests.

PROPOSAL

I drafted an RFC-style proposal for this issue focused on a narrow fairness improvement in router enqueue priority.

Summary:
This proposal keeps the existing sliding-window fairness model, and adds estimated current request cost at enqueue time so requests are not ranked by historical aggregate alone.

Proposed priority:

estimatedRequestCost = inputWeight * inputTokens + outputWeight * requestedOutputTokens
finalPriority = historicalWeightedUsage + estimatedRequestCost
Lower numeric value still means higher priority (unchanged queue behavior)

Key changes:

Keep the sliding-window token tracker as the source of historical usage per (user, model)
Update router enqueue priority helper in router.go to use:
- userID
- modelName
- inputTokens
- parsed request body
Resolve requested output tokens in this order:
- max_completion_tokens
- max_tokens
- else 0
Treat missing, invalid, or negative values as 0
Keep post-response accounting unchanged (real usage still updates token tracker after completion)

Configuration/logging scope:

Reuse only FAIRNESS_INPUT_TOKEN_WEIGHT and FAIRNESS_OUTPUT_TOKEN_WEIGHT for this issue
No CRD, CLI flag, or Helm value changes
No expansion into FAIRNESS_PRIORITY_REQUEST_NUM_WEIGHT in this issue
Add one V(4) enqueue debug log with:
- requestID, userID, model, historicalUsage, inputTokens, requestedOutputTokens, estimatedRequestCost, finalPriority
No new metrics required

Compatibility:

No public API/CRD changes
Internal router helper signature changes only
Fairness queue semantics remain the same except for the new request-cost input

Test plan:

Unit tests for output token parsing:
- max_completion_tokens, max_tokens, missing, invalid, negative
Unit tests for weight application and final score calculation
Router-level tests for:
- request-size sensitivity with same historical usage
- historical usage inclusion
- zero/missing output budget fallback to input-only cost
- debug-log gating behavior
Scope is unit tests only (no e2e/load testing in this issue)

Non-goals for this issue:

Cross-model fairness
Request-count weighting redesign
Queue refresh redesign
HA/distributed fairness state

…dded the required V(4) enqueue debug log fields, and added targeted router unit tests for output-token parsing, weight application, historical-usage inclusion, request-size sensitivity, and log-gating behavior Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

volcano-sh-bot · 2026-04-19T01:57:50Z

Welcome @JagjeevanAK! It looks like this is your first PR to volcano-sh/kthena 🎉

gemini-code-assist

Code Review

This pull request updates the fairness scheduling logic by replacing generic token and request weights with specific input and output token weights. It introduces a new priority calculation that combines historical usage with an estimated request cost based on input tokens and requested output tokens. Comprehensive tests were added to verify these changes. Feedback highlights the need to handle multiple numeric types when parsing token fields and identifies inconsistencies between the new router logic and the existing datastore priority refresh mechanism.

Copilot

Pull request overview

This PR updates the router’s fairness enqueue priority to account for the estimated cost of the current request (input + requested output tokens) in addition to the user/model’s historical weighted usage, improving admission ordering for expensive requests while keeping the sliding-window fairness model.

Changes:

Replace enqueue priority calculation with historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens).
Add parsing for requested output token budget from max_completion_tokens / max_tokens.
Add V(4) debug logging for enqueue priority components and introduce unit tests for the new behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
pkg/kthena-router/router/router.go	Implements cost-aware enqueue priority calculation and adds debug logging for priority components.
pkg/kthena-router/router/router_fairness_test.go	Adds unit tests covering output-budget parsing, weight application, historical usage inclusion, request-size sensitivity, and log-gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

JagjeevanAK · 2026-04-19T02:16:46Z

Addressed AI review findings in commit 68c4b4c on this PR branch:

Requested output-token parsing now accepts float64, int, int64, and json.Number (invalid or negative values still fallback to 0).
Fairness queue refresh/rebuild now preserves enqueue-time request-cost contribution via a stored priority offset, so refresh no longer drops request-size sensitivity.

Scope note for maintainers: for issue #757 this change intentionally uses FAIRNESS_INPUT_TOKEN_WEIGHT and FAIRNESS_OUTPUT_TOKEN_WEIGHT for request-cost-aware enqueue behavior, while broader compatibility/deprecation handling around FAIRNESS_PRIORITY_* is being kept out of this narrowly scoped issue unless maintainers want it folded in here.

Proposal context is documented on issue #757: #757 (comment)

hzxuzhonghu · 2026-04-20T03:19:55Z

I would also like to see the proposal doc in this pr

JagjeevanAK · 2026-04-22T04:48:07Z

Sure, I have added Proposal in PR description.

volcano-sh-bot · 2026-04-22T04:50:23Z

Adding label do-not-merge/contains-merge-commits because PR contains merge commits, which are not allowed in this repository.
Use git rebase to reapply your commits on top of the target branch. Detailed instructions for doing so can be found here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copilot

Pull request overview

This PR enhances the router’s fairness scheduling by making enqueue priority request-cost-aware (estimated input + requested output tokens) while preserving the existing sliding-window historical usage model.

Changes:

Add request-cost components (input tokens + requested output token budget) into enqueue priority calculation and log detailed enqueue priority at klog V(4).
Persist a per-request PriorityOffset so dequeue-time priority refresh can preserve request-size sensitivity.
Add unit tests covering output-budget parsing and priority composition behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
pkg/kthena-router/router/router.go	Computes request-cost-aware enqueue priority; adds V(4) enqueue debug log; stores per-request offset for refresh.
pkg/kthena-router/datastore/fairness_queue.go	Preserves request-size component across priority refresh/rebuild via `PriorityOffset`.
pkg/kthena-router/datastore/fairness_queue_test.go	Adds tests ensuring offset is preserved across refresh and heap rebuild.
pkg/kthena-router/router/router_fairness_test.go	Adds unit tests for output token parsing and request-cost-aware priority behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

hzxuzhonghu · 2026-04-23T08:03:58Z

@JagjeevanAK Mind adding a doc under https://github.com/volcano-sh/kthena/tree/main/docs/proposal

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

JagjeevanAK · 2026-04-23T09:10:58Z

hey can you check now ?

CC: @hzxuzhonghu

Copilot

Pull request overview

Enhances Kthena router fairness scheduling so enqueue ordering is request-cost-aware by combining historical sliding-window usage with an estimated cost for the in-flight request, while preserving request-size sensitivity across queue priority refreshes.

Changes:

Compute enqueue priority as historicalUsage + estimatedRequestCost(inputTokens, requestedOutputTokens) and add a V(4) debug log for the priority components.
Persist an enqueue-time PriorityOffset on queued requests so dequeue-time refresh/rebuild keeps request-size sensitivity.
Add unit tests for output-budget parsing, priority calculation, request-size sensitivity, and offset preservation during refresh/rebuild.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
pkg/kthena-router/router/router.go	Implements request-cost-aware enqueue priority calculation, logging, and enqueues `PriorityOffset`.
pkg/kthena-router/datastore/fairness_queue.go	Preserves request-cost component during priority refresh/rebuild via `PriorityOffset`.
pkg/kthena-router/router/router_fairness_test.go	Adds unit tests for output token parsing, cost calculation, request-size sensitivity, and log gating.
pkg/kthena-router/datastore/fairness_queue_test.go	Adds tests asserting `PriorityOffset` is preserved across refresh and heap rebuild.
docs/proposal/request-cost-aware-fairness-priority.md	Adds an RFC-style proposal documenting the intended behavior and scope.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

hzxuzhonghu · 2026-04-23T11:23:33Z

Donot contain merge-commits , i will first take a look at the proposal

JagjeevanAK · 2026-04-23T13:27:43Z

Okay sorry for that will keep in mind

hzxuzhonghu

Critical Review

Overall the design direction is sound — incorporating estimated request cost into enqueue priority is a clear improvement. However, there are a few issues ranging from a correctness bug to a silent behavioral regression that should be addressed before merge.

See inline comments for details.

hzxuzhonghu

Left a few comments inline. The overall direction makes sense to me, but there are a couple of things that need fixing before this can go in.

hzxuzhonghu · 2026-04-24T01:17:57Z

+		_ = flag.Set("v", originalV)
+	}()
+
+	_ = flag.Set("v", "0")


Mutating the global klog verbosity flag will race if anyone adds t.Parallel() to this or a sibling test. Might be worth a comment saying this test must stay sequential, or just accept the coupling and skip the verbosity toggle.

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

JagjeevanAK · 2026-04-25T01:48:51Z

Hey @hzxuzhonghu thanks for your review and I have addressed your concerns in recent commit.

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

Copilot

Pull request overview

Enhances Kthena router fairness scheduling by making enqueue priority request-cost-aware, combining historical sliding-window usage with an estimate of the current request’s token cost, while preserving request-size sensitivity during priority refresh.

Changes:

Extend router enqueue priority calculation to add estimated request cost (input tokens + requested output tokens) to historical fairness usage.
Persist enqueue-time request cost as a PriorityOffset so dequeue-time refresh/rebuild keeps request-size sensitivity.
Add unit tests for output-token parsing, priority calculation, refresh/rebuild offset preservation, and log-gating behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
pkg/kthena-router/router/router.go	Adds request-cost-aware enqueue priority calculation, output-token parsing helpers, V(4) enqueue debug log, and propagates `PriorityOffset` into enqueued requests.
pkg/kthena-router/router/router_fairness_test.go	New unit tests for requested output token parsing, priority component math, request-size sensitivity, and verbosity-gated logging behavior.
pkg/kthena-router/datastore/fairness_queue.go	Adds `PriorityOffset` to queued requests and preserves it during priority refresh and heap rebuild.
pkg/kthena-router/datastore/fairness_queue_test.go	Adds coverage ensuring refresh/rebuild preserve `PriorityOffset` and avoid unnecessary reinserts when final priority is unchanged.
docs/proposal/request-cost-aware-fairness-priority.md	Documents the rationale, formula, configuration scope, refresh behavior, and test plan for the request-cost-aware priority change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

volcano-sh-bot · 2026-04-28T14:38:04Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull request overview

This PR enhances Kthena router fairness scheduling by making enqueue priority “request-cost-aware”, combining historical weighted usage with an estimate of the current request’s cost (input tokens + requested output tokens), while preserving request-size sensitivity during dequeue-time priority refresh.

Changes:

Add enqueue-time priority computation that sums historical fairness usage with estimated current request cost, and emit a V(4) debug log of the components.
Persist enqueue-time request cost as a per-request PriorityOffset so priority refresh/rebuild keeps request-size sensitivity.
Update/extend unit tests around priority calculation/refresh, and harden e2e rate-limit + metrics tests to reduce flakiness from retries/reconciliation.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`pkg/kthena-router/router/router.go`	Computes request-cost-aware enqueue priority and logs priority components; stores request cost as `PriorityOffset`.
`pkg/kthena-router/datastore/fairness_queue.go`	Preserves enqueue-time `PriorityOffset` when refreshing/rebuilding priorities.
`pkg/kthena-router/datastore/fairness_queue_test.go`	Adds coverage ensuring `PriorityOffset` is preserved across refresh/rebuild paths.
`pkg/kthena-router/router/router_fairness_test.go`	New unit tests for output-token parsing, priority math, historical usage inclusion, request-size sensitivity, and log-gating invariance.
`test/e2e/router/shared.go`	Refactors rate-limit exhaustion logic and stabilizes metrics assertions by avoiding retrying calls for measured requests.
`docs/proposal/request-cost-aware-fairness-priority.md`	Adds an RFC-style proposal documenting the intended behavior and configuration scope.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+		fairnessTimeout:          parseFairnessTimeout(),
+		inputTokenWeight:         parseEnvFloat("FAIRNESS_INPUT_TOKEN_WEIGHT", 1.0),
+		outputTokenWeight:        parseEnvFloat("FAIRNESS_OUTPUT_TOKEN_WEIGHT", 1.0),
+		priorityTokenWeight:      parseEnvFloat("FAIRNESS_PRIORITY_TOKEN_WEIGHT", 1.0),
+		priorityRequestNumWeight: parseEnvFloat("FAIRNESS_PRIORITY_REQUEST_NUM_WEIGHT", 0.0),


JagjeevanAK · 2026-04-28T15:12:39Z

hey @hzxuzhonghu I think now all the test are passing and now issue as well

hzxuzhonghu

same critical question, how do you estimate the output token length?

We can implement after the proposal approved

Copilot AI review requested due to automatic review settings April 19, 2026 01:57

volcano-sh-bot added the kind/enhancement New feature or request label Apr 19, 2026

volcano-sh-bot requested review from YaoZengzeng and hzxuzhonghu April 19, 2026 01:57

volcano-sh-bot added the size/L label Apr 19, 2026

gemini-code-assist Bot reviewed Apr 19, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go Outdated

Comment thread pkg/kthena-router/router/router.go

Copilot started reviewing on behalf of JagjeevanAK April 19, 2026 01:59 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go Outdated

Comment thread pkg/kthena-router/router/router.go Outdated

Fix fairness refresh consistency and token parsing

68c4b4c

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

Merge branch 'main' into issue-757-request-cost-aware-fairness-priority

684c059

Copilot AI review requested due to automatic review settings April 22, 2026 04:50

volcano-sh-bot added the do-not-merge/contains-merge-commits label Apr 22, 2026

Copilot started reviewing on behalf of JagjeevanAK April 22, 2026 04:50 View session

Copilot AI reviewed Apr 22, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go Outdated

Comment thread pkg/kthena-router/router/router.go

Comment thread pkg/kthena-router/datastore/fairness_queue.go Outdated

JagjeevanAK and others added 2 commits April 23, 2026 14:29

Merge branch 'main' into issue-757-request-cost-aware-fairness-priority

4c5b917

doc: proposal docs for cost aware request and fairness priority

7549cc8

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

Copilot AI review requested due to automatic review settings April 23, 2026 09:08

volcano-sh-bot added size/XL and removed size/L labels Apr 23, 2026

Copilot started reviewing on behalf of JagjeevanAK April 23, 2026 09:08 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go

Comment thread pkg/kthena-router/datastore/fairness_queue_test.go

Comment thread pkg/kthena-router/datastore/fairness_queue.go Outdated

hzxuzhonghu reviewed Apr 24, 2026

View reviewed changes

Updated according to review

f3b5372

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

hzxuzhonghu reviewed Apr 25, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go Outdated

Updated according to review

c65d2f7

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

Copilot AI review requested due to automatic review settings April 26, 2026 14:01

Copilot started reviewing on behalf of JagjeevanAK April 26, 2026 14:02 View session

Copilot AI reviewed Apr 26, 2026

View reviewed changes

Comment thread pkg/kthena-router/router/router.go

Comment thread pkg/kthena-router/router/router.go

test fix

c38a0e6

Signed-off-by: Jagjeevan Kashid <jagjeevandev97@gmail.com>

Copilot AI review requested due to automatic review settings April 28, 2026 14:37

Copilot started reviewing on behalf of JagjeevanAK April 28, 2026 14:38 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

hzxuzhonghu reviewed Apr 29, 2026

View reviewed changes

Conversation

JagjeevanAK commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PROPOSAL

Uh oh!

volcano-sh-bot commented Apr 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

JagjeevanAK commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hzxuzhonghu commented Apr 20, 2026

Uh oh!

JagjeevanAK commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

volcano-sh-bot commented Apr 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hzxuzhonghu commented Apr 23, 2026

Uh oh!

JagjeevanAK commented Apr 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hzxuzhonghu commented Apr 23, 2026

Uh oh!

JagjeevanAK commented Apr 23, 2026

Uh oh!

hzxuzhonghu left a comment

Choose a reason for hiding this comment

Critical Review

Uh oh!

hzxuzhonghu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hzxuzhonghu Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

JagjeevanAK commented Apr 25, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

volcano-sh-bot commented Apr 28, 2026

JagjeevanAK commented Apr 19, 2026 •

edited

Loading

JagjeevanAK commented Apr 19, 2026 •

edited

Loading

JagjeevanAK commented Apr 22, 2026 •

edited

Loading