
Session affinity routing #178

Open
szedan-rh wants to merge 1 commit into llm-d:main from szedan-rh:session-affinity-routing

@szedan-rh

Contributor

The problem: When a user has a multi-turn conversation, each request might land on a different server. That server has to re-process all the previous context from scratch — wasting GPU time and adding latency.

The fix: We stick each conversation to one server.


How:

  1. First request comes in — we give it a session ID (or the client sends one)
  2. We look at which servers are healthy, then use the session ID to pick one
  3. Every future request with that same session ID goes to the same server
  4. The server's cache stays warm — no re-processing
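Steps 2 and 3 can be sketched as a pure function of the session ID and the current set of healthy servers. The sketch below uses rendezvous (highest-random-weight) hashing, which is a swapped-in technique chosen for brevity and not necessarily what this PR implements. The names `score` and `pickBackend` and the pod names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// score hashes a (session, backend) pair to a 64-bit value.
// Hypothetical helper; a NUL byte separates the two parts.
func score(sessionID, backend string) uint64 {
	sum := sha256.Sum256([]byte(sessionID + "\x00" + backend))
	return binary.BigEndian.Uint64(sum[:8])
}

// pickBackend returns the healthy backend with the highest score for the
// session ID, or "" when no backend is healthy. The result does not depend
// on the order of the healthy slice.
func pickBackend(sessionID string, healthy []string) string {
	best, bestScore := "", uint64(0)
	for i, b := range healthy {
		s := score(sessionID, b)
		if i == 0 || s > bestScore || (s == bestScore && b < best) {
			best, bestScore = b, s
		}
	}
	return best
}

func main() {
	healthy := []string{"pod-a", "pod-b", "pod-c"}
	first := pickBackend("session-123", healthy)
	second := pickBackend("session-123", healthy)
	fmt.Println(first == second) // prints true: same session ID, same backend
}
```

Because the choice is computed from the session ID alone, no per-session state is needed to keep a conversation on one server while the healthy set is unchanged.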

If a server goes down:

  • Only the conversations that were on that server get moved
  • Everyone else is unaffected
  • The moved conversations lose their cache, but it rebuilds naturally
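The failure behaviour above is the usual property of a consistent hash ring. The following is a minimal sketch under stated assumptions: the `ring` type, the virtual-node count of 100, and the synthetic session IDs are all hypothetical and are not taken from the PR's code. It counts how many sessions change server when one server is removed, and how many of those were not on the removed server.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// ring is a hypothetical consistent hash ring with virtual nodes.
type ring struct {
	points []uint64          // sorted virtual-node positions
	owner  map[uint64]string // position -> backend name
}

func hashKey(s string) uint64 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

func newRing(backends []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint64]string)}
	for _, b := range backends {
		for i := 0; i < vnodes; i++ {
			p := hashKey(fmt.Sprintf("%s#%d", b, i))
			if _, taken := r.owner[p]; taken {
				continue // position collision: keep the first owner
			}
			r.owner[p] = b
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// lookup returns the owner of the first ring position at or after the
// session's hash, wrapping around to the start of the ring.
func (r *ring) lookup(sessionID string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(sessionID)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

// countMoves reports how many of n synthetic sessions change backend when
// `down` is removed, and how many of those were not on `down` beforehand.
func countMoves(backends []string, down string, n int) (moved, unexpected int) {
	before := newRing(backends, 100)
	var rest []string
	for _, b := range backends {
		if b != down {
			rest = append(rest, b)
		}
	}
	after := newRing(rest, 100)
	for i := 0; i < n; i++ {
		id := fmt.Sprintf("session-%d", i)
		was, now := before.lookup(id), after.lookup(id)
		if was != now {
			moved++
			if was != down {
				unexpected++
			}
		}
	}
	return moved, unexpected
}

func main() {
	moved, unexpected := countMoves([]string{"pod-a", "pod-b", "pod-c"}, "pod-b", 1000)
	// unexpected is 0: only sessions that were on pod-b are remapped.
	fmt.Printf("moved=%d unexpected=%d\n", moved, unexpected)
}
```

Removing a server only deletes that server's positions from the ring, so every session whose nearest position belonged to another server keeps resolving to the same place.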

That's it. Same session, same server, faster responses.

@github-actions (bot) added the labels size/XXL (a PR that changes 1000+ lines, ignoring generated files) and do-not-merge/hold (a PR that should not merge because someone has issued a /hold command) on Jun 18, 2026
@github-actions


⚠️ Large PR detected

Your PR is large. Please consider breaking it into multiple PRs.

The do-not-merge/hold label has been added and can be removed by the reviewers based on their judgement.

@szedan-rh

Contributor Author

@nirrozenbaum - Could you please review?

@szedan-rh

Contributor Author

The PR is large because it includes an HTML visualization of how the flow works.

@szedan-rh force-pushed the session-affinity-routing branch from 4c60beb to 2a88a36 on June 18, 2026 16:18
@github-actions (bot) added the size/XL label (a PR that changes 500-999 lines, ignoring generated files) and removed the size/XXL label on Jun 18, 2026

@ronenkat ronenkat left a comment

Contributor

Session affinity is an important feature and should be made compatible with IPP, which chooses among models (of which there are fewer) rather than among vLLM nodes.

Please also:

  1. Add a README for the plugin.
  2. Document the code at the function level.

Comment thread pkg/framework/plugins/modelselector/filter/sessionaffinity/plugin.go Outdated
@ronenkat

Contributor

Please document the requirements, for example whether storing the session ID beyond the request context is not allowed.

@szedan-rh force-pushed the session-affinity-routing branch from 2a88a36 to 2cfdbb8 on June 22, 2026 08:08

@szedan-rh requested a review from ronenkat on June 22, 2026 08:09
@szedan-rh force-pushed the session-affinity-routing branch from 2cfdbb8 to 89b7845 on June 22, 2026 08:22

@ronenkat ronenkat left a comment

Contributor

Thank you.

Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/plugin.go Outdated
Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/plugin.go Outdated
Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/plugin.go Outdated
@szedan-rh force-pushed the session-affinity-routing branch from 89b7845 to 32a8c55 on June 22, 2026 10:11

@ronenkat ronenkat left a comment

Contributor

Nice.
We should add a stress test for the session cache to validate that there is no lock contention when it is updated. This can be done in a follow-up.

Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/plugin.go Outdated
Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/sessioncache.go Outdated
Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/sessioncache.go Outdated
@szedan-rh force-pushed the session-affinity-routing branch from 32a8c55 to 21a749c on June 22, 2026 13:39
@github-actions (bot) added the size/XXL label (a PR that changes 1000+ lines, ignoring generated files) and removed the size/XL label on Jun 22, 2026
@szedan-rh requested a review from ronenkat on June 22, 2026 13:50
Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/sessioncache.go Outdated
Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/sessioncache.go Outdated
@szedan-rh requested a review from ronenkat on June 22, 2026 15:02

@ronenkat ronenkat left a comment

Contributor

Added suggestions inline.

Comment on lines +72 to +74
    if c.nowFunc().Sub(entry.lastUsed) > c.ttl {
        return "", false
    }

Contributor

Suggested change
-   if c.nowFunc().Sub(entry.lastUsed) > c.ttl {
-       return "", false
-   }


// Pass 1: sweep all TTL-expired entries from the tail
elem := c.order.Back()
for elem != nil {

Contributor

Suggested change
-   for elem != nil {
+   for elem != nil && removed < c.minEvictQuantity * 10 {
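To make the two review snippets easier to place, here is a minimal sketch of a TTL plus LRU session cache built on `container/list`. Only `nowFunc`, `ttl`, `order`, `minEvictQuantity`, `lastUsed`, and the bounded "Pass 1" loop come from the snippets above. Everything else (the type names, `Set`, `maxEntries`, the second eviction pass, `fakeClock`) is an assumption made for illustration and may differ from the PR's `sessioncache.go`. The sketch keeps the TTL check in `Get` as it appears in the code under review.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
	"time"
)

type entry struct {
	id, backend string
	lastUsed    time.Time
}

// sessionCache is a hypothetical cache; the front of order is most recent.
type sessionCache struct {
	mu               sync.Mutex
	ttl              time.Duration
	maxEntries       int
	minEvictQuantity int
	nowFunc          func() time.Time
	order            *list.List
	items            map[string]*list.Element
}

func newSessionCache(ttl time.Duration, maxEntries, minEvict int, now func() time.Time) *sessionCache {
	return &sessionCache{ttl: ttl, maxEntries: maxEntries, minEvictQuantity: minEvict,
		nowFunc: now, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the backend for id, treating entries older than ttl as misses.
func (c *sessionCache) Get(id string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	elem, ok := c.items[id]
	if !ok {
		return "", false
	}
	e, now := elem.Value.(*entry), c.nowFunc()
	if now.Sub(e.lastUsed) > c.ttl {
		return "", false
	}
	e.lastUsed = now
	c.order.MoveToFront(elem)
	return e.backend, true
}

// Set stores or refreshes a mapping, evicting first when the cache is full.
func (c *sessionCache) Set(id, backend string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := c.nowFunc()
	if elem, ok := c.items[id]; ok {
		e := elem.Value.(*entry)
		e.backend, e.lastUsed = backend, now
		c.order.MoveToFront(elem)
		return
	}
	if len(c.items) >= c.maxEntries {
		c.evict(now)
	}
	c.items[id] = c.order.PushFront(&entry{id: id, backend: backend, lastUsed: now})
}

// evict must be called with c.mu held.
func (c *sessionCache) evict(now time.Time) {
	removed := 0
	// Pass 1: sweep TTL-expired entries from the tail, with bounded work.
	elem := c.order.Back()
	for elem != nil && removed < c.minEvictQuantity*10 {
		if now.Sub(elem.Value.(*entry).lastUsed) <= c.ttl {
			break // ordered by recency, so nothing closer to the front is expired
		}
		prev := elem.Prev()
		c.remove(elem)
		removed++
		elem = prev
	}
	// Pass 2 (assumption): fall back to plain LRU eviction from the tail.
	for removed < c.minEvictQuantity && c.order.Len() > 0 {
		c.remove(c.order.Back())
		removed++
	}
}

func (c *sessionCache) remove(elem *list.Element) {
	delete(c.items, elem.Value.(*entry).id)
	c.order.Remove(elem)
}

// fakeClock is a manually advanced clock for deterministic demos.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clock := &fakeClock{}
	c := newSessionCache(time.Minute, 2, 1, clock.Now)
	c.Set("s1", "pod-a")
	clock.Advance(2 * time.Minute)
	_, ok := c.Get("s1")
	fmt.Println(ok) // prints false: the entry is older than the TTL
}
```

Capping the sweep at a multiple of `minEvictQuantity` bounds how long a single insert can hold the lock, which is relevant to the contention concern raised earlier in the review.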

Introduces a consistent hash routing plugin that maps session IDs to
backends, ensuring the same session always hits the same pod for KV
cache reuse. The plugin implements Filter (for the model-selector
pipeline) and ResponseProcessor (to echo X-Session-Id back to clients).

When no X-Session-Id header is present, a UUID v4 is generated and
returned in the response. Backends with weight <= 0 are excluded from
the hash ring so unhealthy pods are skipped automatically.

Fixes: llm-d#177
Signed-off-by: szedan <szedan@redhat.com>
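The request and response handling described in the commit message can be sketched as follows. This is an illustration under stated assumptions and not the plugin's actual code: `sessionIDFor`, `eligible`, `newUUIDv4`, the integer weight map, and the pod names are hypothetical, and plain `net/http` headers stand in for whatever request types the framework uses. Only the `X-Session-Id` header name, the UUID v4 fallback, and the `weight <= 0` exclusion come from the commit message.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net/http"
	"sort"
)

const sessionHeader = "X-Session-Id"

// newUUIDv4 returns a random version 4 UUID in its canonical text form.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// sessionIDFor returns the client's session ID, or a fresh UUID v4 when the
// header is absent or empty.
func sessionIDFor(req http.Header) (string, error) {
	if id := req.Get(sessionHeader); id != "" {
		return id, nil
	}
	return newUUIDv4()
}

// eligible returns the backends with weight > 0, sorted by name so the
// result does not depend on map iteration order.
func eligible(weights map[string]int) []string {
	var out []string
	for name, w := range weights {
		if w > 0 {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	id, err := sessionIDFor(http.Header{}) // no header: a UUID is generated
	if err != nil {
		panic(err)
	}
	resp := http.Header{}
	resp.Set(sessionHeader, id) // echo the session ID back to the client
	backends := eligible(map[string]int{"pod-a": 1, "pod-b": 0, "pod-c": 2})
	fmt.Println(len(resp.Get(sessionHeader)), backends) // prints: 36 [pod-a pod-c]
}
```

Echoing the ID matters mainly for the generated case: the client can only stay on the same pod if it learns the ID and sends it on later requests.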
@szedan-rh force-pushed the session-affinity-routing branch from 21a749c to 8400afa on June 22, 2026 20:08
@szedan-rh requested a review from ronenkat on June 22, 2026 21:51

@ronenkat ronenkat left a comment

Contributor

Thank you.

@ronenkat

Contributor

@nirrozenbaum please take a look.
Follow-ups noted:

  1. Tracking session affinity across IPP pods: "Session affinity: cross-node session sharing via shared datastore" (#187)
  2. Stress test on the session tracking cache

@nirrozenbaum removed the do-not-merge/hold label on Jun 23, 2026

Labels

size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


3 participants