Session affinity routing by szedan-rh · Pull Request #178 · llm-d/llm-d-inference-payload-processor

szedan-rh · 2026-06-18T15:41:51Z

The problem: When a user has a multi-turn conversation, each request might land on a different server. That server has to re-process all the previous context from scratch — wasting GPU time and adding latency.

The fix: We stick each conversation to one server.

How:

1. The first request comes in, and we give it a session ID (or the client sends one).
2. We look at which servers are healthy, then use the session ID to pick one.
3. Every future request with that same session ID goes to the same server.
4. The server's cache stays warm, so nothing is re-processed.
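The first step can be sketched as follows. This is only an illustration, not the PR's code: the header name `X-Session-Id` and the UUID v4 fallback come from the commit message in this thread, while `resolveSessionID` and `newUUIDv4` are hypothetical helper names.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net/http"
)

// sessionHeader is the header named in the PR's commit message.
const sessionHeader = "X-Session-Id"

// newUUIDv4 returns a random version 4 UUID in its canonical text form.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// resolveSessionID (hypothetical) returns the client-supplied session ID when
// the header is present, and otherwise generates a fresh one. The generated
// flag tells the caller whether the ID is new.
func resolveSessionID(h http.Header) (id string, generated bool, err error) {
	if id = h.Get(sessionHeader); id != "" {
		return id, false, nil
	}
	id, err = newUUIDv4()
	return id, err == nil, err
}
```

A caller would typically echo the resolved ID back in the response so that the client can send it on the next turn.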

If a server goes down:

- Only the conversations that were on that server get moved.
- Everyone else is unaffected.
- The moved conversations lose their cache, but it rebuilds naturally.
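The failover property above (only sessions on the failed server move) can be demonstrated with a small sketch. Note the swap: the commit message in this thread describes a consistent hash ring, whereas the code below uses rendezvous (highest-random-weight) hashing, chosen here only because it is shorter and has the same minimal-movement property. `score` and `pickServer` are hypothetical names.

```go
package main

import "hash/fnv"

// score hashes the (session, server) pair. For a given session, the healthy
// server with the highest score is chosen.
func score(sessionID, server string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(sessionID))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(server))
	return h.Sum64()
}

// pickServer returns the healthy server with the highest score for the
// session, or "" when no server is healthy.
func pickServer(sessionID string, healthy []string) string {
	best, bestScore := "", uint64(0)
	for _, s := range healthy {
		if sc := score(sessionID, s); best == "" || sc > bestScore {
			best, bestScore = s, sc
		}
	}
	return best
}
```

Because each session's choice depends only on its own per-server scores, removing a server changes the result only for sessions whose top-scoring server was the one removed.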

That's it. Same session, same server, faster responses.

github-actions · 2026-06-18T15:42:23Z

⚠️ Large PR detected

Your PR is large. Please consider breaking it into multiple PRs.

The do-not-merge/hold label has been added and can be removed by the reviewers based on their judgement.

szedan-rh · 2026-06-18T15:42:25Z

@nirrozenbaum - Could you please review?

szedan-rh · 2026-06-18T15:43:54Z

The PR is large because it includes an HTML visualization of how the flow works.

ronenkat

Session affinity is an important feature and should be made compatible with IPP which chooses models (which are fewer) and not vLLM nodes.

Please also:

- Add a README for the plugin.
- Document the code at the function level.

ronenkat · 2026-06-21T13:19:24Z

Please document the requirements, for example, if storing the session ID beyond the request context is not allowed, etc.

ronenkat

Thank you.

ronenkat

Nice.
We should add a stress test for the session cache to validate that concurrent updates do not cause contention on the cache. This can be done in a follow-up.
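One possible starting point for such a stress test is sketched below. It is an assumption-laden illustration, not the PR's cache: `sessionCache` here is a hypothetical map guarded by a `sync.RWMutex`, and `stress` is a hypothetical driver that hammers it from several goroutines and counts read-after-write misses.

```go
package main

import (
	"strconv"
	"sync"
)

// sessionCache is a hypothetical stand-in: a map from session ID to backend
// guarded by a read-write mutex.
type sessionCache struct {
	mu sync.RWMutex
	m  map[string]string
}

func newSessionCache() *sessionCache {
	return &sessionCache{m: make(map[string]string)}
}

func (c *sessionCache) Put(id, backend string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[id] = backend
}

func (c *sessionCache) Get(id string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	b, ok := c.m[id]
	return b, ok
}

func (c *sessionCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.m)
}

// stress starts workers goroutines. Each performs ops Put+Get pairs over a
// shared key space of size keys and the function returns how many Gets missed
// a key that had just been written. Nothing is deleted, so it should return 0.
func stress(c *sessionCache, workers, ops, keys int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	misses := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			local := 0
			for i := 0; i < ops; i++ {
				id := "session-" + strconv.Itoa((w+i)%keys)
				c.Put(id, "backend-"+strconv.Itoa(w))
				if _, ok := c.Get(id); !ok {
					local++
				}
			}
			mu.Lock()
			misses += local
			mu.Unlock()
		}(w)
	}
	wg.Wait()
	return misses
}
```

Running a driver like this under the race detector checks for data races only. Measuring lock contention itself would more likely need a parallel benchmark (for example with `b.RunParallel` from the `testing` package) and a mutex profile.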

ronenkat

Added suggestions inline

ronenkat · 2026-06-22T16:01:58Z

+	if c.nowFunc().Sub(entry.lastUsed) > c.ttl {
+		return "", false
+	}


Suggested change

	if c.nowFunc().Sub(entry.lastUsed) > c.ttl {
		return "", false
	}
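For context, the TTL check in this snippet can be placed in a minimal lookup sketch. Only `nowFunc`, `ttl`, and `lastUsed` appear in the reviewed code; the struct layout, the `Put`/`Get` methods, the refresh-on-read behavior, and the `manualClock` test clock are all assumptions made for illustration.

```go
package main

import (
	"sync"
	"time"
)

// cacheEntry records where a session was routed and when it was last seen.
type cacheEntry struct {
	backend  string
	lastUsed time.Time
}

// sessionCache is a hypothetical stand-in for the PR's cache type.
type sessionCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	nowFunc func() time.Time // injectable clock, as in the reviewed snippet
	entries map[string]*cacheEntry
}

func newSessionCache(ttl time.Duration, nowFunc func() time.Time) *sessionCache {
	if nowFunc == nil {
		nowFunc = time.Now
	}
	return &sessionCache{ttl: ttl, nowFunc: nowFunc, entries: make(map[string]*cacheEntry)}
}

// Put records the backend for a session and stamps it with the current time.
func (c *sessionCache) Put(sessionID, backend string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[sessionID] = &cacheEntry{backend: backend, lastUsed: c.nowFunc()}
}

// Get returns the backend for a session unless the entry is missing or has
// been idle for longer than ttl. A hit refreshes lastUsed (an assumption).
func (c *sessionCache) Get(sessionID string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	entry, ok := c.entries[sessionID]
	if !ok {
		return "", false
	}
	now := c.nowFunc()
	if now.Sub(entry.lastUsed) > c.ttl {
		return "", false
	}
	entry.lastUsed = now
	return entry.backend, true
}

// manualClock is a hypothetical test clock that only moves when Advance is called.
type manualClock struct{ now time.Time }

func (m *manualClock) Now() time.Time          { return m.now }
func (m *manualClock) Advance(d time.Duration) { m.now = m.now.Add(d) }
```

Passing `clk.Now` from a `manualClock` as `nowFunc` lets a test step time forward deterministically instead of sleeping.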

ronenkat · 2026-06-22T16:06:55Z

+
+	// Pass 1: sweep all TTL-expired entries from the tail
+	elem := c.order.Back()
+	for elem != nil {


Suggested change

-	for elem != nil {
+	for elem != nil && removed < c.minEvictQuantity * 10 {
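The effect of the suggested bound can be seen in a small sketch of a tail sweep. The names `order`, `removed`, `minEvictQuantity`, `nowFunc`, `ttl`, and `lastUsed` come from the snippets in this thread; everything else (the struct, `touch`, stopping at the first live entry, and the omission of locking) is assumed for illustration.

```go
package main

import (
	"container/list"
	"time"
)

type lruEntry struct {
	sessionID string
	lastUsed  time.Time
}

// lruCache is a hypothetical cache: order holds entries with the most
// recently used at the front and the least recently used at the back.
type lruCache struct {
	ttl              time.Duration
	minEvictQuantity int
	nowFunc          func() time.Time
	order            *list.List
	entries          map[string]*list.Element
}

func newLRUCache(ttl time.Duration, minEvictQuantity int, nowFunc func() time.Time) *lruCache {
	if nowFunc == nil {
		nowFunc = time.Now
	}
	return &lruCache{ttl: ttl, minEvictQuantity: minEvictQuantity, nowFunc: nowFunc,
		order: list.New(), entries: make(map[string]*list.Element)}
}

// touch inserts or refreshes a session and moves it to the front of the list.
func (c *lruCache) touch(sessionID string) {
	now := c.nowFunc()
	if elem, ok := c.entries[sessionID]; ok {
		elem.Value.(*lruEntry).lastUsed = now
		c.order.MoveToFront(elem)
		return
	}
	c.entries[sessionID] = c.order.PushFront(&lruEntry{sessionID: sessionID, lastUsed: now})
}

// sweepExpired removes TTL-expired entries starting from the tail. It stops at
// the first live entry (assumption: the list is ordered by lastUsed) or after
// minEvictQuantity*10 removals, so a single call does bounded work.
func (c *lruCache) sweepExpired() int {
	removed := 0
	now := c.nowFunc()
	elem := c.order.Back()
	for elem != nil && removed < c.minEvictQuantity*10 {
		e := elem.Value.(*lruEntry)
		if now.Sub(e.lastUsed) <= c.ttl {
			break
		}
		prev := elem.Prev() // read before Remove, which clears the element's links
		c.order.Remove(elem)
		delete(c.entries, e.sessionID)
		removed++
		elem = prev
	}
	return removed
}
```

Without the bound, one call could walk a very long run of expired entries, which presumably would happen while holding the cache lock. With it, the remaining expired entries are left for later sweeps.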

Introduces a consistent hash routing plugin that maps session IDs to backends, ensuring the same session always hits the same pod for KV cache reuse. The plugin implements Filter (for the model-selector pipeline) and ResponseProcessor (to echo X-Session-Id back to clients). When no X-Session-Id header is present, a UUID v4 is generated and returned in the response. Backends with weight <= 0 are excluded from the hash ring so unhealthy pods are skipped automatically.

Fixes: llm-d#177
Signed-off-by: szedan <szedan@redhat.com>
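A minimal consistent hash ring along the lines described above might look like the sketch below. It is not the plugin's implementation: the `backend` and `ring` types, the virtual node count, and the FNV hash are assumptions, and weight is used only to exclude backends (`weight <= 0`), not to scale their share of the ring.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// backend is a hypothetical view of a routable pod.
type backend struct {
	name   string
	weight int
}

// ring maps sorted hash points to backend names.
type ring struct {
	points []uint32
	owner  map[uint32]string
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places vnodes points per backend on the ring and skips any backend
// whose weight is <= 0, so unhealthy pods never own a point.
func newRing(backends []backend, vnodes int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, b := range backends {
		if b.weight <= 0 {
			continue
		}
		for i := 0; i < vnodes; i++ {
			p := hashKey(fmt.Sprintf("%s#%d", b.name, i))
			if _, taken := r.owner[p]; taken {
				continue // hash collision: keep the first owner
			}
			r.owner[p] = b.name
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// lookup returns the backend owning the first point at or after the session's
// hash, wrapping around to the start of the ring. ok is false for an empty ring.
func (r *ring) lookup(sessionID string) (name string, ok bool) {
	if len(r.points) == 0 {
		return "", false
	}
	h := hashKey(sessionID)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]], true
}
```

Rebuilding the ring after a backend's weight drops to zero removes only that backend's points, so sessions owned by the remaining backends keep their assignment.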

ronenkat

Thank you.

ronenkat · 2026-06-23T05:05:58Z

@nirrozenbaum please take a look.
Follow-ups noted:

- Tracking session affinity across IPP pods: "Session affinity: cross-node session sharing via shared datastore" #187
- Stress test on the session tracking cache

github-actions Bot added the size/XXL (1000+ lines changed, ignoring generated files) and do-not-merge/hold labels Jun 18, 2026

szedan-rh force-pushed the session-affinity-routing branch from 4c60beb to 2a88a36 June 18, 2026 16:18

github-actions Bot added the size/XL (500-999 lines changed, ignoring generated files) label and removed the size/XXL label Jun 18, 2026

szedan-rh mentioned this pull request Jun 20, 2026

Session-affinity routing for KV cache locality #177

Open

ronenkat reviewed Jun 21, 2026

Comment thread pkg/framework/plugins/modelselector/filter/sessionaffinity/plugin.go Outdated

szedan-rh force-pushed the session-affinity-routing branch from 2a88a36 to 2cfdbb8 June 22, 2026 08:08

szedan-rh requested a review from ronenkat June 22, 2026 08:09

szedan-rh force-pushed the session-affinity-routing branch from 2cfdbb8 to 89b7845 June 22, 2026 08:22

ronenkat reviewed Jun 22, 2026

szedan-rh force-pushed the session-affinity-routing branch from 89b7845 to 32a8c55 June 22, 2026 10:11

szedan-rh requested a review from ronenkat June 22, 2026 10:13

szedan-rh mentioned this pull request Jun 22, 2026

Session affinity: cross-node session sharing via shared datastore #187

Open

8 tasks

ronenkat reviewed Jun 22, 2026

szedan-rh force-pushed the session-affinity-routing branch from 32a8c55 to 21a749c June 22, 2026 13:39

github-actions Bot added the size/XXL (1000+ lines changed, ignoring generated files) label and removed the size/XL label Jun 22, 2026

szedan-rh requested a review from ronenkat June 22, 2026 13:50

ronenkat reviewed Jun 22, 2026

Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/sessioncache.go Outdated

Comment thread pkg/framework/plugins/modelselector/scorer/sessionaffinity/sessioncache.go Outdated

szedan-rh requested a review from ronenkat June 22, 2026 15:02

ronenkat reviewed Jun 22, 2026

szedan-rh force-pushed the session-affinity-routing branch from 21a749c to 8400afa June 22, 2026 20:08

szedan-rh requested a review from ronenkat June 22, 2026 21:51

ronenkat approved these changes Jun 23, 2026

nirrozenbaum removed the do-not-merge/hold label Jun 23, 2026

3 participants