Open
Labels
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
Motivation
To enable active-active, multi-replica HA for llm-d's precise prefix-cache indexing, a new subsystem is being introduced that discovers vLLM KVEvent publishers and subscribes to them.
Discovery requires informer-like functionality. For standalone deployments, I added a controller example, but for proper integration with the llm-d inference-scheduler, this logic should be provided through the data-layer.
Proposal
After discussions with @elevran, we propose adding a Pod lifecycle source to the data-layer that the precise-prefix-cache scorer can subscribe to.
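To make the shape of the proposal concrete, here is a minimal, standard-library-only sketch of a Pod lifecycle source with a single subscriber. Every name in it (`PodEvent`, `PodLifecycleSource`, `EndpointTracker`, `Subscribe`, `Publish`) is hypothetical and not taken from the llm-d data-layer. In a real integration the events would presumably come from a Kubernetes informer rather than from manual `Publish` calls.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType names a Pod lifecycle transition (hypothetical values).
type EventType string

const (
	PodAdded   EventType = "Added"
	PodUpdated EventType = "Updated"
	PodDeleted EventType = "Deleted"
)

// PodEvent is a hypothetical, trimmed-down lifecycle notification.
type PodEvent struct {
	Type      EventType
	Namespace string
	Name      string
	IP        string
}

// PodLifecycleSource fans Pod events out to registered subscribers.
// It stands in for the proposed data-layer source.
type PodLifecycleSource struct {
	mu       sync.RWMutex
	handlers []func(PodEvent)
}

// Subscribe registers a handler that is called for every later event.
func (s *PodLifecycleSource) Subscribe(h func(PodEvent)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.handlers = append(s.handlers, h)
}

// Publish delivers an event synchronously to all current subscribers.
func (s *PodLifecycleSource) Publish(ev PodEvent) {
	s.mu.RLock()
	handlers := append([]func(PodEvent){}, s.handlers...)
	s.mu.RUnlock()
	for _, h := range handlers {
		h(ev)
	}
}

// EndpointTracker stands in for the scorer-side subscriber. It keeps
// the Pod IPs it could connect to, keyed by "namespace/name".
type EndpointTracker struct {
	mu        sync.Mutex
	endpoints map[string]string
}

func NewEndpointTracker() *EndpointTracker {
	return &EndpointTracker{endpoints: make(map[string]string)}
}

// Handle records Pods that have an IP and forgets deleted Pods.
func (t *EndpointTracker) Handle(ev PodEvent) {
	key := ev.Namespace + "/" + ev.Name
	t.mu.Lock()
	defer t.mu.Unlock()
	switch ev.Type {
	case PodAdded, PodUpdated:
		if ev.IP != "" {
			t.endpoints[key] = ev.IP
		}
	case PodDeleted:
		delete(t.endpoints, key)
	}
}

// Count returns the number of currently tracked endpoints.
func (t *EndpointTracker) Count() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.endpoints)
}

func main() {
	src := &PodLifecycleSource{}
	tracker := NewEndpointTracker()
	src.Subscribe(tracker.Handle)

	src.Publish(PodEvent{Type: PodAdded, Namespace: "default", Name: "vllm-0", IP: "10.0.0.5"})
	fmt.Println("tracked after add:", tracker.Count())

	src.Publish(PodEvent{Type: PodDeleted, Namespace: "default", Name: "vllm-0"})
	fmt.Println("tracked after delete:", tracker.Count())
}
```

Keeping the source separate from the subscriber means the scorer only reacts to add, update, and delete events and does not need its own informer or controller.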