CFP-41292 : xDS-controlled L4 LoadBalancer #75
Draft
tsotne95 wants to merge 1 commit into cilium:main from tsotne95:xds-l4-lb-cfp
# CFP-41292: xDS-controlled Standalone L4 LoadBalancer

**SIG:** SIG-LB

**Begin Design Discussion:** 2025-08-04

**Cilium Release:** 1.19

**Authors:** Tsotne Chakhvadze, <tsotne@google.com>

**Status:** Provisional
## Summary

This proposal introduces a powerful, API-driven way to manage **L4 load balancers** in Cilium using the standard xDS protocol. It allows users to dynamically configure L4 VIPs and backends from any xDS-compatible control plane, decoupling load balancing from Kubernetes Services for greater flexibility. This proposal is **exclusively focused on L4 load balancing** and does not include L7 features. It empowers platform builders and users with custom control planes to leverage Cilium's high-performance datapath more effectively for TCP/UDP services.
## Motivation

Cilium's original L4 load balancer (introduced in v1.10) was a powerful feature, but it had design limitations around management and persistence that led to its deprecation. However, the need for a robust, dynamically-configured L4 load balancer that is not tied to Kubernetes Services remains strong within the community. The exploration of a filesystem-based LB configuration ([cilium/cilium#39117](https://github.com/cilium/cilium/pull/39117)) is a clear indicator of this demand.

By integrating a native xDS client focused specifically on L4 constructs, we can provide a standard, industry-recognized API for this functionality. Previous initiatives introduced an experimental xDS client into Cilium; this proposal aims to build on that work to deliver a complete, stable, and well-supported feature for **L4 load balancing**. This aligns Cilium with modern, API-driven infrastructure practices and opens the door for deeper integration with a wide ecosystem of tools for service discovery and load balancing.

Furthermore, this provides a path to unify different L4 service discovery mechanisms in the future, such as the MCS (Multi-Cluster Services) service importer, leading to a more maintainable and streamlined codebase.
## Goals

* Enable the programming of an **L4 LoadBalancer**, including a VIP and its backend endpoints, using xDS as the control plane signal.
* Provide a stable, maintainable, and well-documented **L4 LB feature** that is decoupled from Kubernetes services.
* Implement the feature in a way that is compatible with and can eventually support the MCS Service Importer, unifying L4 control paths.
* Complete and stabilize the xDS client integration in Cilium for **L4 load-balancing use cases**.
## Non-Goals

* This CFP is not proposing to build an xDS management server into Cilium. The xDS server is assumed to be an external component.
* **Support for any L7 xDS features** (e.g., HTTP routing via RDS, traffic splitting) is out of scope. This proposal is **strictly for L4 load-balancing** (TCP/UDP).
* This feature is not intended to replace Gateway API, GAMMA, or standard Kubernetes Service discovery, but rather to provide an alternative for standalone L4 LB use cases.
## Proposal

### Overview

The Cilium agent will be enhanced to run an xDS client that connects to an external xDS management server. This client will be responsible for fetching **L4 load-balancing configuration** and programming it into Cilium's datapath via StateDB.

The feature will be disabled by default and can be enabled via a new configuration flag in the Cilium Helm chart / ConfigMap.
### Configuration

A new configuration block will be added to the Cilium configuration:

```yaml
xds-control-plane:
  # Enables the xDS client in the Cilium agent for L4 LB
  enabled: false
  # Address of the external xDS management server
  server-address: "xds.my-company.com:9000"
  # Optional: Path to a client certificate for mTLS authentication
  client-cert-path: "/var/lib/cilium/tls/xds-client.crt"
  # Optional: Path to a client key for mTLS authentication
  client-key-path: "/var/lib/cilium/tls/xds-client.key"
  # Optional: Path to the CA certificate for verifying the xDS server
  ca-cert-path: "/var/lib/cilium/tls/xds-ca.crt"
```
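To make the constraints between these fields explicit, here is a minimal sketch of how the agent might validate the block at startup. The struct and rule set (address required when enabled; mTLS cert and key set together) are assumptions for illustration, not the final agent schema:

```go
package main

import (
	"errors"
	"fmt"
)

// XDSConfig mirrors the proposed `xds-control-plane` block. Field names
// and validation rules are illustrative assumptions.
type XDSConfig struct {
	Enabled        bool
	ServerAddress  string
	ClientCertPath string
	ClientKeyPath  string
	CACertPath     string
}

// Validate checks the constraints implied by the proposal: a server
// address is mandatory when the client is enabled, and the mTLS client
// certificate and key must be provided together.
func (c XDSConfig) Validate() error {
	if !c.Enabled {
		return nil // a disabled block needs no further checks
	}
	if c.ServerAddress == "" {
		return errors.New("xds-control-plane: server-address is required when enabled")
	}
	if (c.ClientCertPath == "") != (c.ClientKeyPath == "") {
		return errors.New("xds-control-plane: client-cert-path and client-key-path must be set together")
	}
	return nil
}

func main() {
	ok := XDSConfig{Enabled: true, ServerAddress: "xds.my-company.com:9000"}
	fmt.Println(ok.Validate() == nil) // true: address set, no partial mTLS
	bad := XDSConfig{Enabled: true, ServerAddress: "x:1", ClientCertPath: "/tls/c.crt"}
	fmt.Println(bad.Validate() != nil) // true: cert without key is rejected
}
```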
### xDS Resource Mapping for L4 Load Balancing

The Cilium xDS client will subscribe to `Cluster` and `ClusterLoadAssignment` (CLA) resources to configure L4 load balancers. The agent will specifically look for `Cluster` resources containing Cilium-specific metadata to identify them as L4 LB services.

Here is a concrete example of how a user would define a simple TCP load balancer for a database service:
```yaml
# These resources would be served by an external xDS management server.

---
# Resource 1: The Cluster, defining the L4 service front-end (the VIP).
# The Cilium agent will watch for Clusters with the 'io.cilium.l4lb' metadata.
resource:
  '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: my-database-service
  # This metadata block is the key part for Cilium. It defines the L4 LB properties.
  metadata:
    filter_metadata:
      io.cilium.l4lb:
        vip: "10.1.2.3"
        port: 3306
        protocol: TCP
  # Tells the xDS client to fetch endpoints via EDS for this cluster.
  type: EDS
  eds_cluster_config:
    service_name: my-database-service  # Links this Cluster to its endpoints

---
# Resource 2: The ClusterLoadAssignment, defining the service backends.
resource:
  '@type': type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
  cluster_name: my-database-service  # Must match the service_name/name above
  endpoints:
    - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 192.168.1.10, port_value: 3306 }
        - endpoint:
            address:
              socket_address: { address: 192.168.1.11, port_value: 3306 }
```
This information will be used to populate the `statedb.DB` with `tables.Services` and `tables.Backends`, which the datapath already uses for service translation.
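The shape of that translation can be sketched as follows. The struct definitions are simplified stand-ins (the real types live in Envoy's `config.cluster.v3` / `config.endpoint.v3` protos and Cilium's `tables` package), so all field names here are assumptions:

```go
package main

import (
	"fmt"
	"net/netip"
)

// Simplified stand-ins for the xDS resources and StateDB rows.
type L4LBMetadata struct {
	VIP      string
	Port     uint16
	Protocol string
}

type Cluster struct {
	Name     string
	Metadata map[string]L4LBMetadata // keyed by filter_metadata namespace
}

type Backend struct {
	Addr netip.Addr
	Port uint16
}

type Service struct {
	Name     string
	Frontend netip.AddrPort
	Protocol string
	Backends []Backend
}

// toService converts a Cluster carrying `io.cilium.l4lb` metadata plus its
// ClusterLoadAssignment endpoints into the service/backend shape the
// StateDB tables would hold. Clusters without the metadata are skipped.
func toService(c Cluster, endpoints []Backend) (Service, bool) {
	md, ok := c.Metadata["io.cilium.l4lb"]
	if !ok {
		return Service{}, false // not an L4 LB cluster; ignore
	}
	vip, err := netip.ParseAddr(md.VIP)
	if err != nil {
		return Service{}, false
	}
	return Service{
		Name:     c.Name,
		Frontend: netip.AddrPortFrom(vip, md.Port),
		Protocol: md.Protocol,
		Backends: endpoints,
	}, true
}

func main() {
	c := Cluster{
		Name: "my-database-service",
		Metadata: map[string]L4LBMetadata{
			"io.cilium.l4lb": {VIP: "10.1.2.3", Port: 3306, Protocol: "TCP"},
		},
	}
	eps := []Backend{{Addr: netip.MustParseAddr("192.168.1.10"), Port: 3306}}
	svc, ok := toService(c, eps)
	fmt.Println(ok, svc.Frontend) // true 10.1.2.3:3306
}
```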
### User Experience

The user experience will be analogous to the file-based L4 LB proposal, but far more powerful and scalable. Instead of managing a local JSON file on each node, users will manage a centralized set of standard `Cluster` and `ClusterLoadAssignment` resources in their xDS management server. This provides a centralized and dynamic control plane for managing potentially thousands of L4 services and backends across a fleet of Cilium-managed nodes.
## Impacts / Key Questions

### Impact: New Agent Component

This introduces a new, long-running gRPC client to the Cilium agent.

* **Resource Usage**: This will add a baseline CPU and memory overhead to the agent for maintaining the gRPC connection and processing xDS updates.
* **Complexity**: It adds a new component that needs to be maintained, including its connection logic, error handling, and security.
### Key Question: Connection Management and Security

* How should the agent handle startup if the xDS server is unavailable? Should it fail, or start and retry in the background? (Proposal: Start and retry, to avoid blocking agent startup).
* What is the strategy for securing the connection to the xDS server? (Proposal: Support mTLS, configurable via the agent settings).
### Key Question: xDS Schema and Mapping

* The exact mapping from xDS `Cluster` and `ClusterLoadAssignment` fields to Cilium's service/backend model needs to be precisely defined. For example, how do we represent a VIP, which is not a standard field in the `Cluster` resource?
  * **Option 1: Use `metadata`:** We could use the `metadata` field on the `Cluster` resource to carry the VIP address and other Cilium-specific configuration. This is a common xDS pattern for extensibility and is flexible, but requires the user to structure the metadata correctly.
  * **Option 2: Define a custom xDS resource:** This is a much heavier lift and would require upstream xDS changes, so it is not preferred.

We will proceed with **Option 1** and clearly document the expected `metadata` structure, as shown in the example above.
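Since Option 1 relies on users structuring the metadata correctly, the agent should reject malformed entries early. A minimal sketch of such checks, assuming a rule set (valid IP, non-zero port, TCP/UDP only) derived from the proposal's L4-only scope:

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateL4LBMetadata sketches the checks the agent could run on the
// documented `io.cilium.l4lb` metadata before accepting a Cluster. The
// exact rules are an assumption for illustration.
func validateL4LBMetadata(vip string, port int, protocol string) error {
	if _, err := netip.ParseAddr(vip); err != nil {
		return fmt.Errorf("vip %q is not a valid IP: %w", vip, err)
	}
	if port < 1 || port > 65535 {
		return fmt.Errorf("port %d out of range", port)
	}
	if protocol != "TCP" && protocol != "UDP" {
		return fmt.Errorf("protocol %q not supported (L4 only: TCP/UDP)", protocol)
	}
	return nil
}

func main() {
	fmt.Println(validateL4LBMetadata("10.1.2.3", 3306, "TCP") == nil) // true
	fmt.Println(validateL4LBMetadata("10.1.2.3", 80, "HTTP") != nil)  // true: L7 protocol rejected
}
```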
## Future Milestones

### MCS Service Importer Integration

Once this feature is stable, the MCS (Multi-Cluster Services) service importer can be refactored to use this xDS client as its backend for L4 service discovery, rather than directly programming services itself. This would unify the standalone and multi-cluster L4 LB implementation paths, reducing complexity and improving maintainability.
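The refactoring direction can be sketched as a pure translation step: an MCS controller would emit the same `Cluster`/`ClusterLoadAssignment` pair that the standalone L4 LB consumes, instead of programming derived Services directly. The struct shapes below are minimal stand-ins for the mcs-api and Envoy proto types; all names are illustrative assumptions:

```go
package main

import "fmt"

// Minimal stand-ins for aggregated MCS ServiceImport state and the two
// xDS resources it would be translated into.
type ServiceImport struct {
	Name     string
	IP       string
	Port     uint16
	Backends []string // "addr:port" endpoints aggregated across clusters
}

type Cluster struct {
	Name string
	VIP  string
	Port uint16
}

type ClusterLoadAssignment struct {
	ClusterName string
	Endpoints   []string
}

// translate emits the Cluster/CLA pair for one ServiceImport; the agent's
// xDS client would then program the VIP and endpoints exactly as it does
// for standalone L4 LB resources.
func translate(si ServiceImport) (Cluster, ClusterLoadAssignment) {
	return Cluster{Name: si.Name, VIP: si.IP, Port: si.Port},
		ClusterLoadAssignment{ClusterName: si.Name, Endpoints: si.Backends}
}

func main() {
	c, cla := translate(ServiceImport{
		Name: "db", IP: "10.9.8.7", Port: 3306,
		Backends: []string{"192.168.1.10:3306", "192.168.2.10:3306"},
	})
	fmt.Println(c.Name == cla.ClusterName, len(cla.Endpoints)) // true 2
}
```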
Could you expand on the ways you would refactor MCS support with xDS? Are you talking about Cilium ClusterMesh, or third-party implementations (like GKE, I'm assuming) that would use this for their own non-ClusterMesh MCS implementation?
@tsotne95 -- I think this refers to the Google-specific MCS implementation, which is a detail that doesn't make sense for an open-source upstream proposal. I know it makes sense for our systems, but it's probably not very relevant here?
Using xDS for the MCS flow is meant to streamline any Multi-Cluster Services implementation in Cilium - not just the Google/GKE variant. The current upstream MCS controller (pkg/clustermesh/mcsapi) constructs `ServiceImports` and then directly programs derived Services and backends to drive Cilium's datapath. Once the xDS-based L4 LB is available, that controller could instead translate the `ServiceImport` state into `Cluster` and `ClusterLoadAssignment` resources and feed them through the same xDS client used by standalone L4 LB. This removes the special-case service-programming logic in the MCS code path and lets both ClusterMesh and third-party MCS solutions reuse a single, well-documented control-plane interface.
Ah ok -- that context really helps.
Thanks for the additional context!
One useful thing about the current implementation is that implementing ServiceImport via a derived Service inherently brings an additional feature: users can target the derived Service with a third-party controller that does not directly support the MCS-API. One useful example is the Prometheus operator.
I also reckon that since this exists https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/ipallocator/ipallocator.go it seems possible to allocate an IP, which removes the additional complexity of allocating service IPs for a native implementation. But what's the advantage of an xDS implementation vs having the agent watch ServiceImport directly anyway?
Also FYI, there's all the logic to sync the backends across clusters, which is tied to the global derived Service; this is definitely not an easy task...
Thanks for the questions. The CFP's primary goal is a standalone xDS-controlled L4 load balancer. The MCS discussion is meant to show one possible follow-on benefit, not to imply that the feature only targets MCS or GKE. But yes, I'm not an MCS expert.
As I see it, Cilium's upstream MCS support has dedicated controllers that synthesize a "derived" Service and mirrored EndpointSlices to drive the datapath. The `mcsAPIServiceReconciler` creates a new Service annotated as global so ClusterMesh logic programs the VIP and backends. A companion `mcsAPIEndpointSliceMirrorReconciler` copies each local EndpointSlice into that derived Service. (https://github.com/cilium/cilium/blob/main/pkg/clustermesh/mcsapi/README.md)
Once the xDS L4 LB is in place, the MCS controller could translate aggregated ServiceImport state into standard Cluster and ClusterLoadAssignment resources instead of constructing derived Services and mirrored slices. The agent's xDS client would program VIPs and endpoints directly, so the controllers mentioned above could be removed or greatly simplified.
Yes, sure, I got that, but the CFP quotes MCS several times as a possible future benefit, and I am a bit skeptical about whether implementing MCS that way is actually a good choice.
My main question is: assuming we want a native implementation of the MCS-API that does not create a derived Service (which is already a big question mark), why would that be easier to do with an xDS server vs just making the cilium-agent watch the ServiceImport resources and associated EndpointSlices? I don't really get why we would want to rely on xDS here. It seems to go against your goal of "unify different L4 service discovery mechanisms" if, for instance, load balancing for Services still relies on watching the kube-apiserver directly (plus the ClusterMesh control planes for Global Services not directly related to MCS), while MCS would instead involve an xDS server rather than watching the kube-apiserver and the ClusterMesh control planes of the other clusters.
First of all, I'm pretty sure we don't know each other, so I don't like your tone.
I want to start by saying that I don’t share your view here. The suggestion to simply have the agent watch ServiceImport/EndpointSlice directly misses the larger architectural picture and, in my opinion, works against the goal of simplifying and unifying service discovery in Cilium.
**One agent ingestion path, not three.** Today the agent already has a control-plane surface for Envoy/xDS (used for L7 and control components). If we translate ServiceImport/EndpointSlice to CDS/EDS once (in a controller), the agent consumes the same schema for all L4 sources (local Services, ClusterMesh/global services, MCS). That means one programming pipeline to eBPF LB state instead of separate `Service`, `MCS`, and `ClusterMesh` codepaths in the agent.
**Protocol semantics you'd otherwise re-implement.** xDS gives you ACK/NACK, versioning, and delta (incremental) updates. Those are valuable under churn (endpoint scale-out/scale-in across clusters) and during rollouts. Doing raw informer watches in every agent means you have to home-grow back-pressure and consistency logic (or accept more flapping). With xDS, these semantics are built-in and widely tested.
**Less load on the kube-apiserver.** With direct watches in every agent, N agents watch MCS objects and each performs similar merges. With xDS, a central (or sharded) translator watches ServiceImport/EndpointSlice once, computes the desired VIP/backends, and fans out compact deltas to agents. This reduces apiserver fan-out and gives you a place to shard/cache if a fleet grows.
**Cleaner separation of concerns.** The producer (MCS translator) knows Kubernetes types; the consumer (agent) only knows `Clusters` and `Endpoints` (CDS/EDS). That decouples Kubernetes schema evolution from the agent. It also keeps the agent's L4 logic identical whether the source is a local Service, a ClusterMesh export, or MCS.
**Extensibility you'll need later.** The moment you want per-cluster weights, locality hints, failover policies, or to ingest services from non-Kubernetes registries, you don't touch the agent; you only emit the same xDS resources. That's exactly why xDS exists across service-mesh ecosystems.
I believe this makes the reasoning behind my proposal clear. I don't intend to keep going in circles on this point: my position is that xDS provides the unification and scalability properties we want, while duplicating watchers in the agent does not.
Sorry if it appeared that way; I am genuinely interested in how what you are proposing might (since it's a future goal) impact the ClusterMesh area, and what you wrote here is very helpful for understanding what this is about.
Ah, OK, this was the main point I was missing! The CFP seems to suggest that the xDS server would be an optional thing, though; I'm assuming that ideally you would want this to become the primary way to do load balancing in Cilium and eventually sunset the per-agent Service/EndpointSlice watchers?
These are very good questions!
I think one of the motivators was the feature request by some users at the Dev Summit for integration via xDS, as this is a common control plane used in many places (I think this statement is still true excluding my own biases :-) ). It makes Cilium an even more useful infrastructure building block, as it brings capabilities that would be hard to replicate in K8s API infra, such as dynamic load-balancing weights.
That being said, I agree that we don't want to create a partial solution in the project, so some e2e solution will need to exist. I think the scope of such a thing requires discussion. My opinion is that the server should be mostly off-the-shelf and meet the basic user journeys, rather than trying to build an expansive feature, at least at this point.
Whether or not MCS will reuse this mechanism: I do know there are some drawbacks to the derived Service approach, in that it causes extra load on the API server.
Regarding how we support this in the project: I view this as completing the experimental xDS client that exists today. We should have strict requirements that if this doesn't get usage, we can remove it entirely to avoid paying a maintenance cost.