
CFP-44188: Vtep Improvements with CRD #92

Open

parlakisik wants to merge 1 commit into cilium:main from parlakisik:feature/vtep-improvements

Conversation

@parlakisik

Replace the static CLI flag-based VTEP configuration with a cluster-scoped `CiliumVTEPConfig` CRD that supports dynamic updates, per-node assignment via `nodeSelector`, and per-endpoint status reporting.

Signed-off-by: Murat Parlakisik <parlakisik@gmail.com>
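A cluster-scoped CRD of the shape described above might look roughly like the sketch below. This is purely illustrative: the field names (`nodeSelector`, `vteps`, `cidr`, `endpoint`, `mac`) are assumptions based on the description, not the API actually proposed in the CFP.

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumVTEPConfig
metadata:
  name: example-vtep
spec:
  # Restrict which nodes install these VTEP mappings (assumed field).
  nodeSelector:
    matchLabels:
      vtep.cilium.io/enabled: "true"
  vteps:
    - cidr: 10.100.0.0/16      # external CIDR routed via the VTEP
      endpoint: 192.168.1.10   # VTEP tunnel endpoint IP
      mac: "aa:bb:cc:dd:ee:01" # VTEP MAC address
```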
@parlakisik force-pushed the feature/vtep-improvements branch from e1f2862 to d0c2ee8 (April 8, 2026 08:00)
@xmulligan (Member)

cc: @cilium/sig-datapath

Comment on lines +52 to +58:

> If the user doesn't want to manage BGP or L2 announcements to send traffic to some network via an external gateway, the VTEP approach offers a fundamentally simpler model:
>
> Pods send traffic via the existing VXLAN overlay directly to an external VTEP endpoint. No BGP sessions to configure and maintain. No L2 announcement policies. No route redistribution. The Cilium agent simply encapsulates traffic destined for external CIDRs and sends it to a known VTEP endpoint.
Member:

The BGP and L2 announcement features are about advertising addresses on connected networks so that remote clusters learn how to transmit traffic towards Cilium. This feature rather seems to be about configuring how Cilium routes traffic *from* Cilium. Are they really equivalent?


> 6. It updates Linux routing table entries for VTEP CIDRs.
>
> 7. It writes per-endpoint status back to the CRD's `.status` subresource.
Member:

As a general statement, we avoid putting logic into cilium-agent to update `.status`, because as you scale up this causes significant load and conflicts on kube-apiserver, with competing agents attempting to make similar updates, often at the same time.

Member:

Probably this is a case for understanding the tradeoff with @cilium/sig-scalability, especially the size of targeted environments, and then considering how we might gain the desired operational visibility without introducing scalability concerns.

Member:

I have two related high-level thoughts about the overall architecture. I don't have a strong opinion on these approaches, but they seem within the possible design space, so they're worth considering:

  1. Could Cilium integrate natively with the Linux stack, delegating routing of this traffic to the Linux routing table, with another component syncing the desired state into the kernel routing table?
  2. Alternatively, if this is difficult due to VNI selection, could datapath plugins perhaps provide an alternative integration mode?

> - Changed endpoints → `UpdateEntry()` (overwrite)
> - Removed CIDRs → `DeleteByCIDR()`
>
> 6. It updates Linux routing table entries for VTEP CIDRs.
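The reconciliation quoted above (changed entries overwritten, removed CIDRs deleted) amounts to a set diff between the CRD's desired state and the installed state. A minimal sketch, assuming a hypothetical `entry` value type; Cilium's real types and `UpdateEntry`/`DeleteByCIDR` signatures differ:

```go
package main

import "fmt"

// entry is a hypothetical VTEP map value: where traffic for a CIDR is sent.
type entry struct {
	Endpoint string // VTEP tunnel endpoint IP
	MAC      string // VTEP MAC address
}

// diff compares the desired state from the CRD against the currently
// installed contents and returns the CIDRs to (re)write, which would go
// through UpdateEntry (overwrite), and those to remove via DeleteByCIDR.
func diff(desired, installed map[string]entry) (update, remove []string) {
	for cidr, want := range desired {
		if have, ok := installed[cidr]; !ok || have != want {
			update = append(update, cidr) // new or changed entry
		}
	}
	for cidr := range installed {
		if _, ok := desired[cidr]; !ok {
			remove = append(remove, cidr) // gone from the spec
		}
	}
	return update, remove
}

func main() {
	up, rm := diff(
		map[string]entry{"10.1.0.0/16": {"192.168.1.10", "aa:bb:cc:dd:ee:01"}},
		map[string]entry{"10.2.0.0/16": {"192.168.1.11", "aa:bb:cc:dd:ee:02"}},
	)
	fmt.Println(up, rm) // [10.1.0.0/16] [10.2.0.0/16]
}
```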
Member:

This actually seems like the crux of the functionality proposed here (+ the encap/decap config). I wonder whether we really want to have a dedicated VTEP CRD for this or whether it's time to consider a more generic "routing" CRD (even if initially focused just on the VTEP use case). I had a draft for such an idea about three years ago, but it never got traction so I didn't end up posting it publicly. But if this is interesting, maybe I can dust it off.


> The LPM Trie is strictly more capable. Existing configurations with uniform prefix lengths produce identical routing behavior.

Member:

Key question: how will this integrate with the broader architecture? Ideally, whatever we come up with has a path to eventually integrate with all other GA features where applicable.

Specifically, consider how this interfaces with masquerading, egress gateway, and encryption.


> Replace the static CLI flag-based VTEP configuration with a cluster-scoped `CiliumVTEPConfig` CRD that supports dynamic updates, per-node assignment via `nodeSelector`, and per-endpoint status reporting. This enables production use
Member:

Is "endpoint" here a "VTEP endpoint"? I would suggest not using shorthand, because "endpoint" is already overloaded (twice: both Kubernetes and Cilium have established meanings for this word, which are not fully aligned).


Comment on the `### CRD API` section (the YAML block):
Member:

I won't review the API now since there's plenty of other open questions to consider first, but the API will need to be reviewed.

3 participants