CFP-44188: VTEP Improvements with CRD #92
Replace the static CLI flag-based VTEP configuration with a cluster-scoped CiliumVTEPConfig CRD that supports dynamic updates, per-node assignment via nodeSelector, and per-endpoint status reporting. Signed-off-by: Murat Parlakisik <parlakisik@gmail.com>
e1f2862 to d0c2ee8
cc: @cilium/sig-datapath
| If a user doesn't want to manage BGP or L2 announcements to send traffic to some network via an external gateway, | ||
| The VTEP approach offers a fundamentally simpler model: | ||
|
|
||
| Pods send traffic via the existing VXLAN overlay directly to an external | ||
| VTEP endpoint. No BGP sessions to configure and maintain. No L2 announcement | ||
| policies. No route redistribution. The Cilium agent simply encapsulates | ||
| traffic destined for external CIDRs and sends it to a known VTEP endpoint. |
The BGP and L2 announcement features advertise addresses on connected networks so that remote clusters know how to transmit towards Cilium. This feature instead seems to be about configuring Cilium to route traffic from Cilium. Are they really equivalent?
|
|
||
| 6. It updates Linux routing table entries for VTEP CIDRs. | ||
|
|
||
| 7. It writes per-endpoint status back to the CRD's `.status` subresource. |
As a general statement, we avoid putting logic into cilium-agent to update .status because as you scale up, this causes significant load and conflicts on kube-apiserver due to competing agents attempting to make similar updates, often at the same time.
This is probably a case for understanding the tradeoff with @cilium/sig-scalability, especially the size of the targeted environments, and then considering how we might gain the desired operational visibility without introducing scalability concerns.
I have two high level thoughts, related, which I wonder about for the overall architecture. I don't have a strong opinion on these approaches, but they seem within the possible design space, so they're worth considering:
- Could Cilium integrate natively to the Linux stack to delegate routing of this traffic to the Linux routing table, then have another component sync the desired state into the kernel routing table?
- Alternatively, if this is difficult due to VNI selection, could datapath plugins perhaps provide an alternative integration mode?
| - Changed endpoints → `UpdateEntry()` (overwrite) | ||
| - Removed CIDRs → `DeleteByCIDR()` | ||
|
|
||
| 6. It updates Linux routing table entries for VTEP CIDRs. |
This actually seems like the crux of the functionality proposed here (+ the encap/decap config). I wonder whether we really want to have a dedicated VTEP CRD for this or whether it's time to consider a more generic "routing" CRD (even if initially focused just on the VTEP use case). I had a draft for such an idea about three years ago, but it never got traction so I didn't end up posting it publicly. But if this is interesting, maybe I can dust it off.
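For readers following the map reconciliation quoted above (changed endpoints overwritten via `UpdateEntry()`, removed CIDRs dropped via `DeleteByCIDR()`), here is a minimal Go sketch of how such a diff could be computed. The `Entry`, `Op`, and `Diff` names are hypothetical and are not taken from the proposal; the sketch only produces a list of operations and does not call any real Cilium map API.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical map value: the VTEP IP that serves a CIDR.
type Entry struct {
	VTEPIP string
}

// Op is one action a reconciler would apply to the datapath map.
// "update" corresponds to an overwrite, "delete" to removal by CIDR.
type Op struct {
	Kind  string
	CIDR  string
	Entry Entry
}

// Diff compares current map contents with the desired state and returns
// the operations needed to converge, sorted by CIDR for determinism.
func Diff(current, desired map[string]Entry) []Op {
	var ops []Op
	// New or changed CIDRs are written (overwriting any old value).
	for cidr, want := range desired {
		if have, ok := current[cidr]; !ok || have != want {
			ops = append(ops, Op{Kind: "update", CIDR: cidr, Entry: want})
		}
	}
	// CIDRs that are no longer desired are deleted.
	for cidr := range current {
		if _, ok := desired[cidr]; !ok {
			ops = append(ops, Op{Kind: "delete", CIDR: cidr})
		}
	}
	sort.Slice(ops, func(i, j int) bool { return ops[i].CIDR < ops[j].CIDR })
	return ops
}

func main() {
	current := map[string]Entry{
		"10.1.0.0/24": {VTEPIP: "192.0.2.1"},
		"10.2.0.0/24": {VTEPIP: "192.0.2.2"},
	}
	desired := map[string]Entry{
		"10.1.0.0/24": {VTEPIP: "192.0.2.9"}, // changed VTEP endpoint
		"10.3.0.0/24": {VTEPIP: "192.0.2.3"}, // newly added CIDR
	}
	for _, op := range Diff(current, desired) {
		fmt.Println(op.Kind, op.CIDR, op.Entry.VTEPIP)
	}
}
```

Unchanged entries produce no operation, so a reconcile pass that finds nothing to do leaves the map untouched.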
|
|
||
| The LPM Trie is strictly more capable. Existing configurations with uniform | ||
| prefix lengths produce identical routing behavior. | ||
|
|
Key question: How will this integrate with the broader architecture? Ideally, whatever we come up with has a path to eventually integrate with all other GA features where applicable.
Specifically, consider how this interfaces with masquerading, egress gateway, and encryption.
|
|
||
| Replace the static CLI flag-based VTEP configuration with a cluster-scoped | ||
| `CiliumVTEPConfig` CRD that supports dynamic updates, per-node assignment via | ||
| `nodeSelector`, and per-endpoint status reporting. This enables production use |
Is "endpoint" here a "VTEP endpoint"? I would suggest not using shorthand, because "endpoint" is already overloaded (twice: Kubernetes and Cilium each have established meanings for this word, and they are not fully aligned).
|
|
||
| ### CRD API | ||
|
|
||
| ```yaml |
I won't review the API now since there's plenty of other open questions to consider first, but the API will need to be reviewed.