From 1c15f83df58d0525ae7624d002f75a5c92faecaf Mon Sep 17 00:00:00 2001
From: Trung Nguyen
Date: Tue, 28 Oct 2025 09:49:27 -0500
Subject: [PATCH 1/2] Add Cilium Feature Proposal for OCI cloud provider design.

Signed-off-by: Trung Nguyen
---
 cilium/CFP-42453-oci-cloud-provider-design.md | 173 ++++++++++++++++++
 1 file changed, 173 insertions(+)
 create mode 100644 cilium/CFP-42453-oci-cloud-provider-design.md

diff --git a/cilium/CFP-42453-oci-cloud-provider-design.md b/cilium/CFP-42453-oci-cloud-provider-design.md
new file mode 100644
index 0000000..b1a0811
--- /dev/null
+++ b/cilium/CFP-42453-oci-cloud-provider-design.md
@@ -0,0 +1,173 @@
+# CFP-42453: OCI Cloud Provider Design
+
+**SIG: SIG-COMMUNITY**
+
+**Begin Design Discussion:** 2025-10-15
+
+**Cilium Release:** X.XX
+
+**Authors:** Trung Nguyen
+
+**Status:** Draft
+
+## Summary
+
+This document provides details for an integration with OCI Oracle Kubernetes Engine to implement a Cilium Direct Routing solution.
+
+## Motivation
+
+Many customers are requesting support for Cilium on OCI. This feature proposal provides guidance on various implementation options.
+
+## Goals
+
+* Discuss possible solutions for integrating OCI with Cilium IPAM solutions for Direct Routing
+* Determine short-term solutions for providing an integration with Cilium
+* Determine long-term solutions for providing an integration with Cilium where users can specify which VNIC a pod routes out of (see the examples below for details)
+
+## Non-Goals
+
+* Timelines of integrations
+
+## Proposal
+
+### Background
+
+OCI Oracle Kubernetes Engine (OKE) provides controls for the user to determine how many VNICs should be attached to a node and which VNIC a pod should route out of.
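To make the "route out of a specific VNIC" mechanics concrete, the following sketch shows the Linux policy-routing commands a CNI plugin might emit to steer a pod's traffic out of a chosen secondary VNIC. This is illustrative only, not OKE's actual implementation: the interface naming (VNIC *N* as `ethN-1`), the routing-table numbering, and the addresses are all hypothetical, and a real plugin would apply these via netlink rather than printing shell commands.

```python
def vnic_routing_commands(pod_ip: str, vnic_index: int, gateway: str) -> list[str]:
    """Build policy-routing commands that steer a pod's traffic out of a
    specific secondary VNIC, using one routing table per VNIC so the node's
    default route on the primary VNIC (VNIC 1) is left untouched."""
    # Hypothetical convention: one routing table per VNIC, numbered 100 + index.
    table = 100 + vnic_index
    # Hypothetical convention: VNIC N appears on the host as eth(N-1).
    dev = f"eth{vnic_index - 1}"
    return [
        # Default route for this VNIC's table, via the VNIC subnet's gateway.
        f"ip route add default via {gateway} dev {dev} table {table}",
        # Source-based rule: traffic from the pod IP consults the VNIC's table.
        f"ip rule add from {pod_ip}/32 lookup {table}",
    ]

# Example: pin a pod (10.0.2.15) to VNIC 2 behind gateway 10.0.2.1.
for cmd in vnic_routing_commands("10.0.2.15", 2, "10.0.2.1"):
    print(cmd)
```

Because the rules are keyed on the pod's source IP, node-originated traffic still follows the main table's default route out of the primary VNIC, which matches the traffic split shown in Example 2 below.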
+
+*Note: The existing OCI OKE solution uses a pre-attach model, where the Cloud Controller Manager preallocates all VNICs/IPs and does not perform any detaches (detaches happen only upon node termination); the CNI then only sets up OS-level network routing to the pods.*
+
+#### Example 1
+
+A node has 1 VNIC with 256 IPs attached to it. Host and pod traffic will route out of this VNIC.
+
+```
++--------------------------------------+
+| Kubernetes Node                      |
+|                                      |
+|  +--------+                          |
+|  | VNIC 1 |                          |
+|  +--------+                          |
+|      |                               |
+|  +--------+                          |
+|  | Pod A  |                          |
+|  +--------+                          |
+|                                      |
+|  Pod A routes traffic via VNIC 1     |
++--------------------------------------+
+```
+
+#### Example 2
+
+A node has 3 VNICs, each with 256 IPs attached. Host traffic routes out of the primary VNIC (via the default route), and a user can use Multus to specify that Pod B routes out of VNIC 2 or VNIC 3, depending on host routing rules.
+
+```
++--------------------------------------------------+
+| Kubernetes Node                                  |
+|                                                  |
+|  +--------+      +--------+      +--------+      |
+|  | VNIC 1 |      | VNIC 2 |      | VNIC 3 |      |
+|  +--------+      +--------+      +--------+      |
+|      |               |               |           |
+|      |            +------+           |           |
+|      |            |Pod B |-----------+           |
+|  (Node Traffic)   +------+                       |
+|                                                  |
+| - Pod B can route traffic via VNIC 2 or VNIC 3   |
+| - Node-level traffic exits via VNIC 1            |
++--------------------------------------------------+
+```
+
+### Overview
+
+Cilium documents several existing [IPAM solutions](https://docs.cilium.io/en/stable/network/kubernetes/ipam/):
+
+* Out-of-tree solution that attaches an IP CIDR block and sets `v1.Node.spec.podCIDR`. Cilium's built-in `kubernetes` IPAM mode knows how to process `v1.Node.spec.podCIDR`.
+* In-tree solution that extends the Cilium controller to make OCI calls to attach VNICs/IPs and populate IPAM.
+* Out-of-tree solution that populates the Cilium IPAM Custom Resource.
+* Out-of-tree Delegated IPAM binary that is compatible with Cilium.
+
+#### Kubernetes IPAM solution
+
+A component outside the scope of Cilium (e.g. a leader-elected component such as the Cloud Controller Manager, or a per-node DaemonSet) attaches a CIDR block to the primary VNIC of the node and populates the `v1.Node.spec.podCIDR` field.
+
+Cilium has a built-in `kubernetes` IPAM mode, which provides a simple, cloud-agnostic solution (implementation-wise, the cloud provider just populates the `v1.Node` object, so this approach makes no assumptions about which CNI is being used).
+
+However, only one CIDR block is attachable (`v1.Node.spec.podCIDRs` does not allow multiple blocks, beyond one IPv4 block and one IPv6 block). This solution is therefore not compatible with the requirements for using multiple VNICs (OCI has a basic, non-Cilium offering that supports separating node traffic from pod traffic onto separate VNICs).
+
+Pros:
+- Very simple
+- All cloud-provider changes are CNI-agnostic
+
+Cons:
+- Only one CIDR block can be attached
+- Will not fit the multi-VNIC model
+
+#### In-tree: Extending the Cilium Operator
+
+The CiliumNode CRD will be updated with Oracle-related fields, and the Cilium Operator will be extended with access to the OCI Go SDK. The Operator will perform the VNIC/IP attaches, and the CNI IPAM will be updated to know how to route out of specific VNICs.
+
+Pros:
+- Can support a multi-VNIC model
+- Requires changes in a single component (Cilium Operator)
+
+Cons:
+- Requires changes/coordination between the cloud provider and Cilium
+
+#### Out-of-tree solution to populate the Cilium IPAM Custom Resource
+
+*Note: This is similar to the previous solution. However, since OKE already has a process to attach VNICs/IPs, we can reuse this existing functionality.*
+
+The Cilium Agent will generate the Custom Resource object (via `--auto-create-cilium-node-resource`), and a component outside the scope of Cilium (e.g. the Cloud Controller Manager) will attach IP addresses and populate the Custom Resource objects.
+
+*Note: This solution will require a CNI change to add the option to choose which VNIC to route out of.*
+
+Pros:
+- With CNI changes, this solution can support a multi-VNIC model
+
+Cons:
+- Requires changes in two separate components (Cloud Controller Manager and CNI)
+
+#### Out-of-tree Delegated IPAM binary
+
+OCI OKE already has a built-in process (in the Cloud Controller Manager) to attach the appropriate VNICs/IPs. OCI OKE's existing CNI IPAM plugin can already choose a VNIC to route out of, and it would be modified to become compatible with Cilium as a delegated IPAM plugin.
+
+Pros:
+- Can support a multi-VNIC model
+- Requires changes in a single component (CNI binary)
+
+Cons:
+- Requires changes/coordination between the cloud provider and Cilium to test IPAM
+
+## Impacts / Key Questions
+
+_List crucial impacts and key questions. They likely require discussion and are required to understand the trade-offs of the CFP. During the lifecycle of a CFP, discussion on design aspects can be moved into this section. After reading through this section, it should be possible to understand any potentially negative or controversial impact of this CFP. It should also be possible to derive the key design questions: X vs Y._
+
+### Impact: Integration with OCI
+
+### Impact: Increased Maintenance
+
+Depending on the chosen solution, there may be additional maintenance and coordination required.
+
+For in-tree solutions, the controller will need to be updated.
+
+For out-of-tree solutions, there may be additional integration testing.
+
+### Key Question: Cilium Testing
+
+How does Cilium perform CNI testing? What solution will require the least amount of maintenance/coordination?
+
+### Key Question: Cloud Provider Support Model
+
+What does the support model look like for Cloud Providers for integrated pieces, like the Cilium Operator?
+
+### Key Question: Multi-VNIC Solution
+
+Is there any negative impact from implementing a Multi-VNIC solution?
+
+## Future Milestones
+
+### Multi-VNIC Solution
+
+Long term, OKE will provide a multi-NIC solution, where users can route pods out of VNICs.
+
+TODO: Provide more details on this

From 0fb7ea6c272b59c6848cd143fb23c2c0320beefe Mon Sep 17 00:00:00 2001
From: Trung Nguyen
Date: Mon, 11 May 2026 17:57:46 -0400
Subject: [PATCH 2/2] Update CFP with OCI's short-term and long-term evaluations

Signed-off-by: Trung Nguyen
---
 cilium/CFP-42453-oci-cloud-provider-design.md | 32 ++++++++++++-------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/cilium/CFP-42453-oci-cloud-provider-design.md b/cilium/CFP-42453-oci-cloud-provider-design.md
index b1a0811..884a2f4 100644
--- a/cilium/CFP-42453-oci-cloud-provider-design.md
+++ b/cilium/CFP-42453-oci-cloud-provider-design.md
@@ -144,30 +144,38 @@ _List crucial impacts and key questions. They likely require discussion and are
 
 ### Impact: Integration with OCI
 
-### Impact: Increased Maintenance
-
-Depending on the chosen solution, there may be additional maintenance and coordination required.
-
-For in-tree solutions, the controller will need to be updated.
-
-For out-of-tree solutions, there may be additional integration testing.
+The Kubernetes IPAM solution requires the least effort to integrate with Cilium while also providing a CNI-agnostic solution. All other solutions require additional maintenance or coordination to implement.
 
 ### Key Question: Cilium Testing
 
-How does Cilium perform CNI testing? What solution will require the least amount of maintenance/coordination?
+> How does Cilium perform CNI testing? What solution will require the least amount of maintenance/coordination?
+
+Cloud providers are responsible for testing their own solutions.
 
 ### Key Question: Cloud Provider Support Model
 
-What does the support model look like for Cloud Providers for integrated pieces, like the Cilium Operator?
+> What does the support model look like for Cloud Providers for integrated pieces, like the Cilium Operator?
+
+The team suggests not implementing in-tree solutions in the Cilium Operator because of the required coordination.
 
 ### Key Question: Multi-VNIC Solution
 
-Is there any negative impact from implementing a Multi-VNIC solution?
+> Is there any negative impact from implementing a Multi-VNIC solution?
+
+Cilium does not have strong support for multi-VNIC solutions right now.
 
 ## Future Milestones
 
+### Single-VNIC Solution
+
+As a first pass, OCI will implement the Kubernetes IPAM solution.
+
+Ultimately, the simplicity of the solution, along with its CNI-agnostic behavior, made this the most attractive short-term option.
+
 ### Multi-VNIC Solution
 
-Long term, OKE will provide a multi-NIC solution, where users can route pods out of VNICs.
+Long term, OKE will reevaluate multi-NIC solutions, where users can route pods out of VNICs, once Dynamic Resource Allocation (DRA) provides better support.
+
+[DRANet](https://github.com/google/dranet) is a possible future solution (it would need to incorporate [consumable capacity](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#consumable-capacity)).
 
-TODO: Provide more details on this
+[Node Resource Interface](https://github.com/containerd/nri) is another possible alternative.
\ No newline at end of file