
CFP-45836: Multicast Egress as CiliumEgressGatewayPolicy Extension #95

Open
kkroo wants to merge 1 commit into cilium:main from kkroo:cfp-45836-multicast-egress


@kkroo kkroo commented May 8, 2026

This CFP proposes extending CiliumEgressGatewayPolicy.destinationCIDRs to accept multicast CIDRs (e.g., 232.0.0.0/8, the IPv4 SSM range). When a packet matches, the BPF datapath redirects it over the existing VXLAN egress-gateway path to the selected gateway node, SNATs the source IP to egressIP, and uses clone_redirect to emit it on the gateway's primary interface. The host kernel then handles MAC-layer multicast and the upstream network handles PIM-SSM tree distribution.

No new CRD type, no new fields. Same operator UX as unicast egress.
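As a rough illustration of what accepting multicast CIDRs could mean at validation time, the Go sketch below classifies a destinationCIDRs entry using only net/netip. The classifyCIDR helper and its return labels are hypothetical and not part of Cilium; the one external fact assumed is that IPv4 SSM is 232.0.0.0/8 (RFC 4607).

```go
package main

import (
	"fmt"
	"net/netip"
)

// ssmRange is the IPv4 source-specific multicast block (RFC 4607).
var ssmRange = netip.MustParsePrefix("232.0.0.0/8")

// classifyCIDR is a hypothetical helper: it labels a destination CIDR by
// looking at its (masked) base address only. A wide prefix such as
// 0.0.0.0/0 is therefore reported as "unicast" even though it also
// covers multicast addresses.
func classifyCIDR(s string) (string, error) {
	p, err := netip.ParsePrefix(s)
	if err != nil {
		return "", err
	}
	p = p.Masked() // clear host bits so the base address is canonical
	switch {
	case !p.Addr().IsMulticast():
		return "unicast", nil
	case ssmRange.Contains(p.Addr()) && p.Bits() >= ssmRange.Bits():
		return "multicast-ssm", nil
	default:
		return "multicast", nil
	}
}

func main() {
	for _, cidr := range []string{"10.0.0.0/8", "224.0.0.0/4", "232.1.0.0/16"} {
		kind, err := classifyCIDR(cidr)
		if err != nil {
			fmt.Println(cidr, "error:", err)
			continue
		}
		fmt.Println(cidr, kind)
	}
}
```

A check along these lines would only be informational under this proposal, since the policy type and its fields stay unchanged.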

Why now

The existing multicast-enabled feature handles cluster-internal pod-to-pod fanout via VXLAN. CiliumEgressGatewayPolicy handles unicast egress to external networks with SNAT. There's a gap between the two: pods publishing SSM multicast to upstream multicast-aware networks (PIM-SSM peers, AMT relays per RFC 7450, middle-mile AMT tunneling per draft-zzhang-mboned-dynamic-internet-mcast-tunnel-00) have no path. Today their multicast packets are dropped between the pod's veth and the host's eth0.

Use cases include live media distribution (SSM video CDN-style), financial market data feeds (most exchanges deliver via SSM), and internal enterprise multicast trees.

Why CEGP extension vs. new CRD

Reading the existing code in bpf/lib/egress_gateway.h, bpf/bpf_host.c:1425, bpf/bpf_overlay.c, and pkg/egressgateway/policy.go showed the surface area is much smaller than initially scoped:

  • The cilium_egress_gw_policy_v4 map is already an LPM_TRIE, so a single 232.0.0.0/8 entry covers the SSM range with one map key. No schema migration.
  • The userspace controller already accepts multicast CIDRs in dstCIDRs []netip.Prefix without filtering.
  • bpf/bpf_lxc.c:1715 already has multicast detection for the multicast-enabled feature, with fall-through to standard egress when no in-cluster subscribers exist. No pod-side hook changes needed.
  • The actual EGW lookup in bpf_host.c:1425 → egress_gw_handle_request → egress_gw_handle_packet is already a longest-prefix-match lookup and would already match multicast policy entries.

The substantive datapath work is small: confirm that lookup_ip4_remote_endpoint returns a sensible identity for multicast destinations (probably WORLD, but this needs verifying), and add a branch on the gateway node's from-overlay path that calls clone_redirect instead of fib_lookup. Estimated 2–3 weeks of focused work.

Issue

cilium/cilium#45836

Coexistence with multicast-enabled

The two features operate on disjoint subscriber-sets but share the policy / map infrastructure. When both are enabled and a publisher pod sends to a group with both local-pod subscribers and a CEGP egress entry, both deliveries happen — packets replicate to local pods (existing path) AND emit on the egress-gateway node's eth0 (new path). The two paths don't conflict because subscriber types are disjoint.

Asks

Comment on the design, especially:

  1. The decision to extend CiliumEgressGatewayPolicy rather than create a parallel CiliumMulticastEgressPolicy type.
  2. Whether lookup_ip4_remote_endpoint returns WORLD identity for multicast destinations today, or if we need an explicit IN_MULTICAST(daddr) → WORLD carve-out before the existing identity_is_cluster() check.
  3. Whether to skip the CT path for multicast egress (Key Question 2 in the CFP body) or leave it alone.
  4. Right SIG to engage — sig-datapath? sig-policy? Both?

Also: is anyone already working on multicast egress upstream? Is Isovalent's enterprise IP Multicast feature on a path to OSS contribution that would obviate this?

Happy to iterate before implementation begins.

Extends CiliumEgressGatewayPolicy.destinationCIDRs to accept multicast
CIDRs (e.g., 232.0.0.0/8). On match, the BPF datapath redirects via the
existing VXLAN egress-gateway path to the selected gateway node, SNATs
source IP to egressIP, and clone_redirects to the gateway's primary
interface — letting the host kernel handle MAC-layer multicast.

No new CRD type. Same operator UX as unicast egress. Designed atop the
existing LPM_TRIE map; minimal datapath additions on origin
(bpf_host.c:1425) and gateway (bpf_overlay.c:359 + new clone_redirect
branch).

Issue: cilium/cilium#45836

Signed-off-by: Omar Ramadan <ramadan@blockcast.net>
Signed-off-by: Omar Ramadan <omar@blockcast.net>
@kkroo kkroo force-pushed the cfp-45836-multicast-egress branch from eb06f72 to 9fb4bc3 Compare May 8, 2026 03:59