@simcod simcod commented Nov 12, 2025

Description

This PR adds the new MEP-19: Zone Awareness in metal-stack.io.

@metal-robot metal-robot bot added the area: documentation Affects the documentation area. label Nov 12, 2025
netlify bot commented Nov 12, 2025

Deploy Preview for metal-stack-io ready!

Name Link
🔨 Latest commit 3750c0c
🔍 Latest deploy log https://app.netlify.com/projects/metal-stack-io/deploys/693949f19e38e90008cbaad6
😎 Deploy Preview https://deploy-preview-147--metal-stack-io.netlify.app

@Gerrit91 Gerrit91 left a comment

Thanks for writing this up. Here is a first set of suggestions, mainly for the introductory sections.

To support explicit region and zone concepts in metal-stack, several functional and architectural requirements must be met. The following considerations focus primarily on the Kubernetes integration and cluster topology aspects:
- Proper spreading of worker nodes and control plane components across [multiple zones](https://kubernetes.io/docs/setup/best-practices/multiple-zones/) and regions must be possible.
- Nodes that belong to the same Kubernetes cluster must have the capability to communicate directly with each other, even if they are located in different partitions, provided that network configurations allow this communication using their respective Node CIDRs.
- It must be possible for nodes within a single Kubernetes cluster to use different Node CIDR ranges, depending on their partition or zone assignment. Major cloud providers use node groups to configure Node CIRDs differently.

Why is this required? In GCP this is not the case; node IPs are not in different CIDR ranges.

Suggested change
- It must be possible for nodes within a single Kubernetes cluster to use different Node CIDR ranges, depending on their partition or zone assignment. Major cloud providers use node groups to configure Node CIRDs differently.
- It must be possible for nodes within a single Kubernetes cluster to use different Node CIDR ranges, depending on their partition or zone assignment. Major cloud providers use node groups to configure Node CIDRs differently.
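To make the requirement under discussion concrete, here is a minimal sketch of what per-zone Node CIDRs within one cluster could look like. The zone names, the CIDR values, and the helper functions are hypothetical and only serve as an illustration; they are not taken from the MEP or from metal-stack.

```python
import ipaddress

# Hypothetical zone-to-Node-CIDR assignment; names and ranges are illustrative only.
ZONE_NODE_CIDRS = {
    "zone-a": ipaddress.ip_network("10.0.0.0/22"),
    "zone-b": ipaddress.ip_network("10.0.4.0/22"),
}


def cidrs_overlap(cidrs):
    """Return True if any two networks in the mapping overlap."""
    nets = list(cidrs.values())
    return any(
        a.overlaps(b)
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
    )


def zone_of(node_ip, cidrs=ZONE_NODE_CIDRS):
    """Return the zone whose Node CIDR contains node_ip, or None if no zone matches."""
    ip = ipaddress.ip_address(node_ip)
    for zone, net in cidrs.items():
        if ip in net:
            return zone
    return None
```

With distinct, non-overlapping ranges like these, a node's zone can be derived from its IP alone. Whether that property is actually needed is exactly the open question above.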

- Zones stay separate failure domains (e.g. a failure in the EVPN control-plane of one zone should not affect the other to avoid EVPN fate-sharing)

## Criteria
- Number of hops: for communication btw. worker nodes, to the internet and to the storage.

An introductory sentence is needed here. Which criteria are we talking about?

@Gerrit91 Gerrit91 changed the title from "MEP-19: Zone Awareness in metal-stack.io" to "MEP-19: Zone Awareness" Nov 17, 2025
@Gerrit91 Gerrit91 added the triage This should be talked about in the next planning. label Nov 17, 2025
@iljarotar iljarotar moved this to In Progress in Development Nov 17, 2025
@metal-robot metal-robot bot removed the triage This should be talked about in the next planning. label Nov 17, 2025
@Gerrit91 Gerrit91 moved this from In Progress to Upcoming in Development Dec 1, 2025
@dhilgarth

Some feedback from "the outside": This MEP so far focuses heavily on how the traffic between the zones could be routed. What seems to be missing is how zone awareness would work on a conceptual level, what it would look like in the CLI, etc.

Some questions:

  • Would it be a "zone spreading" with a configuration like "in each zone, have at least X nodes"? Or is it rather "have at least X nodes in zone A", and if I want to ensure a minimum number of nodes in each zone, I would have one such configuration per zone?
  • What about the firewalls? In the context of Gardener and also Cluster API, each Kubernetes cluster has its own firewall machine.
    • It's unclear whether "firewall" is synonymous with the firewall created by metal-stack for a cluster.
    • It's unclear where the metal-stack firewall of a cluster is placed in the context of multiple zones, whether it still will be a single firewall or multiple
    • It's unclear how traffic would be routed if just a single firewall exists, in terms of reducing cross-zone hops
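
To make the first question concrete, the two readings could be sketched roughly as follows. This is purely illustrative: the function names, the zone names, and the shape of the configuration are hypothetical and do not correspond to any existing metal-stack, Gardener, or Cluster API setting.

```python
def satisfies_uniform_minimum(nodes_per_zone, minimum):
    """Reading 1: every zone must have at least `minimum` nodes."""
    return all(count >= minimum for count in nodes_per_zone.values())


def satisfies_per_zone_minimum(nodes_per_zone, minimum_per_zone):
    """Reading 2: each listed zone carries its own minimum; unlisted zones are unconstrained."""
    return all(
        nodes_per_zone.get(zone, 0) >= minimum
        for zone, minimum in minimum_per_zone.items()
    )


# Example node counts for an imaginary cluster.
nodes = {"zone-a": 3, "zone-b": 1}
uniform_ok = satisfies_uniform_minimum(nodes, 2)
only_zone_a_ok = satisfies_per_zone_minimum(nodes, {"zone-a": 2})
```

Under the second reading, a guarantee for all zones requires one entry per zone, which is the case asked about above.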

majst01 commented Dec 4, 2025

> Some feedback from "the outside": This MEP so far focuses heavily on how the traffic between the zones could be routed. What seems to be missing is how zone awareness would work on a conceptual level, what it would look like in the CLI, etc.
>
> Some questions:
>
> • Would it be a "zone spreading" with a configuration like "in each zone, have at least X nodes"? Or is it rather "have at least X nodes in zone A", and if I want to ensure a minimum number of nodes in each zone, I would have one such configuration per zone?

Spreading of machines is actually done only on a partition level, which would not change with this MEP. Instead, the caller must decide in how many partitions of a zone machines should be created. So the logic you are referring to is up to a higher level, like CAPI or Gardener.
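
As a rough illustration of what such caller-side logic might look like, here is a minimal sketch that distributes a machine count as evenly as possible over the partitions of a zone. It is not metal-stack, CAPI, or Gardener code; `spread_machines` and the partition names are hypothetical.

```python
def spread_machines(total, partitions):
    """Distribute `total` machines as evenly as possible over `partitions`.

    The first `total % len(partitions)` partitions receive one extra machine.
    """
    if not partitions:
        raise ValueError("at least one partition is required")
    base, extra = divmod(total, len(partitions))
    return {
        partition: base + (1 if index < extra else 0)
        for index, partition in enumerate(partitions)
    }


# Hypothetical partitions belonging to one zone.
plan = spread_machines(5, ["partition-1", "partition-2", "partition-3"])
```

The caller would then issue one machine creation request per partition according to `plan`; the spreading inside each partition stays with metal-stack as described above.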

> • What about the firewalls? In the context of Gardener and also Cluster API, each Kubernetes cluster has its own firewall machine.
>
>   • It's unclear whether "firewall" is synonymous with the firewall created by metal-stack for a cluster.

"Firewall" refers to a firewall created by metal-stack.

> • It's unclear where the metal-stack firewall of a cluster is placed in the context of multiple zones, whether it still will be a single firewall or multiple

metal-stack could create multiple firewalls per cluster. There is one open feature called "distance aware routing" which must be implemented during this MEP.

> • It's unclear how traffic would be routed if just a single firewall exists, in terms of reducing cross-zone hops

simcod and others added 5 commits December 10, 2025 11:04
Co-authored-by: Gerrit <Gerrit91@users.noreply.github.com>
Co-authored-by: Gerrit <Gerrit91@users.noreply.github.com>
Co-authored-by: Gerrit <Gerrit91@users.noreply.github.com>
Co-authored-by: Gerrit <Gerrit91@users.noreply.github.com>
Labels

area: documentation Affects the documentation area.

Projects

Status: Upcoming

6 participants