478 changes: 161 additions & 317 deletions AGENTS.md

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,35 @@
# Contributing

Thanks for your interest in improving CAPCS.

## Issues

File bugs and feature requests in the
[GitHub issue tracker](https://github.com/cloudscale-ch/cluster-api-provider-cloudscale/issues).
Please include the CAPCS version, the Kubernetes version of your management
cluster, and the relevant CRD YAML when reporting a bug. If you are unsure
whether a problem is a bug, open an issue anyway — it is easier to redirect
than to discover later.

## Pull requests

1. Fork the repository and create a feature branch off `main`.
2. Make your change. Tests live next to the code; new behavior needs a test.
3. Run `make test` and `make lint` locally.
4. For changes that touch reconcilers or templates, run at least
`make test-e2e-lifecycle` against a cloudscale.ch project.
5. Open a PR against `main`. Keep the title short and the description
focused on the *why*.

Commit messages loosely follow
[Conventional Commits](https://www.conventionalcommits.org/) (`feat:`, `fix:`,
`chore:`, `docs:`). Match the style of recent commits.
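
As a sketch, a commit message in this style might look like the following; the scope and wording are invented for illustration, not taken from the repository history:

```
fix: requeue CloudscaleMachine while server is still provisioning

The reconciler previously returned without a requeue, so status
updates could be delayed until the next full resync.
```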

## Development setup

See [docs/development.md](docs/development.md) for architecture, Tilt setup,
test layers, and make targets.

## Questions

Open an issue — there is no separate chat channel.
205 changes: 35 additions & 170 deletions README.md
@@ -4,195 +4,60 @@
[![Release](https://img.shields.io/github/v/release/cloudscale-ch/cluster-api-provider-cloudscale)](https://github.com/cloudscale-ch/cluster-api-provider-cloudscale/releases/latest)

Kubernetes [Cluster API](https://cluster-api.sigs.k8s.io/) infrastructure provider
for [cloudscale.ch](https://www.cloudscale.ch). CAPCS provisions the cloudscale-specific
infrastructure — servers, networks, load balancers, floating IPs, server groups —
that Cluster API uses to build and manage workload Kubernetes clusters.

New to Cluster API? Read the upstream
[concepts](https://cluster-api.sigs.k8s.io/user/concepts.html) and
[quick start](https://cluster-api.sigs.k8s.io/user/quick-start.html) first; this
project only documents what is cloudscale-specific.

## Features

- **CloudscaleCluster**: Multi-network management (managed or pre-existing), load balancer (public or private VIP), floating IP support
- **CloudscaleMachine**: Server provisioning with cloud-init and configurable network interfaces
- **CloudscaleMachineTemplate**: Immutable machine templates for KubeadmControlPlane/MachineDeployment
- Three CRDs: `CloudscaleCluster`, `CloudscaleMachine`, `CloudscaleMachineTemplate`
- Managed or pre-existing networks; public or private load balancer VIPs;
floating IPs (IPv4/IPv6); anti-affinity server groups
- Supported regions: `lpg`, `rma`
- HA control plane; `MachineDeployment` autoscaling including
[scale-from-zero](https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling)
via capacity reported on `CloudscaleMachineTemplate`
- Four cluster templates: `default`, `fip`, `pre-existing-network`,
`public-lb-private-nodes`

## Prerequisites

- A Kubernetes cluster to use as a management cluster ([kind](https://kind.sigs.k8s.io/) works)
- [clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start#install-clusterctl)
- A [cloudscale.ch](https://www.cloudscale.ch) account and API token
A custom image imported into cloudscale.ch; images can be generated with,
e.g., [image-builder for OpenStack](https://image-builder.sigs.k8s.io/)
- cloudscale.ch account and API token
- A custom OS image imported into your cloudscale.ch project, e.g. built with
[image-builder for OpenStack](https://image-builder.sigs.k8s.io/)
- A management Kubernetes cluster ([kind](https://kind.sigs.k8s.io/) works) and
[clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start#install-clusterctl)

## Quickstart

### Initialize the management cluster

```bash
export CLOUDSCALE_API_TOKEN=<your-api-token>

clusterctl init --infrastructure cloudscale-ch-cloudscale
```

### Generate and apply a workload cluster

Set the [required environment variables](#environment-variables), then generate and apply the cluster manifest:

```bash
clusterctl generate cluster my-cluster \
--infrastructure cloudscale-ch-cloudscale --kubernetes-version v1.36.0 \
--control-plane-machine-count 1 --worker-machine-count 2 \
| kubectl apply -f -
```

This uses the default template (public nodes, managed network). See [Cluster Templates](#cluster-templates) for other
network topologies.

Watch the cluster come up:

```bash
clusterctl describe cluster my-cluster
```
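
Once the cluster is up, you can fetch its kubeconfig with `clusterctl` and inspect the nodes (the local file name is just an example; nodes only become `Ready` after a CNI is installed in the workload cluster):

```bash
# Write the workload cluster's kubeconfig to a local file
clusterctl get kubeconfig my-cluster > my-cluster.kubeconfig

# List the workload cluster's nodes
kubectl --kubeconfig my-cluster.kubeconfig get nodes
```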

## Environment Variables

| Variable | Description | Example |
|-------------------------------------------|-----------------------------------------|-----------------------------------|
| `CLOUDSCALE_API_TOKEN` | cloudscale.ch API token | `abc123...` |
| `CLOUDSCALE_SSH_PUBLIC_KEY` | SSH public key added to nodes | `ssh-ed25519 AAAA...` |
| `CLOUDSCALE_REGION` | cloudscale.ch region | `lpg` or `rma` |
| `CLOUDSCALE_MACHINE_IMAGE` | Server image for nodes | `custom:ubuntu-2404-kube-v1.xx.x` |
| `CLOUDSCALE_CONTROL_PLANE_MACHINE_FLAVOR` | Flavor for control plane nodes | `flex-4-2` |
| `CLOUDSCALE_WORKER_MACHINE_FLAVOR` | Flavor for worker nodes | `flex-4-2` |
| `CLOUDSCALE_ROOT_VOLUME_SIZE` | Root volume size in GB | `50` |
| `CLOUDSCALE_NETWORK_UUID` | Pre-Existing cloudscale.ch network UUID | `2db69ba3-...` |

> **Note:** `CLOUDSCALE_NETWORK_UUID` is required by the `fip`, `public-lb-private-nodes`, and `pre-existing-network`
> template flavors. It is not needed for the default template.
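
Taken together, a typical environment for the default template might look like this; every value below is a placeholder you must adapt to your own project:

```bash
export CLOUDSCALE_API_TOKEN="abc123..."                  # cloudscale.ch API token
export CLOUDSCALE_SSH_PUBLIC_KEY="ssh-ed25519 AAAA..."   # added to all nodes
export CLOUDSCALE_REGION="lpg"
export CLOUDSCALE_MACHINE_IMAGE="custom:ubuntu-2404-kube-v1.36.0"
export CLOUDSCALE_CONTROL_PLANE_MACHINE_FLAVOR="flex-4-2"
export CLOUDSCALE_WORKER_MACHINE_FLAVOR="flex-4-2"
export CLOUDSCALE_ROOT_VOLUME_SIZE="50"
# Only for the fip, public-lb-private-nodes, and pre-existing-network flavors:
# export CLOUDSCALE_NETWORK_UUID="2db69ba3-..."
```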

## Cluster Templates

CAPCS ships several cluster templates for different network topologies. Use `clusterctl generate cluster` with the
`--flavor` flag to select one:

```bash
clusterctl generate cluster my-cluster \
--infrastructure cloudscale-ch-cloudscale \
--kubernetes-version v1.36.0 \
--control-plane-machine-count 1 \
--worker-machine-count 2 \
--flavor <flavor-name> \
| kubectl apply -f -
```
The default template uses a managed network and a public load balancer.
[Getting Started](docs/getting-started.md) lists the required environment
variables and the other template flavors.

| Flavor | Network | CP Endpoint | Node Connectivity | Extra Env Vars | Notes |
|---------------------------|---------------------------|-----------------------|-------------------|---------------------------|----------------------|
| *(default)* | Managed (`172.18.0.0/24`) | Public LB (DualStack) | Public + cluster | — | |
| `fip` | Pre-Existing | Floating IP (IPv4) | Public + cluster | `CLOUDSCALE_NETWORK_UUID` | |
| `public-lb-private-nodes` | Pre-Existing + NAT | Public LB | Private only | `CLOUDSCALE_NETWORK_UUID` | Requires NAT gateway |
| `pre-existing-network` | Pre-Existing | Public LB (DualStack) | Public + cluster | `CLOUDSCALE_NETWORK_UUID` | |
## Documentation

The default `networks[].cidr` is `172.18.0.0/24` so it does not overlap with the default Cilium
cluster-pool IPAM range `10.0.0.0/8`. If you override `networks[].cidr` with a range inside
`10.0.0.0/8`, make sure your CNI's IP range is configured so the two do not overlap. Overlapping
ranges can break, for example, the control-plane load balancer's health checks.
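
As an illustration, if you install Cilium via Helm and move the node network into `10.0.0.0/8`, you would relocate the pod CIDR to a non-overlapping range. The key names below follow recent Cilium charts and are an assumption to verify against your Cilium version:

```yaml
# values.yaml fragment for Cilium cluster-pool IPAM (illustrative)
ipam:
  mode: cluster-pool
  operator:
    # Keep the pod CIDR clear of the CloudscaleCluster networks[].cidr
    clusterPoolIPv4PodCIDRList:
      - "10.244.0.0/16"
```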

## Development

This is a kubebuilder-scaffolded project; use [kubebuilder](https://book.kubebuilder.io/)
commands to add new APIs, webhooks, and similar scaffolding.

```bash
# Run tests
make test

# Generate manifests
make manifests

# Generate code
make generate

# Run E2E tests (requires CLOUDSCALE_API_TOKEN)
make test-e2e
```

### E2E Tests

E2E tests are built on the [CAPI e2e test framework](https://pkg.go.dev/sigs.k8s.io/cluster-api/test/e2e)
(Ginkgo-based) and provision real clusters on cloudscale.ch. Tests use Ginkgo labels for
filtering and are split into suites of increasing cost, scheduled accordingly:

| Suite | Label | Description | ~Duration | Schedule | Make target |
|-------------------------|---------------------------|------------------------------------------------------------------------------------------|-----------|----------|------------------------------------|
| Lifecycle | `lifecycle` | 1 CP + 1 worker: create, validate cloudscale resources, delete | ~5 min | Nightly | `test-e2e-lifecycle` |
| HA lifecycle | `ha` | 3 CP + 2 workers with anti-affinity server groups | ~8 min | Weekly | `test-e2e-ha` |
| Cluster upgrade | `upgrade` | Rolling K8s version upgrade (v1.34 → v1.35) | ~25 min | Weekly | `test-e2e-upgrade` |
| Self-hosted | `self-hosted` | clusterctl move (pivot) to workload cluster. Requires container image in public registry | ~13 min | Weekly | `test-e2e-self-hosted` |
| MD remediation | `md-remediation` | MachineHealthCheck auto-replacement of unhealthy workers | ~6 min | Weekly | `test-e2e-md-remediation` |
| Pre-Existing networking | `pre-existing-networking` | Pre-Existing network: public-LB + private-nodes and floating-IP variants | ~30 min | Weekly | `test-e2e-pre-existing-networking` |
| Conformance (fast) | `conformance` | K8s conformance, skip Serial tests | ~55 min | Weekly | `test-e2e-conformance-fast` |
| Conformance (full) | `conformance` | Full K8s conformance including Serial tests | ~120 min | Biweekly | `test-e2e-conformance` |

Durations are approximate from a real CI run; conformance varies with cluster size.
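
Since suites are selected via Ginkgo labels, a single suite can presumably also be invoked with a label filter directly; the `./test/e2e` path and invocation below are assumptions about the layout, not verified against this repo's Makefile:

```bash
# Run only the lifecycle-labelled specs (requires CLOUDSCALE_API_TOKEN)
go run github.com/onsi/ginkgo/v2/ginkgo --label-filter="lifecycle" ./test/e2e
```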

**Why this split?** The single-CP lifecycle test is the cheapest smoke test and runs
nightly to catch regressions early. HA, upgrade, self-hosted, and remediation tests are more
resource-intensive and run weekly. Private networking tests require `CLOUDSCALE_NETWORK_UUID` to be set and are
skipped otherwise. Full K8s conformance is the most expensive and runs biweekly
(1st + 15th of month). All suites can be triggered manually via the `test-e2e.yml` workflow
dispatch. E2E tests share a concurrency group so only one suite runs at a time.

Any run involving the self-hosted spec requires the container image to be published to our
registry: the spec moves the management cluster to the first workload cluster, which cannot pull
locally built images and therefore needs a published image.

For PRs, no e2e tests run automatically. Run them locally before submitting; reviewers should
likewise run them locally or trigger the workflow manually, but only **after** reviewing that the
code is safe to run.

### Tilt

The easiest way to work on this provider is by using the
[Tilt setup](https://cluster-api.sigs.k8s.io/developer/core/tilt.html) of Cluster-API.

Refer to the linked documentation on how to set up your local Tilt. This requires cloning
[Cluster-API core](https://github.com/kubernetes-sigs/cluster-api) to your host; the necessary
commands are executed in the Cluster-API core repository (**not** in this repository).

An example `tilt-settings.yaml`, which should also be placed in the Cluster-API core repository, is provided here:

```yaml
default_registry: "" # change if you use a remote image registry
provider_repos:
  # This refers to your provider directory and loads settings
  # from `tilt-provider.yaml`
  - path/to/local/clone/cluster-api-provider-cloudscale
enable_providers:
  - cloudscale
  - kubeadm-bootstrap
  - kubeadm-control-plane
deploy_cert_manager: true
kustomize_substitutions:
  CLOUDSCALE_API_TOKEN: "INSERT_TOKEN_HERE"
  CLOUDSCALE_SSH_PUBLIC_KEY: "INSERT_SSH_PUBLIC_KEY_HERE"
  CLOUDSCALE_REGION: "lpg"
  CLOUDSCALE_CONTROL_PLANE_MACHINE_FLAVOR: "flex-4-2"
  CLOUDSCALE_WORKER_MACHINE_FLAVOR: "flex-4-2"
  CLOUDSCALE_MACHINE_IMAGE: "IMAGE_NAME"
  CLOUDSCALE_ROOT_VOLUME_SIZE: "50"
  # Required for pre-existing network flavors (fip, public-lb-private-nodes, pre-existing-network):
  # CLOUDSCALE_NETWORK_UUID: "UUID_HERE"
extra_args:
  cloudscale:
    - "--zap-log-level=5"
template_dirs:
  docker:
    - ./test/infrastructure/docker/templates
  cloudscale:
    - path/to/local/clone/cluster-api-provider-cloudscale/templates
```
| If you are… | Start here |
|-------------------------------------|----------------------------------------------------------------------------------------------------------------|
| New to Cluster API, or new to CAPCS | [Getting Started](docs/getting-started.md) |
| Looking up a CRD field | `kubectl explain cloudscalecluster.spec` (or the generated CRDs under [`config/crd/bases/`](config/crd/bases)) |
| Hitting an error | [Troubleshooting](docs/troubleshooting.md) |
| Contributing to CAPCS | [Development](docs/development.md), [CONTRIBUTING.md](CONTRIBUTING.md) |
| Cutting a release | [Releasing](docs/releasing.md), [Testing releases](docs/testing-releases.md) |

## License

33 changes: 24 additions & 9 deletions api/v1beta2/cloudscalecluster_types.go
@@ -40,13 +40,17 @@ const (

// CloudscaleClusterSpec defines the desired state of CloudscaleCluster
type CloudscaleClusterSpec struct {
// Region is the cloudscale.ch region (e.g., "rma", "lpg").
// Region is the cloudscale.ch region the cluster is provisioned in.
// Determines the default zone and the set of available flavors.
// Immutable after cluster creation.
// +kubebuilder:validation:Required
// +kubebuilder:validation:Enum=rma;lpg
Region string `json:"region"`

// Zone is the cloudscale.ch zone (e.g., "rma1", "lpg1").
// Defaults to region + "1" if not specified.
// Zone is the cloudscale.ch zone within Region.
// Defaults to Region + "1" (e.g., "rma1", "lpg1"). Set explicitly only when
// the region offers multiple zones and you need to pin the cluster to one.
// Immutable after cluster creation.
// +optional
Zone string `json:"zone,omitempty"`

@@ -86,13 +90,16 @@ type CloudscaleClusterSpec struct {
FloatingIP *FloatingIPSpec `json:"floatingIP,omitempty"`
}

// CloudscaleCredentialsReference references a Secret containing the API token.
// CloudscaleCredentialsReference references a Secret holding the cloudscale.ch
// API token used to provision this cluster's infrastructure. The Secret must
// contain a key named "token" with the raw token string as its value.
type CloudscaleCredentialsReference struct {
// Name is the name of the Secret.
// +kubebuilder:validation:Required
Name string `json:"name"`

// Namespace is the namespace of the Secret. Defaults to the cluster namespace.
// Namespace is the namespace of the Secret. Defaults to the
// CloudscaleCluster's own namespace if unset.
// +optional
Namespace string `json:"namespace,omitempty"`
}
@@ -138,18 +145,24 @@ type LoadBalancerSpec struct {
// +optional
Enabled *bool `json:"enabled,omitempty"`

// Algorithm is the load balancing algorithm.
// Algorithm is the cloudscale.ch load-balancing algorithm.
// - "round_robin" (default): rotate requests across healthy backends.
// - "least_connections": send each request to the backend with the fewest active connections.
// - "source_ip": hash the client IP so the same client lands on the same backend.
// +kubebuilder:validation:Enum=round_robin;least_connections;source_ip
// +kubebuilder:default="round_robin"
// +optional
Algorithm string `json:"algorithm,omitempty"`

// Flavor is the load balancer flavor (size).
// Flavor is the cloudscale.ch load balancer flavor slug. Defaults to
// "lb-standard".
// +kubebuilder:default="lb-standard"
// +optional
Flavor string `json:"flavor,omitempty"`

// APIServerPort is the port for the Kubernetes API server.
// APIServerPort is the LB listener port exposed for the Kubernetes API
// server. Defaults to 6443. The backend pool always targets port 6443 on
// the control plane nodes.
// +kubebuilder:default=6443
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535
@@ -309,7 +322,9 @@ func (s *CloudscaleClusterStatus) GetNetworkStatus(name string) *NetworkStatus {
// +kubebuilder:printcolumn:name="Region",type="string",JSONPath=".spec.region",description="cloudscale.ch region"
// +kubebuilder:printcolumn:name="Endpoint",type="string",JSONPath=".spec.controlPlaneEndpoint.host",description="Control plane endpoint"

// CloudscaleCluster is the Schema for the cloudscaleclusters API
// CloudscaleCluster is the cloudscale.ch infrastructure for a CAPI Cluster.
// It owns the networks, control-plane load balancer, optional floating IP, and
// server groups that back the cluster's machines.
type CloudscaleCluster struct {
metav1.TypeMeta `json:",inline"`
