The buildkitd-autoscaler is a TCP proxy and Kubernetes controller designed to automatically scale a
buildkitd deployment to zero.
It is primarily built and tested for deployments of earthbuild/buildkit, but it is generally agnostic and should work with vanilla moby/buildkit as well.
It scales the buildkitd StatefulSet from 0 to 1 when the first TCP connection is received, and back from 1
to 0 when all connections are closed and an idle timeout period has elapsed.
This optimizes resource usage by running buildkitd only when it is actively needed, while remaining
agnostic to the buildkit client (a CI pipeline or a user connecting via ingress to the buildkit backend).
Since buildkitd is deployed as a StatefulSet with a PVC in this chart's design, scaling to zero does not
disrupt the buildkit cache.
The application can be configured using command-line flags or environment variables. Environment variables take precedence over command-line flags.
| Flag | Environment Variable | Description | Default |
|---|---|---|---|
| `--listen-addr` | `PROXY_LISTEN_ADDR` | Proxy listen address and port | `:8080` |
| `--sts-name` | `BUILDKITD_STATEFULSET_NAME` | Name of the buildkitd StatefulSet | `buildkitd` |
| `--sts-namespace` | `BUILDKITD_STATEFULSET_NAMESPACE` | Namespace of the buildkitd StatefulSet | `default` |
| `--headless-service-name` | `BUILDKITD_HEADLESS_SERVICE_NAME` | Name of the buildkitd headless Service | `buildkitd-headless` |
| `--target-port` | `BUILDKITD_TARGET_PORT` | Target port on buildkitd pods | `8372` |
| `--idle-timeout` | `SCALE_DOWN_IDLE_TIMEOUT` | Duration for the scale-down idle timer | `2m0s` |
| `--kubeconfig` | `KUBECONFIG_PATH` | Path to kubeconfig file (for local development) | (none) |
| `--ready-wait-timeout` | `READY_WAIT_TIMEOUT` | Timeout for waiting for the StatefulSet to become ready (see note below) | `5m0s` |
Note on `READY_WAIT_TIMEOUT`: this is not a direct flag but an internal constant (`waitForReadyTimeout` in `main.go`) set to 5 minutes. It defines how long the autoscaler will wait for the buildkitd StatefulSet to report 1 ready replica after scaling up.
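For local development (e.g., running the proxy outside the cluster with `--kubeconfig`), the flags above can be combined as follows. The binary name and values shown simply mirror the defaults from the table and are illustrative:

```bash
# Run the autoscaler locally against the cluster selected by the given kubeconfig.
./buildkitd-autoscaler \
  --listen-addr :8080 \
  --sts-name buildkitd \
  --sts-namespace default \
  --headless-service-name buildkitd-headless \
  --target-port 8372 \
  --idle-timeout 2m0s \
  --kubeconfig "$HOME/.kube/config"
```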
The service is best deployed to a Kubernetes cluster using the provided Helm chart.
Prerequisites:
- Helm CLI installed.
- A running Kubernetes cluster.
- Your Docker image for the autoscaler pushed to a container registry.
The chart is published to GitHub Container Registry and can be installed directly:
```bash
# Install the chart
helm install my-buildkitd-autoscaler oci://ghcr.io/earthbuild/charts/buildkitd-stack \
  --version 1.0.0 \
  --namespace buildkitd-scaler-system \
  --create-namespace \
  --set image.repository=your-repo/buildkitd-autoscaler \
  --set image.tag=v1.0.0 \
  --set autoscalerConfig.buildkitdStatefulSetNamespace=default \
  --set autoscalerConfig.buildkitdStatefulSetName=buildkitd
```

To see available versions, you can use:
```bash
# List available versions (requires GitHub CLI or API access)
gh api repos/earthbuild/buildkitd-proxy/packages/container/charts%2Fbuildkitd-stack/versions
```

Chart Location:
The Helm chart is located in the `helm/buildkitd-stack/` directory.
Installation Steps:
1. Configure Values:
   - The primary way to configure the chart is by creating a custom `values.yaml` file or by setting values via the `--set` flag during installation. A minimal example values file is shown after these steps.
   - Navigate to the chart directory: `cd helm/buildkitd-stack/`
   - Review `values.yaml` for all available options. Key configuration sections:

   Autoscaler Configuration:
   * `autoscaler.image.repository`: Docker image repository for the autoscaler (e.g., `your-repo/buildkitd-proxy`).
   * `autoscaler.image.tag`: Tag of your autoscaler Docker image.
   * `autoscaler.namespaceOverride`: Target namespace for deployment. If `autoscaler.namespace.create` is true, Helm will create this namespace.
   * `autoscaler.autoscalerConfig.proxyListenAddr`: Proxy listen address and port (default: `:8372`).
   * `autoscaler.autoscalerConfig.scaleDownIdleTimeout`: Duration for the scale-down idle timer (default: `2m0s`).
   * `autoscaler.autoscalerConfig.readyWaitTimeout`: Timeout for waiting for buildkitd to become ready (default: `5m0s`).
   * `autoscaler.autoscalerConfig.logLevel`: Log level for the autoscaler (default: `debug`).
   * `autoscaler.service.type`: Service type for the autoscaler (`ClusterIP`, `NodePort`, `LoadBalancer`).
   * `autoscaler.service.port`: External port for the autoscaler service (default: `8372`).
   * `autoscaler.resources`: CPU/memory requests and limits for the autoscaler pod.

   Buildkitd Configuration:
   * `buildkitd.replicaCount`: Initial number of replicas (default: `0` for scale-to-zero).
   * `buildkitd.image.repository`: Docker image repository for buildkitd (default: `earthly/buildkitd`).
   * `buildkitd.image.tag`: Tag of the buildkitd Docker image (default: `v0.8.15`).
   * `buildkitd.persistence.enabled`: Enable a persistent volume for the buildkitd cache (default: `true`).
   * `buildkitd.persistence.size`: Size of the persistent volume (default: `50Gi`).
   * `buildkitd.persistence.storageClassName`: Storage class for the persistent volume.
   * `buildkitd.service.port`: Port for the buildkitd gRPC service (default: `8372`).
   * `buildkitd.resources`: CPU/memory requests and limits for buildkitd pods.
   * `buildkitd.podAnnotations`: Pod annotations for buildkitd (useful for Istio integration).
   * `buildkitd.extraEnvVars`: Additional environment variables for buildkitd.
   * `buildkitd.initContainers`: Init containers for buildkitd (e.g., for multi-arch support).
   * `buildkitd.nodeSelector`: Node selection constraints.
   * `buildkitd.tolerations`: Tolerations for pod scheduling.
   * `buildkitd.affinity`: Affinity rules for pod scheduling.
2. Install the Chart: Once you have your configuration ready (e.g., in a `my-custom-values.yaml` file or as `--set` parameters):

   ```bash
   # Example installation:
   helm install my-buildkitd-autoscaler ./helm/buildkitd-stack \
     --namespace buildkitd-scaler-system \
     --create-namespace \
     -f my-custom-values.yaml # Optional: if you have a custom values file
   ```

   - Replace `my-buildkitd-autoscaler` with your desired release name.
   - Replace `buildkitd-scaler-system` with your target namespace.
   - If not using a custom values file, use `--set` for each parameter you need to override, for example:

     ```bash
     helm install my-buildkitd-autoscaler ./helm/buildkitd-stack \
       --namespace buildkitd-scaler-system \
       --create-namespace \
       --set autoscaler.image.repository=your-repo/buildkitd-proxy \
       --set autoscaler.image.tag=v0.1.0 \
       --set autoscaler.autoscalerConfig.scaleDownIdleTimeout=5m0s \
       --set buildkitd.persistence.size=100Gi
     ```
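For reference, a minimal `my-custom-values.yaml` assembled from the keys listed in step 1 might look like this; all values shown are illustrative, not recommendations:

```yaml
# my-custom-values.yaml -- illustrative only; adjust every value for your environment
autoscaler:
  image:
    repository: your-repo/buildkitd-proxy
    tag: v0.1.0
  autoscalerConfig:
    scaleDownIdleTimeout: 5m0s
buildkitd:
  replicaCount: 0              # start scaled to zero
  persistence:
    enabled: true
    size: 100Gi
    storageClassName: standard # assumption: replace with a storage class available in your cluster
```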
Upgrading the Chart:

From OCI registry:

```bash
helm upgrade my-buildkitd-autoscaler oci://ghcr.io/earthbuild/charts/buildkitd-stack \
  --version 1.1.0 \
  --namespace buildkitd-scaler-system \
  -f my-custom-values.yaml # Or using --set
```

From local source:

```bash
helm upgrade my-buildkitd-autoscaler ./helm/buildkitd-stack \
  --namespace buildkitd-scaler-system \
  -f my-custom-values.yaml # Or using --set
```

Uninstalling the Chart:

```bash
helm uninstall my-buildkitd-autoscaler --namespace buildkitd-scaler-system
```

(The old manual deployment instructions using raw Kubernetes manifests from `deploy/kubernetes/` are superseded by the Helm chart. The `deploy/kubernetes/` directory can be removed after confirming the Helm chart is satisfactory.)
Once deployed, the buildkitd-autoscaler service (e.g., `buildkitd-proxy-service`, as defined in `deploy/kubernetes/05-service.yaml`) will listen for TCP connections on its configured port (default `:8080`).

- Clients (e.g., `docker build --builder tcp://<buildkitd-proxy-service-ip>:<port>`) should be configured to connect to this proxy service.
- When the first client connects, the autoscaler will:
  - Scale the target `buildkitd` StatefulSet to 1 replica (if it is currently at 0).
  - Wait for the `buildkitd` pod to become ready.
  - Proxy the connection to the `buildkitd` pod (e.g., `buildkitd-0.buildkitd-headless.default.svc.cluster.local:8372`).
- Subsequent connections are proxied directly as long as at least one `buildkitd` pod is ready.
- When the last client disconnects, an idle timer (default 2 minutes) starts.
- If no new connections are made before the timer expires, the autoscaler scales the `buildkitd` StatefulSet back down to 0 replicas.
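For example, with Docker Buildx the proxy can be registered as a remote builder; the builder name is arbitrary and the address/port placeholders refer to the proxy service described above, and any buildkit-compatible client can connect the same way:

```bash
# Register the autoscaler proxy as a remote buildx builder
docker buildx create --name buildkitd-remote --driver remote \
  tcp://<buildkitd-proxy-service-ip>:<port>

# The first build triggers scale-up from 0 to 1; subsequent builds reuse the running pod
docker buildx build --builder buildkitd-remote -t example:dev .
```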
For a detailed end-to-end testing scenario, refer to E2E_TESTING.md.
When running in an Istio-enabled cluster, you may need to configure the sidecar proxy to allow buildkitd to access the correct network interface (see this issue).
Configure the necessary annotation through the Helm chart values:
```yaml
# values.yaml
buildkitd:
  podAnnotations:
    # Allow buildkitd to access the outside world through the correct interface
    traffic.sidecar.istio.io/kubevirtInterfaces: "cni0"
```

Or set it directly during installation:
```bash
helm install my-buildkitd-autoscaler ./helm/buildkitd-stack \
  --set buildkitd.podAnnotations."traffic\.sidecar\.istio\.io/kubevirtInterfaces"="cni0"
```

To determine the correct interface for your cluster (`cni0` in this example), you can exec into a running buildkitd pod and check the available interfaces:
```bash
kubectl exec -it buildkitd-0 -n <namespace> -c buildkitd -- ip addr
```

Additionally, note that buildkitd gRPC traffic does not work with the Envoy proxy's strict HTTP/2 settings, so it appears to be necessary to handle buildkitd traffic as plain TCP rather than HTTP/2 in your Istio mesh.
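One way to achieve this (shown here as a sketch, not something the chart necessarily configures for you) is Istio's explicit protocol selection: giving the buildkitd Service port a `tcp-` prefixed name, or setting `appProtocol: tcp`, makes the mesh treat the traffic as opaque TCP instead of HTTP/2. The service name, selector, and port below assume the defaults used elsewhere in this README:

```yaml
# Sketch only: adapt the metadata, selector, and port to your rendered manifests.
apiVersion: v1
kind: Service
metadata:
  name: buildkitd-headless
spec:
  clusterIP: None
  selector:
    app: buildkitd          # assumption: match the labels of your buildkitd pods
  ports:
    - name: tcp-buildkitd   # "tcp-" prefix (or appProtocol: tcp) disables HTTP/2 sniffing in Istio
      appProtocol: tcp
      port: 8372
      targetPort: 8372
```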
The best way to develop the entire application stack in this repository is to use Tilt. Simply run:

```bash
tilt up
```

to start everything in your Kubernetes context of choice, with automatic reloading of application and chart changes.
To build the Go binary locally:
```bash
go build .
```

This will produce a `buildkitd-autoscaler` (or `go-buildkitd-proxy`, depending on `go.mod`) executable in the current directory.
To build a single multi-platform OCI image for linux/arm64 and linux/amd64:
```bash
earthly +image
```

Remember to replace `your-repo` with your actual Docker repository/namespace. It is recommended to use a specific version tag instead of `latest` for production deployments.
This service is currently a Proof of Concept (PoC) primarily focused on:
- Scaling a single `buildkitd` instance (a StatefulSet with pod name `buildkitd-0`) from 0-to-1 and 1-to-0.
- Basic TCP connection counting for triggering scaling events.
Default resource values are not provided to allow deployment in all environments. You will need to monitor the autoscaler's performance under your specific load and adjust these values accordingly.
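As a purely illustrative starting point (the numbers below are placeholders, not recommendations), requests and limits can be set through the `autoscaler.resources` and `buildkitd.resources` values described above:

```yaml
# Illustrative resource values only -- size these for your own workload.
autoscaler:
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      memory: 128Mi
buildkitd:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
```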
In order to enable support for multiple architectures, the node must have QEMU enabled. The easiest way to do this is to use tonistiigi/binfmt. To ensure buildkitd always runs on a node where QEMU is enabled, binfmt can be run as an init container, e.g. through the chart's `buildkitd.initContainers` value.
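A minimal sketch of such a values snippet follows; the image tag and security context are assumptions to review, and binfmt requires a privileged container to register QEMU handlers on the host:

```yaml
# Sketch: register QEMU emulators via tonistiigi/binfmt before buildkitd starts.
buildkitd:
  initContainers:
    - name: binfmt
      image: tonistiigi/binfmt:latest  # consider pinning a specific tag
      args: ["--install", "all"]       # install QEMU handlers for all supported architectures
      securityContext:
        privileged: true               # required to modify the host's binfmt_misc registrations
```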
- N-Instance Load Balancing: Extend the proxy to support scaling to N `buildkitd` instances and to distribute load among them (e.g., round-robin, least connections). This would likely involve more sophisticated service discovery and routing.
- More Sophisticated Readiness/Liveness Probes: Implement more detailed health checks for the `buildkitd` instances beyond just pod readiness.
- Metrics and Observability: Expose Prometheus metrics for active connections, scaling events, proxy latency, etc.
- Advanced Configuration Options: More granular control over scaling behavior, timeouts, and Kubernetes interactions.
- Horizontal Pod Autoscaler (HPA) Integration: Explore integration with HPA for more dynamic scaling based on custom metrics if N-instance support is added.
- Leader Election for Proxy HA: If running multiple instances of the autoscaler proxy for HA, implement leader election to ensure only one instance actively manages scaling of the `buildkitd` StatefulSet.