A Kubernetes operator that enforces SLOs, guards pod best practices, and automates namespace lifecycle management.
KubeGuard brings SRE discipline to your cluster as a set of Custom Resource Definitions (CRDs) and controllers that run inside the cluster, continuously reconciling desired state.
KubeGuard's guardrails map cleanly onto GPU-heavy LLM inference clusters. See docs/llm-guardrails.md for concrete policies and SLO examples.
Define Service Level Objectives as Kubernetes resources. KubeGuard monitors them, calculates error budgets in real time, and automatically generates PrometheusRule resources (alert rules for the Prometheus Operator).
apiVersion: kubeguard.bukx.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: checkout-availability
  namespace: production
spec:
  service: checkout-api
  description: "Checkout API availability SLO"
  target: "99.9"
  window: 30d
  indicator:
    type: availability
    totalQuery: 'sum(rate(http_requests_total{service="checkout-api"}[5m]))'
    errorQuery: 'sum(rate(http_requests_total{service="checkout-api",code=~"5.."}[5m]))'
  alerting:
    burnRateThresholds:
      - severity: critical
        shortWindow: 5m
        longWindow: 1h
        factor: 14.4
      - severity: warning
        shortWindow: 30m
        longWindow: 6h
        factor: 6

What it does:
- Calculates current error budget remaining (%)
- Generates multi-window, multi-burn-rate Prometheus alerts
- Updates `.status` with real-time SLO compliance
- Emits Kubernetes events on SLO breaches
A validating and mutating admission webhook that enforces pod security best practices:
| Check | Action | Default |
|---|---|---|
| Missing resource limits | Reject or inject defaults | Reject |
| Missing liveness/readiness probes | Warn | Warn |
| Running as root | Reject | Reject |
| Missing security context | Mutate: inject `runAsNonRoot: true` | Mutate |
| Privileged containers | Reject | Reject |
| Latest tag images | Warn | Warn |
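
The checks in the table above can be illustrated with a small, self-contained Go sketch. It is not the actual webhook handler: `Pod`, `Container`, and `Rules` are simplified, hypothetical stand-ins for the Kubernetes and PodGuardPolicy types, and treating only an explicit `runAsUser: 0` as "running as root" is an assumption made for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Container and Pod are simplified stand-ins for the Kubernetes API types.
type Container struct {
	Name       string
	Image      string
	HasLimits  bool // true when CPU and memory limits are set
	Privileged bool
}

type Pod struct {
	RunAsUser  *int64 // nil means the field is unset
	Containers []Container
}

// Rules mirrors a subset of the rule names used in the policy example.
type Rules struct {
	RequireResourceLimits bool
	DisallowRoot          bool
	DisallowPrivileged    bool
	DisallowLatestTag     bool
}

// usesLatestTag reports whether an image reference resolves to "latest",
// either explicitly or because no tag is given.
func usesLatestTag(image string) bool {
	if strings.Contains(image, "@") {
		return false // pinned by digest
	}
	// Inspect only the last path segment so a registry port
	// (registry:5000/app) is not mistaken for a tag.
	name := image[strings.LastIndex(image, "/")+1:]
	i := strings.LastIndex(name, ":")
	if i < 0 {
		return true // an untagged image defaults to latest
	}
	return name[i+1:] == "latest"
}

// validate returns the findings that would reject the pod and those that
// would only warn, following the default actions in the table.
func validate(pod Pod, r Rules) (rejects, warnings []string) {
	if r.DisallowRoot && pod.RunAsUser != nil && *pod.RunAsUser == 0 {
		rejects = append(rejects, "pod runs as root (runAsUser: 0)")
	}
	for _, c := range pod.Containers {
		if r.RequireResourceLimits && !c.HasLimits {
			rejects = append(rejects, fmt.Sprintf("container %q has no resource limits", c.Name))
		}
		if r.DisallowPrivileged && c.Privileged {
			rejects = append(rejects, fmt.Sprintf("container %q is privileged", c.Name))
		}
		if r.DisallowLatestTag && usesLatestTag(c.Image) {
			warnings = append(warnings, fmt.Sprintf("container %q uses the latest tag", c.Name))
		}
	}
	return rejects, warnings
}

func main() {
	root := int64(0)
	pod := Pod{
		RunAsUser:  &root,
		Containers: []Container{{Name: "app", Image: "nginx", Privileged: true}},
	}
	rules := Rules{
		RequireResourceLimits: true,
		DisallowRoot:          true,
		DisallowPrivileged:    true,
		DisallowLatestTag:     true,
	}
	rejects, warnings := validate(pod, rules)
	fmt.Println("rejects:", rejects)
	fmt.Println("warnings:", warnings)
}
```

A real handler would read these fields from the pod's security context and resource requirements, and would also have to cover init containers.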
Configure policies per namespace via the PodGuardPolicy CRD:
apiVersion: kubeguard.bukx.dev/v1alpha1
kind: PodGuardPolicy
metadata:
  name: production-policy
  namespace: production
spec:
  enforcement: strict  # strict | permissive | audit
  rules:
    requireResourceLimits: true
    requireProbes: true
    disallowRoot: true
    disallowPrivileged: true
    disallowLatestTag: true
    maxCPULimit: "2"
    maxMemoryLimit: "4Gi"

Automate namespace provisioning with RBAC, quotas, network policies, and monitoring, all from a single CR:
apiVersion: kubeguard.bukx.dev/v1alpha1
kind: ManagedNamespace
metadata:
  name: team-payments
spec:
  owner: payments-team@company.com
  environment: production
  expiresAt: "2025-12-31T00:00:00Z"  # Optional TTL
  resourceQuota:
    hard:
      cpu: "20"
      memory: "40Gi"
      pods: "100"
  limitRange:
    defaultLimit:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
  networkPolicy:
    allowIngress: true
    allowEgress: true
    allowedNamespaces:
      - istio-system
      - monitoring
  rbac:
    - role: admin
      subjects:
        - kind: Group
          name: payments-team
    - role: view
      subjects:
        - kind: Group
          name: platform-team
  monitoring:
    enablePrometheus: true
    alertEmail: payments-team@company.com

What it does:
- Creates the namespace with labels and annotations
- Applies ResourceQuota and LimitRange
- Creates NetworkPolicy for isolation
- Binds RBAC roles to specified subjects
- Sets up Prometheus ServiceMonitor
- Auto-deletes expired namespaces (with warning events)
     ┌─────────────────────────────────┐
     │      Kubernetes API Server      │
     └─────┬─────────────┬─────────────┘
           │             │
Watch CRDs │             │ Admission Webhooks
           │             │
┌──────────▼──────┐ ┌────▼───────────────┐
│   Controllers   │ │ Pod Guardian       │
│                 │ │ (Webhook Server)   │
│ ┌─────────────┐ │ │                    │
│ │ SLO         │ │ │ • Validate pods    │
│ │ Controller  │ │ │ • Mutate defaults  │
│ ├─────────────┤ │ │ • Enforce policies │
│ │ Namespace   │ │ └────────────────────┘
│ │ Lifecycle   │ │
│ │ Controller  │ │
│ ├─────────────┤ │
│ │ PodGuard    │ │
│ │ Policy Ctrl │ │
│ └─────────────┘ │
└────────┬────────┘
         │
         │ Generates
         │
┌────────▼────────┐
│ Prometheus      │
│ Alert Rules     │
│ (PrometheusRule │
│ resources)      │
└─────────────────┘
- Kubernetes cluster (v1.25+)
- `kubectl` configured
- `cert-manager` installed (for webhook TLS)
- Prometheus Operator (optional, for SLO alerts)
# Install CRDs
kubectl apply -f config/crd/bases/
# Deploy the operator
kubectl apply -f config/manager/
# Deploy webhook configuration
kubectl apply -f config/webhook/
# Create a sample SLO
kubectl apply -f config/samples/slo.yaml
# Create a managed namespace
kubectl apply -f config/samples/managed-namespace.yaml
# Create a pod guard policy
kubectl apply -f config/samples/pod-guard-policy.yaml

# Build
make build
# Run tests
make test
# Build container image
make docker-build IMG=bukx/kubeguard:latest
# Push to registry
make docker-push IMG=bukx/kubeguard:latest
# Deploy to cluster
make deploy IMG=bukx/kubeguard:latest

KubeGuard exposes Prometheus metrics:
| Metric | Type | Description |
|---|---|---|
| `kubeguard_slo_error_budget_remaining` | Gauge | Error budget remaining (0-1) |
| `kubeguard_slo_compliance` | Gauge | Current SLO compliance (0-1) |
| `kubeguard_pods_rejected_total` | Counter | Pods rejected by admission webhook |
| `kubeguard_pods_mutated_total` | Counter | Pods mutated by admission webhook |
| `kubeguard_namespaces_managed` | Gauge | Number of managed namespaces |
| `kubeguard_namespaces_expired_total` | Counter | Namespaces cleaned up due to TTL |
├── api/v1alpha1/              # CRD type definitions
│   ├── slo_types.go
│   ├── podguardpolicy_types.go
│   ├── managednamespace_types.go
│   └── groupversion_info.go
├── internal/
│   ├── controller/            # Reconciliation controllers
│   │   ├── slo_controller.go
│   │   ├── namespace_controller.go
│   │   └── podguard_controller.go
│   └── webhook/               # Admission webhook handlers
│       └── pod_validator.go
├── config/
│   ├── crd/bases/             # Generated CRD manifests
│   ├── rbac/                  # RBAC for the operator
│   ├── manager/               # Operator deployment
│   ├── webhook/               # Webhook configuration
│   └── samples/               # Example CRs
├── .github/workflows/         # CI pipeline
├── Dockerfile
├── Makefile
└── go.mod
PRs welcome! Please:
- Fork the repo
- Create a feature branch
- Write tests for new functionality
- Submit a PR
Apache License 2.0