Practical, task-oriented guide for deploying, operating, and maintaining the ai-stack Helm chart. For architecture overview and configuration reference, see README.md.
- Installation
- Day-1 Setup
- Working with Models
- RAG (Retrieval-Augmented Generation)
- Async Document Ingestion
- External LLM Providers
- GPU Acceleration
- Agentic Workloads (LangGraph)
- MCP Tool Integration (MCPO)
- PostgreSQL Modes
- Ingress and TLS
- Authentication with Authelia (SSO / OIDC)
- Networking and Security
- Observability
- Scaling
- Upgrading
- GitOps with ArgoCD
- EU Compliance
- Troubleshooting
- Uninstall
Lab mode deploys a single-replica stack with relaxed resource limits, suitable for development and evaluation.
Prerequisites:
- Kubernetes 1.27+ cluster (minikube, kind, k3s, or managed)
- Helm 3.12+
- At least 8 GB RAM available in the cluster
- A default StorageClass (or use `emptyDir` for ephemeral testing)
Install:
# Create the namespace and install with lab defaults
helm install ai-stack . -n ai-stack --create-namespace

Lab with GPU:
helm install ai-stack . -n ai-stack --create-namespace \
--set ollama.gpu.enabled=true

Production mode enables HA replicas, autoscaling, TLS ingress, and observability.
Additional prerequisites:
- NVIDIA GPU Operator (for Ollama GPU acceleration)
- Prometheus Operator CRDs (for ServiceMonitor resources)
- cert-manager (for automated TLS provisioning)
- An ingress controller (Envoy Gateway or NGINX)
Install:
helm install ai-stack . -n ai-stack --create-namespace \
-f values.yaml -f values-prod.yaml

Customize before installing:
- Copy `values-prod.yaml` to `values-prod-override.yaml`
- Edit your overrides (hostname, storage class, resource limits)
- Install with both files:
helm install ai-stack . -n ai-stack --create-namespace \
-f values.yaml -f values-prod.yaml -f values-prod-override.yaml

For environments without internet access:
- Mirror container images to your internal registry:
# List all images used by the chart
helm template ai-stack . | grep "image:" | sort -u
# Pull, tag, and push each image to your registry
docker pull ghcr.io/open-webui/open-webui:v0.8.10
docker tag ghcr.io/open-webui/open-webui:v0.8.10 registry.internal/open-webui:v0.8.10
docker push registry.internal/open-webui:v0.8.10
# Repeat for all images...

- Override image repositories in your values file:
openwebui:
image:
repository: registry.internal/open-webui
tag: "v0.8.10"
ollama:
image:
repository: registry.internal/ollama
tag: "0.18.2"
# ... repeat for all components

- Configure image pull secrets if your registry requires authentication:
global:
imagePullSecrets:
- name: my-registry-secret

- Pre-download Ollama models and load them into the PVC, since `ollama pull` requires internet access. See Section 3.
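Mirroring dozens of images by hand is error-prone. A minimal Python sketch that derives the internal reference for each image and emits the corresponding docker commands; the image list and the `registry.internal` hostname are illustrative assumptions, not chart-mandated values:

```python
# Sketch: generate docker pull/tag/push commands for an internal mirror.
# Assumes the mirror keeps only the final repository component and tag.

def mirror_ref(source: str, registry: str) -> str:
    """Map a public image reference to the internal registry,
    keeping only the last repository component and the tag."""
    repo_and_tag = source.rsplit("/", 1)[-1]  # e.g. "open-webui:v0.8.10"
    return f"{registry}/{repo_and_tag}"

def mirror_commands(images: list[str], registry: str) -> list[str]:
    """Emit the pull/tag/push sequence for every image."""
    cmds = []
    for src in images:
        dst = mirror_ref(src, registry)
        cmds += [f"docker pull {src}",
                 f"docker tag {src} {dst}",
                 f"docker push {dst}"]
    return cmds

if __name__ == "__main__":
    # Illustrative image list; generate the real one with
    # `helm template ai-stack . | grep "image:" | sort -u`.
    for cmd in mirror_commands(
        ["ghcr.io/open-webui/open-webui:v0.8.10",
         "docker.io/ollama/ollama:0.18.2"],
        "registry.internal",
    ):
        print(cmd)
```

Piping the printed commands through a shell (after review) completes the mirroring step.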
Zarf automates air-gapped deployments by packaging the Helm chart, all container images, and configuration into a single signed, declarative tarball. This eliminates the manual image mirroring described in Section 1.3.
The repository includes a zarf.yaml package definition with the core stack as a required component and optional components (Workbench, LangGraph, MCPO, OTel Collector) that can be selected at deploy time.
Prerequisites:
- Zarf CLI installed on the build machine (internet-connected)
- Zarf initialized on the target cluster (`zarf init`)
- Kubernetes 1.27+ on the target cluster
Step 1 — Build the package (internet-connected machine):
cd ai-stack/
zarf package create --confirm

This produces a file like zarf-package-ai-stack-amd64-1.0.0.tar.zst (~15-25 GB, depending on the selected components). Zarf automatically pulls all images listed in zarf.yaml and bundles them alongside the Helm chart.
Step 2 — Transfer the package:
Copy the .tar.zst file to the air-gapped environment via USB, S3 bucket, or any out-of-band transfer method.
Step 3 — Initialize Zarf on the target cluster (one-time):
If Zarf has not been initialized on the target cluster yet:
zarf init --confirm

This deploys the in-cluster registry and injector that Zarf uses to serve images.
Step 4 — Deploy:
# Deploy with defaults (core stack only)
zarf package deploy zarf-package-ai-stack-amd64-1.0.0.tar.zst --confirm
# Deploy with optional components
zarf package deploy zarf-package-ai-stack-amd64-1.0.0.tar.zst \
--components="ai-stack,langgraph,mcpo" --confirm

Zarf pushes the images to the in-cluster registry and runs helm install with the image references rewritten to point at the local registry.
Step 5 — Load Ollama models:
Zarf handles images, but Ollama models must still be loaded manually in an air-gapped cluster. On the internet-connected machine:
# Pull the model locally
ollama pull llama3.2
ollama pull nomic-embed-text
# Export to a tarball
# Models are stored under ~/.ollama/models/
tar czf ollama-models.tar.gz -C ~/.ollama models/

On the air-gapped cluster:
# Copy the models into the Ollama PVC
kubectl cp ollama-models.tar.gz ai-stack/ai-stack-ollama-0:/tmp/
kubectl exec -n ai-stack ai-stack-ollama-0 -- \
tar xzf /tmp/ollama-models.tar.gz -C /root/.ollama/
kubectl exec -n ai-stack ai-stack-ollama-0 -- rm /tmp/ollama-models.tar.gz
# Restart Ollama to pick up the models
kubectl rollout restart -n ai-stack deploy/ai-stack-ollama

Upgrading:
Build a new package with the updated chart/images and redeploy:
zarf package deploy zarf-package-ai-stack-amd64-<new-version>.tar.zst --confirm

Zarf performs a helm upgrade under the hood.
After installation, Ollama starts with no models. Pull a chat model and an embedding model:
# Chat model
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull llama3.2
# Embedding model (required for RAG)
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull nomic-embed-text

For larger models (these require more RAM/VRAM):
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull qwen3:14b
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull deepseek-r1:14b

Port-forward (lab):
kubectl port-forward -n ai-stack svc/ai-stack-openwebui 8080:8080
# Open http://localhost:8080

Via ingress (production):
If ingress is configured, access via the hostname defined in your values (e.g., https://ai.example.com).
On first access, Open WebUI prompts you to create an admin account. This account controls:
- User management and permissions
- Model access control
- System settings and configuration
- Pipeline and tool management
Important: The first account created automatically becomes the admin. Do this immediately after deployment in production.
# All pods should be Running
kubectl get pods -n ai-stack
# NetworkPolicies should be present for each component
kubectl get networkpolicies -n ai-stack
# Secrets should be auto-generated
kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack
# ServiceAccounts per component
kubectl get serviceaccounts -n ai-stack
# Run Helm tests (connectivity checks)
helm test ai-stack -n ai-stack

# List models loaded in Ollama
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama list

# Pull any model from the Ollama library
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull <model-name>
# Examples
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull mistral
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull codellama:13b
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull llama3.2-vision:11b

Model storage: Models are stored in the Ollama PVC (/root/.ollama). Ensure the PVC is large enough — a 14B-parameter model typically requires 9-10 GB of storage. The default lab PVC is 50 GB; production is 200 GB.
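A rough sizing rule (an approximation for capacity planning, not a chart guarantee): on-disk size ≈ parameters × bits-per-weight / 8, plus roughly 30% overhead for tokenizer, metadata, and non-quantized layers. For a 14B model at the common 4-bit quantization this lands near the 9-10 GB figure quoted above.

```python
def model_disk_gb(params_b: float, bits_per_weight: int = 4,
                  overhead: float = 1.3) -> float:
    """Approximate on-disk size in GB for a quantized model.
    The 1.3 overhead factor is an assumption, not a measured value."""
    return params_b * bits_per_weight / 8 * overhead

# 14B parameters at 4-bit quantization
print(round(model_disk_gb(14), 1))  # → 9.1
```

Use this to sanity-check PVC sizing before pulling several large models onto one volume.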
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama rm <model-name>

In Open WebUI, go to Admin Panel > Settings > Models and configure the default model. Users can still select other available models from the model picker.
RAG allows the AI to answer questions using your own documents. The stack includes all components needed: Tika (document parsing), Qdrant (vector storage), and Ollama (embeddings).
- Open the Open WebUI chat interface
- Click the + button or drag and drop files into the chat
- Supported formats: PDF, DOCX, PPTX, XLSX, TXT, HTML, Markdown, and more (via Tika)
- Documents are automatically extracted, chunked, embedded, and stored in Qdrant
The default embedding model is nomic-embed-text. To change it:
openwebui:
env:
RAG_EMBEDDING_MODEL: "bge-m3"

Then pull the new model:
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull bge-m3

Upgrade the release:

helm upgrade ai-stack . -n ai-stack

Note: Changing the embedding model requires re-embedding all existing documents, because vector dimensions and representations differ between models.
Adjust these parameters in your values override:
openwebui:
env:
# Larger chunks = more context per retrieval, but fewer distinct matches
RAG_CHUNK_SIZE: "1500"
# Overlap prevents splitting relevant content at chunk boundaries
RAG_CHUNK_OVERLAP: "100"
# Number of top matching chunks to include in the prompt
RAG_TOP_K: "5"
# Minimum similarity score (0.0 = return all, higher = stricter)
RAG_RELEVANCE_THRESHOLD: "0.0"

Guidelines:
| Scenario | Chunk Size | Overlap | Top K |
|---|---|---|---|
| Short, factual documents | 500-800 | 50 | 3-5 |
| Long technical documents | 1500-2000 | 100-200 | 5-8 |
| Legal/regulatory text | 1000-1500 | 200 | 8-10 |
| Code repositories | 800-1200 | 100 | 5-7 |
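To see how chunk size and overlap interact, here is a character-based sketch; Open WebUI's actual splitter is token- and separator-aware, so this is only illustrative:

```python
def chunk(text: str, size: int = 1500, overlap: int = 100) -> list[str]:
    """Sliding-window chunking: each chunk starts (size - overlap)
    characters after the previous one, so neighbours share `overlap`
    characters and no boundary content is lost."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 4000
pieces = chunk(doc, size=1500, overlap=100)
print(len(pieces), [len(p) for p in pieces])  # → 3 [1500, 1500, 1200]
```

Larger overlap means more duplicated text in the index (more storage, slightly noisier retrieval) in exchange for fewer facts split across chunk boundaries.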
Web search via SearXNG is enabled by default. It allows the AI to search the internet for answers when document retrieval is insufficient.
To use web search in a conversation, type a question and enable the "Web Search" toggle in the chat interface, or configure it as the default behavior in Admin Panel settings.
For bulk document processing or integration with external systems, use the async ingestion worker instead of the UI upload.
ingestionWorker:
enabled: true
valkey:
persistence:
enabled: true # Persist task queue across restarts

helm upgrade ai-stack . -n ai-stack

Connect to Valkey and submit tasks via XADD:
# Port-forward to Valkey
kubectl port-forward -n ai-stack svc/ai-stack-valkey 6379:6379
# Submit an ingestion task
redis-cli -p 6379 XADD ingestion:documents '*' \
task_id "doc-001" \
file_url "https://example.com/report.pdf" \
filename "report.pdf"

Or from within the cluster (e.g., from a script or application):
import redis
r = redis.Redis(host='ai-stack-valkey', port=6379)
r.xadd('ingestion:documents', {
'task_id': 'doc-001',
'file_url': 'https://example.com/report.pdf',
'filename': 'report.pdf'
})

# Check status of a specific task
redis-cli -p 6379 HGETALL ingestion:status:doc-001
# List recent messages in the stream
redis-cli -p 6379 XRANGE ingestion:documents - + COUNT 10
# Check consumer group lag
redis-cli -p 6379 XINFO GROUPS ingestion:documents

Status values: queued → processing → completed | failed
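A submitting application usually wants to block until a terminal state. A hedged sketch that polls the status hash; the `status` field name is an assumption inferred from the `ingestion:status:<task_id>` keys shown above, so verify it against your worker's actual hash layout:

```python
import time

TERMINAL = {"completed", "failed"}

def decode_status(raw: dict) -> dict:
    """redis-py returns bytes keys/values from HGETALL; decode to str."""
    return {k.decode(): v.decode() for k, v in raw.items()}

def wait_for(client, task_id: str, timeout: float = 300.0,
             interval: float = 2.0) -> dict:
    """Poll ingestion:status:<task_id> until completed/failed,
    or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = decode_status(client.hgetall(f"ingestion:status:{task_id}"))
        if status.get("status") in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {timeout}s")

# Usage (requires a live Valkey connection):
# import redis
# r = redis.Redis(host="ai-stack-valkey", port=6379)
# print(wait_for(r, "doc-001"))
```

For bulk submissions, pair this with a bounded worker pool rather than polling each task serially.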
Add cloud-hosted models alongside local Ollama inference. Users see all models in the Open WebUI model picker.
externalAPIs:
enabled: true
providers:
- name: openai
baseUrl: "https://api.openai.com/v1"
apiKey: "sk-..."

externalAPIs:
enabled: true
providers:
- name: azure-openai
baseUrl: "https://<resource>.openai.azure.com/openai/deployments/<deployment>"
apiKey: "<your-azure-key>"

externalAPIs:
enabled: true
providers:
- name: anthropic
baseUrl: "https://api.anthropic.com/v1"
apiKey: "sk-ant-..."

Note: Anthropic API integration requires Open WebUI v0.6+ with the Anthropic API translation layer, or an Open WebUI function for protocol translation.
For production, never store API keys in values files. Use existing Kubernetes Secrets (created by ESO, Vault, or manually):
externalAPIs:
enabled: true
providers:
- name: openai
baseUrl: "https://api.openai.com/v1"
existingSecret:
name: "openai-api-key" # Must exist in the release namespace
key: "api-key" # Key within the Secret

Prerequisites: The NVIDIA GPU Operator must be installed in the cluster.
ollama:
gpu:
enabled: true
count: 1 # Number of GPUs to allocate
resourceName: nvidia.com/gpu # Resource name from the GPU operator

helm upgrade ai-stack . -n ai-stack

The Workbench provides a JupyterLab environment with CUDA and PyTorch for ML experimentation:
workbench:
enabled: true
gpu:
enabled: true
count: 1Access the Workbench:
# Get the auto-generated token
kubectl get secret -n ai-stack ai-stack-workbench-secret \
-o jsonpath='{.data.token}' | base64 -d
# Port-forward
kubectl port-forward -n ai-stack svc/ai-stack-workbench 8888:8888
# Open http://localhost:8888 and enter the token

# Check Ollama GPU detection
kubectl exec -n ai-stack deploy/ai-stack-ollama -- nvidia-smi
# Check Workbench GPU access
kubectl exec -n ai-stack deploy/ai-stack-workbench -- python3 -c \
"import torch; print(f'CUDA available: {torch.cuda.is_available()}, Devices: {torch.cuda.device_count()}')"

LangGraph enables stateful, multi-step agentic workflows with tool calling and checkpoint persistence.
LangGraph requires PostgreSQL for checkpoint storage:
langgraph:
enabled: true
postgres:
enabled: true
mode: standalone # Use 'cnpg' for production HA

helm upgrade ai-stack . -n ai-stack

Option A: Custom image (recommended)
- Create your graph code following the LangGraph documentation
- Build the image:
langgraph build -t my-registry/my-graphs:latest
docker push my-registry/my-graphs:latest

- Override the image in values:
langgraph:
image:
repository: my-registry/my-graphs
tag: "latest"

Option B: Volume mount
Place graph code in the /deps/graphs persistent volume:
kubectl cp my-graph.py ai-stack/ai-stack-langgraph-<pod>:/deps/graphs/

# Port-forward
kubectl port-forward -n ai-stack svc/ai-stack-langgraph 8000:8000
# Health check
curl http://localhost:8000/ok
# List available assistants
curl http://localhost:8000/assistants \
-H "x-api-key: $(kubectl get secret -n ai-stack ai-stack-langgraph-secret -o jsonpath='{.data.api-key}' | base64 -d)"

MCPO bridges Model Context Protocol (MCP) servers to OpenAPI endpoints that Open WebUI can consume as tools.
mcpo:
enabled: true

Add MCP server definitions in your values:
mcpo:
enabled: true
config:
mcpServers:
# Local filesystem access
filesystem:
command: "npx"
args:
- "-y"
- "@modelcontextprotocol/server-filesystem"
- "/data"
# Remote SSE-based MCP server
remote-tools:
url: "https://mcp.example.com/sse"
type: "sse"

After deploying, configure Open WebUI to use the MCPO endpoint as an OpenAPI tool source under Admin Panel > Settings > Tools.
Single-instance PostgreSQL — no HA, suitable for development:
postgres:
enabled: true
mode: standalone

Prerequisites: Install the CloudNativePG operator (v1.25+):
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm install cnpg cnpg/cloudnative-pg -n cnpg-system --create-namespace

Then configure:
postgres:
enabled: true
mode: cnpg
tls:
mode: require
cnpg:
instances: 3 # 1 primary + 2 replicas
storage:
size: 50Gi
pooler:
enabled: true # PgBouncer connection pooling
monitoring:
enabled: true # Prometheus metrics

CNPG provides:
- Streaming replication with automated failover
- Rolling updates without downtime
- Automated TLS certificate provisioning
- PgBouncer connection pooling
- Prometheus metrics endpoint
Use your own PostgreSQL (RDS, Cloud SQL, Supabase, etc.):
postgres:
enabled: true
mode: external
database: "langgraph"
user: "langgraph"
tls:
mode: require
external:
host: "my-rds.abc123.us-east-1.rds.amazonaws.com"
port: 5432
existingSecret:
name: "rds-password"
key: "password"

openwebui:
ingress:
enabled: true
className: "nginx"
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: "50m"
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
hosts:
- host: ai.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: ai-tls
hosts:
- ai.example.com

openwebui:
ingress:
enabled: true
className: "envoy"
annotations:
gateway.envoyproxy.io/tls-terminate: "true"
gateway.envoyproxy.io/timeout: "300s"
gateway.envoyproxy.io/request-body-max-size: "50m"
gateway.envoyproxy.io/rate-limit-local: "60"
gateway.envoyproxy.io/rate-limit-burst: "20"
hosts:
- host: ai.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: ai-tls
hosts:
- ai.example.com

Add the cert-manager annotation to your ingress:
openwebui:
ingress:
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"

This automatically provisions and renews TLS certificates from Let's Encrypt.
Authelia is an optional OIDC identity provider that replaces Open WebUI's built-in authentication with SSO and optional MFA. When enabled, Open WebUI is automatically configured as an OIDC client.
authelia:
enabled: true
domain: "example.com"
oidc:
clientId: "openwebui"
issuerUrl: "https://auth.example.com"

The chart auto-generates secrets for JWT, session, storage encryption, and the OIDC client secret. Open WebUI's OAUTH_* environment variables are injected automatically.
Authelia uses a file-based authentication backend by default. Generate a password hash and mount a custom users_database.yml:
# Generate an Argon2 password hash
docker run --rm ghcr.io/authelia/authelia:4.39 \
authelia crypto hash generate argon2 --password 'your-password'

Create a users_database.yml:
users:
admin:
displayname: "Admin User"
email: admin@example.com
password: "$argon2id$v=19$m=65536,t=3,p=4$..." # paste hash here
groups:
- admins

Mount it by overriding the ConfigMap or using a Helm post-renderer.
authelia:
enabled: true
defaultPolicy: "two_factor"

Users will be prompted to register a TOTP device on their first login.
For production, switch from SQLite to PostgreSQL:
authelia:
enabled: true
storage: "postgres"
postgres:
enabled: true

Authelia creates its tables in a dedicated authelia schema within the shared PostgreSQL database.
Authelia must be reachable by user browsers for OIDC redirects:
authelia:
ingress:
enabled: true
className: "envoy"
hosts:
- host: auth.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: auth-tls
hosts:
- auth.example.com

After deploying, verify the OIDC discovery endpoint and login flow:
# Check Authelia health
kubectl exec -n ai-stack deploy/ai-stack-authelia -- wget -qO- http://localhost:9091/api/health
# Verify OIDC discovery
kubectl port-forward -n ai-stack svc/ai-stack-authelia 9091:9091
curl -s http://localhost:9091/.well-known/openid-configuration | jq .issuer

Open WebUI should redirect to Authelia's login page when accessed.
The chart deploys default-deny NetworkPolicies with per-component allowlists. This means:
- All inbound traffic is denied unless explicitly allowed
- All outbound traffic is denied unless explicitly allowed
- Each component only communicates with the services it needs
To verify:
kubectl get networkpolicies -n ai-stack
kubectl describe networkpolicy ai-stack-openwebui -n ai-stack

To disable (not recommended for production):
global:
networkPolicy:
enabled: false

All pods run with the PSA restricted baseline:
- `runAsNonRoot: true` (except Ollama — GPU exception)
- `readOnlyRootFilesystem: true` (where supported)
- `allowPrivilegeEscalation: false`
- `capabilities: drop: [ALL]`
- `seccompProfile: RuntimeDefault`
Enforce at the namespace level:
kubectl label namespace ai-stack \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/warn=restricted

Secrets are auto-generated on first install with 64-byte random keys and annotated with helm.sh/resource-policy: keep so they survive upgrades.
View generated secrets:
kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack
# Decode a specific secret value
kubectl get secret -n ai-stack ai-stack-qdrant-secret \
-o jsonpath='{.data.api-key}' | base64 -d

Use external secrets (production):
Override auto-generated secrets with your own values:
qdrant:
apiKey: "my-externally-managed-key"

Or reference pre-existing Kubernetes Secrets (e.g., from External Secrets Operator or Vault CSI):
externalAPIs:
providers:
- name: openai
existingSecret:
name: "vault-openai-secret"
key: "api-key"

- Generate new secret values
- Update the Kubernetes Secret directly:
kubectl create secret generic ai-stack-qdrant-secret \
-n ai-stack \
--from-literal=api-key="$(openssl rand -base64 48)" \
--dry-run=client -o yaml | kubectl apply -f -

- Restart affected pods to pick up the new secret:
kubectl rollout restart -n ai-stack deploy/ai-stack-qdrant
kubectl rollout restart -n ai-stack deploy/ai-stack-openwebui

global:
otel:
enabled: true
endpoint: "http://otel-collector.observability.svc.cluster.local:4317"

This deploys an OTel Collector and injects OTEL_* environment variables into all component pods. The collector pipeline includes:
- OTLP gRPC and HTTP receivers
- Batch processing and memory limiting
- Kubernetes metadata enrichment
- GenAI semantic convention processing
- PII redaction (GDPR compliance)
Prerequisite: Prometheus Operator CRDs must be installed.
global:
serviceMonitor:
enabled: true
labels:
release: prometheus # Match your Prometheus operator selector

The OTel Collector automatically redacts:
- Email addresses
- Social security numbers (Austrian VSNR format)
- Credit card numbers
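Before relying on a pattern in production, it can be sanity-checked locally with Python's re module. The two regexes below are illustrative stand-ins for the email and credit-card defaults; check the deployed otelCollector config for the authoritative list, and note the collector's own replacement token may differ:

```python
import re

# Illustrative patterns, mirroring the style of the chart defaults
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
CARD = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")

def redact(text: str) -> str:
    """Apply each blocked pattern, as a local approximation of
    what the collector's redaction step will strip."""
    for pattern in (EMAIL, CARD):
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("Contact max@example.com, card 4111 1111 1111 1111"))
# → Contact [REDACTED], card [REDACTED]
```

Testing patterns this way catches both false negatives (PII leaking) and over-broad patterns that would mangle legitimate telemetry.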
To add custom redaction patterns:
otelCollector:
redaction:
enabled: true
blockedPatterns:
# Default patterns
- '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
- '\b\d{4}\s?\d{6}\b'
- '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
# Custom: phone numbers
- '\+?\d{1,3}[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}'

HPA is available for stateless components. Enable it in your values:
openwebui:
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 5
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
tika:
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 4

Verify HPA status:

kubectl get hpa -n ai-stack

For components without HPA, scale manually:
# Scale Tika for heavy document processing
kubectl scale -n ai-stack deploy/ai-stack-tika --replicas=3
# Scale ingestion workers for bulk ingestion
kubectl scale -n ai-stack deploy/ai-stack-ingestion-worker --replicas=4

Note: Stateful components (Ollama, Qdrant) use ReadWriteOnce PVCs and cannot be scaled beyond one replica without operator support (e.g., Qdrant distributed mode) or shared storage.
Adjust resource requests and limits per component. Example for a high-traffic production deployment:
openwebui:
resources:
requests:
cpu: "1"
memory: 2Gi
limits:
cpu: "4"
memory: 8Gi
ollama:
resources:
requests:
cpu: "4"
memory: 16Gi
limits:
cpu: "16"
memory: 64Gi

Tip: Set requests to match actual steady-state usage and limits to handle peak load. Monitor with Prometheus/Grafana to right-size over time.
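One way to turn monitoring data into request/limit numbers is a simple heuristic: request near steady state, limit near observed peak plus headroom. The multipliers below are assumptions to tune per workload, not chart recommendations:

```python
def size_resources(p50_mem_gi: float, p99_mem_gi: float) -> dict:
    """Suggest memory request/limit (in Gi) from observed percentiles.
    The 1.1 and 1.5 multipliers are illustrative starting points."""
    return {
        "request_gi": round(p50_mem_gi * 1.1, 1),  # small buffer over steady state
        "limit_gi": round(p99_mem_gi * 1.5, 1),    # headroom over observed peak
    }

# e.g. Open WebUI observed at 1.8 Gi median, 5.2 Gi p99
print(size_resources(1.8, 5.2))
```

Re-run the calculation periodically as usage grows, and keep limits below node allocatable memory minus system reservations.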
# Review what will change
helm diff upgrade ai-stack . -n ai-stack # requires helm-diff plugin
# Apply the upgrade
helm upgrade ai-stack . -n ai-stack
# With production overlay
helm upgrade ai-stack . -n ai-stack -f values.yaml -f values-prod.yaml

Secrets annotated with helm.sh/resource-policy: keep survive upgrades. PVCs are also retained.
To update a single component without changing the chart:
helm upgrade ai-stack . -n ai-stack \
--set ollama.image.tag="0.18.0"

Or update the tag in your values file and run helm upgrade.
For stateless components with multiple replicas, rolling updates happen automatically. Ensure:
- `replicaCount >= 2`, or HPA is enabled with `minReplicas >= 2`
- Pod Disruption Budgets are configured (automatic for Ollama and Qdrant)
- Readiness probes are passing before old pods are terminated
# Watch the rollout
kubectl rollout status -n ai-stack deploy/ai-stack-openwebui

Manage ai-stack declaratively with ArgoCD. The repo ships two ready-to-use Application manifests under argocd/.
Prerequisites:
- ArgoCD installed in the cluster (namespace `argocd`)
- Repository credentials configured in ArgoCD (Settings > Repositories) so ArgoCD can pull from https://github.com/rmednitzer/ai-stack.git
The lab application enables automated sync with self-healing and pruning — changes pushed to main are applied automatically.
kubectl apply -f argocd/application-lab.yaml

Key settings in argocd/application-lab.yaml:
| Setting | Value | Purpose |
|---|---|---|
| `syncPolicy.automated.selfHeal` | `true` | Reverts manual drift automatically |
| `syncPolicy.automated.prune` | `true` | Deletes resources removed from the chart |
| `valueFiles` | `values.yaml` | Uses default (lab) values only |
| `CreateNamespace` | `true` | ArgoCD creates the `ai-stack` namespace |
Verify the application synced successfully:
# ArgoCD CLI
argocd app get ai-stack-lab
# Or via kubectl
kubectl get application ai-stack-lab -n argocd -o jsonpath='{.status.sync.status}'

The production application uses manual sync for change-control compliance. ArgoCD detects when the repo is out of sync, but an operator must explicitly trigger the sync.
kubectl apply -f argocd/application-prod.yaml

Key settings in argocd/application-prod.yaml:
| Setting | Value | Purpose |
|---|---|---|
| `syncPolicy.automated` | (omitted) | Manual sync required |
| `valueFiles` | `values.yaml`, `values-prod.yaml` | Layers production overrides |
| `CreateNamespace` | `false` | Namespace managed externally |
| `ApplyOutOfSyncOnly` | `true` | Only syncs changed resources |
Sync workflow:
# 1. Check what changed
argocd app diff ai-stack-prod
# 2. Sync after review
argocd app sync ai-stack-prod
# 3. Monitor rollout
argocd app wait ai-stack-prod --health

The production manifest also configures Slack notifications via argocd-notifications for sync success, failure, and health-degradation events. Update the annotation values to match your Slack channel:
notifications.argoproj.io/subscribe.on-sync-succeeded.slack: ai-stack-alerts
notifications.argoproj.io/subscribe.on-sync-failed.slack: ai-stack-alerts
notifications.argoproj.io/subscribe.on-health-degraded.slack: ai-stack-alerts

Change the target branch or repo:
spec:
source:
repoURL: https://github.com/your-org/ai-stack.git
targetRevision: release/v2 # Branch, tag, or commit SHA

Add per-cluster overrides without forking the chart:
spec:
source:
helm:
valueFiles:
- values.yaml
- values-prod.yaml
parameters:
- name: openwebui.ingress.hosts[0].host
value: ai.my-cluster.example.com
- name: ollama.gpu.enabled
value: "true"

Use a dedicated AppProject (recommended for production):
spec:
project: ai-stack # Instead of "default"

Create the AppProject to restrict allowed namespaces, cluster resources, and source repos:
argocd proj create ai-stack \
--src https://github.com/rmednitzer/ai-stack.git \
--dest https://kubernetes.default.svc,ai-stack \
--allow-cluster-resource /Namespace

Both manifests ignore diffs on:
- Deployment replicas — prevents HPA-managed replica counts from showing as drift
- Secret data — prevents Helm-generated secrets from triggering constant out-of-sync status
Add additional ignore rules as needed:
ignoreDifferences:
- group: ""
kind: ConfigMap
jsonPointers:
- /data/custom-key

Both applications set revisionHistoryLimit (5 for lab, 10 for production) so you can roll back to a previous sync:
# List sync history
argocd app history ai-stack-prod
# Roll back to a specific revision
argocd app rollback ai-stack-prod <HISTORY_ID>

The resources-finalizer.argocd.argoproj.io finalizer ensures all managed resources are cleaned up if the Application is deleted. Secrets and PVCs annotated with helm.sh/resource-policy: keep are still retained.
This section covers EU regulatory compliance tasks. For the full compliance framework analysis, see EU_COMPLIANCE_CHECK.md. For detailed templates and procedures, see docs/compliance/.
AI Act Art. 50(1) requires informing users when they interact with an AI system. The chart includes a configurable banner:
# values.yaml or values-prod.yaml
openwebui:
env:
WEBUI_BANNER_TEXT: "You are interacting with an AI-powered assistant. Responses are generated by a large language model and may not always be accurate."
WEBUI_BANNER_DISMISSIBLE: "true"

Customise the text for your deployment. Set WEBUI_BANNER_TEXT: "" to disable the banner.
GDPR Art. 5(1)(e) requires storage limitation. Define and enforce retention periods for all personal data categories. See EU_OPERATIONS_GUIDE §1 Data Retention Policy for recommended retention periods and automated purge scripts.
When enabling external LLM providers (externalAPIs.enabled=true), complete
the pre-enablement checklist in
EU_OPERATIONS_GUIDE §2 External API Provider Governance,
including:
- Data Processing Agreement (DPA) with each provider
- International transfer assessment (SCCs, adequacy decision)
- ROPA update (PA-06 in docs/compliance/ROPA_TEMPLATE.md)
- Privacy notice update
NIS2 Art. 21(2)(h) requires cryptography policies. Ensure PVCs containing personal data use an encrypted StorageClass. See EU_OPERATIONS_GUIDE §3 Encryption at Rest.
# Use an encrypted storage class
global:
storageClass: "gp3-encrypted" # or "zfs-encrypted", etc.

Complete the following before production deployment:
| Document | Location | Status |
|---|---|---|
| Data Protection Impact Assessment | docs/compliance/DPIA_TEMPLATE.md | Template — complete before deployment |
| Records of Processing Activities | docs/compliance/ROPA_TEMPLATE.md | Template — complete before deployment |
| Incident Response Playbook | docs/compliance/INCIDENT_RESPONSE.md | Template — fill contact directory |
| Data Subject Rights Procedures | docs/compliance/DSAR_PROCEDURES.md | Template — establish intake channels |
| EU Operations Guide | docs/compliance/EU_OPERATIONS_GUIDE.md | Reference — review all sections |
| Security Policy / CVD | SECURITY.md | Template — set security contact email |
| EU Compliance Check | EU_COMPLIANCE_CHECK.md | Complete — review and track gaps |
Start with the first-line command below, then jump to the linked subsection for the full treatment.
| Symptom | Most likely cause | First-line command | See |
|---|---|---|---|
| Pod stuck in `Pending` | Insufficient resources, GPU unavailable, unschedulable | `kubectl describe pod -n ai-stack <pod>` | §19.1 |
| Ollama pod OOMKilled | Model larger than memory limit | `kubectl describe pod -n ai-stack -l app.kubernetes.io/component=ollama \| grep -A2 OOM` | §19.2 |
| Open WebUI returns "model not found" / connection refused | Ollama pod not ready, or DNS / NetworkPolicy blocking | `kubectl exec -n ai-stack deploy/ai-stack-openwebui -- wget -qO- http://ai-stack-ollama:11434/` | §19.3 |
| Cross-component traffic fails with services present | NetworkPolicy default-deny allowlist too strict | `kubectl get networkpolicies -n ai-stack -o wide` | §19.4 |
| PVC stuck in `Pending` | Missing StorageClass, no capacity, access-mode mismatch | `kubectl describe pvc -n ai-stack <pvc>` | §19.5 |
| Secret missing after upgrade | Secrets are only generated on install | `kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack` | §19.6 |
| `helm test` failing | Enabled service unreachable over TCP/HTTP | `helm test ai-stack -n ai-stack --logs` | §19.7 |
| Ollama running on CPU despite GPU enabled | NVIDIA device plugin missing or resource not advertised | `kubectl describe node <node> \| grep nvidia` | §19.8 |
kubectl describe pod -n ai-stack <pod-name>

Common causes:
- Insufficient resources: Increase node capacity or reduce resource requests
- No matching node selector/tolerations: Check `global.nodeSelector` and `global.tolerations`
- GPU requested but unavailable: Ensure the NVIDIA GPU Operator is installed and GPUs are free
Ollama may OOM when loading large models. Solutions:
- Increase memory limits:
ollama:
resources:
limits:
memory: 64Gi # Match model requirements

- Use smaller quantized models: `llama3.2:3b` instead of `llama3.2:70b`
- Reduce keep-alive time to unload idle models faster:

ollama:
env:
OLLAMA_KEEP_ALIVE: "1m"

- Check Ollama is running:

kubectl get pods -n ai-stack -l app.kubernetes.io/component=ollama

- Check the service exists:

kubectl get svc -n ai-stack -l app.kubernetes.io/component=ollama

- Test DNS resolution from the Open WebUI pod:

kubectl exec -n ai-stack deploy/ai-stack-openwebui -- \
wget -qO- http://ai-stack-ollama:11434/

- Check the NetworkPolicy allows the connection:

kubectl describe networkpolicy -n ai-stack | grep -A 5 ollama

Symptom: Components cannot communicate even though services exist.
- Verify policies are correct:
kubectl get networkpolicies -n ai-stack -o wide

- Temporarily disable to confirm it's a policy issue (lab only):
helm upgrade ai-stack . -n ai-stack --set global.networkPolicy.enabled=false

- If traffic works with policies disabled, check the specific component's policy rules in templates/common/networkpolicies.yaml.
kubectl describe pvc -n ai-stack <pvc-name>

Common causes:
- No StorageClass: Set `global.storageClass` to a valid class
- Insufficient storage capacity: Check available storage in the cluster
- Access mode mismatch: Ensure the StorageClass supports `ReadWriteOnce`
Secrets are only generated on helm install, not on helm upgrade. If secrets are missing:
# Check if secrets exist
kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack
# If missing, they may have been accidentally deleted.
# Uninstall and reinstall (data in PVCs is preserved):
helm uninstall ai-stack -n ai-stack
helm install ai-stack . -n ai-stack

Important: PVCs with helm.sh/resource-policy: keep are not deleted on uninstall.
# Run tests with verbose output
helm test ai-stack -n ai-stack --logs
# Check the test pod logs
kubectl logs -n ai-stack ai-stack-connection-test

Tests verify TCP and HTTP connectivity to all enabled services.
# Check NVIDIA device plugin is running
kubectl get pods -n gpu-operator
# Check node GPU resources
kubectl describe node <node-name> | grep nvidia
# Check Ollama logs for GPU detection
kubectl logs -n ai-stack deploy/ai-stack-ollama | grep -i gpu

Ensure:
- NVIDIA GPU Operator is installed and healthy
- The node has `nvidia.com/gpu` resources advertised
- The pod's `resources.limits` includes `nvidia.com/gpu: 1`
# Remove the Helm release (PVCs are retained)
helm uninstall ai-stack -n ai-stack
# To also delete PVCs and all data (irreversible):
kubectl delete pvc -n ai-stack -l app.kubernetes.io/part-of=ai-stack
# Delete the namespace
kubectl delete namespace ai-stack

Warning: Deleting PVCs destroys all stored models, documents, vector embeddings, and configuration. Back up first if needed.