ai-stack How-To Guide

Practical, task-oriented guide for deploying, operating, and maintaining the ai-stack Helm chart. For architecture overview and configuration reference, see README.md.


Table of Contents

  1. Installation
  2. Day-1 Setup
  3. Working with Models
  4. RAG (Retrieval-Augmented Generation)
  5. Async Document Ingestion
  6. External LLM Providers
  7. GPU Acceleration
  8. Agentic Workloads (LangGraph)
  9. MCP Tool Integration (MCPO)
  10. PostgreSQL Modes
  11. Ingress and TLS
  12. Authentication with Authelia (SSO / OIDC)
  13. Networking and Security
  14. Observability
  15. Scaling
  16. Upgrading
  17. GitOps with ArgoCD
  18. EU Compliance
  19. Troubleshooting
  20. Uninstall

1. Installation

1.1 Lab Environment

Lab mode deploys a single-replica stack with relaxed resource limits, suitable for development and evaluation.

Prerequisites:

  • Kubernetes 1.27+ cluster (minikube, kind, k3s, or managed)
  • Helm 3.12+
  • At least 8 GB RAM available in the cluster
  • A default StorageClass (or use emptyDir for ephemeral testing)

Install:

# Create the namespace and install with lab defaults
helm install ai-stack . -n ai-stack --create-namespace

Lab with GPU:

helm install ai-stack . -n ai-stack --create-namespace \
  --set ollama.gpu.enabled=true

1.2 Production Environment

Production mode enables HA replicas, autoscaling, TLS ingress, and observability.

Additional prerequisites:

  • NVIDIA GPU Operator (for Ollama GPU acceleration)
  • Prometheus Operator CRDs (for ServiceMonitor resources)
  • cert-manager (for automated TLS provisioning)
  • An ingress controller (Envoy Gateway or NGINX)

Install:

helm install ai-stack . -n ai-stack --create-namespace \
  -f values.yaml -f values-prod.yaml

Customize before installing:

  1. Copy values-prod.yaml to values-prod-override.yaml
  2. Edit your overrides (hostname, storage class, resource limits)
  3. Install with both files:
helm install ai-stack . -n ai-stack --create-namespace \
  -f values.yaml -f values-prod.yaml -f values-prod-override.yaml

1.3 Air-gapped / Offline Install

For environments without internet access:

  1. Mirror container images to your internal registry:
# List all images used by the chart
helm template ai-stack . | grep "image:" | sort -u

# Pull, tag, and push each image to your registry
docker pull ghcr.io/open-webui/open-webui:v0.8.10
docker tag ghcr.io/open-webui/open-webui:v0.8.10 registry.internal/open-webui:v0.8.10
docker push registry.internal/open-webui:v0.8.10
# Repeat for all images...
  2. Override image repositories in your values file:
openwebui:
  image:
    repository: registry.internal/open-webui
    tag: "v0.8.10"
ollama:
  image:
    repository: registry.internal/ollama
    tag: "0.18.2"
# ... repeat for all components
  3. Configure image pull secrets if your registry requires authentication:
global:
  imagePullSecrets:
    - name: my-registry-secret
  4. Pre-download Ollama models and load them into the PVC, since ollama pull requires internet access. See Section 3.
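The pull/tag/push loop in step 1 can be generated mechanically from the helm template output. A sketch (registry.internal is a placeholder for your internal registry, and quoting or edge cases in real manifests may need more care):

def mirror_commands(helm_output: str, registry: str = "registry.internal") -> list[str]:
    """Turn "image:" lines from `helm template` output into docker
    pull/tag/push commands targeting an internal registry (placeholder)."""
    images = sorted({
        line.split("image:", 1)[1].strip().strip('"\'')
        for line in helm_output.splitlines() if "image:" in line
    })
    cmds = []
    for img in images:
        name = img.rsplit("/", 1)[-1]  # keep only name:tag
        cmds += [
            f"docker pull {img}",
            f"docker tag {img} {registry}/{name}",
            f"docker push {registry}/{name}",
        ]
    return cmds

sample = '''
    image: "ghcr.io/open-webui/open-webui:v0.8.10"
    image: "ghcr.io/open-webui/open-webui:v0.8.10"
    image: ollama/ollama:0.18.2
'''
# Duplicate lines are deduplicated before commands are emitted.
print("\n".join(mirror_commands(sample)))

Pipe the real helm template output in instead of the sample, then review the generated commands before running them.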

1.4 Air-gapped Install with Zarf

Zarf automates air-gapped deployments by packaging the Helm chart, all container images, and configuration into a single signed, declarative tarball. This eliminates the manual image mirroring described in Section 1.3.

The repository includes a zarf.yaml package definition with the core stack as a required component and optional components (Workbench, LangGraph, MCPO, OTel Collector) that can be selected at deploy time.

Prerequisites:

  • Zarf CLI installed on the build machine (internet-connected)
  • Zarf initialized on the target cluster (zarf init)
  • Kubernetes 1.27+ on the target cluster

Step 1 — Build the package (internet-connected machine):

cd ai-stack/
zarf package create --confirm

This produces a file like zarf-package-ai-stack-amd64-1.0.0.tar.zst (~15-25 GB depending on selected components). Zarf automatically pulls all images listed in zarf.yaml and bundles them alongside the Helm chart.

Step 2 — Transfer the package:

Copy the .tar.zst file to the air-gapped environment via USB, S3 bucket, or any out-of-band transfer method.

Step 3 — Initialize Zarf on the target cluster (one-time):

If Zarf has not been initialized on the target cluster yet:

zarf init --confirm

This deploys an in-cluster registry and injector that Zarf uses to serve images.

Step 4 — Deploy:

# Deploy with defaults (core stack only)
zarf package deploy zarf-package-ai-stack-amd64-1.0.0.tar.zst --confirm

# Deploy with optional components
zarf package deploy zarf-package-ai-stack-amd64-1.0.0.tar.zst \
  --components="ai-stack,langgraph,mcpo" --confirm

Zarf pushes the images to the in-cluster registry and runs helm install with the image references rewritten to point at the local registry.

Step 5 — Load Ollama models:

Zarf handles images, but Ollama models must still be loaded manually in an air-gapped cluster. On the internet-connected machine:

# Pull the model locally
ollama pull llama3.2
ollama pull nomic-embed-text

# Export to a tarball
# Models are stored under ~/.ollama/models/
tar czf ollama-models.tar.gz -C ~/.ollama models/

On the air-gapped cluster:

# Copy the models into the Ollama PVC
kubectl cp ollama-models.tar.gz ai-stack/ai-stack-ollama-0:/tmp/
kubectl exec -n ai-stack ai-stack-ollama-0 -- \
  tar xzf /tmp/ollama-models.tar.gz -C /root/.ollama/
kubectl exec -n ai-stack ai-stack-ollama-0 -- rm /tmp/ollama-models.tar.gz

# Restart Ollama to pick up the models
kubectl rollout restart -n ai-stack deploy/ai-stack-ollama

Upgrading:

Build a new package with the updated chart/images and redeploy:

zarf package deploy zarf-package-ai-stack-amd64-<new-version>.tar.zst --confirm

Zarf performs a helm upgrade under the hood.


2. Day-1 Setup

2.1 Pull Your First Models

After installation, Ollama starts with no models. Pull a chat model and an embedding model:

# Chat model
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull llama3.2

# Embedding model (required for RAG)
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull nomic-embed-text

For larger models (requires more RAM/VRAM):

kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull qwen3:14b
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull deepseek-r1:14b

2.2 Access Open WebUI

Port-forward (lab):

kubectl port-forward -n ai-stack svc/ai-stack-openwebui 8080:8080
# Open http://localhost:8080

Via ingress (production):

If ingress is configured, access via the hostname defined in your values (e.g., https://ai.example.com).

2.3 Create Your Admin Account

On first access, Open WebUI prompts you to create an admin account. This account controls:

  • User management and permissions
  • Model access control
  • System settings and configuration
  • Pipeline and tool management

Important: The first account created automatically becomes the admin. Do this immediately after deployment in production.

2.4 Verify the Deployment

# All pods should be Running
kubectl get pods -n ai-stack

# NetworkPolicies should be present for each component
kubectl get networkpolicies -n ai-stack

# Secrets should be auto-generated
kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack

# ServiceAccounts per component
kubectl get serviceaccounts -n ai-stack

# Run Helm tests (connectivity checks)
helm test ai-stack -n ai-stack

3. Working with Models

3.1 List Available Models

# List models loaded in Ollama
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama list

3.2 Pull Additional Models

# Pull any model from the Ollama library
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull <model-name>

# Examples
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull mistral
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull codellama:13b
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull llama3.2-vision:11b

Model storage: Models are stored in the Ollama PVC (/root/.ollama). Ensure the PVC is large enough — a 14B parameter model typically requires 9-10 GB of storage. The default lab PVC is 50 GB; production is 200 GB.
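The 9-10 GB figure can be sanity-checked from the parameter count. A back-of-the-envelope sketch (the ~4.5 bits per parameter for typical Q4 quantization is an assumption; real model files add metadata and often use slightly heavier quantization):

def model_size_gb(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Rough on-disk size of a quantized model file.

    Typical Q4 GGUF quantization averages ~4.5 bits/parameter (assumption);
    unquantized FP16 weights would be 16 bits/parameter.
    """
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

print(f"{model_size_gb(14):.1f} GB at Q4")    # lower bound; files on disk run higher
print(f"{model_size_gb(14, 16):.1f} GB at FP16")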

3.3 Remove a Model

kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama rm <model-name>

3.4 Set a Default Model

In Open WebUI, go to Admin Panel > Settings > Models and configure the default model. Users can still select other available models from the model picker.


4. RAG (Retrieval-Augmented Generation)

RAG allows the AI to answer questions using your own documents. The stack includes all components needed: Tika (document parsing), Qdrant (vector storage), and Ollama (embeddings).

4.1 Upload Documents via the UI

  1. Open the Open WebUI chat interface
  2. Click the + button or drag and drop files into the chat
  3. Supported formats: PDF, DOCX, PPTX, XLSX, TXT, HTML, Markdown, and more (via Tika)
  4. Documents are automatically extracted, chunked, embedded, and stored in Qdrant

4.2 Configure the Embedding Model

The default embedding model is nomic-embed-text. To change it:

openwebui:
  env:
    RAG_EMBEDDING_MODEL: "bge-m3"

Then pull the new model:

kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull bge-m3

Upgrade the release:

helm upgrade ai-stack . -n ai-stack

Note: Changing the embedding model requires re-embedding all existing documents, as vector dimensions and representations differ between models.

4.3 Tune Chunking and Retrieval

Adjust these parameters in your values override:

openwebui:
  env:
    # Larger chunks = more context per retrieval, but fewer distinct matches
    RAG_CHUNK_SIZE: "1500"
    # Overlap prevents splitting relevant content at chunk boundaries
    RAG_CHUNK_OVERLAP: "100"
    # Number of top matching chunks to include in the prompt
    RAG_TOP_K: "5"
    # Minimum similarity score (0.0 = return all, higher = stricter)
    RAG_RELEVANCE_THRESHOLD: "0.0"

Guidelines:

| Scenario | Chunk Size | Overlap | Top K |
|---|---|---|---|
| Short, factual documents | 500-800 | 50 | 3-5 |
| Long technical documents | 1500-2000 | 100-200 | 5-8 |
| Legal/regulatory text | 1000-1500 | 200 | 8-10 |
| Code repositories | 800-1200 | 100 | 5-7 |
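The interaction between chunk size and overlap is easiest to see in code. A minimal character-based sliding-window chunker (illustrative only; Open WebUI's actual splitter is token- and structure-aware):

def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters, each window
    starting `overlap` characters before the previous one ended."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 3000
chunks = chunk_text(doc, chunk_size=1500, overlap=100)
# Step is 1400, so windows start at 0, 1400, and 2800 -> 3 chunks
print(len(chunks))

Larger overlap produces more (and more redundant) chunks for the same document, which raises embedding cost but reduces the chance a relevant passage is split across a boundary.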

4.4 Enable Web Search

Web search via SearXNG is enabled by default. It allows the AI to search the internet for answers when document retrieval is insufficient.

To use web search in a conversation, type a question and enable the "Web Search" toggle in the chat interface, or configure it as the default behavior in Admin Panel settings.


5. Async Document Ingestion

For bulk document processing or integration with external systems, use the async ingestion worker instead of the UI upload.

5.1 Enable the Ingestion Worker

ingestionWorker:
  enabled: true
valkey:
  persistence:
    enabled: true  # Persist task queue across restarts
helm upgrade ai-stack . -n ai-stack

5.2 Enqueue Documents Programmatically

Connect to Valkey and submit tasks via XADD:

# Port-forward to Valkey
kubectl port-forward -n ai-stack svc/ai-stack-valkey 6379:6379

# Submit an ingestion task
redis-cli -p 6379 XADD ingestion:documents '*' \
  task_id "doc-001" \
  file_url "https://example.com/report.pdf" \
  filename "report.pdf"

Or from within the cluster (e.g., from a script or application):

import redis

r = redis.Redis(host='ai-stack-valkey', port=6379)
r.xadd('ingestion:documents', {
    'task_id': 'doc-001',
    'file_url': 'https://example.com/report.pdf',
    'filename': 'report.pdf'
})
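For bulk jobs it helps to generate the per-document field maps programmatically before enqueueing. A sketch (the field names match the stream schema above; the sequential task_id scheme is an arbitrary choice):

def build_tasks(file_urls: list[str], prefix: str = "doc") -> list[dict]:
    """Build one XADD field map per document.

    Field names (task_id, file_url, filename) follow the ingestion
    stream schema used above; task_id numbering is arbitrary.
    """
    return [
        {
            "task_id": f"{prefix}-{i:03d}",
            "file_url": url,
            "filename": url.rsplit("/", 1)[-1],
        }
        for i, url in enumerate(file_urls, start=1)
    ]

tasks = build_tasks([
    "https://example.com/report.pdf",
    "https://example.com/q3/summary.pdf",
])
# Each dict can be passed straight to r.xadd('ingestion:documents', task)
print(tasks[0]["task_id"], tasks[1]["filename"])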

5.3 Monitor Ingestion Status

# Check status of a specific task
redis-cli -p 6379 HGETALL ingestion:status:doc-001

# List recent messages in the stream
redis-cli -p 6379 XRANGE ingestion:documents - + COUNT 10

# Check consumer group lag
redis-cli -p 6379 XINFO GROUPS ingestion:documents

Status values: queued → processing → completed | failed


6. External LLM Providers

Add cloud-hosted models alongside local Ollama inference. Users see all models in the Open WebUI model picker.

6.1 Add OpenAI

externalAPIs:
  enabled: true
  providers:
    - name: openai
      baseUrl: "https://api.openai.com/v1"
      apiKey: "sk-..."

6.2 Add Azure OpenAI

externalAPIs:
  enabled: true
  providers:
    - name: azure-openai
      baseUrl: "https://<resource>.openai.azure.com/openai/deployments/<deployment>"
      apiKey: "<your-azure-key>"

6.3 Add Anthropic (Claude)

externalAPIs:
  enabled: true
  providers:
    - name: anthropic
      baseUrl: "https://api.anthropic.com/v1"
      apiKey: "sk-ant-..."

Note: Anthropic API integration requires Open WebUI v0.6+ with the Anthropic API translation layer, or an Open WebUI function for protocol translation.

6.4 Use an External Secret Manager

For production, never store API keys in values files. Use existing Kubernetes Secrets (created by ESO, Vault, or manually):

externalAPIs:
  enabled: true
  providers:
    - name: openai
      baseUrl: "https://api.openai.com/v1"
      existingSecret:
        name: "openai-api-key"    # Must exist in the release namespace
        key: "api-key"            # Key within the Secret

7. GPU Acceleration

7.1 Enable GPU for Ollama

Prerequisites: NVIDIA GPU Operator must be installed in the cluster.

ollama:
  gpu:
    enabled: true
    count: 1                      # Number of GPUs to allocate
    resourceName: nvidia.com/gpu  # Resource name from GPU operator
helm upgrade ai-stack . -n ai-stack

7.2 Enable the GPU Workbench

The Workbench provides a JupyterLab environment with CUDA and PyTorch for ML experimentation:

workbench:
  enabled: true
  gpu:
    enabled: true
    count: 1

Access the Workbench:

# Get the auto-generated token
kubectl get secret -n ai-stack ai-stack-workbench-secret \
  -o jsonpath='{.data.token}' | base64 -d

# Port-forward
kubectl port-forward -n ai-stack svc/ai-stack-workbench 8888:8888
# Open http://localhost:8888 and enter the token

7.3 Verify GPU Access

# Check Ollama GPU detection
kubectl exec -n ai-stack deploy/ai-stack-ollama -- nvidia-smi

# Check Workbench GPU access
kubectl exec -n ai-stack deploy/ai-stack-workbench -- python3 -c \
  "import torch; print(f'CUDA available: {torch.cuda.is_available()}, Devices: {torch.cuda.device_count()}')"

8. Agentic Workloads (LangGraph)

LangGraph enables stateful, multi-step agentic workflows with tool calling and checkpoint persistence.

8.1 Enable LangGraph with PostgreSQL

LangGraph requires PostgreSQL for checkpoint storage:

langgraph:
  enabled: true
postgres:
  enabled: true
  mode: standalone  # Use 'cnpg' for production HA
helm upgrade ai-stack . -n ai-stack

8.2 Deploy a Custom Graph

Option A: Custom image (recommended)

  1. Create your graph code following the LangGraph documentation
  2. Build the image:
langgraph build -t my-registry/my-graphs:latest
docker push my-registry/my-graphs:latest
  3. Override the image in values:
langgraph:
  image:
    repository: my-registry/my-graphs
    tag: "latest"

Option B: Volume mount

Place graph code in the /deps/graphs persistent volume:

kubectl cp my-graph.py ai-stack/ai-stack-langgraph-<pod>:/deps/graphs/

8.3 Test the LangGraph API

# Port-forward
kubectl port-forward -n ai-stack svc/ai-stack-langgraph 8000:8000

# Health check
curl http://localhost:8000/ok

# List available assistants
curl http://localhost:8000/assistants \
  -H "x-api-key: $(kubectl get secret -n ai-stack ai-stack-langgraph-secret -o jsonpath='{.data.api-key}' | base64 -d)"

9. MCP Tool Integration (MCPO)

MCPO bridges Model Context Protocol (MCP) servers to OpenAPI endpoints that Open WebUI can consume as tools.

9.1 Enable MCPO

mcpo:
  enabled: true

9.2 Configure MCP Servers

Add MCP server definitions in your values:

mcpo:
  enabled: true
  config:
    mcpServers:
      # Local filesystem access
      filesystem:
        command: "npx"
        args:
          - "-y"
          - "@modelcontextprotocol/server-filesystem"
          - "/data"
      # Remote SSE-based MCP server
      remote-tools:
        url: "https://mcp.example.com/sse"
        type: "sse"

After deploying, configure Open WebUI to use the MCPO endpoint as an OpenAPI tool source under Admin Panel > Settings > Tools.


10. PostgreSQL Modes

10.1 Standalone (Lab)

Single-instance PostgreSQL — no HA, suitable for development:

postgres:
  enabled: true
  mode: standalone

10.2 CloudNativePG (Production HA)

Prerequisites: Install the CloudNativePG operator (v1.25+):

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm install cnpg cnpg/cloudnative-pg -n cnpg-system --create-namespace

Then configure:

postgres:
  enabled: true
  mode: cnpg
  tls:
    mode: require
  cnpg:
    instances: 3          # 1 primary + 2 replicas
    storage:
      size: 50Gi
    pooler:
      enabled: true       # PgBouncer connection pooling
    monitoring:
      enabled: true       # Prometheus metrics

CNPG provides:

  • Streaming replication with automated failover
  • Rolling updates without downtime
  • Automated TLS certificate provisioning
  • PgBouncer connection pooling
  • Prometheus metrics endpoint

10.3 External Managed Database

Use your own PostgreSQL (RDS, Cloud SQL, Supabase, etc.):

postgres:
  enabled: true
  mode: external
  database: "langgraph"
  user: "langgraph"
  tls:
    mode: require
  external:
    host: "my-rds.abc123.us-east-1.rds.amazonaws.com"
    port: 5432
    existingSecret:
      name: "rds-password"
      key: "password"

11. Ingress and TLS

11.1 Expose Open WebUI with NGINX

openwebui:
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "50m"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    hosts:
      - host: ai.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: ai-tls
        hosts:
          - ai.example.com

11.2 Expose Open WebUI with Envoy Gateway

openwebui:
  ingress:
    enabled: true
    className: "envoy"
    annotations:
      gateway.envoyproxy.io/tls-terminate: "true"
      gateway.envoyproxy.io/timeout: "300s"
      gateway.envoyproxy.io/request-body-max-size: "50m"
      gateway.envoyproxy.io/rate-limit-local: "60"
      gateway.envoyproxy.io/rate-limit-burst: "20"
    hosts:
      - host: ai.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: ai-tls
        hosts:
          - ai.example.com

11.3 Automated TLS with cert-manager

Add the cert-manager annotation to your ingress:

openwebui:
  ingress:
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"

This automatically provisions and renews TLS certificates from Let's Encrypt.


12. Authentication with Authelia (SSO / OIDC)

Authelia is an optional OIDC identity provider that replaces Open WebUI's built-in authentication with SSO and optional MFA. When enabled, Open WebUI is automatically configured as an OIDC client.

12.1 Enable Authelia

authelia:
  enabled: true
  domain: "example.com"
  oidc:
    clientId: "openwebui"
    issuerUrl: "https://auth.example.com"

The chart auto-generates secrets for JWT, session, storage encryption, and the OIDC client secret. Open WebUI's OAUTH_* environment variables are injected automatically.

12.2 Create users

Authelia uses a file-based authentication backend by default. Generate a password hash and mount a custom users_database.yml:

# Generate an Argon2 password hash
docker run --rm ghcr.io/authelia/authelia:4.39 \
  authelia crypto hash generate argon2 --password 'your-password'

Create a users_database.yml:

users:
  admin:
    displayname: "Admin User"
    email: admin@example.com
    password: "$argon2id$v=19$m=65536,t=3,p=4$..."  # paste hash here
    groups:
      - admins

Mount it by overriding the ConfigMap or using a Helm post-renderer.

12.3 Enable MFA (two-factor)

authelia:
  enabled: true
  defaultPolicy: "two_factor"

Users will be prompted to register a TOTP device on their first login.

12.4 Use PostgreSQL as storage backend

For production, switch from SQLite to PostgreSQL:

authelia:
  enabled: true
  storage: "postgres"
postgres:
  enabled: true

Authelia creates its tables in a dedicated authelia schema within the shared PostgreSQL database.

12.5 Expose Authelia via ingress

Authelia must be reachable by user browsers for OIDC redirects:

authelia:
  ingress:
    enabled: true
    className: "envoy"
    hosts:
      - host: auth.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: auth-tls
        hosts:
          - auth.example.com

12.6 Verify OIDC integration

After deploying, verify the OIDC discovery endpoint and login flow:

# Check Authelia health
kubectl exec -n ai-stack deploy/ai-stack-authelia -- wget -qO- http://localhost:9091/api/health

# Verify OIDC discovery
kubectl port-forward -n ai-stack svc/ai-stack-authelia 9091:9091
curl -s http://localhost:9091/.well-known/openid-configuration | jq .issuer

Open WebUI should redirect to Authelia's login page when accessed.


13. Networking and Security

13.1 Network Policies

The chart deploys default-deny NetworkPolicies with per-component allowlists. This means:

  • All inbound traffic is denied unless explicitly allowed
  • All outbound traffic is denied unless explicitly allowed
  • Each component only communicates with the services it needs

To verify:

kubectl get networkpolicies -n ai-stack
kubectl describe networkpolicy ai-stack-openwebui -n ai-stack

To disable (not recommended for production):

global:
  networkPolicy:
    enabled: false

13.2 Pod Security

All pods run with PSA restricted baseline:

  • runAsNonRoot: true (except Ollama — GPU exception)
  • readOnlyRootFilesystem: true (where supported)
  • allowPrivilegeEscalation: false
  • capabilities: drop: [ALL]
  • seccompProfile: RuntimeDefault

Enforce at the namespace level:

kubectl label namespace ai-stack \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted

13.3 Secret Management

Secrets are auto-generated on first install with 64-byte random keys and annotated with helm.sh/resource-policy: keep to survive upgrades.

View generated secrets:

kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack

# Decode a specific secret value
kubectl get secret -n ai-stack ai-stack-qdrant-secret \
  -o jsonpath='{.data.api-key}' | base64 -d

Use external secrets (production):

Override auto-generated secrets with your own values:

qdrant:
  apiKey: "my-externally-managed-key"

Or reference pre-existing Kubernetes Secrets (e.g., from External Secrets Operator or Vault CSI):

externalAPIs:
  providers:
    - name: openai
      existingSecret:
        name: "vault-openai-secret"
        key: "api-key"

13.4 Rotate Secrets

  1. Generate new secret values
  2. Update the Kubernetes Secret directly:
kubectl create secret generic ai-stack-qdrant-secret \
  -n ai-stack \
  --from-literal=api-key="$(openssl rand -base64 48)" \
  --dry-run=client -o yaml | kubectl apply -f -
  3. Restart affected pods to pick up the new secret:
kubectl rollout restart -n ai-stack deploy/ai-stack-qdrant
kubectl rollout restart -n ai-stack deploy/ai-stack-openwebui
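The openssl rand -base64 48 call above has a stdlib Python equivalent, handy where openssl is unavailable (a sketch):

import base64
import secrets

def random_key(nbytes: int = 48) -> str:
    """Equivalent of `openssl rand -base64 48`: nbytes of CSPRNG
    output, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(nbytes)).decode("ascii")

key = random_key()
# 48 raw bytes encode to 64 base64 characters
print(len(key))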

14. Observability

14.1 Enable OpenTelemetry

global:
  otel:
    enabled: true
    endpoint: "http://otel-collector.observability.svc.cluster.local:4317"

This deploys an OTel Collector and injects OTEL_* environment variables into all component pods. The collector pipeline includes:

  • OTLP gRPC and HTTP receivers
  • Batch processing and memory limiting
  • Kubernetes metadata enrichment
  • GenAI semantic convention processing
  • PII redaction (GDPR compliance)

14.2 Enable Prometheus ServiceMonitors

Prerequisite: Prometheus Operator CRDs must be installed.

global:
  serviceMonitor:
    enabled: true
    labels:
      release: prometheus  # Match your Prometheus operator selector

14.3 PII Redaction

The OTel Collector automatically redacts:

  • Email addresses
  • Social security numbers (Austrian VSNR format)
  • Credit card numbers

To add custom redaction patterns:

otelCollector:
  redaction:
    enabled: true
    blockedPatterns:
      # Default patterns
      - '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
      - '\b\d{4}\s?\d{6}\b'
      - '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
      # Custom: phone numbers
      - '\+?\d{1,3}[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}'
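The effect of these patterns can be checked locally before shipping telemetry through the collector. A sketch using Python's re module (the collector's regex engine may differ slightly in syntax, so verify non-trivial patterns against the collector itself):

import re

# Patterns copied from the values snippet above.
BLOCKED_PATTERNS = [
    r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',          # email addresses
    r'\b\d{4}\s?\d{6}\b',                                        # Austrian VSNR
    r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b',               # credit card numbers
]

def redact(text: str, replacement: str = "****") -> str:
    """Apply each blocked pattern in turn, masking matches."""
    for pattern in BLOCKED_PATTERNS:
        text = re.sub(pattern, replacement, text)
    return text

print(redact("Contact alice@example.com, card 4111 1111 1111 1111"))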

15. Scaling

15.1 Horizontal Pod Autoscaling

HPA is available for stateless components. Enable in your values:

openwebui:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80

tika:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 4

Verify HPA status:

kubectl get hpa -n ai-stack
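Scaling decisions follow the standard Kubernetes HPA rule, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A worked sketch (ignores the HPA's stabilization window and min/max clamping):

import math

def desired_replicas(current: int, current_util: float, target_util: float) -> int:
    """Core HPA scaling rule: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current * current_util / target_util)

# 2 Open WebUI replicas averaging 95% CPU against the 70% target above:
print(desired_replicas(2, 95, 70))   # 2 * 95/70 = 2.71 -> 3
# At 30% average CPU the raw formula says 1, but the HPA clamps to minReplicas=2:
print(desired_replicas(2, 30, 70))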

15.2 Manual Scaling

For components without HPA:

# Scale Tika for heavy document processing
kubectl scale -n ai-stack deploy/ai-stack-tika --replicas=3

# Scale ingestion workers for bulk ingestion
kubectl scale -n ai-stack deploy/ai-stack-ingestion-worker --replicas=4

Note: Stateful components (Ollama, Qdrant) use ReadWriteOnce PVCs and cannot be scaled beyond 1 replica without operator support (e.g., Qdrant distributed mode) or shared storage.

15.3 Resource Tuning

Adjust resource requests and limits per component. Example for a high-traffic production deployment:

openwebui:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "4"
      memory: 8Gi

ollama:
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
    limits:
      cpu: "16"
      memory: 64Gi

Tip: Set requests to match actual steady-state usage and limits to handle peak load. Monitor with Prometheus/Grafana to right-size over time.


16. Upgrading

16.1 Upgrade the Chart

# Review what will change
helm diff upgrade ai-stack . -n ai-stack  # requires helm-diff plugin

# Apply the upgrade
helm upgrade ai-stack . -n ai-stack

# With production overlay
helm upgrade ai-stack . -n ai-stack -f values.yaml -f values-prod.yaml

Secrets annotated with helm.sh/resource-policy: keep survive upgrades. PVCs are also retained.

16.2 Upgrade Individual Component Images

To update a single component without changing the chart:

helm upgrade ai-stack . -n ai-stack \
  --set ollama.image.tag="0.18.0"

Or update the tag in your values file and run helm upgrade.

16.3 Upgrade with Zero Downtime

For stateless components with multiple replicas, rolling updates happen automatically. Ensure:

  1. replicaCount >= 2 or HPA is enabled with minReplicas >= 2
  2. Pod Disruption Budgets are configured (automatic for Ollama and Qdrant)
  3. Readiness probes are passing before old pods are terminated
# Watch the rollout
kubectl rollout status -n ai-stack deploy/ai-stack-openwebui

17. GitOps with ArgoCD

Manage ai-stack declaratively with ArgoCD. The repo ships two ready-to-use Application manifests under argocd/.

Prerequisites:

  • ArgoCD installed in the cluster (namespace argocd)
  • Repository credentials configured in ArgoCD (Settings > Repositories) so ArgoCD can pull from https://github.com/rmednitzer/ai-stack.git

17.1 Deploy the Lab Application

The lab application enables automated sync with self-healing and pruning — changes pushed to main are applied automatically.

kubectl apply -f argocd/application-lab.yaml

Key settings in argocd/application-lab.yaml:

| Setting | Value | Purpose |
|---|---|---|
| syncPolicy.automated.selfHeal | true | Reverts manual drift automatically |
| syncPolicy.automated.prune | true | Deletes resources removed from the chart |
| valueFiles | values.yaml | Uses default (lab) values only |
| CreateNamespace | true | ArgoCD creates the ai-stack namespace |

Verify the application synced successfully:

# ArgoCD CLI
argocd app get ai-stack-lab

# Or via kubectl
kubectl get application ai-stack-lab -n argocd -o jsonpath='{.status.sync.status}'

17.2 Deploy the Production Application

The production application uses manual sync for change-control compliance. ArgoCD detects when the repo is out-of-sync, but an operator must explicitly trigger the sync.

kubectl apply -f argocd/application-prod.yaml

Key settings in argocd/application-prod.yaml:

| Setting | Value | Purpose |
|---|---|---|
| syncPolicy.automated | (omitted) | Manual sync required |
| valueFiles | values.yaml, values-prod.yaml | Layers production overrides |
| CreateNamespace | false | Namespace managed externally |
| ApplyOutOfSyncOnly | true | Only syncs changed resources |

Sync workflow:

# 1. Check what changed
argocd app diff ai-stack-prod

# 2. Sync after review
argocd app sync ai-stack-prod

# 3. Monitor rollout
argocd app wait ai-stack-prod --health

The production manifest also configures Slack notifications via argocd-notifications for sync success, failure, and health degradation events. Update the annotation values to match your Slack channel:

notifications.argoproj.io/subscribe.on-sync-succeeded.slack: ai-stack-alerts
notifications.argoproj.io/subscribe.on-sync-failed.slack: ai-stack-alerts
notifications.argoproj.io/subscribe.on-health-degraded.slack: ai-stack-alerts

17.3 Customizing the Application Manifests

Change the target branch or repo:

spec:
  source:
    repoURL: https://github.com/your-org/ai-stack.git
    targetRevision: release/v2   # Branch, tag, or commit SHA

Add per-cluster overrides without forking the chart:

spec:
  source:
    helm:
      valueFiles:
        - values.yaml
        - values-prod.yaml
      parameters:
        - name: openwebui.ingress.hosts[0].host
          value: ai.my-cluster.example.com
        - name: ollama.gpu.enabled
          value: "true"

Use a dedicated AppProject (recommended for production):

spec:
  project: ai-stack  # Instead of "default"

Create the AppProject to restrict allowed namespaces, cluster resources, and source repos:

argocd proj create ai-stack \
  --src https://github.com/rmednitzer/ai-stack.git \
  --dest https://kubernetes.default.svc,ai-stack \
  --allow-cluster-resource /Namespace

17.4 Ignore Differences

Both manifests ignore diffs on:

  • Deployment replicas — prevents HPA-managed replica counts from showing as drift
  • Secret data — prevents Helm-generated secrets from triggering constant out-of-sync status

Add additional ignore rules as needed:

ignoreDifferences:
  - group: ""
    kind: ConfigMap
    jsonPointers:
      - /data/custom-key

17.5 Disaster Recovery

Both applications set revisionHistoryLimit (5 for lab, 10 for production) so you can roll back to a previous sync:

# List sync history
argocd app history ai-stack-prod

# Roll back to a specific revision
argocd app rollback ai-stack-prod <HISTORY_ID>

The resources-finalizer.argocd.argoproj.io finalizer ensures all managed resources are cleaned up if the Application is deleted. Secrets and PVCs annotated with helm.sh/resource-policy: keep are still retained.


18. EU Compliance

This section covers EU regulatory compliance tasks. For the full compliance framework analysis, see EU_COMPLIANCE_CHECK.md. For detailed templates and procedures, see docs/compliance/.

18.1 AI Transparency Disclosure

AI Act Art. 50(1) requires informing users when they interact with an AI system. The chart includes a configurable banner:

# values.yaml or values-prod.yaml
openwebui:
  env:
    WEBUI_BANNER_TEXT: "You are interacting with an AI-powered assistant. Responses are generated by a large language model and may not always be accurate."
    WEBUI_BANNER_DISMISSIBLE: "true"

Customize the text for your deployment. Set WEBUI_BANNER_TEXT: "" to disable.

18.2 Data Retention

GDPR Art. 5(1)(e) requires storage limitation. Define and enforce retention periods for all personal data categories. See EU_OPERATIONS_GUIDE §1 Data Retention Policy for recommended retention periods and automated purge scripts.

18.3 External API Provider Governance

When enabling external LLM providers (externalAPIs.enabled=true), complete the pre-enablement checklist in EU_OPERATIONS_GUIDE §2 External API Provider Governance, including:

  • Data Processing Agreement (DPA) with each provider
  • International transfer assessment (SCCs, adequacy decision)
  • ROPA update (PA-06 in docs/compliance/ROPA_TEMPLATE.md)
  • Privacy notice update

18.4 Encryption at Rest

NIS2 Art. 21(2)(h) requires cryptography policies. Ensure PVCs containing personal data use an encrypted StorageClass. See EU_OPERATIONS_GUIDE §3 Encryption at Rest.

```yaml
# Use an encrypted storage class
global:
  storageClass: "gp3-encrypted"  # or "zfs-encrypted", etc.
```
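
If no encrypted class exists in the cluster yet, one can be defined at cluster level. The sketch below assumes the AWS EBS CSI driver; the provisioner name and parameters differ for other CSI drivers:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com  # assumption: AWS EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"
allowVolumeExpansion: true
```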

18.5 Compliance Documentation

Complete the following before production deployment:

| Document | Location | Status |
| --- | --- | --- |
| Data Protection Impact Assessment | `docs/compliance/DPIA_TEMPLATE.md` | Template — complete before deployment |
| Records of Processing Activities | `docs/compliance/ROPA_TEMPLATE.md` | Template — complete before deployment |
| Incident Response Playbook | `docs/compliance/INCIDENT_RESPONSE.md` | Template — fill contact directory |
| Data Subject Rights Procedures | `docs/compliance/DSAR_PROCEDURES.md` | Template — establish intake channels |
| EU Operations Guide | `docs/compliance/EU_OPERATIONS_GUIDE.md` | Reference — review all sections |
| Security Policy / CVD | `SECURITY.md` | Template — set security contact email |
| EU Compliance Check | `EU_COMPLIANCE_CHECK.md` | Complete — review and track gaps |

19. Troubleshooting

Quick Reference — Symptom → Diagnosis

Start with the first-line command below, then jump to the linked subsection for the full treatment.

| Symptom | Most likely cause | First-line command | See |
| --- | --- | --- | --- |
| Pod stuck in `Pending` | Insufficient resources, GPU unavailable, unschedulable | `kubectl describe pod -n ai-stack <pod>` | §19.1 |
| Ollama pod OOMKilled | Model larger than memory limit | `kubectl describe pod -n ai-stack -l app.kubernetes.io/component=ollama \| grep -A2 OOM` | §19.2 |
| Open WebUI returns "model not found" / connection refused | Ollama pod not ready, or DNS / NetworkPolicy blocking | `kubectl exec -n ai-stack deploy/ai-stack-openwebui -- wget -qO- http://ai-stack-ollama:11434/` | §19.3 |
| Cross-component traffic fails even though services exist | NetworkPolicy default-deny allowlist too strict | `kubectl get networkpolicies -n ai-stack -o wide` | §19.4 |
| PVC stuck in `Pending` | Missing StorageClass, no capacity, access-mode mismatch | `kubectl describe pvc -n ai-stack <pvc>` | §19.5 |
| Secret missing after upgrade | Secrets are only generated on install | `kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack` | §19.6 |
| `helm test` failing | Enabled service unreachable over TCP/HTTP | `helm test ai-stack -n ai-stack --logs` | §19.7 |
| Ollama running on CPU despite GPU enabled | NVIDIA device plugin missing or resource not advertised | `kubectl describe node <node> \| grep nvidia` | §19.8 |

19.1 Pods Stuck in Pending

```bash
kubectl describe pod -n ai-stack <pod-name>
```

Common causes:

  • Insufficient resources: Increase node capacity or reduce resource requests
  • No matching node selector/tolerations: Check `global.nodeSelector` and `global.tolerations`
  • GPU requested but unavailable: Ensure the NVIDIA GPU Operator is installed and GPUs are free

19.2 Ollama Out of Memory

Ollama may OOM when loading large models. Solutions:

1. Increase memory limits:

   ```yaml
   ollama:
     resources:
       limits:
         memory: 64Gi  # Match model requirements
   ```

2. Use smaller quantized models: `llama3.2:3b` instead of a 70B-class model such as `llama3.1:70b`

3. Reduce keep-alive time to unload idle models faster:

   ```yaml
   ollama:
     env:
       OLLAMA_KEEP_ALIVE: "1m"
   ```

19.3 Open WebUI Cannot Reach Ollama

1. Check Ollama is running: `kubectl get pods -n ai-stack -l app.kubernetes.io/component=ollama`
2. Check the service exists: `kubectl get svc -n ai-stack -l app.kubernetes.io/component=ollama`
3. Test DNS resolution from the Open WebUI pod:

   ```bash
   kubectl exec -n ai-stack deploy/ai-stack-openwebui -- \
     wget -qO- http://ai-stack-ollama:11434/
   ```

4. Check that the NetworkPolicy allows the connection:

   ```bash
   kubectl describe networkpolicy -n ai-stack | grep -A 5 ollama
   ```

19.4 NetworkPolicy Blocking Traffic

Symptom: Components cannot communicate even though services exist.

1. Verify the policies are in place:

   ```bash
   kubectl get networkpolicies -n ai-stack -o wide
   ```

2. Temporarily disable them to confirm a policy is at fault (lab only):

   ```bash
   helm upgrade ai-stack . -n ai-stack --set global.networkPolicy.enabled=false
   ```

3. If traffic works with policies disabled, check the specific component's policy rules in `templates/common/networkpolicies.yaml`.
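
For reference, an allowlist entry permitting Open WebUI to reach Ollama on port 11434 would look roughly like the sketch below. The `app.kubernetes.io/component: openwebui` label value is an assumption; check the labels the chart actually renders:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-openwebui-to-ollama
  namespace: ai-stack
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: ollama
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: openwebui  # assumed label value
      ports:
        - protocol: TCP
          port: 11434
```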

19.5 PVC Stuck in Pending

```bash
kubectl describe pvc -n ai-stack <pvc-name>
```

Common causes:

  • No StorageClass: Set `global.storageClass` to a valid class
  • Insufficient storage capacity: Check available storage in the cluster
  • Access mode mismatch: Ensure the StorageClass supports `ReadWriteOnce`

19.6 Secrets Not Generated

Secrets are only generated on `helm install`, not on `helm upgrade`. If secrets are missing:

```bash
# Check whether the secrets exist
kubectl get secrets -n ai-stack -l app.kubernetes.io/part-of=ai-stack

# If missing, they may have been accidentally deleted.
# Uninstall and reinstall (data in PVCs is preserved):
helm uninstall ai-stack -n ai-stack
helm install ai-stack . -n ai-stack
```

Important: PVCs with `helm.sh/resource-policy: keep` are not deleted on uninstall.

19.7 Helm Test Failures

```bash
# Run tests with verbose output
helm test ai-stack -n ai-stack --logs

# Check the test pod logs
kubectl logs -n ai-stack ai-stack-connection-test
```

Tests verify TCP and HTTP connectivity to all enabled services.

19.8 GPU Not Detected

```bash
# Check the NVIDIA device plugin is running
kubectl get pods -n gpu-operator

# Check node GPU resources
kubectl describe node <node-name> | grep nvidia

# Check Ollama logs for GPU detection
kubectl logs -n ai-stack deploy/ai-stack-ollama | grep -i gpu
```

Ensure:

  • NVIDIA GPU Operator is installed and healthy
  • The node has `nvidia.com/gpu` resources advertised
  • The pod's `resources.limits` includes `nvidia.com/gpu: 1`
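
On the chart side, those checks correspond roughly to the following values; `ollama.gpu.enabled` matches the install flag used earlier in this guide, while the explicit `resources` block is a sketch to illustrate where the GPU limit lands:

```yaml
ollama:
  gpu:
    enabled: true
  resources:
    limits:
      nvidia.com/gpu: 1  # must match a resource advertised by the node
```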

20. Uninstall

```bash
# Remove the Helm release (PVCs are retained)
helm uninstall ai-stack -n ai-stack

# To also delete PVCs and all data (irreversible):
kubectl delete pvc -n ai-stack -l app.kubernetes.io/part-of=ai-stack

# Delete the namespace
kubectl delete namespace ai-stack
```

Warning: Deleting PVCs destroys all stored models, documents, vector embeddings, and configuration. Back up first if needed.