Production deployment, Redis migration, troubleshooting, and optimization.
- Redis Index Migration
- Production Deployment
- Security Best Practices
- Performance Optimization
- Troubleshooting
Redis vector indexes have fixed dimensions. When you:
- Switch embedding models (Gemma → Nomic)
- Change Matryoshka dimensions (768 → 256)
- Switch providers (Ollama → HuggingFace)
The old index will cause dimension mismatch errors.
Using Make (Easiest):

```bash
make cache-clear
```

Using Redis CLI:

```bash
redis-cli FT.DROPINDEX semantic_cache DD
```

Using Python:

```python
from semantic_cache.config import get_redis_client

client = get_redis_client()
client.execute_command("FT.DROPINDEX", "semantic_cache", "DD")
```

Example: Switch from 768 → 256 dimensions:
```bash
# 1. Stop the API (Ctrl+C)

# 2. Update dependencies.py
#    Change: output_dimension=768 → output_dimension=256

# 3. Clear the Redis index
make cache-clear

# 4. Restart the API
make dev

# 5. Verify the logs
#    Look for: "Created new index: semantic_cache"
```

| Command | Index | Data |
|---|---|---|
| `FT.DROPINDEX semantic_cache` | ✅ Deleted | ❌ Kept (unsearchable) |
| `FT.DROPINDEX semantic_cache DD` | ✅ Deleted | ✅ Deleted |

`make cache-clear` uses the `DD` flag, so it deletes both the index and the stored data.
```bash
redis-cli FT.INFO semantic_cache
```

Look for the `VECTOR` field in the output to see the index's current dimensions.
docker-compose.yml:
```yaml
version: '3.8'

services:
  # Ollama (if using)
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

  # Semantic Cache API
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - EMBEDDING_MODEL=embeddinggemma
      # For HuggingFace (if using):
      # - HF_TOKEN=${HF_TOKEN}
      # - EMBEDDING_OUTPUT_DIMENSION=768
    depends_on:
      - redis
      - ollama  # Remove if using HuggingFace
    restart: unless-stopped

  # Redis Stack
  redis:
    image: redis/redis-stack:latest
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  ollama_data:
  redis_data:
```

Pull the Ollama model:

```bash
docker-compose exec ollama ollama pull embeddinggemma
```

Start the services:

```bash
docker-compose up -d
```

ollama-deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  ports:
    - port: 11434
  selector:
    app: ollama
```

api-deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: semantic-cache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: semantic-cache
  template:
    metadata:
      labels:
        app: semantic-cache
    spec:
      containers:
        - name: api
          image: your-registry/semantic-cache:latest
          ports:
            - containerPort: 8000
          env:
            - name: REDIS_URL
              value: "redis://redis:6379"
            - name: EMBEDDING_MODEL
              value: "embeddinggemma"
            # For HuggingFace:
            # - name: HF_TOKEN
            #   valueFrom:
            #     secretKeyRef:
            #       name: huggingface-token
            #       key: HF_TOKEN
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: semantic-cache
spec:
  type: LoadBalancer
  ports:
    - port: 8000
  selector:
    app: semantic-cache
```

secret.yaml (for HuggingFace):
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: huggingface-token
type: Opaque
stringData:
  HF_TOKEN: hf_your_token_here
```

/etc/systemd/system/semantic-cache.service:
```ini
[Unit]
Description=Semantic Cache Service
After=network.target redis.service

[Service]
Type=simple
User=semantic-cache
WorkingDirectory=/opt/semantic-cache
Environment="REDIS_URL=redis://localhost:6379"
Environment="EMBEDDING_MODEL=embeddinggemma"
# For HuggingFace:
# Environment="HF_TOKEN=hf_your_token_here"
# Environment="EMBEDDING_OUTPUT_DIMENSION=768"
ExecStart=/opt/semantic-cache/.venv/bin/uvicorn semantic_cache.api.app:app --host 0.0.0.0 --port 8000
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Start the service:

```bash
sudo systemctl enable semantic-cache
sudo systemctl start semantic-cache
sudo systemctl status semantic-cache
```

DO ✅:
- Store secrets in `.env` (and add it to `.gitignore`)
- Use read-only HuggingFace tokens
- Use secret managers in production (AWS Secrets Manager, Vault)
- Rotate tokens periodically (every 6-12 months)
- Use separate tokens per environment (dev/staging/prod)
DON'T ❌:
- Commit tokens to git
- Share tokens in chat/email
- Use personal tokens for production
- Hardcode tokens in code
- Give write access unless needed
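Several of the rules above can be enforced mechanically. Below is a sketch that scans text (for example, `git log -p` output) for strings shaped like HuggingFace tokens; the `hf_` prefix is real, but the minimum-length cutoff of 20 characters is an assumption, so tune it to your needs:

```python
import re

# HuggingFace user access tokens start with 'hf_' followed by alphanumerics.
# The 20-character minimum is an assumed heuristic to reduce false positives.
TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{20,}\b")

def find_exposed_tokens(text: str) -> list[str]:
    """Return strings in `text` that look like HuggingFace tokens."""
    return TOKEN_PATTERN.findall(text)

diff = "+HF_TOKEN=hf_abcdefghijklmnopqrstuv\n-# no secrets here"
print(find_exposed_tokens(diff))  # ['hf_abcdefghijklmnopqrstuv']
```

A check like this fits naturally into a pre-commit hook, catching tokens before they ever reach git history.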
Check if a token is exposed:

```bash
git log -p | grep -i "hf_"
```

If exposed:

- Go to https://huggingface.co/settings/tokens
- Delete the token immediately
- Create a new token
- Update `.env`
- Restart services
Enable authentication:

```yaml
# docker-compose.yml
redis:
  image: redis/redis-stack:latest
  command: redis-server --requirepass your_secure_password
  environment:
    - REDIS_PASSWORD=your_secure_password
```

Update the connection string:

```bash
# .env
REDIS_URL=redis://:your_secure_password@localhost:6379
```

Production checklist:
- Use HTTPS/TLS for API
- Firewall Redis (only internal access)
- Rate limiting on API endpoints
- API authentication (JWT, API keys)
- Monitor for unusual traffic patterns
Threshold Tuning:

```bash
# Lower = stricter matching (fewer hits, higher precision)
CACHE_DISTANCE_THRESHOLD=0.10

# Medium = balanced (recommended)
CACHE_DISTANCE_THRESHOLD=0.15

# Higher = looser matching (more hits, risk of false positives)
CACHE_DISTANCE_THRESHOLD=0.25
```

Monitor the hit rate:

```bash
curl http://localhost:8000/cache/stats
```

Target: 60-70% hit rate for optimal cost savings.
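To see what a given hit rate is worth, here is a minimal back-of-envelope calculator; the per-call LLM price used in the example is an illustrative figure, not a real quote:

```python
def llm_cost_savings(requests: int, hit_rate: float, cost_per_call: float) -> float:
    """Cost avoided by serving `hit_rate` of requests from the cache
    instead of calling the LLM."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return requests * hit_rate * cost_per_call

# At a 65% hit rate, 100k requests at $0.002/call avoid roughly $130 of spend.
saved = llm_cost_savings(100_000, 0.65, 0.002)
```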
```bash
# Short TTL for fast-changing data
CACHE_TTL=3600        # 1 hour

# Medium TTL for typical use
CACHE_TTL=604800      # 7 days (default)

# Long TTL for stable data
CACHE_TTL=2592000     # 30 days
```

redis.conf settings:
```conf
# Memory limit
maxmemory 2gb
maxmemory-policy allkeys-lru

# Persistence (for durability)
save 900 1
save 300 10
save 60 10000

# Snapshotting
rdbcompression yes
rdbchecksum yes
```
Storage vs Quality (storage relative to a 384-dim baseline):
- 768 dims: best quality, 2x storage
- 512 dims: ~95% quality, 1.33x storage
- 256 dims: ~90% quality, 0.67x storage
- 128 dims: ~85% quality, 0.33x storage
Recommendation:
- Small scale (<10K entries): Use 768
- Medium scale (10K-100K): Use 512
- Large scale (>100K): Use 256
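The storage ratios above follow directly from vector size. A rough estimator, assuming float32 vectors and ignoring Redis key overhead and the index's own graph structures (so real usage will be higher):

```python
def index_storage_bytes(entries: int, dims: int, bytes_per_float: int = 4) -> int:
    """Approximate raw vector storage for a cache index (float32 by default)."""
    return entries * dims * bytes_per_float

# 100k entries at 256 dims: 102,400,000 bytes, i.e. roughly 100 MB of raw vectors.
print(index_storage_bytes(100_000, 256))  # 102400000
```

At 768 dims the same 100k entries need about 300 MB, which is why the larger scales favor smaller dimensions.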
```python
# Process multiple queries in one embedding call
embeddings = provider.encode_batch([
    "Query 1",
    "Query 2",
    "Query 3",
])
```

Benefits: ~50% faster than individual calls.
Error:

```
redis.exceptions.ResponseError: Dimension mismatch: expected 384, got 768
```

Solution:

```bash
make cache-clear
make dev
```

Error:

```
requests.exceptions.ConnectionError: Connection refused
```

Solution:

```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# If not, start it
ollama serve
```

Error:

```
HTTPError: 401 Client Error: Unauthorized
```

Solution:

```bash
# Authenticate
huggingface-cli login

# Accept the model license:
# Visit: https://huggingface.co/google/embeddinggemma-300m
# Click: "Agree and access repository"
```

Error:

```
redis.exceptions.ConnectionError: Error connecting to Redis
```

Solution:

```bash
# Check Redis is running
docker compose ps

# Start Redis if needed
docker compose up -d redis

# Check Redis health
redis-cli ping
# Should return: PONG
```

Problem: Embeddings take >200ms
Solutions:
- Check model loading: the first request is always slower (model load time)
- Use batch processing: process multiple queries together
- Switch to a lighter model: try `all-minilm` (384 dims, faster)
- Check CPU load: embeddings are CPU-intensive
- Consider GPU: for production scale
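Before applying any of the fixes above, it helps to confirm that embedding calls really exceed the 200 ms budget. A stdlib-only timing harness might look like this; `fn` is a stand-in for your provider's encode call:

```python
import time
from statistics import quantiles

def time_calls(fn, runs: int = 100) -> dict[str, float]:
    """Measure per-call latency of `fn` in milliseconds and report p50/p95."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    cuts = quantiles(samples, n=100)  # 99 cut points: cuts[49]=p50, cuts[94]=p95
    return {"p50_ms": cuts[49], "p95_ms": cuts[94]}

# Substitute a real call, e.g. lambda: provider.encode("warm query")
stats = time_calls(lambda: sum(range(1000)))
```

Run it twice in a row: if the second run is dramatically faster, the slowness was model loading, not steady-state latency.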
Problem: High memory usage

Solutions:
- Use smaller dimensions: 256 or 128 instead of 768
- Limit Redis memory: set `maxmemory` in redis.conf
- Enable eviction: use the `allkeys-lru` policy
- Monitor with: `redis-cli info memory`
Error:

```
redis.exceptions.ResponseError: Index already exists
```

Solution:

```bash
# Force drop and recreate
redis-cli FT.DROPINDEX semantic_cache DD
make dev
```

```bash
# API health
curl http://localhost:8000/health

# Redis health
redis-cli ping

# Ollama health (if using)
curl http://localhost:11434/api/version
```

Cache Performance:
- Hit rate (target: 60-70%)
- Average lookup time (target: <5ms)
- Total entries
- Storage size
API Performance:
- Request latency (p50, p95, p99)
- Error rate
- Requests per second
Redis Metrics:
```bash
redis-cli info stats
redis-cli info memory
redis-cli FT.INFO semantic_cache
```

Application logs:
```bash
# View logs
docker-compose logs -f api

# Or for systemd
journalctl -u semantic-cache -f
```

Redis logs:

```bash
docker-compose logs -f redis
```

```bash
# Manual backup
redis-cli BGSAVE

# Copy RDB file
cp /var/lib/redis/dump.rdb /backup/dump-$(date +%Y%m%d).rdb
```

```bash
# Stop Redis
docker-compose stop redis

# Replace RDB file
cp /backup/dump-20260205.rdb /var/lib/redis/dump.rdb

# Start Redis
docker-compose start redis
```

Cron job:
```bash
#!/bin/bash
# /etc/cron.daily/redis-backup
redis-cli BGSAVE
sleep 10
cp /var/lib/redis/dump.rdb /backup/dump-$(date +%Y%m%d).rdb
find /backup -name "dump-*.rdb" -mtime +7 -delete
```

Use different index names per environment:
.env.dev:

```bash
CACHE_INDEX_NAME=semantic_cache_dev
```

.env.staging:

```bash
CACHE_INDEX_NAME=semantic_cache_staging
```

.env.prod:

```bash
CACHE_INDEX_NAME=semantic_cache_prod
```

This allows testing different models/configs without affecting production.
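One way the application side might resolve the per-environment name (a sketch; the project's actual config module may already handle this, and the default shown is an assumption):

```python
import os

def cache_index_name(default: str = "semantic_cache") -> str:
    """Resolve the index name from the environment, falling back to a default.
    CACHE_INDEX_NAME is the variable set in the per-environment .env files."""
    return os.environ.get("CACHE_INDEX_NAME", default)

os.environ["CACHE_INDEX_NAME"] = "semantic_cache_dev"  # e.g. loaded from .env.dev
print(cache_index_name())  # semantic_cache_dev
```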
- ✅ Understand production considerations
- Deploy to your environment
- Set up monitoring
- Configure backups
- Tune performance based on metrics