CONTINUUM

Production Data Flywheel & Self-Healing LLM Platform

A production-grade MLOps platform that captures quality signal from live LLM traffic,
scores every response with a real AI judge, detects statistical drift, and orchestrates
the full self-healing loop — built as a cloud-native event-driven microservices system.

The Problem

Every LLM in production silently degrades. Prompts shift, user behaviour changes, model capability drifts. Most teams find out weeks later — through user complaints or a drop in business metrics. By then the damage is done.

Continuum solves this. It instruments your production traffic, scores every LLM response in real-time using a multi-signal quality evaluator (ROUGE/BLEU + semantic similarity + a real Claude LLM-as-judge), applies Statistical Process Control to detect drift the moment it becomes statistically significant, and maintains the full infrastructure to orchestrate surgical LoRA fine-tuning on exactly the failing capability cluster — without human intervention.

Live Platform

The platform was deployed end-to-end on AWS ap-south-1 and verified with real production traffic. 30 LLM traces were seeded, scored by Claude, and queried through the live API and dashboard.

Overview — AVG Quality 74.9% across 23 Claude-scored traces

Quality Panel — Per-trace LLM-judge evaluations (4 dimensions each)

Real Claude judge correctly differentiating: good answers score ~0.89 (coherence 0.95, relevance 0.98), degraded answers score ~0.40 (informativeness 0.10, relevance 0.15)

Cloud Infrastructure

ECS Fargate — All 11 microservices running on continuum-dev-cluster

Fargate Tasks — running containers

ECR — 11 Docker image repositories

Application Load Balancer

Routes /api/* → api-gateway, /dashboard/* → React UI

RDS PostgreSQL — quality scores persisted

EC2 — self-managed Kafka (KRaft)

10 topics, 30 traces flowed end-to-end

CloudWatch — live quality-scorer logs

DB init → embedding model → Kafka connected → scoring

What Was Proven

Capability	Evidence
11-service event-driven pipeline on ECS Fargate	All services Active on `continuum-dev-cluster`
Kafka (KRaft) with 10 topics	30 traces: `production-traces → quality-scoring-queue → scored-traces-queue`
Real LLM-as-judge via Claude	23 traces scored — good ~0.89, degraded ~0.40, 4 dimensions each
ROUGE/BLEU + sentence-transformer semantic scoring	Fully local, no API needed
SPC drift detector, poison guard, decision engine	All Kafka-connected, consuming, DB-persisted
PostgreSQL persistence (RDS)	Schema created, quality scores written and queryable
Authenticated REST API gateway (FastAPI + ALB)	All 7 endpoints live with Bearer auth
React dashboard with real data	74.9% avg quality trend, real evaluations table
166 Prometheus metric series	Exposed at `/metrics`
Full Terraform IaC	One `terraform apply` provisions the entire cloud environment
e2e test: 10/10 PASS	`python scripts/cloud_e2e_test.py`

Limitations

Statistical window. The SPC drift detector requires a minimum of 100 scored traces in the rolling window before Western Electric rules activate. The current deployment was validated with 30 traces — sufficient to verify the ingestion and scoring pipeline end-to-end, but the drift signal and downstream decision engine were not exercised. Closing the loop requires sustained production-like traffic.

GPU dependency. The fine-tuning orchestrator (LoRA/QLoRA), vLLM serving layer, and DARE/TIES adapter merging all require GPU compute. ECS Fargate is CPU-only; these services run as stubs in the current cloud deployment. A GPU instance (g5.xlarge or equivalent) or SageMaker endpoint is required to exercise the full self-healing loop.

Single-node Kafka. The broker runs as a single KRaft node — no replication, no ISR guarantees. Suitable for development and demonstration; not appropriate for production data durability requirements. A 3-broker KRaft cluster or AWS MSK would be the upgrade path.

Model registry is a stub. The model-registry service (MLflow integration) is scaffolded but not yet wired into the fine-tuning orchestrator. Adapter lineage — base model, training data hash, benchmark results, merge configuration — is not persisted across runs.

HTTP only. The ALB serves HTTP in the dev environment. A production deployment requires TLS termination via ACM, a Route53 record, and HTTPS listeners. The current network topology uses public subnets with security-group-only isolation — cost-optimised for dev sessions, not a production security posture.

Roadmap

Close the autonomous loop

The core engineering work is complete. What remains is operational: provision a GPU instance, seed ≥150 traces with a deliberate quality degradation pattern (e.g., truncated answers after trace 80), observe the SPC window fire a CRITICAL drift signal, trigger a LoRA fine-tuning job on the failing capability cluster, merge the adapter with DARE/TIES, and canary-deploy at 5% → 50% → 100% with automatic rollback if the quality regression re-appears. This is the intended demo path once GPU compute is available.

Production-grade Kafka

Migrate to a 3-broker KRaft cluster with replication.factor=3 and min.insync.replicas=2. Add schema registry (Confluent or AWS Glue) for contract enforcement at the broker level rather than at the consumer. Consider MSK Serverless for operational simplicity at the cost of ~$150/month.

MLflow model registry

Wire the finetuning-orchestrator to register every adapter checkpoint with full lineage: base model ID, training data hash, capability benchmark results pre/post merge, and the DARE/TIES merge configuration. Expose registered models through the model-registry service for the A/B deployer to query.

vLLM serving at scale

Deploy vLLM on a GPU auto-scaling group (Karpenter on EKS or EC2 Auto Scaling) with a draft model for speculative decoding. Add model sharding for 7B+ parameter models. Expose a OpenAI-compatible /v1/chat/completions endpoint so any existing LLM client can route traffic through Continuum without code changes.

Real-time observability

Replace the dashboard's 60-second polling interval with Server-Sent Events or WebSocket streams from the API gateway. Expose a /v1/stream/quality endpoint that pushes scored trace events as they arrive. Add quality score percentile bands (P50/P95/P99) to the drift detection chart.

Alerting integration

Route CRITICAL drift signals to PagerDuty and Slack via SNS. Implement SLO burn rate alerts in CloudWatch: if the 1-hour error budget is consumed at a rate that would exhaust the 30-day budget within 6 hours, page on-call. Add a dead-letter-queue depth alarm to catch silent pipeline failures.

Kubernetes migration (EKS)

Move from ECS Fargate to EKS for horizontal pod autoscaling, GPU node pools managed by Karpenter, and better resource bin-packing across services. The existing Kubernetes manifests in infrastructure/k8s/ provide the starting point.

Multi-tenant isolation

Add per-model-id Kafka consumer group routing so multiple models can share the same pipeline without cross-contamination. Implement RBAC on the API gateway: separate API keys per deployment, admin-only access to trigger endpoints, read-only keys for dashboard consumers.

Architecture

PRODUCTION TRAFFIC (LLM responses)
             │
             ▼
┌──────────────────────────────────────┐
│  INGESTION                           │
│  Kafka consumer · Pydantic validation│
│  Data lineage · Provenance tagging   │
│  ──────────────────────────────────  │
│  → quality-scoring-queue             │
└──────────────────┬───────────────────┘
                   │
          ┌────────▼────────┐
          │ QUALITY SCORER  │
          │                 │
          │ 1. ROUGE/BLEU   │  ← lexical (local)
          │ 2. Semantic sim │  ← sentence-transformers
          │ 3. LLM-as-judge │  ← Claude: 4 dimensions
          │                 │
          │ composite = 0.6·judge + 0.2·lexical + 0.2·semantic
          │ → scored-traces-queue + PostgreSQL
          └────────┬────────┘
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
┌───────────┐ ┌───────────┐ ┌──────────────────────┐
│  DRIFT    │ │  POISON   │ │  FINETUNING          │
│  DETECTOR │ │  GUARD    │ │  ORCHESTRATOR        │
│           │ │           │ │  (GPU-gated)         │
│  SPC      │ │ Provenance│ │                      │
│  Western  │ │ Self-gen  │ │  LoRA / QLoRA        │
│  Electric │ │ filter    │ │  DARE / TIES merge   │
│  DoWhy    │ │ Dist. div │ │  Capability guard    │
│           │ │           │ │                      │
│ → drift-  │ │ → approved│ │  → model-registry    │
│   signals │ │   traces  │ │  → ab-deployer       │
└─────┬─────┘ └───────────┘ └──────────────────────┘
      │
      ▼
┌──────────────────┐     ┌─────────────────────────────┐
│  DECISION ENGINE │     │  SERVING  (GPU-gated)       │
│                  │     │  vLLM → ab-deployer          │
│  P95 > 2σ drop?  │     │  Canary 5%→50%→100%         │
│  Cooldown check  │     │  Auto-rollback on regression │
│  → finetuning-   │     └─────────────────────────────┘
│    triggers      │
└──────────────────┘

OBSERVABILITY:   Prometheus (166 series) · CloudWatch · OpenTelemetry · Structured logs
API GATEWAY:     FastAPI 0.111 + ALB · rate limiting · Bearer auth
DASHBOARD:       React 18 + Recharts + Zustand + TanStack Query + Tailwind
INFRASTRUCTURE:  ECS Fargate · EC2 Kafka (KRaft) · RDS PostgreSQL · ElastiCache Redis
                 Secrets Manager · ECR · ALB · Terraform IaC

Tech Stack

Layer	Technology	Why
Event streaming	Apache Kafka (KRaft)	Industry-standard, exactly-once semantics
Quality scoring	PyTorch + sentence-transformers + Claude	Multi-signal: lexical + semantic + LLM judge
Drift detection	SciPy SPC + Western Electric rules	Low false-positive rate vs threshold alerts
Causal attribution	DoWhy	Most mature Python causal inference library
Fine-tuning	HuggingFace PEFT + LoRA/QLoRA	Most mature LoRA ecosystem
Adapter merging	mergekit (DARE/TIES)	Only production-grade merger preventing capability regression
Inference	vLLM	Best throughput, continuous batching, speculative decoding
Pipeline orchestration	Celery + Redis	Proven, async task execution
API	FastAPI 0.111 + Pydantic v2	Async-native, type-safe, OpenAPI
Dashboard	React 18 + TypeScript + Recharts + Zustand	Real-time, typed, reactive
Containers	AWS ECS Fargate	Serverless, pay-per-second
Kafka hosting	AWS EC2 (self-managed KRaft)	~$150/month cheaper than MSK
Database	AWS RDS PostgreSQL	Managed, stoppable between sessions
Cache / queues	AWS ElastiCache Redis	Celery broker, response caching
IaC	Terraform (modular)	Reproducible, version-controlled infrastructure
CI/CD	GitHub Actions	lint → type-check → test → build → push → deploy
Observability	Prometheus + OpenTelemetry + CloudWatch	166 metric series, traces, structured logs

Repository Structure

continuum/
├── services/
│   ├── ingestion/                # Kafka consumer, Pydantic validation, lineage tagging
│   ├── quality_scorer/           # ROUGE + semantic + Claude LLM-as-judge
│   ├── drift_detector/           # SPC, Western Electric rules, DoWhy causal attribution
│   ├── poison_guard/             # Provenance tagging, self-generated data filter
│   ├── decision_engine/          # SPC threshold policy, cooldown, trigger management
│   ├── finetuning_orchestrator/  # LoRA/QLoRA pipeline, capability guard (GPU-gated)
│   ├── model-registry/           # MLflow wrapper + lineage
│   ├── ab_deployer/              # Canary 5→50→100%, auto-rollback on regression
│   ├── serving/                  # vLLM proxy (GPU-gated)
│   └── api_gateway/              # FastAPI unified API, rate limiting, Bearer auth
├── dashboard/                    # React 18 TypeScript frontend
├── shared/                       # Pydantic contracts, Kafka producer/consumer, DB session
├── infrastructure/
│   ├── terraform/
│   │   ├── modules/              # alb, ecs, ecr, networking, rds, redis, kafka-ec2, secrets
│   │   └── environments/dev/
│   └── k8s/                      # Kubernetes manifests (EKS alternative)
├── tests/
│   ├── unit/
│   ├── integration/              # Full Docker Compose stack tests
│   └── e2e/
├── scripts/
│   ├── local-setup.py            # Initialize Kafka topics + DB schema
│   └── cloud_e2e_test.py         # End-to-end test against live ALB (10/10 checks)
├── docs/
│   ├── architecture.md           # Full system technical deep-dive
│   ├── deployment/               # Cloud deployment guide
│   └── operations/               # Runbooks, smoke tests
└── docker-compose.yml            # Full local dev environment

Quick Start — Local

Prerequisites

docker --version    # Docker Desktop 4.x+
python --version    # Python 3.11+
node --version      # Node.js 20+

Start the stack

git clone https://github.com/HarshTomar1234/continuum.git
cd continuum

# Start Kafka, PostgreSQL, Redis, MLflow
docker compose up -d

# Initialize Kafka topics + DB schema
python scripts/local-setup.py

Run a service

cd services/quality_scorer
pip install -e ".[dev]"

export ANTHROPIC_API_KEY=sk-ant-...   # Required for LLM judge only
export ENVIRONMENT=local

python -m services.quality_scorer.src.main

Send a trace into the pipeline

python - << 'EOF'
import json
from confluent_kafka import Producer

trace = {
    "schema_version": "1.0",
    "trace_id": "test-001",
    "timestamp_utc": "2026-06-21T10:00:00Z",
    "model_id": "gpt2-ft-v3",
    "model_version": "v3.1",
    "deployment_id": "prod-slot-A",
    "request": {
        "prompt": "Explain gradient descent in machine learning.",
        "system_prompt": None,
        "temperature": 0.7,
        "max_tokens": 512
    },
    "response": {
        "completion": "Gradient descent iteratively minimizes a loss function by moving parameters in the direction of steepest descent, scaled by a learning rate, until convergence.",
        "finish_reason": "stop",
        "input_token_count": 8,
        "output_token_count": 30,
        "latency_ms": 312
    },
    "metadata": {
        "user_id": "u1",
        "session_id": "sess-001",
        "source_system": "chatbot",
        "region": "ap-south-1",
        "tags": {}
    }
}

p = Producer({"bootstrap.servers": "localhost:9094"})
p.produce("production-traces", json.dumps(trace).encode())
p.flush()
print("Trace produced.")
EOF

Run the tests

pytest tests/unit/ -v
pytest tests/integration/ -v          # requires: docker compose up -d
python scripts/cloud_e2e_test.py      # end-to-end against local or live ALB

Open the dashboard

cd dashboard && npm install && npm run dev
# → http://localhost:3000

Cloud Deployment (AWS)

Cost model: designed for start/stop sessions. ~$5–15 per active session. ~$2–5/month at rest (S3 state + ECR only).

1. Bootstrap AWS prerequisites

aws configure   # Region: ap-south-1

# S3 bucket for Terraform state
aws s3 mb s3://continuum-terraform-state-YOURACCOUNTID --region ap-south-1

# DynamoDB table for state locking
aws dynamodb create-table \
  --table-name continuum-terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-south-1

# EC2 key pair for Kafka SSH access
aws ec2 create-key-pair \
  --key-name continuum-dev-key \
  --region ap-south-1 \
  --query 'KeyMaterial' --output text > ~/.ssh/continuum-dev-key.pem
chmod 400 ~/.ssh/continuum-dev-key.pem

2. Configure and apply Terraform

cd infrastructure/terraform/environments/dev

cat > terraform.tfvars << 'EOF'
aws_region        = "ap-south-1"
project           = "continuum"
environment       = "dev"
vpc_cidr          = "10.0.0.0/16"
ecs_desired_count = 1
db_password       = "YourSecurePassword123!"
anthropic_api_key = "sk-ant-..."
kafka_key_name    = "continuum-dev-key"
alert_email       = "you@email.com"
domain_name       = ""
EOF

terraform init
terraform plan
terraform apply   # ~10–15 min first time

After apply:

platform_url    = "http://continuum-dev-alb-{id}.ap-south-1.elb.amazonaws.com"
kafka_public_ip = "x.x.x.x"
ecs_cluster     = "continuum-dev-cluster"

3. Build and push all service images

AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
ECR="${AWS_ACCOUNT_ID}.dkr.ecr.ap-south-1.amazonaws.com"

aws ecr get-login-password --region ap-south-1 | \
  docker login --username AWS --password-stdin ${ECR}

for svc in api_gateway quality_scorer drift_detector poison_guard decision_engine \
           finetuning_orchestrator ab_deployer serving ingestion; do
  ecr_name=$(echo $svc | tr '_' '-')
  docker build -f services/${svc}/Dockerfile -t continuum-${ecr_name}:latest .
  docker tag  continuum-${ecr_name}:latest ${ECR}/continuum-dev-${ecr_name}:latest
  docker push ${ECR}/continuum-dev-${ecr_name}:latest
done

# Dashboard
docker build -f dashboard/Dockerfile -t continuum-dashboard:latest .
docker tag  continuum-dashboard:latest ${ECR}/continuum-dev-dashboard:latest
docker push ${ECR}/continuum-dev-dashboard:latest

4. Force redeploy all services

CLUSTER="continuum-dev-cluster"
for svc in api-gateway quality-scorer drift-detector poison-guard decision-engine \
           finetuning-orchestrator ab-deployer serving ingestion dashboard; do
  aws ecs update-service \
    --cluster ${CLUSTER} \
    --service continuum-dev-${svc} \
    --force-new-deployment \
    --region ap-south-1 \
    --query 'service.serviceName' --output text
done

5. Verify deployment

pip install httpx
python scripts/cloud_e2e_test.py

Expected:

[PASS] ALB routes to api-gateway          HTTP 200
[PASS] health payload ok
[PASS] unauthenticated request rejected   HTTP 401
[PASS] quality endpoint authorized        HTTP 200
[PASS] scored traces present              count=23
[PASS] quality differentiation
[PASS] drift endpoint authorized          HTTP 200
[PASS] triggers endpoint authorized       HTTP 200
[PASS] deployments endpoint authorized    HTTP 200
[PASS] metrics endpoint exposed           HTTP 200 (166 series)

Passed: 10   Failed: 0

6. Seed traces into the live pipeline

KAFKA_IP=$(terraform output -raw kafka_public_ip)
KEY=~/.ssh/continuum-dev-key.pem

scp -i $KEY seed_traces.jsonl ec2-user@${KAFKA_IP}:/tmp/
ssh -i $KEY ec2-user@${KAFKA_IP} \
  "cat /tmp/seed_traces.jsonl | sudo /opt/kafka/bin/kafka-console-producer.sh \
   --bootstrap-server localhost:9092 --topic production-traces"

# Monitor pipeline flow
ssh -i $KEY ec2-user@${KAFKA_IP} "
for t in production-traces quality-scoring-queue scored-traces-queue drift-signals; do
  T=\$(sudo /opt/kafka/bin/kafka-get-offsets.sh --bootstrap-server localhost:9092 \
       --topic \$t 2>/dev/null | awk -F: '{s+=\$3} END {print s}')
  printf '%-28s %s messages\n' \"\$t\" \"\${T:-0}\"
done"

7. Stop all resources between sessions

# Scale ECS to 0 (stops Fargate billing)
terraform apply -var="ecs_desired_count=0" -auto-approve

# Stop RDS
aws rds stop-db-instance \
  --db-instance-identifier continuum-dev --region ap-south-1

# Stop Kafka EC2
aws ec2 stop-instances \
  --instance-ids $(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=continuum-dev-kafka" \
    --query 'Reservations[0].Instances[0].InstanceId' --output text) \
  --region ap-south-1

# Full teardown
terraform destroy -auto-approve

API Reference

Base URL: http://{alb-dns}/ Auth: Authorization: Bearer {key} (default: dev-key)

Method	Endpoint	Description
`GET`	`/v1/health`	Gateway liveness — no auth required
`GET`	`/v1/quality/metrics?hours=24&limit=100`	Quality scores from PostgreSQL
`GET`	`/v1/drift/status?hours=24&limit=100`	Drift events
`GET`	`/v1/triggers?limit=50`	Fine-tuning triggers
`GET`	`/v1/deployments`	Active model deployments
`GET`	`/v1/deployments/{model_id}`	Single deployment state
`POST`	`/v1/infer/{model_id}`	Inference via vLLM (GPU-gated)
`POST`	`/v1/admin/trigger`	Manual fine-tuning trigger (admin key)
`GET`	`/metrics`	Prometheus metrics (166 series)

Example — query real Claude judge scores:

curl -s -H "Authorization: Bearer dev-key" \
  "http://{alb-dns}/v1/quality/metrics?hours=24&limit=3" | python -m json.tool

{
  "metrics": [
    {
      "trace_id": "showcase-0024",
      "model_id": "gpt2-ft-v3",
      "composite_score": 0.8903,
      "coherence": 0.95,
      "relevance": 0.98,
      "informativeness": 0.85,
      "conciseness": 0.92,
      "scored_at_utc": "2026-06-20T15:31:56Z"
    },
    {
      "trace_id": "showcase-0029",
      "model_id": "gpt2-ft-v3",
      "composite_score": 0.4,
      "coherence": 0.7,
      "relevance": 0.4,
      "informativeness": 0.1,
      "conciseness": 0.3,
      "scored_at_utc": "2026-06-20T15:32:00Z"
    }
  ],
  "count": 23,
  "window_hours": 24
}

Kafka Topics

Topic	Producer	Consumers
`production-traces`	Your serving layer	ingestion
`quality-scoring-queue`	ingestion	quality-scorer
`scored-traces-queue`	quality-scorer	drift-detector, poison-guard
`approved-traces-queue`	poison-guard	finetuning-orchestrator
`drift-signals`	drift-detector	decision-engine
`finetuning-triggers`	decision-engine	finetuning-orchestrator
`deployment-events`	ab-deployer	monitoring
`dead-letter-queue`	all services on error	ops review

Environment Variables

Variable	Default	Description
`ENVIRONMENT`	`local`	`local` / `dev` / `prod`
`KAFKA_BOOTSTRAP_SERVERS`	`localhost:9094`	Broker address
`POSTGRES_URL`	`postgresql+asyncpg://...localhost...`	Full async DB URL
`DB_PASSWORD`	`continuum`	Injected from Secrets Manager in cloud
`REDIS_URL`	`redis://localhost:6379/0`	Redis connection URL
`ANTHROPIC_API_KEY`	—	Required for quality-scorer LLM judge
`LLM_JUDGE_MODEL`	`claude-haiku-4-5-20251001`	Claude model for scoring
`COMPOSITE_SCORE_HIGH_THRESHOLD`	`0.7`	High quality tier threshold
`COMPOSITE_SCORE_LOW_THRESHOLD`	`0.5`	Low quality tier threshold
`SPC_WINDOW_SIZE`	`100`	Samples before SPC rules activate
`SPC_K_SIGMA`	`3.0`	Control limit multiplier

CI/CD

GitHub Actions runs on every push:

push → ruff lint → mypy type-check → pytest unit → docker build
     → [on PR to develop] ECR push → ECS force-redeploy

# Run locally before pushing
ruff check services/ shared/
mypy services/ shared/ --ignore-missing-imports
pytest tests/unit/ -v --cov=services --cov-report=term-missing

Hard Technical Challenges

Challenge	Why it's hard	Approach
Causal drift attribution	Detecting drift is easy. Knowing why requires causal graphs over feature-outcome pairs	DoWhy causal inference over production trace pairs
Feedback loop poison detection	Self-generated training data causes silent model collapse	Distribution divergence + provenance tagging on every trace
Surgical LoRA fine-tuning	Targeting only failing capabilities requires knowing which ones failed	Capability cluster mapping via benchmarks → selective adapter training
Safe adapter merging	Merging adapters without destroying base capabilities is non-trivial	DARE + TIES merging with capability regression guard
SPC-based quality triggering	Threshold alerts produce false positives	Statistical Process Control + Western Electric rules
KRaft single-node offsets	`offsets.topic.replication.factor` defaults to 3 — silently breaks all consumer groups	Set RF=1 for internal topics in `server.properties`
Cross-service contracts	Message schemas defined in one service, consumed by others	Consuming images bundle required contract modules

Key Decisions

Decision	Rationale
ECS Fargate over EKS	No $72/month control plane; pay-per-second fits start/stop model
Self-managed Kafka over MSK	~$150/month cheaper for dev/demo
FastAPI 0.111 (pinned)	0.138+ introduces `_IncludedRouter` breaking both prometheus and OTel middleware
SPC over threshold alerts	Western Electric rules reduce false-positive alert fatigue
mergekit for adapter merging	Only production-grade DARE/TIES implementation
vLLM over TGI	Better throughput, speculative decoding, active community

License

Harsh Tomar

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
.github		.github
dashboard		dashboard
docs		docs
images		images
infrastructure		infrastructure
scripts		scripts
services		services
shared		shared
tests		tests
.coveragerc		.coveragerc
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
seed_traces.jsonl		seed_traces.jsonl

Folders and files

Latest commit

History

Repository files navigation

CONTINUUM

Production Data Flywheel & Self-Healing LLM Platform

The Problem

Live Platform

Overview — AVG Quality 74.9% across 23 Claude-scored traces

Quality Panel — Per-trace LLM-judge evaluations (4 dimensions each)

Cloud Infrastructure

What Was Proven

Limitations

Roadmap

Close the autonomous loop

Production-grade Kafka

MLflow model registry

vLLM serving at scale

Real-time observability

Alerting integration

Kubernetes migration (EKS)

Multi-tenant isolation

Architecture

Tech Stack

Repository Structure

Quick Start — Local

Prerequisites

Start the stack

Run a service

Send a trace into the pipeline

Run the tests

Open the dashboard

Cloud Deployment (AWS)

1. Bootstrap AWS prerequisites

2. Configure and apply Terraform

3. Build and push all service images

4. Force redeploy all services

5. Verify deployment

6. Seed traces into the live pipeline

7. Stop all resources between sessions

API Reference

Kafka Topics

Environment Variables

CI/CD

Hard Technical Challenges

Key Decisions

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages