Self-hosted AI coding agent infrastructure running entirely on consumer hardware. Demonstrates that sophisticated AI systems—RAG, test-time compute scaling, and continuous learning—can run on a single 16GB consumer GPU.
- 99.5% Success Rate — Ralph Loop retry algorithm with temperature escalation
- Full RAG Pipeline — 100GB vector storage, semantic code search
- Continuous Learning — Nightly LoRA fine-tuning from successful completions
- Consumer Hardware — Single RTX 5060 Ti (16GB VRAM)
Host: 4 vCPU (AMD Ryzen 5 2600) • 12GB DDR4 RAM • 150GB SSD • RHEL 9
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2196F3', 'primaryTextColor': '#212121', 'primaryBorderColor': '#1565C0', 'lineColor': '#455A64', 'secondaryColor': '#E3F2FD', 'tertiaryColor': '#ECEFF1', 'edgeLabelBackground': '#ECEFF1'}}}%%
flowchart TB
    subgraph external[" "]
        client(["Client<br/>OpenCode / API"])
    end
    subgraph gateway["Gateway"]
        proxy["LLM Proxy :8000<br/>Auth • Rate Limit"]
        portal["API Portal :3000<br/>Users • Keys"]
    end
    subgraph core["Core Services"]
        rag["RAG API :8001<br/>Orchestration"]
        embed["Embeddings :8080<br/>MiniLM-L6-v2"]
    end
    %% Central inference engine - outside subgraphs for central positioning
    llama["llama-server :8000<br/>Qwen3-14B • GPU"]
    subgraph data["Storage"]
        qdrant[("Qdrant<br/>100GB Vectors")]
        redis[("Redis<br/>Queues • Metrics")]
    end
    subgraph atlas["Task Processing"]
        worker["Task Worker<br/>Ralph Loop<br/>99.5% Success"]
        sandbox["Sandbox :8020<br/>pytest • pylint"]
        dash["Dashboard :3001<br/>Monitoring"]
    end
    subgraph learn["Learning"]
        trainer["Nightly Trainer<br/>LoRA Fine-tune"]
        lora[("Adapters<br/>Hot-swap")]
    end
    %% Gateway flow
    client -->|"request"| proxy
    proxy -.->|"validate key"| portal
    proxy -->|"chat/completions"| rag
    %% RAG API calls llama-server for inference
    rag -->|"inference"| llama
    rag -->|"embed query"| embed
    embed -->|"search vectors"| qdrant
    %% Task submission to Redis
    rag -->|"submit task"| redis
    redis -->|"poll result"| rag
    %% Ralph Loop: Task Worker flow
    redis -->|"pull task"| worker
    worker -->|"generate code"| llama
    worker -->|"test code"| sandbox
    worker -->|"result + training"| redis
    %% Monitoring
    redis -->|"metrics"| dash
    %% Learning pipeline
    redis -.->|"training data"| trainer
    trainer -->|"fine-tune"| lora
    lora -.->|"load LoRA"| llama
    classDef client fill:#37474F,stroke:#263238,color:#fff
    classDef gateway fill:#607D8B,stroke:#455A64,color:#fff
    classDef core fill:#2196F3,stroke:#1565C0,color:#fff
    classDef gpu fill:#4CAF50,stroke:#2E7D32,color:#fff
    classDef storage fill:#00BCD4,stroke:#00838F,color:#fff
    classDef process fill:#FF9800,stroke:#E65100,color:#fff
    classDef learn fill:#9C27B0,stroke:#6A1B9A,color:#fff
    class client client
    class proxy,portal gateway
    class rag core
    class llama,embed gpu
    class qdrant,redis storage
    class worker,sandbox,dash process
    class trainer,lora learn
```
Component Details
| Layer | Service | Port | Purpose |
|---|---|---|---|
| Gateway | LLM Proxy | 8000 | Auth, rate limiting |
| | API Portal | 3000 | Users, API keys, usage |
| Core | RAG API | 8001 | Orchestration, chunking |
| | llama-server | 8000 | GPU inference (Qwen3-14B) |
| | Embeddings | 8080 | Vectorization (384 dims) |
| Storage | Qdrant | 6333 | Vector DB (HNSW) |
| | Redis | 6379 | Queues, metrics, cache |
| Processing | Task Worker | — | Ralph Loop engine |
| | Sandbox | 8020 | Isolated execution |
| | Dashboard | 3001 | Monitoring UI |
| Learning | Trainer | — | Nightly LoRA (2am) |
```shell
git clone https://github.com/itigges22/atlas.git && cd atlas
cp atlas.conf.example atlas.conf && ./scripts/install.sh
kubectl get pods  # Verify all services running
```

Requirements: K3s, NVIDIA GPU (8GB+ VRAM), 4+ vCPU, 12GB+ RAM, 50GB+ SSD
ATLAS exposes an OpenAI-compatible API, so it works with any client that supports the OpenAI protocol.
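Because the API is OpenAI-compatible, any OpenAI-style client library works. A minimal sketch of the request shape, using only the standard library; the base URL (the LLM Proxy on localhost:8000), model name, and API key shown here are placeholder assumptions:

```python
import json
import urllib.request

# Assumed values: the LLM Proxy endpoint from the architecture diagram,
# a placeholder model name, and a placeholder API key.
BASE_URL = "http://localhost:8000/v1"
API_KEY = "sk-your-atlas-key"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat/completions request for the proxy."""
    payload = {
        "model": "qwen3-14b",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("Write a function that reverses a string.")
# To actually send it, call urllib.request.urlopen(req) against a
# running ATLAS stack.
```

The same payload shape works from the official `openai` Python client by pointing its `base_url` at the proxy.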
Recommended: OpenCode Fork — a fork of the OpenCode terminal-based AI coding agent, optimized for ATLAS.
```shell
git clone https://github.com/itigges22/opencode.git && cd opencode
bun install
bun run dev
```

Alternatives: Cursor, Continue, aider, or any OpenAI-compatible client.
Ralph Loop — 99.5% Success via Test-Time Compute
P(success) = 1 - (1 - p)^k; with per-attempt success p = 0.65 and k = 5 attempts, P ≈ 99.5%.
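Checking the arithmetic directly:

```python
def p_success(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds,
    given per-attempt success probability p."""
    return 1 - (1 - p) ** k

print(round(p_success(0.65, 5), 4))  # 0.9947, i.e. ~99.5%
```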
| Attempt | Temp | Strategy |
|---|---|---|
| 1 | 0.3 | Conservative |
| 2 | 0.4 | Minor variation |
| 3 | 0.5 | Moderate creativity |
| 4 | 0.6 | Explore alternatives |
| 5 | 0.7 | Maximum creativity |
Each retry accumulates error context from prior attempts, steering the model away from previous failures.
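A minimal sketch of the retry loop under these assumptions; `generate` and `run_tests` are hypothetical stand-ins for calls to llama-server and the sandbox service:

```python
# Temperature schedule from the table above: conservative first,
# progressively more creative on each retry.
TEMPS = [0.3, 0.4, 0.5, 0.6, 0.7]

def ralph_loop(task, generate, run_tests):
    """Retry code generation with escalating temperature.

    generate(task, temperature, errors) and run_tests(code) are
    hypothetical stand-ins for llama-server and the sandbox.
    """
    errors = []
    for attempt, temp in enumerate(TEMPS, start=1):
        code = generate(task, temperature=temp, errors=errors)
        ok, report = run_tests(code)
        if ok:
            return {"attempt": attempt, "temperature": temp, "code": code}
        # Feed the failure report back so the next attempt is steered
        # away from what already went wrong.
        errors.append(report)
    return None  # all k attempts failed
```

Feeding `errors` back into the prompt is what makes the attempts better than independent samples; the 1 - (1 - p)^k bound treats them as independent and is therefore conservative.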
Continuous Learning — Nightly LoRA Fine-tuning
- Export — Successful completions (rating ≥4) from Redis
- Train — LoRA (r=8, α=16) on CPU
- Validate — 66% pass rate required
- Deploy — Hot-swap via symlink
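The export and deploy steps above can be sketched as follows; the field names, paths, and symlink layout are assumptions, while the rating ≥4 and 66% thresholds come from the pipeline description:

```python
import os

MIN_RATING = 4        # export gate: only well-rated completions
MIN_PASS_RATE = 0.66  # validation gate before deploying an adapter

def export_training_data(completions):
    """Step 1: keep only successful completions rated >= 4."""
    return [c for c in completions if c.get("rating", 0) >= MIN_RATING]

def deploy_adapter(new_adapter_path, link_path, pass_rate):
    """Steps 3-4: hot-swap the active LoRA adapter by atomically
    flipping a symlink, but only if validation cleared the gate."""
    if pass_rate < MIN_PASS_RATE:
        return False
    tmp = link_path + ".tmp"
    os.symlink(new_adapter_path, tmp)
    os.replace(tmp, link_path)  # atomic rename: readers never see a gap
    return True
```

The symlink flip means llama-server can reload the adapter path without downtime; the training step itself (LoRA r=8, α=16 on CPU) is omitted here since it depends on the training stack.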
Coming soon — Consumer vs enterprise hardware comparisons.
| Document | Contents |
|---|---|
| Architecture | System design, data flows, algorithms |
| Configuration | All options explained |
| Setup | Installation guide |
| Troubleshooting | Common issues |
See CONTRIBUTING.md for guidelines.
Apache 2.0 — LICENSE — Copyright 2025 Isaac Tigges