A.T.L.A.S

Adaptive Test-time Learning and Autonomous Specialization

📚 Docs • ⚙️ Config • 🔧 Setup


Self-hosted AI coding agent infrastructure running entirely on consumer hardware. Demonstrates that sophisticated AI systems—RAG, test-time compute scaling, and continuous learning—can run on a single 16GB consumer GPU.

  • 99.5% Success Rate — Ralph Loop retry algorithm with temperature escalation
  • Full RAG Pipeline — 100GB vector storage, semantic code search
  • Continuous Learning — Nightly LoRA fine-tuning from successful completions
  • Consumer Hardware — Single RTX 5060 Ti (16GB VRAM)

Hardware

Host: 4 vCPU (AMD Ryzen 5 2600) • 12GB DDR4 RAM • 150GB SSD • RHEL 9


Architecture

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2196F3', 'primaryTextColor': '#212121', 'primaryBorderColor': '#1565C0', 'lineColor': '#455A64', 'secondaryColor': '#E3F2FD', 'tertiaryColor': '#ECEFF1', 'edgeLabelBackground': '#ECEFF1'}}}%%
flowchart TB
    subgraph external[" "]
        client(["Client<br/>OpenCode / API"])
    end
    subgraph gateway["Gateway"]
        proxy["LLM Proxy :8000<br/>Auth • Rate Limit"]
        portal["API Portal :3000<br/>Users • Keys"]
    end
    subgraph core["Core Services"]
        rag["RAG API :8001<br/>Orchestration"]
        embed["Embeddings :8080<br/>MiniLM-L6-v2"]
    end

    %% Central inference engine - outside subgraphs for central positioning
    llama["llama-server :8000<br/>Qwen3-14B • GPU"]

    subgraph data["Storage"]
        qdrant[("Qdrant<br/>100GB Vectors")]
        redis[("Redis<br/>Queues • Metrics")]
    end
    subgraph atlas["Task Processing"]
        worker["Task Worker<br/>Ralph Loop<br/>99.5% Success"]
        sandbox["Sandbox :8020<br/>pytest • pylint"]
        dash["Dashboard :3001<br/>Monitoring"]
    end
    subgraph learn["Learning"]
        trainer["Nightly Trainer<br/>LoRA Fine-tune"]
        lora[("Adapters<br/>Hot-swap")]
    end
    %% Gateway flow
    client -->|"request"| proxy
    proxy -.->|"validate key"| portal
    proxy -->|"chat/completions"| rag

    %% RAG API calls llama-server for inference
    rag -->|"inference"| llama
    rag -->|"embed query"| embed
    embed -->|"search vectors"| qdrant

    %% Task submission to Redis
    rag -->|"submit task"| redis
    redis -->|"poll result"| rag

    %% Ralph Loop: Task Worker flow
    redis -->|"pull task"| worker
    worker -->|"generate code"| llama
    worker -->|"test code"| sandbox
    worker -->|"result + training"| redis

    %% Monitoring
    redis -->|"metrics"| dash

    %% Learning pipeline
    redis -.->|"training data"| trainer
    trainer -->|"fine-tune"| lora
    lora -.->|"load LoRA"| llama
    classDef client fill:#37474F,stroke:#263238,color:#fff
    classDef gateway fill:#607D8B,stroke:#455A64,color:#fff
    classDef core fill:#2196F3,stroke:#1565C0,color:#fff
    classDef gpu fill:#4CAF50,stroke:#2E7D32,color:#fff
    classDef storage fill:#00BCD4,stroke:#00838F,color:#fff
    classDef process fill:#FF9800,stroke:#E65100,color:#fff
    classDef learn fill:#9C27B0,stroke:#6A1B9A,color:#fff
    class client client
    class proxy,portal gateway
    class rag core
    class llama,embed gpu
    class qdrant,redis storage
    class worker,sandbox,dash process
    class trainer,lora learn
```
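
The task path through Redis in the diagram can be read as a plain queue handoff. Below is a minimal sketch with redis-py; the queue and key names ("tasks", "results:<id>") are assumptions made for illustration, not ATLAS's actual keys.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def submit_task(prompt: str, task_id: str) -> dict | None:
    """RAG API side: push a task, then block-poll for its result."""
    r.lpush("tasks", json.dumps({"id": task_id, "prompt": prompt}))
    hit = r.blpop(f"results:{task_id}", timeout=300)  # hypothetical key scheme
    return json.loads(hit[1]) if hit else None

def worker_step() -> None:
    """Task Worker side: pull the next task, process it, publish the result."""
    _, raw = r.brpop("tasks")
    job = json.loads(raw)
    # code = ralph_loop(job, ...)  # see Key Algorithms below
    r.lpush(f"results:{job['id']}", json.dumps({"status": "ok"}))
```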

Component Details

| Layer | Service | Port | Purpose |
|------------|--------------|------|----------------------------|
| Gateway | LLM Proxy | 8000 | Auth, rate limiting |
| Gateway | API Portal | 3000 | Users, API keys, usage |
| Core | RAG API | 8001 | Orchestration, chunking |
| Core | llama-server | 8000 | GPU inference (Qwen3-14B) |
| Core | Embeddings | 8080 | Vectorization (384 dims) |
| Storage | Qdrant | 6333 | Vector DB (HNSW) |
| Storage | Redis | 6379 | Queues, metrics, cache |
| Processing | Task Worker | | Ralph Loop engine |
| Processing | Sandbox | 8020 | Isolated execution |
| Processing | Dashboard | 3001 | Monitoring UI |
| Learning | Trainer | | Nightly LoRA (2am) |
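
To make the retrieval path concrete, here is a minimal sketch of the embed-and-search step, assuming sentence-transformers for the MiniLM-L6-v2 model and qdrant-client for search; the collection name and payload fields are hypothetical.

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

# Embed the query with the same 384-dim MiniLM model the Embeddings
# service uses, then ask Qdrant for the nearest code chunks.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

hits = client.search(
    collection_name="code_chunks",  # hypothetical collection name
    query_vector=model.encode("parse a TOML config file").tolist(),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload)
```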

Quick Start

```bash
git clone https://github.com/itigges22/atlas.git && cd atlas
cp atlas.conf.example atlas.conf && ./scripts/install.sh
kubectl get pods  # verify that all services are running
```

Requirements: K3s, NVIDIA GPU (8GB+ VRAM), 4+ vCPU, 12GB+ RAM, 50GB+ SSD


Recommended Client

ATLAS exposes an OpenAI-compatible API, so it works with any client that supports the OpenAI protocol.
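
For example, the standard openai Python client can point at the LLM Proxy directly; the base URL, API key, and model name below are illustrative placeholders.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # LLM Proxy (port from the table above)
    api_key="sk-atlas-example",           # key issued by the API Portal
)
resp = client.chat.completions.create(
    model="qwen3-14b",                    # illustrative model name
    messages=[{"role": "user", "content": "Refactor this function for clarity."}],
)
print(resp.choices[0].message.content)
```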

Recommended: OpenCode Fork, a terminal-based AI coding agent forked from OpenCode and optimized for ATLAS.

```bash
git clone https://github.com/itigges22/opencode.git && cd opencode
bun install
bun run dev
```

Alternatives: Cursor, Continue, aider, or any OpenAI-compatible client.


Key Algorithms

Ralph Loop — 99.5% Success via Test-Time Compute
P(success) = 1 - (1 - p)^k    →    with p = 0.65, k = 5: ≈99.5%

| Attempt | Temp | Strategy |
|---------|------|----------------------|
| 1 | 0.3 | Conservative |
| 2 | 0.4 | Minor variation |
| 3 | 0.5 | Moderate creativity |
| 4 | 0.6 | Explore alternatives |
| 5 | 0.7 | Maximum creativity |

Each retry feeds the accumulated error context back into the prompt, steering the next attempt away from previous failures; a sketch of the loop follows.
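
A minimal Python sketch of this loop, where `generate` and `run_tests` are hypothetical callables standing in for llama-server and the Sandbox; the temperature ladder and the probability arithmetic match the table above.

```python
TEMPERATURES = [0.3, 0.4, 0.5, 0.6, 0.7]

# Sanity check on the headline math: 1 - (1 - 0.65)**5 ≈ 0.9947 → ~99.5%.
assert round(1 - (1 - 0.65) ** 5, 3) == 0.995

def ralph_loop(task, generate, run_tests):
    """Escalate temperature per attempt, feeding failures back in.

    generate() and run_tests() are placeholders for llama-server and the
    Sandbox service; they are not ATLAS's real interfaces.
    """
    errors = []
    for temperature in TEMPERATURES:
        code = generate(task, temperature=temperature, error_context=errors)
        result = run_tests(code)
        if result["passed"]:
            return code
        errors.append(result["failure_summary"])  # guide the next attempt
    raise RuntimeError("all 5 attempts failed")
```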

Continuous Learning — Nightly LoRA Fine-tuning
  1. Export — Successful completions (rating ≥4) from Redis
  2. Train — LoRA (r=8, α=16) on CPU
  3. Validate — 66% pass rate required
  4. Deploy — Hot-swap via symlink
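
For flavor, a hedged sketch of the training step with Hugging Face peft: r=8 and α=16 come from the list above, while the base-model path, target modules, and dataset wiring are assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model path is illustrative; ATLAS serves Qwen3-14B via llama-server.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B")
config = LoraConfig(
    r=8,                                  # rank, as quoted above
    lora_alpha=16,                        # alpha, as quoted above
    target_modules=["q_proj", "v_proj"],  # assumed; the actual set may differ
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
# After training on the exported completions and passing validation (>=66%),
# the new adapter directory replaces the "current" symlink so llama-server
# can hot-swap it without a restart.
```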

Benchmarks

Coming soon — Consumer vs enterprise hardware comparisons.


Documentation

  • Architecture: system design, data flows, algorithms
  • Configuration: all options explained
  • Setup: installation guide
  • Troubleshooting: common issues

Contributing

See CONTRIBUTING.md for guidelines.


License

Apache 2.0 (see LICENSE). Copyright 2025 Isaac Tigges.
