A handful of companies control the vast majority of global AI traffic — and with it, your data, your costs, and your uptime. Every query you send to a centralized provider is business intelligence you don’t own, running on infrastructure you don’t control, priced on terms you can’t negotiate.
If AI is becoming critical infrastructure, it shouldn’t be rented. Self-hosting local AI should be a sovereign human right, not a career choice.
Dream Server is the exit. A fully local AI stack — LLM inference, chat, voice, agents, workflows, RAG, image generation, and privacy tools — deployed on your hardware with a single command. No cloud. No subscriptions. No one watching.
New here? Read the Friendly Guide or listen to the audio version — a complete walkthrough of what Dream Server is, how it works, and how to make it your own. No technical background needed.
Platform Support — March 2026
| Platform | Status |
|---|---|
| Linux (NVIDIA + AMD) | Supported — install and run today |
| Windows (NVIDIA + AMD) | Supported — install and run today |
| macOS (Apple Silicon) | Supported — install and run today |

Tested Linux distros: Ubuntu 24.04/22.04, Debian 12, Fedora 41+, Arch Linux, CachyOS, openSUSE Tumbleweed. Other distros using apt, dnf, pacman, or zypper should also work — open an issue if yours doesn't.
Windows: Requires Docker Desktop with WSL2 backend. NVIDIA GPUs use Docker GPU passthrough; AMD Strix Halo runs llama-server natively with Vulkan.
macOS: Requires Apple Silicon (M1+) and Docker Desktop. llama-server runs natively with Metal GPU acceleration; all other services run in Docker.
See the Support Matrix for details.
Because running your own AI shouldn't require a CS degree and a weekend of debugging CUDA drivers. Right now, setting up local AI means stitching together a dozen projects, writing Docker configs from scratch, and praying everything talks to each other. Most people give up and go back to paying OpenAI.
We built Dream Server so you don't have to.
- One command — detects your GPU, picks the right model, generates credentials, launches everything
- Chatting in under 2 minutes — bootstrap mode gives you a working model instantly while your full model downloads in the background
- 13 services, pre-wired — chat, agents, voice, workflows, search, RAG, image generation, privacy tools. All talking to each other out of the box
- Fully moddable — every service is an extension. Drop in a folder, run `dream enable`, done

```bash
curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/dream-server/get-dream-server.sh | bash
```

Open http://localhost:3000 and start chatting.
No GPU? Dream Server also runs in cloud mode — same full stack, powered by OpenAI/Anthropic/Together APIs instead of local inference:
```bash
./install.sh --cloud
```
Port conflicts? Every port is configurable via environment variables. See `.env.example` for the full list, or override at install time:

```bash
WEBUI_PORT=9090 ./install.sh
```
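Before installing, it can help to confirm that a default port is actually free. A minimal sketch (Linux `ss` shown; `port_busy` is an illustrative helper, not part of Dream Server; 3000 is the Open WebUI default used above):

```shell
# Check whether a TCP port is already bound (Linux; use lsof -i on macOS)
port_busy() { ss -ltn 2>/dev/null | grep -q ":$1 "; }

if port_busy 3000; then
  echo "port 3000 busy — override with: WEBUI_PORT=9090 ./install.sh"
else
  echo "port 3000 free — the default install will work"
fi
```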
Manual install (Linux)
```bash
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh
```

Windows (PowerShell)
Requires Docker Desktop with WSL2 backend enabled. Install Docker Desktop first and make sure it is running before you start.
```powershell
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer
.\install.ps1
```

The installer detects your GPU, picks the right model, generates credentials, starts all services, and creates a Desktop shortcut to the Dashboard. Manage with `.\dream-server\installers\windows\dream.ps1 status`.
macOS (Apple Silicon)
Requires Apple Silicon (M1+) and Docker Desktop. Install Docker Desktop first and make sure it is running before you start.
```bash
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh
```

The installer detects your chip, picks the right model for your unified memory, launches llama-server natively with Metal acceleration, and starts all other services in Docker. Manage with `./dream-macos.sh status`.
See the macOS Quickstart for details.
- Open WebUI — full-featured chat interface with conversation history, web search, document upload, and 30+ languages
- llama-server — high-performance LLM inference with continuous batching, auto-selected for your GPU
- LiteLLM — API gateway supporting local/cloud/hybrid modes
- Whisper — speech-to-text
- Kokoro — text-to-speech
- OpenClaw — autonomous AI agent framework
- n8n — workflow automation with 400+ integrations (Slack, email, databases, APIs)
- Qdrant — vector database for retrieval-augmented generation (RAG)
- SearXNG — self-hosted web search (no tracking)
- Perplexica — deep research engine
- ComfyUI — node-based image generation
- Privacy Shield — PII scrubbing proxy for API calls
- Dashboard — real-time GPU metrics, service health, model management
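To make the Privacy Shield idea concrete, here is a toy sketch of PII scrubbing — redacting email addresses from text before it leaves the machine. The real proxy is far more thorough; `scrub` and its regex are illustrative only:

```shell
# Redact anything shaped like an email address (toy pattern, not exhaustive)
scrub() { sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[EMAIL]/g'; }

echo "contact alice@example.com for access" | scrub
# → contact [EMAIL] for access
```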
The installer detects your GPU and picks the optimal model automatically. No manual configuration.
| VRAM | Model | Example GPUs |
|---|---|---|
| < 8 GB | Qwen3.5 2B (Q4_K_M) | Any GPU or CPU-only |
| 8–11 GB | Qwen3 8B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
| 12–20 GB | Qwen3 8B (Q4_K_M) | RTX 3090, RTX 4080 |
| 20–40 GB | Qwen3 14B (Q4_K_M) | RTX 4090, A6000 |
| 40+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | A100, multi-GPU |
| 90+ GB | Qwen3 Coder Next (80B MoE, Q4_K_M) | Multi-GPU A100/H100 |
| Unified RAM | Model | Hardware |
|---|---|---|
| 64–89 GB | Qwen3 30B-A3B (30B MoE) | Ryzen AI MAX+ 395 (64GB) |
| 90+ GB | Qwen3 Coder Next (80B MoE) | Ryzen AI MAX+ 395 (96GB) |
| Unified RAM | Model | Example Hardware |
|---|---|---|
| < 16 GB | Qwen3.5 2B (Q4_K_M) | M1/M2 base (8GB) |
| 16–24 GB | Qwen3 4B (Q4_K_M) | M4 Mac Mini (16GB) |
| 32 GB | Qwen3 8B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 48 GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M4 Pro (48GB), M2 Max (48GB) |
| 64+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M2 Ultra Mac Studio, M4 Max (64GB+) |
Override tier selection: `./install.sh --tier 3`
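The VRAM tiers above amount to a simple threshold ladder. A sketch under assumptions — the function name and exact boundary handling (e.g. the 11–12 GB gap) are illustrative, not the installer's actual code:

```shell
# Map detected VRAM (GB) to a model tier, mirroring the NVIDIA table above
pick_model() {
  vram=$1
  if   [ "$vram" -lt 8 ];  then echo "Qwen3.5 2B"
  elif [ "$vram" -lt 20 ]; then echo "Qwen3 8B"        # covers both 8–11 and 12–20 GB rows
  elif [ "$vram" -lt 40 ]; then echo "Qwen3 14B"
  elif [ "$vram" -lt 90 ]; then echo "Qwen3 30B-A3B"
  else                          echo "Qwen3 Coder Next"
  fi
}

pick_model 24   # → Qwen3 14B
```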
No waiting for large downloads. Dream Server uses bootstrap mode by default:
- Downloads a tiny 1.5B model in under a minute
- You start chatting immediately
- The full model downloads in the background
- Hot-swap to the full model when it's ready — zero downtime
The installer pulls all services in parallel. Downloads are resume-capable — interrupted downloads pick up where they left off.
Skip bootstrap: `./install.sh --no-bootstrap`
The installer picks a model for your hardware, but you can switch anytime:
```bash
dream model current   # What's running now?
dream model list      # Show all available tiers
dream model swap T3   # Switch to a different tier
```

If the new model isn't downloaded yet, pre-fetch it first:

```bash
./scripts/pre-download.sh --tier 3   # Download before switching
dream model swap T3                  # Then swap (restarts llama-server)
```

Already have a GGUF you want to use? Drop it in `data/models/`, update `GGUF_FILE` and `LLM_MODEL` in `.env`, and restart:

```bash
docker compose restart llama-server
```

Rollback is automatic — if a new model fails to load, Dream Server reverts to your previous model.
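The custom-GGUF `.env` edit can be sketched as a couple of `sed` lines. This runs against a scratch file so it is safe to try as-is; for a real install, point it at your stack's `.env` and follow with `docker compose restart llama-server`. Filenames are illustrative; GNU `sed -i` shown — BSD/macOS needs `sed -i ''`:

```shell
# Demonstrate the .env rewrite on a throwaway copy
env_file=$(mktemp)
printf 'GGUF_FILE=old.gguf\nLLM_MODEL=old\n' > "$env_file"

MODEL="my-model.Q4_K_M.gguf"   # illustrative filename
sed -i "s|^GGUF_FILE=.*|GGUF_FILE=$MODEL|" "$env_file"
sed -i "s|^LLM_MODEL=.*|LLM_MODEL=${MODEL%.gguf}|" "$env_file"

cat "$env_file"
```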
Dream Server is designed to be modded. Every service is an extension — a folder with a manifest.yaml and a compose.yaml. The dashboard, CLI, health checks, and compose stack all discover extensions automatically.
```
extensions/services/
  my-service/
    manifest.yaml   # Metadata: name, port, health endpoint, GPU backends
    compose.yaml    # Docker Compose fragment (auto-merged into the stack)
```

```bash
dream enable my-service    # Enable it
dream disable my-service   # Disable it
dream list                 # See everything
```

The installer itself is modular — 6 libraries and 13 phases, each in its own file. Want to add a hardware tier, swap a default model, or skip a phase? Edit one file.
Full extension guide | Installer architecture
The `dream` CLI manages your entire stack:

```bash
dream status              # Health checks + GPU status
dream list                # All services and their state
dream logs llm            # Tail logs (aliases: llm, stt, tts)
dream restart [service]   # Restart one or all services
dream start / stop        # Start or stop the stack
dream mode cloud          # Switch to cloud APIs via LiteLLM
dream mode local          # Switch back to local inference
dream mode hybrid         # Local primary, cloud fallback
dream model swap T3       # Switch to a different hardware tier
dream enable n8n          # Enable an extension
dream disable whisper     # Disable one
dream config show         # View .env (secrets masked)
dream preset save gaming  # Snapshot current config
dream preset load gaming  # Restore it
```

Other tools get you part of the way. Dream Server gets you the whole way.
| | Dream Server | Ollama + Open WebUI | LocalAI |
|---|---|---|---|
| Scope | Full AI stack — inference to agents to workflows | LLM + chat | LLM only |
| One-command install | Everything, auto-configured | LLM + chat only | LLM only |
| Hardware auto-detect + model selection | NVIDIA + AMD Strix Halo | No | No |
| AMD APU unified memory support | ROCm + llama-server | Partial (Vulkan) | No |
| Autonomous AI agents | OpenClaw | No | No |
| Workflow automation | n8n (400+ integrations) | No | No |
| Voice (STT + TTS) | Whisper + Kokoro | No | No |
| Image generation | ComfyUI | No | No |
| RAG pipeline | Qdrant + embeddings | No | No |
| Extension system | Manifest-based, hot-pluggable | No | No |
| Multi-GPU | Yes (NVIDIA) | Partial | Partial |
| Doc | What's inside |
|---|---|
| Quickstart | Step-by-step install guide with troubleshooting |
| Hardware Guide | What to buy, tier recommendations |
| FAQ | Common questions and configuration |
| Extensions | How to add custom services |
| Installer Architecture | Modular installer deep dive |
| Changelog | Version history and release notes |
| Contributing | How to contribute |
Dream Server exists because people chose to build instead of wait. Every contributor here is part of something bigger than code — a growing resistance against the idea that AI should be rented, gated, and controlled by the few. These are the founders of the sovereign AI movement, proving that one person, one machine, and one dream is enough.
Thanks to kyuz0 for amd-strix-halo-toolboxes — pre-built ROCm containers for Strix Halo that spared us the pain of building our own. And to lhl for strix-halo-testing — the foundational Strix Halo AI research and rocWMMA performance work that the broader community builds on.
- llama.cpp (ggerganov) — LLM inference engine
- Qwen (Alibaba Cloud) — Default language models
- Open WebUI — Chat interface
- ComfyUI — Image generation engine
- FLUX.1 (Black Forest Labs) — Image generation model
- AMD ROCm — GPU compute platform
- AMD Strix Halo Toolboxes (kyuz0) — Pre-built ROCm containers for AMD inference
- Strix Halo Testing (lhl) — Foundational Strix Halo AI research and rocWMMA optimizations
- n8n — Workflow automation
- Qdrant — Vector database
- SearXNG — Privacy-respecting search
- Perplexica — AI-powered search
- LiteLLM — LLM API gateway
- Kokoro FastAPI (remsky) — Text-to-speech
- Speaches — Speech-to-text
- Strix Halo Home Lab — Community knowledge base
- Yasin Bursali (yasinBursali) — Fixed CI workflow discovery, added dashboard-api router test coverage with security-focused tests (auth enforcement, path traversal protection), documented all 14 undocumented extension services, fixed macOS disk space preflight to check the correct volume for external drive installs, moved embeddings platform override to prevent orphaned service errors when RAG is disabled, fixed macOS portability issues restoring broken Apple Silicon Neural Engine detection (GNU date/grep to POSIX), fixed docker compose failure diagnostic unreachable under pipefail, added stderr warning on manifest parse failure in compose resolver, fixed socket FD leak in dashboard-api, added open-webui health gate to prevent 502 errors during model warmup, hardened ComfyUI with loopback binding and no-new-privileges on both NVIDIA and AMD, fixed Apple Silicon memory limit variable mismatch, added `set -euo pipefail` to the installer catching silent failures, secured OpenCode with loopback binding and auto-generated passwords, added missing `external_port_env` to token-spy and dashboard manifests fixing hardcoded port resolution, fixed Apple Silicon dashboard to show correct RAM and GPU info using the `HOST_RAM_GB` unified memory override, added VRAM gate fallback for Apple Silicon so features no longer incorrectly show insufficient_vram on unified memory machines, set `OLLAMA_PORT=8080` in the macOS compose overlay with `GPU_BACKEND=apple` alignment, added dynamic port conflict detection from extension manifests on macOS, added a cross-platform `_sed_i` helper for BSD/GNU sed compatibility, removed API key from token-spy HTML response replacing it with a sessionStorage-based login overlay, added WSL2 host RAM detection via powershell.exe for correct tier selection, fixed dashboard health checks to treat HTTP 4xx as unhealthy, replaced GNU-only `date +%s%N` with portable `_now_ms()` timestamps across 8 files, fixed COMPOSE_FLAGS word-splitting bugs by converting to arrays, added a macOS readiness sidecar for native llama-server before open-webui starts, added mode-aware compose overlays for litellm/openclaw/perplexica depends_on (local/hybrid only), fixed subprocess leak on client disconnect in setup.py, added Bash 4+ guard with Homebrew re-exec for macOS health checks replacing associative arrays with portable indexed arrays, added `.get()` defaults for optional manifest feature fields preventing KeyError on sparse manifests, added Langfuse LLM observability extension (foundation) shipping disabled by default with auto-generated secrets and telemetry suppression, wired LiteLLM to Langfuse with conditional callback activation, removed duplicate network definition in docker-compose.base.yml, fixed macOS llama-server DNS resolution for LiteLLM via extra_hosts, and surfaced manifest YAML parse errors in the dashboard-api status response with narrowed exception handling
- latentcollapse (Matt C) — Security audit and hardening: OpenClaw localhost binding fix, multi-GPU VRAM detection, AMD dashboard hardening, and the Agent Policy Engine (APE) extension
- Igor Lins e Silva (igorls) — Stability audit fixing 9 infrastructure bugs: dynamic compose discovery in backup/restore/update scripts, Token Spy persistent storage and connection pool hardening, dotglob rollback fix, systemd auto-resume service correction, removed auth gate from preflight ports endpoint for setup wizard compatibility, added ESLint flat config for the dashboard, cleaned up unused imports and linting across the Python codebase, and resolved CI failures across dashboard and smoke tests
- Nino Skopac (NinoSkopac) — Token Spy dashboard improvements: shared metric normalization with parity tests, budget and active session tracking, configurable secure CORS replacing wildcard origins, and DB backend compatibility shim for sidecar migration
- Glexy (fullstackdev0110) — Fixed dream-cli chat port initialization bug, hardened validate.sh environment variable handling with safer quoting and .env parsing, removed all `eval` usage from installer/preflight env parsing, added a safe-env loader (`lib/safe-env.sh`) to prevent shell injection, unified all .env loading across 9 scripts to use `load_env_file()`, eliminating duplicated parsers, added dream-cli status-json/config-validate/mode-summary commands, added extension manifest validation with versioned compatibility gating (dream_min/dream_max) for the v2 extension ecosystem, added comprehensive compatibility matrix documentation, added test suites with CI integration for manifest validation, health checks, env validation, and CPU-only path, made session-cleanup.sh portable across macOS/Linux (POSIX grep, stat, numfmt fallback), added --help flag to session-cleanup.sh, and fixed ShellCheck SC2086 warnings and SC2155 errors across health-check.sh, detect-hardware.sh, pre-download.sh, progress.sh, qrcode.sh, migrate-config.sh, llm-cold-storage.sh, session-manager.sh, and 07-devtools.sh
- bugman-007 — Parallelized health checks in dream status for 5–10× speedup using async gather with proper timeout handling, benchmark and test scripts, integrated backup/restore commands into dream-cli, added preset import/export with path traversal protection and archive validation, added preset diff command for comparing configurations with secret masking, quarantined broken edge quickstart instructions replacing them with supported cloud mode path, added SHA256 integrity manifests and verification for backups, added restore safety prompts requiring backup ID confirmation, added backup/restore round-trip integration test, added preset compatibility validation before load, added service registry tests to CI, added Python type checking with mypy, added disk space preflight checks to backup/restore with portable size estimation, added session-level caching to compose flags resolution for performance, expanded dashboard-api test coverage for privacy, updates, and workflow endpoints, added structured logging to agent monitor replacing silent exception swallowing, added bash completion for dream-cli with dynamic backup ID resolution, added automatic pre-update backup with rollback command and health verification, fixed gitleaks CI to use OSS CLI instead of paid license action, and replaced disabled VAD patch with AST-based Python patcher for safe Whisper voice activity detection
- norfrt6-lab — Replaced 12+ silent exception-swallowing patterns with specific exception types and proper logging, added cross-platform system metrics (macOS/Windows) for uptime, CPU, and RAM, plus Apple Silicon GPU detection via sysctl/vm_stat
- boffin-dmytro — Added SHA256 integrity verification for GGUF model downloads with pre- and post-download checks, corrupt file detection with automatic re-download, fixed model filename casing mismatches, added network timeout hardening across 33+ HTTP operations preventing indefinite hangs, added port conflict and Ollama detection for the Linux installer matching macOS parity, fixed trap handler bugs in installer phases replacing explicit tmpfile cleanup for safe early-exit, added retry logic with error classification and exponential backoff for Docker image pulls, added a GPU detection progress indicator eliminating user anxiety during hardware scans, added Windows zip integrity validation with retry logic, added Docker image pull retry with timeout and post-pull validation via `docker inspect`, added Intel Arc GPU detection and CPU-only default fallback replacing an incorrect NVIDIA assumption, added compose stack validation during phase 02 catching syntax errors early, added background process tracking for FLUX model downloads with a JSON-based task registry, improved health check robustness with per-request timeout and adaptive exponential backoff, added unified cross-platform path resolution utilities with POSIX-portable disk space checks, added markdown local link validation for CI, added download robustness with exponential-backoff retry to the macOS installer, added configurable health check timeouts to the manifest schema solving slow-start services, added SHA256 checksum verification to restore operations with graceful fallback for older backups, added service dependency validation before compose up preventing missing-service failures, added a comprehensive manifest schema validator, reduced installed footprint by excluding dev-only files via rsync, added strict error handling (`set -euo pipefail`) to operational scripts, and added rsync progress indicators to backup/restore operations
- takutakutakkun0420-hue — Added log rotation to all base services preventing unbounded disk growth, and added open-webui startup dependency on llama-server health ensuring the UI never shows a broken state
- reo0603 — Fixed Makefile paths after dashboard-api move and heredoc quoting bug in session-manager.sh SSH command, narrowed broad exception catches to specific types across dashboard-api, parallelized health checks for 17× faster execution, added compose.local.yaml for dashboard/open-webui/privacy-shield service dependencies, added .dockerignore files to all custom Dockerfiles reducing build context, fixed H2C smuggling vector in nginx proxy and added wss:// for HTTPS in voice agent, added comprehensive extension integration and hardware compatibility test suites, and hardened secret management with .gitignore patterns for key/pem/credential files and SQL identifier validation in token-spy
- nt1412 — Wired dashboard-api agent metrics to Token Spy with background metrics collection, added TOKEN_SPY_URL/TOKEN_SPY_API_KEY env vars, and fixed missing key_management.py in privacy-shield Dockerfile
- evereq — Relocated docs/images to resources/docs/images for cleaner monorepo root
- championVisionAI — Added Alpine Linux (apk) and Void Linux (xbps) package manager support to the installer abstraction layer, hardened hardware detection with JSON output escaping and container/WSL2 detection, rewrote healthcheck.py with retries, HEAD-to-GET fallback, status code matching, and structured JSON output, hardened the Docker phase with daemon start/retry logic and compose v1/v2 detection, added cross-platform python3/python command resolution with a shared detection utility fixing Windows resolution, hardened env schema validation with robust .env parsing, enum validation, and line-number error reporting, added a sim summary validation test suite with 10 test cases covering help, missing files, invalid JSON, and strict mode, added an extension audit workflow with an 838-line Python auditor and `dream audit` CLI command, added duplicate key detection to env validation that fails on duplicate keys preventing silent config corruption, and added compact JSON output mode and a --help flag to hardware detection
- Tony363 — Hardened service-registry.sh against shell injection in service IDs, narrowed silent exception catches in db.py to specific types, improved PII scrubber with Luhn check for credit card detection and deterministic token round-trip, fixed token-spy settings persistence with atomic writes replacing broken fcntl locking, fixed SSH command injection in session-manager.sh using stdin piping with shlex.quote, narrowed broad exception catches across dashboard-api to specific types with appropriate log levels, and added CLAUDE.md with project instructions and design philosophy
- buddy0323 — Ported Windows installer phases 01-07 to native PowerShell decomposing the monolithic script into focused phase files, added Intel Arc SYCL tier map (ARC/ARC_LITE) with docker-compose.intel.yml overlay, detection logic, tier-map tests, and SHA256 verification, added Intel Arc oneAPI SYCL compose overlay with two-stage llama-sycl Dockerfile, added Intel Arc detection checks (lspci, Level Zero runtime, render nodes, group membership), and authored the Intel Arc support matrix documentation and setup guide
- blackeagle273 — Enhanced macOS installer with idempotent .env and config generation preserving existing secrets across re-installs
- eva57gr — Fixed bash syntax error in Token Spy session-manager.sh SSH heredoc command, and unified port contract across installer, schema, compose, and manifests with canonical ports.json registry
- cycloarcane — Fixed unbound variable crash by guarding service-registry.sh sourcing in install-core.sh, health-check.sh, and 04-requirements.sh
- Rowan (rowanbelanger713) — Enhanced llama-server with configurable batch-size, threads, and parallel request knobs, added TTL caching and async threading to dashboard-api status endpoints, pooled httpx connections for LiteLLM, lazy-loaded React routes with memoized components, scoped CSS transitions to interactive elements, paused polling on hidden tabs, and split Vite output into vendor/icons chunks for faster loading
- onyxhat — Fixed missing variable initialization in installer scripts

If we missed anyone, open an issue. We want to get this right.
Apache 2.0 — Use it, modify it, ship it. See LICENSE.
Built by Light Heart Labs and the growing resistance that refuses to rent what should be owned.


