Most "AI desktop" projects fall into one of two camps:
- An Electron app that takes over your real laptop (UI-TARS-desktop). The model clicks your keyboard, the operator hopes nothing else is open, and there's no fleet story.
- A cloud VM you rent (Anthropic Cowork, ByteDance Remote, Cua, Bytebot-now-archived). Polished UX, surprise bills, vendor sees every pixel, can't put it on your VPS.
Atelier OS is a third thing: a docker compose up that gives every
teammate their own persistent Linux desktop, themed for AI use,
driven by an open-weights vision model running on your hardware via a
small FastAPI fleet that screenshots → thinks → clicks for as long as
the task takes. The AI's files survive restarts. The fleet API lets one
chat dispatch parallel goals across N employees' desktops. Every action
is in an append-only audit log on your disk.
MIT. Built atop standard Linux pieces — Sway (Wayland headless),
grim / wtype / wlrctl, Holo3-35B-A3B (#1 open computer-use model
on OSWorld-Verified at 77.8%) or Anthropic Computer Use API. The whole
orchestrator is ~1,300 LOC of Python + ~250 LOC of configs.
Captured via GET /sessions/{id}/screenshot after dispatching the
sequence below through the action-daemon — proving every verb
(key combo, type, screenshot) end-to-end:
# 1. Sway opens foot (terminal) on Mod+Return
$ echo '{"action":"key","text":"super+Return"}' | docker exec -i ${CN} atelier-action-cli
# 2. Type a comment
$ echo '{"action":"type","text":"# Atelier OS v0.1.5 — live desktop"}' | docker exec -i ${CN} atelier-action-cli
$ echo '{"action":"key","text":"Return"}' | docker exec -i ${CN} atelier-action-cli
# 3. Screenshot
$ curl -o shot.png http://fleet:8090/sessions/${SID}/screenshot
$ file shot.png
shot.png: PNG image data, 1280 x 720, 8-bit/color RGB, non-interlacedThe image above is the byte-identical output of that last curl.
./setup.shfrom a fresh clone → fleet API healthy on:8090in ~2 minPOST /sessions→ spawns a per-teammate desktop container (image 0.1.5)- Desktop boots: Sway → action-daemon → wayvnc → websockify + noVNC on
:8443 GET /sessions/{id}/embed→ 307 to a live, iframe-able noVNC stream (Wayland-native VNC over WebSocket, ~500 ms latency, no plugin)GET /sessions/{id}/screenshot→ real 1280×720 PNGPOST /sessions/{id}/task→ dispatches a goal, SSE step stream; per-session mutex — a second task on the same session returns 409- 12 action verbs work end-to-end through the in-container daemon: screenshot, mouse_move (absolute), left/right/double click, type (incl. unicode), key (Return, Escape, ctrl+a, alt+Tab, …), scroll up+down, wait
GET /sessions/{id}/audit→ every action + every cost increment, JSONL- Per-session
/home/operatorsurvives restart; registry survives fleet restart - Daily-cost cap (
MAX_USD_PER_DAY), enforced atomically across concurrent submits - 451/451 tests pass: 306 unit (sub-3s, no docker) + 145 integration (against live containers — every endpoint, every verb, every error path, multi-session stress, audit integrity, performance benchmarks p50/p95/p99, noVNC websockify handshake, documentation accuracy, snapshot lifecycle, Bearer-auth gating, Holo3 compose-profile structure, install-smoke harness)
- Operator install verifier:
python3 scripts/launch-smoke.pyafterdocker compose up -dwalks spawn → screenshot → snapshot → clone → teardown end-to-end. Stdlib-only, exits 0/1, slots into CI.
See TRACKING.md for the full shipping log — every known limitation has a reproducer + acceptance + GitHub issue link. v0.1.6 closed all 8 v0.2 deliverables (S1-S8). Status at a glance:
- ✅ Fleet-API auth (S1, PR #13) — opt-in
ATELIER_API_TOKEN, constant- time comparison, anti-leak header-only design - ✅ TLS in compose (S2, PR #14) — opt-in
with-tlsprofile, Caddy auto-Let's-Encrypt, SSE-flush + X-Forwarded-* preserved - ✅ Per-session CPU/memory quotas (S3, PR #11) —
--memory 4g --cpus 2.0defaults via env - ✅ Snapshot + clone session-state API (S4, PR #15) —
POST /sessions/{id}/snapshot+POST /sessions {snapshot_id: ...}clones source's image + home tar into a new session. Path-traversal safe. - ✅ Live budget visible in the desktop (S5, PR #12) — waybar shows
$0.42 / $2.00; falls back to last-known with(stale)on fleet hiccup - ✅ Bundled vLLM Holo3 sidecar (S6, PR #16) —
--profile holo3 upspawns vLLM with safe 24 GB GPU defaults; weights cached in named volume - ✅ Action-daemon protocol versioning (S7, PR #11) —
{"v": 1, ...}on every wire-level request - ✅ Boot-wait race (S8, PR #10) —
?wait=readyquery param blocks until the desktop healthcheck is green - ⏭ Real WebRTC (vs WebSocket VNC) — out of scope for atelier-os; future separate project (the v0.1.6 wayvnc + websockify + noVNC stack is iframe-embeddable today, ~500 ms latency)
"In the market of Iron you need to sell Gold." Atelier OS is the Gold: every piece is standard upstream Linux but the curation, theming, streaming, fleet, audit, and chat-embed are assembled into one experience nobody else ships.
git clone https://github.com/karany97/atelier-os.git
cd atelier-os
./setup.sh # one prompt for a session password, then ~2 minThat's the whole install. The script copies .env.example → .env,
prompts for a session password (or generates one), checks FLEET_PORT
isn't taken (common collision: multica also wants 8090), builds both
images (docker compose build fleet desktop-template), starts the
fleet, and waits for /health.
# stdlib-only, no pip install. Walks /health → /budget → spawn session
# (with ?wait=ready) → screenshot → snapshot → clone-from-snapshot →
# teardown. Reports PASS/FAIL per check; exits 0 if all green.
python3 scripts/launch-smoke.pyOperators behind bearer auth pass the token:
FLEET_URL=http://localhost:8090 \
ATELIER_API_TOKEN=$(grep ATELIER_API_TOKEN .env | cut -d= -f2) \
python3 scripts/launch-smoke.pyManual override paths:
# Per-session embed (iframe-able noVNC live stream)
$ curl -X POST http://localhost:8090/sessions \
-H 'content-type: application/json' \
-d '{"label":"alice"}'
{"id":"sess_...","embed_url":"http://localhost:8090/sessions/sess_.../embed", ...}
$ curl -I http://localhost:8090/sessions/sess_.../embed
HTTP/1.1 307 Temporary Redirect
Location: http://localhost:7100/ # → noVNC client served on the session's port
# Per-session screenshot (PNG bytes through the action-daemon)
$ curl -o shot.png http://localhost:8090/sessions/sess_.../screenshot
$ file shot.png
shot.png: PNG image data, 1280 x 720, 8-bit/color RGB
# Dispatch a goal (returns 202; tail the SSE stream for live steps)
$ curl -X POST http://localhost:8090/sessions/sess_.../task \
-H 'content-type: application/json' \
-d '{"goal":"open epiphany and search nandai jewellery"}'
{"task_id":"task_...","status":"running","stream_url":".../stream",...}To spin up a second employee's desktop:
curl -X POST http://localhost:8090/sessions \
-H 'content-type: application/json' \
-d '{"label":"alice","theme":"terracotta"}'
# → 201 + {"session_id":"...", "embed_url":"...", "vnc_url":"..."}To embed in any chat / dashboard:
<iframe src="https://atelier-os.example.com/sessions/abc123/embed"
allow="clipboard-read; clipboard-write; fullscreen"
sandbox="allow-scripts allow-same-origin allow-forms allow-popups"
referrerpolicy="no-referrer"></iframe>The default docker compose up runs the fleet on plaintext HTTP at
$FLEET_PORT (assume private network). To put Caddy in front for TLS
termination + automatic Let's Encrypt cert:
# 1. Copy + customize the Caddyfile template (or just set the env vars below)
cp compose/Caddyfile.example compose/Caddyfile
# 2. Tell Caddy your domain + Let's Encrypt contact email
cat >> .env <<'EOF'
ATELIER_TLS_DOMAIN=atelier-os.example.com
ATELIER_TLS_EMAIL=ops@example.com
EOF
# 3. Bring up the with-tls profile (caddy alongside the fleet)
docker compose --profile with-tls up -dCaddy auto-provisions a cert via HTTP-01 challenge on first request, so
:80 + :443 must be reachable from the public internet and your
domain's A/AAAA record must point at this host. ACME state persists in a
named volume across docker compose down.
Caveat — per-session noVNC ports. The /sessions/{id}/embed
endpoint redirects to http://host:7XXX/ (the per-session port the
desktop container exposes). Caddy in this template does NOT TLS-
terminate those ports — they stay plain HTTP. If your iframe loads from
an HTTPS atelier, browsers will block the HTTP redirect target
(mixed-content). For v0.1.6 most operators keep those per-session ports
on a private network and embed from a same-network atelier instance.
v0.2 will ship a handle_path /v/{port}/* Caddy pattern that
dynamically proxies the per-session ports through the same TLS endpoint.
The default MODEL_BACKEND=anthropic calls Anthropic Computer Use (real
$0.05–$0.40/task). For fully-local, free, open-weights vision the
holo3 profile spins up a vLLM sidecar serving Holo3-35B-A3B:
# 1. NVIDIA Container Toolkit installed on the host (24 GB GPU minimum)
# 2. Switch the fleet to local Holo3 in .env
sed -i 's/^MODEL_BACKEND=anthropic/MODEL_BACKEND=holo3/' .env
# 3. Bring up the fleet + the Holo3 sidecar
docker compose --profile holo3 up -dFirst boot downloads ~70 GB of Holo3 weights from HuggingFace (cached
to the holo3-models named volume; survives docker compose down).
Steady-state boot: ~30 s. The fleet auto-discovers
HOLO3_ENDPOINT=http://holo3:8000/v1 over the compose network.
Holo3-35B-A3B is a MoE with ~3 B active params — single-card inference on RTX 3090 / 4090 / A6000 / A100. OSWorld-Verified 77.8% (the #1 OSS computer-use model, beats Anthropic Opus 4.6 at 1/10 the cost).
The fleet API is open by default (assume private network). To require
Authorization: Bearer <token> on every state-touching endpoint:
echo "ATELIER_API_TOKEN=$(openssl rand -hex 32)" >> .env
docker compose restart fleet/health, /sessions/{id}/embed, and /budget stay open by design
(load-bearing for healthchecks, iframe navigation, and the in-container
budget poller respectively — see comments in driver/src/main.py).
Constant-time comparison via hmac.compare_digest. Query-param tokens
are rejected (anti-leak).
┌──────────────────────────┐
│ Destiny Atelier (chat) │ ──── any chat surface, really
└──────────┬───────────────┘
│ iframe / SSE
▼
┌──────────────────────────────────────────────────────────┐
│ Atelier OS fleet API (FastAPI, this repo) │
│ · POST /sessions create new desktop │
│ · GET /sessions/{id}/embed iframe-able WebRTC URL │
│ · POST /sessions/{id}/task dispatch a goal │
│ · GET /sessions/{id}/audit per-turn audit log │
│ · DELETE /sessions/{id} teardown + persist home │
└──────────┬───────────────────────────────────────────────┘
│ docker exec
▼
┌──────────────────────────────────────────────────────────┐
│ Per-employee desktop container (one each) │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Sway (Wayland kiosk compositor) │ │
│ │ + waybar status bar (Atelier branding) │ │
│ │ + Firefox + xterm + Files + your apps │ │
│ │ + GTK4/Qt6 Atelier theme (terracotta default) │ │
│ └────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ wayvnc + websockify + noVNC (HTTP→WS→VNC on :8443)│ │
│ │ · Wayland-native VNC server (sway's sibling project)│
│ │ · ~500 ms latency over WebSocket, no plugin │ │
│ │ · iframe-embeddable, fullscreen-capable │ │
│ └────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Action daemon (grim + wtype + ydotool) │ │
│ │ · screenshot via grim (Wayland-native, lossless) │ │
│ │ · type via wtype, key via ydotool │ │
│ │ · ~1 ms per action (Unix socket, not docker exec)│ │
│ └────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
│
▼ (model calls)
┌────────────────────────┐ ┌────────────────────────┐
│ Holo3-35B-A3B local │ OR │ Anthropic Computer │
│ (Apache 2.0, vLLM) │ │ Use API (paid escape) │
│ OSWorld-Verified 77.8%│ │ Sonnet 4.5 default │
└────────────────────────┘ └────────────────────────┘
| UI-TARS-desktop | Bytebot (archived) | Anthropic Cowork | Atelier OS | |
|---|---|---|---|---|
| Form factor | Electron app | Docker (dead since Mar 2026) | macOS/Windows app | docker compose up |
| Per-employee fleet | ❌ | ❌ | ❌ | ✅ N sessions |
| Iframe-embeddable | ❌ | ❌ | ❌ | ✅ noVNC/wayvnc |
| Persistent home | ❌ (releases on navigate) | ✅ | ❌ | ✅ |
| Open weights | ✅ (UI-TARS 1.5-7B) | ✅ | ❌ | ✅ (Holo3 default) |
| Wayland-native screen capture | ❌ (X11) | ❌ (X11) | n/a | ✅ (grim) |
| Live video stream | ✅ (WebRTC) | ✅ (VNC) | n/a | ✅ (wayvnc, WS) |
| Screenshot-on-demand via API | ❌ | ❌ | n/a | ✅ (proven, PNG) |
| Per-session task mutex | n/a | n/a | n/a | ✅ |
| Test count | unknown | unknown | proprietary | 451 unit+integration |
| Audit log per turn | ❌ | ❌ | ❌ | ✅ JSONL |
| Self-host floor | local app on your laptop | Docker on Linux | macOS only | Docker on any Linux |
| License | Apache 2.0 | Apache 2.0 | proprietary | MIT |
| Cost per task | model-only | model-only | $0.20-$2.00 | model-only ($0 with local Holo3, $0.05-$0.40 with Sonnet 4.5) |
Holo3-35B-A3B (HCompany, 2026) is currently the #1 open-source computer-use model on the OSWorld- Verified benchmark at 77.8% — ahead of OpenCUA-72B (45%) and Simular Agent-S2 (34.5%), and even beats Anthropic Opus 4.6 on the same benchmark at roughly 1/10 the per-token cost.
It's a 35B-parameter MoE with 3B active params — fits a single RTX 3090 at 40-60 t/s via vLLM. Apache 2.0 weights. We ship a vLLM systemd unit that boots it on demand.
Operators who don't have a GPU can point the driver at the Anthropic
Computer Use API instead (MODEL_BACKEND=anthropic). Either way, the
desktop, the fleet, the audit, the embed — all stay the same.
- ✅ Repo layout (MIT, README, threat model, architecture doc, CHANGELOG)
- ✅
compose/docker-compose.yml— fleet API + N session containers, opt-inwith-tls(Caddy) +holo3(vLLM sidecar) profiles - ✅
desktop/Dockerfile— Ubuntu 24.04 + Sway + Atelier theme, buildsatelier-os/desktop:0.1.5cleanly - ✅
desktop/sway-config+desktop/waybar-config.json— keyboard kiosk + live-budget status bar - ✅
desktop/atelier-budget-poller.py— in-container fleet/budgetpoller, atomic write to/tmp/atelier-budget - ✅
driver/src/main.py— FastAPI fleet API (sessions, embed, task, audit, budget, snapshot lifecycle, optional Bearer-token gating) - ✅
driver/src/session.py— per-session container lifecycle, quotas,snapshot_id-driven clone path - ✅
driver/src/snapshot.py— multi-session snapshot module (image + home tar, path-traversal-safe extract, most-recent delete safeguard) - ✅
driver/src/action.py— Wayland-native action dispatch - ✅
driver/src/model.py— Holo3 + Anthropic backends, swappable - ✅
driver/src/audit.py— append-only JSONL audit log - ✅
tests/— 451 tests (306 unit + 145 integration), all green - ✅
scripts/launch-smoke.py— operator install verifier (spawn → snapshot → clone → teardown, exits 0/1) - ✅
docs/threat-model.md+docs/architecture.md - ✅
TRACKING.md— every known limitation has a closed PR + writeup - ⏳
.github/workflows/ci.yml— staged; awaits one-time OAuth scope refresh on the maintainer side (issue #9)
- Hot-swap themes per-session (each employee picks their palette)
- File-drop from chat to session home
- Real WebRTC streaming pipeline (vs current wayvnc + WebSocket) for the per-session embed
- Multi-cluster: federate sessions across N atelier-os hosts
- One-USB-boot ISO ("Atelier OS Live") for kiosk deployments
| Repo | What |
|---|---|
| karany97/nandai-atelier | The chat surface that embeds Atelier OS sessions |
| karany97/destiny-computer | The v0.1 single-desktop pattern Atelier OS evolved from |
| karany97/atelier-os | This repo |
MIT. See LICENSE.
If you find a vulnerability, please email
security@destiny.computer instead of opening a public issue.
PGP key in repo root. 72-hour acknowledgement, 14-day public disclosure.
