Skip to content

karany97/atelier-os

Repository files navigation

Atelier OS

A purpose-built Linux desktop for AI agents.
Sway compositor + open-weights vision model.
One container per teammate. Fleet API. Bug-free or it doesn't ship.

License: MIT Composes with: Destiny Atelier Composes with: Destiny Computer Stack: Sway + Holo3 Status: v0.1.6 production ready Tests: 451/451 Release: v0.1.6 CI


What this is

Most "AI desktop" projects fall into one of two camps:

  1. An Electron app that takes over your real laptop (UI-TARS-desktop). The model clicks your keyboard, the operator hopes nothing else is open, and there's no fleet story.
  2. A cloud VM you rent (Anthropic Cowork, ByteDance Remote, Cua, Bytebot-now-archived). Polished UX, surprise bills, vendor sees every pixel, can't put it on your VPS.

Atelier OS is a third thing: a docker compose up that gives every teammate their own persistent Linux desktop, themed for AI use, driven by an open-weights vision model running on your hardware via a small FastAPI fleet that screenshots → thinks → clicks for as long as the task takes. The AI's files survive restarts. The fleet API lets one chat dispatch parallel goals across N employees' desktops. Every action is in an append-only audit log on your disk.

MIT. Built atop standard Linux pieces — Sway (Wayland headless), grim / wtype / wlrctl, Holo3-35B-A3B (#1 open computer-use model on OSWorld-Verified at 77.8%) or Anthropic Computer Use API. The whole orchestrator is ~1,300 LOC of Python + ~250 LOC of configs.

What the desktop looks like

atelier-os v0.1.5 live desktop

Captured via GET /sessions/{id}/screenshot after dispatching the sequence below through the action-daemon — proving every verb (key combo, type, screenshot) end-to-end:

# 1. Sway opens foot (terminal) on Mod+Return
$ echo '{"action":"key","text":"super+Return"}'  | docker exec -i ${CN} atelier-action-cli
# 2. Type a comment
$ echo '{"action":"type","text":"# Atelier OS v0.1.5 — live desktop"}' | docker exec -i ${CN} atelier-action-cli
$ echo '{"action":"key","text":"Return"}' | docker exec -i ${CN} atelier-action-cli
# 3. Screenshot
$ curl -o shot.png http://fleet:8090/sessions/${SID}/screenshot
$ file shot.png
shot.png: PNG image data, 1280 x 720, 8-bit/color RGB, non-interlaced

The image above is the byte-identical output of that last curl.

What works today (v0.1.6, 451/451 tests green on real hardware)

  • ./setup.sh from a fresh clone → fleet API healthy on :8090 in ~2 min
  • POST /sessions → spawns a per-teammate desktop container (image 0.1.5)
  • Desktop boots: Sway → action-daemon → wayvnc → websockify + noVNC on :8443
  • GET /sessions/{id}/embed → 307 to a live, iframe-able noVNC stream (Wayland-native VNC over WebSocket, ~500 ms latency, no plugin)
  • GET /sessions/{id}/screenshot → real 1280×720 PNG
  • POST /sessions/{id}/task → dispatches a goal, SSE step stream; per-session mutex — a second task on the same session returns 409
  • 12 action verbs work end-to-end through the in-container daemon: screenshot, mouse_move (absolute), left/right/double click, type (incl. unicode), key (Return, Escape, ctrl+a, alt+Tab, …), scroll up+down, wait
  • GET /sessions/{id}/audit → every action + every cost increment, JSONL
  • Per-session /home/operator survives restart; registry survives fleet restart
  • Daily-cost cap (MAX_USD_PER_DAY), enforced atomically across concurrent submits
  • 451/451 tests pass: 306 unit (sub-3s, no docker) + 145 integration (against live containers — every endpoint, every verb, every error path, multi-session stress, audit integrity, performance benchmarks p50/p95/p99, noVNC websockify handshake, documentation accuracy, snapshot lifecycle, Bearer-auth gating, Holo3 compose-profile structure, install-smoke harness)
  • Operator install verifier: python3 scripts/launch-smoke.py after docker compose up -d walks spawn → screenshot → snapshot → clone → teardown end-to-end. Stdlib-only, exits 0/1, slots into CI.

Roadmap (and what's tracked vs done)

See TRACKING.md for the full shipping log — every known limitation has a reproducer + acceptance + GitHub issue link. v0.1.6 closed all 8 v0.2 deliverables (S1-S8). Status at a glance:

  • ✅ Fleet-API auth (S1, PR #13) — opt-in ATELIER_API_TOKEN, constant- time comparison, anti-leak header-only design
  • ✅ TLS in compose (S2, PR #14) — opt-in with-tls profile, Caddy auto-Let's-Encrypt, SSE-flush + X-Forwarded-* preserved
  • ✅ Per-session CPU/memory quotas (S3, PR #11) — --memory 4g --cpus 2.0 defaults via env
  • ✅ Snapshot + clone session-state API (S4, PR #15) — POST /sessions/{id}/snapshot + POST /sessions {snapshot_id: ...} clones source's image + home tar into a new session. Path-traversal safe.
  • ✅ Live budget visible in the desktop (S5, PR #12) — waybar shows $0.42 / $2.00; falls back to last-known with (stale) on fleet hiccup
  • ✅ Bundled vLLM Holo3 sidecar (S6, PR #16) — --profile holo3 up spawns vLLM with safe 24 GB GPU defaults; weights cached in named volume
  • ✅ Action-daemon protocol versioning (S7, PR #11) — {"v": 1, ...} on every wire-level request
  • ✅ Boot-wait race (S8, PR #10) — ?wait=ready query param blocks until the desktop healthcheck is green
  • ⏭ Real WebRTC (vs WebSocket VNC) — out of scope for atelier-os; future separate project (the v0.1.6 wayvnc + websockify + noVNC stack is iframe-embeddable today, ~500 ms latency)

"In the market of Iron you need to sell Gold." Atelier OS is the Gold: every piece is standard upstream Linux but the curation, theming, streaming, fleet, audit, and chat-embed are assembled into one experience nobody else ships.

Quick start

git clone https://github.com/karany97/atelier-os.git
cd atelier-os
./setup.sh           # one prompt for a session password, then ~2 min

That's the whole install. The script copies .env.example.env, prompts for a session password (or generates one), checks FLEET_PORT isn't taken (common collision: multica also wants 8090), builds both images (docker compose build fleet desktop-template), starts the fleet, and waits for /health.

Verify your install

# stdlib-only, no pip install. Walks /health → /budget → spawn session
# (with ?wait=ready) → screenshot → snapshot → clone-from-snapshot →
# teardown. Reports PASS/FAIL per check; exits 0 if all green.
python3 scripts/launch-smoke.py

Operators behind bearer auth pass the token:

FLEET_URL=http://localhost:8090 \
ATELIER_API_TOKEN=$(grep ATELIER_API_TOKEN .env | cut -d= -f2) \
python3 scripts/launch-smoke.py

Manual override paths:

# Per-session embed (iframe-able noVNC live stream)
$ curl -X POST http://localhost:8090/sessions \
  -H 'content-type: application/json' \
  -d '{"label":"alice"}'
{"id":"sess_...","embed_url":"http://localhost:8090/sessions/sess_.../embed", ...}

$ curl -I http://localhost:8090/sessions/sess_.../embed
HTTP/1.1 307 Temporary Redirect
Location: http://localhost:7100/    # → noVNC client served on the session's port

# Per-session screenshot (PNG bytes through the action-daemon)
$ curl -o shot.png http://localhost:8090/sessions/sess_.../screenshot
$ file shot.png
shot.png: PNG image data, 1280 x 720, 8-bit/color RGB

# Dispatch a goal (returns 202; tail the SSE stream for live steps)
$ curl -X POST http://localhost:8090/sessions/sess_.../task \
  -H 'content-type: application/json' \
  -d '{"goal":"open epiphany and search nandai jewellery"}'
{"task_id":"task_...","status":"running","stream_url":".../stream",...}

To spin up a second employee's desktop:

curl -X POST http://localhost:8090/sessions \
  -H 'content-type: application/json' \
  -d '{"label":"alice","theme":"terracotta"}'
# → 201 + {"session_id":"...", "embed_url":"...", "vnc_url":"..."}

To embed in any chat / dashboard:

<iframe src="https://atelier-os.example.com/sessions/abc123/embed"
        allow="clipboard-read; clipboard-write; fullscreen"
        sandbox="allow-scripts allow-same-origin allow-forms allow-popups"
        referrerpolicy="no-referrer"></iframe>

Behind TLS (issue #2 — opt-in Caddy profile)

The default docker compose up runs the fleet on plaintext HTTP at $FLEET_PORT (assume private network). To put Caddy in front for TLS termination + automatic Let's Encrypt cert:

# 1. Copy + customize the Caddyfile template (or just set the env vars below)
cp compose/Caddyfile.example compose/Caddyfile

# 2. Tell Caddy your domain + Let's Encrypt contact email
cat >> .env <<'EOF'
ATELIER_TLS_DOMAIN=atelier-os.example.com
ATELIER_TLS_EMAIL=ops@example.com
EOF

# 3. Bring up the with-tls profile (caddy alongside the fleet)
docker compose --profile with-tls up -d

Caddy auto-provisions a cert via HTTP-01 challenge on first request, so :80 + :443 must be reachable from the public internet and your domain's A/AAAA record must point at this host. ACME state persists in a named volume across docker compose down.

Caveat — per-session noVNC ports. The /sessions/{id}/embed endpoint redirects to http://host:7XXX/ (the per-session port the desktop container exposes). Caddy in this template does NOT TLS- terminate those ports — they stay plain HTTP. If your iframe loads from an HTTPS atelier, browsers will block the HTTP redirect target (mixed-content). For v0.1.6 most operators keep those per-session ports on a private network and embed from a same-network atelier instance. v0.2 will ship a handle_path /v/{port}/* Caddy pattern that dynamically proxies the per-session ports through the same TLS endpoint.

With local Holo3 vision (issue #6 — opt-in holo3 compose profile)

The default MODEL_BACKEND=anthropic calls Anthropic Computer Use (real $0.05–$0.40/task). For fully-local, free, open-weights vision the holo3 profile spins up a vLLM sidecar serving Holo3-35B-A3B:

# 1. NVIDIA Container Toolkit installed on the host (24 GB GPU minimum)
# 2. Switch the fleet to local Holo3 in .env
sed -i 's/^MODEL_BACKEND=anthropic/MODEL_BACKEND=holo3/' .env

# 3. Bring up the fleet + the Holo3 sidecar
docker compose --profile holo3 up -d

First boot downloads ~70 GB of Holo3 weights from HuggingFace (cached to the holo3-models named volume; survives docker compose down). Steady-state boot: ~30 s. The fleet auto-discovers HOLO3_ENDPOINT=http://holo3:8000/v1 over the compose network.

Holo3-35B-A3B is a MoE with ~3 B active params — single-card inference on RTX 3090 / 4090 / A6000 / A100. OSWorld-Verified 77.8% (the #1 OSS computer-use model, beats Anthropic Opus 4.6 at 1/10 the cost).

With Bearer auth (issue #1 — opt-in token)

The fleet API is open by default (assume private network). To require Authorization: Bearer <token> on every state-touching endpoint:

echo "ATELIER_API_TOKEN=$(openssl rand -hex 32)" >> .env
docker compose restart fleet

/health, /sessions/{id}/embed, and /budget stay open by design (load-bearing for healthchecks, iframe navigation, and the in-container budget poller respectively — see comments in driver/src/main.py). Constant-time comparison via hmac.compare_digest. Query-param tokens are rejected (anti-leak).

The architecture in one diagram

   ┌──────────────────────────┐
   │  Destiny Atelier (chat)  │     ──── any chat surface, really
   └──────────┬───────────────┘
              │  iframe / SSE
              ▼
   ┌──────────────────────────────────────────────────────────┐
   │  Atelier OS fleet API (FastAPI, this repo)               │
   │  · POST   /sessions             create new desktop       │
   │  · GET    /sessions/{id}/embed  iframe-able WebRTC URL   │
   │  · POST   /sessions/{id}/task   dispatch a goal          │
   │  · GET    /sessions/{id}/audit  per-turn audit log       │
   │  · DELETE /sessions/{id}        teardown + persist home  │
   └──────────┬───────────────────────────────────────────────┘
              │  docker exec
              ▼
   ┌──────────────────────────────────────────────────────────┐
   │  Per-employee desktop container (one each)               │
   │   ┌────────────────────────────────────────────────────┐ │
   │   │  Sway (Wayland kiosk compositor)                  │ │
   │   │  + waybar status bar (Atelier branding)           │ │
   │   │  + Firefox + xterm + Files + your apps            │ │
   │   │  + GTK4/Qt6 Atelier theme (terracotta default)    │ │
   │   └────────────────────────────────────────────────────┘ │
   │   ┌────────────────────────────────────────────────────┐ │
   │   │  wayvnc + websockify + noVNC (HTTP→WS→VNC on :8443)│ │
   │   │  · Wayland-native VNC server (sway's sibling project)│
   │   │  · ~500 ms latency over WebSocket, no plugin       │ │
   │   │  · iframe-embeddable, fullscreen-capable           │ │
   │   └────────────────────────────────────────────────────┘ │
   │   ┌────────────────────────────────────────────────────┐ │
   │   │  Action daemon (grim + wtype + ydotool)           │ │
   │   │  · screenshot via grim (Wayland-native, lossless) │ │
   │   │  · type via wtype, key via ydotool                │ │
   │   │  · ~1 ms per action (Unix socket, not docker exec)│ │
   │   └────────────────────────────────────────────────────┘ │
   └──────────────────────────────────────────────────────────┘
              │
              ▼ (model calls)
   ┌────────────────────────┐    ┌────────────────────────┐
   │  Holo3-35B-A3B local   │ OR │  Anthropic Computer    │
   │  (Apache 2.0, vLLM)    │    │  Use API (paid escape) │
   │  OSWorld-Verified 77.8%│    │  Sonnet 4.5 default    │
   └────────────────────────┘    └────────────────────────┘

Comparison

UI-TARS-desktop Bytebot (archived) Anthropic Cowork Atelier OS
Form factor Electron app Docker (dead since Mar 2026) macOS/Windows app docker compose up
Per-employee fleet ✅ N sessions
Iframe-embeddable ✅ noVNC/wayvnc
Persistent home ❌ (releases on navigate)
Open weights ✅ (UI-TARS 1.5-7B) ✅ (Holo3 default)
Wayland-native screen capture ❌ (X11) ❌ (X11) n/a ✅ (grim)
Live video stream ✅ (WebRTC) ✅ (VNC) n/a ✅ (wayvnc, WS)
Screenshot-on-demand via API n/a ✅ (proven, PNG)
Per-session task mutex n/a n/a n/a
Test count unknown unknown proprietary 451 unit+integration
Audit log per turn ✅ JSONL
Self-host floor local app on your laptop Docker on Linux macOS only Docker on any Linux
License Apache 2.0 Apache 2.0 proprietary MIT
Cost per task model-only model-only $0.20-$2.00 model-only ($0 with local Holo3, $0.05-$0.40 with Sonnet 4.5)

Why Holo3-35B-A3B (the model that ships in v0.1)

Holo3-35B-A3B (HCompany, 2026) is currently the #1 open-source computer-use model on the OSWorld- Verified benchmark at 77.8% — ahead of OpenCUA-72B (45%) and Simular Agent-S2 (34.5%), and even beats Anthropic Opus 4.6 on the same benchmark at roughly 1/10 the per-token cost.

It's a 35B-parameter MoE with 3B active params — fits a single RTX 3090 at 40-60 t/s via vLLM. Apache 2.0 weights. We ship a vLLM systemd unit that boots it on demand.

Operators who don't have a GPU can point the driver at the Anthropic Computer Use API instead (MODEL_BACKEND=anthropic). Either way, the desktop, the fleet, the audit, the embed — all stay the same.

What's in v0.1.6 (the launch-sprint release)

  • ✅ Repo layout (MIT, README, threat model, architecture doc, CHANGELOG)
  • compose/docker-compose.yml — fleet API + N session containers, opt-in with-tls (Caddy) + holo3 (vLLM sidecar) profiles
  • desktop/Dockerfile — Ubuntu 24.04 + Sway + Atelier theme, builds atelier-os/desktop:0.1.5 cleanly
  • desktop/sway-config + desktop/waybar-config.json — keyboard kiosk + live-budget status bar
  • desktop/atelier-budget-poller.py — in-container fleet /budget poller, atomic write to /tmp/atelier-budget
  • driver/src/main.py — FastAPI fleet API (sessions, embed, task, audit, budget, snapshot lifecycle, optional Bearer-token gating)
  • driver/src/session.py — per-session container lifecycle, quotas, snapshot_id-driven clone path
  • driver/src/snapshot.py — multi-session snapshot module (image + home tar, path-traversal-safe extract, most-recent delete safeguard)
  • driver/src/action.py — Wayland-native action dispatch
  • driver/src/model.py — Holo3 + Anthropic backends, swappable
  • driver/src/audit.py — append-only JSONL audit log
  • tests/451 tests (306 unit + 145 integration), all green
  • scripts/launch-smoke.py — operator install verifier (spawn → snapshot → clone → teardown, exits 0/1)
  • docs/threat-model.md + docs/architecture.md
  • TRACKING.md — every known limitation has a closed PR + writeup
  • .github/workflows/ci.yml — staged; awaits one-time OAuth scope refresh on the maintainer side (issue #9)

What's coming (v0.2 — June 2026)

  • Hot-swap themes per-session (each employee picks their palette)
  • File-drop from chat to session home
  • Real WebRTC streaming pipeline (vs current wayvnc + WebSocket) for the per-session embed
  • Multi-cluster: federate sessions across N atelier-os hosts
  • One-USB-boot ISO ("Atelier OS Live") for kiosk deployments

Companion repos

Repo What
karany97/nandai-atelier The chat surface that embeds Atelier OS sessions
karany97/destiny-computer The v0.1 single-desktop pattern Atelier OS evolved from
karany97/atelier-os This repo

License

MIT. See LICENSE.

Security

If you find a vulnerability, please email security@destiny.computer instead of opening a public issue. PGP key in repo root. 72-hour acknowledgement, 14-day public disclosure.

About

Multi-session AI desktop fleet. Sway+Wayland+wayvnc+noVNC. Per-employee container, iframe-embeddable, action-daemon for fast clicks. 259/259 tests. MIT.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors