Talk to a humanoid in a 3D apartment — and it does literally anything you say. On your screen, in VR, or standing in your real room in mixed reality.
Voice or text in. A reasoning LLM brain turns it into real, physical motion out. No hardcoded commands — he generalizes. Bring your own LLM key; the brain is open source and runs right in your browser.
SlaveX is a browser game where a fully-articulated humanoid stands in a furnished
3D apartment and does whatever you tell him. You speak (Web Speech API in Chrome)
or type a command; it's sent with a compact snapshot of the body's current state
to an open-source "brain": **slave-agent**, a provider-agnostic, bring-your-own-key
LLM client that runs a reasoning model of your choice (OpenAI, Anthropic,
OpenRouter, Groq, Ollama, or any OpenAI-compatible endpoint). It runs directly in
your browser — including inside a Quest 3 headset — or via the bundled Node server.
The brain composes from 951 reusable motion skills and emits a single strict JSON
Action (per-joint rotations, keyframed sequences, locomotion, posture). The
frontend's PoseEngine applies it with real physics — quaternion interpolation, a
procedural walk cycle, wall/furniture collision, doorway pathfinding, and grounded
postures. There are no scripted commands: ask for something new and he infers
anatomically plausible motion on the fly.
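To make the idea of a "single strict JSON Action" concrete, here is a rough sketch of what one might look like for "raise your right arm". The four top-level parts (joints, sequence, moveTo, posture) are the ones named in this README; the joint names, the degree-based angles, and the `describeAction` helper are assumptions for illustration, not the schema slave-agent actually uses.

```javascript
// Hypothetical Action for "raise your right arm".
// Joint names and degree-based angles are illustrative assumptions.
const exampleAction = {
  joints: {
    rightShoulder: { x: 0, y: 0, z: -160 },
    rightElbow: { x: 0, y: 0, z: 0 },
  },
  posture: "stand",
};

// List which of the four Action parts a given Action actually uses.
function describeAction(action) {
  const parts = ["joints", "sequence", "moveTo", "posture"];
  return parts.filter((key) => action[key] !== undefined && action[key] !== null);
}

console.log(describeAction(exampleAction)); // [ 'joints', 'posture' ]
```

A flat object like this is easy for a model to emit and easy for an engine to validate, which is presumably why the output is constrained to one JSON shape.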
And it's not just a window: put on a Meta Quest 3 and step inside the apartment in immersive VR, or flip to passthrough mixed reality and place him in your real living room — then command him with a wrist panel, the headset keyboard, or your voice.
- 🎙️ Voice & text control — speak in Chrome via the Web Speech API, or type in the command box (always available, any browser).
- 🧠 An actual, open-source LLM brain: `slave-agent` reasons over your command + the live body state and returns a structured Action. No proprietary CLI; it's plain, portable JavaScript that runs in the browser and Node.
- 🔑 Bring your own key, any LLM: pick OpenAI, Anthropic, OpenRouter, Groq, Ollama (local), or a custom OpenAI-compatible endpoint, paste your key, and go. Your key stays in your browser; swap providers/models from a settings screen.
- 🥽 VR & Mixed Reality on Quest 3 (WebXR) — enter immersive VR (you, inside the apartment at 1:1 scale) or passthrough MR (only the man, placed in your real room — aim + trigger to set him down). In-headset wrist command panel, keyboard, and voice. Runs in the Quest browser / as a PWA — no native app.
- 🤖 951 composable agent-skills — atomic primitives + composites in the agentskills.io format; a progressive-disclosure loader keeps every prompt bounded (~22k chars) even at ~950 skills.
- 🧍 Two switchable bodies — a precise procedural mannequin (the rig is its skeleton → exact finger-level control) and a photoreal Ready Player Me human (retargeted). Toggle in-game.
- 🦴 51-joint rig — full spine, limbs, and individual finger joints (5 fingers × 3 segments per hand) for real gestures: point, peace sign, thumbs-up, count.
- 🌍 Real physics: quaternion-slerp motion, a distance-driven walk cycle, collision (can't pass through walls or furniture) + A* doorway pathfinding, and grounded postures (stand, stand-on-a-table, sit, sit-on-floor, kneel, crouch, lie back/front/side). The body always rests on its support, never floating or clipping.
- 💾 State memory: exact current pose + posture + position + recent command history, persisted to `localStorage`. Powers relative, incremental, undo, and repeat commands ("raise it higher", "now the other arm", "do that again").
- 🎨 Modern Three.js graphics: PBR materials, image-based lighting, and post-processing in a furnished open-plan apartment.
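The "quaternion-slerp motion" mentioned in the feature list can be illustrated with a standalone sketch. Three.js ships its own `Quaternion.slerp`, which the app most likely relies on; the version below uses plain `{x, y, z, w}` objects only to show the math, and is not taken from the project's source.

```javascript
// Spherical linear interpolation between unit quaternions a and b, t in [0, 1].
function slerp(a, b, t) {
  let dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
  // q and -q encode the same rotation; flip b so we take the shorter arc.
  let bx = b.x, by = b.y, bz = b.z, bw = b.w;
  if (dot < 0) { dot = -dot; bx = -bx; by = -by; bz = -bz; bw = -bw; }

  let wa, wb;
  if (dot > 0.9995) {
    // Nearly identical rotations: linear weights avoid dividing by ~0.
    wa = 1 - t;
    wb = t;
  } else {
    const theta = Math.acos(dot);
    const sinTheta = Math.sin(theta);
    wa = Math.sin((1 - t) * theta) / sinTheta;
    wb = Math.sin(t * theta) / sinTheta;
  }

  const q = {
    x: wa * a.x + wb * bx,
    y: wa * a.y + wb * by,
    z: wa * a.z + wb * bz,
    w: wa * a.w + wb * bw,
  };
  // Renormalize to guard against drift in the near-parallel branch.
  const len = Math.hypot(q.x, q.y, q.z, q.w);
  return { x: q.x / len, y: q.y / len, z: q.z / len, w: q.w / len };
}

// Halfway between "no rotation" and a 90° turn about Y is a 45° turn about Y.
const identity = { x: 0, y: 0, z: 0, w: 1 };
const quarterTurnY = { x: 0, y: Math.SQRT1_2, z: 0, w: Math.SQRT1_2 };
const half = slerp(identity, quarterTurnY, 0.5);
```

Interpolating on the sphere rather than per Euler angle keeps angular speed constant and avoids the joint "wobble" that naive angle lerping can produce.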
| View | Description |
|---|---|
| The apartment | Open-plan living, kitchen, dining, bedroom & bath. |
| The mannequin | Built straight from the rig; exact by construction. |
| The human | A photoreal Ready Player Me avatar, same poses retargeted. |
| Finger-level control | "point your left index finger down". |
Prerequisites
- Node ≥ 20 (ES modules).
- Google Chrome for voice input (any browser works for typed commands).
- An LLM API key for a provider — OpenAI, Anthropic, OpenRouter, Groq, a
local Ollama (no key), or any custom OpenAI-compatible endpoint. (No
cursor-agent, no install — you paste the key at startup.) Don't have one handy? Play offline runs built-in deterministic motions.
Run it (dev)

    npm install
    npm run dev   # Vite → http://localhost:5173 + API → http://localhost:8787

Open http://localhost:5173. On first launch a settings screen appears: pick a provider + model, paste your key, and hit Save & Play (or Play offline). Then tap the mic (or type) and try wave.
💡 `npm run dev` automatically bundles the 951 skills into `public/skills.json` (the `predev` hook) so the browser brain has its full motion vocabulary.

⚡ Latency depends on the model you choose: a small/fast model (e.g. `gpt-4o-mini`, `llama-3.1-8b-instant`, or a local Ollama model) typically replies in ~1–4 s.
Run it (production build)

    npm run build
    npm start   # everything on http://localhost:8787

Health check: `curl http://localhost:8787/api/health` → `{"ok":true}`
The brain is yours. On first run (or anytime via the gear ⚙ button, bottom-left) the settings screen lets you:
- Pick a provider + model — presets for OpenAI, Anthropic (Claude), OpenRouter, Groq, Ollama (local), and Custom, each with suggested models (or type your own model id).
- Paste your API key — show/hide toggle, with a link to where to get one. Keyless providers (Ollama / a local custom endpoint) don't need one.
- Test connection — a live ping so you know it works before you play.
- Advanced — a custom Base URL, or "Route through the local server" (the proxy — PC only, avoids browser CORS / keeps your key off the page).
- Play offline — skip the key entirely; the man uses built-in motions (wave, sit, walk, basic poses), fully on-device.
🔒 Your key stays in your browser (`localStorage`) and is sent only to the provider you choose. It's never logged, never put in a URL, and error messages are scrubbed of anything key-shaped.
🌐 Browser CORS: some providers (notably OpenAI) block direct browser calls. Anthropic, OpenRouter, Groq, and Ollama/custom generally work direct (great for the headset). On a PC, enable the proxy in Advanced (or set the server env keys below) for any provider. See
[docs/brain.md](docs/brain.md#providers).
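The privacy note above says error messages are scrubbed of anything key-shaped. One way such a scrubber could work is sketched below; the patterns and the `scrubKeys` name are assumptions chosen for illustration, and the project's real rules may differ.

```javascript
// Illustrative guesses at what "key-shaped" text looks like.
const KEY_PATTERNS = [
  /\bsk-[A-Za-z0-9_-]{8,}/g,          // prefixed keys such as "sk-..."
  /Bearer\s+[A-Za-z0-9._-]{8,}/gi,    // Authorization header values
  /\b[A-Za-z0-9_-]{32,}\b/g,          // any long opaque token
];

// Replace every match with a fixed marker before the text is shown or logged.
function scrubKeys(message) {
  return KEY_PATTERNS.reduce(
    (text, pattern) => text.replace(pattern, "[redacted]"),
    String(message),
  );
}

console.log(scrubKeys("401 from provider: invalid key sk-abc123def456ghi789"));
// 401 from provider: invalid key [redacted]
```

Erring on the side of over-redaction (the third pattern will also hide long hashes or ids) is usually the safer trade-off for messages that may end up in screenshots or bug reports.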
slavex runs in WebXR, so the same app drops you inside the scene on a Meta Quest 3 (and other WebXR headsets) — no native app, just the Quest browser or an installed PWA.
1. Serve it to the headset over HTTPS. WebXR requires a secure context (https:// or localhost), and your PC's localhost isn't reachable from the headset, so expose the dev server over HTTPS on your network. Easiest is a tunnel:

       npm run dev                                     # SPA :5173 (proxies /api → :8787)
       cloudflared tunnel --url http://localhost:5173  # or: npx localtunnel --port 5173 / ngrok http 5173

   Open the printed https://… URL in the Quest browser. (Or serve the LAN with a cert: `npm run dev -- --host --https`, then open `https://<your-PC-IP>:5173`.)
2. Tap Enter VR or Enter MR. The buttons appear only when the headset supports each mode.
- VR — stand inside the virtual apartment at 1:1 scale, with controller lasers and comfort teleport + snap-turn locomotion.
- MR — passthrough hides the apartment and shows only the man in your real room. Aim a controller at the floor and pull the trigger to place him where you want (he turns to face you, anchored in place).
3. Command him in-headset. A wrist command panel (left controller) has quick commands + push-to-talk Speak + Type + a VR↔MR Switch + Exit. Type opens the Quest system keyboard (MR) or an in-VR keyboard (VR) for free text — all routed through the same brain as voice/text.
📖 Full headset guide + the on-device test checklist: [docs/vr-mr.md](docs/vr-mr.md).
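The rule that the Enter VR / Enter MR buttons appear only when the headset supports each mode maps naturally onto the WebXR Device API's `navigator.xr.isSessionSupported()`. The sketch below is an assumption about how that check could be written, not the app's code; `detectXrModes` and the element ids in the usage comment are hypothetical. Passthrough MR is requested in WebXR as an `immersive-ar` session.

```javascript
// Ask the WebXR Device API which immersive modes this device supports.
// `xr` defaults to navigator.xr; it is a parameter so the logic can be tested.
async function detectXrModes(xr = globalThis.navigator?.xr) {
  if (!xr) return { vr: false, mr: false }; // no WebXR, or an insecure context
  const ask = (mode) => xr.isSessionSupported(mode).catch(() => false);
  const [vr, mr] = await Promise.all([ask("immersive-vr"), ask("immersive-ar")]);
  return { vr, mr };
}

// Usage in the page (element ids are hypothetical):
// detectXrModes().then(({ vr, mr }) => {
//   document.getElementById("enter-vr").hidden = !vr;
//   document.getElementById("enter-mr").hidden = !mr;
// });
```

Because `navigator.xr` is only exposed in secure contexts, the same check also explains why the buttons are missing when the page is served over plain HTTP.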
There are no fixed commands — these are just a taste. See more in [docs/commands.md](docs/commands.md).
- raise your right arm
- point your left index finger down
- wave
- give me a thumbs up
- sit on the couch
- walk to the kitchen and wave
- stand on the table
- lie down on the floor
- do a squat
- moonwalk
- dance
- turn around
Then build on it: raise it higher · now the other arm · do that again · undo
— the body remembers its state.
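Follow-ups like "undo" only work because the previous state is kept around. A minimal sketch of such a state memory is shown below, assuming a storage object with `getItem`/`setItem` (`window.localStorage` in a browser). The key name, the history cap, and all function names are hypothetical, and the real app stores more than this (posture and position as well as pose).

```javascript
const STATE_KEY = "body-state"; // hypothetical storage key
const MAX_HISTORY = 10;         // hypothetical cap on remembered commands

function loadState(storage) {
  try {
    return JSON.parse(storage.getItem(STATE_KEY)) || { pose: {}, history: [] };
  } catch {
    return { pose: {}, history: [] }; // corrupt data: start fresh
  }
}

// Save the new pose and remember the pose it replaced.
function recordCommand(storage, command, pose) {
  const state = loadState(storage);
  const entry = { command, previousPose: state.pose };
  const next = { pose, history: [...state.history, entry].slice(-MAX_HISTORY) };
  storage.setItem(STATE_KEY, JSON.stringify(next));
  return next;
}

// Restore the pose that was active before the most recent command.
function undo(storage) {
  const state = loadState(storage);
  const last = state.history[state.history.length - 1];
  if (!last) return state; // nothing to undo
  const next = { pose: last.previousPose, history: state.history.slice(0, -1) };
  storage.setItem(STATE_KEY, JSON.stringify(next));
  return next;
}
```

Passing the storage object in, rather than touching `localStorage` directly, keeps the logic usable in Node and in tests.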
Every command flows through one loop: command → brain → Action → engine → state → (next command).
flowchart LR
A["🎙️ Voice / ⌨️ Text / 🥽 VR command"] --> B["slave-agent<br/>think(text, state, settings)"]
B --> C{{"🧠 Your LLM (BYO-key)<br/>OpenAI · Claude · Groq · Ollama · …"}}
C -->|"composes 951 skills"| D["Strict JSON Action<br/>joints · sequence · moveTo · posture"]
B -.->|"no key / offline / error"| E["Deterministic fallback"]
E --> D
D --> F["⚙️ PoseEngine<br/>slerp · walk cycle · collision · A* · grounding"]
F --> G["🧍 Character<br/>Mannequin or Human"]
G --> H[("💾 State memory<br/>localStorage")]
H -->|"feeds the next command"| B
- The frontend calls `think(text, state, settings)`. `state` is a compact summary of the body's pose, posture, position, and recent history; `settings` is your BYO-key config.
- `src/agent/prompt.js` builds the prompt: the rotation convention, exact joint names, the Action schema, worked examples, and the most relevant motion skills (progressive disclosure).
- `src/agent/agent.js` calls your LLM (directly in the browser, or via the Node proxy if you opt in), extracts the JSON Action, and normalizes it (permissive: it never throws). With no key or on any error, a deterministic fallback keeps the pipeline alive (`(brain offline)`).
- The frontend `PoseEngine` interpolates the body toward the Action: quaternion-slerp poses, a procedural walk cycle, collision + A* pathfinding, posture grounding, look-at, and subtle idle life.
📖 Full design write-up: [ARCHITECTURE.md](ARCHITECTURE.md) · the brain in depth:
[docs/brain.md](docs/brain.md).
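The extraction step described above (pull a JSON Action out of the model's reply, never throw, fall back when nothing usable comes back) might look roughly like this. It is a sketch under assumptions: the function names and the fallback payloads are placeholders, and only the `(brain offline)` marker comes from this README.

```javascript
// Pull the outermost {...} span out of a model reply, tolerating prose or
// markdown around it. Returns null instead of throwing.
function extractAction(reply) {
  const text = String(reply ?? "");
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const parsed = JSON.parse(text.slice(start, end + 1));
    const isObject = parsed && typeof parsed === "object" && !Array.isArray(parsed);
    return isObject ? parsed : null;
  } catch {
    return null; // malformed JSON: let the caller fall back
  }
}

// Hypothetical deterministic fallback; the payloads are placeholders.
function fallbackAction(command) {
  const known = { wave: { sequence: "wave" }, sit: { posture: "sit" } };
  const match = known[command.trim().toLowerCase()] ?? { posture: "stand" };
  return { ...match, note: "(brain offline)" };
}

// Use the model's Action when one can be parsed, otherwise the fallback.
function toAction(reply, command) {
  return extractAction(reply) ?? fallbackAction(command);
}
```

Returning `null` rather than throwing keeps every failure mode (no key, network error, chatty model) on the same code path as the offline fallback.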
The brain's motion vocabulary lives in .cursor/skills/ as 951 skills following
the agentskills.io standard. Each skill is a SKILL.md
documenting the exact joints and calibrated example angles for one motion. A prebuild
([scripts/build-skills.js](scripts/build-skills.js)) bundles them into
public/skills.json so the browser brain has the full library; the Node server
reads them straight from disk.
- Atomic skills: single-responsibility primitives (`move-arm`, `move-leg`, `move-hand-and-fingers`, `walk`, `turn-and-face`, `sit-and-stand`, `balance-and-posture`).
- Composite skills: higher-level actions (dances, sports, exercises, whole-body gestures) that declare the atomics they're built from via `subskills:`.
- Progressive disclosure: every prompt shows a bounded catalog of all skills, but injects the full body of only the few most relevant to your command (plus their building blocks). This keeps the prompt small and fast no matter how big the library grows.
The brain composes these to do anything — point up becomes move-arm +
move-hand-and-fingers. Add your own by dropping a folder in .cursor/skills/.
📖 Deep dive: [docs/skills.md](docs/skills.md).
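Progressive disclosure, as described in this section, can be sketched as a two-tier selection: list every skill by name, but include full bodies only for the best matches and the subskills they declare. The keyword scoring, the `topK` default, and the skill object shape below are assumptions for illustration; the real loader may rank skills quite differently.

```javascript
// Pick which skills get their full body injected into the prompt.
function selectSkills(command, skills, { topK = 3 } = {}) {
  const words = command.toLowerCase().split(/\W+/).filter(Boolean);
  // Crude relevance: how many command words appear in the name/description.
  const score = (skill) => {
    const haystack = `${skill.name} ${skill.description}`.toLowerCase();
    return words.filter((word) => haystack.includes(word)).length;
  };

  const byName = new Map(skills.map((skill) => [skill.name, skill]));
  const top = skills
    .map((skill) => ({ skill, hits: score(skill) }))
    .filter((entry) => entry.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .slice(0, topK)
    .map((entry) => entry.skill);

  // Also pull in the building blocks each selected skill declares.
  const full = new Map();
  for (const skill of top) {
    full.set(skill.name, skill);
    for (const sub of skill.subskills ?? []) {
      if (byName.has(sub)) full.set(sub, byName.get(sub));
    }
  }

  return {
    catalog: skills.map((skill) => skill.name), // every skill, names only
    full: [...full.values()],                   // full bodies for the relevant few
  };
}
```

The prompt size then grows with `topK` and the depth of the subskill lists rather than with the size of the whole library.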
Both bodies sit behind a single CharacterManager and are driven by identical
canonical poses, so switching is instant and seamless.
- Mannequin (default): generated directly from `src/character/rig.js` (51 joints incl. fingers). The rig is its skeleton, so every joint is exact by construction. The precision reference.
- Realistic human: a photoreal, fully-clothed Ready Player Me GLB, mapped onto the canonical rig via rotation-only retargeting (auto upright/facing alignment, scaled to ~1.8 m with feet on the floor). Only offered once it has loaded and is verified to have fingers + legs.
Bring your own avatar — bone names are auto-detected (Mixamo, Ready Player Me, many Blender/UE exports), so you can drop in any rigged GLB with finger + leg bones:

    CHARACTER_GLB=/models/my-avatar.glb npm run dev

📖 Rig & retargeting details: [docs/rig.md](docs/rig.md).
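Bone-name auto-detection generally comes down to normalizing exporter-specific names to one canonical spelling and then checking that the required bones exist. The sketch below shows that idea for Mixamo-style (`mixamorig:LeftArm`) and Ready Player Me-style (`LeftArm`) names; the alias table, the regexes, and both function names are illustrative assumptions, not the project's detection logic.

```javascript
// Normalize a bone name from common exports to a canonical lowercase key.
function canonicalBone(name) {
  const stripped = name
    .replace(/^mixamorig:?/i, "") // Mixamo prefix, with or without the colon
    .replace(/[\s._-]/g, "")      // separators used by some Blender/UE exports
    .toLowerCase();
  // Hypothetical aliases for rigs that say "upper arm" instead of "arm".
  const aliases = { leftupperarm: "leftarm", rightupperarm: "rightarm" };
  return aliases[stripped] ?? stripped;
}

// A skeleton qualifies only if it has both finger and leg bones.
function hasFingersAndLegs(boneNames) {
  const names = boneNames.map(canonicalBone);
  const has = (pattern) => names.some((n) => pattern.test(n));
  return has(/index|thumb/) && has(/leg|thigh/);
}

console.log(canonicalBone("mixamorig:LeftArm")); // leftarm
```

A check like `hasFingersAndLegs` would be one way to decide whether to offer the realistic body or stay mannequin-only.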
The brain is normally configured in the browser (BYO-key settings screen) — no env vars required. The variables below are optional, for the server and avatars.
| Variable | Default | What it does |
|---|---|---|
| `PORT` | `8787` | Port for the Express API / production server. |
| `LLM_PROVIDER` | `openai` | (Server proxy) provider id when routing commands through the server. |
| `LLM_MODEL` | (none) | (Server proxy) model id. |
| `LLM_API_KEY` | (none) | (Server proxy) API key held server-side, which keeps it off the browser entirely. |
| `LLM_BASE_URL` | (provider preset) | (Server proxy) override the provider base URL (custom/self-hosted/Ollama on another host). |
| `CHARACTER_GLB` | (bundled avatar) | Path or URL to a custom rigged GLB for the realistic human. |
| `BRAIN_SKILLS_NOCACHE` | (off) | Reload skills from disk on every request (Node) while authoring new ones. |
🧩 Server-side key example (browser never sees the key): start with `LLM_PROVIDER=openai LLM_MODEL=gpt-4o-mini LLM_API_KEY=sk-… npm start`, then enable Advanced → Route through the local server in the settings screen.
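For readers wiring up their own server, the variables and defaults in the table above could be read along these lines. This is a sketch, not the bundled server's code: `readServerConfig` is a hypothetical name, and treating any non-empty `BRAIN_SKILLS_NOCACHE` value as "on" is an assumption.

```javascript
// Read the optional server settings from an env object (process.env in Node),
// applying the documented defaults.
function readServerConfig(env) {
  const port = Number.parseInt(env.PORT ?? "8787", 10);
  return {
    port: Number.isInteger(port) && port > 0 ? port : 8787,
    provider: env.LLM_PROVIDER || "openai",
    model: env.LLM_MODEL || null,
    apiKey: env.LLM_API_KEY || null,
    baseUrl: env.LLM_BASE_URL || null,          // null: use the provider preset
    characterGlb: env.CHARACTER_GLB || null,    // null: use the bundled avatar
    skillsNoCache: Boolean(env.BRAIN_SKILLS_NOCACHE), // assumption: any value enables it
  };
}

// Usage: const config = readServerConfig(process.env);
```

Taking the env object as a parameter keeps the defaults testable without mutating `process.env`.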
- Mic does nothing / "mic error": voice needs Chrome + microphone permission on a secure context (`localhost` is fine). Click the mic and allow access, or just type. Other browsers show a "voice not supported" notice that points you to the text box.
- Replies say `(brain offline)`: there's no API key, or the LLM call failed, so the deterministic fallback handled it. Open the gear (⚙) to add/fix your key, or switch providers.
- "Network/CORS blocked" when saving/testing a key: your provider blocks direct browser calls (common with OpenAI). On a PC, enable Advanced → Route through the local server; on a headset, use a browser-callable provider (Anthropic / OpenRouter / Groq / Ollama) or Play offline.
- No "Enter VR/MR" buttons: the page isn't on HTTPS/localhost, or the headset doesn't report support. Open the `https://…` URL in the Quest browser. See [docs/vr-mr.md](docs/vr-mr.md).
- "Realistic human" toggle is disabled: the GLB failed to load or lacks finger/leg bones; the app runs mannequin-only. Check the console for the `[CharacterHuman]` skeleton report.
- Chained, multi-step commands ("walk to the kitchen, then sit down")
- Streaming partial Actions for faster perceived response
- Object interaction — pick up, carry, and use props
- Hand-tracking input in VR/MR (no controllers)
- An avatar gallery + drag-and-drop "bring your own" GLB
- Shareable replays of command sequences
- Touch / mobile controls
- A growing, community-contributed skill library
Contributions are welcome — new skills, bodies, providers, bug fixes, and docs. Adding
a motion skill is as easy as dropping a SKILL.md into .cursor/skills/. See
[CONTRIBUTING.md](CONTRIBUTING.md) to get started.
MIT © Sumit Aich
If slavex made you smile, a star helps other people find it — and motivates the roadmap above. Drop a ⭐ here.
- Three.js — the 3D engine behind the rendering, rig, scene, and WebXR.
- Ready Player Me — the photoreal humanoid avatar.
- agentskills.io — the skill format the motion library follows.
- The open LLM ecosystem: OpenAI-compatible providers (OpenAI, Anthropic, OpenRouter, Groq, Ollama, and friends) that any `slave-agent` user can plug in with their own key.