slaveX

Talk to a humanoid in a 3D apartment — and it does literally anything you say. On your screen, in VR, or standing in your real room in mixed reality.

Voice or text in. A reasoning LLM brain turns it into real, physical motion out. No hardcoded commands — he generalizes. Bring your own LLM key; the brain is open source and runs right in your browser.

What is this?

slaveX is a browser game where a fully articulated humanoid stands in a furnished 3D apartment and does whatever you tell him. You speak (Web Speech API in Chrome) or type a command; it's sent, together with a compact snapshot of the body's current state, to an open-source "brain" — **slave-agent**, a provider-agnostic, bring-your-own-key LLM client that runs a reasoning model of your choice (OpenAI, Anthropic, OpenRouter, Groq, Ollama, or any OpenAI-compatible endpoint). It runs directly in your browser — including inside a Quest 3 headset — or via the bundled Node server.

The brain composes from 951 reusable motion skills and emits a single strict JSON Action (per-joint rotations, keyframed sequences, locomotion, posture). The frontend's PoseEngine applies it with real physics — quaternion interpolation, a procedural walk cycle, wall/furniture collision, doorway pathfinding, and grounded postures. There are no scripted commands: ask for something new and he infers anatomically-plausible motion on the fly.
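To make "quaternion interpolation" concrete, here is a minimal spherical linear interpolation (slerp) over plain `[x, y, z, w]` arrays. It is a sketch under stated assumptions, not the PoseEngine's code: the array layout and the `slerpQuat` name are hypothetical, and inside a Three.js app you would more likely call the built-in `Quaternion.slerp`.

```javascript
// Minimal slerp over unit quaternions stored as [x, y, z, w] arrays.
// Hypothetical helper for illustration; not taken from the slaveX source.
function slerpQuat(a, b, t) {
  // Cosine of the angle between the two rotations.
  let dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  // q and -q encode the same rotation; flip b so we take the shorter arc.
  let bb = b;
  if (dot < 0) {
    bb = b.map((c) => -c);
    dot = -dot;
  }
  // Nearly identical rotations: use normalized linear interpolation instead.
  if (dot > 0.9995) {
    const out = a.map((c, i) => c + t * (bb[i] - c));
    const len = Math.hypot(...out);
    return out.map((c) => c / len);
  }
  const theta = Math.acos(dot); // angle between a and bb
  const sinTheta = Math.sin(theta);
  const wa = Math.sin((1 - t) * theta) / sinTheta;
  const wb = Math.sin(t * theta) / sinTheta;
  return a.map((c, i) => wa * c + wb * bb[i]);
}

// Usage: halfway from identity to a 90-degree turn about Y is a 45-degree turn.
const identity = [0, 0, 0, 1];
const yaw90 = [0, Math.sin(Math.PI / 4), 0, Math.cos(Math.PI / 4)];
console.log(slerpQuat(identity, yaw90, 0.5)); // approximately [0, sin(pi/8), 0, cos(pi/8)]
```

Stepping `t` from 0 to 1 over successive frames moves a joint along the shortest rotational arc at constant angular speed, which is why slerp is the usual choice for blending joint rotations.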

And it's not just a window: put on a Meta Quest 3 and step inside the apartment in immersive VR, or flip to passthrough mixed reality and place him in your real living room — then command him with a wrist panel, the headset keyboard, or your voice.

✨ Features

🎙️ Voice & text control — speak in Chrome via the Web Speech API, or type in the command box (always available, any browser).
🧠 An actual, open-source LLM brain — slave-agent reasons over your command + the live body state and returns a structured Action. No proprietary CLI; it's plain, portable JavaScript that runs in the browser and Node.
🔑 Bring your own key — any LLM — pick OpenAI, Anthropic, OpenRouter, Groq, Ollama (local), or a custom OpenAI-compatible endpoint, paste your key, and go. Your key stays in your browser; swap providers/models from a settings screen.
🥽 VR & Mixed Reality on Quest 3 (WebXR) — enter immersive VR (you, inside the apartment at 1:1 scale) or passthrough MR (only the man, placed in your real room — aim + trigger to set him down). In-headset wrist command panel, keyboard, and voice. Runs in the Quest browser / as a PWA — no native app.
🤖 951 composable agent-skills — atomic primitives + composites in the agentskills.io format; a progressive-disclosure loader keeps every prompt bounded (~22k chars) even at ~950 skills.
🧍 Two switchable bodies — a precise procedural mannequin (the rig is its skeleton → exact finger-level control) and a photoreal Ready Player Me human (retargeted). Toggle in-game.
🦴 51-joint rig — full spine, limbs, and individual finger joints (5 fingers × 3 segments per hand) for real gestures: point, peace sign, thumbs-up, count.
🌍 Real physics — quaternion-slerp motion, a distance-driven walk cycle, collision (can't pass through walls or furniture) + A* doorway pathfinding, and grounded postures (stand, stand-on-a-table, sit, sit-on-floor, kneel, crouch, lie back/front/side) — the body always rests on its support, never floating or clipping.
💾 State memory — exact current pose + posture + position + recent command history, persisted to localStorage. Powers relative, incremental, undo, and repeat commands ("raise it higher", "now the other arm", "do that again").
🎨 Modern Three.js graphics — PBR materials, image-based lighting, and post-processing in a furnished open-plan apartment.
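The "collision + A* doorway pathfinding" feature can be pictured as A* over an occupancy grid of the floor plan. The sketch below assumes a 2D grid where `1` marks walls or furniture and `0` marks walkable floor; the grid representation and the `findPath` name are hypothetical, not the project's actual navigation code.

```javascript
// A* on a 4-connected occupancy grid (0 = walkable, 1 = wall or furniture).
// Hypothetical sketch; slaveX's real pathfinder may model the apartment differently.
function findPath(grid, start, goal) {
  const rows = grid.length;
  const cols = grid[0].length;
  const key = ([r, c]) => r * cols + c;
  // Manhattan distance is admissible (never overestimates) for unit-cost 4-connected moves.
  const h = ([r, c]) => Math.abs(r - goal[0]) + Math.abs(c - goal[1]);

  const open = [start]; // plain array as the open set; fine for small grids
  const cameFrom = new Map();
  const g = new Map([[key(start), 0]]);

  while (open.length > 0) {
    // Pop the open node with the lowest f = g + h.
    let best = 0;
    for (let i = 1; i < open.length; i++) {
      const fi = g.get(key(open[i])) + h(open[i]);
      const fb = g.get(key(open[best])) + h(open[best]);
      if (fi < fb) best = i;
    }
    const current = open.splice(best, 1)[0];

    if (current[0] === goal[0] && current[1] === goal[1]) {
      // Walk parent links back to the start.
      const path = [current];
      let k = key(current);
      while (cameFrom.has(k)) {
        const prev = cameFrom.get(k);
        path.unshift(prev);
        k = key(prev);
      }
      return path;
    }

    for (const [dr, dc] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const next = [current[0] + dr, current[1] + dc];
      if (next[0] < 0 || next[0] >= rows || next[1] < 0 || next[1] >= cols) continue;
      if (grid[next[0]][next[1]] === 1) continue; // blocked cell
      const tentative = g.get(key(current)) + 1;
      if (tentative < (g.get(key(next)) ?? Infinity)) {
        cameFrom.set(key(next), current);
        g.set(key(next), tentative);
        if (!open.some((n) => key(n) === key(next))) open.push(next);
      }
    }
  }
  return null; // goal unreachable
}

// Usage: two rooms split by a wall (column 2) with a single doorway at row 2.
const floor = [
  [0, 0, 1, 0, 0],
  [0, 0, 1, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 1, 0, 0],
];
const path = findPath(floor, [0, 0], [0, 4]);
console.log(path.length - 1); // 8 moves, routed through the doorway cell [2, 2]
```

Because the wall cells are never expanded, the route is forced through the doorway, which is the behavior the feature describes: he walks around furniture and through doors rather than through walls.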

📸 Screenshots


The apartment — open-plan living, kitchen, dining, bedroom & bath.
The mannequin — built straight from the rig; exact by construction.
The human — a photoreal Ready Player Me avatar, same poses retargeted.
Finger-level control — "point your left index finger down".

🚀 Quickstart

Prerequisites

Node ≥ 20 (ES modules).
Google Chrome for voice input (any browser works for typed commands).
An LLM API key for a provider: OpenAI, Anthropic, OpenRouter, Groq, a local Ollama (no key needed), or any custom OpenAI-compatible endpoint. (No cursor-agent or other CLI to install; you paste the key at startup.) Don't have one handy? Choose Play offline to use the built-in deterministic motions.

Run it (dev)

```bash
npm install
npm run dev          # Vite → http://localhost:5173   +   API → http://localhost:8787
```

Open http://localhost:5173. On first launch a settings screen appears — pick a provider + model, paste your key, and hit Save & Play (or Play offline). Then tap the mic (or type) and try wave.

💡 npm run dev automatically bundles the 951 skills into public/skills.json (the predev hook) so the browser brain has its full motion vocabulary.

⚡ Latency depends on the model you choose: a small, fast model (e.g. gpt-4o-mini, llama-3.1-8b-instant, or a local Ollama model) typically replies in ~1–4 s.

Run it (production build)

```bash
npm run build
npm start            # everything on http://localhost:8787
```

Health check: `curl http://localhost:8787/api/health` → `{"ok":true}`

⚙️ Bring your own key (providers)

The brain is yours. On first run (or anytime via the gear ⚙ button, bottom-left) the settings screen lets you:

Pick a provider + model — presets for OpenAI, Anthropic (Claude), OpenRouter, Groq, Ollama (local), and Custom, each with suggested models (or type your own model id).
Paste your API key — show/hide toggle, with a link to where to get one. Keyless providers (Ollama / a local custom endpoint) don't need one.
Test connection — a live ping so you know it works before you play.
Advanced — a custom Base URL, or "Route through the local server" (the proxy — PC only, avoids browser CORS / keeps your key off the page).
Play offline — skip the key entirely; the man uses built-in motions (wave, sit, walk, basic poses), fully on-device.
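For OpenAI-compatible providers, a BYO-key call typically boils down to a `POST` to `{baseUrl}/chat/completions` with a bearer token. The sketch below only builds the request (no network I/O). The `PRESETS` table, the `buildChatRequest` name, and the `settings` fields (`provider`, `model`, `apiKey`, `baseUrl`) are hypothetical; Anthropic's native API uses a different endpoint and headers, so it is deliberately left out.

```javascript
// Hypothetical provider presets (the providers' OpenAI-compatible API roots).
const PRESETS = {
  openai: "https://api.openai.com/v1",
  groq: "https://api.groq.com/openai/v1",
  openrouter: "https://openrouter.ai/api/v1",
  ollama: "http://localhost:11434/v1", // local, keyless
};

// Build (but do not send) a chat-completions request for an OpenAI-compatible endpoint.
// A custom baseUrl from the Advanced settings wins over the preset.
function buildChatRequest(settings, systemPrompt, userText) {
  const base = (settings.baseUrl || PRESETS[settings.provider] || "").replace(/\/+$/, "");
  if (!base) throw new Error(`No base URL for provider "${settings.provider}"`);
  const headers = { "Content-Type": "application/json" };
  // Keyless providers (e.g. a local Ollama) simply omit the Authorization header.
  if (settings.apiKey) headers.Authorization = `Bearer ${settings.apiKey}`;
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: settings.model,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userText },
        ],
      }),
    },
  };
}

// Usage (inside an async function):
//   const { url, init } = buildChatRequest(settings, prompt, "wave");
//   const res = await fetch(url, init);
```

A "Test connection" button could send the same request with a trivial prompt and report whether a 2xx response comes back.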

🔒 Your key stays in your browser (localStorage) and is sent only to the provider you choose. It's never logged, never put in a URL, and error messages are scrubbed of anything key-shaped.
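Scrubbing "anything key-shaped" from error messages can be approximated with a few regular expressions applied before a message is displayed. This is a hedged sketch: the patterns (an `sk-` style prefix, a `gsk_` style prefix, `Bearer` tokens, and long opaque runs) and the `scrubSecrets` name are assumptions, not the project's actual rules.

```javascript
// Redact key-shaped substrings before an error message is shown or logged.
// Patterns are illustrative assumptions, not slaveX's real list.
const SECRET_PATTERNS = [
  /\bsk-[A-Za-z0-9_-]{8,}/g,            // "sk-..." style keys (e.g. OpenAI, Anthropic, OpenRouter)
  /\bgsk_[A-Za-z0-9]{8,}/g,             // "gsk_..." style keys (e.g. Groq)
  /(Bearer\s+)[A-Za-z0-9._~+/-]+=*/gi,  // Authorization header values; keeps the "Bearer " prefix
  /\b[A-Za-z0-9_-]{32,}\b/g,            // any other long opaque token
];

function scrubSecrets(message, knownKey) {
  let out = String(message);
  // Always remove the exact key the user configured, if we have it.
  if (knownKey) out = out.split(knownKey).join("[redacted]");
  for (const re of SECRET_PATTERNS) {
    // Only the Bearer pattern has a capture group; for the others the 2nd arg is a numeric offset.
    out = out.replace(re, (match, prefix) =>
      typeof prefix === "string" ? `${prefix}[redacted]` : "[redacted]");
  }
  return out;
}
```

Removing the exact configured key first, then falling back to shape-based patterns, covers both the common case and keys echoed back in unexpected formats.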

🌐 Browser CORS: some providers (notably OpenAI) block direct browser calls. Anthropic, OpenRouter, Groq, and Ollama/custom generally work direct (great for the headset). On a PC, enable the proxy in Advanced (or set the server env keys below) for any provider. See [docs/brain.md](docs/brain.md#providers).

🥽 VR & Mixed Reality (Quest 3)

slaveX runs in WebXR, so the same app drops you inside the scene on a Meta Quest 3 (and other WebXR headsets) — no native app, just the Quest browser or an installed PWA.

1. Serve it to the headset over HTTPS. WebXR requires a secure context (https:// or localhost), and your PC's localhost isn't reachable from the headset — so expose the dev server over HTTPS on your network. Easiest is a tunnel:

```bash
npm run dev                                   # SPA :5173 (proxies /api → :8787)
cloudflared tunnel --url http://localhost:5173   # or: npx localtunnel --port 5173 / ngrok http 5173
```

Open the printed https://… URL in the Quest browser. (Or serve the LAN with a cert: npm run dev -- --host --https, then open https://<your-PC-IP>:5173.)

2. Tap Enter VR or Enter MR. The buttons appear only when the headset supports each mode.

VR — stand inside the virtual apartment at 1:1 scale, with controller lasers and comfort teleport + snap-turn locomotion.
MR — passthrough hides the apartment and shows only the man in your real room. Aim a controller at the floor and pull the trigger to place him where you want (he turns to face you, anchored in place).
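Placing him in MR comes down to intersecting the controller's aim ray with the floor and then turning him to face the viewer. A minimal sketch in plain vectors, assuming a Y-up world with the floor at `y = 0` and a yaw convention where yaw 0 faces +Z; the `placeOnFloor` name and both conventions are assumptions, and a real WebXR app would read the ray from the controller's pose each frame.

```javascript
// Intersect a controller ray with the floor plane y = 0 and compute a facing yaw.
// Assumes a Y-up world and yaw 0 = facing +Z (assumptions of this sketch).
function placeOnFloor(rayOrigin, rayDir, viewerPos) {
  const [ox, oy, oz] = rayOrigin;
  const [dx, dy, dz] = rayDir;
  // A ray that is level or pointing up never reaches the floor.
  if (dy >= 0) return null;
  const t = -oy / dy; // ray parameter where y reaches 0
  if (t < 0) return null; // origin is already below the floor
  const position = [ox + t * dx, 0, oz + t * dz];
  // Rotating +Z by yaw about +Y gives (sin yaw, 0, cos yaw); solve for the viewer direction.
  const yaw = Math.atan2(viewerPos[0] - position[0], viewerPos[2] - position[2]);
  return { position, yaw };
}

// Usage: controller 1.2 m up aiming forward and down; viewer's head at the origin.
const placement = placeOnFloor([0, 1.2, 0], [0, -1, -1], [0, 1.6, 0]);
console.log(placement.position, placement.yaw); // position [0, 0, -1.2], yaw 0
```

Only the horizontal offset feeds the yaw, so he turns to face you without tilting toward your head height.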

3. Command him in-headset. A wrist command panel (left controller) has quick commands + push-to-talk Speak + Type + a VR↔MR Switch + Exit. Type opens the Quest system keyboard (MR) or an in-VR keyboard (VR) for free text — all routed through the same brain as voice/text.

📖 Full headset guide + the on-device test checklist: [docs/vr-mr.md](docs/vr-mr.md).

🗣️ Try saying…

There are no fixed commands — these are just a taste. See more in [docs/commands.md](docs/commands.md).


`raise your right arm`	`point your left index finger down`
`wave`	`give me a thumbs up`
`sit on the couch`	`walk to the kitchen and wave`
`stand on the table`	`lie down on the floor`
`do a squat`	`moonwalk`
`dance`	`turn around`

Then build on it: raise it higher · now the other arm · do that again · undo — the body remembers its state.
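Relative commands like these only work because earlier state is kept around. One way to picture it is a bounded history persisted as JSON; the `PoseHistory` class, its entry shape (`{ command, pose }`), and the storage key are hypothetical, and any `localStorage`-like object with `getItem`/`setItem` can be passed in.

```javascript
// Bounded pose history supporting "undo" and "do that again".
// Hypothetical sketch; the real state shape in slaveX may differ.
class PoseHistory {
  constructor(storage, key = "pose-history", limit = 20) {
    this.storage = storage; // anything with getItem/setItem, e.g. window.localStorage
    this.key = key;
    this.limit = limit;
    const saved = storage.getItem(key);
    this.entries = saved ? JSON.parse(saved) : [];
  }

  save() {
    this.storage.setItem(this.key, JSON.stringify(this.entries));
  }

  // Record a command and the pose it produced, dropping the oldest entry past the limit.
  push(command, pose) {
    this.entries.push({ command, pose });
    if (this.entries.length > this.limit) this.entries.shift();
    this.save();
  }

  // "undo": drop the latest entry and return the pose before it (or null if none).
  undo() {
    this.entries.pop();
    this.save();
    const prev = this.entries[this.entries.length - 1];
    return prev ? prev.pose : null;
  }

  // "do that again": the most recently recorded command (or null).
  lastCommand() {
    const last = this.entries[this.entries.length - 1];
    return last ? last.command : null;
  }
}
```

Persisting after every change means a page reload restores the same history, which matches the "persisted to localStorage" behavior described in the features.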

🧠 How it works

Every command flows through one loop: command → brain → Action → engine → state → (next command).

```mermaid
flowchart LR
    A["🎙️ Voice / ⌨️ Text / 🥽 VR command"] --> B["slave-agent<br/>think(text, state, settings)"]
    B --> C{{"🧠 Your LLM (BYO-key)<br/>OpenAI · Claude · Groq · Ollama · …"}}
    C -->|"composes 951 skills"| D["Strict JSON Action<br/>joints · sequence · moveTo · posture"]
    B -.->|"no key / offline / error"| E["Deterministic fallback"]
    E --> D
    D --> F["⚙️ PoseEngine<br/>slerp · walk cycle · collision · A* · grounding"]
    F --> G["🧍 Character<br/>Mannequin or Human"]
    G --> H[("💾 State memory<br/>localStorage")]
    H -->|"feeds the next command"| B
```

1. The frontend calls `think(text, state, settings)`: `state` is a compact summary of the body's pose, posture, position, and recent history; `settings` is your BYO-key config.
2. `src/agent/prompt.js` builds the prompt: the rotation convention, exact joint names, the Action schema, worked examples, and the most relevant motion skills (progressive disclosure).
3. `src/agent/agent.js` calls your LLM (directly in the browser, or via the Node proxy if you opt in), extracts the JSON Action, and normalizes it permissively, so it never throws. With no key, or on any error, a deterministic fallback keeps the pipeline alive and the reply is tagged (brain offline).
4. The frontend PoseEngine interpolates the body toward the Action: quaternion-slerp poses, a procedural walk cycle, collision + A* pathfinding, posture grounding, look-at, and subtle idle life.
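LLM replies often wrap JSON in prose or code fences, so "extract the JSON Action and never throw" can be sketched as: find the first balanced `{...}` object, try to parse it, and otherwise return `null` so the caller can hand off to its deterministic fallback. The `extractAction` name is hypothetical; the project's own normalizer in `src/agent/agent.js` is not reproduced here.

```javascript
// Pull the first balanced JSON object out of free-form LLM text; never throws.
// Hypothetical sketch; returns null so the caller can use its deterministic fallback.
function extractAction(text) {
  const s = String(text ?? "");
  let start = s.indexOf("{");
  while (start !== -1) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < s.length; i++) {
      const ch = s[i];
      if (inString) {
        // Inside a JSON string: track escapes so \" does not end the string.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) {
        try {
          const obj = JSON.parse(s.slice(start, i + 1));
          if (obj && typeof obj === "object" && !Array.isArray(obj)) return obj;
        } catch (err) {
          // Not valid JSON; fall through and try the next "{".
        }
        break;
      }
    }
    start = s.indexOf("{", start + 1);
  }
  return null;
}
```

Tracking string state matters because motion descriptions can legitimately contain braces inside string values, which would otherwise end the object early.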

📖 Full design write-up: [ARCHITECTURE.md](ARCHITECTURE.md) · the brain in depth: [docs/brain.md](docs/brain.md).

🤖 The skill system

The brain's motion vocabulary lives in .cursor/skills/ as 951 skills following the agentskills.io standard. Each skill is a SKILL.md documenting the exact joints and calibrated example angles for one motion. A prebuild ([scripts/build-skills.js](scripts/build-skills.js)) bundles them into public/skills.json so the browser brain has the full library; the Node server reads them straight from disk.

Atomic skills — single-responsibility primitives (move-arm, move-leg, move-hand-and-fingers, walk, turn-and-face, sit-and-stand, balance-and-posture).
Composite skills — higher-level actions (dances, sports, exercises, whole-body gestures) that declare the atomics they're built from in a `subskills:` field.
Progressive disclosure — every prompt shows a bounded catalog of all skills, but injects the full body of only the few most relevant to your command (plus their building blocks). This keeps the prompt small and fast no matter how big the library grows.
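Progressive disclosure can be approximated with simple keyword overlap: list every skill in a compact catalog, fully expand only the top-k matches plus the atomics they declare in `subskills:`, and stop adding bodies at a character budget. The scoring, the `selectSkills` name, and the skill object shape (`name`, `description`, `body`, `subskills`) are assumptions; the project's loader may rank skills quite differently.

```javascript
// Choose which skill bodies to inline into a prompt, under a character budget.
// Skill shape { name, description, body, subskills } is an assumption of this sketch.
function selectSkills(command, skills, { topK = 3, budget = 22000 } = {}) {
  const words = new Set(command.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  // Score = number of words in name + description that also appear in the command.
  const score = (skill) =>
    `${skill.name} ${skill.description}`.toLowerCase().split(/[^a-z0-9]+/)
      .filter((w) => words.has(w)).length;

  const byName = new Map(skills.map((s) => [s.name, s]));
  const ranked = skills
    .map((s) => ({ s, score: score(s) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((x) => x.s);

  // Expand each chosen skill with the atomic skills it is built from.
  const chosen = [];
  for (const skill of ranked) {
    for (const name of [skill.name, ...(skill.subskills ?? [])]) {
      const s = byName.get(name);
      if (s && !chosen.includes(s)) chosen.push(s);
    }
  }

  // The catalog lists every skill in one short line; full bodies fill what is left of the budget.
  const catalog = skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
  let used = catalog.length;
  const bodies = [];
  for (const s of chosen) {
    if (used + s.body.length > budget) break;
    bodies.push(s.body);
    used += s.body.length;
  }
  return { catalog, bodies, names: chosen.slice(0, bodies.length).map((s) => s.name) };
}
```

For a command like "wave", a composite `wave` skill would be expanded together with its declared building blocks (for example `move-arm` and `move-hand-and-fingers`), while every other skill appears only as a one-line catalog entry.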

The brain composes these to do anything — point up becomes move-arm + move-hand-and-fingers. Add your own by dropping a folder in .cursor/skills/.

📖 Deep dive: [docs/skills.md](docs/skills.md).

🧍 Two bodies

Both bodies sit behind a single CharacterManager and are driven by identical canonical poses, so switching is instant and seamless.

Mannequin (default) — generated directly from src/character/rig.js (51 joints incl. fingers). The rig is its skeleton, so every joint is exact by construction. The precision reference.
Realistic human — a photoreal, fully-clothed Ready Player Me GLB, mapped onto the canonical rig via rotation-only retargeting (auto upright/facing alignment, scaled to ~1.8 m with feet on the floor). Only offered once it has loaded and is verified to have fingers + legs.

Bring your own avatar — bone names are auto-detected (Mixamo, Ready Player Me, many Blender/UE exports), so you can drop in any rigged GLB with finger + leg bones:

```bash
CHARACTER_GLB=/models/my-avatar.glb npm run dev
```
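Auto-detecting bone names across Mixamo, Ready Player Me, and Blender/UE exports is mostly string normalization plus an alias table. A hedged sketch: the canonical joint ids (`leftUpperArm`, `leftIndex1`, and so on), the alias lists, and the `detectBones` name are hypothetical, not the ids used in `src/character/rig.js`, and only a few left-side joints are shown.

```javascript
// Map a rig's bone names onto canonical joint ids via normalized aliases.
// Canonical ids and aliases are illustrative, not slaveX's real rig names.
const ALIASES = {
  leftUpperArm: ["leftarm", "upperarml"],           // Mixamo/RPM "LeftArm", UE "upperarm_l", Blender "upper_arm.L"
  leftLowerArm: ["leftforearm", "lowerarml", "forearml"],
  leftUpperLeg: ["leftupleg", "thighl"],
  leftLowerLeg: ["leftleg", "calfl", "shinl"],      // Mixamo "LeftLeg" is the shin
  leftIndex1: ["lefthandindex1", "index01l"],
};

// Strip a namespace prefix (e.g. "mixamorig:") and separators, then lowercase.
function normalizeBoneName(name) {
  return name
    .replace(/^.*[:|]/, "")        // "mixamorig:LeftArm" -> "LeftArm"
    .replace(/[^A-Za-z0-9]/g, "")  // "upper_arm.L" -> "upperarmL"
    .toLowerCase();
}

function detectBones(boneNames) {
  const lookup = new Map();
  for (const [canonical, aliases] of Object.entries(ALIASES)) {
    for (const alias of aliases) lookup.set(alias, canonical);
  }
  const mapping = {};
  for (const bone of boneNames) {
    const canonical = lookup.get(normalizeBoneName(bone));
    // Keep the first match so a duplicate alias cannot overwrite it.
    if (canonical && !(canonical in mapping)) mapping[canonical] = bone;
  }
  return mapping;
}
```

A loader could then refuse the "Realistic human" body when required entries (fingers, legs) are missing from the returned mapping, which matches the verification step described above.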

📖 Rig & retargeting details: [docs/rig.md](docs/rig.md).

⚙️ Configuration

The brain is normally configured in the browser (BYO-key settings screen) — no env vars required. The variables below are optional, for the server and avatars.

| Variable | Default | What it does |
| --- | --- | --- |
| `PORT` | `8787` | Port for the Express API / production server. |
| `LLM_PROVIDER` | `openai` | (Server proxy) Provider id used when routing commands through the server. |
| `LLM_MODEL` | (none) | (Server proxy) Model id. |
| `LLM_API_KEY` | (none) | (Server proxy) API key held server-side, so it never reaches the browser. |
| `LLM_BASE_URL` | (provider preset) | (Server proxy) Override the provider base URL (custom/self-hosted, or Ollama on another host). |
| `CHARACTER_GLB` | (bundled avatar) | Path or URL to a custom rigged GLB for the realistic human. |
| `BRAIN_SKILLS_NOCACHE` | (off) | Reload skills from disk on every request (Node) while authoring new ones. |

🧩 Server-side key example (the browser never sees the key): start the server with `LLM_PROVIDER=openai LLM_MODEL=gpt-4o-mini LLM_API_KEY=sk-… npm start`, then enable Advanced → Route through the local server in the settings screen.
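On the server side, the optional variables in the configuration table reduce to reading the environment with defaults. A minimal sketch that takes the environment as a parameter so it is easy to test; the `loadServerConfig` name, the returned shape, and which values count as "on" for `BRAIN_SKILLS_NOCACHE` are assumptions, while the defaults follow the table.

```javascript
// Read the optional server settings from an environment object, using the table's defaults.
// Hypothetical helper; the real server may structure this differently.
function loadServerConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "8787", 10);
  return {
    port: Number.isFinite(port) ? port : 8787, // fall back if PORT is not a number
    llm: {
      provider: env.LLM_PROVIDER || "openai",
      model: env.LLM_MODEL || null,
      apiKey: env.LLM_API_KEY || null,   // stays server-side; never sent to the browser
      baseUrl: env.LLM_BASE_URL || null, // null = use the provider preset
    },
    characterGlb: env.CHARACTER_GLB || null, // null = bundled avatar
    // Accepted "on" values are an assumption of this sketch.
    skillsNoCache: ["1", "true", "yes"].includes(String(env.BRAIN_SKILLS_NOCACHE ?? "").toLowerCase()),
  };
}
```

Passing the environment in, rather than reading `process.env` deep inside the server, keeps the defaults in one place and lets tests cover them without mutating global state.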

🛠️ Troubleshooting

Mic does nothing / "mic error" — voice needs Chrome + microphone permission on a secure context (localhost is fine). Click the mic and allow access, or just type. Other browsers show "voice not supported — use the text box".
Replies say (brain offline) — there's no API key, or the LLM call failed, so the deterministic fallback handled it. Open the gear (⚙) to add/fix your key, or switch providers.
"Network/CORS blocked" when saving/testing a key — your provider blocks direct browser calls (common with OpenAI). On a PC, enable Advanced → Route through the local server; on a headset, use a browser-callable provider (Anthropic / OpenRouter / Groq / Ollama) or Play offline.
No "Enter VR/MR" buttons — the page isn't on HTTPS/localhost, or the headset doesn't report support. Open the https://… URL in the Quest browser. See [docs/vr-mr.md](docs/vr-mr.md).
"Realistic human" toggle is disabled — the GLB failed to load or lacks finger/leg bones; the app runs mannequin-only. Check the console for the [CharacterHuman] skeleton report.

🗺️ Roadmap

Chained, multi-step commands ("walk to the kitchen, then sit down")
Streaming partial Actions for faster perceived response
Object interaction — pick up, carry, and use props
Hand-tracking input in VR/MR (no controllers)
An avatar gallery + drag-and-drop "bring your own" GLB
Shareable replays of command sequences
Touch / mobile controls
A growing, community-contributed skill library

🤝 Contributing

Contributions are welcome — new skills, bodies, providers, bug fixes, and docs. Adding a motion skill is as easy as dropping a SKILL.md into .cursor/skills/. See [CONTRIBUTING.md](CONTRIBUTING.md) to get started.

📄 License

⭐ If you like it, star it

If slaveX made you smile, a star helps other people find it — and motivates the roadmap above. Drop a ⭐ here.

🙏 Acknowledgements

Three.js — the 3D engine behind the rendering, rig, scene, and WebXR.
Ready Player Me — the photoreal humanoid avatar.
agentskills.io — the skill format the motion library follows.
The open LLM ecosystem — the providers (OpenAI, Anthropic, OpenRouter, Groq, Ollama, and any OpenAI-compatible endpoint) that any slave-agent user can plug in with their own key.
