VoiceClaw

The world's first Voice Coding Agent, powered by Gemini 3.1 Flash Live and Claude Code.

Voice-first AI pair programmer. Talk to your codebase through Gemini Live, with Claude Code doing the heavy lifting.

Watch the demo

You speak naturally, Gemini understands your intent and orchestrates the right tools, Claude reads and writes your code, and a second Gemini narrates what's happening in real time so you're never left in silence.

How It Works

Your Voice ──► Gemini Live API ──► Decides what to do
                                        │
                    ┌───────────────────┘
                    ▼
              Function Call (e.g. code_task, investigate_and_advise)
                    │
                    ▼
             Python Backend ──► Claude Code CLI
                    │                │
                    │          Reads/writes code,
                    │          runs commands
                    │                │
                    ▼                ▼
            Real-time events ──► Browser UI
                    │          (timeline, diffs, status)
                    ▼
           Narration Gemini ──► "Claude is reading the config file..."
                    │            (spoken commentary while Claude works)
                    ▼
            Result back to Gemini ──► Speaks response to you

Voice path is zero-latency — audio streams directly from your browser to Gemini's WebSocket. The Python backend only handles Claude orchestration.

Features

Voice-first — speak naturally in 10 languages, Gemini responds with voice
Real-time narration — a second Gemini session provides live commentary while Claude works, so you're never waiting in silence
Full code agent — Claude Code reads, writes, and runs commands in your project
Unified timeline — filterable event log showing Gemini thinking, tool calls, Claude activity, and file changes
Git checkpoints — automatic snapshots before code changes, with voice-controlled rewind
Session resumption — Gemini and Claude sessions persist across page reloads
Audio visualization — Gemini-inspired wave renderer driven by real mic/speaker energy
Screenshot support — paste or attach images for visual context
Model switching — change Claude model (opus/sonnet/haiku) and reasoning effort by voice
New Chat — clear all context and start a fresh session with one click

Prerequisites

Python 3.11+
Node.js 18+
Claude Code CLI — installed and authenticated (docs)
Gemini API key — free from Google AI Studio

Quick Start

# Clone
git clone https://github.com/YourUsername/VoiceClaw.git
cd VoiceClaw

# Python dependencies
pip install -r requirements.txt

# Environment
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

# Frontend dependencies
cd frontend && npm install && cd ..

# Run
python server.py

Open http://localhost:3333 in your browser, select a project folder, and start talking.

Development Mode

Run frontend and backend separately for hot-reload:

# Terminal 1 — backend
python server.py --port 3333

# Terminal 2 — frontend (proxies API to backend)
cd frontend && npm run dev

Frontend dev server runs at http://localhost:5173.

CLI Options

python server.py [--project /path/to/project] [--port 3333]

Flag	Default	Description
`--project`	None	Pre-select a project directory (skip picker UI)
`--port`	3333	Server port

Available Voice Commands

VoiceClaw exposes these tools to Gemini. You don't need to name them — just speak naturally:

What you say	Tool used	What happens
"What does the auth middleware do?"	`investigate_and_advise`	Claude reads the code and explains (read-only)
"Add a dark mode toggle"	`code_task`	Claude writes the code (asks for confirmation first)
"Read the package.json"	`read_file`	Returns file contents
"Run the tests"	`run_command`	Executes shell command (asks for confirmation)
"What files changed?"	`get_status`	Shows git status and recent activity
"Plan a refactor of the API layer"	`plan_task`	Claude plans without making changes
"Why is login broken?"	`debug_issue`	Claude diagnoses the bug
"Review my changes"	`review_changes`	Code review for quality and security
"Undo the last change"	`rewind`	Restores to a git checkpoint
"Switch to opus"	`set_claude_model`	Changes Claude model or reasoning effort
"Open localhost:3000"	`open_url`	Opens URL in a new browser tab

Architecture

VoiceClaw/
├── server.py                 # FastAPI backend, WebSocket relay, REST API
├── claude_runner.py          # Spawns "claude -p" CLI, parses stream-json
├── function_router.py        # Routes Gemini function calls to Claude
├── checkpoint.py             # Git checkpointing & session persistence
├── context_bridge.py         # Recent Claude activity context
├── gemini_session.py         # Ephemeral token generation
├── stt_service.py            # Audio transcription via Gemini
├── prompts/
│   ├── gemini_system.md      # Main Gemini system prompt
│   └── narration_system.md   # Narrator personality prompt
├── frontend/
│   ├── src/
│   │   ├── main.ts           # App orchestration
│   │   ├── gemini-connection.ts  # Gemini Live WebSocket
│   │   ├── narration-connection.ts  # Commentary session
│   │   ├── audio-manager.ts  # Mic capture & speaker playback
│   │   ├── backend-connection.ts  # Backend WebSocket
│   │   ├── ui.ts             # DOM, transcript, timeline
│   │   ├── wave-renderer.ts  # Audio visualization
│   │   ├── types.ts          # Interfaces & function declarations
│   │   └── debug-log.ts      # Ctrl+Shift+D to download logs
│   ├── index.html
│   ├── vite.config.ts
│   └── package.json
└── requirements.txt

Key Design Decisions

Browser → Gemini direct: Voice audio never touches the Python backend. The browser opens a WebSocket directly to Gemini Live API using an ephemeral token. This keeps latency minimal.
Claude via CLI: Instead of the API, VoiceClaw spawns claude -p as a subprocess. This gives full access to Claude Code's tool suite (file read/write, shell, LSP) without reimplementing any of it.
Dual Gemini sessions: The main session handles conversation and tool orchestration. A separate narration session provides spoken commentary while Claude works, so the user gets continuous feedback.
Function routing: Different functions give Claude different permission levels. investigate_and_advise is read-only, code_task gets full edit access, run_command gets shell access.

Debugging

Press Ctrl+Shift+D in the browser to download a detailed event log. Logs include timestamped entries for every Gemini message, audio chunk, Claude event, and function call.

Environment Variables

Variable	Required	Description
`GEMINI_API_KEY`	Yes	API key from Google AI Studio

Community

Discord — join for help, feature discussions, and demos
LinkedIn — follow for updates

Disclaimer

VoiceClaw is an open-source project hosted exclusively on this GitHub repository. We do NOT own, operate, or are affiliated with any website using the "VoiceClaw" name, including but not limited to voiceclaw.io or any similar domains. The only official source for VoiceClaw is this repository: github.com/yogen-ghodke-113/VoiceClaw. Exercise caution with any other source claiming to be VoiceClaw.

License

Apache License 2.0 — see LICENSE for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VoiceClaw

How It Works

Features

Prerequisites

Quick Start

Development Mode

CLI Options

Available Voice Commands

Architecture

Key Design Decisions

Debugging

Environment Variables

Community

Disclaimer

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
assets		assets
frontend		frontend
prompts		prompts
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
checkpoint.py		checkpoint.py
claude_runner.py		claude_runner.py
context_bridge.py		context_bridge.py
function_router.py		function_router.py
gemini_session.py		gemini_session.py
requirements.txt		requirements.txt
server.py		server.py
stt_service.py		stt_service.py

Folders and files

Latest commit

History

Repository files navigation

VoiceClaw

How It Works

Features

Prerequisites

Quick Start

Development Mode

CLI Options

Available Voice Commands

Architecture

Key Design Decisions

Debugging

Environment Variables

Community

Disclaimer

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages