voice-bridge

Modular voice-to-AI-to-voice bridge. Listens in a real-time voice channel, transcribes speech locally, sends text to an AI backend, and plays back the AI response as speech.

First implementation targets Discord, but the architecture cleanly separates platform-specific logic from core voice/AI logic for future transports.

┌─────────────────────────────────────────────────────────┐
│                    voice-bridge                         │
│                                                         │
│  ┌───────────┐    ┌─────┐    ┌────┐    ┌─────┐          │
│  │ Transport │───▶│ STT │───▶│ AI │───▶│ TTS │──┐       │
│  │ (Discord) │    └─────┘    └────┘    └─────┘  │       │
│  │           │◀─────────────────────────────────┘       │
│  └───────────┘                                          │
│       │              │          │          │            │
│       ▼              ▼          ▼          ▼            │
│  discord.js     whisper.cpp  OpenClaw   Piper TTS       │
│  @discordjs/    (HTTP)       (HTTP)     (HTTP)          │
│  voice                                                  │
└─────────────────────────────────────────────────────────┘

Prerequisites

- Node.js 22+
- pnpm
- Docker and Docker Compose (for the whisper and Piper servers)
- A Discord bot token

Discord Bot Setup

1. Go to the Discord Developer Portal.
2. Create a new application.
3. Go to Bot, click Reset Token, and copy the token (this is DISCORD_TOKEN).
4. Enable these Privileged Gateway Intents:
   - Message Content Intent
5. Go to OAuth2 → URL Generator and select:
   - Scopes: bot, applications.commands
   - Bot Permissions: Connect, Speak, Use Voice Activity
6. Use the generated URL to invite the bot to your server.
7. Copy your Application ID (Client ID) and Guild ID (Server ID); these become DISCORD_CLIENT_ID and DISCORD_GUILD_ID.

Configuration

Copy .env.example to .env and fill in your values:

cp .env.example .env

Variable	Required	Description
`DISCORD_TOKEN`	Yes	Bot token
`DISCORD_CLIENT_ID`	Yes	Application/Client ID
`DISCORD_GUILD_ID`	Yes	Server ID for slash commands
`DISCORD_LISTEN_USER_ID`	No	Only respond to this user
`WHISPER_URL`	No	Whisper server URL (default: `http://whisper:8080`)
`PIPER_URL`	No	Piper server URL (default: `http://piper:5000`)
`OPENCLAW_URL`	Yes	OpenClaw API base URL
`OPENCLAW_API_KEY`	Yes	OpenClaw API key
`OPENCLAW_MODEL`	No	Model identifier (default: `default`)
`OPENCLAW_SYSTEM_PROMPT`	No	Custom system prompt
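
As a rough illustration of how these variables could be validated at startup, the sketch below reads them from a plain object, applies the defaults from the table, and reports all missing required variables in one error. The `BridgeConfig` shape and the `loadConfig` name are assumptions made for this example; the actual loader in src/config.ts may be organized differently.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from src/config.ts.
interface BridgeConfig {
  discordToken: string;
  discordClientId: string;
  discordGuildId: string;
  discordListenUserId: string | undefined;
  whisperUrl: string;
  piperUrl: string;
  openclawUrl: string;
  openclawApiKey: string;
  openclawModel: string;
  openclawSystemPrompt: string | undefined;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): BridgeConfig {
  const missing: string[] = [];

  // Treat unset and blank values the same way.
  const read = (name: string): string | undefined => {
    const value = env[name]?.trim();
    return value ? value : undefined;
  };

  // Record every missing required variable so they can be reported together.
  const required = (name: string): string => {
    const value = read(name);
    if (value === undefined) missing.push(name);
    return value ?? "";
  };

  const config: BridgeConfig = {
    discordToken: required("DISCORD_TOKEN"),
    discordClientId: required("DISCORD_CLIENT_ID"),
    discordGuildId: required("DISCORD_GUILD_ID"),
    discordListenUserId: read("DISCORD_LISTEN_USER_ID"),
    whisperUrl: read("WHISPER_URL") ?? "http://whisper:8080",
    piperUrl: read("PIPER_URL") ?? "http://piper:5000",
    openclawUrl: required("OPENCLAW_URL"),
    openclawApiKey: required("OPENCLAW_API_KEY"),
    openclawModel: read("OPENCLAW_MODEL") ?? "default",
    openclawSystemPrompt: read("OPENCLAW_SYSTEM_PROMPT"),
  };

  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return config;
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.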

Quick Start (Docker Compose)

cp .env.example .env
# Edit .env with your credentials

docker compose up --build

This starts:

- voice-bridge: the Node.js bot
- whisper: whisper.cpp HTTP server (speech-to-text)
- piper: Piper TTS server (text-to-speech)

Local Development

The default WHISPER_URL and PIPER_URL use the Docker Compose service hostnames (whisper, piper), which normally resolve only inside the Compose network. When running the bot directly on the host, set both variables to addresses where those servers are reachable.

pnpm install
pnpm build

# Register slash commands (one-time)
pnpm deploy-commands

# Run the bot
pnpm start

# Or watch mode for development
pnpm dev

Slash Commands

Command	Description
`/join`	Join your current voice channel
`/leave`	Leave the voice channel
`/status`	Show connection status

Running Tests

pnpm test

Architecture

The project is built around four core interfaces that are completely platform-agnostic:

- VoiceTransport: join/leave voice, receive audio streams, play audio back
- SpeechToText: audio buffers → text
- TextToSpeech: text → audio buffers
- AIBackend: text in → text out

The Pipeline orchestrator wires these together: transport → STT → AI → TTS → transport playback.
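
To make the flow concrete, here is a minimal sketch of a single turn through the four stages, using in-memory stand-ins in place of Discord, whisper.cpp, OpenClaw, and Piper. The method names follow the interfaces listed above; `handleUtterance`, `Stages`, `AudioPlayer`, and the use of `Uint8Array` in place of `Buffer` are assumptions for illustration, and the real Pipeline in src/core/pipeline.ts may handle streaming, queuing, and errors differently.

```typescript
// Uint8Array stands in for Node's Buffer here (Buffer is a Uint8Array subclass).
type AudioBytes = Uint8Array;

interface SpeechToText { transcribe(audio: AudioBytes): Promise<string>; }
interface AIBackend { chat(userMessage: string): Promise<string>; }
interface TextToSpeech { synthesize(text: string): Promise<AudioBytes>; }
// Only the playback half of VoiceTransport is needed for a single turn.
interface AudioPlayer { playAudio(audio: AudioBytes): Promise<void>; }

interface Stages {
  stt: SpeechToText;
  ai: AIBackend;
  tts: TextToSpeech;
  player: AudioPlayer;
}

// Hypothetical helper: one utterance in, one spoken reply out.
async function handleUtterance(audio: AudioBytes, stages: Stages): Promise<string | null> {
  const transcript = (await stages.stt.transcribe(audio)).trim();
  // Assumption: silence or noise yields an empty transcript, which is skipped.
  if (transcript === "") return null;
  const reply = await stages.ai.chat(transcript);
  const speech = await stages.tts.synthesize(reply);
  await stages.player.playAudio(speech);
  return reply;
}

// In-memory stand-ins so the flow can run without any external service.
const played: AudioBytes[] = [];
const demoStages: Stages = {
  stt: { transcribe: async (audio) => (audio.length > 0 ? "hello" : "") },
  ai: { chat: async (text) => `You said: ${text}` },
  tts: { synthesize: async (text) => new TextEncoder().encode(text) },
  player: { playAudio: async (audio) => { played.push(audio); } },
};
```

Because each stage is reached only through its interface, any stand-in can be replaced by a real implementation without touching `handleUtterance`.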

Project Structure

src/
├── core/
│   ├── interfaces.ts    # Platform-agnostic interfaces
│   ├── pipeline.ts      # Pipeline orchestrator
│   └── index.ts         # Core exports
├── transports/
│   └── discord/
│       ├── transport.ts  # Discord VoiceTransport implementation
│       ├── commands.ts   # Slash command definitions + handlers
│       ├── deploy-commands.ts  # One-time command registration
│       └── index.ts
├── stt/
│   └── whisper.ts       # Whisper.cpp STT implementation
├── tts/
│   └── piper.ts         # Piper TTS implementation
├── ai/
│   └── openclaw.ts      # OpenClaw AI implementation
├── config.ts            # Environment config loader
└── index.ts             # Entry point

Adding a New Transport

Implement the VoiceTransport interface from src/core/interfaces.ts:

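import type { VoiceTransport, UserAudioStream } from "./core/interfaces.js";

class MyTransport implements VoiceTransport {
  private connected = false;

  // Connect to the platform's voice channel, then set this.connected = true.
  async join(channelId: string): Promise<void> { /* ... */ }
  // Disconnect, then set this.connected = false.
  async leave(): Promise<void> { /* ... */ }
  // Register the callback that receives each speaking user's audio.
  onUserAudio(handler: (userAudio: UserAudioStream) => void): void { /* ... */ }
  // Play synthesized speech back into the channel.
  async playAudio(audio: Buffer): Promise<void> { /* ... */ }
  isConnected(): boolean { return this.connected; }
}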

Then wire it into a Pipeline instance — the STT, AI, and TTS modules work unchanged.
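
As one possible starting point, the sketch below shows an in-memory transport that can stand in for a real platform during tests. The `VoiceTransport` methods mirror the skeleton above, but the `UserAudioStream` shape (`userId` plus raw bytes) is an assumption, as are `LoopbackTransport`, its `emitUserAudio` helper, and the use of `Uint8Array` in place of `Buffer`. The real definitions live in src/core/interfaces.ts and may differ.

```typescript
// Uint8Array stands in for Node's Buffer here (Buffer is a Uint8Array subclass).
type AudioBytes = Uint8Array;

// Assumed shape; the real UserAudioStream may carry a stream instead of raw bytes.
interface UserAudioStream {
  userId: string;
  audio: AudioBytes;
}

// Mirrors the method list from the skeleton above.
interface VoiceTransport {
  join(channelId: string): Promise<void>;
  leave(): Promise<void>;
  onUserAudio(handler: (userAudio: UserAudioStream) => void): void;
  playAudio(audio: AudioBytes): Promise<void>;
  isConnected(): boolean;
}

class LoopbackTransport implements VoiceTransport {
  // Everything "played" is kept so a test can inspect it.
  readonly played: AudioBytes[] = [];
  private channelId: string | null = null;
  private handlers: Array<(userAudio: UserAudioStream) => void> = [];

  async join(channelId: string): Promise<void> {
    this.channelId = channelId;
  }

  async leave(): Promise<void> {
    this.channelId = null;
  }

  onUserAudio(handler: (userAudio: UserAudioStream) => void): void {
    this.handlers.push(handler);
  }

  async playAudio(audio: AudioBytes): Promise<void> {
    if (!this.isConnected()) throw new Error("Cannot play audio before join()");
    this.played.push(audio);
  }

  isConnected(): boolean {
    return this.channelId !== null;
  }

  // Test helper (not part of VoiceTransport): simulate a user speaking.
  emitUserAudio(userId: string, audio: AudioBytes): void {
    if (!this.isConnected()) return;
    for (const handler of this.handlers) handler({ userId, audio });
  }
}
```

A test can call `join`, register a handler with `onUserAudio`, trigger it with `emitUserAudio`, and then assert on `played`, all without a network connection.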

Adding a New STT/TTS/AI Backend

Implement the corresponding interface:

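import type { SpeechToText, TextToSpeech, AIBackend } from "./core/interfaces.js";

// STT: audio in, transcript out
class MySTT implements SpeechToText {
  async transcribe(audio: Buffer): Promise<string> { throw new Error("not implemented"); }
}

// TTS: text in, synthesized audio out
class MyTTS implements TextToSpeech {
  async synthesize(text: string): Promise<Buffer> { throw new Error("not implemented"); }
}

// AI: user message in, reply text out
class MyAI implements AIBackend {
  async chat(userMessage: string): Promise<string> { throw new Error("not implemented"); }
}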

License

ISC
