🎙️ Voice Booking Agent

A production-oriented voice booking system that converts user speech into safe, deterministic service bookings. Built for rapid iteration without sacrificing correctness.

Voice is treated as just another input channel—never as an authority.

🧱 Tech Stack

Runtime: Bun + Node.js
Backend: Express + TypeScript
Database: PostgreSQL
ORM: Prisma
Cache / State: Redis
Speech-to-Text (STT): Pluggable (Google STT / Bhashini)
Intent Extraction: LLM (guard-railed)
Text-to-Speech (TTS): External provider / device TTS

🎯 Core Principles

LLMs do not execute business logic
All AI outputs are validated, structured, and rejectable
Conversation state is explicit and externalized
Booking APIs are idempotent and shared
Confirmation is mandatory before booking

If any of these are violated, it’s a bug—not a feature.

🏗️ High-Level Flow

User Voice
→ Audio Upload
→ Speech-to-Text
→ Intent Extraction (LLM)
→ Schema Validation
→ Conversation State (Redis)
→ Booking Engine
→ Confirmation
→ Text-to-Speech
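
A minimal sketch of how the flow above can be wired so that every stage is allowed to stop the pipeline and hand a prompt back to the user. The names StageResult, Stages and runPipeline are hypothetical, and the stages are synchronous here only to keep the example short; real STT and LLM calls would be async.

```typescript
// Each stage either yields a value for the next stage or stops the pipeline
// with a prompt that is spoken back to the user.
type StageResult<T> =
  | { ok: true; value: T }
  | { ok: false; prompt: string };

// Hypothetical shape of a validated intent (only one slot shown).
interface ValidIntent {
  service: string;
}

// Hypothetical stage interfaces; real adapters would wrap STT and LLM providers.
interface Stages {
  transcribe(audio: Uint8Array): StageResult<string>;
  extractIntent(transcript: string): StageResult<unknown>;
  validate(raw: unknown): StageResult<ValidIntent>;
}

function runPipeline(audio: Uint8Array, stages: Stages): string {
  const transcript = stages.transcribe(audio);
  if (!transcript.ok) return transcript.prompt;

  // The LLM output stays `unknown` until schema validation has accepted it.
  const raw = stages.extractIntent(transcript.value);
  if (!raw.ok) return raw.prompt;

  const intent = stages.validate(raw.value);
  if (!intent.ok) return intent.prompt;

  // Nothing is booked here: the pipeline ends by asking for confirmation.
  return `You asked to book ${intent.value.service}. Should I go ahead?`;
}
```

Keeping the LLM result typed as unknown until validation passes makes it hard to use unvalidated output by accident.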

📂 Project Structure

src/
  app.ts                # Express app
  server.ts             # Bootstraps server
  config/               # Env & config loaders

  voice/
    voice.routes.ts     # Voice entrypoints
    voice.controller.ts
    stt/                # Speech-to-text adapters
    intent/             # LLM intent extraction
    state/              # Redis conversation state
    responses/          # Voice/text responses

  booking/
    booking.service.ts  # Core booking logic

  prisma/
    schema.prisma

  infra/
    redis.ts
    db.ts

No “ai” folder. This is product code, not a demo.

🚀 Getting Started

1. Install dependencies

bun install

2. Run the server

bun run dev

Server starts on http://localhost:4006 (configurable via PORT).

3. Test the voice endpoint (upload audio)

curl -X POST http://localhost:4006/voice/audio \
  -H "X-Conversation-Id: test-123" \
  -F "audio=@./sample.wav"

🔐 Google Speech-to-Text (Service Account) Setup

Step 1: Create a service account + download JSON credentials

Go to Google Cloud Console
Select your project (or create one)
Navigate to IAM & Admin → Service Accounts
Click Create Service Account
Give it a name (example: speech-to-text-service)
Grant the Cloud Speech Client role (enough to send recognition requests); use Cloud Speech Administrator only if the account must also manage Speech-to-Text resources
Click Done
Click on the created service account
Go to the Keys tab
Click Add Key → Create New Key
Choose JSON
Download the JSON file

Step 2: Configure authentication

Set GOOGLE_APPLICATION_CREDENTIALS (recommended for local dev):

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"

Or in your .env file:

GOOGLE_APPLICATION_CREDENTIALS=/absolute/path/to/google-credentials.json

Step 3: Enable the API

Go to Google Cloud Console
Navigate to APIs & Services → Library
Search for Cloud Speech-to-Text API
Click Enable

Step 4: Restart the server

bun run dev

Quick setup checklist (copy/paste)

# 1. Place your service account JSON file in your project
mv ~/Downloads/your-service-account-key.json ./google-credentials.json

# 2. Add to .env
echo "GOOGLE_APPLICATION_CREDENTIALS=$(pwd)/google-credentials.json" >> .env

# 3. Add to .gitignore to avoid committing credentials
echo "google-credentials.json" >> .gitignore

# 4. Restart your server
bun run dev

4. Environment variables

DATABASE_URL=postgresql://...
REDIS_URL=redis://...
STT_API_KEY=...
LLM_API_KEY=...
VOICE_BOOKING_ENABLED=true

5. Database setup

bun prisma generate
bun prisma migrate dev

6. Restart the server

bun run dev
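
The environment variables listed above, plus PORT, could be checked once at startup so that a missing value fails fast instead of surfacing mid-request. This is a sketch only: AppConfig and loadConfig are hypothetical names, and the environment is passed in as a plain object (for example process.env) rather than read from a global.

```typescript
interface AppConfig {
  port: number;
  databaseUrl: string;
  redisUrl: string;
  sttApiKey: string;
  llmApiKey: string;
  voiceBookingEnabled: boolean;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Throws with the variable name so misconfiguration is obvious at boot.
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  // 4006 is the default port documented in this README.
  const port = Number(env["PORT"] ?? "4006");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${String(env["PORT"])}`);
  }

  return {
    port,
    databaseUrl: required("DATABASE_URL"),
    redisUrl: required("REDIS_URL"),
    sttApiKey: required("STT_API_KEY"),
    llmApiKey: required("LLM_API_KEY"),
    // Assumption: only the exact string "true" enables voice booking.
    voiceBookingEnabled: env["VOICE_BOOKING_ENABLED"] === "true",
  };
}
```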

🔊 Voice API

Upload audio

POST /voice/audio
Headers:
  X-Conversation-Id: <uuid>
Body:
  multipart/form-data, file field "audio" (audio/wav | audio/webm)

Behavior

Accepts audio up to 15 seconds long
Returns 202 Accepted
Triggers async voice pipeline
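
The acceptance rules above can be expressed as a pure check that runs before the async pipeline is triggered. This is a sketch under assumptions: checkUpload and UploadMeta are hypothetical names, the audio duration is assumed to have been measured already, and the rejection status codes (400, 413, 415) are illustrative choices, since the README only specifies 202 for accepted uploads.

```typescript
const ALLOWED_TYPES: readonly string[] = ["audio/wav", "audio/webm"];
const MAX_DURATION_SECONDS = 15;

interface UploadMeta {
  conversationId: string | undefined; // value of the X-Conversation-Id header
  contentType: string;                // MIME type of the uploaded audio
  durationSeconds: number;            // assumed to be measured by the server
}

type UploadDecision =
  | { status: 202 }
  | { status: 400 | 413 | 415; error: string };

function checkUpload(meta: UploadMeta): UploadDecision {
  if (meta.conversationId === undefined || meta.conversationId.trim() === "") {
    return { status: 400, error: "X-Conversation-Id header is required" };
  }

  // Ignore parameters such as "; codecs=opus" when comparing MIME types.
  const mime = (meta.contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (!ALLOWED_TYPES.includes(mime)) {
    return { status: 415, error: `Unsupported audio type: ${mime}` };
  }

  if (!(meta.durationSeconds > 0)) {
    return { status: 400, error: "Audio is empty or unreadable" };
  }
  if (meta.durationSeconds > MAX_DURATION_SECONDS) {
    return { status: 413, error: `Audio exceeds ${MAX_DURATION_SECONDS} seconds` };
  }

  // Accepted: the caller can now enqueue the voice pipeline and reply 202.
  return { status: 202 };
}
```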

🧠 Intent Extraction Contract

LLM must return ONLY JSON:

{
  "intent": "BOOK_SERVICE",
  "slots": {
    "service": "facial",
    "date": "2026-02-01",
    "time": "evening",
    "location": "near_me"
  },
  "confidence": 0.87
}

If:

confidence is low
fields are ambiguous
schema validation fails

→ system asks for clarification.

No guessing. Ever.
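
A minimal sketch of what "validated, structured, and rejectable" can look like for the contract above, written without a schema library to stay self-contained. parseIntent, IntentResult and the 0.7 threshold are assumptions for illustration, as is the choice to accept a partial set of slots (missing ones would be collected in later turns).

```typescript
const SLOT_NAMES = ["service", "date", "time", "location"] as const;
type SlotName = (typeof SLOT_NAMES)[number];
type Slots = Partial<Record<SlotName, string>>;

interface BookingIntent {
  intent: "BOOK_SERVICE";
  slots: Slots;
  confidence: number;
}

type IntentResult =
  | { kind: "accepted"; intent: BookingIntent }
  | { kind: "clarify"; reason: string };

const MIN_CONFIDENCE = 0.7; // hypothetical threshold

const clarify = (reason: string): IntentResult => ({ kind: "clarify", reason });

function parseIntent(llmOutput: string): IntentResult {
  let data: unknown;
  try {
    data = JSON.parse(llmOutput);
  } catch {
    return clarify("output is not valid JSON");
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return clarify("output is not a JSON object");
  }
  const obj = data as Record<string, unknown>;

  if (obj["intent"] !== "BOOK_SERVICE") return clarify("unknown intent");

  const confidence = obj["confidence"];
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return clarify("confidence must be a number between 0 and 1");
  }

  const rawSlots = obj["slots"];
  if (typeof rawSlots !== "object" || rawSlots === null || Array.isArray(rawSlots)) {
    return clarify("slots must be an object");
  }

  const slots: Slots = {};
  for (const [key, value] of Object.entries(rawSlots as Record<string, unknown>)) {
    if (!(SLOT_NAMES as readonly string[]).includes(key)) return clarify(`unknown slot: ${key}`);
    if (typeof value !== "string" || value.trim() === "") return clarify(`invalid value for ${key}`);
    slots[key as SlotName] = value;
  }

  // Dates must already be normalized to YYYY-MM-DD, as in the example above.
  if (slots.date !== undefined && !/^\d{4}-\d{2}-\d{2}$/.test(slots.date)) {
    return clarify("date is not in YYYY-MM-DD format");
  }

  if (confidence < MIN_CONFIDENCE) return clarify("confidence below threshold");

  return { kind: "accepted", intent: { intent: "BOOK_SERVICE", slots, confidence } };
}
```

Every rejection path returns a reason rather than a best-effort intent, so the caller can only ask the user to clarify.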

🗃️ Conversation State (Redis)

Each conversation is tracked explicitly:

{
  state: "COLLECTING" | "CONFIRMING" | "BOOKED",
  slots: { service?, date?, time?, location? },
  expiresAt
}

TTL: 15 minutes

Stateless APIs. Stateful experience.
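
The state machine above can be sketched as pure functions over the stored record, which keeps the transitions testable without a Redis instance. Several details here are assumptions: all four slots are treated as required before moving to CONFIRMING, the 15-minute TTL is refreshed on every update, and expiresAt is an epoch timestamp in milliseconds. In Redis the same TTL would normally be enforced with key expiry as well.

```typescript
type State = "COLLECTING" | "CONFIRMING" | "BOOKED";

const SLOT_NAMES = ["service", "date", "time", "location"] as const;
type Slots = Partial<Record<(typeof SLOT_NAMES)[number], string>>;

interface Conversation {
  state: State;
  slots: Slots;
  expiresAt: number; // epoch milliseconds (assumption)
}

const TTL_MS = 15 * 60 * 1000;

function newConversation(now: number): Conversation {
  return { state: "COLLECTING", slots: {}, expiresAt: now + TTL_MS };
}

function isExpired(conv: Conversation, now: number): boolean {
  return now >= conv.expiresAt;
}

// Merge newly extracted slots; move to CONFIRMING once nothing is missing.
function applySlots(conv: Conversation, update: Slots, now: number): Conversation {
  if (conv.state === "BOOKED") return conv; // a booked conversation is final
  const slots: Slots = { ...conv.slots };
  for (const name of SLOT_NAMES) {
    const value = update[name];
    if (value !== undefined) slots[name] = value;
  }
  const complete = SLOT_NAMES.every((name) => slots[name] !== undefined);
  return { state: complete ? "CONFIRMING" : "COLLECTING", slots, expiresAt: now + TTL_MS };
}

// Only an explicit confirmation in the CONFIRMING state may reach BOOKED.
function confirm(conv: Conversation, now: number): Conversation {
  if (conv.state !== "CONFIRMING") {
    throw new Error(`Cannot confirm a conversation in state ${conv.state}`);
  }
  return { ...conv, state: "BOOKED", expiresAt: now + TTL_MS };
}
```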

📅 Booking Engine

Shared with UI bookings
Idempotent via Idempotency-Key
Reservation lock with TTL
Voice cannot bypass confirmation

Voice calls the same APIs your app uses.
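
A minimal sketch of the idempotency behaviour: a retried request that carries the same Idempotency-Key returns the original booking instead of creating a second one. BookingEngine here is a hypothetical in-memory stand-in; a real implementation would have to persist keys in shared storage and record them atomically with the booking, and the reservation lock is not modelled.

```typescript
interface BookingRequest {
  service: string;
  date: string;
  time: string;
  location: string;
}

interface Booking extends BookingRequest {
  id: string;
}

class BookingEngine {
  // In-memory stand-in for persistent idempotency storage (assumption).
  private readonly byKey = new Map<string, Booking>();
  private nextId = 1;

  book(idempotencyKey: string, request: BookingRequest): Booking {
    // A retry with a known key returns the stored result unchanged.
    const existing = this.byKey.get(idempotencyKey);
    if (existing !== undefined) return existing;

    const booking: Booking = { id: `bk_${this.nextId++}`, ...request };
    this.byKey.set(idempotencyKey, booking);
    return booking;
  }

  get count(): number {
    return this.byKey.size;
  }
}
```

Because voice and UI share this entry point, a dropped connection or a repeated utterance that replays the same key cannot double-book.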

🌍 Language Support

Hindi ✅
Assamese ❌ (not supported yet; may be added later)
English ✅

Language is:

Detected via STT
Treated identically in intent pipeline
Never inferred from location

🧯 Safety & Kill Switches

Feature flag: VOICE_BOOKING_ENABLED
Confidence thresholds on STT + intent
Mandatory confirmation step
Full transcript + decision logging

Voice can be disabled instantly without redeploy.
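
The flag and thresholds above can be combined into a single gate that every voice turn passes through. This is a sketch with hypothetical names (SafetyConfig, checkSafetyGate) and hypothetical threshold values. For the switch to work without a redeploy, the config would need to be read at request time from a runtime source rather than baked in at build time, which is an assumption about the deployment.

```typescript
interface SafetyConfig {
  voiceBookingEnabled: boolean;
  minSttConfidence: number;    // hypothetical threshold
  minIntentConfidence: number; // hypothetical threshold
}

interface TurnScores {
  sttConfidence: number;
  intentConfidence: number;
}

type GateDecision =
  | { allowed: true }
  | { allowed: false; reason: "voice_disabled" | "low_stt_confidence" | "low_intent_confidence" };

function checkSafetyGate(config: SafetyConfig, scores: TurnScores): GateDecision {
  // The kill switch is checked first so it overrides everything else.
  if (!config.voiceBookingEnabled) {
    return { allowed: false, reason: "voice_disabled" };
  }
  if (scores.sttConfidence < config.minSttConfidence) {
    return { allowed: false, reason: "low_stt_confidence" };
  }
  if (scores.intentConfidence < config.minIntentConfidence) {
    return { allowed: false, reason: "low_intent_confidence" };
  }
  return { allowed: true };
}
```

Returning a reason code alongside the decision gives the decision log something concrete to record for each rejected turn.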

🧪 What This Is (and Is Not)

✅ Production-ready architecture
✅ Designed for scale and failure
✅ Interview-grade system design

❌ Not a chatbot
❌ Not “AI decides” logic
❌ Not a voice toy

📈 Next Extensions

WebSocket audio streaming
Latency budgets & tracing
Voice analytics (drop-offs per state)
Staff-side voice booking
