Pushkarmondal/voice-booking


🎙️ Voice Booking Agent

A production-oriented voice booking system that converts user speech into safe, deterministic service bookings. Built for rapid iteration without sacrificing correctness.

Voice is treated as just another input channel, never as an authority.


🧱 Tech Stack

  • Runtime: Bun + Node.js
  • Backend: Express + TypeScript
  • Database: PostgreSQL
  • ORM: Prisma
  • Cache / State: Redis
  • Speech-to-Text (STT): Pluggable (Google STT / Bhashini)
  • Intent Extraction: LLM (guard-railed)
  • Text-to-Speech (TTS): External provider / device TTS

🎯 Core Principles

  • LLMs do not execute business logic
  • All AI outputs are validated, structured, and rejectable
  • Conversation state is explicit and externalized
  • Booking APIs are idempotent and shared
  • Confirmation is mandatory before booking

If any of these are violated, it's a bug, not a feature.


🏗️ High-Level Flow

User Voice
→ Audio Upload
→ Speech-to-Text
→ Intent Extraction (LLM)
→ Schema Validation
→ Conversation State (Redis)
→ Booking Engine
→ Confirmation
→ Text-to-Speech

📂 Project Structure

src/
  app.ts                # Express app
  server.ts             # Bootstraps server
  config/               # Env & config loaders

  voice/
    voice.routes.ts     # Voice entrypoints
    voice.controller.ts
    stt/                # Speech-to-text adapters
    intent/             # LLM intent extraction
    state/              # Redis conversation state
    responses/          # Voice/text responses

  booking/
    booking.service.ts  # Core booking logic

  prisma/
    schema.prisma

  infra/
    redis.ts
    db.ts

No "ai" folder. This is product code, not a demo.


🚀 Getting Started

1. Install dependencies

bun install

2. Run the server

bun run dev

Server starts on http://localhost:4006 (configurable via PORT).

3. Test the voice endpoint (upload audio)

curl -X POST http://localhost:4006/voice/audio \
  -H "X-Conversation-Id: test-123" \
  -F "audio=@./sample.wav"

🔐 Google Speech-to-Text (Service Account) Setup

Step 1: Create a service account + download JSON credentials

  1. Go to Google Cloud Console
  2. Select your project (or create one)
  3. Navigate to IAM & Admin → Service Accounts
  4. Click Create Service Account
  5. Give it a name (example: speech-to-text-service)
  6. Grant the Cloud Speech-to-Text Admin role (or at minimum Cloud Speech Client)
  7. Click Done
  8. Click on the created service account
  9. Go to the Keys tab
  10. Click Add Key → Create New Key
  11. Choose JSON
  12. Download the JSON file

Step 2: Configure authentication

Option A (recommended for local dev): set GOOGLE_APPLICATION_CREDENTIALS

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"

Or in your .env file:

GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/google-credentials.json

Step 3: Enable the API

  1. Go to Google Cloud Console
  2. Navigate to APIs & Services → Library
  3. Search for Cloud Speech-to-Text API
  4. Click Enable

Step 4: Restart the server

bun run dev

Quick setup checklist (copy/paste)

# 1. Place your service account JSON file in your project
mv ~/Downloads/your-service-account-key.json ./google-credentials.json

# 2. Add to .env
echo "GOOGLE_APPLICATION_CREDENTIALS=$(pwd)/google-credentials.json" >> .env

# 3. Add to .gitignore to avoid committing credentials
echo "google-credentials.json" >> .gitignore

# 4. Restart your server
bun run dev

2. Environment variables

DATABASE_URL=postgresql://...
REDIS_URL=redis://...
STT_API_KEY=...
LLM_API_KEY=...
VOICE_BOOKING_ENABLED=true
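
Checking these variables at startup lets the server fail fast when one is missing. The following is a minimal sketch, not the repository's actual `config/` code: `loadConfig`, `AppConfig`, and the choice of which variables are mandatory are assumptions made for illustration. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical config loader; names and required-variable list are assumptions.
type Env = Record<string, string | undefined>;

interface AppConfig {
  databaseUrl: string;
  redisUrl: string;
  sttApiKey: string;
  llmApiKey: string;
  voiceBookingEnabled: boolean;
  port: number;
}

const REQUIRED = ["DATABASE_URL", "REDIS_URL", "STT_API_KEY", "LLM_API_KEY"] as const;

function loadConfig(env: Env): AppConfig {
  // Collect every missing variable so the error lists all of them at once.
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env["DATABASE_URL"] as string,
    redisUrl: env["REDIS_URL"] as string,
    sttApiKey: env["STT_API_KEY"] as string,
    llmApiKey: env["LLM_API_KEY"] as string,
    // Anything other than the literal string "true" leaves voice booking off.
    voiceBookingEnabled: env["VOICE_BOOKING_ENABLED"] === "true",
    // 4006 is the default port mentioned in this README.
    port: Number(env["PORT"] ?? "4006"),
  };
}
```

In a Bun or Node.js process this would be called once at boot as `loadConfig(process.env)`.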

3. Database setup

bun prisma generate
bun prisma migrate dev

4. Run the server

bun run dev

Server starts on http://localhost:4006 (configurable via PORT).


🔊 Voice API

Upload audio

POST /voice/audio
Headers:
  X-Conversation-Id: <uuid>
Body:
  audio/wav | audio/webm

Behavior

  • Accepts audio clips of up to 15 seconds
  • Returns 202 Accepted
  • Triggers async voice pipeline
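
The acceptance rules above can be sketched as a pure check that runs before the async pipeline is triggered. This is an illustration only: `checkUpload` and its field names are hypothetical, and the rejection codes (400, 415, 413) are assumptions, since this README only specifies the 202 response.

```typescript
// Hypothetical pre-flight check for POST /voice/audio; not the repo's controller.
interface UploadRequest {
  conversationId?: string; // value of the X-Conversation-Id header
  contentType?: string;    // MIME type of the uploaded audio
  durationSeconds: number; // measured clip length
}

interface UploadResult {
  status: number;
  message: string;
}

const ALLOWED_TYPES = new Set(["audio/wav", "audio/webm"]);
const MAX_DURATION_SECONDS = 15;

function checkUpload(req: UploadRequest): UploadResult {
  if (!req.conversationId) {
    return { status: 400, message: "X-Conversation-Id header is required" };
  }
  if (!req.contentType || !ALLOWED_TYPES.has(req.contentType)) {
    return { status: 415, message: "Only audio/wav and audio/webm are accepted" };
  }
  if (req.durationSeconds > MAX_DURATION_SECONDS) {
    return { status: 413, message: "Audio must be 15 seconds or shorter" };
  }
  // Accepted: the caller would now enqueue the clip for the async voice pipeline.
  return { status: 202, message: "Accepted for processing" };
}
```

Keeping the check free of Express types means the same rules can be reused by any other entrypoint that feeds the pipeline.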

🧠 Intent Extraction Contract

LLM must return ONLY JSON:

{
  "intent": "BOOK_SERVICE",
  "slots": {
    "service": "facial",
    "date": "2026-02-01",
    "time": "evening",
    "location": "near_me"
  },
  "confidence": 0.87
}

If:

  • confidence is low
  • fields are ambiguous
  • schema validation fails

→ the system asks for clarification.

No guessing. Ever.
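
One way to enforce this contract is to parse the raw LLM output and reject anything that does not match the schema. The sketch below is hand-rolled and dependency-free; `parseIntent`, the `0.7` threshold, and the decision to accept only `BOOK_SERVICE` are assumptions for illustration, not values taken from the repository.

```typescript
// Hypothetical validator for the LLM output; threshold and names are assumptions.
type Slots = { service?: string; date?: string; time?: string; location?: string };

type IntentResult =
  | { ok: true; intent: "BOOK_SERVICE"; slots: Slots; confidence: number }
  | { ok: false; reason: string };

const MIN_CONFIDENCE = 0.7; // assumed cut-off
const SLOT_KEYS = ["service", "date", "time", "location"] as const;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseIntent(raw: string): IntentResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    // Any prose around the JSON makes the whole reply rejectable.
    return { ok: false, reason: "not valid JSON" };
  }
  if (!isPlainObject(data)) return { ok: false, reason: "not a JSON object" };
  if (data["intent"] !== "BOOK_SERVICE") return { ok: false, reason: "unknown intent" };

  const confidence = data["confidence"];
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return { ok: false, reason: "invalid confidence" };
  }

  const rawSlots = data["slots"];
  if (!isPlainObject(rawSlots)) return { ok: false, reason: "invalid slots" };

  const slots: Slots = {};
  for (const key of SLOT_KEYS) {
    const value = rawSlots[key];
    if (value === undefined) continue; // a missing slot is collected later
    if (typeof value !== "string" || value.trim() === "") {
      return { ok: false, reason: `invalid slot: ${key}` };
    }
    slots[key] = value;
  }

  if (confidence < MIN_CONFIDENCE) return { ok: false, reason: "low confidence" };
  return { ok: true, intent: "BOOK_SERVICE", slots, confidence };
}
```

Every `ok: false` result maps to a clarification prompt, so the booking engine never sees unvalidated output.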


🗃️ Conversation State (Redis)

Each conversation is tracked explicitly:

{
  state: "COLLECTING" | "CONFIRMING" | "BOOKED",
  slots: { service?, date?, time?, location? },
  expiresAt
}

TTL: 15 minutes

Stateless APIs. Stateful experience.
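
The state shape and TTL above can be sketched as a small state machine. To stay self-contained, this example uses an in-memory `Map` as a stand-in for Redis. The class name `ConversationStore`, the rule that all four slots must be present before moving to `CONFIRMING`, and the refresh of the TTL on every write are assumptions, not documented behavior.

```typescript
// Hypothetical in-memory stand-in for the Redis-backed conversation state.
type Slots = { service?: string; date?: string; time?: string; location?: string };
type State = "COLLECTING" | "CONFIRMING" | "BOOKED";

interface Conversation {
  state: State;
  slots: Slots;
  expiresAt: number; // epoch milliseconds
}

const TTL_MS = 15 * 60 * 1000; // 15 minutes, as stated above
const REQUIRED_SLOTS = ["service", "date", "time", "location"] as const;

class ConversationStore {
  private readonly entries = new Map<string, Conversation>();

  // Returns the live conversation, or starts a fresh one if missing or expired.
  get(id: string, now: number): Conversation {
    const existing = this.entries.get(id);
    if (existing && existing.expiresAt > now) return existing;
    const fresh: Conversation = { state: "COLLECTING", slots: {}, expiresAt: now + TTL_MS };
    this.entries.set(id, fresh);
    return fresh;
  }

  // Merges newly extracted slots; moves to CONFIRMING once every slot is filled.
  mergeSlots(id: string, slots: Slots, now: number): Conversation {
    const current = this.get(id, now);
    if (current.state === "BOOKED") return current; // a finished conversation is frozen
    const merged: Slots = { ...current.slots, ...slots };
    const complete = REQUIRED_SLOTS.every((key) => merged[key] !== undefined);
    const next: Conversation = {
      state: complete ? "CONFIRMING" : "COLLECTING",
      slots: merged,
      expiresAt: now + TTL_MS,
    };
    this.entries.set(id, next);
    return next;
  }

  // Only a conversation awaiting confirmation can become BOOKED.
  confirm(id: string, now: number): Conversation {
    const current = this.get(id, now);
    if (current.state !== "CONFIRMING") {
      throw new Error(`Cannot confirm from state ${current.state}`);
    }
    const next: Conversation = { ...current, state: "BOOKED", expiresAt: now + TTL_MS };
    this.entries.set(id, next);
    return next;
  }
}
```

With Redis, the same record would typically be stored as a serialized value with a key expiry, so the `expiresAt` comparison is handled by the server rather than in application code.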


📅 Booking Engine

  • Shared with UI bookings
  • Idempotent via Idempotency-Key
  • Reservation lock with TTL
  • Voice cannot bypass confirmation

Voice calls the same APIs your app uses.
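
Idempotency and the confirmation guard can be sketched together as follows. This is an assumption-laden illustration, not the contents of `booking.service.ts`: `BookingService`, `BookingRequest`, and the in-memory `Map` (standing in for PostgreSQL) are hypothetical, and the reservation lock is left out for brevity.

```typescript
// Hypothetical booking service showing idempotency keys and the confirmation guard.
interface BookingRequest {
  service: string;
  date: string;
  time: string;
  location: string;
  confirmed: boolean; // set only after the user explicitly confirms
}

interface Booking {
  id: string;
  request: BookingRequest;
}

class BookingService {
  private readonly byKey = new Map<string, Booking>();
  private nextId = 1;

  book(idempotencyKey: string, request: BookingRequest): Booking {
    // Neither voice nor UI callers can skip confirmation.
    if (!request.confirmed) {
      throw new Error("Booking requires explicit confirmation");
    }
    // A repeated key returns the original booking instead of creating a second one.
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing;

    const booking: Booking = { id: `booking-${this.nextId++}`, request };
    this.byKey.set(idempotencyKey, booking);
    return booking;
  }
}
```

Because retries with the same `Idempotency-Key` are harmless, the voice pipeline can safely re-send a request after a timeout.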


🌍 Language Support

  • Hindi ✅
  • Assamese (not supported yet; can be added in the future)
  • English ✅

Language is:

  • Detected via STT
  • Treated identically in intent pipeline
  • Never inferred from location

🧯 Safety & Kill Switches

  • Feature flag: VOICE_BOOKING_ENABLED
  • Confidence thresholds on STT + intent
  • Mandatory confirmation step
  • Full transcript + decision logging

Voice can be disabled instantly without redeploy.
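
For the flag to take effect without a redeploy, it has to be read on each request rather than cached at startup. The sketch below illustrates that idea together with the confidence thresholds. The `gate` function, the `FlagReader` type, and both threshold values are assumptions; where the flag actually lives (environment, Redis, or a config service) is not specified here.

```typescript
// Hypothetical per-request safety gate; thresholds and names are assumptions.
type FlagReader = () => string | undefined;

interface GateInput {
  sttConfidence: number;
  intentConfidence: number;
}

type GateDecision = "PROCEED" | "ASK_CLARIFICATION" | "VOICE_DISABLED";

const MIN_STT_CONFIDENCE = 0.8;    // assumed
const MIN_INTENT_CONFIDENCE = 0.7; // assumed

function gate(readFlag: FlagReader, input: GateInput): GateDecision {
  // The flag is re-read on every call, so flipping it applies immediately.
  if (readFlag() !== "true") return "VOICE_DISABLED";
  if (
    input.sttConfidence < MIN_STT_CONFIDENCE ||
    input.intentConfidence < MIN_INTENT_CONFIDENCE
  ) {
    return "ASK_CLARIFICATION";
  }
  return "PROCEED";
}
```

Passing the reader as a function keeps the gate independent of the flag's storage, for example `gate(() => currentFlagValue, scores)`.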


🧪 What This Is (and Is Not)

✅ Production-ready architecture
✅ Designed for scale and failure
✅ Interview-grade system design

❌ Not a chatbot
❌ Not "AI decides" logic
❌ Not a voice toy


📈 Next Extensions

  • WebSocket audio streaming
  • Latency budgets & tracing
  • Voice analytics (drop-offs per state)
  • Staff-side voice booking
