Voice Agent with Deterministic State Machine and Multi-Layer Guardrails

title	Voice Agent - Deterministic Control
emoji	🦷
colorFrom	gray
colorTo	yellow
sdk	docker
pinned	false
app_port	7860

11-state FSM voice agent with six guardrail layers. ~1,500 lines of TypeScript, no SDK abstractions. Tool scoping is designed to make hallucinated facts structurally impossible: the LLM gets facts only from tools, and only in the states that allow them.

Demo:

demo.mp4

▶ Try the Live Demo — Bring your own Deepgram API key. No key is stored or logged.

Why This Exists

Most LLM voice agents can hallucinate business hours, skip intake steps, or break on interruption. This one can't — by design.

Production voice agents built on LLMs share a predictable failure mode: the model controls the conversation, and prompt instructions degrade under pressure. The results are hallucinated facts, skipped required steps, and undefined behavior on interruption. I built this agent to demonstrate a different architecture: the LLM is a constrained tool, not the controller. State flow is enforced in code. Facts come from tools the LLM doesn't have access to until the right moment. Output is validated before it becomes audio.

Architecture Overview

┌──────────────────────────────────────────────────────────────┐
│  Layer 1: Input Classification                               │
│  User Speech → classifyInput() → [off-topic?] → Redirect    │
├──────────────────────────────────────────────────────────────┤
│  Layer 2: State Machine                                      │
│  11-state appointment flow (+ CONNECTING/DISCONNECTED),      │
│  hard transition gates, NLP-assisted progression             │
├──────────────────────────────────────────────────────────────┤
│  Layer 3: Tool Scoping                                       │
│  Client executor enforces per-state tool allowlist           │
├──────────────────────────────────────────────────────────────┤
│  Layer 4: Tool Execution                                     │
│  Real lookups only — LLM never has facts in context          │
├──────────────────────────────────────────────────────────────┤
│  Layer 5: Output Validation                                  │
│  Block + replace any response violating state rules          │
├──────────────────────────────────────────────────────────────┤
│  Layer 6: Interruption Handling                              │
│  Detect barge-in → clear buffers → rollback state → recover  │
└──────────────────────────────────────────────────────────────┘

Key Technical Decisions

1. Deepgram Voice Agent API

Built on Deepgram's Voice Agent WebSocket API (wss://agent.deepgram.com/v1/agent/converse) with:

Nova-3 for speech-to-text (chosen for low latency and accuracy)
Aura-2 for text-to-speech
OpenAI gpt-4o-mini for the think layer, routed via Deepgram's think.provider config
Short-lived JWT authentication via /v1/auth/grant — your API key never touches the WebSocket

Raw WebSocket integration — full control over the event lifecycle, function calling, and binary PCM16 audio pipeline.
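As a rough illustration of the configuration described above, the first message sent on the socket could look like the sketch below. The field layout is an approximation of Deepgram's Settings message and should be checked against the current API reference. The sample rates, the voice name `aura-2-thalia-en`, and the helper `buildSettings` are assumptions for illustration, not taken from this repo.

```typescript
// Hypothetical helper: builds the initial Settings message for the agent socket.
// Field names approximate Deepgram's schema; verify before relying on them.
interface FunctionDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

const AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse";

function buildSettings(systemPrompt: string, functions: FunctionDef[]) {
  return {
    type: "Settings",
    audio: {
      // Raw PCM16 in both directions (sample rates are assumed values).
      input: { encoding: "linear16", sample_rate: 16000 },
      output: { encoding: "linear16", sample_rate: 24000, container: "none" },
    },
    agent: {
      listen: { provider: { type: "deepgram", model: "nova-3" } },
      think: {
        provider: { type: "open_ai", model: "gpt-4o-mini" },
        prompt: systemPrompt,
        // Only the tools relevant to the current state are listed here.
        functions: functions,
      },
      speak: { provider: { type: "deepgram", model: "aura-2-thalia-en" } }, // assumed voice
    },
  };
}
```

The message would be serialized with `JSON.stringify(buildSettings(prompt, tools))` and sent as the first text frame after the socket opens; audio then flows as binary frames.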

2. Deterministic State Machine + NLP Fallback

GREETING → COLLECT_NAME → COLLECT_REASON → SAVE_INTAKE → ASK_TIME →
VALIDATE_HOURS → CONFIRM_SLOT → BOOK_APPOINTMENT → SUMMARY → CLOSING

allowedTransitions[] prevents invalid state jumps at the code level
UpdatePrompt sends a state-specific system prompt on every transition
Client-side executor enforces allowedToolNames per state — out-of-scope calls return an error response, never execute
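The transition gate and per-state tool allowlist listed above can be sketched as follows. This is a minimal illustration, not the code in state-machine.ts: the tool names (`save_intake`, `check_hours`, `book_appointment`), the exact transition table, and the `FlowMachine` class are assumptions.

```typescript
// Sketch only: tool names and the exact transition table are assumptions.
type FlowState =
  | "GREETING" | "COLLECT_NAME" | "COLLECT_REASON" | "SAVE_INTAKE" | "ASK_TIME"
  | "VALIDATE_HOURS" | "CONFIRM_SLOT" | "BOOK_APPOINTMENT" | "SUMMARY" | "CLOSING";

interface StateConfig {
  allowedTransitions: FlowState[];
  allowedToolNames: string[];
}

interface ToolOutcome {
  ok: boolean;
  result?: unknown;
  error?: string;
}

const FLOW: Record<FlowState, StateConfig> = {
  GREETING:         { allowedTransitions: ["COLLECT_NAME"], allowedToolNames: [] },
  COLLECT_NAME:     { allowedTransitions: ["COLLECT_REASON"], allowedToolNames: [] },
  COLLECT_REASON:   { allowedTransitions: ["SAVE_INTAKE"], allowedToolNames: ["save_intake"] },
  SAVE_INTAKE:      { allowedTransitions: ["ASK_TIME"], allowedToolNames: ["save_intake"] },
  ASK_TIME:         { allowedTransitions: ["VALIDATE_HOURS"], allowedToolNames: ["check_hours"] },
  VALIDATE_HOURS:   { allowedTransitions: ["CONFIRM_SLOT", "ASK_TIME"], allowedToolNames: ["check_hours"] },
  CONFIRM_SLOT:     { allowedTransitions: ["BOOK_APPOINTMENT", "ASK_TIME"], allowedToolNames: [] },
  BOOK_APPOINTMENT: { allowedTransitions: ["SUMMARY"], allowedToolNames: ["book_appointment"] },
  SUMMARY:          { allowedTransitions: ["CLOSING"], allowedToolNames: [] },
  CLOSING:          { allowedTransitions: [], allowedToolNames: [] },
};

class FlowMachine {
  current: FlowState = "GREETING";

  // Hard gate: a jump that is not listed for the current state is refused.
  transition(next: FlowState): boolean {
    if (FLOW[this.current].allowedTransitions.indexOf(next) === -1) return false;
    this.current = next;
    return true;
  }

  // Client-side executor: an out-of-scope call gets an error response
  // and the tool body is never invoked.
  executeTool(toolName: string, run: () => unknown): ToolOutcome {
    if (FLOW[this.current].allowedToolNames.indexOf(toolName) === -1) {
      return { ok: false, error: toolName + " is not available in " + this.current };
    }
    return { ok: true, result: run() };
  }
}
```

Keeping both checks in one table means a new state cannot be added without declaring its exits and its tools, which is the property the guardrails depend on.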

NLP-assisted inference: When the LLM handles things conversationally without calling tools, inferStateFromConversation() detects state transitions from natural language patterns (name detection, scheduling intent, slot confirmation, booking confirmation) and advances the state machine automatically. Tool calls remain authoritative; NLP is a reliable fallback.

Result: The LLM cannot run a tool outside the current state's allowlist, because the client executor rejects the call before it executes. The UI state machine stays in sync with the conversation even when the LLM skips function calls.
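The NLP-assisted fallback described above might look roughly like this. Only the function name `inferStateFromConversation` comes from this README; the signature, the regular expressions, and the `IntakeState` type are illustrative assumptions, and real patterns would need to be broader.

```typescript
// Sketch: pattern-based inference of the next state when no tool call was made.
// Patterns are deliberately small and illustrative, not the repo's actual rules.
type IntakeState =
  | "COLLECT_NAME" | "COLLECT_REASON" | "ASK_TIME" | "VALIDATE_HOURS"
  | "CONFIRM_SLOT" | "BOOK_APPOINTMENT" | "SUMMARY";

function inferStateFromConversation(
  current: IntakeState,
  userText: string,
  agentText: string
): IntakeState | null {
  const user = userText.toLowerCase();
  const agent = agentText.toLowerCase();

  switch (current) {
    case "COLLECT_NAME":
      // Name detection: the caller introduced themselves.
      return /\b(my name is|this is)\s+[a-z]+/.test(user) ? "COLLECT_REASON" : null;
    case "ASK_TIME":
      // Scheduling intent: a weekday, "tomorrow", or a clock time.
      return /\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday|tomorrow|\d{1,2}\s?(am|pm))\b/.test(user)
        ? "VALIDATE_HOURS"
        : null;
    case "CONFIRM_SLOT":
      // Slot confirmation: the caller agreed to the proposed time.
      return /\b(yes|yeah|yep|sure|that works|sounds good)\b/.test(user) ? "BOOK_APPOINTMENT" : null;
    case "BOOK_APPOINTMENT":
      // Booking confirmation: the agent announced the booking.
      return /\b(booked|scheduled|confirmed)\b/.test(agent) ? "SUMMARY" : null;
    default:
      return null; // No inference for other states; wait for a tool call.
  }
}
```

A `null` result means "no evidence, stay put". Any inferred state should still pass through the same transition gate as a tool-driven transition, so the fallback cannot introduce jumps that the state machine forbids.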

3. Multi-Layer Guardrails

Six defense layers, each independent:

Layer	What It Does	Example
Input Classification	Keyword-based intent routing, no LLM	"What's the weather?" → instant redirect
State Gate	Hard transition validation	COLLECT_NAME → ASK_TIME → blocked
Tool Scope	Only state-relevant tools in session	Can't check hours during intake
Tool Execution	External data lookup, never LLM memory	Business hours from deterministic logic, not context
Output Validation	Regex + rule check on agent text	"I've booked" in wrong state → blocked
Interruption Handler	Detect barge-in → rollback → recover	Mid-booking interrupt → rollback to CONFIRM_SLOT
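Two of the layers in the table, input classification and output validation, are simple enough to sketch without any LLM involvement. The function name `classifyInput` appears in the architecture diagram; everything else here (the keyword list, the rule shapes, the fallback wording, and `validateOutput`) is an assumption for illustration.

```typescript
// Layer 1 sketch: keyword-based routing, no LLM call. Keyword list is assumed.
const OFF_TOPIC_KEYWORDS = ["weather", "stock", "news", "joke", "sports"];

function classifyInput(text: string): "on_topic" | "off_topic" {
  const lower = text.toLowerCase();
  for (const keyword of OFF_TOPIC_KEYWORDS) {
    if (lower.indexOf(keyword) !== -1) return "off_topic";
  }
  return "on_topic";
}

// Layer 5 sketch: a phrase is only allowed in the states listed for it.
interface OutputRule {
  pattern: RegExp;
  allowedIn: string[];
  fallback: string;
}

const OUTPUT_RULES: OutputRule[] = [
  {
    pattern: /\b(i've|i have) booked\b/i,
    allowedIn: ["BOOK_APPOINTMENT", "SUMMARY", "CLOSING"],
    fallback: "Let me confirm a few details before I book anything.",
  },
  {
    pattern: /\bwe(?:'re| are) open\b/i,
    allowedIn: ["VALIDATE_HOURS", "CONFIRM_SLOT"],
    fallback: "Let me check our hours for you.",
  },
];

// Returns the text that should actually be spoken: either the agent's
// own text, or the rule's fallback when the text violates a state rule.
function validateOutput(stateName: string, agentText: string): { blocked: boolean; text: string } {
  for (const rule of OUTPUT_RULES) {
    if (rule.pattern.test(agentText) && rule.allowedIn.indexOf(stateName) === -1) {
      return { blocked: true, text: rule.fallback };
    }
  }
  return { blocked: false, text: agentText };
}
```

For example, `validateOutput("COLLECT_NAME", "I've booked you for Tuesday")` is blocked and replaced, while the same sentence in `SUMMARY` passes through unchanged.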

4. React UI with Real-Time Architecture Visualization

The UI is built with React + TypeScript (Vite), decoupled from the agent core via a typed EventBus. All visualization panels subscribe to bus events — the WebSocket client never touches the DOM.

State Machine Panel — 10-node flow diagram. Current state glows, completed states dim, animated on every transition.
Guardrail Indicators — 6 stacked layer badges. Flash with reason text when triggered; CSS animation resets via React key cycling.
Tool Call Feed — Live log showing tool name, arguments, result, and latency. Newest-first, capped at 20 entries.
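The decoupling described above can be sketched with a small typed bus plus the capped, newest-first feed. The event names, payload shapes, and the names `TypedEventBus` and `addToolEntry` are assumptions; the actual event-bus.ts may be structured differently.

```typescript
// Sketch of a typed event bus; event names and payloads are assumed.
interface ToolFeedEntry {
  toolName: string;
  args: unknown;
  result: unknown;
  latencyMs: number;
}

interface AgentEvents {
  stateChanged: { from: string; to: string };
  guardrailTriggered: { layer: number; reason: string };
  toolCalled: ToolFeedEntry;
}

class TypedEventBus<E> {
  private handlers: { [eventName: string]: Array<(payload: unknown) => void> } = {};

  // Subscribe; the returned function unsubscribes (handy in a React effect cleanup).
  on<K extends keyof E>(eventName: K, handler: (payload: E[K]) => void): () => void {
    const key = String(eventName);
    const list = this.handlers[key] || (this.handlers[key] = []);
    const wrapped = handler as (payload: unknown) => void;
    list.push(wrapped);
    return () => {
      const index = list.indexOf(wrapped);
      if (index !== -1) list.splice(index, 1);
    };
  }

  emit<K extends keyof E>(eventName: K, payload: E[K]): void {
    const list = this.handlers[String(eventName)] || [];
    // Iterate over a copy so handlers can unsubscribe while being called.
    for (const handler of list.slice()) handler(payload);
  }
}

// Tool Call Feed: newest entry first, capped at 20 entries.
const TOOL_FEED_LIMIT = 20;

function addToolEntry(feed: ToolFeedEntry[], entry: ToolFeedEntry): ToolFeedEntry[] {
  return [entry].concat(feed).slice(0, TOOL_FEED_LIMIT);
}
```

The agent core only calls `emit`, and each panel only calls `on`, so neither side needs a reference to the other.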

5. Interruption Handling with State Rollback

Every state has a defined rollback target. On UserStartedSpeaking during agent speech:

Clear local audio playback buffers
Determine safe rollback state
Transition + send UpdatePrompt with rollback state's system prompt
Inject recovery message

No undefined behavior on interruption.
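The four steps above can be sketched as a single handler. Only the BOOK_APPOINTMENT to CONFIRM_SLOT rollback is stated in this README; the other rollback targets, the recovery wording, and the `InterruptDeps` callbacks (which stand in for the real audio and WebSocket code) are assumptions.

```typescript
// Sketch of barge-in handling. Rollback targets other than
// BOOK_APPOINTMENT -> CONFIRM_SLOT are assumed for illustration.
type CallState =
  | "GREETING" | "COLLECT_NAME" | "COLLECT_REASON" | "SAVE_INTAKE" | "ASK_TIME"
  | "VALIDATE_HOURS" | "CONFIRM_SLOT" | "BOOK_APPOINTMENT" | "SUMMARY" | "CLOSING";

const ROLLBACK_TARGET: Record<CallState, CallState> = {
  GREETING: "GREETING",
  COLLECT_NAME: "COLLECT_NAME",
  COLLECT_REASON: "COLLECT_REASON",
  SAVE_INTAKE: "COLLECT_REASON",
  ASK_TIME: "ASK_TIME",
  VALIDATE_HOURS: "ASK_TIME",
  CONFIRM_SLOT: "CONFIRM_SLOT",
  BOOK_APPOINTMENT: "CONFIRM_SLOT",
  SUMMARY: "SUMMARY",
  CLOSING: "CLOSING",
};

// Hypothetical hooks into the audio pipeline and the agent socket.
interface InterruptDeps {
  clearPlayback(): void;
  promptFor(state: CallState): string;
  sendUpdatePrompt(prompt: string): void;
  injectRecovery(text: string): void;
}

// Called on UserStartedSpeaking; returns the state the machine should be in afterwards.
function handleBargeIn(current: CallState, agentSpeaking: boolean, deps: InterruptDeps): CallState {
  if (!agentSpeaking) return current;              // Not a barge-in: nothing to undo.
  deps.clearPlayback();                            // 1. Clear local audio playback buffers
  const target = ROLLBACK_TARGET[current];         // 2. Determine safe rollback state
  deps.sendUpdatePrompt(deps.promptFor(target));   // 3. Transition + UpdatePrompt
  deps.injectRecovery("Sorry, go ahead.");         // 4. Inject recovery message
  return target;
}
```

Because `ROLLBACK_TARGET` is typed as a `Record` over every state, the compiler rejects a state that has no rollback target, which is one way to back the "no undefined behavior" claim at build time.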

Running Locally

# Install dependencies
npm install

# Build the React SPA
npm run build

# Start Express server (serves dist/ + token endpoint)
npm start

# Open http://localhost:7860
# Paste your Deepgram API key (never stored)
# Allow microphone access → Click Connect

Development Mode (hot reload)

# Terminal 1 — Express token server
npm start

# Terminal 2 — Vite dev server (proxies /api to Express)
npm run dev
# Open http://localhost:5173

Production / Docker

docker build -t dental-voice-agent .
docker run -p 7860:7860 dental-voice-agent

BYOK (Bring Your Own Key) Security

Your Deepgram API key is handled as follows:

It is sent via POST to the backend token endpoint (/api/token)
It is used once to mint a short-lived JWT via POST /v1/auth/grant
It is never logged, stored, or persisted
The ephemeral JWT, not your key, authenticates the WebSocket connection via the bearer subprotocol
The JWT expires after 10 minutes
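The key-to-JWT exchange above can be sketched as two small pure helpers. The host `api.deepgram.com`, the `Authorization: Token` header scheme, and the `ttl_seconds` body field are assumptions about Deepgram's grant endpoint and should be verified against its documentation; `buildGrantRequest` and `agentSocketProtocols` are hypothetical names, not functions from server.ts.

```typescript
// Sketch of the BYOK exchange. Request details are assumptions; verify against Deepgram's docs.
interface GrantRequest {
  url: string;
  method: "POST";
  headers: { [header: string]: string };
  body: string;
}

const JWT_TTL_SECONDS = 600; // 10 minutes, matching the expiry stated above

// Server side: describe the one-off request that trades the API key for a JWT.
// The key appears only in this request and is not stored or logged anywhere.
function buildGrantRequest(apiKey: string, ttlSeconds: number): GrantRequest {
  return {
    url: "https://api.deepgram.com/v1/auth/grant",
    method: "POST",
    headers: {
      Authorization: "Token " + apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ ttl_seconds: ttlSeconds }),
  };
}

// Browser side: the JWT, not the key, rides in the WebSocket subprotocol list.
function agentSocketProtocols(jwt: string): string[] {
  return ["bearer", jwt];
}
```

On the server, the request would be passed to `fetch(req.url, req)` and the token read from the JSON response. In the browser, the socket would be opened with `new WebSocket("wss://agent.deepgram.com/v1/agent/converse", agentSocketProtocols(jwt))`, since browsers cannot set an Authorization header on a WebSocket.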

Tech Stack

Component	Choice	Rationale
Voice API	Deepgram Voice Agent API (WebSocket)	Nova-3 STT, Aura-2 TTS, built-in turn detection
Frontend	React + TypeScript (Vite)	Stable component model, clean state management
Styling	Vanilla CSS + CSS variables	Full control, no framework overhead
Server	Express + tsx	Serves SPA + BYOK token endpoint
Deployment	Docker → HF Spaces	Free hosting, direct URL access
Design	Dark theme, copper accents, glassmorphism	Premium look, architecture visibility

Project Structure

├── server.ts                  # Express server (BYOK token endpoint + static serve)
├── Dockerfile                 # Multi-stage build: Vite build → lean runtime
├── index.html                 # React entry point
├── src/
│   ├── main.tsx               # ReactDOM.createRoot entry
│   ├── main_sdk.ts            # Voice agent core (WebSocket, audio I/O, guardrails)
│   ├── state-machine.ts       # State definitions, transitions, NLP inference, tool execution
│   ├── event-bus.ts           # Typed EventBus — decouples agent core from UI
│   ├── components/
│   │   ├── App.tsx            # Root layout
│   │   ├── ConnectionBar.tsx  # API key input, connect/disconnect, status
│   │   ├── StateMachinePanel.tsx  # Animated state flow visualization
│   │   ├── GuardrailPanel.tsx     # 6-layer guardrail badges with flash animation
│   │   └── ToolFeedPanel.tsx      # Live tool call log with latency
│   └── ui/
│       └── styles.css         # Design system (CSS variables, dark theme, animations)
├── ARCHITECTURE.md            # Detailed layer-by-layer architecture doc
└── vite.config.ts             # Vite + React plugin config

Production Roadmap

These are intentional scope boundaries for a prototype, not oversights. For example, this demo does not persist appointment records; bookings are confirmed in-call only. A production version would add:

Test Suite — 25-30 scripted conversation scenarios covering edge cases (interruption during tool call, double-booking, off-topic chains)
Production Infrastructure — Health checks, monitoring, WebSocket reconnection, token refresh
Real Integrations — Database for patient records, calendar API for availability, Twilio for telephony
Error Recovery — Rate limit backoff, graceful degradation, session recovery
Load Testing — Concurrent session handling, connection pooling

The core architecture (state machine, guardrails, tool scoping, NLP fallback) is designed to be production-transferable.
