# Voice Sandwich Demo 🥪

A real-time, voice-to-voice AI pipeline demo featuring a sandwich shop order assistant. Built with LangChain/LangGraph agents and Speechmatics for both speech-to-text and text-to-speech.

## Architecture

The pipeline processes audio through a series of transform stages. Because the Speechmatics integration runs over persistent WebSocket connections, an `AsyncQueue` is used to manage connection state between the components.

```mermaid
flowchart LR
    subgraph Client [Browser]
        Mic[🎤 Microphone] -->|PCM Audio| WS_Out[WebSocket]
        WS_In[WebSocket] -->|Audio + Events| Speaker[🔊 Speaker]
    end

    subgraph Server [Node.js / Python]
        WS_Receiver[WS Receiver] --> Pipeline

        subgraph Pipeline [Voice Agent Pipeline]
            direction LR
            STT[Speechmatics STT] -->|Transcripts| Agent[LangChain Agent]
            Agent -->|Text Chunks| TTS[Speechmatics TTS]
        end

        Pipeline -->|Events| WS_Sender[WS Sender]
    end

    WS_Out --> WS_Receiver
    WS_Sender --> WS_In
```

## Pipeline Stages

Each stage is an async generator that transforms a stream of events:

1. **STT Stage** (`sttStream`): Streams audio to Speechmatics and yields transcription events (`stt_chunk`, `stt_output`).
2. **Agent Stage** (`agentStream`): Passes upstream events through, invokes the LangChain agent on final transcripts, and yields agent responses (`agent_chunk`, `tool_call`, `tool_result`, `agent_end`).
3. **TTS Stage** (`ttsStream`): Passes upstream events through, sends agent text to Speechmatics, and yields audio events (`tts_chunk`).
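The staged design above can be sketched as chained async generators. The event dicts and stage bodies below are hypothetical stand-ins (stubbed transcripts and audio), not the actual implementation; only the composition pattern is the point:

```python
import asyncio
from typing import AsyncIterator

# Hypothetical event shape: plain dicts with a "type" key.
async def stt_stream(audio: AsyncIterator[bytes]) -> AsyncIterator[dict]:
    # Sketch: each audio chunk yields a partial, then one final transcript.
    async for _chunk in audio:
        yield {"type": "stt_chunk", "text": "partial"}
    yield {"type": "stt_output", "text": "one club sandwich please"}

async def agent_stream(events: AsyncIterator[dict]) -> AsyncIterator[dict]:
    async for event in events:
        yield event  # pass upstream events through
        if event["type"] == "stt_output":
            # Invoke the agent on final transcripts (stubbed here).
            yield {"type": "agent_chunk", "text": "Coming right up!"}
            yield {"type": "agent_end"}

async def tts_stream(events: AsyncIterator[dict]) -> AsyncIterator[dict]:
    async for event in events:
        yield event  # pass upstream events through
        if event["type"] == "agent_chunk":
            yield {"type": "tts_chunk", "audio": b"\x00\x01"}

async def main() -> list[dict]:
    async def mic() -> AsyncIterator[bytes]:
        yield b"pcm"  # one fake microphone chunk
    # Stages compose by wrapping one generator in the next.
    return [e async for e in tts_stream(agent_stream(stt_stream(mic())))]

events = asyncio.run(main())
print([e["type"] for e in events])
# → ['stt_chunk', 'stt_output', 'agent_chunk', 'tts_chunk', 'agent_end']
```

Each stage only ever awaits its upstream iterator, so backpressure propagates naturally from TTS back to the microphone.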

## 🚧 Work in Progress: Speechmatics Port

This project has been ported to use Speechmatics for both the Speech-to-Text (STT) and Text-to-Speech (TTS) layers, replacing the previous multi-provider pipeline.

**Status:** Active Development 🛠️

While the core pipeline is functional, this demo is primarily a proof of concept showing how easily the Speechmatics Voice SDK can be added to applications like this.

### Implementation Details

- **Unified Provider**: Replaced separate STT/TTS services with a single Speechmatics integration.
- **Async Queue**: To accommodate the call-and-response nature of the Speechmatics WebSocket connection, an `AsyncQueue` manages the flow of events and maintains stable connections.
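The queue pattern can be sketched with `asyncio.Queue`: WebSocket callbacks push events in as they arrive, and the pipeline drains them through an async iterator, decoupling the connection's callback-driven side from the generator-driven pipeline. This is a minimal sketch of the idea, not the project's actual `AsyncQueue` class:

```python
import asyncio
from typing import AsyncIterator

_DONE = object()  # sentinel marking end of stream

class AsyncQueue:
    """Bridge between callback-style producers (WebSocket handlers)
    and async-iterator consumers (pipeline stages)."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def push(self, item) -> None:
        # Called from a WebSocket event handler; never blocks.
        self._queue.put_nowait(item)

    def close(self) -> None:
        # Signal consumers that no more events will arrive.
        self._queue.put_nowait(_DONE)

    async def __aiter__(self) -> AsyncIterator:
        while True:
            item = await self._queue.get()
            if item is _DONE:
                return
            yield item

async def main() -> list:
    q = AsyncQueue()
    # Simulate out-of-band WebSocket callbacks.
    q.push({"type": "stt_chunk"})
    q.push({"type": "stt_output"})
    q.close()
    return [e async for e in q]

events = asyncio.run(main())
print(len(events))  # 2
```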

### Known Limitations & Roadmap

- **Interruption Handling (Barge-in)**: The current implementation does not yet support barge-in. If you interrupt the agent, it continues speaking until it has processed all text in its output queue; it will not skip the remaining audio.
- **Speaker Awareness**: With the Speechmatics Voice SDK in place, the architecture is primed to support speaker-aware conversations (diarization), which would let the agent distinguish between different speakers in real time.
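One way barge-in could eventually be handled is to cancel the playback task and drop queued audio as soon as new speech is detected. The sketch below is hypothetical (not current behavior) and simulates playback with sleeps:

```python
import asyncio
import contextlib

async def playback_worker(queue: asyncio.Queue, played: list) -> None:
    # Drains queued TTS audio; each chunk "plays" for 50 ms.
    while True:
        chunk = await queue.get()
        played.append(chunk)
        await asyncio.sleep(0.05)  # simulate playback time

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    played: list = []
    for i in range(3):
        queue.put_nowait(f"audio-{i}")

    task = asyncio.create_task(playback_worker(queue, played))
    await asyncio.sleep(0.01)  # user barges in mid-playback

    # Barge-in: cancel playback and discard any queued audio.
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    while not queue.empty():
        queue.get_nowait()
    return played

played = asyncio.run(main())
print(played)  # only the first chunk played; the rest were skipped
```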

## Prerequisites

- Node.js (v18+) or Python (3.11+)
- pnpm (Node.js package manager) or uv (Python package manager)

### API Keys

| Service | Environment Variable | Purpose |
|---|---|---|
| Speechmatics | `SPEECHMATICS_API_KEY` | Unified STT & TTS |
| Anthropic | `ANTHROPIC_API_KEY` | LangChain Agent (Claude) |
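For local development, the keys can be exported in the shell before starting the server (the placeholder values below are obviously not real keys):

```shell
export SPEECHMATICS_API_KEY="your-speechmatics-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
```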

## Quick Start

**Using Make (Recommended):**

```shell
# Install all dependencies
make bootstrap

# Run TypeScript implementation (with hot reload)
make dev-ts

# Or run Python implementation (with hot reload)
make dev-py
```

The app will be available at http://localhost:8000.

### Manual Setup

#### TypeScript

```shell
cd components/typescript
pnpm install
cd ../web
pnpm install && pnpm build
cd ../typescript
pnpm run server
```

#### Python

```shell
cd components/python
uv sync --dev
cd ../web
pnpm install && pnpm build
cd ../python
uv run src/main.py
```

## Project Structure

```
components/
├── web/                 # Svelte frontend (shared by both backends)
│   └── src/
├── typescript/          # Node.js backend
│   └── src/
│       ├── index.ts     # Main server & pipeline
│       └── speechmatics/ # Speechmatics client (Unified STT/TTS)
└── python/              # Python backend
    └── src/
        ├── main.py                # Main server & pipeline
        ├── speechmatics_client.py # Speechmatics client
        └── events.py              # Event type definitions
```

## Event Types

The pipeline communicates via a unified event stream:

| Event | Direction | Description |
|---|---|---|
| `stt_chunk` | STT → Client | Partial transcription (real-time feedback) |
| `stt_output` | STT → Agent | Final transcription |
| `agent_chunk` | Agent → TTS | Text chunk from agent response |
| `tool_call` | Agent → Client | Tool invocation |
| `tool_result` | Agent → Client | Tool execution result |
| `agent_end` | Agent → TTS | Signals end of agent turn |
| `tts_chunk` | TTS → Client | Audio chunk for playback |
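In Python, event types like these could be modeled with `Literal` and `TypedDict`. This is an illustrative sketch (the actual definitions live in `events.py`); the `is_client_bound` helper and the optional `text`/`audio` fields are assumptions for the example:

```python
from typing import Literal, TypedDict

EventType = Literal[
    "stt_chunk", "stt_output",
    "agent_chunk", "tool_call", "tool_result", "agent_end",
    "tts_chunk",
]

class Event(TypedDict, total=False):
    type: EventType  # always present in practice
    text: str        # transcripts and agent chunks
    audio: bytes     # tts_chunk payloads

def is_client_bound(event: Event) -> bool:
    # Hypothetical helper: events forwarded over the WebSocket
    # to the browser, per the direction column above.
    return event["type"] in {"stt_chunk", "tool_call", "tool_result", "tts_chunk"}

print(is_client_bound({"type": "tts_chunk", "audio": b""}))  # True
```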
