muneeb-rashid-cyan/Multi-Agent-Voice-Assistant

Voice Research Assistant

A production-quality multi-agent voice assistant built with the OpenAI Agents SDK. Speak a question — the system routes it to the right specialist agent, reasons over it, and responds with natural voice.

![demo](assets/demo.png)


What It Does

  • 🎙 Speak — hold the mic button or press Space to ask anything
  • 🧠 Routes — General Agent decides which specialist handles the request
  • 🔍 Researches — Research Agent searches the web and summarizes results
  • 💻 Codes — Code Agent explains algorithms and solves math problems
  • 🔊 Responds — answer played back as natural voice (OpenAI TTS nova)
  • 💾 Remembers — full conversation history persists across sessions (SQLite)

Tech Stack

| Layer | Technology |
| --- | --- |
| Agent Framework | OpenAI Agents SDK 0.11.1 |
| LLM | GPT-4o (specialists) · GPT-4o-mini (triage) |
| Speech-to-Text | OpenAI gpt-4o-transcribe |
| Text-to-Speech | OpenAI TTS (tts-1-hd · nova voice) |
| Memory | SQLiteSession — persists across restarts |
| Backend | FastAPI + Uvicorn |
| Frontend | Vanilla HTML · CSS · JavaScript |

Architecture

```
User speaks
     │
     ▼
STT — gpt-4o-transcribe
     │
     ▼
┌──────────────────────────────────────────┐
│            General Agent                 │
│  • Input guardrails (jailbreak, empty)   │
│  • Output guardrails (PII, length cap)   │
│  • Routes by question type               │
└──────────┬───────────────┬───────────────┘
           │               │
  Research?│               │ Code / Math?
           ▼               ▼
 ┌──────────────┐   ┌──────────────┐
 │ Research     │   │ Code Agent   │
 │ Agent        │   │              │
 │ tools:       │   │ tools:       │
 │ • web_search │   │ • calculator │
 │ • summarizer │   └──────────────┘
 └──────────────┘
           │
           ▼
TTS — tts-1-hd · nova voice
           │
           ▼
      🔊 Browser plays audio
           │
           ▼
  SQLiteSession saves turn
  (memory persists next run)
```
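As a rough illustration of the flow above, here is a stubbed Python sketch. The function names and keyword heuristics are invented for illustration only; in the actual project, triage is an LLM decision by the General Agent, not keyword matching:

```python
# Minimal sketch of the turn flow with all model calls stubbed out.
# Names here are illustrative, not the project's actual API.

def triage(question: str) -> str:
    """Stand-in for the General Agent's routing decision."""
    q = question.lower()
    if any(k in q for k in ("calculate", "code", "power", "algorithm")):
        return "code_agent"
    if any(k in q for k in ("what is", "news", "search", "recent")):
        return "research_agent"
    return "general_agent"

def handle_turn(question: str) -> tuple[str, str]:
    """STT -> triage -> specialist -> TTS, with the models stubbed."""
    agent = triage(question)                      # route by question type
    answer = f"[{agent}] answer to: {question}"   # stand-in for the LLM call
    return agent, answer                          # answer would then go to TTS
```

The real system replaces each stub with a model call, but the shape of the pipeline (transcribe, route, answer, speak, persist) is the same.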

OpenAI Agents SDK Concepts Demonstrated

| Concept | File |
| --- | --- |
| Multi-agent handoffs | agents/general_agent.py |
| Typed handoff metadata (`input_type`) | agents/general_agent.py |
| `on_handoff` callbacks | agents/general_agent.py |
| `@function_tool` | tools/web_search.py · tools/summarizer.py · tools/calculator.py |
| `@input_guardrail` | guardrails/input_guards.py |
| `@output_guardrail` | guardrails/output_guards.py |
| Custom `VoiceWorkflowBase` subclass | workflow/session_workflow.py |
| `VoicePipeline` + `VoicePipelineConfig` | main.py (CLI) |
| `SQLiteSession` persistent memory | session/memory.py |
| `Runner.run_streamed()` | workflow/session_workflow.py |
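To give a feel for what the output guardrails check, here is a minimal sketch of PII detection plus a length cap. The regexes and the 1500-character cap are assumptions for illustration, not values taken from guardrails/output_guards.py:

```python
import re

# Illustrative output-guardrail logic: PII patterns and a length cap.
# The patterns and MAX_CHARS value are assumptions, not the repo's values.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MAX_CHARS = 1500

def output_guardrail(text: str) -> tuple[bool, str]:
    """Return (tripwire_triggered, reason) for a candidate response."""
    if EMAIL_RE.search(text) or SSN_RE.search(text):
        return True, "possible PII in response"
    if len(text) > MAX_CHARS:
        return True, "response exceeds length cap"
    return False, "ok"
```

In the SDK, the same check would be wrapped with `@output_guardrail` and return a guardrail result object; the tripwire idea is identical.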

Setup

Prerequisites

  • Python 3.10+
  • OpenAI API key with GPT-4o access
  • PortAudio (for CLI mode only)

```bash
# macOS
brew install portaudio
```

Install

```bash
git clone https://github.com/your-username/voice-research-assistant.git
cd voice-research-assistant

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

cp .env.example .env
# Add your OPENAI_API_KEY to .env
```

Run — Web UI

```bash
python server.py
```

Open http://localhost:8000

  • Hold the 🎙 button (or press Space) → speak → release → agent responds
  • Click the session badge (top right) to start a fresh conversation

Run — CLI

```bash
python -m src.voice_research_assistant.main
```

Press ENTER to start/stop recording. Ctrl+C to quit.


Example Questions

| Type | Example |
| --- | --- |
| Research | "What is quantum entanglement?" |
| Current events | "What happened with OpenAI recently?" |
| Math | "What is 2 to the power of 32?" |
| Code | "Explain what a Python generator is" |
| Simple | "What can you help me with?" |

Session Memory

Conversation history is saved to data/conversations.db. The agent remembers previous turns — even after you restart.

```bash
# Start a fresh session
SESSION_ID=new-session python server.py

# Reset memory entirely
rm data/conversations.db
```

Project Structure

```
voice-research-assistant/
├── server.py                              # FastAPI server + web UI entry point
├── .env.example                           # Environment variable template
├── requirements.txt
├── assets/
│   └── demo.png
├── frontend/
│   ├── index.html                         # Web UI
│   └── static/
│       ├── app.js                         # Recording, API calls, playback
│       └── style.css                      # Dark theme
└── src/voice_research_assistant/
    ├── main.py                            # CLI entry point
    ├── config.py                          # Environment variables
    ├── api/
    │   └── voice_handler.py               # STT → Agent → TTS pipeline for web
    ├── audio/
    │   ├── recorder.py                    # Push-to-talk mic capture (CLI)
    │   └── player.py                      # Real-time audio playback (CLI)
    ├── agents/
    │   ├── general_agent.py               # Triage agent with guardrails + handoffs
    │   ├── research_agent.py              # Web search specialist
    │   └── code_agent.py                  # Code and math specialist
    ├── tools/
    │   ├── web_search.py                  # DuckDuckGo (no API key needed)
    │   ├── summarizer.py                  # Condenser via gpt-4o-mini
    │   └── calculator.py                  # Safe AST-based arithmetic
    ├── guardrails/
    │   ├── input_guards.py                # Jailbreak + empty input detection
    │   └── output_guards.py               # PII detection + response length cap
    ├── workflow/
    │   └── session_workflow.py            # Custom VoiceWorkflowBase + SQLiteSession
    └── session/
        └── memory.py                      # SQLiteSession factory
```
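The "safe AST-based arithmetic" approach noted for tools/calculator.py can be sketched as follows; the exact operator whitelist is an assumption:

```python
import ast
import operator

# Sketch of eval()-free arithmetic: parse the expression into an AST and
# allow only numeric literals plus a whitelist of operators.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate an arithmetic expression, rejecting anything non-numeric."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")  # e.g. names, calls, attributes
    return walk(ast.parse(expr, mode="eval"))
```

Walking the AST instead of calling `eval()` means function calls, names, and attribute access never execute, so a prompt like "What is 2 to the power of 32?" can be answered safely.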
