muneeb-rashid-cyan/Multi-Agent-Voice-Assistant

Voice Research Assistant

A production-quality multi-agent voice assistant built with the OpenAI Agents SDK. Speak a question — the system routes it to the right specialist agent, reasons over it, and responds with natural voice.

![demo](assets/demo.png)


What It Does

  • 🎙 Speak — hold the mic button or press Space to ask anything
  • 🧠 Routes — General Agent decides which specialist handles the request
  • 🔍 Researches — Research Agent searches the web and summarizes results
  • 💻 Codes — Code Agent explains algorithms and solves math problems
  • 🔊 Responds — answer played back as natural voice (OpenAI TTS nova)
  • 💾 Remembers — full conversation history persists across sessions (SQLite)

Tech Stack

| Layer | Technology |
| --- | --- |
| Agent Framework | OpenAI Agents SDK 0.11.1 |
| LLM | GPT-4o (specialists) · GPT-4o-mini (triage) |
| Speech-to-Text | OpenAI gpt-4o-transcribe |
| Text-to-Speech | OpenAI TTS (tts-1-hd · nova voice) |
| Memory | SQLiteSession — persists across restarts |
| Backend | FastAPI + Uvicorn |
| Frontend | Vanilla HTML · CSS · JavaScript |

Architecture

```
User speaks
     │
     ▼
STT — gpt-4o-transcribe
     │
     ▼
┌──────────────────────────────────────────┐
│            General Agent                 │
│  • Input guardrails (jailbreak, empty)   │
│  • Output guardrails (PII, length cap)   │
│  • Routes by question type               │
└──────────┬───────────────┬───────────────┘
           │               │
  Research?│               │ Code / Math?
           ▼               ▼
 ┌──────────────┐   ┌──────────────┐
 │ Research     │   │ Code Agent   │
 │ Agent        │   │              │
 │ tools:       │   │ tools:       │
 │ • web_search │   │ • calculator │
 │ • summarizer │   └──────────────┘
 └──────────────┘
           │
           ▼
TTS — tts-1-hd · nova voice
           │
           ▼
      🔊 Browser plays audio
           │
           ▼
  SQLiteSession saves turn
  (memory persists next run)
```
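As a rough illustration of the flow above, here is a stubbed Python sketch. The function names and keyword heuristics are invented for illustration only; in the actual project, triage is an LLM decision by the General Agent, not keyword matching:

```python
# Minimal sketch of the turn flow with all model calls stubbed out.
# Names here are illustrative, not the project's actual API.

def triage(question: str) -> str:
    """Stand-in for the General Agent's routing decision."""
    q = question.lower()
    if any(k in q for k in ("calculate", "code", "power", "algorithm")):
        return "code_agent"
    if any(k in q for k in ("what is", "news", "search", "recent")):
        return "research_agent"
    return "general_agent"

def handle_turn(question: str) -> tuple[str, str]:
    """STT -> triage -> specialist -> TTS, with the models stubbed."""
    agent = triage(question)                      # route by question type
    answer = f"[{agent}] answer to: {question}"   # stand-in for the LLM call
    return agent, answer                          # answer would then go to TTS
```

The real system replaces each stub with a model call, but the shape of the pipeline (transcribe, route, answer, speak, persist) is the same.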

OpenAI Agents SDK Concepts Demonstrated

| Concept | File |
| --- | --- |
| Multi-agent handoffs | agents/general_agent.py |
| Typed handoff metadata (`input_type`) | agents/general_agent.py |
| `on_handoff` callbacks | agents/general_agent.py |
| `@function_tool` | tools/web_search.py · tools/summarizer.py · tools/calculator.py |
| `@input_guardrail` | guardrails/input_guards.py |
| `@output_guardrail` | guardrails/output_guards.py |
| Custom `VoiceWorkflowBase` subclass | workflow/session_workflow.py |
| `VoicePipeline` + `VoicePipelineConfig` | main.py (CLI) |
| `SQLiteSession` persistent memory | session/memory.py |
| `Runner.run_streamed()` | workflow/session_workflow.py |
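To give a feel for what the output guardrails check, here is a minimal sketch of PII detection plus a length cap. The regexes and the 1500-character cap are assumptions for illustration, not values taken from guardrails/output_guards.py:

```python
import re

# Illustrative output-guardrail logic: PII patterns and a length cap.
# The patterns and MAX_CHARS value are assumptions, not the repo's values.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MAX_CHARS = 1500

def output_guardrail(text: str) -> tuple[bool, str]:
    """Return (tripwire_triggered, reason) for a candidate response."""
    if EMAIL_RE.search(text) or SSN_RE.search(text):
        return True, "possible PII in response"
    if len(text) > MAX_CHARS:
        return True, "response exceeds length cap"
    return False, "ok"
```

In the SDK, the same check would be wrapped with `@output_guardrail` and return a guardrail result object; the tripwire idea is identical.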

Setup

Prerequisites

  • Python 3.10+
  • OpenAI API key with GPT-4o access
  • PortAudio (for CLI mode only)

```bash
# macOS
brew install portaudio
```

Install

```bash
git clone https://github.com/your-username/voice-research-assistant.git
cd voice-research-assistant

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

cp .env.example .env
# Add your OPENAI_API_KEY to .env
```

Run — Web UI

```bash
python server.py
```

Open http://localhost:8000

  • Hold the 🎙 button (or press Space) → speak → release → agent responds
  • Click the session badge (top right) to start a fresh conversation

Run — CLI

```bash
python -m src.voice_research_assistant.main
```

Press ENTER to start/stop recording. Ctrl+C to quit.


Example Questions

| Type | Example |
| --- | --- |
| Research | "What is quantum entanglement?" |
| Current events | "What happened with OpenAI recently?" |
| Math | "What is 2 to the power of 32?" |
| Code | "Explain what a Python generator is" |
| Simple | "What can you help me with?" |

Session Memory

Conversation history is saved to data/conversations.db. The agent remembers previous turns — even after you restart.

```bash
# Start a fresh session
SESSION_ID=new-session python server.py

# Reset memory entirely
rm data/conversations.db
```

Project Structure

```
voice-research-assistant/
├── server.py                              # FastAPI server + web UI entry point
├── .env.example                           # Environment variable template
├── requirements.txt
├── assets/
│   └── demo.png
├── frontend/
│   ├── index.html                         # Web UI
│   └── static/
│       ├── app.js                         # Recording, API calls, playback
│       └── style.css                      # Dark theme
└── src/voice_research_assistant/
    ├── main.py                            # CLI entry point
    ├── config.py                          # Environment variables
    ├── api/
    │   └── voice_handler.py               # STT → Agent → TTS pipeline for web
    ├── audio/
    │   ├── recorder.py                    # Push-to-talk mic capture (CLI)
    │   └── player.py                      # Real-time audio playback (CLI)
    ├── agents/
    │   ├── general_agent.py               # Triage agent with guardrails + handoffs
    │   ├── research_agent.py              # Web search specialist
    │   └── code_agent.py                  # Code and math specialist
    ├── tools/
    │   ├── web_search.py                  # DuckDuckGo (no API key needed)
    │   ├── summarizer.py                  # Condenser via gpt-4o-mini
    │   └── calculator.py                  # Safe AST-based arithmetic
    ├── guardrails/
    │   ├── input_guards.py                # Jailbreak + empty input detection
    │   └── output_guards.py               # PII detection + response length cap
    ├── workflow/
    │   └── session_workflow.py            # Custom VoiceWorkflowBase + SQLiteSession
    └── session/
        └── memory.py                      # SQLiteSession factory
```
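The "safe AST-based arithmetic" approach noted for tools/calculator.py can be sketched as follows; the exact operator whitelist is an assumption:

```python
import ast
import operator

# Sketch of eval()-free arithmetic: parse the expression into an AST and
# allow only numeric literals plus a whitelist of operators.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate an arithmetic expression, rejecting anything non-numeric."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")  # e.g. names, calls, attributes
    return walk(ast.parse(expr, mode="eval"))
```

Walking the AST instead of calling `eval()` means function calls, names, and attribute access never execute, so a prompt like "What is 2 to the power of 32?" can be answered safely.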
