330 changes: 164 additions & 166 deletions README.md
@@ -1,166 +1,164 @@
# VoxBridge — Voice-Native AI Support Layer

> Real-time voice calls with AI agents that retrieve domain-specific documentation, execute live API calls via MCP tool calling, and respond with synthesized speech — all within a sub-800ms latency target.

---

## Team Ownership

| Developer | Directory | Responsibility |
|-----------|-----------|----------------|
| Person A | `services/agent/` | FSM, LLM, RAG, TTS, MCP tool calling |
| Person B | `services/webhook/` | Twilio integration, audio forwarding |
| Person C | `services/knowledge/` | Qdrant, Elasticsearch, document ingestion |
| Person D | `scripts/`, `infra/`, `tests/` | DevOps, Docker, CI/CD, integration tests |

---

## Architecture

```
Caller ──► Twilio ──► Node 2 (Webhook + WSS Proxy + Cloudflare)
Node 1 (LiveKit Agent + Ollama LLM + Deepgram STT + ElevenLabs TTS)
Node 3 (Elasticsearch 9.x + Qdrant 1.17.0)
```

### Node Structure

```
services/agent/ # Node 1 — Core AI processing
services/webhook/ # Node 2 — Telephony bridge (Windows 11 + WSL2)
services/knowledge/ # Node 3 — Retrieval & storage
infra/ # Docker compose, Cloudflare config
scripts/ # Setup & verification scripts
docs/ # Architecture documentation
tests/ # Cross-service integration tests
```

---

## Tech Stack

| Component | Technology |
|-----------|------------|
| Telephony | Twilio Programmable Voice + Media Streams |
| Orchestration | LiveKit Agents SDK v0.13.x (Python) |
| Turn Detection | LiveKit Semantic Turn Detection |
| STT | Deepgram Nova-3 Streaming API |
| LLM | Llama 3.1 8B via Ollama v0.13+ (local) |
| TTS Primary | ElevenLabs Streaming API |
| TTS Fallback | Kokoro TTS (local GPU) |
| Vector DB | Qdrant 1.17.0 (self-hosted Docker) |
| Lexical Search | Elasticsearch 9.x (self-hosted Docker) |
| Tool Calling | Model Context Protocol (MCP) |
| Messaging | Apache Kafka 3.x (Docker) |
| Language | Python 3.11 |

---

## Project Setup

### Prerequisites

- **Node 1:** Ubuntu 22.04, 8-core CPU, 32 GB RAM, RTX 4090 (24 GB VRAM)
- **Node 2:** Windows 11 + WSL2, 4-core CPU, 8 GB RAM
- **Node 3:** Ubuntu 22.04, 8-core CPU, 16 GB RAM
- Docker 24.x installed on all nodes
- Python 3.11 on all nodes

### Step 1: Environment Variables

```bash
cp .env.example .env
# Fill in all API keys in .env
```

### Step 2: Start Node 3 (Knowledge Base) — First

```bash
cd infra
docker compose -f docker-compose.node3.yml up -d

# Setup Qdrant multi-tenant collection
cd ../services/knowledge
pip install -r requirements.txt
python setup_qdrant.py

# Ingest documents
python ingest_documents.py --tenant-id <TENANT> --docs-path <PATH_TO_DOCS>
```

### Step 3: Start Node 2 (Telephony Bridge) — Second

```bash
# Start Kafka
cd infra
docker compose -f docker-compose.kafka.yml up -d

# Start Cloudflare tunnel
cloudflared tunnel run voxbridge-tunnel

# Start webhook server
cd ../services/webhook
pip install -r requirements.txt
uvicorn webhook_server:app --port 8000 --host 0.0.0.0
```

Configure the Twilio phone number's webhook to point to: `https://<your-tunnel-subdomain>.trycloudflare.com/twilio/incoming`

### Step 4: Start Node 1 (Agent & LLM) — Last

```bash
# Setup Ollama
bash scripts/setup_ollama.sh

# Start agent
cd services/agent
pip install -r requirements.txt
set -a; source ../../.env; set +a  # export the variables so the agent process inherits them
python agent_main.py dev
```

### Verify All Services

```bash
bash scripts/verify_env.sh
```

---

## Key Performance Targets

| Metric | Target |
|--------|--------|
| End-to-End Latency (P95) | < 800ms |
| LLM TTFT (P95) | < 600ms |
| Deepgram STT | < 300ms |
| Qdrant Retrieval | ~2ms |
| Word Error Rate | < 5% |
| RAG Groundedness | > 0.90 |
| First Call Resolution | > 85% |
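
The P95 targets above can be checked against recorded latency samples. The following is a minimal sketch, assuming a nearest-rank percentile and latencies collected in milliseconds; the function names `p95` and `meets_target` are illustrative and are not taken from `benchmark.py`.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) computed with integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(samples_ms: list[float], target_ms: float = 800.0) -> bool:
    """True when the P95 latency is strictly below the target (800 ms by default)."""
    return p95(samples_ms) < target_ms
```

Nearest-rank is only one of several percentile definitions; an interpolating method would give slightly different values on small sample sets.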

---

## Testing

```bash
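# Run each command from the repository root; the subshells leave the working directory unchanged.

# FSM unit tests
(cd services/agent && pytest test_fsm.py -v)

# Webhook tests
(cd services/webhook && pytest tests/ -v)

# Integration tests
(cd tests && pytest test_integration.py -v)

# Latency benchmarks
(cd services/agent && python benchmark.py)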
```
# Product Intelligence Platform

A SaaS platform that securely orchestrates autonomous, LLM-driven actions across external tools (Jira, Slack, Linear, HubSpot). It pairs an event-sourcing architecture with a deterministic LangGraph agent state machine.

![Platform Concept](https://img.shields.io/badge/Architecture-Event%20Driven-blue) ![Postgres](https://img.shields.io/badge/Postgres-14-blue) ![LLM](https://img.shields.io/badge/Agent-Llama%203-purple) ![Monorepo](https://img.shields.io/badge/Manager-UV-yellow)

---

## 🏗️ Core Architecture Pattern
The system is split into two independent services: one owns storage and security, the other provides stateless intelligence.

### 1️⃣ Django Core Service (Resilience & Storage)
The primary backend, responsible for data integrity, relationships, and user authorization.
- **Data Layer:** PostgreSQL 14 (using GIN indexes on JSONB fields and partial indexes).
- **Control Layer:** Django REST Framework providing the API interface.
- **Workflow Orchestration:** Celery and Redis process high-volume inbound events in the background so that HTTP requests are not blocked.
- **Responsibilities:** Idempotency constraints, organization multi-tenancy rules, identity tracking, webhook reception, Dead Letter Queue (DLQ) operations.

### 2️⃣ FastAPI Agent Service (Stateless Intelligence Layer)
An independent microservice dedicated to interpreting natural language.
- **Logic Mapping:** `LangGraph` defines explicit node-based state transitions.
- **Natural Language Parsing:** Runs zero-shot LLM prompts whose JSON output must match strictly typed Pydantic models.
- **Providers:** Connects to target platforms through the **Model Context Protocol (MCP)** standard.
- **Self-Healing:** Built-in validator nodes catch hallucinated payloads and append the errors to the prompt for closed-loop, isolated retries.
- **CRITICAL RESTRICTION:** The FastAPI service has no database write access, so the AI pipeline can never damage stored data on its own. It communicates with Django only through strictly typed internal URLs.
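
The self-healing retry described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the service's implementation: `run_with_feedback`, `fake_llm`, and `status_validator` are hypothetical names, and the model is replaced by a stand-in function so the example runs without Ollama or LangGraph.

```python
from typing import Callable, Optional


def run_with_feedback(
    base_prompt: str,
    call_llm: Callable[[str], str],
    validate: Callable[[str], list[str]],
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Call the model and, on failure, append the validation errors to the prompt.

    Returns (output, attempts_used); output is None if every attempt failed.
    """
    prompt = base_prompt
    for attempt in range(1, max_attempts + 1):
        output = call_llm(prompt)
        errors = validate(output)
        if not errors:
            return output, attempt
        # Closed-loop retry: the next prompt carries the errors from this attempt.
        feedback = "\n".join(f"- {error}" for error in errors)
        prompt = f"{base_prompt}\n\nYour previous answer was rejected:\n{feedback}"
    return None, max_attempts


# Stand-in model (assumption): it only answers correctly after seeing feedback.
def fake_llm(prompt: str) -> str:
    return '{"status": "open"}' if "rejected" in prompt else '{"status": "done"}'


# Stand-in validator (assumption): accepts only an "open" status.
def status_validator(output: str) -> list[str]:
    return [] if '"open"' in output else ["status must be 'open' or 'in_progress'"]


result, attempts = run_with_feedback("Map this ticket.", fake_llm, status_validator)
print(result, attempts)
```

Rebuilding the prompt from `base_prompt` on each attempt keeps only the latest errors in context, which is one possible design; accumulating all previous errors is another.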

---

## 🧠 Core Engineering Principles

1. **Event Sourcing First**: All raw webhook payloads from Jira/Slack are written to Postgres `JSONB` fields and kept permanently, before any AI processing occurs.
2. **Idempotency Assurance**: A strict unique constraint on `(integration_id, external_id)` ensures that repeated webhooks never create duplicate tickets.
3. **Structured AI Constraints**: Strict `BaseModel` output classes force 8B-parameter models to follow the schema rather than guess at fields.
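
The idempotency principle can be illustrated with a unique constraint on `(integration_id, external_id)`. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server; the table name `raw_event` and the helper `store_event` are assumptions made for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE raw_event (
        id INTEGER PRIMARY KEY,
        integration_id INTEGER NOT NULL,
        external_id TEXT NOT NULL,
        payload TEXT NOT NULL,  -- raw webhook body, stored before any AI processing
        UNIQUE (integration_id, external_id)
    )
    """
)


def store_event(integration_id: int, external_id: str, payload: dict) -> bool:
    """Store a raw payload once; return False when the same event was already stored."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO raw_event (integration_id, external_id, payload) "
        "VALUES (?, ?, ?)",
        (integration_id, external_id, json.dumps(payload)),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the unique constraint suppressed the insert.
    return cursor.rowcount == 1
```

In PostgreSQL the equivalent statement would typically use `INSERT ... ON CONFLICT (integration_id, external_id) DO NOTHING`, with the payload column declared as `JSONB`.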

---

## 🧭 LangGraph Pipeline Overview

The LangGraph state machine moves through the following nodes; the path taken depends on whether the payload validates.

1. **Fetcher Node (MCP)**: Handles API pagination for each target tool and stores the raw extraction in typed dictionaries.
2. **Mapper Node (LLM / LangChain)**: Sends the unified state to a local model, which converts free-form webhook or user chat strings into `UnifiedTicketSchema`.
3. **Validator Node (Python native)**: Checks the generated structure, verifying statuses (`open`, `in_progress`) and ISO date formats.
4. **Router Node**:
   - IF valid ➡️ Hand off to Django's persistence layer for an upsert.
   - IF invalid ➡️ Send back to the Mapper node for another attempt.
   - IF the attempt cutoff is exceeded (>3x) ➡️ Send to the Django Dead Letter Queue API for manual handling.
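
The validator and router steps above can be sketched as two small functions. This is an illustration only: it does not use LangGraph, the `created_at` field and the returned node names (`persist`, `mapper`, `dead_letter`) are assumptions, and only the two statuses named in this README are treated as valid.

```python
from datetime import datetime

ALLOWED_STATUSES = {"open", "in_progress"}  # statuses listed in this README
MAX_ATTEMPTS = 3  # cutoff from the router rules (>3x goes to the DLQ)


def validate_ticket(ticket: dict) -> list[str]:
    """Return validation errors for a mapped ticket; an empty list means valid."""
    errors = []
    if ticket.get("status") not in ALLOWED_STATUSES:
        errors.append(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    try:
        # `created_at` is a hypothetical field name used for the ISO date check.
        datetime.fromisoformat(str(ticket.get("created_at")))
    except ValueError:
        errors.append("created_at must be an ISO 8601 date")
    return errors


def route(errors: list[str], attempts: int) -> str:
    """Pick the next node from the validation result and the attempt count."""
    if not errors:
        return "persist"      # valid: hand off to Django for the upsert
    if attempts > MAX_ATTEMPTS:
        return "dead_letter"  # cutoff exceeded: manual handling via the DLQ
    return "mapper"           # invalid: retry the mapping step
```

For example, `route(validate_ticket({"status": "done", "created_at": "yesterday"}), attempts=1)` returns `"mapper"`, because both checks fail and the cutoff has not been reached.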

---

## 📂 Project Structure Map

```text
.
├── backend/ # Django monolith orchestrator
│ ├── manage.py
│ ├── config/ # Settings & URL configuration
│ ├── events/ # Webhook JSON payload routing
│ ├── tickets/ # Unified normalized storage models
│ ├── sync/ # Cursor management models
│ └── integrations/ # Tool API Key authorizations
├── agent-service/ # Stateless FastAPI engine
│ ├── src/main.py # Uvicorn boot
│ ├── src/agents/ # Orchestrator & LangGraph nodes
│ ├── src/schemas.py # Strict Pydantic validations
│ └── tests/ # 93%+ Pytest branch coverage suites
├── mcp-servers/ # Model Context Protocol plugins
│ ├── jira/
│ ├── slack/
│ └── hubspot/
├── docker-compose.yml # PostgreSQL & Redis clusters
├── Makefile # UV CLI standardized commands
└── .env.example # Secret template requirements
```

---

## 🚀 Getting Started

### 1. System Requirements
- Docker & Docker Compose
- Ollama runtime installed locally
- `uv` Python package and workspace manager

### 2. Local AI Setup
Pull the primary instruction-tuned model. We use a stock model and require its output to conform to strictly typed JSON.
```bash
ollama pull llama3:8b
```

### 3. Environment Config
Configure the root `.env` so that requests are routed to the local Ollama instance:
```ini
OPENAI_API_KEY="ollama"
OPENAI_API_BASE_URL="http://127.0.0.1:11434/v1"
LLM_MODEL="llama3:8b"
LLM_TEMPERATURE=0.0
```
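
As a rough sketch of how a service might consume these variables, the snippet below reads them into a typed settings object. The `LLMSettings` class and `load_llm_settings` function are hypothetical and only mirror the variable names shown above; the agent service's actual loading code may differ.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: str
    model: str
    temperature: float


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Build settings from environment variables, defaulting to the values shown above."""
    return LLMSettings(
        api_key=env.get("OPENAI_API_KEY", "ollama"),
        # Strip a trailing slash so paths can be appended consistently.
        base_url=env.get("OPENAI_API_BASE_URL", "http://127.0.0.1:11434/v1").rstrip("/"),
        model=env.get("LLM_MODEL", "llama3:8b"),
        # Environment values are strings, so the temperature is parsed explicitly.
        temperature=float(env.get("LLM_TEMPERATURE", "0.0")),
    )
```

Passing the environment as a parameter makes the loader easy to exercise in tests without modifying the real process environment.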

*(Ensure the PostgreSQL and JWT variables are also set, mirroring `.env.example`.)*

### 4. Bootstrapping Local Infrastructure
Provision the backend systems.
```bash
docker-compose up -d
```

### 5. Running the Application Cluster
To keep environments isolated, the Make targets run through `uv`.

**(Terminal 1) Workspace Orchestration & Environment Initialization**:
First, install all monorepo dependencies into a single shared `.venv`.
```bash
make setup
make install
```

**(Terminal 2) Django Management Base**:
Prepare the primary database schema, optionally create an admin user, and start the REST API.
```bash
cd backend
../.venv/bin/python manage.py makemigrations
../.venv/bin/python manage.py migrate

# Optional: Create a superuser for the graphical Django Admin interface
../.venv/bin/python manage.py createsuperuser

# Start the REST API host
../.venv/bin/python manage.py runserver
```

**(Terminal 3) FastAPI LangGraph Agent**:
Start the core AI orchestration service, which runs the LangGraph pipeline.
```bash
make agent
```

**(Terminal 4 & 5) Celery Asynchronous Workers**:
These workers are required to process webhooks and long-running HTTP requests without overloading the web layer.
```bash
# Terminal 4: Start the base Celery background consumer
make celery-worker

# Terminal 5: Start the cron-job heartbeat scheduler
make celery-beat
```

---

## 🧪 Testing Coverage & Linting
The CI/CD pipeline requires all code to pass formatting checks.
```bash
# Run the tests for the LangGraph agent
make test-agent

# Format the whole repository with Ruff/Black
make fl
```

## 📚 Endpoints Overview
See `api_reference.md` for the complete catalog of endpoints and the interactions that the frontend relies on.