15 changes: 15 additions & 0 deletions .claude/commands/feature-capture.md
@@ -0,0 +1,15 @@
+---
+description: Add a new feature to the backlog
+---
+
+Add a new feature to the backlog by creating docs/features/[id]/idea.md.
+
+Ask for:
+- Feature name
+- Type (Feature, Enhancement, Bug Fix, Tech Debt)
+- Priority (P0, P1, P2)
+- Effort (Small, Medium, Large)
+- Impact (Low, Medium, High)
+- Problem statement
+
+Then create the feature directory and idea.md file.
12 changes: 12 additions & 0 deletions .claude/commands/feature-init.md
@@ -0,0 +1,12 @@
+---
+description: Initialize the feature workflow structure in a project
+---
+
+Initialize the feature workflow structure in this project.
+
+This will:
+1. Create the docs/features/ directory
+2. Create an initial DASHBOARD.md file
+3. Set up the feature tracking structure
+
+Only run this if docs/features/ doesn't already exist.
12 changes: 12 additions & 0 deletions .claude/commands/feature-plan.md
@@ -0,0 +1,12 @@
+---
+description: Start implementing a feature from the backlog
+---
+
+Start implementing a feature by creating docs/features/[id]/plan.md.
+
+1. Read docs/features/DASHBOARD.md to find features in backlog
+2. Let user select a feature or use the one mentioned
+3. Read the feature's idea.md
+4. Use @project-manager agent to expand requirements
+5. Create plan.md with implementation steps
+6. The plugin will automatically update the dashboard
13 changes: 13 additions & 0 deletions .claude/commands/feature-ship.md
@@ -0,0 +1,13 @@
+---
+description: Complete a feature with quality gates
+---
+
+Complete a feature by creating docs/features/[id]/shipped.md.
+
+1. Read docs/features/DASHBOARD.md to find features in progress
+2. Let user select a feature to complete
+3. Run quality gates:
+   - @security-reviewer for security audit
+   - @qa-engineer for QA validation
+4. Create shipped.md
+5. The plugin will automatically update the dashboard
10 changes: 10 additions & 0 deletions .claude/commands/feature-status.md
@@ -0,0 +1,10 @@
+---
+description: Show feature dashboard and project status
+---
+
+Show the current feature dashboard with in-progress, backlog, and completed features.
+
+Read docs/features/DASHBOARD.md and provide a summary of:
+- Features in progress
+- Backlog items
+- Recently completed features
6 changes: 3 additions & 3 deletions README.md
@@ -159,7 +159,7 @@ The following third-party service accounts and API keys are required:
 | ------- | ------- | ------- |
 | [Daily.co](https://dashboard.daily.co/) | WebRTC/SIP transport for voice calls | [Dashboard](https://dashboard.daily.co/) |
 | STT provider (e.g. [Deepgram](https://console.deepgram.com/)) | Speech-to-text (cloud API mode) | Provider console |
-| TTS provider (e.g. [Cartesia](https://play.cartesia.ai/)) | Text-to-speech (cloud API mode) | Provider console |
+| TTS provider (e.g. [Cartesia](https://play.cartesia.ai/) or [ElevenLabs](https://elevenlabs.io/)) | Text-to-speech (cloud API mode) | Provider console |

 ### AWS Account Requirements

@@ -421,10 +421,10 @@ Run `/destroy-project` in Claude Code. This will:

 | Mode | STT/TTS | Best For |
 | ---- | ------- | -------- |
-| **Cloud API** (`USE_CLOUD_APIS=true`) | Deepgram + Cartesia cloud APIs | Getting started, development |
+| **Cloud API** (`USE_CLOUD_APIS=true`) | Deepgram + Cartesia/ElevenLabs cloud APIs | Getting started, development |
 | **Amazon SageMaker** (default) | Self-hosted on GPU instances | Production, data residency |

-Cloud API mode requires Deepgram and Cartesia API keys. Amazon SageMaker mode requires [Deepgram Marketplace subscriptions](docs/reference/deepgram-marketplace-setup.md) and GPU quota.
+Cloud API mode requires Deepgram and a TTS provider API key (Cartesia by default, or ElevenLabs with `TTS_PROVIDER=elevenlabs`). Amazon SageMaker mode requires [Deepgram Marketplace subscriptions](docs/reference/deepgram-marketplace-setup.md) and GPU quota.

 ### Known Issues

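The key requirement described in the README change (a Deepgram key plus one TTS provider key, chosen by `TTS_PROVIDER`) can be sketched as a pre-flight check. This is only an illustration: the helper name `missing_cloud_api_keys` is hypothetical and not part of this PR, and it assumes Cloud API mode is already selected, so `TTS_PROVIDER=sagemaker` is not handled. Only the environment variable names are taken from the diff.

```python
import os


def missing_cloud_api_keys(env=None):
    """Hypothetical helper: list the Cloud API mode keys absent from `env`.

    Assumption: any TTS_PROVIDER value other than "elevenlabs" falls back to
    Cartesia, mirroring the default described in the README change.
    """
    env = os.environ if env is None else env
    provider = env.get("TTS_PROVIDER", "cartesia")
    tts_key = "ELEVENLABS_API_KEY" if provider == "elevenlabs" else "CARTESIA_API_KEY"
    required = ["DEEPGRAM_API_KEY", tts_key]
    # Treat unset and empty values alike, as the factory does for its own key check.
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    sample = {"TTS_PROVIDER": "elevenlabs", "DEEPGRAM_API_KEY": "placeholder"}
    print(missing_cloud_api_keys(sample))
```

Running a check like this before deployment would surface a missing `ELEVENLABS_API_KEY` earlier than the `ValueError` raised when the TTS service is created.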
5 changes: 4 additions & 1 deletion backend/voice-agent/app/services/config_service.py
@@ -29,7 +29,10 @@ class KnowledgeBaseConfig:

 @dataclass
 class ProviderConfig:
-    """Provider configuration for STT/TTS."""
+    """Provider configuration for STT/TTS.
+
+    Supported TTS providers: "cartesia", "elevenlabs", "sagemaker"
+    """

     stt_provider: str = "deepgram"
     tts_provider: str = "cartesia"
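The docstring now lists the supported TTS providers, but the field itself is a plain `str`. One possible way to enforce the list, shown here as a sketch rather than as what the PR does, is a `__post_init__` check. The names `SUPPORTED_TTS_PROVIDERS` and `ProviderConfigSketch` are assumptions made for illustration; only the field names and defaults come from the diff.

```python
from dataclasses import dataclass

# Assumed constant; the values are the ones listed in the new docstring.
SUPPORTED_TTS_PROVIDERS = ("cartesia", "elevenlabs", "sagemaker")


@dataclass
class ProviderConfigSketch:
    """Illustrative stand-in for ProviderConfig with eager validation."""

    stt_provider: str = "deepgram"
    tts_provider: str = "cartesia"

    def __post_init__(self):
        # Fail at construction time instead of silently falling back later.
        if self.tts_provider not in SUPPORTED_TTS_PROVIDERS:
            raise ValueError(
                f"Unsupported TTS provider {self.tts_provider!r}; "
                f"expected one of {SUPPORTED_TTS_PROVIDERS}"
            )


if __name__ == "__main__":
    print(ProviderConfigSketch(tts_provider="elevenlabs"))
```

Whether to validate eagerly is a design choice: the factory in this PR treats any unrecognized provider as Cartesia, so a strict check would change that behavior.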
77 changes: 76 additions & 1 deletion backend/voice-agent/app/services/factory.py
@@ -3,7 +3,7 @@

 Supports switching between cloud APIs and SageMaker endpoints via configuration:
 - STT_PROVIDER: "deepgram" (default, cloud API) or "sagemaker" (Deepgram on SageMaker)
-- TTS_PROVIDER: "cartesia" (default, cloud API) or "sagemaker" (Deepgram Aura on SageMaker)
+- TTS_PROVIDER: "cartesia" (default, cloud API), "elevenlabs", or "sagemaker" (Deepgram Aura on SageMaker)

 Cloud APIs are the default for simpler deployment without SageMaker endpoints.
 SageMaker providers use HTTP/2 bidirectional streaming for low-latency, VPC-local inference.
@@ -136,6 +136,31 @@ def create_tts_service(config: "PipelineConfig"):
             encoding="linear16",
         )

+    elif provider == "elevenlabs":
+        from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
+
+        api_key = os.getenv("ELEVENLABS_API_KEY")
+        if not api_key:
+            raise ValueError(
+                "ELEVENLABS_API_KEY environment variable required for TTS"
+            )
+
+        voice_id = _map_voice_id_to_elevenlabs(config.voice_id)
+        model = os.getenv("ELEVENLABS_MODEL", "eleven_multilingual_v2")
+
+        logger.info(
+            "tts_provider_selected",
+            provider="elevenlabs",
+            voice_id=voice_id,
+            model=model,
+        )
+        return ElevenLabsTTSService(
+            api_key=api_key,
+            voice_id=voice_id,
+            model=model,
+            sample_rate=8000,
+        )
+
     else:
         # Default to Cartesia cloud API
         from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -191,6 +216,56 @@ def _resolve_voice_for_sagemaker(voice_id: str | None) -> str:
     return cartesia_to_deepgram.get(voice_id, default_voice)


+def _map_voice_id_to_elevenlabs(voice_id: str | None) -> str:
+    """
+    Map voice IDs to ElevenLabs format.
+
+    If it's already an ElevenLabs voice ID, returns it directly.
+    If it's a Cartesia UUID or Deepgram Aura name, maps to a similar ElevenLabs voice.
+
+    Args:
+        voice_id: Voice ID from any provider
+
+    Returns:
+        ElevenLabs voice ID
+    """
+    # Default ElevenLabs voice (Rachel - clear female voice)
+    default_voice = "21m00Tcm4TlvDq8ikWAM"
+
+    if not voice_id:
+        return default_voice
+
+    # If it's not a Cartesia UUID or Deepgram name, assume it's already an ElevenLabs ID
+    is_cartesia_uuid = len(voice_id) == 36 and voice_id.count("-") == 4
+    is_deepgram_name = voice_id.startswith("aura")
+
+    if not is_cartesia_uuid and not is_deepgram_name:
+        return voice_id
+
+    # Map Cartesia UUIDs to ElevenLabs equivalents
+    cartesia_to_elevenlabs = {
+        "79a125e8-cd45-4c13-8a67-188112f4dd22": "21m00Tcm4TlvDq8ikWAM",  # British Lady -> Rachel
+        "b7d50908-b17c-442d-ad8d-810c63997ed9": "EXAVITQu4vr4xnSDxMaL",  # California Girl -> Bella
+        "5345cf08-6f37-424d-a5d9-8ae1101b9377": "MF3mGyEYCl7XYWbV9V6O",  # Sweet Lady -> Emily
+        "a0e99841-438c-4a64-b679-ae501e7d6091": "VR6AewLTigWG4xSOukaG",  # Barbershop Man -> Arnold
+        "fb26447f-308b-471e-8b00-8e9f04284eb5": "ErXwobaYiN019PkySvjV",  # Doctor Mischief -> Antoni
+    }
+
+    if is_cartesia_uuid:
+        return cartesia_to_elevenlabs.get(voice_id, default_voice)
+
+    # Map Deepgram Aura voices to ElevenLabs equivalents
+    deepgram_to_elevenlabs = {
+        "aura-2-thalia-en": "21m00Tcm4TlvDq8ikWAM",  # Thalia -> Rachel
+        "aura-2-luna-en": "EXAVITQu4vr4xnSDxMaL",  # Luna -> Bella
+        "aura-2-asteria-en": "MF3mGyEYCl7XYWbV9V6O",  # Asteria -> Emily
+        "aura-2-arcas-en": "VR6AewLTigWG4xSOukaG",  # Arcas -> Arnold
+        "aura-2-orpheus-en": "ErXwobaYiN019PkySvjV",  # Orpheus -> Antoni
+    }
+
+    return deepgram_to_elevenlabs.get(voice_id, default_voice)
+
+
 def _map_voice_id_to_cartesia(voice_id: str | None) -> str:
     """
     Map voice IDs to Cartesia format.
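The format heuristic inside `_map_voice_id_to_elevenlabs` can be isolated to see how each kind of ID is routed. The sketch below copies the two checks from the diff into a hypothetical `classify_voice_id` helper (a name that does not exist in the PR) and returns a label instead of a mapped voice.

```python
def classify_voice_id(voice_id):
    """Hypothetical helper: label a voice ID using the same checks as the PR."""
    if not voice_id:
        return "default"
    # Cartesia IDs are UUIDs: 36 characters with four hyphens.
    if len(voice_id) == 36 and voice_id.count("-") == 4:
        return "cartesia"
    # Deepgram Aura voices are named like "aura-2-thalia-en".
    if voice_id.startswith("aura"):
        return "deepgram"
    # Everything else is assumed to already be an ElevenLabs ID.
    return "elevenlabs"


if __name__ == "__main__":
    for candidate in (
        None,
        "79a125e8-cd45-4c13-8a67-188112f4dd22",
        "aura-2-thalia-en",
        "21m00Tcm4TlvDq8ikWAM",
        "not-a-real-voice",
    ):
        print(repr(candidate), "->", classify_voice_id(candidate))
```

The last case shows a consequence of the fall-through: any string that is neither a UUID nor an `aura` name is passed to ElevenLabs unchanged, so a mistyped ID is not replaced by the default voice.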
5 changes: 3 additions & 2 deletions backend/voice-agent/requirements.txt
@@ -1,16 +1,17 @@
 # Pipecat Voice Pipeline Dependencies for ECS
 # Pin versions for reproducibility

-# Core Pipecat with Daily, Silero VAD, Deepgram, Cartesia, and SageMaker support
+# Core Pipecat with Daily, Silero VAD, Deepgram, Cartesia, ElevenLabs, and SageMaker support
 # - daily: WebRTC transport
 # - silero: Voice Activity Detection
 # - deepgram: Cloud STT API + SageMaker STT (DeepgramSageMakerSTTService)
 # - cartesia: Cloud TTS API
+# - elevenlabs: ElevenLabs Cloud TTS API (Multilingual v2 by default; override with ELEVENLABS_MODEL)
 # - aws: Bedrock LLM support
 # - sagemaker: HTTP/2 BiDi streaming for SageMaker endpoints (requires Python >= 3.12)
 # - webrtc: SmallWebRTCTransport for local browser-based prototyping (aiortc)
 # - runner: FastAPI dev server + prebuilt WebRTC browser UI
-pipecat-ai[daily,silero,deepgram,cartesia,aws,sagemaker,webrtc,runner]==0.0.102
+pipecat-ai[daily,silero,deepgram,cartesia,elevenlabs,aws,sagemaker,webrtc,runner]==0.0.102

 # AWS SDK - let pip resolve version to match aiobotocore requirements
 # aiobotocore 2.25.1 requires botocore>=1.40.46
9 changes: 7 additions & 2 deletions infrastructure/DEPLOYMENT.md
@@ -48,7 +48,8 @@ This mode uses Deepgram and Cartesia cloud APIs for STT/TTS. No SageMaker endpoi

 You will need API keys for:
 - **[Deepgram](https://console.deepgram.com/)** -- Speech-to-Text (Nova-3 model)
-- **[Cartesia](https://play.cartesia.ai/)** -- Text-to-Speech (Sonic model)
+- **[Cartesia](https://play.cartesia.ai/)** -- Text-to-Speech (Sonic model) *(default)*
+- **[ElevenLabs](https://elevenlabs.io/)** -- Text-to-Speech (Multilingual v2 model) *(alternative; set `TTS_PROVIDER=elevenlabs`)*

 ### Step 1: Configure Environment

@@ -90,6 +91,9 @@ cat > ../backend/voice-agent/.env << 'EOF'
 DEEPGRAM_API_KEY=your-deepgram-api-key
 CARTESIA_API_KEY=your-cartesia-api-key
 DAILY_API_KEY=your-daily-api-key
+# Optional: Use ElevenLabs instead of Cartesia for TTS
+# TTS_PROVIDER=elevenlabs
+# ELEVENLABS_API_KEY=your-elevenlabs-api-key
 EOF

 # Push to Secrets Manager
@@ -446,7 +450,8 @@ cd infrastructure
 | Bedrock Claude Haiku | Yes (pay-per-use) | Yes (pay-per-use) |
 | Daily.co | Yes (third-party) | Yes (third-party) |
 | Deepgram Cloud STT | Yes (third-party) | No (self-hosted) |
-| Cartesia Cloud TTS | Yes (third-party) | No (self-hosted) |
+| Cartesia Cloud TTS | Yes (third-party, default) | No (self-hosted) |
+| ElevenLabs Cloud TTS | Yes (third-party, alternative) | No (self-hosted) |

 **Cloud API mode** does not deploy SageMaker endpoints but routes audio through the public internet. **SageMaker mode** keeps all audio within your VPC.
