From 66db0fcb387b6f472856403924b41a29e39f1177 Mon Sep 17 00:00:00 2001
From: Daniel Wirjo
Date: Wed, 6 May 2026 08:11:03 +1000
Subject: [PATCH 1/2] feat: Add ElevenLabs as supported TTS provider

Adds ElevenLabs Turbo v2.5 as an alternative TTS option alongside Cartesia
and SageMaker. Set TTS_PROVIDER=elevenlabs and provide ELEVENLABS_API_KEY
to use it.

Co-Authored-By: Claude Opus 4.6
---
 .claude/commands/feature-capture.md           | 15 ++++
 .claude/commands/feature-init.md              | 12 +++
 .claude/commands/feature-plan.md              | 12 +++
 .claude/commands/feature-ship.md              | 13 ++++
 .claude/commands/feature-status.md            | 10 +++
 README.md                                     |  6 +-
 .../app/services/config_service.py            |  5 +-
 backend/voice-agent/app/services/factory.py   | 77 ++++++++++++++++++-
 backend/voice-agent/requirements.txt          |  5 +-
 infrastructure/DEPLOYMENT.md                  |  9 ++-
 10 files changed, 155 insertions(+), 9 deletions(-)
 create mode 100644 .claude/commands/feature-capture.md
 create mode 100644 .claude/commands/feature-init.md
 create mode 100644 .claude/commands/feature-plan.md
 create mode 100644 .claude/commands/feature-ship.md
 create mode 100644 .claude/commands/feature-status.md

diff --git a/.claude/commands/feature-capture.md b/.claude/commands/feature-capture.md
new file mode 100644
index 0000000..6b67365
--- /dev/null
+++ b/.claude/commands/feature-capture.md
@@ -0,0 +1,15 @@
+---
+description: Add a new feature to the backlog
+---
+
+Add a new feature to the backlog by creating docs/features/[id]/idea.md.
+
+Ask for:
+- Feature name
+- Type (Feature, Enhancement, Bug Fix, Tech Debt)
+- Priority (P0, P1, P2)
+- Effort (Small, Medium, Large)
+- Impact (Low, Medium, High)
+- Problem statement
+
+Then create the feature directory and idea.md file.
diff --git a/.claude/commands/feature-init.md b/.claude/commands/feature-init.md
new file mode 100644
index 0000000..e6d0071
--- /dev/null
+++ b/.claude/commands/feature-init.md
@@ -0,0 +1,12 @@
+---
+description: Initialize the feature workflow structure in a project
+---
+
+Initialize the feature workflow structure in this project.
+
+This will:
+1. Create the docs/features/ directory
+2. Create an initial DASHBOARD.md file
+3. Set up the feature tracking structure
+
+Only run this if docs/features/ doesn't already exist.
diff --git a/.claude/commands/feature-plan.md b/.claude/commands/feature-plan.md
new file mode 100644
index 0000000..fdc5d78
--- /dev/null
+++ b/.claude/commands/feature-plan.md
@@ -0,0 +1,12 @@
+---
+description: Start implementing a feature from the backlog
+---
+
+Start implementing a feature by creating docs/features/[id]/plan.md.
+
+1. Read docs/features/DASHBOARD.md to find features in backlog
+2. Let user select a feature or use the one mentioned
+3. Read the feature's idea.md
+4. Use @project-manager agent to expand requirements
+5. Create plan.md with implementation steps
+6. The plugin will automatically update the dashboard
diff --git a/.claude/commands/feature-ship.md b/.claude/commands/feature-ship.md
new file mode 100644
index 0000000..2e203c0
--- /dev/null
+++ b/.claude/commands/feature-ship.md
@@ -0,0 +1,13 @@
+---
+description: Complete a feature with quality gates
+---
+
+Complete a feature by creating docs/features/[id]/shipped.md.
+
+1. Read docs/features/DASHBOARD.md to find features in progress
+2. Let user select a feature to complete
+3. Run quality gates:
+   - @security-reviewer for security audit
+   - @qa-engineer for QA validation
+4. Create shipped.md
+5. The plugin will automatically update the dashboard
diff --git a/.claude/commands/feature-status.md b/.claude/commands/feature-status.md
new file mode 100644
index 0000000..6c360f2
--- /dev/null
+++ b/.claude/commands/feature-status.md
@@ -0,0 +1,10 @@
+---
+description: Show feature dashboard and project status
+---
+
+Show the current feature dashboard with in-progress, backlog, and completed features.
+
+Read docs/features/DASHBOARD.md and provide a summary of:
+- Features in progress
+- Backlog items
+- Recently completed features
diff --git a/README.md b/README.md
index 41cac81..253ff88 100644
--- a/README.md
+++ b/README.md
@@ -159,7 +159,7 @@ The following third-party service accounts and API keys are required:
 | ------- | ------- | ------- |
 | [Daily.co](https://dashboard.daily.co/) | WebRTC/SIP transport for voice calls | [Dashboard](https://dashboard.daily.co/) |
 | STT provider (e.g. [Deepgram](https://console.deepgram.com/)) | Speech-to-text (cloud API mode) | Provider console |
-| TTS provider (e.g. [Cartesia](https://play.cartesia.ai/)) | Text-to-speech (cloud API mode) | Provider console |
+| TTS provider (e.g. [Cartesia](https://play.cartesia.ai/) or [ElevenLabs](https://elevenlabs.io/)) | Text-to-speech (cloud API mode) | Provider console |
 
 ### AWS Account Requirements
 
@@ -421,10 +421,10 @@ Run `/destroy-project` in Claude Code. This will:
 
 | Mode | STT/TTS | Best For |
 | ---- | ------- | -------- |
-| **Cloud API** (`USE_CLOUD_APIS=true`) | Deepgram + Cartesia cloud APIs | Getting started, development |
+| **Cloud API** (`USE_CLOUD_APIS=true`) | Deepgram + Cartesia/ElevenLabs cloud APIs | Getting started, development |
 | **Amazon SageMaker** (default) | Self-hosted on GPU instances | Production, data residency |
 
-Cloud API mode requires Deepgram and Cartesia API keys. Amazon SageMaker mode requires [Deepgram Marketplace subscriptions](docs/reference/deepgram-marketplace-setup.md) and GPU quota.
+Cloud API mode requires Deepgram and a TTS provider API key (Cartesia by default, or ElevenLabs with `TTS_PROVIDER=elevenlabs`). Amazon SageMaker mode requires [Deepgram Marketplace subscriptions](docs/reference/deepgram-marketplace-setup.md) and GPU quota.
 
 ### Known Issues
 
diff --git a/backend/voice-agent/app/services/config_service.py b/backend/voice-agent/app/services/config_service.py
index f08da8d..0ae120e 100644
--- a/backend/voice-agent/app/services/config_service.py
+++ b/backend/voice-agent/app/services/config_service.py
@@ -29,7 +29,10 @@ class KnowledgeBaseConfig:
 
 @dataclass
 class ProviderConfig:
-    """Provider configuration for STT/TTS."""
+    """Provider configuration for STT/TTS.
+
+    Supported TTS providers: "cartesia", "elevenlabs", "sagemaker"
+    """
 
     stt_provider: str = "deepgram"
     tts_provider: str = "cartesia"
diff --git a/backend/voice-agent/app/services/factory.py b/backend/voice-agent/app/services/factory.py
index d8ea899..0f482fd 100644
--- a/backend/voice-agent/app/services/factory.py
+++ b/backend/voice-agent/app/services/factory.py
@@ -3,7 +3,7 @@
 
 Supports switching between cloud APIs and SageMaker endpoints via configuration:
 - STT_PROVIDER: "deepgram" (default, cloud API) or "sagemaker" (Deepgram on SageMaker)
-- TTS_PROVIDER: "cartesia" (default, cloud API) or "sagemaker" (Deepgram Aura on SageMaker)
+- TTS_PROVIDER: "cartesia" (default, cloud API), "elevenlabs", or "sagemaker" (Deepgram Aura on SageMaker)
 
 Cloud APIs are the default for simpler deployment without SageMaker endpoints.
 SageMaker providers use HTTP/2 bidirectional streaming for low-latency, VPC-local inference.
@@ -136,6 +136,31 @@ def create_tts_service(config: "PipelineConfig"):
             encoding="linear16",
         )
 
+    elif provider == "elevenlabs":
+        from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
+
+        api_key = os.getenv("ELEVENLABS_API_KEY")
+        if not api_key:
+            raise ValueError(
+                "ELEVENLABS_API_KEY environment variable required for TTS"
+            )
+
+        voice_id = _map_voice_id_to_elevenlabs(config.voice_id)
+        model = os.getenv("ELEVENLABS_MODEL", "eleven_turbo_v2_5")
+
+        logger.info(
+            "tts_provider_selected",
+            provider="elevenlabs",
+            voice_id=voice_id,
+            model=model,
+        )
+        return ElevenLabsTTSService(
+            api_key=api_key,
+            voice_id=voice_id,
+            model=model,
+            sample_rate=8000,
+        )
+
     else:
         # Default to Cartesia cloud API
         from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -191,6 +216,56 @@ def _resolve_voice_for_sagemaker(voice_id: str | None) -> str:
     return cartesia_to_deepgram.get(voice_id, default_voice)
 
 
+def _map_voice_id_to_elevenlabs(voice_id: str | None) -> str:
+    """
+    Map voice IDs to ElevenLabs format.
+
+    If it's already an ElevenLabs voice ID, returns it directly.
+    If it's a Cartesia UUID or Deepgram Aura name, maps to a similar ElevenLabs voice.
+
+    Args:
+        voice_id: Voice ID from any provider
+
+    Returns:
+        ElevenLabs voice ID
+    """
+    # Default ElevenLabs voice (Rachel - clear female voice)
+    default_voice = "21m00Tcm4TlvDq8ikWAM"
+
+    if not voice_id:
+        return default_voice
+
+    # If it's not a Cartesia UUID or Deepgram name, assume it's already an ElevenLabs ID
+    is_cartesia_uuid = len(voice_id) == 36 and voice_id.count("-") == 4
+    is_deepgram_name = voice_id.startswith("aura")
+
+    if not is_cartesia_uuid and not is_deepgram_name:
+        return voice_id
+
+    # Map Cartesia UUIDs to ElevenLabs equivalents
+    cartesia_to_elevenlabs = {
+        "79a125e8-cd45-4c13-8a67-188112f4dd22": "21m00Tcm4TlvDq8ikWAM",  # British Lady -> Rachel
+        "b7d50908-b17c-442d-ad8d-810c63997ed9": "EXAVITQu4vr4xnSDxMaL",  # California Girl -> Bella
+        "5345cf08-6f37-424d-a5d9-8ae1101b9377": "MF3mGyEYCl7XYWbV9V6O",  # Sweet Lady -> Emily
+        "a0e99841-438c-4a64-b679-ae501e7d6091": "VR6AewLTigWG4xSOukaG",  # Barbershop Man -> Arnold
+        "fb26447f-308b-471e-8b00-8e9f04284eb5": "ErXwobaYiN019PkySvjV",  # Doctor Mischief -> Antoni
+    }
+
+    if is_cartesia_uuid:
+        return cartesia_to_elevenlabs.get(voice_id, default_voice)
+
+    # Map Deepgram Aura voices to ElevenLabs equivalents
+    deepgram_to_elevenlabs = {
+        "aura-2-thalia-en": "21m00Tcm4TlvDq8ikWAM",  # Thalia -> Rachel
+        "aura-2-luna-en": "EXAVITQu4vr4xnSDxMaL",  # Luna -> Bella
+        "aura-2-asteria-en": "MF3mGyEYCl7XYWbV9V6O",  # Asteria -> Emily
+        "aura-2-arcas-en": "VR6AewLTigWG4xSOukaG",  # Arcas -> Arnold
+        "aura-2-orpheus-en": "ErXwobaYiN019PkySvjV",  # Orpheus -> Antoni
+    }
+
+    return deepgram_to_elevenlabs.get(voice_id, default_voice)
+
+
 def _map_voice_id_to_cartesia(voice_id: str | None) -> str:
     """
     Map voice IDs to Cartesia format.
diff --git a/backend/voice-agent/requirements.txt b/backend/voice-agent/requirements.txt
index 8a8e657..31e922c 100644
--- a/backend/voice-agent/requirements.txt
+++ b/backend/voice-agent/requirements.txt
@@ -1,16 +1,17 @@
 # Pipecat Voice Pipeline Dependencies for ECS
 # Pin versions for reproducibility
 
-# Core Pipecat with Daily, Silero VAD, Deepgram, Cartesia, and SageMaker support
+# Core Pipecat with Daily, Silero VAD, Deepgram, Cartesia, ElevenLabs, and SageMaker support
 # - daily: WebRTC transport
 # - silero: Voice Activity Detection
 # - deepgram: Cloud STT API + SageMaker STT (DeepgramSageMakerSTTService)
 # - cartesia: Cloud TTS API
+# - elevenlabs: ElevenLabs Cloud TTS API (Turbo v2.5)
 # - aws: Bedrock LLM support
 # - sagemaker: HTTP/2 BiDi streaming for SageMaker endpoints (requires Python >= 3.12)
 # - webrtc: SmallWebRTCTransport for local browser-based prototyping (aiortc)
 # - runner: FastAPI dev server + prebuilt WebRTC browser UI
-pipecat-ai[daily,silero,deepgram,cartesia,aws,sagemaker,webrtc,runner]==0.0.102
+pipecat-ai[daily,silero,deepgram,cartesia,elevenlabs,aws,sagemaker,webrtc,runner]==0.0.102
 
 # AWS SDK - let pip resolve version to match aiobotocore requirements
 # aiobotocore 2.25.1 requires botocore>=1.40.46
diff --git a/infrastructure/DEPLOYMENT.md b/infrastructure/DEPLOYMENT.md
index 478b71d..8d1853f 100644
--- a/infrastructure/DEPLOYMENT.md
+++ b/infrastructure/DEPLOYMENT.md
@@ -48,7 +48,8 @@ This mode uses Deepgram and Cartesia cloud APIs for STT/TTS. No SageMaker endpoi
 
 You will need API keys for:
 - **[Deepgram](https://console.deepgram.com/)** -- Speech-to-Text (Nova-3 model)
-- **[Cartesia](https://play.cartesia.ai/)** -- Text-to-Speech (Sonic model)
+- **[Cartesia](https://play.cartesia.ai/)** -- Text-to-Speech (Sonic model) *(default)*
+- **[ElevenLabs](https://elevenlabs.io/)** -- Text-to-Speech (Turbo v2.5 model) *(alternative — set `TTS_PROVIDER=elevenlabs`)*
 
 ### Step 1: Configure Environment
 
@@ -90,6 +91,9 @@ cat > ../backend/voice-agent/.env << 'EOF'
 DEEPGRAM_API_KEY=your-deepgram-api-key
 CARTESIA_API_KEY=your-cartesia-api-key
 DAILY_API_KEY=your-daily-api-key
+# Optional: Use ElevenLabs instead of Cartesia for TTS
+# TTS_PROVIDER=elevenlabs
+# ELEVENLABS_API_KEY=your-elevenlabs-api-key
 EOF
 
 # Push to Secrets Manager
@@ -446,7 +450,8 @@ cd infrastructure
 | Bedrock Claude Haiku | Yes (pay-per-use) | Yes (pay-per-use) |
 | Daily.co | Yes (third-party) | Yes (third-party) |
 | Deepgram Cloud STT | Yes (third-party) | No (self-hosted) |
-| Cartesia Cloud TTS | Yes (third-party) | No (self-hosted) |
+| Cartesia Cloud TTS | Yes (third-party, default) | No (self-hosted) |
+| ElevenLabs Cloud TTS | Yes (third-party, alternative) | No (self-hosted) |
 
 **Cloud API mode** does not deploy SageMaker endpoints but routes audio through the public internet. **SageMaker mode** keeps all audio within your VPC.
 
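Review note on `_map_voice_id_to_elevenlabs` in the patch above: the function decides which provider a voice ID came from purely by its shape (36 characters with four hyphens is treated as a Cartesia UUID, an `aura` prefix as a Deepgram name, anything else as an ElevenLabs ID passed through unchanged). The sketch below isolates only that classification step so it can be exercised without Pipecat installed. `classify_voice_id` is a hypothetical helper name used for illustration; it is not part of the patch.

```python
from __future__ import annotations


def classify_voice_id(voice_id: str | None) -> str:
    """Hypothetical helper (not in the patch) mirroring its shape checks.

    Returns one of "default", "cartesia", "deepgram", or "elevenlabs".
    """
    if not voice_id:
        # The patch falls back to its default ElevenLabs voice here.
        return "default"
    # Same heuristic as the patch: UUID-shaped strings are assumed to be Cartesia IDs.
    if len(voice_id) == 36 and voice_id.count("-") == 4:
        return "cartesia"
    # Same heuristic as the patch: Deepgram Aura voices start with "aura".
    if voice_id.startswith("aura"):
        return "deepgram"
    # Everything else is assumed to already be an ElevenLabs voice ID.
    return "elevenlabs"


if __name__ == "__main__":
    for sample in (
        None,
        "79a125e8-cd45-4c13-8a67-188112f4dd22",
        "aura-2-thalia-en",
        "21m00Tcm4TlvDq8ikWAM",
    ):
        print(repr(sample), "->", classify_voice_id(sample))
```

One consequence of this design worth keeping in mind: a Cartesia UUID or Aura name that is missing from the lookup tables silently resolves to the default voice, while a mistyped ElevenLabs ID is passed through as-is and would only fail at the provider.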
From 1431a295c5374d1a982281956ce2ad73eb711c77 Mon Sep 17 00:00:00 2001
From: Daniel Wirjo
Date: Wed, 6 May 2026 10:44:25 +1000
Subject: [PATCH 2/2] chore: Use eleven_multilingual_v2 as default ElevenLabs model

Co-Authored-By: Claude Opus 4.6
---
 backend/voice-agent/app/services/factory.py | 2 +-
 infrastructure/DEPLOYMENT.md                | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/backend/voice-agent/app/services/factory.py b/backend/voice-agent/app/services/factory.py
index 0f482fd..b3add24 100644
--- a/backend/voice-agent/app/services/factory.py
+++ b/backend/voice-agent/app/services/factory.py
@@ -146,7 +146,7 @@ def create_tts_service(config: "PipelineConfig"):
             )
 
         voice_id = _map_voice_id_to_elevenlabs(config.voice_id)
-        model = os.getenv("ELEVENLABS_MODEL", "eleven_turbo_v2_5")
+        model = os.getenv("ELEVENLABS_MODEL", "eleven_multilingual_v2")
 
         logger.info(
             "tts_provider_selected",
diff --git a/infrastructure/DEPLOYMENT.md b/infrastructure/DEPLOYMENT.md
index 8d1853f..9acd221 100644
--- a/infrastructure/DEPLOYMENT.md
+++ b/infrastructure/DEPLOYMENT.md
@@ -49,7 +49,7 @@ This mode uses Deepgram and Cartesia cloud APIs for STT/TTS. No SageMaker endpoi
 You will need API keys for:
 - **[Deepgram](https://console.deepgram.com/)** -- Speech-to-Text (Nova-3 model)
 - **[Cartesia](https://play.cartesia.ai/)** -- Text-to-Speech (Sonic model) *(default)*
-- **[ElevenLabs](https://elevenlabs.io/)** -- Text-to-Speech (Turbo v2.5 model) *(alternative — set `TTS_PROVIDER=elevenlabs`)*
+- **[ElevenLabs](https://elevenlabs.io/)** -- Text-to-Speech (Multilingual v2 model) *(alternative — set `TTS_PROVIDER=elevenlabs`)*
 
 ### Step 1: Configure Environment
 
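Review note on the series as a whole: after both patches, the ElevenLabs branch reads `ELEVENLABS_API_KEY` (required) and `ELEVENLABS_MODEL` (optional, defaulting to `eleven_multilingual_v2`) from the environment. A minimal sketch of that resolution logic follows, written against a plain mapping so it can be tested without touching the real environment. `resolve_elevenlabs_settings` and `DEFAULT_ELEVENLABS_MODEL` are hypothetical names for illustration only; the patch itself inlines this logic in `create_tts_service`.

```python
from __future__ import annotations

import os
from typing import Mapping

# Default taken from PATCH 2/2; the constant name itself is an assumption.
DEFAULT_ELEVENLABS_MODEL = "eleven_multilingual_v2"


def resolve_elevenlabs_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Hypothetical helper: resolve the API key and model the way the patch does."""
    api_key = env.get("ELEVENLABS_API_KEY")
    if not api_key:
        # Same error message as the patch raises when the key is missing or empty.
        raise ValueError("ELEVENLABS_API_KEY environment variable required for TTS")
    return {
        "api_key": api_key,
        # ELEVENLABS_MODEL overrides the default, e.g. to restore eleven_turbo_v2_5.
        "model": env.get("ELEVENLABS_MODEL", DEFAULT_ELEVENLABS_MODEL),
    }


if __name__ == "__main__":
    try:
        settings = resolve_elevenlabs_settings(os.environ)
        print("model:", settings["model"])
    except ValueError as exc:
        print("configuration error:", exc)
```

Anyone who prefers the model named in the first commit message can presumably keep it by setting `ELEVENLABS_MODEL=eleven_turbo_v2_5` alongside `TTS_PROVIDER=elevenlabs`, since the second patch only changes the fallback value.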