A high-performance real-time voice processing server providing unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs.
docker run -d \
-p 3001:3001 \
-e DEEPGRAM_API_KEY=your-key \
saynaai/saynaThe server will be available at http://localhost:3001.
- Deepgram - STT and TTS
- ElevenLabs - STT and TTS
- Google Cloud - STT and TTS (WaveNet, Neural2, Studio voices)
- Microsoft Azure - STT and TTS (400+ neural voices)
- Cartesia - STT and TTS
| Variable | Description | Default |
|---|---|---|
HOST |
Bind address | 0.0.0.0 |
PORT |
Server port | 3001 |
CACHE_PATH |
Cache directory for models and audio | /app/cache |
RUST_LOG |
Log level (error, warn, info, debug, trace) | info |
| Variable | Description |
|---|---|
DEEPGRAM_API_KEY |
Deepgram API key |
ELEVENLABS_API_KEY |
ElevenLabs API key |
GOOGLE_APPLICATION_CREDENTIALS |
Path to Google Cloud service account JSON |
AZURE_SPEECH_SUBSCRIPTION_KEY |
Azure Speech subscription key |
AZURE_SPEECH_REGION |
Azure region (default: eastus) |
CARTESIA_API_KEY |
Cartesia API key |
| Variable | Description | Default |
|---|---|---|
LIVEKIT_URL |
LiveKit server WebSocket URL | ws://localhost:7880 |
LIVEKIT_PUBLIC_URL |
Public LiveKit URL for clients | - |
LIVEKIT_API_KEY |
LiveKit API key | - |
LIVEKIT_API_SECRET |
LiveKit API secret | - |
| Variable | Description | Default |
|---|---|---|
AUTH_REQUIRED |
Enable authentication | false |
AUTH_API_SECRETS_JSON |
API secrets JSON array ([{id, secret}]) |
- |
AUTH_API_SECRET |
Legacy single API secret | - |
AUTH_API_SECRET_ID |
Legacy API secret id for AUTH_API_SECRET |
default |
AUTH_SERVICE_URL |
External auth service URL | - |
AUTH_SIGNING_KEY_PATH |
Path to JWT signing key | - |
AUTH_TIMEOUT_SECONDS |
Auth request timeout | 5 |
Minimal multi-secret example:
AUTH_API_SECRETS_JSON='[{"id":"default","secret":"sk_test_default_123"}]'The same configuration drives both LiveKit Egress uploads and Sayna's
GET /recording/{stream_id} download endpoint. Pick one backend.
Common variables:
| Variable | Description |
|---|---|
RECORDING_BACKEND |
s3 or gcs (auto-detected from the RECORDING_<S3|GCS>_* vars when unset) |
RECORDING_PREFIX |
Object key prefix; {prefix}/{stream_id}/audio.ogg |
Amazon S3 (or S3-compatible: MinIO, Cloudflare R2):
| Variable | Description |
|---|---|
RECORDING_S3_BUCKET |
Bucket name |
RECORDING_S3_REGION |
Region |
RECORDING_S3_ACCESS_KEY |
Access key |
RECORDING_S3_SECRET_KEY |
Secret key |
RECORDING_S3_ENDPOINT |
Optional custom endpoint URL |
RECORDING_S3_FORCE_PATH_STYLE |
true for MinIO/R2 (default: false) |
Google Cloud Storage (provide exactly one credentials variable):
| Variable | Description |
|---|---|
RECORDING_GCS_BUCKET |
Bucket name |
RECORDING_GCS_CREDENTIALS_PATH |
Path to a service-account JSON file |
RECORDING_GCS_CREDENTIALS_JSON |
Inline service-account JSON (alternative to the path) |
version: "3.9"
services:
sayna:
image: saynaai/sayna
ports:
- "3001:3001"
environment:
DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
CACHE_PATH: /data/cache
volumes:
- sayna-cache:/data/cache
volumes:
sayna-cache: {}version: "3.9"
services:
livekit:
image: livekit/livekit-server:latest
environment:
LIVEKIT_KEYS: "devkey:secret"
ports:
- "7880:7880"
sayna:
image: saynaai/sayna
depends_on:
- livekit
ports:
- "3001:3001"
environment:
DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
LIVEKIT_URL: ws://livekit:7880
LIVEKIT_PUBLIC_URL: ws://localhost:7880
LIVEKIT_API_KEY: devkey
LIVEKIT_API_SECRET: secret
CACHE_PATH: /data/cache
volumes:
- sayna-cache:/data/cache
volumes:
sayna-cache: {}Mount a volume to /data/cache (or your configured CACHE_PATH) to persist:
- VAD and turn detection model assets
- Cached TTS audio outputs
docker run -d \
-p 3001:3001 \
-e DEEPGRAM_API_KEY=your-key \
-e CACHE_PATH=/data/cache \
-v sayna-cache:/data/cache \
saynaai/saynacurl http://localhost:3001/
# Returns: {"status":"OK"}| Endpoint | Method | Description |
|---|---|---|
/ |
GET | Health check |
/ws |
WebSocket | Real-time voice processing |
/voices |
GET | List available TTS voices |
/speak |
POST | Generate speech from text |
/livekit/token |
POST | Generate LiveKit participant token |
/livekit/webhook |
POST | LiveKit event webhook |
Run without provider API keys for development and testing:
docker run -d -p 3001:3001 saynaai/saynaThen send a WebSocket config with audio_disabled: true:
{
"type": "config",
"config": {
"audio_disabled": true
}
}The default image includes:
- VAD + Turn Detection - Silero-VAD voice activity detection with ONNX-based turn detection
- Noise Filter - DeepFilterNet noise suppression
Pre-downloaded model assets are included in the image.
- amd64 (x86_64)
- arm64 (aarch64)
For complete documentation, visit https://docs.sayna.ai