Skip to content

Latest commit

 

History

History
228 lines (175 loc) · 5.89 KB

File metadata and controls

228 lines (175 loc) · 5.89 KB

Sayna

A high-performance real-time voice processing server providing unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs.

Quick Start

docker run -d \
  -p 3001:3001 \
  -e DEEPGRAM_API_KEY=your-key \
  saynaai/sayna

The server will be available at http://localhost:3001.

Supported Providers

  • Deepgram - STT and TTS
  • ElevenLabs - STT and TTS
  • Google Cloud - STT and TTS (WaveNet, Neural2, Studio voices)
  • Microsoft Azure - STT and TTS (400+ neural voices)
  • Cartesia - STT and TTS

Environment Variables

Server Configuration

Variable Description Default
HOST Bind address 0.0.0.0
PORT Server port 3001
CACHE_PATH Cache directory for models and audio /app/cache
RUST_LOG Log level (error, warn, info, debug, trace) info

Provider API Keys

Variable Description
DEEPGRAM_API_KEY Deepgram API key
ELEVENLABS_API_KEY ElevenLabs API key
GOOGLE_APPLICATION_CREDENTIALS Path to Google Cloud service account JSON
AZURE_SPEECH_SUBSCRIPTION_KEY Azure Speech subscription key
AZURE_SPEECH_REGION Azure region (default: eastus)
CARTESIA_API_KEY Cartesia API key

LiveKit Integration

Variable Description Default
LIVEKIT_URL LiveKit server WebSocket URL ws://localhost:7880
LIVEKIT_PUBLIC_URL Public LiveKit URL for clients -
LIVEKIT_API_KEY LiveKit API key -
LIVEKIT_API_SECRET LiveKit API secret -

Authentication (Optional)

Variable Description Default
AUTH_REQUIRED Enable authentication false
AUTH_API_SECRETS_JSON API secrets JSON array ([{id, secret}]) -
AUTH_API_SECRET Legacy single API secret -
AUTH_API_SECRET_ID Legacy API secret id for AUTH_API_SECRET default
AUTH_SERVICE_URL External auth service URL -
AUTH_SIGNING_KEY_PATH Path to JWT signing key -
AUTH_TIMEOUT_SECONDS Auth request timeout 5

Minimal multi-secret example:

AUTH_API_SECRETS_JSON='[{"id":"default","secret":"sk_test_default_123"}]'

Recording Storage (Optional)

The same configuration drives both LiveKit Egress uploads and Sayna's GET /recording/{stream_id} download endpoint. Pick one backend.

Common variables:

Variable Description
RECORDING_BACKEND s3 or gcs (auto-detected from the RECORDING_<S3|GCS>_* vars when unset)
RECORDING_PREFIX Object key prefix; {prefix}/{stream_id}/audio.ogg

Amazon S3 (or S3-compatible: MinIO, Cloudflare R2):

Variable Description
RECORDING_S3_BUCKET Bucket name
RECORDING_S3_REGION Region
RECORDING_S3_ACCESS_KEY Access key
RECORDING_S3_SECRET_KEY Secret key
RECORDING_S3_ENDPOINT Optional custom endpoint URL
RECORDING_S3_FORCE_PATH_STYLE true for MinIO/R2 (default: false)

Google Cloud Storage (provide exactly one credentials variable):

Variable Description
RECORDING_GCS_BUCKET Bucket name
RECORDING_GCS_CREDENTIALS_PATH Path to a service-account JSON file
RECORDING_GCS_CREDENTIALS_JSON Inline service-account JSON (alternative to the path)

Docker Compose

version: "3.9"
services:
  sayna:
    image: saynaai/sayna
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache

volumes:
  sayna-cache: {}

With LiveKit

version: "3.9"
services:
  livekit:
    image: livekit/livekit-server:latest
    environment:
      LIVEKIT_KEYS: "devkey:secret"
    ports:
      - "7880:7880"

  sayna:
    image: saynaai/sayna
    depends_on:
      - livekit
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      LIVEKIT_URL: ws://livekit:7880
      LIVEKIT_PUBLIC_URL: ws://localhost:7880
      LIVEKIT_API_KEY: devkey
      LIVEKIT_API_SECRET: secret
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache

volumes:
  sayna-cache: {}

Persistent Cache

Mount a volume to /data/cache (or your configured CACHE_PATH) to persist:

  • VAD and turn detection model assets
  • Cached TTS audio outputs
docker run -d \
  -p 3001:3001 \
  -e DEEPGRAM_API_KEY=your-key \
  -e CACHE_PATH=/data/cache \
  -v sayna-cache:/data/cache \
  saynaai/sayna

Health Check

curl http://localhost:3001/
# Returns: {"status":"OK"}

API Endpoints

Endpoint Method Description
/ GET Health check
/ws WebSocket Real-time voice processing
/voices GET List available TTS voices
/speak POST Generate speech from text
/livekit/token POST Generate LiveKit participant token
/livekit/webhook POST LiveKit event webhook

Audio-Disabled Mode

Run without provider API keys for development and testing:

docker run -d -p 3001:3001 saynaai/sayna

Then send a WebSocket config with audio_disabled: true:

{
  "type": "config",
  "config": {
    "audio_disabled": true
  }
}

Image Variants

The default image includes:

  • VAD + Turn Detection - Silero-VAD voice activity detection with ONNX-based turn detection
  • Noise Filter - DeepFilterNet noise suppression

Pre-downloaded model assets are included in the image.

Architecture

  • amd64 (x86_64)
  • arm64 (aarch64)

Documentation

For complete documentation, visit https://docs.sayna.ai

Source Code

https://github.com/saynaai/sayna