Sayna

A high-performance real-time voice processing server providing unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs.

Quick Start

docker run -d \
  -p 3001:3001 \
  -e DEEPGRAM_API_KEY=your-key \
  saynaai/sayna

The server will be available at http://localhost:3001.

Supported Providers

Deepgram - STT and TTS
ElevenLabs - STT and TTS
Google Cloud - STT and TTS (WaveNet, Neural2, Studio voices)
Microsoft Azure - STT and TTS (400+ neural voices)
Cartesia - STT and TTS

Environment Variables

Server Configuration

Variable	Description	Default
`HOST`	Bind address	`0.0.0.0`
`PORT`	Server port	`3001`
`CACHE_PATH`	Cache directory for models and audio	`/app/cache`
`RUST_LOG`	Log level (error, warn, info, debug, trace)	`info`

Provider API Keys

Variable	Description
`DEEPGRAM_API_KEY`	Deepgram API key
`ELEVENLABS_API_KEY`	ElevenLabs API key
`GOOGLE_APPLICATION_CREDENTIALS`	Path to Google Cloud service account JSON
`AZURE_SPEECH_SUBSCRIPTION_KEY`	Azure Speech subscription key
`AZURE_SPEECH_REGION`	Azure region (default: `eastus`)
`CARTESIA_API_KEY`	Cartesia API key

LiveKit Integration

Variable	Description	Default
`LIVEKIT_URL`	LiveKit server WebSocket URL	`ws://localhost:7880`
`LIVEKIT_PUBLIC_URL`	Public LiveKit URL for clients	-
`LIVEKIT_API_KEY`	LiveKit API key	-
`LIVEKIT_API_SECRET`	LiveKit API secret	-

Authentication (Optional)

Variable	Description	Default
`AUTH_REQUIRED`	Enable authentication	`false`
`AUTH_API_SECRETS_JSON`	API secrets JSON array (`[{id, secret}]`)	-
`AUTH_API_SECRET`	Legacy single API secret	-
`AUTH_API_SECRET_ID`	Legacy API secret id for `AUTH_API_SECRET`	`default`
`AUTH_SERVICE_URL`	External auth service URL	-
`AUTH_SIGNING_KEY_PATH`	Path to JWT signing key	-
`AUTH_TIMEOUT_SECONDS`	Auth request timeout	`5`

Minimal multi-secret example:

AUTH_API_SECRETS_JSON='[{"id":"default","secret":"sk_test_default_123"}]'

Recording Storage (Optional)

The same configuration drives both LiveKit Egress uploads and Sayna's GET /recording/{stream_id} download endpoint. Pick one backend.

Common variables:

Variable	Description
`RECORDING_BACKEND`	`s3` or `gcs` (auto-detected from the `RECORDING_<S3\|GCS>_*` vars when unset)
`RECORDING_PREFIX`	Object key prefix; `{prefix}/{stream_id}/audio.ogg`

Amazon S3 (or S3-compatible: MinIO, Cloudflare R2):

Variable	Description
`RECORDING_S3_BUCKET`	Bucket name
`RECORDING_S3_REGION`	Region
`RECORDING_S3_ACCESS_KEY`	Access key
`RECORDING_S3_SECRET_KEY`	Secret key
`RECORDING_S3_ENDPOINT`	Optional custom endpoint URL
`RECORDING_S3_FORCE_PATH_STYLE`	`true` for MinIO/R2 (default: `false`)

Google Cloud Storage (provide exactly one credentials variable):

Variable	Description
`RECORDING_GCS_BUCKET`	Bucket name
`RECORDING_GCS_CREDENTIALS_PATH`	Path to a service-account JSON file
`RECORDING_GCS_CREDENTIALS_JSON`	Inline service-account JSON (alternative to the path)

Docker Compose

version: "3.9"
services:
  sayna:
    image: saynaai/sayna
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache

volumes:
  sayna-cache: {}

With LiveKit

version: "3.9"
services:
  livekit:
    image: livekit/livekit-server:latest
    environment:
      LIVEKIT_KEYS: "devkey:secret"
    ports:
      - "7880:7880"

  sayna:
    image: saynaai/sayna
    depends_on:
      - livekit
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      LIVEKIT_URL: ws://livekit:7880
      LIVEKIT_PUBLIC_URL: ws://localhost:7880
      LIVEKIT_API_KEY: devkey
      LIVEKIT_API_SECRET: secret
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache

volumes:
  sayna-cache: {}

Persistent Cache

Mount a volume to /data/cache (or your configured CACHE_PATH) to persist:

VAD and turn detection model assets
Cached TTS audio outputs

docker run -d \
  -p 3001:3001 \
  -e DEEPGRAM_API_KEY=your-key \
  -e CACHE_PATH=/data/cache \
  -v sayna-cache:/data/cache \
  saynaai/sayna

Health Check

curl http://localhost:3001/
# Returns: {"status":"OK"}

API Endpoints

Endpoint	Method	Description
`/`	GET	Health check
`/ws`	WebSocket	Real-time voice processing
`/voices`	GET	List available TTS voices
`/speak`	POST	Generate speech from text
`/livekit/token`	POST	Generate LiveKit participant token
`/livekit/webhook`	POST	LiveKit event webhook

Audio-Disabled Mode

Run without provider API keys for development and testing:

docker run -d -p 3001:3001 saynaai/sayna

Then send a WebSocket config with audio_disabled: true:

{
  "type": "config",
  "config": {
    "audio_disabled": true
  }
}

Image Variants

The default image includes:

VAD + Turn Detection - Silero-VAD voice activity detection with ONNX-based turn detection
Noise Filter - DeepFilterNet noise suppression

Pre-downloaded model assets are included in the image.

Architecture

amd64 (x86_64)
arm64 (aarch64)

Documentation

For complete documentation, visit https://docs.sayna.ai

Source Code

https://github.com/saynaai/sayna

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sayna

Quick Start

Supported Providers

Environment Variables

Server Configuration

Provider API Keys

LiveKit Integration

Authentication (Optional)

Recording Storage (Optional)

Docker Compose

With LiveKit

Persistent Cache

Health Check

API Endpoints

Audio-Disabled Mode

Image Variants

Architecture

Documentation

Source Code

FilesExpand file tree

docker.md

Latest commit

History

docker.md

File metadata and controls

Sayna

Quick Start

Supported Providers

Environment Variables

Server Configuration

Provider API Keys

LiveKit Integration

Authentication (Optional)

Recording Storage (Optional)

Docker Compose

With LiveKit

Persistent Cache

Health Check

API Endpoints

Audio-Disabled Mode

Image Variants

Architecture

Documentation

Source Code