High-performance speech-to-text transcription service built with FastAPI and WhisperX. Provides enterprise-grade audio transcription with support for multiple languages and speaker diarization.
- FastAPI-based REST API with automatic documentation
- WhisperX integration for accurate transcription
- GPU acceleration support with CUDA
- Speaker diarization and word-level timestamps
- Rate limiting and request throttling
- Structured logging with trace IDs
- Health check endpoints
- Docker support for easy deployment
- Python >= 3.11, < 3.13
- FFmpeg (required for audio processing)
- CUDA Toolkit (optional, for GPU acceleration)
- 4GB+ RAM (8GB+ recommended for larger models)
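Before installing, you can sanity-check the prerequisites from Python; a minimal sketch (the version bounds and the FFmpeg requirement mirror the list above):

```python
import shutil
import sys

def check_prereqs() -> list[str]:
    """Return a list of human-readable problems; an empty list means good to go."""
    issues = []
    if not ((3, 11) <= sys.version_info[:2] < (3, 13)):
        issues.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} is outside the supported >=3.11,<3.13 range"
        )
    if shutil.which("ffmpeg") is None:
        issues.append("FFmpeg not found on PATH (required for audio processing)")
    return issues
```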
```bash
# Install Poetry
pip install poetry

# Install dependencies
poetry install

# Install with development dependencies
poetry install --with dev
```
```bash
# CPU version
docker build -t actimind-whisper .

# GPU version
docker build -f Dockerfile.gpu -t actimind-whisper:gpu .
```

Create a `.env` file in the project root:
```env
# Model Configuration
WHISPER_MODEL=base
DEVICE=cuda          # or cpu
COMPUTE_TYPE=float16 # float16, int8, float32

# API Configuration
HOST=0.0.0.0
PORT=8000
WORKERS=1

# Rate Limiting
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60

# Logging
LOG_LEVEL=INFO
```

```bash
# Start the development server
poetry run dev

# With uvicorn directly
poetry run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

```bash
# Using Docker
docker run -p 8000:8000 -e WHISPER_MODEL=base actimind-whisper

# With GPU support
docker run --gpus all -p 8000:8000 -e WHISPER_MODEL=base -e DEVICE=cuda actimind-whisper:gpu
```

Once the service is running, visit:
- Interactive API docs: http://localhost:8000/docs
- Alternative API docs: http://localhost:8000/redoc
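A client can poll the health endpoint before sending work, since the model may still be loading. A sketch with the HTTP call injected as a plain callable so the retry logic stays testable (the `fetch` callable and its JSON shape are assumptions based on the `/health` response documented below):

```python
import time

def wait_until_healthy(fetch, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `fetch` (e.g. a GET on /health) until it reports healthy."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except OSError:  # e.g. connection refused while the model is still loading
            pass
        time.sleep(delay)
    return False
```

With `requests` installed, `fetch` could be `lambda: requests.get("http://localhost:8000/health", timeout=2).json()`.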
`POST /transcribe`

Upload an audio file for transcription.
Request:
curl -X POST "http://localhost:8000/transcribe" \
-F "file=@audio.mp3" \
-F "language=en" \
-F "task=transcribe"Response:
```json
{
  "text": "Transcribed text here",
  "segments": [...],
  "language": "en",
  "duration": 10.5
}
```

`GET /health`

Health check endpoint.
Response:
```json
{
  "status": "healthy",
  "model": "base",
  "device": "cuda"
}
```

Development scripts:

- `poetry run dev`: start the development server
- `poetry run test`: run the test suite
- `poetry run format`: format code with Black
- `poetry run lint`: run the Ruff linter
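The `segments` array returned by `/transcribe` can be rendered as SRT subtitles. A minimal sketch, assuming each segment carries `start`, `end` (in seconds), and `text` keys, which is WhisperX's usual segment shape; verify against your actual responses:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> '01:01:01,500'."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Join transcription segments into numbered SRT subtitle blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```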
The service follows a clean architecture pattern:
- `app/api/`: API endpoints and middleware
- `app/core/`: core configuration and model management
- `app/domain/`: domain models and exceptions
- `app/services/`: business logic and services
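To illustrate that layering (hypothetical names, not the project's actual code): domain exceptions stay framework-free, services raise them, and the API layer maps them to HTTP status codes:

```python
# app/domain: framework-free exceptions and models
class TranscriptionError(Exception):
    """Raised when an upload cannot be transcribed."""

# app/services: business logic, no HTTP concerns
def transcribe_file(path: str) -> dict:
    if not path.lower().endswith((".mp3", ".wav", ".m4a")):
        raise TranscriptionError(f"unsupported format: {path}")
    return {"text": "...", "language": "en"}  # real code would invoke WhisperX here

# app/api: translate domain errors into HTTP responses
def handle_transcribe(path: str) -> tuple[int, dict]:
    try:
        return 200, transcribe_file(path)
    except TranscriptionError as exc:
        return 422, {"detail": str(exc)}
```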
- `tiny`: fastest, lowest accuracy (~1GB RAM)
- `base`: good balance (~1GB RAM)
- `small`: better accuracy (~2GB RAM)
- `medium`: high accuracy (~5GB RAM)
- `large-v2`: best accuracy (~10GB RAM)
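The table above can drive model selection automatically; an illustrative helper (the RAM figures are the approximations listed above, not measurements, and `pick_model` is not part of the service's API):

```python
# Approximate RAM needs per model, from the table above
MODEL_RAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large-v2": 10}

def pick_model(available_ram_gb: float) -> str:
    """Return the most accurate model that fits in the available RAM."""
    for name in ("large-v2", "medium", "small", "base", "tiny"):
        if MODEL_RAM_GB[name] <= available_ram_gb:
            return name
    return "tiny"  # smallest model as a last resort
```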
For GPU support, ensure CUDA is installed:
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
```

Compute types:

- `float32`: highest accuracy, slowest
- `float16`: good balance (requires GPU)
- `int8`: fastest, lower accuracy
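Those trade-offs suggest a simple selection rule; a hedged sketch (`pick_compute_type` is a hypothetical helper, not part of the service's API):

```python
def pick_compute_type(has_gpu: bool, prefer_accuracy: bool = False) -> str:
    """Choose a COMPUTE_TYPE value based on the trade-offs above."""
    if prefer_accuracy:
        return "float32"  # highest accuracy, slowest
    return "float16" if has_gpu else "int8"  # float16 requires a GPU
```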
```bash
# Ubuntu/Debian
apt-get install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

If you run out of memory:

- Use a smaller model
- Reduce batch size
- Use the `int8` compute type
For faster transcription:

- Enable GPU acceleration
- Use the faster-whisper backend
- Choose appropriate model size
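As an example, a CPU-only deployment tuned for speed might combine these tips in its `.env` (illustrative values):

```env
WHISPER_MODEL=tiny
DEVICE=cpu
COMPUTE_TYPE=int8
```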
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request