High-performance speech-to-text transcription service built with FastAPI and WhisperX. Provides enterprise-grade audio transcription with support for multiple languages and speaker diarization.
- FastAPI-based REST API with automatic documentation
- WhisperX integration for accurate transcription
- GPU acceleration support with CUDA
- Speaker diarization and word-level timestamps
- Rate limiting and request throttling
- Structured logging with trace IDs
- Health check endpoints
- Docker support for easy deployment
- Python >= 3.11, < 3.13
- FFmpeg (required for audio processing)
- CUDA Toolkit (optional, for GPU acceleration)
- 4GB+ RAM (8GB+ recommended for larger models)
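Before installing, you can sanity-check the prerequisites from Python; a minimal sketch (the version bounds and the FFmpeg requirement mirror the list above):

```python
import shutil
import sys

def check_prereqs() -> list[str]:
    """Return a list of human-readable problems; an empty list means good to go."""
    issues = []
    if not ((3, 11) <= sys.version_info[:2] < (3, 13)):
        issues.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} is outside the supported >=3.11,<3.13 range"
        )
    if shutil.which("ffmpeg") is None:
        issues.append("FFmpeg not found on PATH (required for audio processing)")
    return issues
```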
```bash
# Install Poetry
pip install poetry

# Install dependencies
poetry install

# Install with development dependencies
poetry install --with dev
```
```bash
# CPU version
docker build -t actimind-whisper .

# GPU version
docker build -f Dockerfile.gpu -t actimind-whisper:gpu .
```

Create a `.env` file in the project root:
```env
# Model Configuration
WHISPER_MODEL=base
DEVICE=cuda          # or cpu
COMPUTE_TYPE=float16 # float16, int8, float32

# API Configuration
HOST=0.0.0.0
PORT=8000
WORKERS=1

# Rate Limiting
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60

# Logging
LOG_LEVEL=INFO
```

```bash
# Start the development server
poetry run dev

# With uvicorn directly
poetry run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

```bash
# Using Docker
docker run -p 8000:8000 -e WHISPER_MODEL=base actimind-whisper

# With GPU support
docker run --gpus all -p 8000:8000 -e WHISPER_MODEL=base -e DEVICE=cuda actimind-whisper:gpu
```

Once the service is running, visit:
- Interactive API docs: http://localhost:8000/docs
- Alternative API docs: http://localhost:8000/redoc
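A client can poll the health endpoint before sending work, since the model may still be loading. A sketch with the HTTP call injected as a plain callable so the retry logic stays testable (the `fetch` callable and its JSON shape are assumptions based on the `/health` response documented below):

```python
import time

def wait_until_healthy(fetch, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `fetch` (e.g. a GET on /health) until it reports healthy."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except OSError:  # e.g. connection refused while the model is still loading
            pass
        time.sleep(delay)
    return False
```

With `requests` installed, `fetch` could be `lambda: requests.get("http://localhost:8000/health", timeout=2).json()`.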
`POST /transcribe`

Upload an audio file for transcription.
Request:
curl -X POST "http://localhost:8000/transcribe" \
-F "file=@audio.mp3" \
-F "language=en" \
-F "task=transcribe"Response:
```json
{
  "text": "Transcribed text here",
  "segments": [...],
  "language": "en",
  "duration": 10.5
}
```

`GET /health`

Health check endpoint.
Response:
```json
{
  "status": "healthy",
  "model": "base",
  "device": "cuda"
}
```

Development scripts:

- `poetry run dev`: start the development server
- `poetry run test`: run the test suite
- `poetry run format`: format code with Black
- `poetry run lint`: run the Ruff linter
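The `segments` array returned by `/transcribe` can be rendered as SRT subtitles. A minimal sketch, assuming each segment carries `start`, `end` (in seconds), and `text` keys, which is WhisperX's usual segment shape; verify against your actual responses:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> '01:01:01,500'."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Join transcription segments into numbered SRT subtitle blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```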
The service follows a clean architecture pattern:
- `app/api/`: API endpoints and middleware
- `app/core/`: core configuration and model management
- `app/domain/`: domain models and exceptions
- `app/services/`: business logic and services
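To illustrate that layering (hypothetical names, not the project's actual code): domain exceptions stay framework-free, services raise them, and the API layer maps them to HTTP status codes:

```python
# app/domain: framework-free exceptions and models
class TranscriptionError(Exception):
    """Raised when an upload cannot be transcribed."""

# app/services: business logic, no HTTP concerns
def transcribe_file(path: str) -> dict:
    if not path.lower().endswith((".mp3", ".wav", ".m4a")):
        raise TranscriptionError(f"unsupported format: {path}")
    return {"text": "...", "language": "en"}  # real code would invoke WhisperX here

# app/api: translate domain errors into HTTP responses
def handle_transcribe(path: str) -> tuple[int, dict]:
    try:
        return 200, transcribe_file(path)
    except TranscriptionError as exc:
        return 422, {"detail": str(exc)}
```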
- `tiny`: fastest, lowest accuracy (~1GB RAM)
- `base`: good balance (~1GB RAM)
- `small`: better accuracy (~2GB RAM)
- `medium`: high accuracy (~5GB RAM)
- `large-v2`: best accuracy (~10GB RAM)
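The table above can drive model selection automatically; an illustrative helper (the RAM figures are the approximations listed above, not measurements, and `pick_model` is not part of the service's API):

```python
# Approximate RAM needs per model, from the table above
MODEL_RAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large-v2": 10}

def pick_model(available_ram_gb: float) -> str:
    """Return the most accurate model that fits in the available RAM."""
    for name in ("large-v2", "medium", "small", "base", "tiny"):
        if MODEL_RAM_GB[name] <= available_ram_gb:
            return name
    return "tiny"  # smallest model as a last resort
```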
For GPU support, ensure CUDA is installed:
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
```

Compute types:

- `float32`: highest accuracy, slowest
- `float16`: good balance (requires GPU)
- `int8`: fastest, lower accuracy
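Those trade-offs suggest a simple selection rule; a hedged sketch (`pick_compute_type` is a hypothetical helper, not part of the service's API):

```python
def pick_compute_type(has_gpu: bool, prefer_accuracy: bool = False) -> str:
    """Choose a COMPUTE_TYPE value based on the trade-offs above."""
    if prefer_accuracy:
        return "float32"  # highest accuracy, slowest
    return "float16" if has_gpu else "int8"  # float16 requires a GPU
```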
```bash
# Ubuntu/Debian
apt-get install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

If you run out of memory:

- Use a smaller model
- Reduce batch size
- Use the `int8` compute type
For faster transcription:

- Enable GPU acceleration
- Use the faster-whisper backend
- Choose appropriate model size
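As an example, a CPU-only deployment tuned for speed might combine these tips in its `.env` (illustrative values):

```env
WHISPER_MODEL=tiny
DEVICE=cpu
COMPUTE_TYPE=int8
```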
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request