Actimind Whisper Service

High-performance speech-to-text transcription service built with FastAPI and WhisperX. Provides enterprise-grade audio transcription with support for multiple languages and speaker diarization.

Features

  • FastAPI-based REST API with automatic documentation
  • WhisperX integration for accurate transcription
  • GPU acceleration support with CUDA
  • Speaker diarization and word-level timestamps
  • Rate limiting and request throttling
  • Structured logging with trace IDs
  • Health check endpoints
  • Docker support for easy deployment

Requirements

  • Python >= 3.11, < 3.13
  • FFmpeg (required for audio processing)
  • CUDA Toolkit (optional, for GPU acceleration)
  • 4GB+ RAM (8GB+ recommended for larger models)

Installation

Using Poetry (Recommended)

# Install Poetry
pip install poetry

# Install dependencies
poetry install

# Install with development dependencies
poetry install --with dev

Using Docker

# CPU version
docker build -t actimind-whisper .

# GPU version
docker build -f Dockerfile.gpu -t actimind-whisper:gpu .

Configuration

Create a .env file in the project root:

# Model Configuration
WHISPER_MODEL=base
DEVICE=cuda  # or cpu
COMPUTE_TYPE=float16  # float16, int8, float32

# API Configuration
HOST=0.0.0.0
PORT=8000
WORKERS=1

# Rate Limiting
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60

# Logging
LOG_LEVEL=INFO
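The service presumably reads these variables at startup. As a minimal sketch (variable names follow the .env above; defaults and loader shape are assumptions, not the service's actual config code), the same settings can be read with the standard library:

```python
import os

# Illustrative settings loader (assumed defaults; the real app/core config
# may differ). Assumes the variables are already exported, e.g. by Docker --
# reading a .env file directly would need a loader such as python-dotenv.
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")
DEVICE = os.getenv("DEVICE", "cpu")                   # "cuda" or "cpu"
COMPUTE_TYPE = os.getenv("COMPUTE_TYPE", "float16")   # float16, int8, float32
HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8000"))
RATE_LIMIT_REQUESTS = int(os.getenv("RATE_LIMIT_REQUESTS", "10"))
RATE_LIMIT_WINDOW = int(os.getenv("RATE_LIMIT_WINDOW", "60"))  # seconds
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```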

Usage

Development Server

# Start the development server
poetry run dev

# With uvicorn directly
poetry run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Production Server

# Using Docker
docker run -p 8000:8000 -e WHISPER_MODEL=base actimind-whisper

# With GPU support
docker run --gpus all -p 8000:8000 -e WHISPER_MODEL=base -e DEVICE=cuda actimind-whisper:gpu

API Documentation

Once the service is running, interactive API documentation (generated automatically by FastAPI) is available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Endpoints

POST /transcribe

Upload an audio file for transcription.

Request:

curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.mp3" \
  -F "language=en" \
  -F "task=transcribe"

Response:

{
  "text": "Transcribed text here",
  "segments": [...],
  "language": "en",
  "duration": 10.5
}
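The same request can be made from Python. This stdlib-only helper (the transcribe function name, timeout, and base_url default are illustrative; the endpoint and form-field names follow the curl example above) builds the multipart upload by hand:

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

def transcribe(path, language="en", base_url="http://localhost:8000"):
    """POST an audio file as multipart/form-data to /transcribe.

    Illustrative client sketch: uses only the standard library, so the
    multipart body is assembled manually.
    """
    boundary = uuid.uuid4().hex
    filename = Path(path).name
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    parts = []
    # Plain form fields (language, task), as in the curl example.
    for name, value in (("language", language), ("task", "transcribe")):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The file field, followed by the raw audio bytes.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {ctype}\r\n\r\n").encode()
    )
    body = (b"".join(parts)
            + Path(path).read_bytes()
            + f"\r\n--{boundary}--\r\n".encode())

    req = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

With the server running, `transcribe("audio.mp3")` returns the parsed JSON response shown above.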

GET /health

Health check endpoint.

Response:

{
  "status": "healthy",
  "model": "base",
  "device": "cuda"
}
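In a deployment, the health endpoint can be polled until the model has finished loading. A small sketch (the wait_until_healthy helper is hypothetical; the URL and "healthy" status value follow the response above):

```python
import json
import time
import urllib.request

def wait_until_healthy(url="http://localhost:8000/health", timeout=60.0):
    """Poll the health endpoint until it reports 'healthy' or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = json.load(resp)
            if body.get("status") == "healthy":
                return body  # e.g. {"status": "healthy", "model": "base", ...}
        except OSError:
            pass  # service not accepting connections yet
        time.sleep(1.0)
    raise TimeoutError(f"service at {url} not healthy after {timeout}s")
```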

Development

Available Poetry Scripts

  • poetry run dev - Start development server
  • poetry run test - Run test suite
  • poetry run format - Format code with Black
  • poetry run lint - Run Ruff linter

Architecture

The service follows a clean architecture pattern:

  • app/api/ - API endpoints and middleware
  • app/core/ - Core configuration and model management
  • app/domain/ - Domain models and exceptions
  • app/services/ - Business logic and services

Performance Optimization

Model Selection

  • tiny - Fastest, lowest accuracy (~1GB RAM)
  • base - Good balance (~1GB RAM)
  • small - Better accuracy (~2GB RAM)
  • medium - High accuracy (~5GB RAM)
  • large-v2 - Best accuracy (~10GB RAM)

GPU Acceleration

For GPU support, ensure CUDA is installed:

# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

Compute Type

  • float32 - Highest accuracy, slowest
  • float16 - Good balance (requires GPU)
  • int8 - Fastest, lower accuracy

Troubleshooting

FFmpeg not found

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

CUDA out of memory

  • Use a smaller model
  • Reduce batch size
  • Use int8 compute type

Slow transcription

  • Enable GPU acceleration
  • Use the faster-whisper backend
  • Choose appropriate model size

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
