WhisperX API Server is a FastAPI-based server designed to transcribe audio files using pluggable stage backends, with WhisperX (https://github.com/m-bain/WhisperX) as the default implementation. The API offers an OpenAI-like interface that allows users to upload audio files and receive transcription results in various formats. It supports customizable options such as different models, languages, temperature settings, and more.
## Features
- Audio Transcription: Transcribe audio files using the configured transcription backend.
- Model Caching: Load and cache models for reusability and faster performance.
- OpenAI-like API, based on https://platform.openai.com/docs/api-reference/audio/createTranscription and https://platform.openai.com/docs/api-reference/audio/createTranslation
- Pluggable pipeline stages: choose a backend per stage (`transcription`, `alignment`, `diarization`) and mix different backends.
## `POST /v1/audio/transcriptions`

Based on OpenAI's createTranscription endpoint: https://platform.openai.com/docs/api-reference/audio/createTranscription
Parameters:
- `file`: The audio file to transcribe.
- `model` (str): Model name for the configured transcription backend. If `whisper-1` is provided, it is replaced with the configured default transcription model.
- `language` (str | null): Language code for transcription. Default is `config.default_language`.
- `prompt` (str | null): Optional transcription prompt. Default is `null`.
- `response_format` (str): One of `text`, `json`, `verbose_json`, `vtt_json`, `srt`, `vtt`, `aud`. Default is `config.default_response_format`.
- `temperature` (float): Temperature setting for transcription. Default is `0.0`.
- `timestamp_granularities[]` (list[str]): Timestamp granularity values (`segment`, `word`). Default is `["segment"]`.
- `stream` (bool): OpenAI-compatible streaming flag. Currently accepted but not used by the server. Default is `False`.
- `hotwords` (str | null): Optional hotwords for transcription. Default is `null`.
- `suppress_numerals` (bool): Suppress numerals in transcription. Default is `True`.
- `highlight_words` (bool): Highlight words in subtitle-style outputs (`vtt`, `srt`). Default is `False`.
- `align` (bool): Enable transcription timing alignment. Default is `True`.
- `diarize` (bool): Enable speaker diarization. Default is `False`.
- `speaker_embeddings` (bool): Include speaker embeddings during the diarization flow. Default is `False`.
- `chunk_size` (int): Chunk size (seconds) for VAD segment merging. Default is `config.whisper.chunk_size`.
- `batch_size` (int): Batch size used during inference. Default is `config.whisper.batch_size`.
Returns: Transcription output in the requested `response_format`:

- `json`: JSON object with `text`.
- `verbose_json`: Full transcript JSON object.
- `vtt_json`: Full transcript JSON object plus `vtt_text`.
- `text`, `srt`, `vtt`, `aud`: Plain text response body.
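As a client-side sketch, the documented parameters can be packed into multipart form fields and posted to the endpoint. The host/port and the commented `requests` call below are illustrative assumptions; only the field names come from the parameter list above.

```python
def transcription_form(model="whisper-1", language=None,
                       response_format="verbose_json", temperature=0.0,
                       granularities=("segment",), align=True, diarize=False):
    """Build the non-file form fields as (key, value) pairs.

    Repeated keys are how multipart form data encodes list parameters
    such as timestamp_granularities[].
    """
    fields = [
        ("model", model),
        ("response_format", response_format),
        ("temperature", str(temperature)),
        ("align", str(align).lower()),
        ("diarize", str(diarize).lower()),
    ]
    fields += [("timestamp_granularities[]", g) for g in granularities]
    if language is not None:
        fields.append(("language", language))
    return fields

# Posting the request (requires the third-party `requests` package):
# import requests
# with open("audio.mp3", "rb") as f:
#     resp = requests.post("http://localhost:8000/v1/audio/transcriptions",
#                          files={"file": f},
#                          data=transcription_form(language="en"))
# print(resp.json())
```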
## `POST /v1/audio/translations`

Based on OpenAI's createTranslation endpoint: https://platform.openai.com/docs/api-reference/audio/createTranslation
Parameters:
- `file`: The audio file to translate.
- `model` (str): Model name for the configured transcription backend. If `whisper-1` is provided, it is replaced with the configured default transcription model.
- `prompt` (str): Optional translation prompt. Default is an empty string.
- `response_format` (str): One of `text`, `json`, `verbose_json`, `vtt_json`, `srt`, `vtt`, `aud`. Default is `config.default_response_format`.
- `temperature` (float): Temperature setting for translation. Default is `0.0`.
- `chunk_size` (int): Chunk size (seconds) for VAD segment merging. Default is `config.whisper.chunk_size`.
- `batch_size` (int): Batch size used during inference. Default is `config.whisper.batch_size`.
Returns: Translation output in the requested `response_format` (same response behavior as `/v1/audio/transcriptions`).
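Because `json`, `verbose_json`, and `vtt_json` return JSON bodies while the remaining formats return plain text, a client can branch on the requested format. A minimal sketch:

```python
import json

# Formats documented above as returning JSON bodies.
JSON_FORMATS = {"json", "verbose_json", "vtt_json"}

def parse_body(response_format: str, body: str):
    """Decode a transcription/translation response body.

    JSON-style formats are parsed into a dict; text, srt, vtt, and aud
    are returned as the raw text body.
    """
    if response_format in JSON_FORMATS:
        return json.loads(body)
    return body
```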
## Health check

Returns the current API health status as JSON: `{"status": "healthy"}`.
## Model management

Transcription models (`/models/*`):

- List loaded transcription models.
- Unload a transcription model from cache. Parameters: `model` (str): model name to unload.
- Load a transcription model into cache. Parameters: `model` (str): model name to load.

Alignment models (`/align_models/*`):

- List loaded alignment models.
- Unload an alignment model. Parameters: `language` (str): language code of the alignment model to unload.
- Load an alignment model. Parameters: `language` (str): language code of the alignment model to load.

Diarization models (`/diarize_models/*`):

- List loaded diarization models.
- Unload a diarization model. Parameters: `model` (str): diarization model name to unload.
- Load a diarization model. Parameters: `model` (str): diarization model name to load.
## Backend configuration

You can define the default backend for each pipeline stage through environment variables:

```
BACKENDS__TRANSCRIPTION=whisperx
BACKENDS__ALIGNMENT=whisperx
BACKENDS__DIARIZATION=whisperx
```

By default, only the `whisperx` backend is registered. Additional backends can be added and combined per stage.
Model management endpoints (/models/*, /align_models/*, /diarize_models/*) operate through the configured stage backends.
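Conceptually, a per-stage backend registry might look like the following sketch. The names and structure here are illustrative assumptions, not the server's actual internals:

```python
from typing import Callable, Dict

# Hypothetical registry: stage name -> {backend name -> factory}.
# In the real server, only the "whisperx" backend is registered by default.
_REGISTRY: Dict[str, Dict[str, Callable[[], object]]] = {
    "transcription": {},
    "alignment": {},
    "diarization": {},
}

def register_backend(stage: str, name: str, factory: Callable[[], object]) -> None:
    """Make a backend selectable for one pipeline stage."""
    _REGISTRY[stage][name] = factory

def resolve_backend(stage: str, name: str):
    """Instantiate the backend configured for a stage."""
    try:
        return _REGISTRY[stage][name]()
    except KeyError:
        raise ValueError(f"no backend {name!r} registered for stage {stage!r}")
```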
## Running with Docker

For CPU:

```shell
docker compose build whisperx-api-server-cpu
docker compose up whisperx-api-server-cpu
```

For CUDA (GPU):

```shell
docker compose build whisperx-api-server-cuda
docker compose up whisperx-api-server-cuda
```
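To change the stage backends when running under Docker, the `BACKENDS__*` environment variables can be set on the service, for example in a compose override. This is an illustrative sketch; the service name follows the compose targets above:

```yaml
services:
  whisperx-api-server-cpu:
    environment:
      - BACKENDS__TRANSCRIPTION=whisperx
      - BACKENDS__ALIGNMENT=whisperx
      - BACKENDS__DIARIZATION=whisperx
```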
## Contributing

Feel free to submit issues, fork the repository, and send pull requests to contribute to the project.
## License

This project is licensed under the GNU General Public License, Version 3. See the LICENSE file for details.