Audio2Sub is an all-in-one subtitle toolkit that helps you automatically generate, translate, and synchronize subtitles using AI. Whether you need to transcribe audio to subtitles, translate subtitles into another language, or fix out-of-sync subtitle timing, Audio2Sub has you covered.
The toolkit includes three command-line tools:
- audio2sub: Automatically transcribe audio from video or audio files and generate .srt subtitles. Uses FFmpeg for media handling, Silero VAD for voice activity detection, and multiple transcription backends.
- subtranslator: Translate subtitle files from one language to another using AI. Supports any language pair with backends such as Google Gemini, Grok, and OpenAI.
- subaligner: Synchronize and align subtitle timing to match a reference subtitle. Ideal for fixing out-of-sync subtitles or aligning translated subtitles to the original timing, even when the subtitles are in different languages.
Before installing, you must have FFmpeg installed and available in your system's PATH.
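A quick way to confirm the FFmpeg prerequisite is to ask the shell whether it can resolve the command. The sketch below is a minimal check; `have_cmd` is a hypothetical helper name, not part of Audio2Sub.

```shell
# Return 0 if the shell can resolve the named command, 1 otherwise.
# For an external program such as ffmpeg, that means it is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd ffmpeg; then
    echo "ffmpeg found: $(command -v ffmpeg)"
else
    echo "ffmpeg not found on PATH; install it before running audio2sub" >&2
fi
```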
You can install Audio2Sub using pip. The default installation includes the faster_whisper backend.
pip install audio2sub[faster_whisper]

You must install at least one transcription backend; otherwise the program fails with a missing-dependency error when you run it.
To install with a specific transcription backend, see the table in the Backends section below.
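Some shells, zsh in particular, treat the square brackets in `audio2sub[...]` as a glob pattern and may refuse to run the command unquoted. The sketch below builds a quoted requirement string and combines several extras using pip's standard comma syntax. `a2s_requirement` is a hypothetical helper, and the extras names are taken from this README.

```shell
# Join extras with commas into one requirement string,
# e.g. a2s_requirement whisper subtranslator -> audio2sub[whisper,subtranslator]
a2s_requirement() (
    IFS=,
    printf 'audio2sub[%s]\n' "$*"
)

# Double quotes stop the shell from treating [ ] as a glob pattern.
# Example (not executed here):
#   pip install "$(a2s_requirement faster_whisper subtranslator)"
```

Writing the quoted string directly, as in `pip install "audio2sub[faster_whisper]"`, works just as well; the helper is only convenient when the list of extras comes from a script.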
To install with subtitle translation and alignment support:
# Install with subtitle translator support (Gemini + Grok + OpenAI)
pip install audio2sub[subtranslator]
# Install with subtitle aligner support (Gemini + Grok + OpenAI)
pip install audio2sub[subaligner]
# Install everything
pip install audio2sub[all]

audio2sub my_video.mp4 -o my_video.srt --lang en

This command transcribes the audio from my_video.mp4 into English and saves the subtitles to my_video.srt.
Notes:
- First-Time Use: The first time you run the program, it will download the necessary transcription models. This may take some time and require significant disk space.
- CUDA: Whisper-based local models run significantly slower without CUDA. The program warns at startup if CUDA is not available. If your system has a compatible GPU, install the CUDA Toolkit first. If you are sure CUDA is installed correctly and still get the warning, you may need to manually reinstall a compatible PyTorch version. Reinstalling PyTorch may break other dependencies if you choose a different version from the one currently installed; in that case, reinstall those dependencies according to the warnings shown.
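To see what PyTorch itself reports before starting a long transcription, you can query it directly. This is a minimal sketch, assuming `python3` is the interpreter of the environment where audio2sub is installed; `cuda_status` is a hypothetical helper name.

```shell
# Print "True" or "False" as reported by PyTorch, or "unknown" when
# python3 is missing or torch cannot be imported in this environment.
cuda_status() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "unknown"
        return 0
    fi
    python3 -c 'import torch; print(torch.cuda.is_available())' 2>/dev/null \
        || echo "unknown"
}

echo "CUDA available: $(cuda_status)"
```

A result of `False` on a machine with a working GPU driver usually points to the PyTorch build mismatch described in the note above.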
Use the -t or --transcriber flag to select a different backend.
audio2sub my_audio.wav -o my_audio.srt --lang en -t whisper --model medium

Each transcriber has its own options. To see them, use --help with the transcriber specified.
audio2sub -t faster_whisper --help

Translate subtitle files between languages while preserving timing and formatting. Supports Gemini, Grok, and OpenAI backends.
To use this tool, install its dependencies with pip install audio2sub[subtranslator].
# Translate Chinese subtitles to English using the default backend (Gemini)
subtranslator my_video_zh.srt -s zh -d en -o my_video_en.srt
# Use OpenAI as the translation backend
subtranslator my_video_zh.srt -s zh -d en -o my_video_en.srt -t openai

Automatically align subtitle timing to a reference subtitle or an audio transcription, even across different languages. Use this to fix out-of-sync subtitles or adapt timing between video versions.
Don't have a reference subtitle? No problem. Use audio2sub to transcribe your video and generate a reference subtitle file, then use subaligner to align your existing subtitle to match the generated timing.
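The two-step approach above can be wrapped so that the reference is only transcribed when it does not exist yet. This is a sketch under assumptions: `ensure_reference` and `align_to_video` are hypothetical helper names, the `.reference.srt` naming is arbitrary, and only flags shown elsewhere in this README are used.

```shell
# Create the reference subtitle only if it does not already exist.
ensure_reference() {
    video="$1"; ref="$2"; lang="$3"
    if [ -s "$ref" ]; then
        echo "reusing existing reference: $ref"
    else
        audio2sub "$video" -o "$ref" --lang "$lang"
    fi
}

# Transcribe the video if needed, then align an existing subtitle to it.
# Usage: align_to_video my_video.mp4 chinese.srt zh en aligned.srt
align_to_video() {
    video="$1"; sub="$2"; src_lang="$3"; ref_lang="$4"; out="$5"
    ref="${video%.*}.reference.srt"
    ensure_reference "$video" "$ref" "$ref_lang" || return 1
    subaligner -i "$sub" -r "$ref" -o "$out" \
        --src-lang "$src_lang" --ref-lang "$ref_lang"
}
```

Skipping the transcription when the reference already exists saves time if you align several subtitle files against the same video.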
To use this tool, install its dependencies with pip install audio2sub[subaligner].
# Align Chinese subtitles using English reference timing
subaligner -i chinese.srt -r english_reference.srt -o output.srt --src-lang zh --ref-lang en
# Use OpenAI backend
subaligner -i chinese.srt -r english_reference.srt -o output.srt -a openai

Audio2Sub's three tools can be chained together to create powerful subtitle workflows. Here are some common pipeline patterns:
Generate subtitles from a video, then translate them to another language:
audio2sub my_video.mp4 -o my_video_en.srt --lang en && subtranslator my_video_en.srt -s en -d es -o my_video_es.srt

Create a reference subtitle from your video, then align an existing subtitle to match:
audio2sub my_video_bluray.mp4 -o reference.srt --lang en && subaligner -i existing_subtitle.srt -r reference.srt -o aligned.srt --src-lang zh --ref-lang en

Audio2Sub provides official Docker images for easy deployment without managing dependencies.
# audio2sub: Transcribe with GPU support (recommended)
docker run --rm --gpus all -v "$(pwd):/media" xavierlam/audio2sub \
my_video.mp4 -o my_video.srt --lang en
# audio2sub: Without GPU support, whisper backend
docker run --rm -v "$(pwd):/media" xavierlam/audio2sub:whisper \
my_video.mp4 -o my_video.srt --lang en
# subtranslator: Translate subtitles
docker run --rm -v "$(pwd):/media" \
-e GEMINI_API_KEY=your_api_key_here \
xavierlam/subtranslator \
my_video_zh.srt -s zh -d en -o my_video_en.srt
# subaligner: Align subtitle timing
docker run --rm -v "$(pwd):/media" \
-e GEMINI_API_KEY=your_api_key_here \
xavierlam/subaligner \
-i chinese.srt -r english_ref.srt -o output.srt --src-lang zh --ref-lang en

Use --gpus all to enable GPU support, and use different image tags to select different backends.
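If you switch between backends and GPU settings often, a small wrapper keeps the docker run line short. The following is a sketch, not an official script: `a2s_image`, `a2s_docker`, and the `A2S_GPU` variable are hypothetical names, while the image tags are the ones listed in this README.

```shell
# Map a backend tag to the audio2sub image name; an empty tag means latest.
a2s_image() {
    case "$1" in
        ""|latest)
            echo "xavierlam/audio2sub:latest" ;;
        faster-whisper|whisper|gemini)
            echo "xavierlam/audio2sub:$1" ;;
        *)
            echo "unknown backend tag: $1" >&2
            return 1 ;;
    esac
}

# Run audio2sub in Docker. Set A2S_GPU=1 to pass --gpus all.
# Usage: A2S_GPU=1 a2s_docker latest my_video.mp4 -o my_video.srt --lang en
a2s_docker() {
    tag="$1"; shift
    image="$(a2s_image "$tag")" || return 1
    if [ "${A2S_GPU:-0}" = "1" ]; then
        docker run --rm --gpus all -v "$(pwd):/media" "$image" "$@"
    else
        docker run --rm -v "$(pwd):/media" "$image" "$@"
    fi
}
```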
The same workflows work seamlessly in Docker:
# Transcribe and translate in Docker
docker run --rm --gpus all -v "$(pwd):/media" xavierlam/audio2sub \
my_video.mp4 -o en.srt --lang en && \
docker run --rm -v "$(pwd):/media" -e GEMINI_API_KEY=$GEMINI_API_KEY xavierlam/subtranslator \
en.srt -s en -d zh -o zh.srt
# Generate reference and align
docker run --rm --gpus all -v "$(pwd):/media" xavierlam/audio2sub \
my_video.mp4 -o reference.srt --lang en && \
docker run --rm -v "$(pwd):/media" -e GEMINI_API_KEY=$GEMINI_API_KEY xavierlam/subaligner \
-i my_subtitle.srt -r reference.srt -o aligned.srt --src-lang zh --ref-lang en

| Image Tag | Backend | Description |
|---|---|---|
| xavierlam/audio2sub:latest | faster-whisper | Recommended (same as faster-whisper) |
| xavierlam/audio2sub:faster-whisper | faster-whisper | Fast CTranslate2-based Whisper |
| xavierlam/audio2sub:whisper | whisper | Original OpenAI Whisper |
| xavierlam/audio2sub:gemini | gemini | Google Gemini API |

| Image Tag | Description |
|---|---|
| xavierlam/subtranslator:latest | AI subtitle translator with Gemini, Grok and OpenAI support |

| Image Tag | Description |
|---|---|
| xavierlam/subaligner:latest | AI subtitle aligner with Gemini, Grok and OpenAI support |

For detailed Docker documentation, GPU setup, and troubleshooting, see docker/README.md.
Audio2Sub supports the following transcription backends.
| Backend Name | Description |
|---|---|
| faster_whisper | A faster reimplementation of Whisper using CTranslate2. See Faster Whisper. This is the default backend. |
| whisper | The original speech recognition model by OpenAI. See OpenAI Whisper. |
| gemini | Google's Gemini model via their API. Requires a GEMINI_API_KEY environment variable or --api-key argument. |

Install the backend you want with pip install audio2sub[<backend>], then select the corresponding transcriber with the -t flag.
Both the subtitle translator and subtitle aligner share the same set of AI backends.
| Backend Name | Description | API Key Env Var |
|---|---|---|
| gemini | Google's Gemini model. Default backend. | GEMINI_API_KEY |
| grok | xAI's Grok model (OpenAI-compatible API). | GROK_API_KEY |
| openai | OpenAI's GPT models. | OPENAI_API_KEY |

To install support for a single backend, use pip install audio2sub[<backend>]. To install all supported backends at once, use pip install audio2sub[subtranslator] or pip install audio2sub[subaligner].
Note: I have not been able to obtain an OpenAI API key to test the OpenAI backend, so I don't know whether it works as expected.
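Because each backend reads its key from the environment variable listed in the table above, it can help to check that the variable is set before starting a long translation or alignment run. The sketch below is one way to do that; `api_key_var` and `has_api_key` are hypothetical helper names, not part of the toolkit.

```shell
# Map a translator/aligner backend to its API key variable name.
api_key_var() {
    case "$1" in
        gemini) echo "GEMINI_API_KEY" ;;
        grok)   echo "GROK_API_KEY" ;;
        openai) echo "OPENAI_API_KEY" ;;
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Return 0 if the key for the given backend is set and non-empty.
has_api_key() {
    var="$(api_key_var "$1")" || return 1
    # var is one of the three fixed names above, so eval is safe here.
    eval "value=\${$var:-}"
    [ -n "$value" ]
}

# Example (not executed here):
#   has_api_key gemini && subtranslator my_video_zh.srt -s zh -d en -o my_video_en.srt
```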
Contributions are welcome! Please open an issue or submit a pull request on the GitHub repository.