aws-whisper

I bought a video course in German — and then realized I don't speak German.

I needed subtitles for 146 videos averaging 30 minutes each. The obvious solution was the OpenAI Whisper API, but at $0.006/min that came out to $33. Not terrible, but it also meant sending the content to an external API whose content restrictions could cause it to refuse certain material.

So I built this instead: it spins up an AWS EC2 GPU instance, transcribes everything on that instance with faster-whisper, translates the subtitles via Google Translate, and shuts everything down automatically. Total cost: ~$1.60 for all 146 videos, no content restrictions, and the transcription quality with distil-large-v3 is excellent.

Transcribes videos to subtitles on an AWS EC2 GPU instance (faster-whisper) and translates them into any language Google Translate supports. ~$1-3 for 146 videos vs. $33 with the OpenAI API.

How it works

1. Extracts audio from videos locally (ffmpeg)
2. Uploads audio files to S3
3. Launches an EC2 GPU instance (g4dn.xlarge or g6.xlarge)
4. Transcribes with faster-whisper on CUDA
5. Downloads VTT files, translates via Google Translate
6. Saves <video>.pt.vtt alongside each video
7. Terminates the instance and cleans up S3
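
To make the transcription step concrete, here is a minimal sketch of how faster-whisper output can be turned into a VTT file. It is not the code from aws_transcribe.py: the function names are made up for illustration, and the float16 compute type is an assumption that suits T4/L4 GPUs.

```python
from pathlib import Path


def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments) -> str:
    """Render an iterable of (start, end, text) tuples as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # blank line terminates each cue
    return "\n".join(lines)


def transcribe_to_vtt(audio_path: str, model_name: str = "distil-large-v3") -> Path:
    """Transcribe one audio file on the GPU and write <name>.vtt next to it."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # compute_type="float16" is an assumption, not taken from the project.
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path)
    out_path = Path(audio_path).with_suffix(".vtt")
    out_path.write_text(
        segments_to_vtt((s.start, s.end, s.text) for s in segments),
        encoding="utf-8",
    )
    return out_path
```

Only transcribe_to_vtt needs a CUDA-capable machine; the two formatting helpers are plain Python and can be tried anywhere.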

Requirements

- AWS account with EC2 GPU quota ("Running On-Demand G and VT instances" ≥ 4 vCPUs)
- AWS credentials configured (aws configure or environment variables)

Installation

Option 1: Docker

cp .env.example .env
# edit .env with your credentials

docker build -t aws-whisper .
docker run --rm \
  --env-file .env \
  -v /path/to/videos:/videos \
  aws-whisper /videos --no-spot

Option 2: Manual

Python dependencies

pip install -r requirements.txt

ffmpeg

Ubuntu/Debian:

sudo apt install ffmpeg

macOS:

brew install ffmpeg

Windows:

winget install ffmpeg

AWS CLI

Ubuntu/Debian:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip awscliv2.zip && sudo ./aws/install
rm -rf awscliv2.zip aws/

macOS:

brew install awscli

Windows:

winget install Amazon.AWSCLI
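
After a manual install, a quick check that both tools are on PATH can save a failed run later. This is a small sketch using only the standard library; the function name is hypothetical and the check is not part of the project.

```python
import shutil


def missing_tools(tools=("ffmpeg", "aws")):
    """Return the names of required executables that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("ffmpeg and the AWS CLI are both available")
```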

Configure credentials

Via .env file (recommended):

cp .env.example .env
# edit .env with your credentials
set -a; source .env; set +a
python3 aws_transcribe.py /path/to/videos --no-spot

set -a is required so that the variables are exported to the environment and are therefore visible to the python3 child process. A plain source .env only sets shell variables, which python3 won't see.

Or via aws configure:

aws configure

Or via environment variables directly:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
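
If you use the .env or export route, a short check like the one below can confirm that the three variables actually reached the Python process. It is only a sketch: the function name is hypothetical, and it does not apply to the aws configure route, where credentials live in ~/.aws rather than in the environment.

```python
import os

# The three variables listed above; nothing else is checked.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Not exported:", ", ".join(missing))
    else:
        print("All three AWS variables are visible to Python")
```

If this reports missing variables right after source .env, the most likely cause is that set -a was skipped.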

Usage

# Basic (on-demand, default model distil-large-v3, translate to pt-BR)
python3 aws_transcribe.py /path/to/videos --no-spot

# With SSH access to monitor the instance
python3 aws_transcribe.py /path/to/videos --no-spot --ssh

# Custom model and language
python3 aws_transcribe.py /path/to/videos --model large-v3 --lang es --no-spot

# Faster instance (L4 GPU, ~3x faster than T4)
python3 aws_transcribe.py /path/to/videos --instance g6.xlarge --no-spot

# Secure: use IAM instance profile instead of embedding credentials in user-data
python3 aws_transcribe.py /path/to/videos --instance-profile MyWhisperRole --no-spot

# Resume interrupted job (reuses existing S3 bucket and skips already-transcribed files)
python3 aws_transcribe.py /path/to/videos --bucket whisper-<job-id> --no-spot
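
The resume behaviour in the last example can be pictured as a set difference between the audio files in the bucket and the VTT files already produced. The sketch below assumes a hypothetical key layout (audio/ and vtt/ prefixes); the real bucket layout used by aws_transcribe.py may differ.

```python
from pathlib import PurePosixPath


def pending_audio(audio_keys, vtt_keys):
    """Return the audio keys whose file stem has no matching .vtt key yet."""
    done = {PurePosixPath(key).stem for key in vtt_keys if key.endswith(".vtt")}
    return [key for key in audio_keys if PurePosixPath(key).stem not in done]


if __name__ == "__main__":
    audio = ["audio/lesson01.mp3", "audio/lesson02.mp3", "audio/lesson03.mp3"]
    finished = ["vtt/lesson01.vtt", "vtt/lesson03.vtt"]
    # Only lesson02 still needs transcribing.
    print(pending_audio(audio, finished))
```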

Options

| Option | Default | Description |
| --- | --- | --- |
| `--model` | `distil-large-v3` | Whisper model. Options: `tiny`, `base`, `small`, `medium`, `large-v3`, `distil-large-v3` |
| `--lang` | `pt` | Translation language code (e.g. `es`, `fr`, `de`, `ja`) |
| `--instance` | `g4dn.xlarge` | EC2 instance type. Recommended: `g6.xlarge` (L4, ~3x faster) |
| `--region` | `us-east-1` | AWS region |
| `--no-spot` | off | Use on-demand instead of spot instances |
| `--bucket` | auto | Reuse existing S3 bucket (for resuming interrupted jobs) |
| `--timeout` | `180` | Minutes to wait for transcription to complete |
| `--ssh` | off | Create SSH key pair and security group to monitor the instance |
| `--instance-profile` | none | IAM instance profile name (avoids embedding credentials in user-data) |
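
For readers who want to see how these options map onto parsed values, here is an argparse sketch that mirrors the flags and defaults listed above. It is a reconstruction for illustration only and is not copied from aws_transcribe.py; the positional argument name videos is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags and defaults as the options listed above."""
    parser = argparse.ArgumentParser(prog="aws_transcribe.py")
    parser.add_argument("videos", help="directory containing the videos")
    parser.add_argument("--model", default="distil-large-v3")
    parser.add_argument("--lang", default="pt")
    parser.add_argument("--instance", default="g4dn.xlarge")
    parser.add_argument("--region", default="us-east-1")
    parser.add_argument("--no-spot", action="store_true")  # off unless passed
    parser.add_argument("--bucket", default=None)  # None means "create a new one"
    parser.add_argument("--timeout", type=int, default=180)  # minutes
    parser.add_argument("--ssh", action="store_true")
    parser.add_argument("--instance-profile", default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["/path/to/videos", "--no-spot", "--lang", "es"])
    print(args.model, args.lang, args.no_spot, args.timeout)
```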

Cost estimate

| Instance | GPU | Speed | Cost/hr | Approx. cost for 146 videos (30 min avg) |
| --- | --- | --- | --- | --- |
| g4dn.xlarge | T4 | ~6 min/video | $0.526 | ~$7 |
| g6.xlarge | L4 | ~40 sec/video | $0.805 | ~$1.60 |

S3 costs are negligible (<$0.10).
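
The arithmetic behind these estimates is simple enough to redo for your own library. The sketch below uses the per-video times and hourly prices from the table and counts transcription time only, so instance boot, model download and idle time are ignored (an assumption on my part about what the table includes).

```python
def api_cost(videos: int, minutes_per_video: float, price_per_minute: float) -> float:
    """Cost of a per-minute transcription API for the whole library."""
    return videos * minutes_per_video * price_per_minute


def gpu_cost(videos: int, seconds_per_video: float, price_per_hour: float) -> float:
    """Cost of an hourly-billed GPU instance, counting transcription time only."""
    hours = videos * seconds_per_video / 3600
    return hours * price_per_hour


if __name__ == "__main__":
    print(f"OpenAI API:  ${api_cost(146, 30, 0.006):.2f}")
    print(f"g4dn.xlarge: ${gpu_cost(146, 6 * 60, 0.526):.2f}")
    print(f"g6.xlarge:   ${gpu_cost(146, 40, 0.805):.2f}")
```

With exactly 30-minute videos this gives about $26.28 for the API, $7.68 for g4dn.xlarge and $1.31 for g6.xlarge. The differences from the figures quoted in this README presumably come from rounding, real video lengths that deviate from the 30-minute average, and instance time spent outside transcription.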

Security

Without --instance-profile, AWS credentials are embedded in the EC2 user-data script (readable via the instance metadata endpoint). For production or shared environments, create an IAM role with S3 access and pass --instance-profile <name>.
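
As a rough illustration of what passing an instance profile means at launch time, the sketch below builds the keyword arguments for boto3's run_instances call. The helper name and the AMI placeholder are hypothetical, and this is not necessarily how aws_transcribe.py launches its instance.

```python
def build_launch_kwargs(ami_id, instance_type, user_data, instance_profile=None):
    """Build keyword arguments for an EC2 run_instances call.

    When instance_profile is given, the instance receives temporary role
    credentials from AWS, so user_data does not need to contain any keys.
    """
    kwargs = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,
    }
    if instance_profile:
        kwargs["IamInstanceProfile"] = {"Name": instance_profile}
    return kwargs


if __name__ == "__main__":
    # "ami-EXAMPLE" is a placeholder, not a real image ID.
    kwargs = build_launch_kwargs(
        "ami-EXAMPLE", "g6.xlarge", "#!/bin/bash\n", instance_profile="MyWhisperRole"
    )
    print(kwargs["IamInstanceProfile"])
    # With boto3 installed and credentials configured, the launch would be:
    #   boto3.client("ec2").run_instances(**kwargs)
```

The role behind the profile only needs access to the job's S3 bucket, which keeps the blast radius small if the instance is compromised.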
