camileto/aws-whisper

aws-whisper

I bought a video course in German — and then realized I don't speak German.

I needed subtitles for 146 videos averaging 30 minutes each. The obvious solution was the OpenAI Whisper API, but at $0.006/min that came out to $33. Not terrible, but it also meant sending the content to an external API whose content restrictions could lead it to refuse certain material.

So I built this instead: spin up an AWS EC2 GPU instance, transcribe everything locally with faster-whisper, translate via Google Translate, and shut it all down automatically. Total cost: ~$1.60 for all 146 videos, no content restrictions, and the transcription quality with distil-large-v3 is excellent.


Transcribes videos to subtitles using an AWS EC2 GPU instance (faster-whisper) and translates them to any language via Google Translate. ~$1-3 for 146 videos vs $33 with the OpenAI API.

How it works

  1. Extracts audio from videos locally (ffmpeg)
  2. Uploads audio files to S3
  3. Launches an EC2 GPU instance (g4dn.xlarge or g6.xlarge), as a spot instance by default or on-demand with --no-spot
  4. Transcribes with faster-whisper on CUDA
  5. Downloads VTT files, translates via Google Translate
  6. Saves <video>.pt.vtt alongside each video
  7. Terminates the instance and cleans up S3
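Steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes faster-whisper's WhisperModel running with device="cuda" and compute_type="float16", writes plain WebVTT cues without cue identifiers, and takes the translator as a plain callable because the exact Google Translate client used by aws_transcribe.py is not documented here. The names format_timestamp, segments_to_vtt, translate_vtt, load_model and transcribe_to_vtt are hypothetical.

```python
"""Sketch of steps 4-5: faster-whisper segments -> WebVTT -> translated WebVTT."""
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable, Iterable, Protocol


class Segment(Protocol):
    """Anything with start/end times in seconds and a text field."""

    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: Iterable[Segment]) -> str:
    """Render one WebVTT cue per segment, without cue identifiers."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}")
        lines.append(seg.text.strip())
        lines.append("")  # a blank line ends the cue
    return "\n".join(lines) + "\n"


def translate_vtt(vtt: str, translate: Callable[[str], str]) -> str:
    """Translate cue text; keep the header, timing lines and blank lines as-is."""
    out = []
    for line in vtt.splitlines():
        if not line.strip() or line.startswith("WEBVTT") or "-->" in line:
            out.append(line)
        else:
            out.append(translate(line))
    return "\n".join(out) + "\n"


def load_model(name: str = "distil-large-v3") -> Any:
    """Load a faster-whisper model on the GPU (import deferred to the GPU host)."""
    from faster_whisper import WhisperModel

    return WhisperModel(name, device="cuda", compute_type="float16")


def transcribe_to_vtt(model: Any, audio: Path) -> str:
    """Transcribe one audio file; the lazy segment generator is consumed here."""
    segments, _info = model.transcribe(str(audio))
    return segments_to_vtt(segments)
```

On the GPU host this would be model = load_model() once, then transcribe_to_vtt(model, audio_path) per file, so the model is loaded only once per batch. Translating line by line keeps the timings intact but gives the translator no context across cue boundaries.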

Requirements

  • AWS account with EC2 GPU quota ("Running On-Demand G and VT instances" ≥ 4 vCPUs)
  • AWS credentials configured (aws configure or environment variables)
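The quota can be checked from Python before the first run. A minimal sketch using boto3's Service Quotas client; the quota code L-DB2E81BA is assumed to be the one for "Running On-Demand G and VT instances", so confirm it in the Service Quotas console. Spot requests (the default when --no-spot is omitted) count against a separate spot quota. quota_allows and fetch_on_demand_gvt_quota are hypothetical helper names.

```python
"""Sketch: check the on-demand G/VT vCPU quota before launching."""
from __future__ import annotations

# Assumed quota code for "Running On-Demand G and VT instances";
# confirm it in the Service Quotas console before relying on it.
ON_DEMAND_G_VT_QUOTA_CODE = "L-DB2E81BA"

# vCPU counts of the instance types this README mentions.
INSTANCE_VCPUS = {"g4dn.xlarge": 4, "g6.xlarge": 4}


def quota_allows(instance_type: str, quota_vcpus: float) -> bool:
    """True if the vCPU quota covers one instance of the given type."""
    return quota_vcpus >= INSTANCE_VCPUS[instance_type]


def fetch_on_demand_gvt_quota(region: str = "us-east-1") -> float:
    """Read the applied quota value with boto3 (imported lazily)."""
    import boto3

    client = boto3.client("service-quotas", region_name=region)
    resp = client.get_service_quota(ServiceCode="ec2", QuotaCode=ON_DEMAND_G_VT_QUOTA_CODE)
    return float(resp["Quota"]["Value"])
```

With credentials configured, quota_allows("g6.xlarge", fetch_on_demand_gvt_quota()) returns False if a quota increase is needed first.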

Installation

Option 1: Docker

cp .env.example .env
# edit .env with your credentials

docker build -t aws-whisper .
docker run --rm \
  --env-file .env \
  -v /path/to/videos:/videos \
  aws-whisper /videos --no-spot

Option 2: Manual

Python dependencies

pip install -r requirements.txt

ffmpeg

Ubuntu/Debian:

sudo apt install ffmpeg

macOS:

brew install ffmpeg

Windows:

winget install ffmpeg
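Once ffmpeg is on your PATH, the extraction step (step 1 of "How it works") might look like the sketch below. The output format (16 kHz mono FLAC) and the set of video extensions are assumptions chosen to keep uploads small; the format aws_transcribe.py actually uses is not documented here, and find_videos, ffmpeg_command and extract_audio are hypothetical names.

```python
"""Sketch of step 1: extract a compact mono 16 kHz audio track per video."""
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# Hypothetical set of extensions to scan for; adjust to your files.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm", ".avi"}


def find_videos(folder: Path) -> list[Path]:
    """Return video files in a folder (non-recursive), sorted by name."""
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in VIDEO_EXTENSIONS)


def ffmpeg_command(video: Path, audio: Path) -> list[str]:
    """Build an ffmpeg call that drops the video stream and writes 16 kHz mono FLAC."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-y",            # overwrite partial output from an earlier run
        "-i", str(video),
        "-vn",           # no video stream
        "-ac", "1",      # mono
        "-ar", "16000",  # Whisper models work on 16 kHz audio
        "-c:a", "flac",  # lossless and built into ffmpeg
        str(audio),
    ]


def extract_audio(video: Path, out_dir: Path) -> Path:
    """Run ffmpeg for one video and return the path of the audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    audio = out_dir / (video.stem + ".flac")
    subprocess.run(ffmpeg_command(video, audio), check=True)
    return audio
```

Passing the command as a list (rather than a shell string) keeps file names with spaces, common in course downloads, intact.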

AWS CLI

Ubuntu/Debian:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip awscliv2.zip && sudo ./aws/install
rm -rf awscliv2.zip aws/

macOS:

brew install awscli

Windows:

winget install Amazon.AWSCLI

Configure credentials

Via .env file (recommended):

cp .env.example .env
# edit .env with your credentials
set -a; source .env; set +a
python3 aws_transcribe.py /path/to/videos --no-spot

set -a is required so the variables are exported to the environment (and therefore visible to the Python subprocess) — a plain source .env only sets shell variables, which python3 won't see.
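The same rule holds in Python: only variables present in os.environ are inherited by child processes. If you drive the tool from your own Python wrapper, a .env file can be loaded without set -a. This is a minimal sketch assuming a simple KEY=VALUE format; load_env_file is a hypothetical helper, not part of aws_transcribe.py.

```python
"""Sketch: export KEY=VALUE pairs from a .env file into os.environ."""
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path, override: bool = False) -> dict[str, str]:
    """Parse a simple .env file; values placed in os.environ reach subprocesses."""
    loaded: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        key = key.strip()
        # Naive unquoting: no escapes, no ${VAR} expansion, no multi-line values.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

After load_env_file(".env"), a subprocess.run(["python3", "aws_transcribe.py", "/path/to/videos", "--no-spot"], check=True) call inherits the credentials. Existing environment variables win unless override=True.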

Or via aws configure:

aws configure

Or via environment variables directly:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1

Usage

# Basic (on-demand, default model distil-large-v3, translate to pt-BR)
python3 aws_transcribe.py /path/to/videos --no-spot

# With SSH access to monitor the instance
python3 aws_transcribe.py /path/to/videos --no-spot --ssh

# Custom model and language
python3 aws_transcribe.py /path/to/videos --model large-v3 --lang es --no-spot

# Faster instance (L4 GPU, ~3x faster than T4)
python3 aws_transcribe.py /path/to/videos --instance g6.xlarge --no-spot

# Secure: use IAM instance profile instead of embedding credentials in user-data
python3 aws_transcribe.py /path/to/videos --instance-profile MyWhisperRole --no-spot

# Resume interrupted job (reuses existing S3 bucket and skips already-transcribed files)
python3 aws_transcribe.py /path/to/videos --bucket whisper-<job-id> --no-spot
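Resuming with --bucket amounts to skipping audio whose transcript already exists in the bucket. The sketch below shows one way to decide that; the key layout (audio/<name>.flac next to vtt/<name>.vtt) is a hypothetical assumption, not the script's documented layout, and pending_audio and list_keys are illustrative names.

```python
"""Sketch of resume logic: skip audio whose transcript is already in S3."""
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Iterable


def pending_audio(audio_keys: Iterable[str], existing_keys: Iterable[str]) -> list[str]:
    """Keep only audio keys with no <stem>.vtt among the existing keys."""
    done = {PurePosixPath(k).stem for k in existing_keys if k.endswith(".vtt")}
    return [k for k in audio_keys if PurePosixPath(k).stem not in done]


def list_keys(bucket: str, prefix: str = "") -> list[str]:
    """List every object key under a prefix (boto3 imported lazily)."""
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys: list[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

The paginator matters once a job has more than 1,000 objects, since a single ListObjectsV2 call returns at most that many keys.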

Options

| Option | Default | Description |
|---|---|---|
| --model | distil-large-v3 | Whisper model. Options: tiny, base, small, medium, large-v3, distil-large-v3 |
| --lang | pt | Translation language code (e.g. es, fr, de, ja) |
| --instance | g4dn.xlarge | EC2 instance type. Recommended: g6.xlarge (L4, ~3x faster) |
| --region | us-east-1 | AWS region |
| --no-spot | off | Use on-demand instead of spot instances |
| --bucket | auto | Reuse an existing S3 bucket (for resuming interrupted jobs) |
| --timeout | 180 | Minutes to wait for transcription to complete |
| --ssh | off | Create an SSH key pair and security group to monitor the instance |
| --instance-profile | | IAM instance profile name (avoids embedding credentials in user-data) |
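The --instance, --no-spot and --instance-profile options correspond naturally to EC2 RunInstances parameters. The sketch below shows one plausible mapping for boto3's run_instances; it is not taken from aws_transcribe.py, and run_instances_kwargs and the image_id argument (a CUDA-ready AMI of your choice) are hypothetical.

```python
"""Sketch: map the CLI options onto EC2 RunInstances parameters."""
from __future__ import annotations

from typing import Any


def run_instances_kwargs(
    image_id: str,
    instance_type: str = "g4dn.xlarge",
    no_spot: bool = False,
    instance_profile: str | None = None,
    user_data: str = "",
) -> dict[str, Any]:
    """Build keyword arguments for boto3's ec2.run_instances()."""
    kwargs: dict[str, Any] = {
        "ImageId": image_id,  # hypothetical: any AMI with CUDA drivers
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,  # boto3 base64-encodes this for RunInstances
    }
    if not no_spot:
        # Spot is the default; --no-spot drops this block to get on-demand.
        kwargs["InstanceMarketOptions"] = {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        }
    if instance_profile:
        # With a profile, user-data no longer needs embedded credentials.
        kwargs["IamInstanceProfile"] = {"Name": instance_profile}
    return kwargs
```

Usage would be boto3.client("ec2", region_name="us-east-1").run_instances(**run_instances_kwargs(ami_id, "g6.xlarge", no_spot=True, instance_profile="MyWhisperRole")).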

Cost estimate

| Instance | GPU | Speed | On-demand cost/hr (us-east-1) | ~146 videos (30 min avg) |
|---|---|---|---|---|
| g4dn.xlarge | T4 | ~6 min/video | $0.526 | ~$7 |
| g6.xlarge | L4 | ~40 sec/video | $0.805 | ~$1.60 |

S3 costs are negligible (<$0.10).
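To redo the arithmetic for your own library, multiply the number of videos by the per-video time and the hourly price. This sketch covers GPU time only: it leaves out instance boot and setup, model download, S3, and any spot discount, so the result can differ from the table's estimates. estimate_compute_cost is a hypothetical helper.

```python
"""Sketch: rough compute cost from per-video time and hourly price."""


def estimate_compute_cost(n_videos: int, seconds_per_video: float, hourly_rate: float) -> float:
    """Total GPU hours (n_videos * seconds_per_video / 3600) times the hourly rate."""
    return n_videos * seconds_per_video / 3600 * hourly_rate


# Figures from the table above: (instance, seconds per video, USD per hour).
for name, secs, rate in [("g4dn.xlarge", 6 * 60, 0.526), ("g6.xlarge", 40, 0.805)]:
    print(f"{name}: ~${estimate_compute_cost(146, secs, rate):.2f}")
# prints:
# g4dn.xlarge: ~$7.68
# g6.xlarge: ~$1.31
```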

Security

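Without --instance-profile, AWS credentials are embedded in the EC2 user-data script, where any process on the instance can read them through the instance metadata endpoint (and anyone in the account with ec2:DescribeInstanceAttribute permission can read them through the EC2 API). For production or shared environments, create an IAM role with S3 access and pass --instance-profile <name>.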
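One way to create such a role is sketched below with boto3. It assumes the instance only needs to list the job bucket, read audio and write transcripts; if the instance-side code does more (for example, deleting objects), the policy needs those actions too. The bucket pattern whisper-* is inferred from the whisper-<job-id> naming in the resume example, and create_instance_profile, ec2_trust_policy and s3_job_policy are hypothetical helpers.

```python
"""Sketch: IAM role + instance profile that lets the instance use the job bucket."""
from __future__ import annotations

import json
from typing import Any


def ec2_trust_policy() -> dict[str, Any]:
    """Allow EC2 instances to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def s3_job_policy(bucket_pattern: str = "whisper-*") -> dict[str, Any]:
    """List the bucket, read audio and write transcripts; nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": f"arn:aws:s3:::{bucket_pattern}"},
            {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
             "Resource": f"arn:aws:s3:::{bucket_pattern}/*"},
        ],
    }


def create_instance_profile(name: str, bucket_pattern: str = "whisper-*") -> None:
    """Create a role and an instance profile of the same name (boto3 imported lazily)."""
    import boto3

    iam = boto3.client("iam")
    iam.create_role(RoleName=name, AssumeRolePolicyDocument=json.dumps(ec2_trust_policy()))
    iam.put_role_policy(
        RoleName=name,
        PolicyName=f"{name}-s3",
        PolicyDocument=json.dumps(s3_job_policy(bucket_pattern)),
    )
    iam.create_instance_profile(InstanceProfileName=name)
    iam.add_role_to_instance_profile(InstanceProfileName=name, RoleName=name)
```

After create_instance_profile("MyWhisperRole"), pass --instance-profile MyWhisperRole. A freshly created profile may take a short while before EC2 accepts it.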

About

Transcribe and translate video subtitles cheaply using AWS EC2 GPU spot instances and faster-whisper — ~$1-3 for 146 videos vs $33 with OpenAI's API.
