A comprehensive audio transcription tool powered by Whisper models that converts audio/video files into Chinese transcripts. Features include WebM video conversion, robust error handling, and an intuitive CLI interface.
- Multi-format Support: WebM, MP4, MKV, AVI, MP3, WAV, FLAC, OGG, AAC, M4A
- Multi-Model AI: Support for 6 different transcription models (local & cloud)
  - MediaTek Breeze-ASR-25 (default Chinese model)
  - OpenAI Whisper (Base/Small/Medium/Large local models)
  - OpenAI Whisper API (cloud service)
- Smart Processing: Automatic chunking for long audio files (30-second segments)
- Error Recovery: Comprehensive error handling with retry mechanisms and recovery suggestions
- Cross-platform: Apple Silicon (MPS) and CPU support with automatic fallback
- Complete Workflow: Video → Audio → Transcription with validation at each step
- Cost Control: API usage tracking and cost estimation for cloud services
- Easy to Use: Simple command-line interface with automatic output path generation
- User-friendly: Clear success/error messages with recovery suggestions
- Format Detection: Automatic file type detection and processing
- Progress Tracking: Real-time processing feedback and status reporting
- Model Selection: Choose from 6 different AI models with the `--model` parameter
- Model Management: List models (`--list-models`) and view details (`--model-info`)
- Python 3.13+
- FFmpeg (for video to audio conversion)
- macOS (Apple Silicon recommended for better performance)
- Sufficient memory to load Whisper models
First, install FFmpeg for video conversion:

macOS:

```bash
brew install ffmpeg
```

Ubuntu/Debian:

```bash
sudo apt update
sudo apt install ffmpeg
```

Windows: Download from https://ffmpeg.org/download.html

Install dependencies using uv:

```bash
uv sync
```

For development with linting tools:

```bash
uv sync --extra dev
```

The CLI interface makes transcription straightforward:
```bash
# Basic transcription - automatic output file generation
uv run python -m cli.main input.webm

# Custom output file
uv run python -m cli.main input.webm my_transcription.txt

# Works with any supported format
uv run python -m cli.main meeting.mp3
uv run python -m cli.main video.mp4
uv run python -m cli.main audio.wav
```

Choose from 6 different AI models for optimal results:
```bash
# Use specific model
uv run python -m cli.main audio.mp3 --model local_whisper_base

# List all available models
uv run python -m cli.main --list-models

# Get detailed model information
uv run python -m cli.main --model-info local_breeze

# Use OpenAI API (requires OPENAI_API_KEY environment variable)
export OPENAI_API_KEY="your-api-key"
uv run python -m cli.main audio.mp3 --model openai_api
```

| Model ID | Type | Description | Best For |
|---|---|---|---|
| `local_breeze` | Local | MediaTek Breeze-ASR-25 | Chinese speech recognition (default) |
| `local_whisper_base` | Local | OpenAI Whisper Base | Lightweight multilingual transcription |
| `local_whisper_small` | Local | OpenAI Whisper Small | Balanced performance and accuracy |
| `local_whisper_medium` | Local | OpenAI Whisper Medium | High-quality transcription |
| `local_whisper_large` | Local | OpenAI Whisper Large | Highest accuracy |
| `openai_api` | Cloud | OpenAI Whisper API | No local install; pay per use |
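Internally, a model ID like `local_breeze` has to be resolved to a backend before transcription starts. A minimal sketch of such a registry is shown below; the names and structure here are assumptions for illustration (the project keeps its real configuration in `config/model_config.py`):

```python
# Hypothetical model registry mapping CLI model IDs to backend metadata.
# Illustrative only; the project's actual config layout may differ.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    kind: str          # "local" or "cloud"
    description: str


MODEL_REGISTRY = {
    "local_breeze": ModelSpec("local_breeze", "local", "MediaTek Breeze-ASR-25"),
    "local_whisper_base": ModelSpec("local_whisper_base", "local", "OpenAI Whisper Base"),
    "local_whisper_small": ModelSpec("local_whisper_small", "local", "OpenAI Whisper Small"),
    "local_whisper_medium": ModelSpec("local_whisper_medium", "local", "OpenAI Whisper Medium"),
    "local_whisper_large": ModelSpec("local_whisper_large", "local", "OpenAI Whisper Large"),
    "openai_api": ModelSpec("openai_api", "cloud", "OpenAI Whisper API"),
}


def resolve_model(model_id: str) -> ModelSpec:
    """Look up a model ID, mirroring the CLI's 'Unknown model' error path."""
    try:
        return MODEL_REGISTRY[model_id]
    except KeyError:
        raise ValueError(f"Unknown model '{model_id}'. Use --list-models to see options.")
```

A flat dict like this is enough to drive both `--list-models` (iterate the registry) and `--model-info` (resolve one entry).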
```bash
# Show help and all available options
uv run python -m cli.main --help

# Show version information
uv run python -m cli.main --version

# Show supported input formats
uv run python -m cli.main --formats

# Show system diagnostics
uv run python -m cli.main --diagnostics

# Model management commands
uv run python -m cli.main --list-models              # List all models
uv run python -m cli.main --model-info <model_id>    # Model details
```

For backward compatibility, the original interface works with MP3 files:

```bash
# Place your audio file as 'meeting.mp3' in the project root
uv run python main.py
```

```
$ uv run python -m cli.main presentation.webm --model local_whisper_base
✅ Transcription completed successfully
Input: presentation.webm
Output: presentation_transcription.txt
Model: OpenAI Whisper Base (local)
```

The project includes several linting and formatting tools:
```bash
# Run linter
uv run ruff check main.py

# Auto-fix linting issues
uv run ruff check --fix main.py

# Format code
uv run black main.py

# Sort imports
uv run isort main.py

# Type checking
uv run mypy main.py
```

Pre-commit hooks automatically run linting and formatting before each commit:

```bash
# Install pre-commit hooks (one-time setup)
uv run pre-commit install

# Run hooks manually on all files
uv run pre-commit run --all-files

# Run hooks on specific files
uv run pre-commit run --files main.py
```

The hooks will automatically:
- Remove trailing whitespace
- Fix end-of-file issues
- Check YAML syntax
- Run Ruff linter with auto-fix
- Format code with Ruff formatter
- Format code with Black
- Sort imports with isort
The program will:

- Automatically detect audio file duration
- Process long audio files in segments
- Display processing progress
- Save complete transcription results to `transcription.txt`
- Load Audio: Supports MP3 format, automatically converts sample rate to 16kHz
- Audio Preprocessing: Mono conversion and normalization
- Model Loading: Uses MediaTek Breeze-ASR-25 Whisper model
- Segmented Processing: Splits long audio into 30-second chunks for processing
- Transcription Merging: Combines all segment results into complete transcript
- Result Output: Saves to text file
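The segmentation-and-merge steps above can be sketched in a few lines. This is an illustrative pure-Python stand-in (the actual tool operates on torchaudio tensors and a real Whisper model):

```python
# Minimal sketch of the segmented pipeline: split mono audio at 16 kHz into
# 30-second chunks, transcribe each chunk, and merge the results.
SAMPLE_RATE = 16_000     # target sample rate after resampling
CHUNK_SECONDS = 30       # segment length used for long audio


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a mono sample buffer into fixed-length segments."""
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def transcribe_chunks(chunks, transcribe_fn):
    """Run a transcription callable per chunk and concatenate the results."""
    return "".join(transcribe_fn(chunk) for chunk in chunks)


# Usage with a stand-in transcriber: 75 s of audio -> chunks of 30 s, 30 s, 15 s.
audio = [0.0] * (SAMPLE_RATE * 75)
chunks = split_into_chunks(audio)
text = transcribe_chunks(chunks, lambda c: f"[{len(c) / SAMPLE_RATE:.0f}s]")
# text -> "[30s][30s][15s]"
```

The last chunk is simply shorter than 30 seconds; Whisper-style models pad short inputs, so no special casing is needed at the split stage.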
- Audio Processing: `torchaudio` + `soundfile`
- Video Conversion: FFmpeg via `ffmpeg-python`
- Speech Recognition: Multi-model support (Hugging Face Transformers + OpenAI API)
- Hardware Acceleration: Apple MPS (Metal Performance Shaders)
- Package Management: `uv`
- Testing: pytest with asyncio support (115+ test cases)
- Code Quality: ruff, black, isort, mypy
- Model Management: Dynamic model loading/unloading with service abstraction
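The service abstraction mentioned above can be pictured as a small base interface that the Breeze, local Whisper, and OpenAI backends each implement. The sketch below is an assumption about the shape of that contract, not the project's actual `services/base.py`:

```python
# Hedged sketch of a transcription-service interface with load/transcribe/unload
# lifecycle hooks; real method names in services/base.py may differ.
from abc import ABC, abstractmethod


class TranscriptionService(ABC):
    """Common interface for local and cloud transcription backends."""

    @abstractmethod
    def load(self) -> None:
        """Load the model into memory (can be a no-op for cloud backends)."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript for the given audio file."""

    def unload(self) -> None:
        """Release model resources; local backends would override this."""


class EchoService(TranscriptionService):
    # Trivial stand-in backend used here only to demonstrate the contract.
    def load(self) -> None:
        pass

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


svc = EchoService()
svc.load()
text = svc.transcribe("meeting.mp3")
svc.unload()
```

Keeping load/unload explicit is what makes dynamic model swapping possible: the workflow can drop a heavyweight local model before loading another one.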
```
scrible-wise/
├── main.py                  # Legacy transcription program
├── cli/                     # New CLI interface
│   ├── main.py              # Main CLI entry point
│   └── integration.py       # CLI integration layer
├── transcription/           # Core transcription workflow
│   └── workflow.py          # Complete processing workflow
├── converters/              # Media conversion modules
│   └── media_converter.py   # WebM to MP3 converter
├── validators/              # Audio validation modules
│   └── audio_validator.py   # Audio file validator
├── utils/                   # Utility modules
│   ├── ffmpeg_checker.py    # FFmpeg dependency checker
│   ├── file_detector.py     # File type detection
│   └── error_recovery.py    # Error handling and retry logic
├── config/                  # Configuration management
│   └── model_config.py      # Model configuration and management
├── services/                # Transcription service abstraction
│   ├── base.py              # Base transcription service interface
│   ├── local_breeze.py      # MediaTek Breeze service
│   ├── local_whisper.py     # Local Whisper service
│   └── openai_service.py    # OpenAI API service
├── exceptions/              # Custom exception hierarchy
│   ├── base.py              # Base exception classes
│   ├── conversion.py        # Conversion-related exceptions
│   ├── validation.py        # Validation-related exceptions
│   └── transcription.py     # Transcription-related exceptions
└── tests/                   # Comprehensive test suites (115+ tests)
    ├── test_*.py            # Unit tests for all modules
    └── test_workflow_error_integration.py  # Integration tests
```
Scrible Wise includes comprehensive error handling with automatic recovery suggestions:
```
$ uv run python -m cli.main broken_video.webm
❌ Error: FFmpeg not found. FFmpeg is required for media conversion.
Install it using: brew install ffmpeg (macOS) or sudo apt install ffmpeg (Ubuntu/Debian)
```

The system automatically:

- Detects Issues: Identifies missing dependencies, corrupted files, and format problems
- Retries Operations: Automatic retry with exponential backoff for temporary failures
- Provides Solutions: Clear recovery suggestions for common problems
- Cleans Up: Automatic cleanup of temporary files on errors
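Retry with exponential backoff, the strategy described above, can be sketched as follows. This is a simplified stand-in for the project's logic in `utils/error_recovery.py`:

```python
# Retry a failing operation, doubling the wait between attempts (1s, 2s, 4s, ...).
# The sleep function is injectable so tests can skip real waiting.
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation`; on failure, wait exponentially longer and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise                            # out of retries: surface the error
            sleep(base_delay * (2 ** attempt))   # exponential backoff


# Usage: an operation that fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=lambda _: None)  # skip real sleeping
```

Backoff like this is most useful for transient failures (network hiccups, API rate limits); permanent errors such as a missing dependency are re-raised immediately once retries are exhausted.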
- The first run downloads the Whisper model, so an internet connection is required
- Processing time depends on audio length and hardware performance
- Running on an Apple Silicon Mac is recommended for optimal performance
- Optimized for Chinese speech recognition; effectiveness is limited for other languages
The project includes comprehensive test coverage with 115+ test cases covering all modules:
```bash
# Run all tests
uv run pytest -v

# Run specific test module
uv run pytest tests/test_workflow_error_integration.py -v

# Test specific model integration
uv run pytest tests/test_workflow_model_integration.py -v

# Test CLI model selection
uv run pytest tests/test_cli_model_selection.py -v

# Run with coverage
uv run pytest --cov=. --cov-report=html
```

| Issue | Solution |
|---|---|
| FFmpeg not found | Install FFmpeg: `brew install ffmpeg` (macOS) |
| Audio loading failed | Install audio libraries: `uv add soundfile librosa` |
| CUDA errors | The program auto-switches to MPS/CPU; no action needed |
| Model download fails | Check your internet connection; models download on first run |
| Memory errors | Try shorter audio files or lighter models (base/small) |
| OpenAI API errors | Set the `OPENAI_API_KEY` environment variable |
| Unknown model | Use `--list-models` to see available models |
- Check the error message for recovery suggestions
- Verify FFmpeg installation: `ffmpeg -version`
- Test with a smaller audio file
- Check available disk space (models need ~2GB)
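The FFmpeg check above can also be done programmatically. A minimal sketch (the project's own checker lives in `utils/ffmpeg_checker.py` and may be more thorough):

```python
# Detect whether an `ffmpeg` executable is available on PATH.
import shutil


def ffmpeg_available() -> bool:
    """Return True if FFmpeg can be found on the current PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if not ffmpeg_available():
        print("FFmpeg not found. Install it with: brew install ffmpeg (macOS)")
```

`shutil.which` mirrors what the shell does when resolving `ffmpeg -version`, so it catches the same "FFmpeg not found" condition before any conversion is attempted.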