musingfox/scribe-wise

Scribe Wise

A comprehensive audio transcription tool powered by Whisper models that converts audio/video files into Chinese transcripts. Features include WebM video conversion, robust error handling, and an intuitive command-line interface.

Features

Core Features

  • 🎵 Multi-format Support: WebM, MP4, MKV, AVI, MP3, WAV, FLAC, OGG, AAC, M4A
  • 🧠 Multi-Model AI: Support for 6 different transcription models (local & cloud)
    • MediaTek Breeze-ASR-25 (default Chinese model)
    • OpenAI Whisper (Base/Small/Medium/Large local models)
    • OpenAI Whisper API (cloud service)
  • ⚡ Smart Processing: Automatic chunking for long audio files (30-second segments)
  • 🔧 Error Recovery: Comprehensive error handling with retry mechanisms and recovery suggestions
  • 💻 Cross-platform: Apple Silicon (MPS) and CPU support with automatic fallback
  • 📝 Complete Workflow: Video → Audio → Transcription with validation at each step
  • 💰 Cost Control: API usage tracking and cost estimation for cloud services
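The 30-second chunking mentioned above can be sketched as follows. This is a minimal illustration of the idea, not the project's actual implementation; the function name `chunk_audio` is hypothetical.

```python
import numpy as np

CHUNK_SECONDS = 30
SAMPLE_RATE = 16_000  # the pipeline resamples audio to 16 kHz

def chunk_audio(samples: np.ndarray, sample_rate: int = SAMPLE_RATE,
                chunk_seconds: int = CHUNK_SECONDS) -> list[np.ndarray]:
    """Split a mono waveform into fixed-length segments.

    The final segment may be shorter than chunk_seconds.
    """
    chunk_len = chunk_seconds * sample_rate
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]

# A 75-second clip yields three segments: 30 s, 30 s, and 15 s.
clip = np.zeros(75 * SAMPLE_RATE)
segments = chunk_audio(clip)
print([len(s) // SAMPLE_RATE for s in segments])  # [30, 30, 15]
```

Each segment is transcribed independently and the results are merged, which keeps memory usage bounded for long recordings.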

Advanced CLI Interface

  • 🚀 Easy to Use: Simple command-line interface with automatic output path generation
  • ✅ User-friendly: Clear success/error messages with recovery suggestions
  • 🔍 Format Detection: Automatic file type detection and processing
  • 📊 Progress Tracking: Real-time processing feedback and status reporting
  • 🎯 Model Selection: Choose from 6 different AI models with the --model parameter
  • 📋 Model Management: List models (--list-models) and view details (--model-info)

System Requirements

  • Python 3.13+
  • FFmpeg (for video to audio conversion)
  • macOS (Apple Silicon recommended for better performance)
  • Sufficient memory to load Whisper models

Installation

Install FFmpeg

First, install FFmpeg for video conversion:

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from https://ffmpeg.org/download.html
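Since FFmpeg is a hard dependency for video conversion, it can help to verify the install programmatically. A minimal check (the project ships its own `utils/ffmpeg_checker.py`, whose implementation may differ; `ffmpeg_available` here is illustrative):

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is on PATH and runs."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    try:
        subprocess.run([path, "-version"], capture_output=True, check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

print(ffmpeg_available())
```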

Install Python Dependencies

Install dependencies using uv:

uv sync

For development with linting tools:

uv sync --extra dev

Quick Start

🚀 Simple Usage (Recommended)

The advanced CLI interface makes transcription incredibly easy:

# Basic transcription - automatic output file generation
uv run python -m cli.main input.webm

# Custom output file
uv run python -m cli.main input.webm my_transcription.txt

# Works with any supported format
uv run python -m cli.main meeting.mp3
uv run python -m cli.main video.mp4
uv run python -m cli.main audio.wav
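Automatic output path generation follows the pattern seen in the example output below (`presentation.webm` → `presentation_transcription.txt`). A sketch of that derivation, assuming the `<stem>_transcription.txt` convention holds for all formats:

```python
from pathlib import Path

def default_output_path(input_path: str) -> Path:
    """Derive '<stem>_transcription.txt' next to the input file.

    Hypothetical helper; the CLI's actual naming logic may differ.
    """
    p = Path(input_path)
    return p.with_name(f"{p.stem}_transcription.txt")

print(default_output_path("presentation.webm"))  # presentation_transcription.txt
```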

🎯 Model Selection

Choose from 6 different AI models for optimal results:

# Use specific model
uv run python -m cli.main audio.mp3 --model local_whisper_base

# List all available models
uv run python -m cli.main --list-models

# Get detailed model information
uv run python -m cli.main --model-info local_breeze

# Use OpenAI API (requires OPENAI_API_KEY environment variable)
export OPENAI_API_KEY="your-api-key"
uv run python -m cli.main audio.mp3 --model openai_api
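Cloud transcription is pay-per-use, so the cost-estimation feature matters here. A back-of-the-envelope sketch; the $0.006/minute rate is an assumption based on OpenAI's published Whisper API pricing and should be checked against current pricing:

```python
def estimate_api_cost(duration_seconds: float, usd_per_minute: float = 0.006) -> float:
    """Estimate the Whisper API cost for a clip.

    The default rate is an assumption; verify against OpenAI's pricing page.
    """
    return round(duration_seconds / 60 * usd_per_minute, 4)

print(estimate_api_cost(45 * 60))  # a 45-minute meeting
```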

📋 Available Models

| Model ID             | Type  | Description            | Best For                               |
|----------------------|-------|------------------------|----------------------------------------|
| local_breeze         | Local | MediaTek Breeze-ASR-25 | Chinese speech recognition (default)   |
| local_whisper_base   | Local | OpenAI Whisper Base    | Lightweight multilingual transcription |
| local_whisper_small  | Local | OpenAI Whisper Small   | Balance of performance and accuracy    |
| local_whisper_medium | Local | OpenAI Whisper Medium  | High-quality transcription             |
| local_whisper_large  | Local | OpenAI Whisper Large   | Highest accuracy                       |
| openai_api           | Cloud | OpenAI Whisper API     | No installation, pay-per-use           |
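The table above maps naturally onto a small registry, which is roughly how `--list-models` can enumerate its options. This dictionary is a hypothetical mirror of the table; the project's real `config/model_config.py` may structure it differently:

```python
# Hypothetical registry mirroring the model table above.
MODELS = {
    "local_breeze": ("local", "MediaTek Breeze-ASR-25"),
    "local_whisper_base": ("local", "OpenAI Whisper Base"),
    "local_whisper_small": ("local", "OpenAI Whisper Small"),
    "local_whisper_medium": ("local", "OpenAI Whisper Medium"),
    "local_whisper_large": ("local", "OpenAI Whisper Large"),
    "openai_api": ("cloud", "OpenAI Whisper API"),
}

def list_models() -> list[str]:
    """Format one line per model, like a --list-models listing might."""
    return [f"{mid:22} {kind:6} {name}" for mid, (kind, name) in MODELS.items()]

for line in list_models():
    print(line)
```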

📋 CLI Options

# Show help and all available options
uv run python -m cli.main --help

# Show version information
uv run python -m cli.main --version

# Show supported input formats
uv run python -m cli.main --formats

# Show system diagnostics
uv run python -m cli.main --diagnostics

# Model management commands
uv run python -m cli.main --list-models              # List all models
uv run python -m cli.main --model-info <model_id>    # Model details

🔄 Legacy Usage (Still Supported)

For backward compatibility, the original interface works with MP3 files:

# Place your audio file as 'meeting.mp3' in the project root
uv run python main.py

💡 Example Output

$ uv run python -m cli.main presentation.webm --model local_whisper_base

✅ Transcription completed successfully
Input: presentation.webm
Output: presentation_transcription.txt
Model: OpenAI Whisper Base (local)

Development

Code Quality Tools

The project includes several linting and formatting tools:

# Run linter
uv run ruff check main.py

# Auto-fix linting issues
uv run ruff check --fix main.py

# Format code
uv run black main.py

# Sort imports
uv run isort main.py

# Type checking
uv run mypy main.py

Pre-commit Hooks

Pre-commit hooks automatically run linting and formatting before each commit:

# Install pre-commit hooks (one-time setup)
uv run pre-commit install

# Run hooks manually on all files
uv run pre-commit run --all-files

# Run hooks on specific files
uv run pre-commit run --files main.py

The hooks will automatically:

  • Remove trailing whitespace
  • Fix end-of-file issues
  • Check YAML syntax
  • Run Ruff linter with auto-fix
  • Format code with Ruff formatter
  • Format code with Black
  • Sort imports with isort
When run via the legacy interface (uv run python main.py), the program will:

  • Automatically detect the audio file duration
  • Process long audio files in segments
  • Display processing progress
  • Save the complete transcription results to transcription.txt

Program Flow

  1. Load Audio: Supports MP3 format, automatically converts sample rate to 16kHz
  2. Audio Preprocessing: Mono conversion and normalization
  3. Model Loading: Uses MediaTek Breeze-ASR-25 Whisper model
  4. Segmented Processing: Splits long audio into 30-second chunks for processing
  5. Transcription Merging: Combines all segment results into complete transcript
  6. Result Output: Saves to text file
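Steps 1–2 and 5 of the flow above can be sketched in plain NumPy. This is a minimal illustration; the project actually uses torchaudio, and resampling (step 1) is omitted here. The function names are illustrative:

```python
import numpy as np

TARGET_RATE = 16_000  # Whisper models expect 16 kHz input

def preprocess(waveform: np.ndarray) -> np.ndarray:
    """Mono conversion and peak normalization (steps 1-2 above)."""
    if waveform.ndim == 2:           # (channels, samples) -> mono
        waveform = waveform.mean(axis=0)
    peak = np.abs(waveform).max()
    return waveform / peak if peak > 0 else waveform

def merge_transcripts(chunks: list[str]) -> str:
    """Step 5: join per-segment transcripts into one document."""
    return "\n".join(c.strip() for c in chunks if c.strip())

stereo = np.array([[0.5, -0.25, 0.0], [0.5, -0.25, 0.0]])
mono = preprocess(stereo)
print(mono)  # peak scaled to 1.0
print(merge_transcripts(["first segment ", "", "second segment"]))
```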

Technical Architecture

  • Audio Processing: torchaudio + soundfile
  • Video Conversion: FFmpeg via ffmpeg-python
  • Speech Recognition: Multi-model support (Hugging Face Transformers + OpenAI API)
  • Hardware Acceleration: Apple MPS (Metal Performance Shaders)
  • Package Management: uv
  • Testing: pytest with asyncio support (115+ test cases)
  • Code Quality: ruff, black, isort, mypy
  • Model Management: Dynamic model loading/unloading with service abstraction
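The service abstraction works by giving every backend (Breeze, local Whisper, the OpenAI API) a common interface, so the workflow can swap models freely. A sketch of what a base interface like `services/base.py` might look like; the method names and the `EchoService` stand-in are illustrative, not the project's actual signatures:

```python
from abc import ABC, abstractmethod

class TranscriptionService(ABC):
    """Illustrative base interface for transcription backends."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript for the given audio file."""

    def load(self) -> None:    # optional model-lifecycle hooks
        pass

    def unload(self) -> None:
        pass

class EchoService(TranscriptionService):
    """Stand-in backend used here only to show the call pattern."""

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"

svc: TranscriptionService = EchoService()
print(svc.transcribe("meeting.mp3"))
```

Because callers depend only on `TranscriptionService`, dynamic loading/unloading reduces to constructing the right subclass and calling its lifecycle hooks.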

Module Structure

scribe-wise/
├── main.py                         # Legacy transcription program
├── cli/                            # New CLI interface
│   ├── main.py                     # Main CLI entry point
│   └── integration.py              # CLI integration layer
├── transcription/                  # Core transcription workflow
│   └── workflow.py                 # Complete processing workflow
├── converters/                     # Media conversion modules
│   └── media_converter.py          # WebM to MP3 converter
├── validators/                     # Audio validation modules
│   └── audio_validator.py          # Audio file validator
├── utils/                          # Utility modules
│   ├── ffmpeg_checker.py           # FFmpeg dependency checker
│   ├── file_detector.py            # File type detection
│   └── error_recovery.py           # Error handling and retry logic
├── config/                         # Configuration management
│   └── model_config.py             # Model configuration and management
├── services/                       # Transcription service abstraction
│   ├── base.py                     # Base transcription service interface
│   ├── local_breeze.py             # MediaTek Breeze service
│   ├── local_whisper.py            # Local Whisper service
│   └── openai_service.py           # OpenAI API service
├── exceptions/                     # Custom exception hierarchy
│   ├── base.py                     # Base exception classes
│   ├── conversion.py               # Conversion-related exceptions
│   ├── validation.py               # Validation-related exceptions
│   └── transcription.py            # Transcription-related exceptions
└── tests/                          # Comprehensive test suites (115+ tests)
    ├── test_*.py                   # Unit tests for all modules
    └── test_workflow_error_integration.py  # Integration tests

Error Handling & Recovery

Scribe Wise includes comprehensive error handling with automatic recovery suggestions:

$ uv run python -m cli.main broken_video.webm

โŒ Error: FFmpeg not found. FFmpeg is required for media conversion.
Install it using: brew install ffmpeg (macOS) or sudo apt install ffmpeg (Ubuntu/Debian)

The system automatically:

  • ✅ Detects Issues: Identifies missing dependencies, corrupted files, and format problems
  • 🔄 Retries Operations: Automatic retry with exponential backoff for temporary failures
  • 💡 Provides Solutions: Clear recovery suggestions for common problems
  • 🧹 Cleans Up: Automatic cleanup of temporary files on errors
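Retry with exponential backoff can be sketched as follows. This is a minimal illustration, not the project's `utils/error_recovery.py`; the delays are shortened for demonstration and `retry` is a hypothetical name:

```python
import time

def retry(func, attempts: int = 3, base_delay: float = 0.01):
    """Call func, retrying with exponentially growing delays on failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise            # out of attempts: propagate the error
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky():
    """Fails twice, then succeeds - simulating a temporary failure."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("temporary failure")
    return "ok"

print(retry(flaky))  # succeeds on the third attempt
```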

Notes

  • The first run downloads the Whisper model, so an internet connection is required
  • Processing time depends on audio length and hardware performance
  • Running on an Apple Silicon Mac is recommended for optimal performance
  • Optimized for Chinese speech recognition; effectiveness for other languages is limited

Testing

The project includes comprehensive test coverage with 115+ test cases covering all modules:

# Run all tests
uv run pytest -v

# Run specific test module
uv run pytest tests/test_workflow_error_integration.py -v

# Test specific model integration
uv run pytest tests/test_workflow_model_integration.py -v

# Test CLI model selection
uv run pytest tests/test_cli_model_selection.py -v

# Run with coverage
uv run pytest --cov=. --cov-report=html

Troubleshooting

Common Issues

| Issue                | Solution                                                   |
|----------------------|------------------------------------------------------------|
| FFmpeg not found     | Install FFmpeg: brew install ffmpeg (macOS)                |
| Audio loading failed | Install audio libraries: uv add soundfile librosa          |
| CUDA errors          | Program auto-switches to MPS/CPU - no action needed        |
| Model download fails | Check internet connection; model downloads on first run    |
| Memory errors        | Try shorter audio files or use lighter models (base/small) |
| OpenAI API errors    | Set OPENAI_API_KEY environment variable                    |
| Unknown model        | Use --list-models to see available models                  |

Getting Help

  1. Check the error message for recovery suggestions
  2. Verify FFmpeg installation: ffmpeg -version
  3. Test with a smaller audio file
  4. Check available disk space (models need ~2GB)
