CLI tool for automated transcription and classification of voice comments from UX research sessions.
Downloads or reads user session recordings → extracts speech → recognizes text with timestamps → groups quotes by user journey stage using an LLM.
- Reads a list of sources from a file (cloud.mail.ru URLs or local video paths)
- Downloads videos via cloud.mail.ru public API (local files — no download needed)
- Extracts audio track via
ffmpeg(WAV 16kHz mono) - Detects voice segments using Silero VAD
- Transcribes speech using Whisper (tiny/base, runs locally)
- Groups quotes by UX journey stage using MiniMax LLM API (one request per video)
- Saves output: one
.txtfile per video
User journey stages:
| Code | Stage | Includes |
|---|---|---|
| A | Search & Selection | Search bar, catalog, filters, sorting, listing |
| B | Exploration | Product card view |
| C | Checkout | Pickup point, date/time, payment method |
- Python 3.10+
- ffmpeg installed and available in
PATH
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg
# Verify
ffmpeg -versionThe openai-whisper package registers a whisper console command on install. The application uses the Python API (import whisper), but the CLI is handy for manually checking transcription of individual files:
# Manually transcribe a file (for debugging)
whisper audio.wav --model base --language Russian
# Verify installation
whisper --helpThe
whispercommand appears automatically afterpip install openai-whisper— no separate installation needed.
- CUDA-compatible GPU — speeds up Whisper; without GPU it runs on CPU (slower)
# 1. Clone the repository
git clone <repo_url>
cd uxvt
# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate # Linux / macOS
# .venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -e ".[dev]"
# 4. Verify tools
ffmpeg -version
whisper --helpcp .env.example .envOpen .env and fill in:
# Required
MINIMAX_API_KEY=your_minimax_api_key_here
# Optional (defaults are fine for most cases)
MINIMAX_API_URL=https://api.minimax.chat/v1/text/chatcompletion_v2
MINIMAX_MODEL=MiniMax-Text-01
MINIMAX_MAX_RETRIES=5
MINIMAX_RETRY_BASE_DELAY_SEC=1.0
MINIMAX_RETRY_MAX_DELAY_SEC=60.0
MINIMAX_MAX_PROMPT_CHARS=8000
MINIMAX_BATCH_SIZE=50
WHISPER_MODEL=base
OUTPUT_DIR=./output
TMP_DIR=./tmp
LOG_LEVEL=INFO
VERBOSE=true
⚠️ .envis listed in.gitignoreand will never be committed.
| Variable | Description | Default |
|---|---|---|
MINIMAX_API_KEY |
MiniMax API key — required | — |
MINIMAX_MAX_RETRIES |
Max retry attempts for MiniMax API | 5 |
MINIMAX_RETRY_BASE_DELAY_SEC |
Initial retry delay in seconds (Fibonacci backoff) | 1.0 |
MINIMAX_RETRY_MAX_DELAY_SEC |
Maximum retry delay in seconds | 60.0 |
MINIMAX_MAX_PROMPT_CHARS |
Prompt character limit (batching kicks in if exceeded) | 8000 |
WHISPER_MODEL |
Whisper model: tiny (faster) / base (more accurate) |
base |
OUTPUT_DIR |
Output folder for .txt report files |
./output |
TMP_DIR |
Temp folder for intermediate files (auto-deleted) | ./tmp |
LOG_LEVEL |
Structured log level | INFO |
VERBOSE |
Show progress output in console | true |
Create a plain text file (e.g. links.txt) — one entry per line.
Each entry is either a cloud.mail.ru URL or a local path to a video file:
# Lines starting with # are comments and are ignored
# cloud.mail.ru URLs
https://cloud.mail.ru/public/1DKe/UZ3SBsysM
https://cloud.mail.ru/public/Q4zk/Q1p7QNij1
# Local files
/home/user/videos/5_galina_zoozavr.MP4
./videos/session_03.mp4
# Empty lines are also ignored
python -m uxvt links.txtpython -m uxvt links.txt --output-dir ./reportsWHISPER_MODEL=tiny python -m uxvt links.txtVERBOSE=false python -m uxvt links.txt[1/2] 🔗 https://cloud.mail.ru/public/1DKe/UZ3SBsysM
[1/2] ⬇ Downloading... (562 MB)
[1/2] ✓ Downloaded → tmp/5-galina-wildberries.mp4
[1/2] 🎵 Extracting audio (ffmpeg)...
[1/2] ✓ Audio ready → tmp/5-galina-wildberries.wav
[1/2] 🔊 Voice detection (Silero VAD)...
[1/2] ✓ Voice segments found: 12
[1/2] 📝 Transcription (Whisper base)...
[1/2] ✓ Quotes recognized: 8
[1/2] 🤖 Classification by UJ stage (MiniMax)...
[1/2] ✓ Classification done (A:3, B:2, C:3)
[1/2] 💾 Saved → output/report_5-galina-wildberries.txt
[1/2] 🗑 Temp files deleted
[2/2] 📁 /home/user/videos/5_galina_zoozavr.MP4 ← local file
[2/2] 🎵 Extracting audio (ffmpeg)...
...
════════════════════════════════════════
✅ Processed: 2 ❌ Errors: 0 ⏭ Skipped: 0
Results saved to: ./output/
For each video, a output/report_{slug}.txt file is created:
Video: 5 Galina - Wildberries
Source: https://cloud.mail.ru/public/1DKe/UZ3SBsysM
Processed: 2026-03-15
============================================================
STAGE A: Search & Selection
============================================================
[00:01:12] "Looking for cat food, let me hit search"
[00:02:45] "The filters don't really work well"
============================================================
STAGE B: Product Exploration
============================================================
no comments for this stage
============================================================
STAGE C: Checkout
============================================================
[00:08:03] "I'll pick it up at the pickup point"
============================================================
TOTAL QUOTES: 3
If no speech is detected in the video:
Video: ...
...
no voice comments detected
If the MiniMax API is unavailable — quotes are saved to the report without stage classification, with a warning.
pytest -qpytest tests/unit/ -vpytest tests/integration/ -vpytest tests/unit/test_classifier.py -v
pytest tests/unit/test_downloader.py -v -k "slug"ruff format . && ruff check . && mypy . && pytest -qIntegration tests use real video files from the videos/ folder:
videos/
├── 5_galina_wildberries.MP4
└── 5_galina_zoozavr.MP4
The
videos/folder is not committed (.gitignore). Place files there manually before running integration tests.
If a file is missing — the test is automatically skipped (pytest.mark.skip) and the rest of the suite continues.
Unit tests do not require real videos — all external dependencies (Whisper, Silero VAD, httpx, MiniMax API) are mocked.
uxvt/
├── src/
│ └── uxvt/
│ ├── __main__.py # entry point: python -m uxvt
│ ├── cli.py # CLI argument parsing
│ ├── settings.py # configuration via pydantic-settings
│ ├── pipeline.py # processing orchestrator
│ ├── downloader.py # cloud.mail.ru API downloader
│ ├── audio_extractor.py # ffmpeg wrapper
│ ├── vad.py # Silero VAD wrapper
│ ├── transcriber.py # Whisper wrapper
│ ├── classifier.py # MiniMax API + Fibonacci retry
│ ├── reporter.py # report_*.txt formatting and writing
│ ├── tmp_manager.py # temp file cleanup
│ ├── progress.py # verbose console progress output
│ ├── models.py # dataclasses: VideoSourceResult, ClassifiedReport, etc.
│ └── exceptions.py # domain exceptions
├── tests/
│ ├── conftest.py
│ ├── unit/
│ │ ├── test_downloader.py
│ │ ├── test_audio_extractor.py
│ │ ├── test_vad.py
│ │ ├── test_transcriber.py
│ │ ├── test_classifier.py
│ │ ├── test_reporter.py
│ │ ├── test_tmp_manager.py
│ │ ├── test_progress.py
│ │ └── test_models.py
│ └── integration/
│ ├── conftest.py # video_file_path fixture → videos/
│ └── test_pipeline.py
├── videos/ # test videos (not committed)
├── output/ # results (not committed)
├── tmp/ # temp files (not committed)
├── pyproject.toml
├── .env.example
├── .gitignore
└── README.md
ffmpeg: command not found
→ Install ffmpeg and make sure it's in PATH (see Requirements).
ValidationError: MINIMAX_API_KEY missing
→ Create .env from .env.example and set your API key.
Test skipped: video file not found in videos/
→ Place video files in the videos/ folder (not committed to the repository).
Whisper is slow
→ Switch to WHISPER_MODEL=tiny or use a machine with a GPU.
→ To manually check transcription of a single file: whisper audio.wav --model tiny --language Russian
whisper: command not found
→ Install the package: pip install openai-whisper. The whisper command is registered automatically alongside the Python library.
MiniMax API: LLMClassificationError after all retries
→ Quotes will be saved to the report without stage classification. Check your API key and service availability.