Skip to content

d4rk5eed/uxvt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UX-Research Video Transcriber

CLI tool for automated transcription and classification of voice comments from UX research sessions.

Downloads or reads user session recordings → extracts speech → recognizes text with timestamps → groups quotes by user journey stage using an LLM.


What it does

  1. Reads a list of sources from a file (cloud.mail.ru URLs or local video paths)
  2. Downloads videos via cloud.mail.ru public API (local files — no download needed)
  3. Extracts audio track via ffmpeg (WAV 16kHz mono)
  4. Detects voice segments using Silero VAD
  5. Transcribes speech using Whisper (tiny/base, runs locally)
  6. Groups quotes by UX journey stage using MiniMax LLM API (one request per video)
  7. Saves output: one .txt file per video

User journey stages:

Code Stage Includes
A Search & Selection Search bar, catalog, filters, sorting, listing
B Exploration Product card view
C Checkout Pickup point, date/time, payment method

Requirements

System

  • Python 3.10+
  • ffmpeg installed and available in PATH
# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Verify
ffmpeg -version

Whisper CLI

The openai-whisper package registers a whisper console command on install. The application uses the Python API (import whisper), but the CLI is handy for manually checking transcription of individual files:

# Manually transcribe a file (for debugging)
whisper audio.wav --model base --language Russian

# Verify installation
whisper --help

The whisper command appears automatically after pip install openai-whisper — no separate installation needed.

Optional

  • CUDA-compatible GPU — speeds up Whisper; without GPU it runs on CPU (slower)

Installation

# 1. Clone the repository
git clone <repo_url>
cd uxvt

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate       # Linux / macOS
# .venv\Scripts\activate        # Windows

# 3. Install dependencies
pip install -e ".[dev]"

# 4. Verify tools
ffmpeg -version
whisper --help

Configuration

1. Create .env file

cp .env.example .env

Open .env and fill in:

# Required
MINIMAX_API_KEY=your_minimax_api_key_here

# Optional (defaults are fine for most cases)
MINIMAX_API_URL=https://api.minimax.chat/v1/text/chatcompletion_v2
MINIMAX_MODEL=MiniMax-Text-01
MINIMAX_MAX_RETRIES=5
MINIMAX_RETRY_BASE_DELAY_SEC=1.0
MINIMAX_RETRY_MAX_DELAY_SEC=60.0
MINIMAX_MAX_PROMPT_CHARS=8000
MINIMAX_BATCH_SIZE=50
WHISPER_MODEL=base
OUTPUT_DIR=./output
TMP_DIR=./tmp
LOG_LEVEL=INFO
VERBOSE=true

⚠️ .env is listed in .gitignore and will never be committed.

Parameters

Variable Description Default
MINIMAX_API_KEY MiniMax API key — required
MINIMAX_MAX_RETRIES Max retry attempts for MiniMax API 5
MINIMAX_RETRY_BASE_DELAY_SEC Initial retry delay in seconds (Fibonacci backoff) 1.0
MINIMAX_RETRY_MAX_DELAY_SEC Maximum retry delay in seconds 60.0
MINIMAX_MAX_PROMPT_CHARS Prompt character limit (batching kicks in if exceeded) 8000
WHISPER_MODEL Whisper model: tiny (faster) / base (more accurate) base
OUTPUT_DIR Output folder for .txt report files ./output
TMP_DIR Temp folder for intermediate files (auto-deleted) ./tmp
LOG_LEVEL Structured log level INFO
VERBOSE Show progress output in console true

Usage

Input file format

Create a plain text file (e.g. links.txt) — one entry per line. Each entry is either a cloud.mail.ru URL or a local path to a video file:

# Lines starting with # are comments and are ignored

# cloud.mail.ru URLs
https://cloud.mail.ru/public/1DKe/UZ3SBsysM
https://cloud.mail.ru/public/Q4zk/Q1p7QNij1

# Local files
/home/user/videos/5_galina_zoozavr.MP4
./videos/session_03.mp4

# Empty lines are also ignored

Basic run

python -m uxvt links.txt

Custom output directory

python -m uxvt links.txt --output-dir ./reports

Faster Whisper model (less accurate, quicker)

WHISPER_MODEL=tiny python -m uxvt links.txt

Silent mode (no progress output)

VERBOSE=false python -m uxvt links.txt

Console output example

[1/2] 🔗 https://cloud.mail.ru/public/1DKe/UZ3SBsysM
[1/2] ⬇  Downloading... (562 MB)
[1/2] ✓  Downloaded → tmp/5-galina-wildberries.mp4
[1/2] 🎵 Extracting audio (ffmpeg)...
[1/2] ✓  Audio ready → tmp/5-galina-wildberries.wav
[1/2] 🔊 Voice detection (Silero VAD)...
[1/2] ✓  Voice segments found: 12
[1/2] 📝 Transcription (Whisper base)...
[1/2] ✓  Quotes recognized: 8
[1/2] 🤖 Classification by UJ stage (MiniMax)...
[1/2] ✓  Classification done (A:3, B:2, C:3)
[1/2] 💾 Saved → output/report_5-galina-wildberries.txt
[1/2] 🗑  Temp files deleted

[2/2] 📁 /home/user/videos/5_galina_zoozavr.MP4  ← local file
[2/2] 🎵 Extracting audio (ffmpeg)...
...

════════════════════════════════════════
✅ Processed: 2  ❌ Errors: 0  ⏭  Skipped: 0
Results saved to: ./output/

Output file format

For each video, a output/report_{slug}.txt file is created:

Video: 5 Galina - Wildberries
Source: https://cloud.mail.ru/public/1DKe/UZ3SBsysM
Processed: 2026-03-15

============================================================
STAGE A: Search & Selection
============================================================
[00:01:12] "Looking for cat food, let me hit search"
[00:02:45] "The filters don't really work well"

============================================================
STAGE B: Product Exploration
============================================================
no comments for this stage

============================================================
STAGE C: Checkout
============================================================
[00:08:03] "I'll pick it up at the pickup point"

============================================================
TOTAL QUOTES: 3

If no speech is detected in the video:

Video: ...
...

no voice comments detected

If the MiniMax API is unavailable — quotes are saved to the report without stage classification, with a warning.


Testing

Run all tests

pytest -q

Unit tests only (fast, no real videos needed)

pytest tests/unit/ -v

Integration tests only (require files in videos/)

pytest tests/integration/ -v

Specific module

pytest tests/unit/test_classifier.py -v
pytest tests/unit/test_downloader.py -v -k "slug"

Full check before commit

ruff format . && ruff check . && mypy . && pytest -q

Test videos

Integration tests use real video files from the videos/ folder:

videos/
├── 5_galina_wildberries.MP4
└── 5_galina_zoozavr.MP4

The videos/ folder is not committed (.gitignore). Place files there manually before running integration tests.

If a file is missing — the test is automatically skipped (pytest.mark.skip) and the rest of the suite continues.

Unit tests do not require real videos — all external dependencies (Whisper, Silero VAD, httpx, MiniMax API) are mocked.


Project structure

uxvt/
├── src/
│   └── uxvt/
│       ├── __main__.py        # entry point: python -m uxvt
│       ├── cli.py             # CLI argument parsing
│       ├── settings.py        # configuration via pydantic-settings
│       ├── pipeline.py        # processing orchestrator
│       ├── downloader.py      # cloud.mail.ru API downloader
│       ├── audio_extractor.py # ffmpeg wrapper
│       ├── vad.py             # Silero VAD wrapper
│       ├── transcriber.py     # Whisper wrapper
│       ├── classifier.py      # MiniMax API + Fibonacci retry
│       ├── reporter.py        # report_*.txt formatting and writing
│       ├── tmp_manager.py     # temp file cleanup
│       ├── progress.py        # verbose console progress output
│       ├── models.py          # dataclasses: VideoSourceResult, ClassifiedReport, etc.
│       └── exceptions.py      # domain exceptions
├── tests/
│   ├── conftest.py
│   ├── unit/
│   │   ├── test_downloader.py
│   │   ├── test_audio_extractor.py
│   │   ├── test_vad.py
│   │   ├── test_transcriber.py
│   │   ├── test_classifier.py
│   │   ├── test_reporter.py
│   │   ├── test_tmp_manager.py
│   │   ├── test_progress.py
│   │   └── test_models.py
│   └── integration/
│       ├── conftest.py        # video_file_path fixture → videos/
│       └── test_pipeline.py
├── videos/                    # test videos (not committed)
├── output/                    # results (not committed)
├── tmp/                       # temp files (not committed)
├── pyproject.toml
├── .env.example
├── .gitignore
└── README.md

Troubleshooting

ffmpeg: command not found → Install ffmpeg and make sure it's in PATH (see Requirements).

ValidationError: MINIMAX_API_KEY missing → Create .env from .env.example and set your API key.

Test skipped: video file not found in videos/ → Place video files in the videos/ folder (not committed to the repository).

Whisper is slow → Switch to WHISPER_MODEL=tiny or use a machine with a GPU. → To manually check transcription of a single file: whisper audio.wav --model tiny --language Russian

whisper: command not found → Install the package: pip install openai-whisper. The whisper command is registered automatically alongside the Python library.

MiniMax API: LLMClassificationError after all retries → Quotes will be saved to the report without stage classification. Check your API key and service availability.

About

ux-voice-tagger Transcribe screen recordings and classify user voice comments by UX journey stage. Whisper · Silero VAD · MiniMax LLM

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages