We rode from Warsaw to Muscat, Oman (through Turkey, Iraq, Kuwait, Saudi Arabia, and the UAE) and came back with 1 TB of footage across two cameras. Our editor needed to start cutting, but figuring out what was even on each clip was going to take days of scrubbing before any real work could begin.
video-describer points an AI provider at a folder of recordings and comes back with timestamped descriptions of what's happening in each file. Who's there, what they're doing, where they are, what the light looks like. Enough for an editor to know which clips are worth opening before they open them.
It works on common camera and phone video formats backed by ffmpeg, including GoPro, Insta360, and iPhone .mov clips. Whisper transcription is optional; it is useful when there's actual dialogue you'd want to find later.
VID_20250829_173904 - departure day, Filip and Jadzia pack the motorcycle outside the building
00:15 Filip buckles the roll bags, checks the straps twice
02:30 Jadzia looks at the map on her phone, points at something
08:12 they pull out onto the street, morning light, long shadows
One .txt per file, next to the original. New outputs use the source filename plus .txt, for example video.mp4.txt, so the outputs for video.mp4 and video.jpg cannot collide. Your editor can grep it, read it, or feed it into their own workflow; it's just text.
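The naming rule is small enough to sketch. This is an illustration only, not the project's code; `output_path_for` is a hypothetical helper name.

```python
from pathlib import Path


def output_path_for(source: Path) -> Path:
    """Return the sidecar path: the full source filename plus '.txt'."""
    # Hypothetical helper. Keeping the original extension means
    # video.mp4 and video.jpg map to different output files.
    return source.with_name(source.name + ".txt")


print(output_path_for(Path("trip/video.mp4")).as_posix())  # trip/video.mp4.txt
print(output_path_for(Path("trip/video.jpg")).as_posix())  # trip/video.jpg.txt
```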
New .txt files also end with a small metadata footer:
---
source: video.mp4
uuid: d1e2f3a4-...
batch: a3f8b2c1-...
processed: 2026-05-26T12:34:00+00:00
model: claude-sonnet-4-6
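If you want to read the footer from your own scripts, something like the following should work. It is a minimal sketch that assumes the footer is the last `---` block in the file and uses plain `key: value` lines, as in the sample above; `parse_footer` is a hypothetical name.

```python
def parse_footer(text: str) -> dict[str, str]:
    """Parse 'key: value' lines that follow the last '---' separator line."""
    _body, sep, footer = text.rpartition("\n---\n")
    if not sep:
        return {}  # no footer found, for example in a legacy file
    meta = {}
    for line in footer.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, colon, value = line.partition(":")
        if colon:
            meta[key.strip()] = value.strip()
    return meta


sample = (
    "00:15 Filip buckles the roll bags\n"
    "---\n"
    "source: video.mp4\n"
    "processed: 2026-05-26T12:34:00+00:00\n"
    "model: claude-sonnet-4-6\n"
)
print(parse_footer(sample))
```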
Older video.txt outputs from previous versions are still treated as valid legacy results. To update an existing folder without re-processing the media, run:
python3 describe_videos.py /path/to/folder --retrofit-existing
This renames unambiguous legacy files such as video.txt to video.mp4.txt and adds the metadata footer. It does not call any AI provider and does not require an API key. Use --dry-run first if you want to see counters without writing changes.
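What "unambiguous" means can be sketched roughly as follows. This is an assumption about the rule, not the actual logic in retrofit_outputs.py: a legacy stem.txt is only renamed when exactly one supported media file in the same folder shares its stem. `plan_renames` is a hypothetical name, and the extension set is the one listed in this README.

```python
from pathlib import Path

MEDIA_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".mts", ".m2ts", ".insv",
              ".jpg", ".jpeg", ".png"}


def plan_renames(folder: Path) -> list[tuple[Path, Path]]:
    """Pair each legacy stem.txt with its new name, skipping ambiguous cases."""
    plan = []
    for txt in sorted(folder.glob("*.txt")):
        matches = [p for p in folder.iterdir()
                   if p.stem == txt.stem and p.suffix.lower() in MEDIA_EXTS]
        if len(matches) == 1:  # zero or several candidates: leave the file alone
            plan.append((txt, matches[0].with_name(matches[0].name + ".txt")))
    return plan


# Dry run: only print what would be renamed in the current folder.
for old, new in plan_renames(Path(".")):
    print(f"{old.name} -> {new.name}")
```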
- macOS (Apple Silicon or Intel), Python 3.11+
- ffmpeg: brew install ffmpeg
- API key for your chosen provider (Anthropic, OpenAI, or Google Gemini)
git clone https://github.com/solarssk/video-describer.git
cd video-describer
./app.sh
app.sh creates a virtual environment, installs dependencies, and starts the app, all in one step. On subsequent runs it only reinstalls if requirements.txt has changed.
Open http://localhost:5555. Go to Connectors, paste your API key. Point it at a folder. That's it.
The key is stored locally in config.json; it never leaves your machine.
If the footage has dialogue worth capturing, install a Whisper backend:
# Apple Silicon: runs on the GPU via MLX, fast
pip3 install mlx-whisper
# Intel Mac: CPU only
pip3 install faster-whisper
You can run it alongside image analysis or as a standalone transcript. If neither is installed, image analysis still works fine.
After processing, the app can write marker sidecar files next to each .txt:
| Format | File | Works with |
|---|---|---|
| FCPXML | video.mp4.fcpxml | Final Cut Pro |
| EDL | video.mp4.edl | DaVinci Resolve |
| FCP7 XML | video.mp4.xml | Adobe Premiere |
Key moments marked in the description become named markers on the timeline. Enable formats in Settings → NLE Export.
Already processed a batch and want to add markers now? Use Convert existing: it reads your .txt files and writes the sidecars at zero API cost. If sidecars already exist (e.g. from an older version with a wrong EDL format), enable Overwrite existing sidecars. Optionally turn on Backup before overwrite to rename the old files to .bak first.
Importing EDL markers into DaVinci Resolve:
- Import your clip into the Media Pool.
- Create a timeline and set its start timecode to 00:00:00:00.
- Right-click the timeline in the Media Pool.
- Choose Timelines → Import → Timeline Markers from EDL.
- Select the .edl file next to your .txt.
- Check the Edit Index; markers should appear at the correct positions.
The timeline start timecode must match the EDL (00:00:00:00). If the timeline starts at 01:00:00:00, markers will be offset or fall outside the clip.
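To see why the start timecode matters, here is a rough sketch of how a description timestamp maps to an EDL-style timecode counted from 00:00:00:00. The frame rate and the function name are assumptions for illustration; the project's own timefmt.py may work differently.

```python
def seconds_to_timecode(seconds: float, fps: int = 25) -> str:
    """Convert seconds from clip start to non-drop-frame HH:MM:SS:FF."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{frames:02d}"


print(seconds_to_timecode(150))        # 00:02:30:00
print(seconds_to_timecode(492.5, 24))  # 00:08:12:12
```

A marker written as 00:02:30:00 only lands 2 minutes 30 seconds into the clip if the timeline itself starts counting at 00:00:00:00.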
Long batches run in the background. Three ways to know when they're done:
- Browser notification: the browser asks for permission once, then pops a native notification when the batch finishes. Clicking it focuses the tab. Works in Chrome, Firefox, Safari.
- macOS notification: native system notification with filename, cost, and duration.
- Webhook: POST to any URL, such as Slack, Discord, Make.com, or your own endpoint. Discord embed format is supported automatically.
Configure all three in Settings → Notifications.
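If you point the webhook at your own endpoint, the request is an ordinary JSON POST. The sketch below builds one with the standard library without sending it. The payload shape is an assumption (a single `text` field, which Slack incoming webhooks accept), not the app's actual format, and `build_webhook_request` is a hypothetical name.

```python
import json
import urllib.request


def build_webhook_request(url: str, files_done: int, cost_usd: float,
                          duration_s: float) -> urllib.request.Request:
    """Build (but do not send) a JSON POST announcing a finished batch."""
    # Field name "text" is an illustrative assumption.
    payload = {
        "text": f"Batch finished: {files_done} files, "
                f"${cost_usd:.2f}, {duration_s:.0f}s",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request("https://example.com/hook", 15, 0.43, 1260)
# urllib.request.urlopen(req, timeout=10) would actually send it.
print(req.get_method(), req.data.decode("utf-8"))
```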
If you prefer terminal over browser:
export ANTHROPIC_API_KEY='sk-ant-...'
python3 describe_videos.py /Volumes/GoPro/DCIM/
# with transcription
python3 describe_videos.py . --transcribe --whisper-model medium
# with context
python3 describe_videos.py . \
--people "Filip, Jadzia" \
--context "motorcycle trip, Poland to Oman"
- ffmpeg extracts one frame every N seconds (default: 5s, configurable)
- Frames are sent to the AI provider with a system prompt that tells it who the people are and what the trip is about
- The AI returns a timestamped description
- The description is saved as a .txt next to the original file
For Insta360 .insv files, it detects both lenses and analyzes them separately.
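The first step above can be approximated with a single ffmpeg call. This is a sketch under assumptions, not the exact command the tool runs: it uses ffmpeg's fps and scale filters, and the output filename pattern and quality setting are made up for illustration.

```python
def ffmpeg_frame_command(video: str, out_dir: str,
                         interval_s: int = 5, width_px: int = 640) -> list[str]:
    """Build an ffmpeg command that saves one scaled JPEG every interval_s seconds."""
    return [
        "ffmpeg", "-i", video,
        # fps=1/N keeps one frame per N seconds;
        # scale=W:-2 keeps the aspect ratio with an even height.
        "-vf", f"fps=1/{interval_s},scale={width_px}:-2",
        "-q:v", "3",                    # JPEG quality (assumed value)
        f"{out_dir}/frame_%04d.jpg",    # hypothetical naming pattern
    ]


print(" ".join(ffmpeg_frame_command("GX010042.MP4", "frames")))
```

Pass the list to subprocess.run(cmd, check=True) to actually extract frames.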
The app scans the selected folder non-recursively and processes files with these extensions:
| Type | Extensions | Notes |
|---|---|---|
| Video | .mp4, .mov, .avi, .mkv, .mts, .m2ts, .insv | Includes typical iPhone .mov clips. Actual codec support depends on your local ffmpeg. |
| Photos | .jpg, .jpeg, .png | iPhone .heic / .heif photos are not currently included. |
Unsupported files are ignored when scanning a folder, and a directly selected unsupported file is shown as unsupported in the UI.
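A non-recursive scan with this extension list might look like the sketch below. Matching extensions case-insensitively is an assumption (cameras often write upper-case names such as .MP4), and `scan_folder` is a hypothetical name.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".mts", ".m2ts", ".insv"}
PHOTO_EXTS = {".jpg", ".jpeg", ".png"}


def scan_folder(folder: Path) -> list[Path]:
    """List supported media files directly inside folder (no subfolders)."""
    supported = VIDEO_EXTS | PHOTO_EXTS
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in supported
    )


print([p.name for p in scan_folder(Path("."))])
```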
AI & analysis
- Claude, OpenAI GPT-4o, Google Gemini: switch providers in Settings; each has its own model and pricing config
- Image analysis: frames extracted by ffmpeg, described by AI with timestamps
- Speech transcription: optional Whisper integration; mlx-whisper on Apple Silicon, faster-whisper on Intel
- Thermal protection: Whisper steps down to a lighter model automatically during long batches if the Mac overheats
Batch & workflow
- Batch resume: if the batch stops (crash, power loss, Stop button), the app stores a manifest in batch_state.json with one UUID and status per file; on next launch it offers to resume, showing progress such as file 7/15 and $0.43 already spent
- Budget guard: set a USD cap before starting; the batch stops gracefully before it would exceed it
- File selection: deselect individual files from the list before starting
- Folder summary: after each batch, _summary.txt is written with one line per file (a short description) plus totals
- Convert existing: generate NLE sidecars from already-processed .txt files, no AI calls, no API cost
- Existing output retrofit: upgrade old stem.txt naming to name.ext.txt and add metadata footers without re-processing
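Batch resume and the budget guard both come down to small checks. The sketch below is illustrative only: the manifest layout (a `files` list with `name`, `uuid`, and `status` keys) is an assumption about batch_state.json, and both function names are hypothetical.

```python
import json
from pathlib import Path


def pending_files(manifest_path: Path) -> list[str]:
    """Return names of files that are not marked done in a saved manifest."""
    # Assumed shape: {"files": [{"uuid": "...", "name": "...", "status": "done"}]}
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return [f["name"] for f in manifest["files"] if f["status"] != "done"]


def within_budget(spent_usd: float, next_estimate_usd: float, cap_usd: float) -> bool:
    """True if processing the next file keeps the total at or under the cap."""
    return spent_usd + next_estimate_usd <= cap_usd


# With $0.43 spent, a file estimated at $0.20 would break a $0.60 cap.
print(within_budget(0.43, 0.20, 0.60))  # False
```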
Export
- NLE export: FCPXML (Final Cut Pro), EDL (DaVinci Resolve), FCP7 XML (Premiere); marked key moments become timeline markers
Notifications
- Browser notification: Web Notifications API; pops when the batch finishes, click focuses the tab
- macOS notification: native system popup with filename, cost, duration
- Webhook: POST to Slack, Discord, Make.com, or any HTTP endpoint
UI & observability
- Live cost tracking: token count and running USD cost in the header
- Pre-flight check: verifies the API key and ffmpeg before doing any heavy work
- Log file: everything written to the UI is also appended to logs/debug.log (daily rotation, 30 days, gitignored)
- PL / EN UI: language dropdown with flag emojis, independent from output language
- Settings tab: model, pricing, frame interval, and system prompt are editable in the UI without touching files
With claude-sonnet-4-6 at default settings (up to 100 frames per video, 640 px wide):
roughly $0.15 to $0.25 per 30-minute recording
Live token count and running cost are shown in the header while processing.
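For a back-of-the-envelope estimate of your own, token-based cost is simple arithmetic. Every number in the example call is a placeholder, not this project's pricing config or any provider's real price list, and `estimate_cost_usd` is a hypothetical name; it is not meant to reproduce the figure above.

```python
def estimate_cost_usd(frames: int, tokens_per_frame: int, output_tokens: int,
                      usd_per_mtok_in: float, usd_per_mtok_out: float,
                      prompt_tokens: int = 0) -> float:
    """Estimate the cost of one file from token counts and per-million-token prices."""
    input_tokens = prompt_tokens + frames * tokens_per_frame
    return (input_tokens * usd_per_mtok_in
            + output_tokens * usd_per_mtok_out) / 1_000_000


# Placeholder values: 100 frames, 300 tokens each, 3000 output tokens.
print(f"${estimate_cost_usd(100, 300, 3000, 3.0, 15.0):.3f}")
```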
Everything is in the Settings tab. For direct edits, see config.json (created from config.default.json on first launch):
| Field | Default | What it does |
|---|---|---|
| ai.provider | anthropic | Active AI provider (anthropic, openai, gemini) |
| ai.anthropic.model | claude-sonnet-4-6 | Claude model |
| frames.video_width_px | 640 | Frame width in pixels; smaller = cheaper, lower detail |
| frames.max_per_video | 100 | Maximum number of frames per file |
| defaults.output_language | pl | Language for output scaffolding; independent from UI language |
| defaults.people | - | Pre-filled people list |
| defaults.context | - | Pre-filled trip context |
| whisper.default_model | medium | Starting Whisper model |
| notifications.browser_notify | false | Browser Web Notification on batch done |
| notifications.macos_notify | false | macOS system notification on batch done |
| notifications.webhook_url | - | Webhook POST URL |
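If you read these settings from your own scripts, a sketch like this may help. It assumes the dotted field names in the table correspond to nested JSON objects, which is a guess about the file layout; `load_config` and `get_setting` are hypothetical helpers, not the API of config_loader.py.

```python
import json
from pathlib import Path


def load_config(folder: Path) -> dict:
    """Load config.json, falling back to config.default.json if it is missing."""
    user = folder / "config.json"
    path = user if user.exists() else folder / "config.default.json"
    return json.loads(path.read_text(encoding="utf-8"))


def get_setting(config: dict, dotted_key: str, default=None):
    """Look up a nested value with a dotted key such as 'frames.max_per_video'."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


example = {"ai": {"provider": "anthropic"}, "frames": {"max_per_video": 100}}
print(get_setting(example, "frames.max_per_video"))         # 100
print(get_setting(example, "notifications.webhook_url", ""))
```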
The system prompt lives in prompts/system.md. Change it to change the output language, tone, or format. PL and EN presets are available in Settings. The UI language toggle does not change output language.
video-describer/
├── app.sh                   # one-command launcher: creates venv, installs deps, starts app
├── web_app.py               # Waitress/Flask app, HTTP endpoints, SSE
├── processor.py             # batch loop, resume state, cost/log plumbing
├── batch_metadata.py        # batch manifest + .txt metadata helpers
├── describe_videos.py       # media/frame/transcription helpers + CLI
├── output_paths.py          # new/legacy output path handling
├── retrofit_outputs.py      # safe upgrade path for existing .txt outputs
├── timefmt.py               # timestamp formatting
├── nle_export.py            # FCPXML / EDL / FCP7 XML sidecar export
├── config_loader.py
├── config.default.json      # factory settings (in git)
├── config.json              # your settings + API key (gitignored)
├── providers/
│   ├── base.py
│   ├── anthropic_provider.py
│   ├── openai_provider.py
│   └── gemini_provider.py
├── prompts/
│   ├── system.pl.default.md
│   ├── system.en.default.md
│   └── system.md            # your prompt (gitignored)
├── templates/index.html
├── static/
│   ├── style.css
│   ├── app.js
│   ├── icons/               # favicon + notification icon
│   └── i18n/pl.json, en.json
└── tools/
    └── macos_path_picker.swift   # native folder/file picker (compiled on first use)
Implement AIProvider from providers/base.py, which requires two methods: verify() and describe(). Register it in providers/__init__.py and add a config block under ai.<name> in config.default.json.
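A new provider could look roughly like this. The base class here is a stand-in written for the example; the real AIProvider in providers/base.py may use different method signatures, so check it before copying. `EchoProvider` is a toy class that makes no network calls.

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Stand-in for the base class in providers/base.py (signatures assumed)."""

    @abstractmethod
    def verify(self) -> bool: ...

    @abstractmethod
    def describe(self, frames: list[bytes], system_prompt: str) -> str: ...


class EchoProvider(AIProvider):
    """Toy provider that makes no network calls, handy for wiring tests."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    def verify(self) -> bool:
        # A real provider would make a cheap authenticated request here.
        return bool(self.api_key)

    def describe(self, frames: list[bytes], system_prompt: str) -> str:
        # A real provider would send the frames and prompt to its API.
        return f"00:00 received {len(frames)} frames"


provider = EchoProvider(api_key="test-key")
print(provider.verify(), provider.describe([b"jpeg-bytes"], "Describe the trip."))
```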
Desert Horizons 2025: Warsaw to Muscat, Oman, through Turkey, Iraq, Kuwait, Saudi Arabia, and the UAE. 11,000+ km on a BMW R1250GS, two cameras, about 1 TB of raw footage.
Miłosz, who does post-production for our YouTube channel, needed to start cutting. But before any editing, someone had to figure out what was on each clip. That was going to take days.
Editors are already using AI in their workflows. This is the part before that: giving them a map of the material before they open a single file.
External disk ejection: when the batch writes .txt files to an external volume, macOS Spotlight indexes them automatically, which can delay the "eject" command for a few seconds after processing ends. If that's annoying, disable Spotlight for the volume:
sudo mdutil -i off /Volumes/your-disk
Or add it via System Settings → Siri & Spotlight → Spotlight Privacy.
MIT