Video Describer



We rode from Warsaw to Muscat, Oman, through Turkey, Iraq, Kuwait, Saudi Arabia, and the UAE, and came back with 1 TB of footage across two cameras. Our editor needed to start cutting, but figuring out what was even on each clip was going to take days of scrubbing before any real work could begin.

video-describer points an AI provider at a folder of recordings and comes back with timestamped descriptions of what's happening in each file. Who's there, what they're doing, where they are, what the light looks like. Enough for an editor to know which clips are worth opening before they open them.

It works on common camera and phone video formats backed by ffmpeg, including GoPro, Insta360, and iPhone .mov clips. Whisper transcription is optional: useful when there's actual dialogue you'd want to find later.


📄 What the output looks like

VID_20250829_173904 - departure day, Filip and Jadzia pack the motorcycle outside the building
00:15  Filip buckles the roll bags, checks the straps twice
02:30  Jadzia looks at the map on her phone, points at something
08:12  they pull out onto the street, morning light, long shadows

One .txt per file, next to the original. New outputs use the source filename plus .txt, for example video.mp4.txt, so video.mp4 and video.jpg cannot collide. Your editor can grep it, read it, feed it into their own workflow; it's just text.
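The naming rule can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual output_paths.py; both function names are hypothetical, and the stem-based variant is included only to show the collision the new scheme avoids.

```python
from pathlib import Path


def output_path_for(source: Path) -> Path:
    """New naming: full source filename plus '.txt' (video.mp4 -> video.mp4.txt)."""
    return source.with_name(source.name + ".txt")


def stem_based_path_for(source: Path) -> Path:
    """Stem-based naming, for contrast: the extension is replaced (video.mp4 -> video.txt)."""
    return source.with_suffix(".txt")
```

With the stem-based scheme, video.mp4 and video.jpg both map to video.txt; with the filename-plus-.txt scheme they map to two different files.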

New .txt files also end with a small metadata footer:

---
source: video.mp4
uuid: d1e2f3a4-...
batch: a3f8b2c1-...
processed: 2026-05-26T12:34:00+00:00
model: claude-sonnet-4-6
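If you want to read that footer from your own scripts, a small parser is enough. The sketch below assumes the footer is everything after the last line that is exactly `---` and that each entry is a single `key: value` line; the function name `parse_footer` is hypothetical and not part of the project.

```python
def parse_footer(text: str) -> dict[str, str]:
    """Parse 'key: value' lines that follow the last '---' separator line."""
    lines = text.splitlines()
    # Index of the last separator line, or None when the file has no footer.
    sep = max((i for i, line in enumerate(lines) if line.strip() == "---"), default=None)
    if sep is None:
        return {}
    meta = {}
    for line in lines[sep + 1:]:
        # Split on the first colon only, so ISO timestamps stay intact.
        key, found, value = line.partition(":")
        if found and key.strip():
            meta[key.strip()] = value.strip()
    return meta
```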

Older video.txt outputs from previous versions are still treated as valid legacy results. To update an existing folder without re-processing the media, run:

python3 describe_videos.py /path/to/folder --retrofit-existing

This renames unambiguous legacy files such as video.txt to video.mp4.txt and adds the metadata footer. It does not call any AI provider and does not require an API key. Use --dry-run first if you want to see counters without writing changes.
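To illustrate what "unambiguous" could mean here, the sketch below plans renames without touching the disk, much like a dry run. It assumes a legacy file is unambiguous when exactly one non-.txt file in the folder shares its stem and the new name is still free; the real rules in retrofit_outputs.py may differ, and `plan_retrofit` is a hypothetical name.

```python
from pathlib import Path


def plan_retrofit(folder: Path) -> dict[Path, Path]:
    """Map each unambiguous legacy 'stem.txt' to its new 'name.ext.txt' path."""
    by_stem: dict[str, list[Path]] = {}
    for p in folder.iterdir():
        # Every non-.txt file is a potential source for a legacy description.
        if p.is_file() and p.suffix.lower() != ".txt":
            by_stem.setdefault(p.stem, []).append(p)

    plan: dict[Path, Path] = {}
    for stem, sources in by_stem.items():
        legacy = folder / f"{stem}.txt"
        target = folder / f"{sources[0].name}.txt"
        # Skip when two sources share a stem or the new name already exists.
        if len(sources) == 1 and legacy.is_file() and not target.exists():
            plan[legacy] = target
    return plan
```

Nothing is renamed by this function; a caller would apply the plan with `legacy.rename(target)` once the result looks right.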


βš™οΈ Requirements


πŸš€ Quick start

git clone https://github.com/solarssk/video-describer.git
cd video-describer
./app.sh

app.sh creates a virtual environment, installs dependencies, and starts the app, all in one step. On subsequent runs it only reinstalls if requirements.txt has changed.

Open http://localhost:5555. Go to Connectors, paste your API key. Point it at a folder. That's it.

The key is stored locally in config.json; it never leaves your machine.


πŸŽ™οΈ Speech transcription (optional)

If the footage has dialogue worth capturing, install a Whisper backend:

# Apple Silicon: runs on the GPU via MLX, fast
pip3 install mlx-whisper

# Intel Mac: CPU only
pip3 install faster-whisper

You can run it alongside image analysis or as a standalone transcript. If neither is installed, image analysis still works fine.


🎬 NLE export (optional)

After processing, the app can write marker sidecar files next to each .txt:

| Format | File | Works with |
| --- | --- | --- |
| FCPXML | video.mp4.fcpxml | Final Cut Pro |
| EDL | video.mp4.edl | DaVinci Resolve |
| FCP7 XML | video.mp4.xml | Adobe Premiere |

Key moments marked with ★ in the description become named markers on the timeline. Enable formats in Settings → NLE Export.
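As a rough sketch of how starred lines could be turned into marker data, the snippet below scans a description for timestamped lines containing ★. It assumes lines start with MM:SS or H:MM:SS followed by whitespace, as in the sample output; where exactly the ★ sits on a line and the name `key_moments` are assumptions, and nle_export.py may work differently.

```python
import re

# Matches "MM:SS  text" or "H:MM:SS  text" at the start of a line.
TIMESTAMP_LINE = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.*)$")


def key_moments(description: str) -> list[tuple[int, str]]:
    """Return (offset_in_seconds, marker_name) for every starred line."""
    markers = []
    for line in description.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if not match or "★" not in match.group(4):
            continue
        hours, minutes, seconds, text = match.groups()
        offset = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        # The star is a flag, not part of the marker name.
        markers.append((offset, text.replace("★", "").strip()))
    return markers
```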

Already processed a batch and want to add markers now? Use Convert existing: it reads your .txt files and writes the sidecars at zero API cost. If sidecars already exist (e.g. from an older version with a wrong EDL format), enable Overwrite existing sidecars. Optionally turn on Backup before overwrite to rename the old files to .bak first.

Importing EDL markers into DaVinci Resolve:

  1. Import your clip into the Media Pool.
  2. Create a timeline and set its start timecode to 00:00:00:00.
  3. Right-click the timeline in the Media Pool.
  4. Choose Timelines β†’ Import β†’ Timeline Markers from EDL.
  5. Select the .edl file next to your .txt.
  6. Check the Edit Index; markers should appear at the correct positions.

The timeline start timecode must match the EDL (00:00:00:00). If the timeline starts at 01:00:00:00, markers will be offset or fall outside the clip.
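The offset is easy to see once a timestamp is written out as a timecode. The helper below is a minimal sketch assuming non-drop-frame timecode and 25 fps; both the frame rate and the function name are illustrative, not taken from the project.

```python
def seconds_to_timecode(seconds: float, fps: int = 25, start_hours: int = 0) -> str:
    """Format an offset as HH:MM:SS:FF, shifted by the timeline start hour."""
    total_frames = round(seconds * fps) + start_hours * 3600 * fps
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{frames:02d}"
```

A marker at 08:12 sits at 00:08:12:00 on a timeline that starts at 00:00:00:00, but the same frame is 01:08:12:00 on a timeline that starts at 01:00:00:00, so an EDL written against a zero start no longer lines up.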


🔔 Notifications (optional)

Long batches run in the background. Three ways to know when they're done:

  • Browser notification: the browser asks for permission once, then pops a native notification when the batch finishes. Clicking it focuses the tab. Works in Chrome, Firefox, Safari.
  • macOS notification: native system notification with filename, cost, and duration.
  • Webhook: POST to any URL: Slack, Discord, Make.com, or your own endpoint. Discord embed format is supported automatically.

Configure all three in Settings → Notifications.
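If you are writing your own endpoint for the webhook, it helps to see what a JSON POST looks like from the sending side. The sketch below builds such a request with the standard library only. The payload shape (a single `text` field, which Slack incoming webhooks accept) and the function name are assumptions; the app's actual payload may differ.

```python
import json
import urllib.request


def build_webhook_request(url: str, filename: str, cost_usd: float,
                          duration_s: float) -> urllib.request.Request:
    """Build (but do not send) a JSON POST announcing a finished batch."""
    # Hypothetical payload; adapt the fields to whatever your endpoint expects.
    payload = {"text": f"Batch finished: {filename}, ${cost_usd:.2f}, {duration_s:.0f} s"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then one call: `urllib.request.urlopen(request, timeout=10)`.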


💻 CLI

If you prefer the terminal to the browser:

export ANTHROPIC_API_KEY='sk-ant-...'

python3 describe_videos.py /Volumes/GoPro/DCIM/

# with transcription
python3 describe_videos.py . --transcribe --whisper-model medium

# with context
python3 describe_videos.py . \
  --people "Filip, Jadzia" \
  --context "motorcycle trip, Poland to Oman"

🔍 How it works

  1. ffmpeg extracts one frame every N seconds (default: 5s, configurable)
  2. Frames are sent to the AI provider with a system prompt that tells it who the people are and what the trip is about
  3. The AI returns a timestamped description
  4. The description is saved as a .txt next to the original file

For Insta360 .insv files, it detects both lenses and analyzes them separately.
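Step 1 can be sketched as a plain ffmpeg command built from Python. This is an illustration under assumptions, not the project's code: it uses ffmpeg's `fps` and `scale` filters with the default 5 s interval and the 640 px width mentioned in this README, and the function names, JPEG quality, and output filename pattern are made up for the example.

```python
def ffmpeg_frame_command(video: str, out_dir: str, interval_s: int = 5,
                         width_px: int = 640) -> list[str]:
    """Build an ffmpeg command that writes one scaled JPEG every interval_s seconds."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", video,
        # fps=1/N keeps one frame per N seconds; -2 keeps the aspect ratio
        # while rounding the height to an even number.
        "-vf", f"fps=1/{interval_s},scale={width_px}:-2",
        "-q:v", "3",
        f"{out_dir}/frame_%05d.jpg",
    ]


def frame_label(index: int, interval_s: int = 5) -> str:
    """Approximate MM:SS label for the zero-based frame index."""
    mm, ss = divmod(index * interval_s, 60)
    return f"{mm:02d}:{ss:02d}"
```

The command list can be passed to `subprocess.run(cmd, check=True)`; the labels are what lets a description line point back to a position in the clip.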


📁 Supported files

The app scans the selected folder non-recursively and processes files with these extensions:

| Type | Extensions | Notes |
| --- | --- | --- |
| Video | .mp4, .mov, .avi, .mkv, .mts, .m2ts, .insv | Includes typical iPhone .mov clips. Actual codec support depends on your local ffmpeg. |
| Photos | .jpg, .jpeg, .png | iPhone .heic / .heif photos are not currently included. |

Unsupported files are ignored when scanning a folder, and a directly selected unsupported file is shown as unsupported in the UI.
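A non-recursive scan with an extension filter is short enough to show in full. The sketch below uses the extensions listed above; matching extensions case-insensitively and returning the files sorted are assumptions, as is the name `scan_folder`.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".mts", ".m2ts", ".insv"}
PHOTO_EXTS = {".jpg", ".jpeg", ".png"}


def scan_folder(folder: Path) -> list[Path]:
    """Return supported files directly inside folder, ignoring subfolders."""
    supported = VIDEO_EXTS | PHOTO_EXTS
    return sorted(
        p for p in folder.iterdir()  # iterdir() does not descend into subfolders
        if p.is_file() and p.suffix.lower() in supported
    )
```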


✨ Features

AI & analysis

  • 🤖 Claude, OpenAI GPT-4o, Google Gemini: switch providers in Settings; each has its own model and pricing config
  • 🖼️ Image analysis: frames extracted by ffmpeg, described by AI with timestamps
  • 🎙️ Speech transcription: optional Whisper integration; mlx-whisper on Apple Silicon, faster-whisper on Intel
  • 🌡️ Thermal protection: Whisper steps down to a lighter model automatically during long batches if the Mac overheats

Batch & workflow

  • 💾 Batch resume: if the batch stops (crash, power loss, Stop button), the app stores a manifest in batch_state.json with one UUID and status per file; on next launch it offers to resume where it stopped, for example from file 7/15 with $0.43 already spent
  • 💰 Budget guard: set a USD cap before starting; the batch stops gracefully before it would exceed it
  • ✅ File selection: deselect individual files from the list before starting
  • 📝 Folder summary: after each batch, _summary.txt is written: one line per file with a short description, plus totals
  • 🔄 Convert existing: generate NLE sidecars from already-processed .txt files, no AI calls, no API cost
  • 🔁 Existing output retrofit: upgrade old stem.txt naming to name.ext.txt and add metadata footers without re-processing
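Batch resume and the budget guard both come down to a small amount of state on disk. The sketch below shows one way such a manifest could work. The JSON layout, field names, and function names are all hypothetical; only the file name batch_state.json and the idea of one UUID and status per file come from this README.

```python
import json
import uuid
from pathlib import Path
from typing import Optional


def new_manifest(files: list[str]) -> dict:
    """Create a manifest with one UUID and a 'pending' status per file."""
    return {
        "batch": str(uuid.uuid4()),
        "spent_usd": 0.0,
        "files": [{"uuid": str(uuid.uuid4()), "name": name, "status": "pending"}
                  for name in files],
    }


def save_manifest(manifest: dict, path: Path) -> None:
    """Write via a temp file and rename, so a crash cannot leave a half-written manifest."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    tmp.replace(path)


def next_pending(manifest: dict) -> Optional[dict]:
    """Return the first file that still needs processing, or None when done."""
    return next((f for f in manifest["files"] if f["status"] == "pending"), None)


def within_budget(manifest: dict, estimated_cost: float, cap_usd: float) -> bool:
    """True if processing the next file would keep the batch at or under the cap."""
    return manifest["spent_usd"] + estimated_cost <= cap_usd
```

A batch loop would call `within_budget` before each file, update the file's status and `spent_usd` afterwards, and save the manifest after every step so a restart can continue from `next_pending`.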

Export

  • 🎬 NLE export: FCPXML (Final Cut Pro), EDL (DaVinci Resolve), FCP7 XML (Premiere); ★ key moments become timeline markers

Notifications

  • 🔔 Browser notification: Web Notifications API; pops when the batch finishes, clicking it focuses the tab
  • 🍎 macOS notification: native system popup with filename, cost, duration
  • 🔗 Webhook: POST to Slack, Discord, Make.com, or any HTTP endpoint

UI & observability

  • 📊 Live cost tracking: token count and running USD cost in the header
  • 🔍 Pre-flight check: verifies the API key and ffmpeg before doing any heavy work
  • 📋 Log file: everything written to the UI is also appended to logs/debug.log (daily rotation, 30 days, gitignored)
  • 🌐 PL / EN UI: language dropdown with flag emojis, independent of the output language
  • ⚙️ Settings tab: model, pricing, frame interval, and system prompt are editable in the UI without touching files

💵 Cost

With claude-sonnet-4-6 at default settings (up to 100 frames per video, 640 px wide):

roughly $0.15–0.25 per 30-minute recording

Live token count and running cost are shown in the header while processing.
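The frame cap is what keeps long recordings cheap: at one frame every 5 s, a 30-minute clip would yield 360 frames, well above the 100-frame limit. The sketch below works through that arithmetic and shows a generic per-token cost formula. How the app picks which frames to keep once the cap applies is not stated here, and the prices are parameters you supply yourself; none are taken from this README.

```python
def frames_for(duration_s: float, interval_s: int = 5, max_frames: int = 100) -> int:
    """Number of frames sent for one video: one per interval, capped at max_frames."""
    return min(max_frames, int(duration_s // interval_s))


def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Cost in USD given token counts and caller-supplied prices per million tokens."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000
```

For example, `frames_for(30 * 60)` hits the cap of 100, while a 2-minute clip gives `frames_for(120)`, which is 24.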


🛠️ Configuration

Everything is in the Settings tab. For direct edits, see config.json (created from config.default.json on first launch):

| Field | Default | What it does |
| --- | --- | --- |
| ai.provider | anthropic | Active AI provider (anthropic, openai, gemini) |
| ai.anthropic.model | claude-sonnet-4-6 | Claude model |
| frames.video_width_px | 640 | Smaller = cheaper, lower detail |
| frames.max_per_video | 100 | Cap per file |
| defaults.output_language | pl | Language for output scaffolding; independent from UI language |
| defaults.people | - | Pre-filled people list |
| defaults.context | - | Pre-filled trip context |
| whisper.default_model | medium | Starting Whisper model |
| notifications.browser_notify | false | Browser Web Notification on batch done |
| notifications.macos_notify | false | macOS system notification on batch done |
| notifications.webhook_url | - | Webhook POST URL |

The system prompt lives in prompts/system.md. Change it to change the output language, tone, or format. PL and EN presets are available in Settings. The UI language toggle does not change output language.
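For scripts that read the same settings, the first-launch behaviour and the dotted field names above can be mimicked as follows. This is a sketch, not config_loader.py: it assumes the dotted names map to nested JSON objects, and `load_config` and `get_setting` are hypothetical helpers.

```python
import json
import shutil
from pathlib import Path


def load_config(config_path: Path, default_path: Path) -> dict:
    """Load config.json, creating it from the defaults file if it does not exist yet."""
    if not config_path.exists():
        shutil.copyfile(default_path, config_path)
    return json.loads(config_path.read_text(encoding="utf-8"))


def get_setting(config: dict, dotted_key: str, default=None):
    """Look up a nested value by a dotted name such as 'frames.video_width_px'."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```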


🗂️ Project structure

video-describer/
├── app.sh                   # one-command launcher: creates venv, installs deps, starts app
├── web_app.py               # Waitress/Flask app, HTTP endpoints, SSE
├── processor.py             # batch loop, resume state, cost/log plumbing
├── batch_metadata.py        # batch manifest + .txt metadata helpers
├── describe_videos.py       # media/frame/transcription helpers + CLI
├── output_paths.py          # new/legacy output path handling
├── retrofit_outputs.py      # safe upgrade path for existing .txt outputs
├── timefmt.py               # timestamp formatting
├── nle_export.py            # FCPXML / EDL / FCP7 XML sidecar export
├── config_loader.py
├── config.default.json      # factory settings (in git)
├── config.json              # your settings + API key (gitignored)
├── providers/
│   ├── base.py
│   ├── anthropic_provider.py
│   ├── openai_provider.py
│   └── gemini_provider.py
├── prompts/
│   ├── system.pl.default.md
│   ├── system.en.default.md
│   └── system.md            # your prompt (gitignored)
├── templates/index.html
├── static/
│   ├── style.css
│   ├── app.js
│   ├── icons/               # favicon + notification icon
│   └── i18n/pl.json, en.json
└── tools/
    └── macos_path_picker.swift  # native folder/file picker (compiled on first use)

🔌 Adding a provider

Implement AIProvider from providers/base.py, which has two methods: verify() and describe(). Register it in providers/__init__.py and add a config block under ai.<name> in config.default.json.
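The shape of such a provider might look like the sketch below. Only the class name AIProvider and the method names verify() and describe() come from this README; the method signatures, the stand-in base class, the EchoProvider example, and the PROVIDERS registry are guesses made for illustration, so check providers/base.py for the real interface.

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Stand-in for the base class in providers/base.py (signatures are assumed)."""

    @abstractmethod
    def verify(self) -> bool:
        """Check that the provider is usable, e.g. that the API key works."""

    @abstractmethod
    def describe(self, frames: list[bytes], system_prompt: str) -> str:
        """Return a timestamped description for the given frames."""


class EchoProvider(AIProvider):
    """Toy provider that calls no API; useful for wiring tests."""

    def verify(self) -> bool:
        return True

    def describe(self, frames: list[bytes], system_prompt: str) -> str:
        return f"{len(frames)} frame(s) received"


# Hypothetical registry, keyed by the name used under ai.<name> in the config.
PROVIDERS = {"echo": EchoProvider}
```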


🌍 Origin

Desert Horizons 2025: Warsaw to Muscat, Oman, through Turkey, Iraq, Kuwait, Saudi Arabia, and the UAE. 11,000+ km on a BMW R1250GS, two cameras, about 1 TB of raw footage.

Miłosz, who does post-production for our YouTube channel, needed to start cutting. But before any editing, someone had to figure out what was on each clip. That was going to take days.

Editors are already using AI in their workflows. This is the part before that: giving them a map of the material before they open a single file.


💡 Tips

External disk ejection: when the batch writes .txt files to an external volume, macOS Spotlight indexes them automatically, which can delay ejecting the disk for a few seconds after processing ends. If that's annoying, disable Spotlight indexing for the volume:

sudo mdutil -i off /Volumes/your-disk

Or add the volume to the exclusion list in System Settings → Siri & Spotlight → Spotlight Privacy.


📄 License

MIT
