framedex

A queryable knowledge base for your video archive.

Turn a scattered video archive — across multiple SSDs and years — into a portable, plain-text knowledge base. Each clip gets a .description.md sidecar with GPS location + place name, a speaker-diarized multilingual transcript, an English translation (if needed), face detection, and an AI vision scene description with a keep/review/cull rating.

Sidecars live next to the videos. Originals are never modified. Local-first, non-destructive, resumable.

framedex is a Claude Code skill. It installs the vidx command-line tool.

Install

# Clone into your Claude Code skills directory
git clone git@github.com:Simbastack-hq/framedex.git ~/.claude/skills/framedex

# Install deps + pre-download the Whisper + face-detection models
python3 ~/.claude/skills/framedex/scripts/setup.py

Quick start

# 1. Get a Hugging Face token + accept pyannote terms (one-time, for diarization)
#    https://huggingface.co/pyannote/speaker-diarization-3.1   (click Agree)
#    https://huggingface.co/pyannote/segmentation-3.0          (click Agree)
#    https://huggingface.co/settings/tokens                    (create read token)
export HF_TOKEN=hf_yourTokenHere

# 2. (Optional) Set an Anthropic API key — only needed for --backend api
export ANTHROPIC_API_KEY=sk-ant-...

# 3. Add aliases to ~/.zshrc
alias vidx="python3 $HOME/.claude/skills/framedex/scripts/index_videos.py"
alias vidx-summary="python3 $HOME/.claude/skills/framedex/scripts/trip_summary.py"
alias vidx-master="python3 $HOME/.claude/skills/framedex/scripts/master_index.py"
alias vidx-query="python3 $HOME/.claude/skills/framedex/scripts/query.py"

# 4. Test on 5 clips before unleashing on a full drive
vidx /Volumes/SSD-2024 --max-files 5

# 5. Inspect the sidecars. If happy, run the full drive.
vidx /Volumes/SSD-2024

# 6. After indexing, generate folder summaries + a master index
vidx-summary /Volumes/SSD-2024
vidx-master  /Volumes/SSD-2024

Per-clip pipeline

ffprobe → metadata (duration, codec, resolution, creation date)
exiftool → GPS lat/lon/altitude
Nominatim → reverse-geocoded place name (rate-limited 1/sec, polite UA)
ffmpeg → 5 evenly-spaced JPEG frames (≤1920px wide)
ffmpeg → mono 16k WAV
WhisperX → Whisper transcribe + word-level alignment + pyannote diarization
WhisperX translate mode → English translation (non-English only)
insightface → face detection + 512-dim embeddings on the same frames
Vision model → single-call structured description (Scene/Subjects/Action/Mood/Shot type/Use cases) + keep/review/cull rating
Write [filename].description.md next to the video

What sidecars look like

---
file: IMG_4827.mov
path: /Volumes/SSD-2024/2024-08-construction/drone/IMG_4827.mov
parent_folder: drone
duration_seconds: 12.3
resolution: 3840x2160
codec: hvc1
size_bytes: 245678912
creation_time: 2024-08-14T07:23:11Z
location:
  lat: 37.7456
  lon: -119.5936
  altitude_m: 1842.5
  place: "Yosemite Valley, Mariposa County, USA"
language_detected: es
speaker_count: 2
rating: keep
indexed_at: 2026-05-17T14:32:01
---

# IMG_4827.mov

## Description

**Scene:** Wide drone aerial of a construction site at golden hour...
**Subjects:** Three workers in high-vis vests near a partially-built structure...
**Action:** Drone slowly orbits; workers carry materials between two structures.
**Mood:** Industrious, expansive, hopeful.
**Shot type:** Drone aerial, slow orbit.
**Use cases:**
- Construction milestone post
- "From the ground up" origin-story reel
- B-roll behind a voiceover

## Transcript (es, 2 speakers)

[SPEAKER_00] (00:00:01) Pon esta viga aquí primero.
[SPEAKER_01] (00:00:04) Sí, vale.
[SPEAKER_00] (00:00:07) Cuidado con el ángulo.

## English translation

Place this beam here first. Yes, OK. Careful with the angle.

Optional folder context

Drop .video-context.md at the root of any scan target to give the vision model better priors:

/Volumes/SSD-2024/.video-context.md
---
This drive contains construction-site footage, 2023-2026. Many clips
are drone aerials, crew training, and site walkthroughs. Languages mix
English and Spanish.
---

Without it, descriptions are generic.

Proper-noun biasing

A .video-context.md can also carry a line of names Whisper should spell correctly:

**Whisper proper nouns:** Yosemite, El Capitan, Half Dome, ...

These get passed to Whisper as initial_prompt + hotwords so place names and people names in speech don't come back garbled. A second regex pass (~/.framedex/whisper_fixes.json) catches anything the prompt bias misses.

Multiple SSDs

Run on each drive separately:

vidx /Volumes/SSD-2023
vidx /Volumes/SSD-2024
vidx /Volumes/SSD-2025

Each drive ends up self-contained with its own sidecars + _INDEX.json. Knowledge travels with the data. The face DB at ~/.framedex/faces.db is centralized so cross-drive person queries work.

Common flags

Flag	Purpose
`--dry-run`	Show what would be processed; no API/model calls
`--max-files N`	Stop after N clips (testing)
`--force`	Re-process clips even if a sidecar exists
`--whisper-model large-v3`	Higher quality, slower (default is large-v3-turbo)
`--no-diarize`	Skip speaker diarization (faster; no HF_TOKEN needed)
`--no-faces`	Skip face detection + embeddings
`--no-geocode`	Skip Nominatim reverse geocoding (GPS still recorded)
`--max-duration MINUTES`	Skip clips longer than N minutes (default: 30; 0 = no limit)
`--exclude PATTERN`	Skip paths matching substring (repeatable)
`--backend cli\|api\|local`	Vision backend (see below)
`--vision-model haiku\|sonnet`	Claude model for `cli`/`api`. Default `haiku`
`--local-base-url URL`	Override LM Studio endpoint (default `http://localhost:1234/v1`)
`--local-model NAME`	Specify which loaded model to use when LM Studio has multiple
`--no-whisper-prompt`	Disable proper-noun biasing
`--whisper-fixes PATH`	Override the canonical-name regex fixes file

Vision backends

Backend	What it uses	Speed	Cost	Privacy
`cli` (default)	`claude -p` via a Claude Max subscription	~10-30s/clip	$0 marginal	Frames sent to Anthropic
`api`	Anthropic SDK with an API key	~2-3s/clip	~$0.002/clip (Haiku)	Frames sent to Anthropic
`local`	LM Studio (or any OpenAI-compatible server)	~3-90s/clip	$0	Fully local, fully offline

For huge archives, api is fastest. For routine indexing on a Max plan, cli is free. For full privacy, local keeps everything on-device.

Privacy

Component	Local or cloud?
ffmpeg, exiftool, Whisper, pyannote, insightface	Local
Nominatim reverse geocode	Cloud — sends lat/lon only, never video. Skip with `--no-geocode`
Vision (`--backend cli`/`api`)	Cloud — sends 5 JPEG frames + a transcript snippet per clip
Vision (`--backend local`)	Fully local
Face DB (`~/.framedex/faces.db`)	Local only, never uploaded

Languages

Whisper supports 99 languages with auto-detection. For non-English clips the script automatically runs a second translate-mode pass and stores the English version alongside the original transcript. For best quality on important non-English footage:

vidx /Volumes/SSD-2024 --whisper-model large-v3 --force

Speaker diarization

WhisperX uses pyannote/speaker-diarization-3.1 under the hood. First-time setup requires:

A Hugging Face account + read token (HF_TOKEN env var)
Clicking "Agree" on both pyannote model pages (linked in Quick start)

If HF_TOKEN is missing, the script logs a notice and continues without diarization. Transcripts still work; they just won't have speaker labels.

Resumable + idempotent

Already-indexed clips are skipped on re-runs (a sidecar existing = done). Ctrl-C any time; a restart picks up where it stopped. --force regenerates everything.

Troubleshooting

"Missing dependency: whisperx" — Run setup.py.

"Failed to load diarization pipeline" — You didn't accept the pyannote model terms on Hugging Face. Visit the two model pages, click Agree, then re-run.

Whisper model download stalls — setup.py --skip-model-download, then index_videos.py downloads on first use. Make sure you have disk space (~3GB for large-v3, ~1.5GB for turbo).

"No GPS data in this file" — Many clips don't have GPS metadata. The script handles this silently — the frontmatter just omits the location block.

Apple Silicon GPU not used — CTranslate2 (via WhisperX) currently runs on CPU on M-series Macs. For archive indexing, CPU is plenty fast (10-30× realtime).

Companion tools

Command	Script	Purpose
`vidx`	`index_videos.py`	Main indexer
`vidx-summary`	`trip_summary.py`	Recursive per-folder summaries
`vidx-master`	`master_index.py`	Drive-level `_INDEX.md` + `_INDEX.json`
`vidx-query`	`query.py`	Filter sidecars by rating, lighting, person, keyword, location, language

vidx-query /Volumes/SSD-2024 --rating keep --time-of-day golden_hour
vidx-query /Volumes/SSD-2024 --rating cull                  # the cull pile
vidx-query /Volumes/SSD-2024 --keyword drone --keyword landscape
vidx-query /Volumes/SSD-2024 --place-contains California --language es

Known limitations

Frame sampling is evenly-spaced, not scene-detected
pyannote diarization degrades on heavy ambient noise (wind, music, crowd)
WhisperX runs on CPU on Apple Silicon
Face cluster IDs are temporary hashes until the vidx-faces labeling tool ships — embeddings are captured now, so no re-indexing will be needed
RAW photo support not yet (videos only)

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

framedex

Install

Quick start

Per-clip pipeline

What sidecars look like

Optional folder context

Proper-noun biasing

Multiple SSDs

Common flags

Vision backends

Privacy

Languages

Speaker diarization

Resumable + idempotent

Troubleshooting

Companion tools

Known limitations

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

framedex

Install

Quick start

Per-clip pipeline

What sidecars look like

Optional folder context

Proper-noun biasing

Multiple SSDs

Common flags

Vision backends

Privacy

Languages

Speaker diarization

Resumable + idempotent

Troubleshooting

Companion tools

Known limitations

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages