Drop an Instagram link. Get structured knowledge in your second brain.
kilogram is a skill for Hermes Agent that turns Instagram Reels, carousels, and posts into searchable, structured notes. It extracts what actually matters — audio transcripts, slide text, captions — and writes it into your wiki automatically.
You find a useful Reel — a top-10 movie list, a travel guide, a how-to carousel. You want to save it. Instagram's bookmarks are a black hole: no search, no structure, no context.
kilogram fixes that. Send the link to Hermes, confirm the plan, and the post becomes a proper wiki page: titled, categorized, linked to related concepts, and searchable forever.
Works best with llm-wiki — Andrej Karpathy's second brain skill bundled into Hermes. kilogram feeds content into it; llm-wiki lets you query and browse it.
What gets extracted:
- Audio speech → transcript (via Whisper)
- Slide text → OCR (via Tesseract)
- Post caption / description → parsed directly
- Single images → thumbnail OCR or vision
One link at a time. kilogram shows you exactly what it's going to write before touching your wiki — you confirm, then it writes.
kilogram is a Hermes skill (SKILL.md) with a set of Python scripts. The agent orchestrates the extraction pipeline; scripts do all the deterministic work and return JSON.
| Tool | Purpose |
|---|---|
| Hermes Agent | The autonomous agent runtime |
| `yt-dlp` | Download video, audio, description |
| `ffmpeg` + `ffprobe` | Audio extraction, frame sampling |
| `tesseract` | OCR for slides and thumbnails |
| `faster-whisper` | Local speech-to-text |
| Python 3.10+ | Script runtime |
Python packages (see requirements.txt):
faster-whisper
Pillow
requests
# 1. Clone into your Hermes skills directory
git clone https://github.com/michaelradionov/kilogram.git \
~/.hermes/skills/research/kilogram
# 2. Install Python dependencies
cd ~/.hermes/skills/research/kilogram
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 3. Make sure system deps are available
which yt-dlp ffmpeg tesseract  # all three should resolve

Drop an Instagram URL in the Hermes chat. kilogram activates automatically on:
- Any `instagram.com/reel/...` or `instagram.com/p/...` link
- "add to wiki", "save this reel", "add to my second brain"
- "show me the wiki", "what movies do I have"
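As a rough illustration of the link trigger, the sketch below pulls the post kind and shortcode out of a chat message. The regular expression and the function name `find_instagram_link` are assumptions made for this example, not kilogram's actual matching logic.

```python
import re
from typing import Optional

# Hypothetical pattern: covers only the /reel/ and /p/ link shapes listed above.
INSTAGRAM_LINK = re.compile(
    r"https?://(?:www\.)?instagram\.com/(reel|p)/([A-Za-z0-9_-]+)"
)


def find_instagram_link(message: str) -> Optional[tuple[str, str]]:
    """Return (kind, shortcode) for the first Instagram link in a message, else None."""
    match = INSTAGRAM_LINK.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Query strings and trailing slashes are ignored because the pattern stops at the end of the shortcode.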
kilogram will:
- Check for duplicates
- Extract content (description → OCR → transcript, fastest path first)
- Show you a dry-run plan with all files to be created/modified
- Wait for your confirmation
- Write to wiki, clean up temp files, show a transparent report
Instagram URL
│
├─ kilogram_prepare.py          ← normalize URL, dedup check
│
├─ Description / caption        ← fastest: yt-dlp or embed extractor
│   └─ useful? → write raw + concept, stop
│
├─ Pipeline A: Reel
│   ├─ frame_00 OCR             ← TOP-N title slide check
│   │   └─ found? → extract N frames, skip Whisper
│   └─ Whisper transcript       ← if no TOP-N
│
├─ Pipeline B: Carousel
│   ├─ embed extractor (primary, no browser needed)
│   └─ browser fallback (if embed returns 0 slides)
│
├─ Pipeline C: Single Image
│   └─ thumbnail OCR → vision fallback
│
└─ Pipeline D: Video-Carousel (N slides in video)
    ├─ kilogram_frames.py --n N --calibrate
    └─ batch OCR → vision fallback
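The routing in the diagram above can be sketched as a function that returns the ordered steps for one post. The step names, the `post_type` values, and the `plan_steps` function are illustrative assumptions; in kilogram the agent makes these decisions by following SKILL.md.

```python
from typing import Optional


def plan_steps(
    post_type: str, description_useful: bool, top_n: Optional[int] = None
) -> list[str]:
    """Return the ordered extraction steps for one post (illustrative names only)."""
    steps = ["prepare", "description"]
    if description_useful:
        return steps  # fastest path: the caption alone is enough
    if post_type == "reel":
        steps.append("frame0_ocr")
        # A detected TOP-N title slide means N frames are sampled and Whisper is skipped.
        steps.append(f"sample_{top_n}_frames" if top_n else "whisper")
    elif post_type == "carousel":
        steps += ["embed_extractor", "browser_fallback_if_empty"]
    elif post_type == "image":
        steps += ["thumbnail_ocr", "vision_fallback"]
    elif post_type == "video_carousel":
        steps += ["calibrated_frames", "batch_ocr", "vision_fallback"]
    else:
        raise ValueError(f"unknown post type: {post_type}")
    return steps
```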
| Script | Does | Returns |
|---|---|---|
| `kilogram_prepare.py URL --wiki PATH` | URL normalization, dedup | JSON: `canonical_url`, `dedupe.duplicate` |
| `instagram_embed_images.py URL --out DIR --json` | Download carousel slides via `/embed/` | JSON: `count`, `files`, `caption` |
| `instagram_embed_images.py URL --caption-only` | Caption only, no download | JSON: `caption` |
| `kilogram_frames.py video.mp4 --frame0-only` | Extract frame at t=0.5s | JSON: `frames[0].path`, `duration` |
| `kilogram_frames.py video.mp4 --n N --calibrate` | Sample exactly N frames | JSON: `frames`, `content_start` |
| `kilogram_ocr.py file1 file2 --top-n-check` | Batch OCR + TOP-N detection | JSON array: `text`, `confidence`, `top_n` |
| `kilogram_dedup_frames.py ocr.json` | Deduplicate adjacent frames | JSON: `unique_frames` |
| `kilogram_whisper.py audio.m4a` | Transcribe audio | JSON: `text`, `word_count`, `duration` |
Content is only written to a concept page if the source is reliable:
| Level | Source | Action |
|---|---|---|
| `high` | Whisper transcript ≥20 words, OCR high, useful description | raw + concept |
| `medium` | Vision fallback, OCR medium | raw + concept |
| `low` | Web search snippets, OCR empty | raw only, no concept |
| `blocked` | 403, deleted, private | raw stub only |
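The table above amounts to a lookup from reliability level to write targets. A minimal sketch follows; the target labels (`raw`, `concept`, `raw_stub`) and the function names are chosen for illustration only.

```python
# Mirrors the Level -> Action table; the target labels are illustrative.
ACTIONS: dict[str, tuple[str, ...]] = {
    "high": ("raw", "concept"),
    "medium": ("raw", "concept"),
    "low": ("raw",),
    "blocked": ("raw_stub",),
}


def targets_for(level: str) -> tuple[str, ...]:
    """Return which kinds of wiki files may be written for a reliability level."""
    try:
        return ACTIONS[level]
    except KeyError:
        raise ValueError(f"unknown reliability level: {level}") from None


def writes_concept(level: str) -> bool:
    """Only sufficiently reliable sources may create or update a concept page."""
    return "concept" in targets_for(level)
```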
kilogram writes into an llm-wiki compatible layout:
$WIKI/
├── raw/transcripts/ ← instagram-{type}-{shortcode}.md
├── concepts/ ← one file per topic/place/film
├── index.md ← running list + total_count
└── log.md ← append-only ingest log
# Smoke test — checks scripts, yt-dlp, embed, OCR, Whisper/Pillow availability
python3 scripts/kilogram_test.py --wiki ~/wiki
# Single URL
python3 scripts/kilogram_test.py --url "https://www.instagram.com/p/DVgBqwDiEcP/"
# Without downloading media
python3 scripts/kilogram_test.py --skip-download

Test URLs for e2e runs: references/test-reels.md
kilogram writes standard Markdown. If you don't use llm-wiki, you can point $WIKI at any directory and use the raw files and concepts directly — Obsidian, Logseq, or plain files all work.
MIT — use it, fork it, extend it.
