kilogram

Drop an Instagram link. Get structured knowledge in your second brain.

kilogram is a skill for Hermes Agent that turns Instagram Reels, carousels, and posts into searchable, structured notes. It extracts what actually matters (audio transcripts, slide text, captions) and writes it into your wiki once you confirm the plan.

How it works

For everyone

You find a useful Reel — a top-10 movie list, a travel guide, a how-to carousel. You want to save it. Instagram's bookmarks are a black hole: no search, no structure, no context.

kilogram fixes that. Send the link to Hermes, confirm the plan, and the post becomes a proper wiki page: titled, categorized, linked to related concepts, and searchable forever.

Works best with llm-wiki — Andrej Karpathy's second brain skill bundled into Hermes. kilogram feeds content into it; llm-wiki lets you query and browse it.

What gets extracted:

Audio speech → transcript (via Whisper)
Slide text → OCR (via Tesseract)
Post caption / description → parsed directly
Single images → thumbnail OCR or vision

One link at a time. kilogram shows you exactly what it's going to write before touching your wiki — you confirm, then it writes.

For developers

kilogram is a Hermes skill (SKILL.md) with a set of Python scripts. The agent orchestrates the extraction pipeline; scripts do all the deterministic work and return JSON.

Prerequisites

Tool	Purpose
Hermes Agent	The autonomous agent runtime
`yt-dlp`	Download video, audio, description
`ffmpeg` + `ffprobe`	Audio extraction, frame sampling
`tesseract`	OCR for slides and thumbnails
`faster-whisper`	Local speech-to-text
Python 3.10+	Script runtime

Python packages (see requirements.txt):

faster-whisper
Pillow
requests

Installation

# 1. Clone into your Hermes skills directory
git clone https://github.com/michaelradionov/kilogram.git \
  ~/.hermes/skills/research/kilogram

# 2. Install Python dependencies
cd ~/.hermes/skills/research/kilogram
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Make sure system deps are available
which yt-dlp ffmpeg ffprobe tesseract  # all four should resolve
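
The same availability check can be done from Python, which is handy inside a setup script. This is a minimal sketch, not part of kilogram: `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the tool list is taken from the prerequisites table.

```python
import shutil

# Command-line tools named in the prerequisites table (hypothetical constant).
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg", "ffprobe", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty list means every tool resolved; anything else names what still needs installing.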

Usage

Drop an Instagram URL in the Hermes chat. kilogram activates automatically on:

Any instagram.com/reel/... or instagram.com/p/... link
"add to wiki", "save this reel", "add to my second brain"
"show me the wiki", "what movies do I have"
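
To illustrate the link trigger, here is a rough sketch of how a message could be scanned for a Reel or post URL. The regular expression, the function name `find_instagram_link`, and the canonical URL form it returns are all assumptions made for this example; the actual matching is done by the agent and `kilogram_prepare.py` and may differ.

```python
import re

# Hypothetical pattern: matches /reel/<shortcode> and /p/<shortcode> links.
INSTAGRAM_LINK = re.compile(
    r"https?://(?:www\.)?instagram\.com/(?P<kind>reel|p)/(?P<shortcode>[A-Za-z0-9_-]+)"
)


def find_instagram_link(message):
    """Return (kind, shortcode, canonical_url) for the first link, or None."""
    match = INSTAGRAM_LINK.search(message)
    if match is None:
        return None
    kind = match.group("kind")
    shortcode = match.group("shortcode")
    # Assumed canonical form: tracking parameters dropped, trailing slash kept.
    return kind, shortcode, f"https://www.instagram.com/{kind}/{shortcode}/"
```

Dropping the query string matters for the duplicate check, since the same post is often shared with different tracking parameters.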

kilogram will:

Check for duplicates
Extract content (description → OCR → transcript, fastest path first)
Show you a dry-run plan with all files to be created/modified
Wait for your confirmation
Write to wiki, clean up temp files, show a transparent report
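
The plan-then-confirm steps can be pictured as a small object that collects pending writes, describes them, and touches the disk only after confirmation. This is an illustrative sketch under assumed names (`WritePlan`, `describe`, `apply`), not kilogram's own implementation.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class WritePlan:
    """Pending writes: maps a file path (str) to its full new content."""

    files: dict = field(default_factory=dict)

    def describe(self):
        """Return one human-readable line per file, for the dry-run report."""
        return [
            f"write {path} ({len(text)} chars)"
            for path, text in sorted(self.files.items())
        ]

    def apply(self, confirmed):
        """Write all files if confirmed; return the list of written paths."""
        if not confirmed:
            return []  # dry run only: nothing touches the wiki
        written = []
        for path, text in sorted(self.files.items()):
            target = Path(path)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
            written.append(str(target))
        return written
```

Keeping the description and the write in one object means the report shown to the user and the files actually written come from the same data.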

Pipeline overview

Instagram URL
    │
    ├─ kilogram_prepare.py   ← normalize URL, dedup check
    │
    ├─ Description / caption  ← fastest: yt-dlp or embed extractor
    │       └─ useful? → write raw + concept, stop
    │
    ├─ Pipeline A: Reel
    │       ├─ frame_00 OCR  ← TOP-N title slide check
    │       │       └─ found? → extract N frames, skip Whisper
    │       └─ Whisper transcript ← if no TOP-N
    │
    ├─ Pipeline B: Carousel
    │       ├─ embed extractor (primary, no browser needed)
    │       └─ browser fallback (if embed returns 0 slides)
    │
    ├─ Pipeline C: Single Image
    │       └─ thumbnail OCR → vision fallback
    │
    └─ Pipeline D: Video-Carousel (N slides in video)
            ├─ kilogram_frames.py --n N --calibrate
            └─ batch OCR → vision fallback
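
The branching in the diagram can be sketched as a function that returns the ordered steps for one post. The step names, the `post_type` labels, and the function `plan_steps` are hypothetical. Pipeline D is left out because the diagram does not say how a video-carousel is detected.

```python
def plan_steps(post_type, description_useful, top_n=None):
    """Return ordered step names for one post, following the diagram above.

    post_type: "reel", "carousel", or "image" (hypothetical labels).
    top_n: N found on the title frame of a Reel, or None if no TOP-N slide.
    """
    steps = ["prepare", "description"]
    if description_useful:
        # Fastest path: the caption alone is enough, so stop here.
        return steps + ["write"]
    if post_type == "reel":
        steps.append("frame0_ocr")
        if top_n:
            steps.append("extract_n_frames")  # TOP-N found: skip Whisper
        else:
            steps.append("whisper")
    elif post_type == "carousel":
        steps += ["embed_extractor", "browser_fallback_if_empty"]
    elif post_type == "image":
        steps += ["thumbnail_ocr", "vision_fallback_if_needed"]
    else:
        raise ValueError(f"unknown post type: {post_type!r}")
    return steps + ["write"]
```

For example, `plan_steps("reel", False, top_n=10)` takes the frame-sampling branch and never reaches Whisper.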

Scripts reference

Script	Does	Returns
`kilogram_prepare.py URL --wiki PATH`	URL normalization, dedup	JSON: `canonical_url`, `dedupe.duplicate`
`instagram_embed_images.py URL --out DIR --json`	Download carousel slides via `/embed/`	JSON: `count`, `files`, `caption`
`instagram_embed_images.py URL --caption-only`	Caption only, no download	JSON: `caption`
`kilogram_frames.py video.mp4 --frame0-only`	Extract frame at t=0.5s	JSON: `frames[0].path`, `duration`
`kilogram_frames.py video.mp4 --n N --calibrate`	Sample exactly N frames	JSON: `frames`, `content_start`
`kilogram_ocr.py file1 file2 --top-n-check`	Batch OCR + TOP-N detection	JSON array: `text`, `confidence`, `top_n`
`kilogram_dedup_frames.py ocr.json`	Deduplicate adjacent frames	JSON: `unique_frames`
`kilogram_whisper.py audio.m4a`	Transcribe audio	JSON: `text`, `word_count`, `duration`
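
Since every script returns JSON, a caller only needs one helper to run a script and parse its output. The sketch below assumes the JSON is printed to stdout and that a non-zero exit code signals failure; `run_json` is a hypothetical helper, not part of the repository.

```python
import json
import subprocess
import sys


def run_json(argv, timeout=600):
    """Run a command and parse its stdout as JSON; raise if it fails."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)
```

A call such as `run_json([sys.executable, "scripts/kilogram_prepare.py", url, "--wiki", wiki_path])` would then give a dict. Reading `result["dedupe"]["duplicate"]` assumes that the dotted name in the table denotes a nested key and that the scripts live under `scripts/`, as `kilogram_test.py` does.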

Confidence system

Content is written to a concept page only when the source is rated `high` or `medium`:

Level	Source	Action
`high`	Whisper transcript ≥20 words, OCR high, useful description	raw + concept
`medium`	Vision fallback, OCR medium	raw + concept
`low`	Web search snippets, OCR empty	raw only, no concept
`blocked`	403, deleted, private	raw stub only
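
The table maps directly onto a lookup. The following sketch encodes it and adds the 20-word transcript threshold. The names `ACTIONS`, `writes_concept`, and `transcript_is_high` are made up for illustration, and how the scripts actually grade OCR or descriptions is not covered here.

```python
# Confidence level -> what gets written (taken from the table above).
ACTIONS = {
    "high": ("raw", "concept"),
    "medium": ("raw", "concept"),
    "low": ("raw",),
    "blocked": ("raw_stub",),
}


def writes_concept(level):
    """True if a source at this confidence level may update a concept page."""
    return "concept" in ACTIONS[level]


def transcript_is_high(text, min_words=20):
    """True if a Whisper transcript meets the word threshold for `high`."""
    return len(text.split()) >= min_words
```

An unknown level raises `KeyError`, which is deliberate in this sketch: a typo in a level name should fail loudly rather than silently skip the concept page.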

Wiki structure

kilogram writes into an llm-wiki compatible layout:

$WIKI/
├── raw/transcripts/     ← instagram-{type}-{shortcode}.md
├── concepts/            ← one file per topic/place/film
├── index.md             ← running list + total_count
└── log.md               ← append-only ingest log
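
Given this layout, the raw file for a post can be located from its type and shortcode, which also allows a simple filename-based duplicate check. This is a sketch under assumptions: the type labels in `POST_TYPES` are hypothetical, and `kilogram_prepare.py` may detect duplicates differently.

```python
from pathlib import Path

# Hypothetical values for the {type} part of the filename.
POST_TYPES = ("reel", "carousel", "image")


def raw_path(wiki, post_type, shortcode):
    """Build the raw transcript path for one post inside the wiki."""
    name = f"instagram-{post_type}-{shortcode}.md"
    return Path(wiki) / "raw" / "transcripts" / name


def is_duplicate(wiki, shortcode):
    """True if a raw file for this shortcode already exists under any type."""
    return any(raw_path(wiki, t, shortcode).exists() for t in POST_TYPES)
```

Exact paths are checked instead of a wildcard because shortcodes may contain hyphens, so a pattern like `instagram-*-<shortcode>.md` could match a different post.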

Running tests

# Smoke test — checks scripts, yt-dlp, embed, OCR, Whisper/Pillow availability
python3 scripts/kilogram_test.py --wiki ~/wiki

# Single URL
python3 scripts/kilogram_test.py --url "https://www.instagram.com/p/DVgBqwDiEcP/"

# Without downloading media
python3 scripts/kilogram_test.py --skip-download

Test URLs for e2e runs: references/test-reels.md

Works without llm-wiki too

kilogram writes standard Markdown. If you don't use llm-wiki, you can point $WIKI at any directory and use the raw files and concepts directly — Obsidian, Logseq, or plain files all work.

License

MIT — use it, fork it, extend it.
