michaelradionov/kilogram

kilogram

Drop an Instagram link. Get structured knowledge in your second brain.

kilogram is a skill for Hermes Agent that turns Instagram Reels, carousels, and posts into searchable, structured notes. It extracts what actually matters — audio transcripts, slide text, captions — and writes it into your wiki automatically.


How it works

kilogram pipeline


For everyone

You find a useful Reel — a top-10 movie list, a travel guide, a how-to carousel. You want to save it. Instagram's bookmarks are a black hole: no search, no structure, no context.

kilogram fixes that. Send the link to Hermes, confirm the plan, and the post becomes a proper wiki page: titled, categorized, linked to related concepts, and searchable forever.

Works best with llm-wiki — Andrej Karpathy's second brain skill bundled into Hermes. kilogram feeds content into it; llm-wiki lets you query and browse it.

What gets extracted:

  • Audio speech → transcript (via Whisper)
  • Slide text → OCR (via Tesseract)
  • Post caption / description → parsed directly
  • Single images → thumbnail OCR or vision

One link at a time. kilogram shows you exactly what it's going to write before touching your wiki — you confirm, then it writes.


For developers

kilogram is a Hermes skill (SKILL.md) with a set of Python scripts. The agent orchestrates the extraction pipeline; scripts do all the deterministic work and return JSON.

Prerequisites

| Tool | Purpose |
|------|---------|
| Hermes Agent | The autonomous agent runtime |
| yt-dlp | Download video, audio, description |
| ffmpeg + ffprobe | Audio extraction, frame sampling |
| tesseract | OCR for slides and thumbnails |
| faster-whisper | Local speech-to-text |
| Python 3.10+ | Script runtime |

Python packages (see requirements.txt):

faster-whisper
Pillow
requests

Installation

# 1. Clone into your Hermes skills directory
git clone https://github.com/michaelradionov/kilogram.git \
  ~/.hermes/skills/research/kilogram

# 2. Install Python dependencies
cd ~/.hermes/skills/research/kilogram
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Make sure system deps are available
which yt-dlp ffmpeg ffprobe tesseract  # all four should resolve

Usage

Drop an Instagram URL in the Hermes chat. kilogram activates automatically on:

  • Any instagram.com/reel/... or instagram.com/p/... link
  • "add to wiki", "save this reel", "add to my second brain"
  • "show me the wiki", "what movies do I have"
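
As an illustration of the link trigger, here is a minimal sketch of how a reel or post URL could be recognized. The function name `match_instagram_url` and the regular expression are assumptions made for this example; they are not taken from kilogram's scripts, and the real matching is done by the agent and may differ.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: /reel/<shortcode> or /p/<shortcode>, optional trailing slash.
_PATH_RE = re.compile(r"^/(reel|p)/([A-Za-z0-9_-]+)/?$")


def match_instagram_url(url: str):
    """Return (kind, shortcode) for a reel or post link, or None otherwise."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ("instagram.com", "www.instagram.com"):
        return None
    match = _PATH_RE.match(parsed.path)  # query strings are ignored by design
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The sketch expects a full URL with a scheme; a bare `instagram.com/p/...` string has no parsed hostname and is rejected.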

kilogram will:

  1. Check for duplicates
  2. Extract content (description → OCR → transcript, fastest path first)
  3. Show you a dry-run plan with all files to be created/modified
  4. Wait for your confirmation
  5. Write to wiki, clean up temp files, show a transparent report
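
The "fastest path first" order in step 2 can be pictured as a fallback chain that stops at the first extractor returning useful text. The sketch below is an assumption about the shape of that logic, not kilogram's actual code: `first_useful` and the extractor callables are hypothetical, and reusing the 20-word threshold as a general usefulness test is a simplification.

```python
from typing import Callable, Optional

# A hypothetical extractor takes a URL and returns text, or None on failure.
Extractor = Callable[[str], Optional[str]]


def first_useful(
    url: str,
    extractors: list[tuple[str, Extractor]],
    min_words: int = 20,  # assumed threshold, for illustration only
) -> Optional[tuple[str, str]]:
    """Try extractors in order; return (name, text) from the first useful one."""
    for name, extract in extractors:
        text = extract(url)
        if text and len(text.split()) >= min_words:
            return name, text  # later, slower extractors are never called
    return None
```

Ordering the list as description, OCR, transcript means the slower Whisper step only runs when the cheaper sources come back empty or too short.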

Pipeline overview

Instagram URL
    │
    ├─ kilogram_prepare.py   ← normalize URL, dedup check
    │
    ├─ Description / caption  ← fastest: yt-dlp or embed extractor
    │       └─ useful? → write raw + concept, stop
    │
    ├─ Pipeline A: Reel
    │       ├─ frame_00 OCR  ← TOP-N title slide check
    │       │       └─ found? → extract N frames, skip Whisper
    │       └─ Whisper transcript ← if no TOP-N
    │
    ├─ Pipeline B: Carousel
    │       ├─ embed extractor (primary, no browser needed)
    │       └─ browser fallback (if embed returns 0 slides)
    │
    ├─ Pipeline C: Single Image
    │       └─ thumbnail OCR → vision fallback
    │
    └─ Pipeline D: Video-Carousel (N slides in video)
            ├─ kilogram_frames.py --n N --calibrate
            └─ batch OCR → vision fallback
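
The TOP-N title slide check in Pipeline A decides whether the first frame announces a numbered list. One plausible heuristic is a regular expression over the OCR text, sketched below. The pattern and the name `detect_top_n` are assumptions for illustration; the detection in kilogram_ocr.py may work differently.

```python
import re
from typing import Optional

# Hypothetical heuristic: "TOP 10", "Top-5", "top7" as a standalone word.
_TOP_N_RE = re.compile(r"\btop[\s\-]*(\d{1,3})\b", re.IGNORECASE)


def detect_top_n(ocr_text: str) -> Optional[int]:
    """Return N if the text looks like a TOP-N title, else None."""
    match = _TOP_N_RE.search(ocr_text)
    if match is None:
        return None
    n = int(match.group(1))
    return n if n > 0 else None
```

When a value of N is found, it could be passed on as the `--n N` argument shown for kilogram_frames.py.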

Scripts reference

| Script | Does | Returns |
|--------|------|---------|
| kilogram_prepare.py URL --wiki PATH | URL normalization, dedup | JSON: canonical_url, dedupe.duplicate |
| instagram_embed_images.py URL --out DIR --json | Download carousel slides via /embed/ | JSON: count, files, caption |
| instagram_embed_images.py URL --caption-only | Caption only, no download | JSON: caption |
| kilogram_frames.py video.mp4 --frame0-only | Extract frame at t=0.5s | JSON: frames[0].path, duration |
| kilogram_frames.py video.mp4 --n N --calibrate | Sample exactly N frames | JSON: frames, content_start |
| kilogram_ocr.py file1 file2 --top-n-check | Batch OCR + TOP-N detection | JSON array: text, confidence, top_n |
| kilogram_dedup_frames.py ocr.json | Deduplicate adjacent frames | JSON: unique_frames |
| kilogram_whisper.py audio.m4a | Transcribe audio | JSON: text, word_count, duration |
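
Because every script returns JSON, a caller can wrap them with one small helper. The sketch below is a generic pattern, not code from this repository: `run_json` is a hypothetical name, and it assumes the script prints a single JSON document to stdout and exits non-zero on failure.

```python
import json
import subprocess
import sys


def run_json(cmd: list[str], timeout: float = 300.0):
    """Run a command and parse its stdout as JSON; raise on a non-zero exit."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


# Illustrative call only; the scripts/ location and wiki path are assumptions:
# info = run_json([sys.executable, "scripts/kilogram_prepare.py", url, "--wiki", "/path/to/wiki"])
```

Raising on a non-zero exit keeps a failed extraction step from being mistaken for an empty result.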

Confidence system

Content is only written to a concept page if the source is reliable:

| Level | Source | Action |
|-------|--------|--------|
| high | Whisper transcript ≥20 words, OCR high, useful description | raw + concept |
| medium | Vision fallback, OCR medium | raw + concept |
| low | Web search snippets, OCR empty | raw only, no concept |
| blocked | 403, deleted, private | raw stub only |
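
The level-to-action rules above amount to a small lookup. A minimal sketch follows, with the caveat that the names `ACTIONS` and `pages_to_write` and the string labels are invented for this example and do not come from kilogram's source.

```python
# Mirrors the confidence table: which pages may be written at each level.
ACTIONS: dict[str, tuple[str, ...]] = {
    "high": ("raw", "concept"),
    "medium": ("raw", "concept"),
    "low": ("raw",),           # raw only, no concept page
    "blocked": ("raw_stub",),  # 403, deleted, or private: stub only
}


def pages_to_write(level: str) -> tuple[str, ...]:
    """Return the page kinds allowed for a confidence level."""
    try:
        return ACTIONS[level]
    except KeyError:
        raise ValueError(f"unknown confidence level: {level!r}") from None
```

Rejecting unknown levels outright avoids silently writing a concept page from an unclassified source.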

Wiki structure

kilogram writes into an llm-wiki compatible layout:

$WIKI/
├── raw/transcripts/     ← instagram-{type}-{shortcode}.md
├── concepts/            ← one file per topic/place/film
├── index.md             ← running list + total_count
└── log.md               ← append-only ingest log
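
For tools that read this layout directly, the raw file location follows from the naming pattern in the tree. The helper below is a sketch under assumptions: `raw_transcript_path` is a hypothetical name, and the values used for the type segment (for example "reel") are guesses, since the README does not list them.

```python
from pathlib import Path


def raw_transcript_path(wiki: Path, kind: str, shortcode: str) -> Path:
    """Build $WIKI/raw/transcripts/instagram-{type}-{shortcode}.md."""
    return wiki / "raw" / "transcripts" / f"instagram-{kind}-{shortcode}.md"


# Example with an assumed type value:
# raw_transcript_path(Path.home() / "wiki", "reel", "DVgBqwDiEcP")
```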

Running tests

# Smoke test — checks scripts, yt-dlp, embed, OCR, Whisper/Pillow availability
python3 scripts/kilogram_test.py --wiki ~/wiki

# Single URL
python3 scripts/kilogram_test.py --url "https://www.instagram.com/p/DVgBqwDiEcP/"

# Without downloading media
python3 scripts/kilogram_test.py --skip-download

Test URLs for e2e runs: references/test-reels.md


Works without llm-wiki too

kilogram writes standard Markdown. If you don't use llm-wiki, you can point $WIKI at any directory and use the raw files and concepts directly — Obsidian, Logseq, or plain files all work.


License

MIT — use it, fork it, extend it.

About

Skill for saving and recalling IG content
