Surgical phrase splicing from real movies and TV shows via yarn.co.
Give it any long phrase — it greedily cuts the line into the longest possible runs of words that were actually spoken on screen, then assembles a final clip from those pieces. Classic fragmovie genre, automated to the millisecond.
100% local, no API keys, no cloud calls. Speech recognition runs on-device with faster-whisper.
"Sentient is the best company in the world."
Four variants of the same phrase, each spliced word by word from real movies and TV shows. Every clip is pulled from yarn.co or playphrase.me and normalised to a single 854x480 / 48 kHz format so the concat is seamless.
sentient_v14.mp4
sentient_v13.mp4
sentient_v11.mp4
sentient_v10.mp4
npx skills add solyanviktor-star/Bumblebee
This installs Bumblebee into your agent's skills directory (Claude Code, Cursor, GitHub Copilot, and other compatible agents). The agent will automatically activate it on prompts like "splice a fragmovie of <phrase>" or "build a video where actors say <phrase>".
"I don't get it, why does my Claude keep getting banned. I'm sick of buying new accounts."
        |
        v   split on .!? -> each sentence handled independently
        |
    [ greedy splitter: chunks of up to 6 words ]
        |
    "I don't get it why" --+
    "does"               --+   for each chunk:
    "Claude"             --+     getyarn.io -> 8 candidates (curl_cffi, bypasses CF)
    "keep getting"       --+     download mp4
    "I'm sick of"        --+     faster-whisper word-timestamps (local, no API)
    "new accounts"       --+     word_matcher: exact match
    ...                    |     FFmpeg cut to the millisecond
                           v
              concat into output/final.mp4
              (with short audio fades at every splice +
               a ~180ms breathing pause between sentences)
Words that nobody ever said in any clip are skipped.
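The greedy splitting step can be sketched roughly as follows. This is a minimal illustration, not the code in phrase_splitter.py: the `greedy_split` function, the `has_clip` callback, and the `known` set are hypothetical stand-ins for a real clip search.

```python
from typing import Callable, List


def greedy_split(words: List[str],
                 has_clip: Callable[[str], bool],
                 max_len: int = 6) -> List[str]:
    """Cut `words` into the longest runs (up to max_len words) that has_clip accepts."""
    chunks: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest window first and shrink it until something matches.
        for size in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if has_clip(candidate):
                chunks.append(candidate)
                i += size
                break
        else:
            # No clip contains this word at all: skip it and move on.
            i += 1
    return chunks


# Hypothetical phrase index standing in for a real search backend.
known = {"i am", "your father", "father"}
print(greedy_split("i am your father".split(), lambda p: p in known))
# -> ['i am', 'your father']
```

Trying the longest window first keeps the number of splices low, which matters because every splice is an audible and visible cut.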
python bumblebee.py "Sentient is the best company" --variants 4 -o sentient.mp4
Generates 4 files (sentient_v1.mp4, _v2, _v3, _v4) where every variant avoids clips already used by previous ones. You get different cuts with different actors, different movies, sometimes even different segmentation of the same phrase.
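One way to picture the no-reuse rule is a shared set of clip ids that grows as variants are built. The sketch below assumes that approach; `pick_clip`, `build_variants`, and the clip ids are illustrative names, and the source does not show how Bumblebee actually tracks used clips.

```python
from typing import Dict, List, Optional, Set


def pick_clip(candidates: List[str], used: Set[str]) -> Optional[str]:
    """Return the first candidate clip id that no earlier variant has used."""
    for clip_id in candidates:
        if clip_id not in used:
            used.add(clip_id)
            return clip_id
    return None  # every candidate is already taken


def build_variants(chunks: List[str],
                   candidates_by_chunk: Dict[str, List[str]],
                   n: int) -> List[List[Optional[str]]]:
    """Build n variants; the `used` set is shared so clips never repeat."""
    used: Set[str] = set()
    return [
        [pick_clip(candidates_by_chunk.get(chunk, []), used) for chunk in chunks]
        for _ in range(n)
    ]


# Made-up clip ids for two chunks.
candidates = {"i am": ["a1", "a2"], "your father": ["b1", "b2"]}
print(build_variants(["i am", "your father"], candidates, 2))
# -> [['a1', 'b1'], ['a2', 'b2']]
```

A `None` entry marks a chunk whose candidates have run out, which is where a real implementation might re-split the phrase or try another source.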
git clone https://github.com/solyanviktor-star/Bumblebee.git
cd Bumblebee
pip install -r requirements.txt
You also need FFmpeg on PATH (or set FFMPEG_BIN to its path).
Bumblebee uses playphrase.me as an automatic secondary source whenever yarn.co fails to cover a chunk. playphrase has 10x-1000x more clips per phrase, so installing Playwright (which Bumblebee uses to reach playphrase) dramatically improves coverage on rare words and longer phrases. Without it, those words get skipped and Bumblebee asks the orchestrator to substitute a synonym.
pip install playwright
playwright install chromium # one-time, ~120 MB
The Chromium bootstrap (~10-15s) runs lazily: only on the first yarn miss
of a given run, never if yarn covers the whole phrase. If you skip this
step, Bumblebee still works in yarn-only mode; you can also force-disable
playphrase per run with --no-playphrase.
That's it. No API keys, no .env, nothing else to configure. The first run
downloads the Whisper model (~244 MB for small.en) into the HuggingFace
cache; after that, transcription runs fully offline.
- Python 3.9+
- FFmpeg
- ~250 MB free disk space for the speech model
- Recommended: playwright + Chromium (~120 MB) for the playphrase fallback
No GPU required. If you have a CUDA GPU, set WHISPER_DEVICE=cuda for a roughly 5x speedup on transcription.
# One video from one phrase
python bumblebee.py "I am your father"
# Several phrases — each is processed and they're concatenated in order
python bumblebee.py "I am your father" "Houston we have a problem"
# 5 different cuts of the same phrase, no clip reuse
python bumblebee.py "Sentient is the best" -o sentient.mp4 --variants 5
The final file lands in output/<name>.mp4.
yarn.co's public HTML is hard-capped at 20 unique clips per phrase. When yarn fails to cover a chunk, Bumblebee automatically falls back to playphrase.me, which often has 10x-1000x more matches (73,000 clips for "open" vs yarn's 20). Its API delivers word-timestamps natively, so playphrase clips skip the faster-whisper step entirely.
The fallback is lazy: the headless Chromium bootstrap (~10-15s, one-time
per run) only runs if yarn actually misses. Phrases that yarn covers fully
never touch playwright. See the Install section for the one-time
playwright + Chromium setup; once installed, every run uses playphrase as
needed. Pass --no-playphrase to force-disable it for a single run.
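The lazy fallback described above might look something like this in outline. It is a sketch under assumptions: `LazySource`, `find_clips`, and the two stub backends are hypothetical, and the real bootstrap would launch headless Chromium rather than return a lambda.

```python
from typing import Callable, List, Optional

Search = Callable[[str], List[str]]


class LazySource:
    """Wraps a backend that is expensive to start and boots it on first use only."""

    def __init__(self, bootstrap: Callable[[], Search]):
        self._bootstrap = bootstrap
        self._search: Optional[Search] = None
        self.boot_count = 0

    def search(self, phrase: str) -> List[str]:
        if self._search is None:
            # Pay the startup cost once, on the first miss of the run.
            self._search = self._bootstrap()
            self.boot_count += 1
        return self._search(phrase)


def find_clips(phrase: str, primary: Search,
               fallback: Optional[LazySource]) -> List[str]:
    """Ask the primary source first; touch the fallback only if it returns nothing."""
    clips = primary(phrase)
    if clips or fallback is None:  # fallback=None models --no-playphrase
        return clips
    return fallback.search(phrase)


# Stand-ins for the yarn and playphrase backends.
def yarn_stub(phrase: str) -> List[str]:
    return ["y1"] if phrase == "i am" else []


def boot_playphrase_stub() -> Search:
    return lambda phrase: ["p1", "p2"]


fallback = LazySource(boot_playphrase_stub)
print(find_clips("i am", yarn_stub, fallback), fallback.boot_count)      # ['y1'] 0
print(find_clips("sentient", yarn_stub, fallback), fallback.boot_count)  # ['p1', 'p2'] 1
```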
| Variable | Default | Purpose |
|---|---|---|
| WHISPER_MODEL | small.en | Model name. Use base.en for speed, medium.en for accuracy. |
| WHISPER_DEVICE | cpu | Set to cuda if you have an NVIDIA GPU. |
| WHISPER_COMPUTE_TYPE | int8 (cpu) / float16 (cuda) | Inference quantization. |
| FFMPEG_BIN | ffmpeg | Path to ffmpeg binary if not on PATH. |
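As an illustration of how these variables and their defaults could be resolved, here is a minimal sketch. The helper names `load_settings` and `load_model` are hypothetical and not taken from the Bumblebee source; the `WhisperModel` constructor arguments are the ones faster-whisper documents.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    device = env.get("WHISPER_DEVICE", "cpu")
    # The default compute type depends on the device: int8 on CPU, float16 on CUDA.
    default_compute = "float16" if device == "cuda" else "int8"
    return {
        "model": env.get("WHISPER_MODEL", "small.en"),
        "device": device,
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", default_compute),
        "ffmpeg": env.get("FFMPEG_BIN", "ffmpeg"),
    }


def load_model(settings: Dict[str, str]):
    """Create the model; imported lazily so load_settings works without faster-whisper."""
    from faster_whisper import WhisperModel
    return WhisperModel(settings["model"],
                        device=settings["device"],
                        compute_type=settings["compute_type"])


print(load_settings({"WHISPER_DEVICE": "cuda"})["compute_type"])  # float16
```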
Bumblebee/
|- bumblebee.py <- CLI entry point
|- SKILL.md <- Claude Code skill manifest
|- src/
| |- phrase_splitter.py <- greedy longest-match with optional shuffling/exclusion
| |- yarn_search.py <- phrase -> clip_ids (curl_cffi, bypasses Cloudflare)
| |- downloader.py <- clip_id -> local mp4 (curl_cffi, bypasses CF on y.yarn.co)
| |- transcriber.py <- faster-whisper word-timestamps + cache
| |- word_matcher.py <- exact start/end of target words with apostrophe-fuzz
| |- cutter.py <- FFmpeg cut + audio fade at splice points
| |- concat.py <- concat demuxer
|- cache/ <- downloaded clips and transcripts (reused across runs)
|- output/ <- final reels and intermediate parts in _parts/
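To make the cutter and concat stages more concrete, the sketch below builds an FFmpeg command for one millisecond-accurate cut with short audio fades, plus the list file that FFmpeg's concat demuxer reads. The function names, the 20 ms fade length, and the libx264/aac codec choice are assumptions for illustration; only the 854x480 / 48 kHz target comes from this README.

```python
from typing import List


def build_cut_command(src: str, dst: str, start: float, end: float,
                      fade: float = 0.02, ffmpeg: str = "ffmpeg") -> List[str]:
    """Return an FFmpeg argv that re-encodes [start, end] seconds of src into dst."""
    duration = end - start
    fade_out_start = max(duration - fade, 0.0)
    # Fade in at the head and out at the tail so the splice does not click.
    audio_filter = (f"afade=t=in:st=0:d={fade:.3f},"
                    f"afade=t=out:st={fade_out_start:.3f}:d={fade:.3f}")
    return [
        ffmpeg, "-y",
        "-ss", f"{start:.3f}",     # seek to the first word (millisecond precision)
        "-i", src,
        "-t", f"{duration:.3f}",   # keep only the matched words
        "-vf", "scale=854:480",    # same frame size for every part
        "-af", audio_filter,
        "-ar", "48000",            # same sample rate for every part
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]


def concat_list(parts: List[str]) -> str:
    """Return the text of a concat-demuxer list file for the given parts."""
    return "".join(f"file '{p}'\n" for p in parts)


print(" ".join(build_cut_command("clip.mp4", "part_000.mp4", 1.25, 2.0)))
print(concat_list(["part_000.mp4", "part_001.mp4"]), end="")
```

The list file would then be joined with something like `ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp4`, which only works without re-encoding because every part was normalised to the same format first.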
- yarn.co indexes English-language media only.
- Whisper sometimes transcribes short tokens like "I", "a", "my" as part of a longer word, so single short words tend to get skipped.
- Word order is strict: "can we" and "we can" are different matches (a swap-fuzzy is on the TODO list).
- yarn.co sits behind Cloudflare. Solved with curl_cffi and impersonate='chrome' (which replays a real Chrome TLS fingerprint).
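The strict word-order limitation follows naturally from exact sequence matching over word timestamps. The sketch below shows one plausible shape for that step; `normalise` and `find_phrase` are hypothetical names, the apostrophe handling is a guess at what "apostrophe-fuzz" covers, and the timestamps are made up.

```python
import re
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start_sec, end_sec)


def normalise(token: str) -> str:
    """Lower-case, unify curly apostrophes, drop spaces and punctuation."""
    token = token.lower().replace("\u2019", "'")
    return re.sub(r"[^a-z0-9']", "", token)


def find_phrase(words: List[Word], phrase: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first exact, in-order occurrence of phrase."""
    target = [normalise(t) for t in phrase.split()]
    tokens = [normalise(w[0]) for w in words]
    n = len(target)
    if n == 0:
        return None
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Start of the first matched word, end of the last one.
            return words[i][1], words[i + n - 1][2]
    return None


# Word timestamps in the style a speech recogniser might return.
words = [(" Can", 0.10, 0.32), (" we", 0.32, 0.50), (" talk?", 0.50, 0.91)]
print(find_phrase(words, "can we"))  # (0.1, 0.5)
print(find_phrase(words, "we can"))  # None
```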
MIT — see LICENSE.
Built end-to-end with Claude Code.