video-skill-extractor turns narrated videos into structured, timeline-ready skill steps.
Current pipeline supports:
- transcription (OpenAI-compatible Whisper endpoint)
- transcript parsing + chunking
- AI step extraction
- per-step frame extraction (ffmpeg)
- AI enrichment (reasoning + VLM, two-pass visual analysis)
- markdown rendering
- provider health checks
- Python 3.11+
- uv
- Docker (for local model services, optional)
- ffmpeg binary is handled via `imageio-ffmpeg` in this project
```bash
cd /Users/mg/src/course-step-extractor
uv sync --dev
```

Sanity checks:

```bash
uv run ruff check .
uv run pytest -q
```

If you want to run this through OpenClaw as a skill:
```bash
# install skill from ClawHub into your OpenClaw workspace
npx -y clawhub install video-skill --workdir ~/.openclaw/workspace
```

The skill installs to:

```
~/.openclaw/workspace/skills/video-skill
```

Then in that directory:

```bash
cd ~/.openclaw/workspace/skills/video-skill
uv sync --dev
cp config.example.json config.json
```

Validate provider endpoints before first run:

```bash
uv run video-skill config-validate --config config.json
uv run video-skill providers-ping --config config.json --path /v1/models
```

You can now run the same commands documented below from this installed skill directory.
```bash
./scripts/bootstrap_models.sh
docker compose -f deploy/docker-compose.models.yml up -d
docker compose -f deploy/docker-compose.models.yml ps
```

Create from template:

```bash
cp config.example.json config.json
```

Set the 3 provider roles:
- `transcription` → Whisper/OpenAI-compatible ASR endpoint
  - supports optional `language` (default `"en"`; use `"auto"` to enable autodetect)
- `reasoning` → reasoning model endpoint
- `vlm` → vision-language model endpoint
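A filled-in `config.json` might look like this; the field names and ports below are illustrative assumptions, so check `config.example.json` for the actual schema:

```json
{
  "providers": {
    "transcription": {
      "base_url": "http://localhost:8001",
      "model": "whisper-large-v3",
      "language": "auto"
    },
    "reasoning": {
      "base_url": "http://localhost:8002",
      "model": "my-reasoning-model"
    },
    "vlm": {
      "base_url": "http://localhost:8003",
      "model": "my-vlm-model"
    }
  }
}
```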
Use served model IDs from /v1/models (not raw filenames unless the server exposes those as IDs).
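An OpenAI-compatible server lists its served IDs in the `data` array of a `/v1/models` response. A sketch of pulling those IDs out of the parsed payload (the shape below is the standard OpenAI list format):

```python
import json

def served_model_ids(models_response: dict) -> list[str]:
    """Extract served model IDs from an OpenAI-style /v1/models payload."""
    return [entry["id"] for entry in models_response.get("data", [])]

payload = json.loads('{"object": "list", "data": [{"id": "whisper-large-v3", "object": "model"}]}')
print(served_model_ids(payload))  # ['whisper-large-v3']
```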
Validate + ping:

```bash
uv run video-skill config-validate --config config.json
uv run video-skill providers-ping --config config.json --path /v1/models
```

```bash
uv run video-skill --help
```

Key commands:
`transcribe`, `transcript-parse`, `transcript-chunk`, `steps-extract`, `frames-extract`, `steps-enrich`, `markdown-render`
Example video: datasets/demo/zac-game.mp4
```bash
# 1) ASR
uv run video-skill transcribe \
  --video datasets/demo/zac-game.mp4 \
  --out datasets/demo/zac-game.whisper.json \
  --config config.json
# optional override: --language auto (or --language es, --language fr, ...)
```
```bash
# 2) Parse transcript
uv run video-skill transcript-parse \
  --input datasets/demo/zac-game.whisper.json \
  --out datasets/demo/zac-game.segments.jsonl
```
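Conceptually, this step flattens Whisper's `segments` array into one JSON object per line. A stdlib-only sketch, assuming the Whisper output carries `start`, `end`, and `text` per segment (the real parser may keep more fields):

```python
import json

def parse_whisper(whisper: dict) -> list[dict]:
    """Flatten Whisper-style output into per-segment records."""
    return [
        {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
        for s in whisper.get("segments", [])
    ]

whisper = {"segments": [{"start": 0.0, "end": 4.2, "text": " Welcome to the game. "}]}
for record in parse_whisper(whisper):
    print(json.dumps(record))  # one JSONL line per segment
```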
```bash
# 3) Chunk transcript
uv run video-skill transcript-chunk \
  --segments datasets/demo/zac-game.segments.jsonl \
  --out datasets/demo/zac-game.chunks.jsonl \
  --window-s 120 \
  --overlap-s 15
```
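The windowing above (120 s windows with 15 s overlap) can be sketched as a sweep over segment start times whose stride is `window - overlap`; the project's chunker likely handles boundaries more carefully, so treat this as illustrative:

```python
def chunk_segments(segments: list[dict], window_s: float = 120.0,
                   overlap_s: float = 15.0) -> list[dict]:
    """Group segments into fixed windows whose starts advance by window - overlap."""
    if not segments:
        return []
    stride = window_s - overlap_s
    end_time = max(s["end"] for s in segments)
    chunks, t = [], 0.0
    while t < end_time:
        # A segment belongs to every window that contains its start time.
        members = [s for s in segments if t <= s["start"] < t + window_s]
        if members:
            chunks.append({"start": t, "end": t + window_s, "segments": members})
        t += stride
    return chunks

segs = [{"start": 0, "end": 10, "text": "a"}, {"start": 110, "end": 130, "text": "b"}]
print(len(chunk_segments(segs)))  # 2: segment "b" falls in two overlapping windows
```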
```bash
# 4) Extract steps (AI)
uv run video-skill steps-extract \
  --segments datasets/demo/zac-game.segments.jsonl \
  --clips-manifest datasets/demo/lesson1.clips.jsonl \
  --chunks datasets/demo/zac-game.chunks.jsonl \
  --mode ai \
  --config config.json \
  --out datasets/demo/zac-game.steps.ai.jsonl
```
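Each line of the resulting steps JSONL is one extracted step. A plausible record shape, purely as an assumption about the schema rather than the project's actual output, is:

```json
{"step_id": 1, "title": "Open the level editor", "start_s": 12.4, "end_s": 47.9, "evidence": "transcript chunk 1"}
```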
```bash
# 5) Extract per-step frames for VLM grounding
uv run video-skill frames-extract \
  --video datasets/demo/zac-game.mp4 \
  --steps datasets/demo/zac-game.steps.ai.jsonl \
  --out-dir datasets/demo/frames_zac_game \
  --manifest-out datasets/demo/zac-game.frames_manifest.jsonl \
  --sample-count 2
```
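With `--sample-count 2`, the extractor needs timestamps inside each step's time range to hand to ffmpeg. One common policy, shown here as an assumption rather than the project's actual sampling rule, is evenly spaced interior points:

```python
def sample_timestamps(start_s: float, end_s: float, count: int) -> list[float]:
    """Evenly spaced interior timestamps, avoiding the exact step boundaries."""
    span = end_s - start_s
    return [start_s + span * (i + 1) / (count + 1) for i in range(count)]

print(sample_timestamps(10.0, 40.0, 2))  # [20.0, 30.0]
```

Each timestamp would then drive one ffmpeg seek-and-grab per frame.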
```bash
# 6) Enrich steps (AI, two-pass visual)
uv run video-skill steps-enrich \
  --steps datasets/demo/zac-game.steps.ai.jsonl \
  --frames-manifest datasets/demo/zac-game.frames_manifest.jsonl \
  --out datasets/demo/zac-game.steps.enriched.ai.jsonl \
  --mode ai \
  --config config.json
```
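The `ai` mode's two-pass flow can be sketched abstractly: a reasoning pass drafts the enrichment from the step text, then a VLM pass grounds it against the sampled frames. The function below takes the two model calls as injected callables, since the actual prompts and provider API are internal to the project:

```python
from typing import Callable

def enrich_step(step: dict, frame_paths: list[str],
                reason: Callable[[dict], str],
                describe: Callable[[list[str]], str]) -> dict:
    """Two-pass enrichment: reasoning draft first, then visual grounding."""
    draft = reason(step)            # pass 1: text-only reasoning model
    visual = describe(frame_paths)  # pass 2: VLM looks at the sampled frames
    return {**step, "enriched": draft, "visual_context": visual}

step = {"title": "Open the editor"}
out = enrich_step(step, ["f1.png"], lambda s: "draft", lambda f: "screenshot of editor")
print(out["enriched"], out["visual_context"])
```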
```bash
# 7) Render markdown
uv run video-skill markdown-render \
  --steps datasets/demo/zac-game.steps.enriched.ai.jsonl \
  --out datasets/demo/zac-game.md \
  --title "Zac Game - Skill Steps"
```

- `--mode heuristic` - no model calls; deterministic baseline
- `--mode ai-direct` - VLM-only enrichment path
- `--mode ai` - reasoning + VLM orchestration (recommended)
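The rendering stage is essentially templating enriched step records into markdown. A minimal sketch, with an output layout that is an assumption rather than the project's actual template:

```python
def render_markdown(title: str, steps: list[dict]) -> str:
    """Render enriched steps as a simple markdown outline."""
    lines = [f"# {title}", ""]
    for i, step in enumerate(steps, start=1):
        lines.append(f"## {i}. {step['title']}")
        lines.append(step.get("enriched", ""))
        lines.append("")
    return "\n".join(lines)

doc = render_markdown("Zac Game - Skill Steps", [{"title": "Start", "enriched": "Press play."}])
print(doc.splitlines()[0])  # # Zac Game - Skill Steps
```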
`steps-enrich` prints progress per step/stage and summary telemetry:

- `parse_errors`
- `transient_recovered`
- `unresolved_final`
```bash
make verify
```

This runs lint + tests with a coverage gate (>=90%).
Typical outputs:

- `*.whisper.json`
- `*.segments.jsonl`
- `*.chunks.jsonl`
- `*.steps.ai.jsonl`
- `*.frames_manifest.jsonl`
- `*.steps.enriched.ai.jsonl`
- optional `*.errors.jsonl` for parse/call telemetry
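If you need to aggregate the `*.errors.jsonl` telemetry yourself, a `Counter` over each record's category works; the `category` field name here is an assumption about the file's schema:

```python
import json
from collections import Counter

def tally_errors(jsonl_lines: list[str]) -> Counter:
    """Count telemetry records by category (e.g. parse_errors, transient_recovered)."""
    return Counter(json.loads(line)["category"] for line in jsonl_lines if line.strip())

lines = ['{"category": "parse_errors"}', '{"category": "transient_recovered"}',
         '{"category": "parse_errors"}']
print(tally_errors(lines)["parse_errors"])  # 2
```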
The project is evolving toward a generalized video skill library with OTIO-ready timeline metadata and editor/robotics adapters.