---
name: video-skill
description: Run the video-skill pipeline to convert narrated videos into structured step data and enriched timeline-ready outputs. Use when a user asks to process a video into steps, run transcription/chunking/extraction/enrichment, debug provider connectivity, or generate markdown from extracted skills.
metadata:
---
Use this skill to run video-skill end-to-end or stage-by-stage.
Use one of these setup paths:
A) Run from the local source repo (recommended while iterating):

```bash
cd /path/to/videoskill
uv sync --dev
cp config.example.json config.json
```

Then run commands with `uv run`, for example:

```bash
uv run video-skill --help
```

Once set up, run `video-skill ...` directly from your working directory.
Verify providers before processing:

```bash
video-skill config-validate --config config.json
video-skill providers-ping --config config.json --path /v1/models
```

Run these from a working directory where config.json and the data paths are valid.
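If `providers-ping` itself fails, a quick manual reachability check amounts to a GET against the provider base URL joined with the `--path` value. A minimal sketch, assuming an OpenAI-style endpoint and a `base_url` taken from your config (`build_ping_url` and `ping_provider` are hypothetical helpers, not part of the CLI):

```python
import urllib.request


def build_ping_url(base_url: str, path: str = "/v1/models") -> str:
    """Join a provider base URL and a probe path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def ping_provider(base_url: str, path: str = "/v1/models", timeout: float = 10.0) -> int:
    """GET the probe URL and return the HTTP status code.

    Raises URLError on DNS/connect failures, which is exactly the class of
    problem providers-ping is meant to surface.
    """
    req = urllib.request.Request(
        build_ping_url(base_url, path),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A non-200 status or a raised `URLError` points at the same endpoint DNS/host issues described in the troubleshooting notes below.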
```bash
video-skill transcribe --video <video.mp4> --out <name>.whisper.json --config config.json
video-skill transcript-parse --input <name>.whisper.json --out <name>.segments.jsonl
video-skill transcript-chunk --segments <name>.segments.jsonl --out <name>.chunks.jsonl --window-s 120 --overlap-s 15
video-skill steps-extract --segments <name>.segments.jsonl --clips-manifest <clips>.jsonl --chunks <name>.chunks.jsonl --mode ai --config config.json --out <name>.steps.ai.jsonl
video-skill frames-extract --video <video.mp4> --steps <name>.steps.ai.jsonl --out-dir <frames_dir> --manifest-out <name>.frames_manifest.jsonl --sample-count 2
video-skill steps-enrich --steps <name>.steps.ai.jsonl --frames-manifest <name>.frames_manifest.jsonl --out <name>.steps.enriched.ai.jsonl --mode ai --config config.json
video-skill markdown-render --steps <name>.steps.enriched.ai.jsonl --out <name>.md --title "<Title>"
```

Mode options:
- `--mode heuristic`: deterministic, no model calls
- `--mode ai-direct`: VLM-centric enrichment
- `--mode ai`: reasoning + VLM orchestration (default for quality)
Prefer `--mode ai` unless the user asks for debugging or reduced model usage.
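The `--window-s 120 --overlap-s 15` parameters to `transcript-chunk` above describe a sliding window over the parsed segments. A minimal sketch of that windowing, assuming segments are dicts with `start`, `end`, and `text` keys (the actual JSONL schema is not specified here):

```python
def chunk_segments(segments, window_s=120.0, overlap_s=15.0):
    """Group time-stamped segments into overlapping windows.

    Each chunk covers [t, t + window_s); consecutive chunks share
    overlap_s seconds, i.e. the stride is window_s - overlap_s.
    """
    if not segments:
        return []
    stride = window_s - overlap_s
    total = max(s["end"] for s in segments)
    chunks = []
    t = 0.0
    while t < total:
        lo, hi = t, t + window_s
        # A segment belongs to a window if their time ranges intersect.
        members = [s for s in segments if s["start"] < hi and s["end"] > lo]
        if members:
            chunks.append({"start": lo, "end": min(hi, total), "segments": members})
        t += stride
    return chunks
```

With a 120 s window and 15 s overlap the stride is 105 s, so a segment near a boundary appears in two consecutive chunks rather than being cut off.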
`steps-enrich` emits:
- per-step progress logs
- summary metrics: `parse_errors`, `transient_recovered`, `unresolved_final`
- a detailed `*.errors.jsonl` when any errors occur
If runs fail unexpectedly:
- re-run `providers-ping`
- inspect `*.errors.jsonl` by stage (`sampling_plan`, `vlm_judge`, `vlm_select_frames`, `vlm_signal_pass`, `reasoning_finalize`)
- verify endpoint DNS/host reachability
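To see which stage dominates a failed run, the `*.errors.jsonl` file can be tallied by stage. A minimal sketch, assuming one JSON object per line with a `stage` field (the exact record schema is not specified here):

```python
import json
from collections import Counter


def errors_by_stage(path):
    """Count error records in a *.errors.jsonl file, keyed by their stage."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            rec = json.loads(line)
            counts[rec.get("stage", "unknown")] += 1
    return counts
```

A spike in a single stage (e.g. `vlm_judge`) usually points at one provider, while errors spread across all stages suggest connectivity problems worth re-checking with `providers-ping`.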
Always run:

```bash
video-skill --help
```

Use `make verify` only when working from the source repo.