Give Codex a local video perception layer.
This Codex plugin extracts timestamped frames with ffmpeg, analyzes video structure with ffmpeg filters, and transcribes audio through Gemini API, local Whisper, or OpenAI Whisper. Codex receives a compact packet of metadata, frame images, scene/silence analysis, and timestamped transcript segments.
This project is a Codex-oriented port of Jordan Vasconcelos's MIT-licensed claude-video-vision. The original copyright and license are preserved in LICENSE.
video_info: reads duration, resolution, codec, frame rate, file size, and audio presence.video_analyze: detects scene changes, black intervals, silence, freeze, motion profile, blur, exposure, loudness, and optional transcription.video_watch: extracts frames and audio/transcript for an initial pass.video_detail: drills into specific timestamp windows at higher resolution or FPS.video_configure: saves backend and extraction preferences.video_setup: checks dependencies and suggests install steps.
The plugin is a perception layer. The agent still does the interpretation: summarizing a video, comparing it to a storyboard, identifying missing beats, or producing a review packet.
The plugin includes two Codex skills:
video-perception: general video inspection workflow.storyboard-review: compares a render against a storyboard, script, shot list, or production brief.
- Node.js 20+
ffmpegandffprobe- Optional audio backend:
- Gemini API with
GEMINI_API_KEY - local Whisper via
whisper.cppor Pythonopenai-whisper - OpenAI Whisper API with
OPENAI_API_KEY
- Gemini API with
Fastest path: ask your Codex agent to install it for you.
Install Codex Video Vision from https://github.com/EyrieCommander/codex-video-vision. Clone it to ~/.codex-video-vision/plugin, run npm ci and npm run build in mcp-server, install the video-perception and storyboard-review skills from the GitHub repo into ~/.codex/skills, add an [mcp_servers.codex-video-vision] entry to ~/.codex/config.toml that runs node against ~/.codex-video-vision/plugin/mcp-server/dist/index.js, run npm run smoke:mcp, and then tell me to restart Codex.
The skill install command your agent should use is:
/usr/bin/python3 ~/.codex/skills/.system/skill-installer/scripts/install-skill-from-github.py \
--repo EyrieCommander/codex-video-vision \
--path skills/video-perception skills/storyboard-reviewThe MCP server entry should look like this, with the absolute path adjusted if you cloned elsewhere:
[mcp_servers.codex-video-vision]
command = "node"
args = ["/Users/you/.codex-video-vision/plugin/mcp-server/dist/index.js"]Restart Codex after installing. The $ skill menu is populated from installed skills, so the repo merely existing on disk is not enough.
Build the MCP server:
cd mcp-server
npm install
npm run build
npm test
npm run smoke:mcpThe plugin manifest lives at .codex-plugin/plugin.json. The local MCP config in .mcp.json starts the built server with:
{
"codex-video-vision": {
"command": "node",
"args": ["./mcp-server/dist/index.js"]
}
}Configuration is stored under:
~/.codex-video-vision/config.json
~/.codex-video-vision/.env
~/.codex-video-vision/models/
~/.codex-video-vision/sessions/
API keys can come from the process environment or from a gitignored .env file. The server loads OPENAI_API_KEY and GEMINI_API_KEY from the shell environment first, then from .env in the current working directory, then from ~/.codex-video-vision/.env.
Example user-level key file:
mkdir -p ~/.codex-video-vision
printf 'OPENAI_API_KEY=your_openai_api_key_here\n' > ~/.codex-video-vision/.env
chmod 600 ~/.codex-video-vision/.env- Start with
video_info. - For videos longer than 30 seconds, run
video_analyzebefore extracting frames. - Use the transcript, scene changes, and silence intervals to choose frame segments.
- Run
video_watchfor an overview. - Use
video_detailfor close inspection around key moments. - Avoid re-extracting cached frames when
enable_indexis on.
For educational video production, the useful Codex task is often not "summarize this video" but "compare this render against the intended script." A good review packet should include:
- Source video path and duration.
- Storyboard path and intended coverage.
- Scenes or shots visibly covered.
- Missing or weak story beats.
- Transcript/caption status.
- Visual consistency issues.
- App integration status.
- Recommendation: accept, revise, extend, or regenerate.
The repository metadata points to https://github.com/EyrieCommander/codex-video-vision. Before a release, run npm run build, npm test, npm run smoke:mcp, and npm pack --dry-run from mcp-server. The GitHub release workflow is manual until npm Trusted Publisher setup is intentional. If publishing to npm later, switch .mcp.json from local node ./mcp-server/dist/index.js to the published package command.
The hero image in assets/hero.avif is inherited from the upstream claude-video-vision project. Replace it before using a distinct project brand or marketplace listing.
MIT. Based on claude-video-vision by Jordan Vasconcelos.