Codex Video Vision

Give Codex a local video perception layer.

This Codex plugin extracts timestamped frames with ffmpeg, analyzes video structure with ffmpeg filters, and transcribes audio through Gemini API, local Whisper, or OpenAI Whisper. Codex receives a compact packet of metadata, frame images, scene/silence analysis, and timestamped transcript segments.

This project is a Codex-oriented port of Jordan Vasconcelos's MIT-licensed claude-video-vision. The original copyright and license are preserved in LICENSE.

What It Does

video_info: reads duration, resolution, codec, frame rate, file size, and audio presence.
video_analyze: detects scene changes, black intervals, silence, freeze, motion profile, blur, exposure, loudness, and optional transcription.
video_watch: extracts frames and audio/transcript for an initial pass.
video_detail: drills into specific timestamp windows at higher resolution or FPS.
video_configure: saves backend and extraction preferences.
video_setup: checks dependencies and suggests install steps.

The plugin is a perception layer. The agent still does the interpretation: summarizing a video, comparing it to a storyboard, identifying missing beats, or producing a review packet.

The plugin includes two Codex skills:

video-perception: general video inspection workflow.
storyboard-review: compares a render against a storyboard, script, shot list, or production brief.

Requirements

Node.js 20+
ffmpeg and ffprobe
Optional audio backend:
- Gemini API with GEMINI_API_KEY
- local Whisper via whisper.cpp or Python openai-whisper
- OpenAI Whisper API with OPENAI_API_KEY

Install In Codex

Fastest path: ask your Codex agent to install it for you.

Install Codex Video Vision from https://github.com/EyrieCommander/codex-video-vision. Clone it to ~/.codex-video-vision/plugin, run npm ci and npm run build in mcp-server, install the video-perception and storyboard-review skills from the GitHub repo into ~/.codex/skills, add an [mcp_servers.codex-video-vision] entry to ~/.codex/config.toml that runs node against ~/.codex-video-vision/plugin/mcp-server/dist/index.js, run npm run smoke:mcp, and then tell me to restart Codex.

The skill install command your agent should use is:

/usr/bin/python3 ~/.codex/skills/.system/skill-installer/scripts/install-skill-from-github.py \
  --repo EyrieCommander/codex-video-vision \
  --path skills/video-perception skills/storyboard-review

The MCP server entry should look like this, with the absolute path adjusted if you cloned elsewhere:

[mcp_servers.codex-video-vision]
command = "node"
args = ["/Users/you/.codex-video-vision/plugin/mcp-server/dist/index.js"]

Restart Codex after installing. The $ skill menu is populated from installed skills, so the repo merely existing on disk is not enough.

Local Development

Build the MCP server:

cd mcp-server
npm install
npm run build
npm test
npm run smoke:mcp

The plugin manifest lives at .codex-plugin/plugin.json. The local MCP config in .mcp.json starts the built server with:

{
  "codex-video-vision": {
    "command": "node",
    "args": ["./mcp-server/dist/index.js"]
  }
}

Configuration is stored under:

~/.codex-video-vision/config.json
~/.codex-video-vision/.env
~/.codex-video-vision/models/
~/.codex-video-vision/sessions/

API keys can come from the process environment or from a gitignored .env file. The server loads OPENAI_API_KEY and GEMINI_API_KEY from the shell environment first, then from .env in the current working directory, then from ~/.codex-video-vision/.env.

Example user-level key file:

mkdir -p ~/.codex-video-vision
printf 'OPENAI_API_KEY=your_openai_api_key_here\n' > ~/.codex-video-vision/.env
chmod 600 ~/.codex-video-vision/.env

Recommended Agent Workflow

Start with video_info.
For videos longer than 30 seconds, run video_analyze before extracting frames.
Use the transcript, scene changes, and silence intervals to choose frame segments.
Run video_watch for an overview.
Use video_detail for close inspection around key moments.
Avoid re-extracting cached frames when enable_index is on.

Storyboard Review Use Case

For educational video production, the useful Codex task is often not "summarize this video" but "compare this render against the intended script." A good review packet should include:

Source video path and duration.
Storyboard path and intended coverage.
Scenes or shots visibly covered.
Missing or weak story beats.
Transcript/caption status.
Visual consistency issues.
App integration status.
Recommendation: accept, revise, extend, or regenerate.

Publishing Notes

The repository metadata points to https://github.com/EyrieCommander/codex-video-vision. Before a release, run npm run build, npm test, npm run smoke:mcp, and npm pack --dry-run from mcp-server. The GitHub release workflow is manual until npm Trusted Publisher setup is intentional. If publishing to npm later, switch .mcp.json from local node ./mcp-server/dist/index.js to the published package command.

The hero image in assets/hero.avif is inherited from the upstream claude-video-vision project. Replace it before using a distinct project brand or marketplace listing.

License

MIT. Based on claude-video-vision by Jordan Vasconcelos.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.codex-plugin		.codex-plugin
.github		.github
assets		assets
commands		commands
mcp-server		mcp-server
skills		skills
.gitignore		.gitignore
.mcp.json		.mcp.json
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PRIVACY.md		PRIVACY.md
README.md		README.md
SECURITY.md		SECURITY.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Codex Video Vision

What It Does

Requirements

Install In Codex

Local Development

Recommended Agent Workflow

Storyboard Review Use Case

Publishing Notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Codex Video Vision

What It Does

Requirements

Install In Codex

Local Development

Recommended Agent Workflow

Storyboard Review Use Case

Publishing Notes

License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages