The Agent OS for AI Video Generation
Orchestrate multiple AI models. Automate entire video pipelines.
From script to publish — one command, one flow, zero babysitting.
Quick Start • Features • ClawFlow • Architecture • Models • Contributing
git clone https://github.com/moose-lab/videoclaw.git
cd videoclaw
uv sync # Python 3.12+, installs deps + creates .venv
uv run claw --help  # Done. No activation needed.

# Or activate the venv to use claw directly:
source .venv/bin/activate && claw --help
VideoClaw doesn't generate videos. It orchestrates the models that do.
Think Kubernetes for containers, but for AI video generation.
You've tried Sora, Runway, Kling, CogVideo... Each is impressive alone. But making a real video still means:
- Writing prompts for each shot manually
- Waiting, downloading, re-uploading between tools
- No idea what it costs until the bill arrives
- Starting from scratch when one shot fails
- Manually stitching, adding subtitles, music, voiceover
VideoClaw fixes all of this.
# Clone and install (requires Python 3.12+ and uv)
git clone https://github.com/moose-lab/videoclaw.git
cd videoclaw
uv sync # Install dependencies + create .venv
# Option A: Use uv run (recommended, no activation needed)
uv run claw --help
uv run claw doctor # Check system readiness
# Option B: Activate virtualenv, then use claw directly
source .venv/bin/activate
claw --help
claw doctor
# Generate a video from a single prompt
uv run claw generate "A 30-second product intro for a smart watch, cinematic style"
# Or run individual stages independently
uv run claw video "A cat riding a skateboard" -d 5 -o cat.mp4
uv run claw image "Character portrait" --provider gemini -o portrait.png
uv run claw tts "Hello world" --lang en -o hello.mp3
uv run claw storyboard "Product unboxing" -d 30 -o shots.json
# Agent-friendly: JSON output for programmatic use
uv run claw -j video "sunset over ocean" -o sunset.mp4
# → {"ok": true, "command": "video", "data": {"path": "...", "cost_usd": 0.05}, "error": null}
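The JSON envelope makes `claw` easy to drive from another program: parse stdout, check `ok`, and read `data`. A minimal sketch in Python (the envelope below mirrors the documented shape; the `sunset.mp4` path is an illustrative value):

```python
import json

def parse_claw_output(raw: str) -> dict:
    """Parse the JSON envelope emitted by `claw -j` and return the data payload."""
    result = json.loads(raw)
    if not result.get("ok"):
        raise RuntimeError(result.get("error") or "claw command failed")
    return result["data"]

# Example envelope matching the documented schema (values are illustrative)
raw = '{"ok": true, "command": "video", "data": {"path": "sunset.mp4", "cost_usd": 0.05}, "error": null}'
data = parse_claw_output(raw)
print(data["cost_usd"])  # → 0.05
```

In practice you would obtain `raw` from `subprocess.run(["uv", "run", "claw", "-j", ...], capture_output=True, text=True).stdout`.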
# Or run a YAML pipeline
uv run claw flow run examples/product-promo.yaml

Define your entire video pipeline in a version-controllable YAML file:
name: product-promo
variables:
  product: "VideoClaw"
steps:
  - id: script
    type: script_gen
    params:
      prompt: "Write a promo for {{product}}"
  - id: storyboard
    type: storyboard
    depends_on: [script]
  - id: hero_shot
    type: video_gen
    depends_on: [storyboard]
    params:
      prompt: "{{product}} logo reveal, cinematic"
      model_id: sora
  - id: narration
    type: tts
    depends_on: [script]
  - id: compose
    type: compose
    depends_on: [hero_shot, narration]
  - id: render
    type: render
    depends_on: [compose]

Features: variable interpolation ({{var}}), dependency validation, cycle detection, parallel execution of independent steps.
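Dependency validation and cycle detection boil down to a topological sort over the `depends_on` edges. An illustrative sketch using Python's standard library (not VideoClaw's internals):

```python
from graphlib import TopologicalSorter, CycleError

def validate_steps(steps: list[dict]) -> list[str]:
    """Return a valid execution order for the steps, or raise on
    unknown dependencies and cycles."""
    ids = {s["id"] for s in steps}
    graph = {}
    for s in steps:
        deps = s.get("depends_on", [])
        missing = set(deps) - ids
        if missing:
            raise ValueError(f"step {s['id']!r} depends on unknown steps: {missing}")
        graph[s["id"]] = set(deps)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as e:
        raise ValueError(f"cycle detected: {e.args[1]}") from e

# The product-promo pipeline above, as step dicts
steps = [
    {"id": "script", "type": "script_gen"},
    {"id": "storyboard", "type": "storyboard", "depends_on": ["script"]},
    {"id": "hero_shot", "type": "video_gen", "depends_on": ["storyboard"]},
    {"id": "narration", "type": "tts", "depends_on": ["script"]},
    {"id": "compose", "type": "compose", "depends_on": ["hero_shot", "narration"]},
]
order = validate_steps(steps)
```

`static_order` also reveals which steps share no dependency path (here `hero_shot` and `narration`), which is exactly what makes parallel execution of independent steps possible.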
claw flow validate my-pipeline.yaml # Check without running
claw flow run my-pipeline.yaml       # Execute the pipeline

VideoClaw includes a complete production pipeline for TikTok-format Western AI short dramas — from script import to published episode:
# Import a script and set up the series
claw drama import script.docx --title "Satan in a Suit" --language en
# Design character turnaround sheets for visual consistency
claw drama design-characters <series_id>
# Preview Seedance 2.0 prompts before spending API credits
claw drama preview-prompts <series_id>
# Run the full pipeline: design → generate → audit → fix → export
claw drama pipeline <series_id> --episode 1
# Or run individual stages
claw drama run <series_id> --max-shots 5 # Test with first 5 shots
claw drama audit <series_id> # Vision QA with Claude
claw drama audit-regen <series_id> # Auto-fix failing shots
claw drama export <series_id>        # Export deliverables

Key capabilities:
- Seedance 2.0 video generation (9:16 vertical, 720p)
- Character consistency via Universal Reference turnaround sheets
- Vision QA with Claude for automated shot quality review
- Self-correcting audit-regen loop — bad shots are auto-detected and regenerated
- Multi-episode series with cross-episode continuity
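The audit-regen loop amounts to: audit each shot, regenerate the failures, and repeat until everything passes or a retry budget runs out. An illustrative sketch (the function names and quality scores are hypothetical, not VideoClaw's API):

```python
def audit_regen_loop(shots, audit, regenerate, max_rounds=3):
    """Repeatedly audit shots and regenerate failures until all pass
    or the retry budget is exhausted. `audit(shot)` returns True on pass."""
    for _ in range(max_rounds):
        failing = [s for s in shots if not audit(s)]
        if not failing:
            return True  # every shot passed QA
        for shot in failing:
            regenerate(shot)  # only the bad shots are redone
    return False  # still failing after max_rounds

# Toy example: a shot passes once its quality score clears a threshold
shots = [{"id": 1, "quality": 0.9}, {"id": 2, "quality": 0.4}]
def audit(s): return s["quality"] >= 0.7
def regenerate(s): s["quality"] += 0.35  # pretend each retry improves the shot
ok = audit_regen_loop(shots, audit, regenerate)
```

The point of the loop is that only failing shots are regenerated, so a single bad shot never forces a full re-render of the episode.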
One pipeline, multiple models. VideoClaw picks the best model for each shot based on your strategy — quality, speed, or cost.
Same 30-second video:

| Strategy | Cost | Time | Notes |
|---|---|---|---|
| All Sora | $2.50 | ~3 min | |
| VideoClaw hybrid | $0.47 | ~2 min | Auto-routes simple shots locally |
| VideoClaw all-local | $0.00 | ~6 min | |
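The routing idea can be sketched as: each shot carries a complexity estimate, and the chosen strategy picks among models with known cost/speed tradeoffs. The model table and thresholds below are illustrative assumptions, not VideoClaw's actual routing logic:

```python
# Hypothetical per-clip cost/quality/speed figures, for illustration only
MODELS = {
    "sora":  {"cost_usd": 0.50, "quality": 0.95, "speed": 0.6},
    "kling": {"cost_usd": 0.20, "quality": 0.85, "speed": 0.7},
    "local": {"cost_usd": 0.00, "quality": 0.60, "speed": 0.3},
}

def route_shot(complexity: float, strategy: str) -> str:
    """Pick a model for one shot. Under the 'cost' strategy, simple
    shots go to free local inference; complex shots get a cloud model."""
    if strategy == "quality":
        return "sora"
    if strategy == "cost" and complexity < 0.4:
        return "local"  # simple shot: route locally for $0.00
    if strategy == "speed":
        return max(MODELS, key=lambda m: MODELS[m]["speed"])
    return "kling"

assert route_shot(0.2, "cost") == "local"
assert route_shot(0.8, "cost") == "kling"
```

This is why the hybrid row above lands between the all-cloud and all-local extremes: the router spends money only on shots that need it.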
The Director takes your prompt and uses an LLM to produce a structured production plan: scene breakdown, visual descriptions, camera movements, voiceover script, and music style. Supports prompt refinement based on reviewer feedback.
Protocol-based AI agents that think, act, review, and collaborate. Four built-in agents:
| Agent | Role | Wraps |
|---|---|---|
| DirectorAgent | Production planning, prompt refinement | Director, DramaPlanner |
| CameramanAgent | Visual prompt enhancement, shot generation | PromptEnhancer, VideoGenerator |
| ReviewerAgent | Vision QA, quality validation | VisionAuditor, QualityValidator |
| ProducerAgent | Pipeline orchestration, budget tracking | DramaRunner, CostTracker |
Agents plug into the DAG Executor via AgentTeam.install_handlers() — zero changes to the core pipeline. Third-party agents are auto-discovered via entry points.
Real-time per-node cost display. Budget guards. Optimization hints. Know exactly what every video costs.
Dependency-aware parallel execution. Shots generate concurrently. If one fails, others keep running. Resume from any checkpoint.
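The failure-isolation property — one shot failing while its siblings keep running — falls out naturally from gathering independent nodes with exceptions captured rather than raised. A minimal sketch (not VideoClaw's executor):

```python
import asyncio

async def run_layer(tasks: dict) -> dict:
    """Run independent DAG nodes concurrently; one node's failure does
    not cancel the others (return_exceptions=True isolates errors)."""
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return dict(zip(tasks.keys(), results))

async def demo():
    async def ok(name: str) -> str:
        await asyncio.sleep(0.01)  # stand-in for a model API call
        return f"{name}: done"
    async def bad() -> str:
        raise RuntimeError("shot failed")
    return await run_layer({"shot_a": ok("a"), "shot_b": bad(), "shot_c": ok("c")})

results = asyncio.run(demo())
# shot_b's exception is captured as a value; shot_a and shot_c complete
```

An executor can then persist `results` as a checkpoint and retry only the entries that are exceptions.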
Designed for local inference on Mac. MPS backend support for PyTorch-based models.
You --> AgentTeam --> DirectorAgent --> Planner --> DAG Executor
         |                |                              |
         |                v                     +--------+--------+
         |          CameramanAgent              v        v        v
         |                |                [Seedance] [Kling]  [Mock]
         |                v                     |        |        |
         |          ReviewerAgent               +--------+--------+
         |                |                              v
         v                v                       Compose → Render
   ProducerAgent --> Quality Gate                        |
                                                         v
                                                 Output / Publish
Seven-layer design:
| Layer | Purpose |
|---|---|
| Interface | CLI (claw) + REST API (optional) |
| Gateway | FastAPI server, WebSocket progress |
| Agent Runtime | AgentTeam, Director, Reviewer, Cameraman, Producer |
| Orchestration | DAG Planner, Executor, Event Bus, State Manager |
| Generation | Script, Storyboard, Video, TTS, Music, Compose |
| Model Adapters | Protocol-based adapters (Seedance, Kling, OpenAI, etc.) |
| Distribution | Publishers (YouTube, TikTok, Bilibili) |
| Category | Models | Mode |
|---|---|---|
| Video | Seedance 2.0, Kling, Sora (OpenAI), MiniMax, ZhipuAI, CogVideoX | Cloud + Local |
| LLM | Claude, GPT, Qwen, DeepSeek, Ollama (via LiteLLM) | Cloud + Local |
| TTS | Edge-TTS, Fish-Speech, ElevenLabs, ChatTTS | Cloud + Local |
| Music | Suno, Udio, MusicGen | Cloud + Local |
Adding a new model? Implement the `VideoModelAdapter` protocol (4 async methods). No ABC inheritance needed.
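A structural protocol means any class exposing the right async methods qualifies, with no base class to inherit. An illustrative sketch (these four method names are assumptions for illustration, not VideoClaw's actual signatures):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class VideoModelAdapter(Protocol):
    """Structural interface: any class providing these async methods
    satisfies the protocol. Method names here are illustrative."""
    async def generate(self, prompt: str, duration: float) -> str: ...
    async def status(self, job_id: str) -> str: ...
    async def download(self, job_id: str, path: str) -> str: ...
    async def estimate_cost(self, duration: float) -> float: ...

class MockAdapter:
    """Satisfies VideoModelAdapter without inheriting from anything."""
    async def generate(self, prompt: str, duration: float) -> str:
        return "job-123"
    async def status(self, job_id: str) -> str:
        return "done"
    async def download(self, job_id: str, path: str) -> str:
        return path
    async def estimate_cost(self, duration: float) -> float:
        return 0.0

assert isinstance(MockAdapter(), VideoModelAdapter)  # duck typing, checked
```

`@runtime_checkable` lets the registry verify an adapter at registration time instead of failing mid-pipeline.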
All commands support `--json` / `-j` for structured JSON output (agent-friendly).
# Full pipeline
claw generate <prompt> # Script → shots → compose → render
claw generate <prompt> --dry-run # Preview DAG without executing
# Single-stage commands (run each step independently)
claw video <prompt> # Generate a single video clip
claw image <prompt> # Generate a single image
claw tts <text> # Text-to-speech (supports stdin pipe)
claw storyboard <prompt> # Decompose prompt into shot list
claw compose <v1.mp4> <v2.mp4> ... # Compose multiple clips together
claw render <input.mp4> # Encode/render final video
claw subtitle <scenes.json> # Generate SRT/ASS subtitles
# ClawFlow YAML pipelines
claw flow run <file.yaml> # Execute a pipeline
claw flow validate <file.yaml> # Validate without running
# AI short drama series (full production pipeline)
claw drama new <synopsis> # Create series from concept
claw drama import <script.docx> # Import complete script (locked mode)
claw drama plan <id> # Plan episodes via LLM
claw drama script <id> # Generate episode scripts
claw drama design-characters <id> # Generate turnaround sheets
claw drama design-scenes <id> # Generate scene reference images
claw drama assign-voices <id> # Assign voice profiles
claw drama preview-prompts <id> # Preview Seedance 2.0 prompts
claw drama run <id> # Execute generation pipeline
claw drama audit <id> # Vision QA with Claude
claw drama audit-regen <id> # Auto-fix failing shots
claw drama pipeline <id> # Full pipeline (design → run → audit)
claw drama regen-shot <id> <shot> # Regenerate single shot
claw drama export <id> # Export deliverables
claw drama list # List all series
claw drama show <id> # Show series details
# Management
claw config show # View all config (API keys masked)
claw config check # Validate config completeness
claw doctor # System health check
claw model list # List model adapters
claw project list # List all projects
claw project show <id> # Show project details
claw project delete <id>           # Delete project and assets

# Start the server
uvicorn videoclaw.server.app:create_app --factory
# Endpoints
GET /health # Health check
POST /api/projects/ # Create project
GET /api/projects/ # List projects
GET /api/projects/{id} # Get project details
DELETE /api/projects/{id} # Delete project
POST /api/generate/ # Start generation pipeline
POST /api/generate/flow # Run a ClawFlow pipeline
GET  /api/generate/{id}/status     # Check generation status
WS   /ws/{project_id}              # Real-time progress updates

docker compose up
# API available at http://localhost:8000

videoclaw/
├── src/videoclaw/
│ ├── cli/ # CLI package (Typer + Rich)
│ │ ├── _app.py # App definition, validators, helpers
│ │ ├── _output.py # JSON output mode (OutputContext)
│ │ ├── stage.py # Single-stage commands (video/image/tts/...)
│ │ ├── generate.py # Full pipeline command
│ │ ├── drama.py # Drama series commands
│ │ ├── config_cmd.py # Config management
│ │ └── ... # doctor, model, project, template, flow
│ ├── config.py # Configuration (Pydantic Settings)
│ ├── core/ # Director, DAG engine, state, events
│ ├── agents/ # Video Agent framework (Director, Reviewer, Cameraman, Producer)
│ ├── models/ # Model adapters, registry, LLM wrapper
│ ├── generation/ # Script, storyboard, video, audio, compose
│ ├── drama/ # AI short drama orchestration
│ ├── cost/ # Cost tracking + budget guards
│ ├── flow/ # ClawFlow YAML parser + runner
│ ├── server/ # FastAPI REST API (optional, headless)
│ ├── storage/ # Local filesystem storage
│ ├── publishers/ # YouTube, Bilibili publishers
│ └── utils/ # FFmpeg helpers
├── examples/ # Example ClawFlow YAML pipelines
├── tests/ # Unit + integration tests
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml
# Set environment variables or use .env file
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | For Sora/GPT | OpenAI API key |
| ANTHROPIC_API_KEY | For Claude | Anthropic API key |
| VIDEOCLAW_DEFAULT_LLM | No | Default LLM (default: gpt-4o) |
| VIDEOCLAW_DEFAULT_VIDEO_MODEL | No | Default video model (default: mock) |
| VIDEOCLAW_PROJECTS_DIR | No | Project storage path (default: ./projects) |
| VIDEOCLAW_BUDGET_DEFAULT_USD | No | Default budget cap (default: 10.0) |
git clone https://github.com/moose-lab/videoclaw.git
cd videoclaw
uv sync --all-extras # Install all deps including dev/server
# or: make dev
uv run pytest tests/ -v # Run tests
uv run ruff check src/ tests/ # Lint
# or: make test / make lint

- Phase 1: Core engine, DAG executor, model adapters, CLI, cost tracking
- Phase 2: FastAPI server, WebSocket, storage, publishers, test suite
- Phase 3: ClawFlow YAML engine, integration tests, Docker
- Phase 4: Director LLM integration, GitHub Actions CI, flow templates
- Phase 5: AI Short Drama orchestration, Seedance 2.0, Vision QA, audit-regen loop
- Phase 6: Agent framework (Director, Reviewer, Cameraman, Producer), AgentTeam, entry-point discovery
- Phase 7: Multi-agent collaboration, MCP server, skill/tool integration
- Phase 8: Plugin marketplace (ClawHub) + universal video orchestration platform
Modified MIT — see LICENSE for details.