VisualGrammar: Shot-Level Visual Language Analysis for Animation

VisualGrammar is a shot-level visual language analysis system for animation. It translates Bruce Block-style visual concepts into computable evidence, reviewable labels, and editor-ready outputs.

Why this project exists

Most video tooling stops at either low-level CV signals or high-level black-box semantics. VisualGrammar is designed to sit in the middle:

measure visual elements at the shot level
preserve a structured label contract instead of only free-form descriptions
support human review and calibration instead of pretending the first output is always correct
export results into downstream editorial workflows

This repository is therefore both a research-and-tooling codebase and a practical workflow prototype.

What the system does today

Current end-to-end workflow:

detect shots from an input video
extract structured visual evidence
assign shot-level labels and review flags
inspect results in a local review UI
correct labels through human annotation
export data and marker-oriented outputs for downstream use
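
This README does not specify how the first step, shot detection, is implemented. As a minimal sketch of one common classical approach (hard-cut detection by thresholding the mean absolute difference between consecutive frames), the example below assumes grayscale frames given as flat lists of pixel intensities. The names `mean_abs_diff` and `detect_shots`, the frame representation, and the threshold value are illustrative assumptions, not the project's actual detector.

```python
from typing import List, Sequence, Tuple


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_shots(
    frames: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) pairs.

    A cut is placed wherever the difference between two consecutive
    frames exceeds `threshold` (hypothetical parameter).
    """
    if not frames:
        return []
    shots = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


# Three dark frames followed by two bright frames.
frames = [[10, 10, 10, 10]] * 3 + [[200, 200, 200, 200]] * 2
print(detect_shots(frames, threshold=12.0))
```

Animation often contains held frames and abrupt style changes, so a single global threshold like this is only a starting point; a real detector would likely need smoothing or adaptive thresholds.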

Current supported runtime surfaces include:

shot analysis over video files
local review web UI
shot-level 8-field annotation persistence
segment-level rhythm review persistence
calibration-note generation
marker CSV and rich marker CSV export
FCPXML sidecar export for editorial inspection

Visual-element coverage

Phase 1 core labels:

duration_class: short, medium, long
tone_class: high_key, mid_key, low_key
temperature_class: warm, cool, neutral
saturation_class: low_saturation, mid_saturation, high_saturation
motion_class: static, slow, active

Current runtime extensions:

line_dominance_class: horizontal, vertical, diagonal, curved, mixed
shape_family_class: rectilinear, curvilinear, mixed, indeterminate
space_depth_class: flat, layered, deep

Each runtime extension also exposes confidence and/or review-oriented fields so the system can support calibration instead of only one-pass auto labeling.
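
The label vocabulary above can be treated as a small contract that every record is checked against. The following is a minimal sketch: the field names and allowed values are copied from the lists above, while `LABEL_CONTRACT` and `validate_labels` are hypothetical names that are not taken from the codebase.

```python
from typing import Dict, List, Tuple

# Allowed values per label field, copied from the lists in this README.
LABEL_CONTRACT: Dict[str, Tuple[str, ...]] = {
    "duration_class": ("short", "medium", "long"),
    "tone_class": ("high_key", "mid_key", "low_key"),
    "temperature_class": ("warm", "cool", "neutral"),
    "saturation_class": ("low_saturation", "mid_saturation", "high_saturation"),
    "motion_class": ("static", "slow", "active"),
    "line_dominance_class": ("horizontal", "vertical", "diagonal", "curved", "mixed"),
    "shape_family_class": ("rectilinear", "curvilinear", "mixed", "indeterminate"),
    "space_depth_class": ("flat", "layered", "deep"),
}


def validate_labels(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, value in record.items():
        allowed = LABEL_CONTRACT.get(field)
        if allowed is None:
            problems.append(f"unknown field: {field}")
        elif value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {allowed}")
    return problems


print(validate_labels({"tone_class": "low_key", "motion_class": "static"}))
print(validate_labels({"tone_class": "dark"}))
```

Checking human annotations and automatic labels against the same table keeps both sides of the review loop on one vocabulary.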

System architecture

At a high level, the repository is organized around three layers:

Measurement layer
- shot segmentation
- classical CV evidence extraction
- structured shot- and segment-level records
Decision and review layer
- rule / threshold / scorecard-based labels
- confidence and needs_review routing
- human correction and calibration workflow
Workflow and export layer
- review UI
- JSON / CSV / Parquet outputs
- marker export
- FCPXML sidecar export

This layered structure is the backbone of the project.
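
To make the decision and review layer concrete, here is a minimal sketch of a threshold-based label with confidence and needs_review routing. It assumes the measurement layer supplies a mean luminance in [0, 1]. The cut points, the margin, the review cutoff, and the names `LabelDecision` and `classify_tone` are all hypothetical and are not the project's calibrated values.

```python
from dataclasses import dataclass


@dataclass
class LabelDecision:
    label: str
    confidence: float
    needs_review: bool


def classify_tone(
    mean_luma: float,
    low_cut: float = 0.35,      # hypothetical threshold
    high_cut: float = 0.65,     # hypothetical threshold
    margin: float = 0.10,       # distance at which confidence saturates
    review_below: float = 0.5,  # hypothetical review cutoff
) -> LabelDecision:
    """Map a mean luminance in [0, 1] to a tone_class with review routing."""
    if mean_luma < low_cut:
        label = "low_key"
    elif mean_luma > high_cut:
        label = "high_key"
    else:
        label = "mid_key"
    # Confidence grows with the distance to the nearest threshold,
    # so values close to a boundary are routed to human review.
    distance = min(abs(mean_luma - low_cut), abs(mean_luma - high_cut))
    confidence = min(distance / margin, 1.0)
    return LabelDecision(label, confidence, confidence < review_below)


print(classify_tone(0.90))  # far from any boundary
print(classify_tone(0.36))  # close to the low_key / mid_key boundary
```

The point of the sketch is the routing, not the numbers: a label that sits near a boundary is still emitted, but it is flagged so that the review UI can surface it.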

Quick start

This repository is public-facing and runnable today, but it is still an active research-and-tooling codebase rather than a fully packaged end-user product. The easiest public demo path uses generated or committed sample media rather than private assets.

Install

macOS / Linux:

python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install -r requirements.txt

Windows (PowerShell):

python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt

Fastest demo

Open the committed minimal demo in the local review UI:

. .venv/bin/activate   # or .venv\Scripts\Activate.ps1 on Windows
python3 -m src.main serve \
  --data-dir examples/minimal-demo \
  --video-dir examples/minimal-demo \
  --review-db ./tmp/shot-analyzer-reviews.db

This path requires no asset generation step. It serves a tiny committed sample bundle from the repository.

End-to-end generated demo

Run the pipeline on a fully public generated clip:

. .venv/bin/activate
python3 scripts/generate_synthetic_video.py --output ./tmp/shot-analyzer-synthetic.avi
python3 -m src.main analyze \
  --input ./tmp/shot-analyzer-synthetic.avi \
  --output ./tmp/shot-analyzer-results.json \
  --format json \
  --threshold 12
python3 -m src.main serve \
  --data-dir ./tmp \
  --review-db ./tmp/shot-analyzer-reviews.db

Then open the local server shown in the terminal and inspect:

shot annotation
segment review
confidence and review flags

If you want a richer generated sample instead of the tiny synthetic fixture:

. .venv/bin/activate
python3 scripts/generate_sample_animation_10shots.py

This writes:

examples/generated/sample_animation_10shots.mp4
examples/generated/sample_animation_10shots.json
examples/generated/sample_animation_10shots_marker_list.csv
examples/generated/sample_animation_10shots_marker_list.md
examples/generated/sample_animation_10shots_markers.fcpxml

Basic usage

For normal public use, you do not need config.py.

Run shot analysis:

. .venv/bin/activate
python3 -m src.main analyze --input /path/to/video.mp4 --output output.json

Start the local review web server:

. .venv/bin/activate
python3 -m src.main serve --data-dir ./output --review-db reviews.db

If you also want in-browser shot preview from the source asset, pass a source video directory:

. .venv/bin/activate
python3 -m src.main serve \
  --data-dir ./output \
  --review-db reviews.db \
  --video-dir /path/to/your/source/videos

The web UI supports these human-in-the-loop workflows:

click a shot row to open the annotation drawer and save human labels for duration, tone, temperature, saturation, motion, line, shape, and space
click 优先标注 ("Prioritize annotation") in the shot filter bar to move unannotated, needs-review, and low-confidence shots to the front
if --video-dir is configured and the source filename stem matches video_id, the shot drawer can jump to and play the current shot directly in the browser
click a segment row to open the segment rhythm review drawer and save human_rhythm_type
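
Two of the behaviors above, priority ordering and source-video lookup by filename stem, can be sketched in a few lines. This is an illustration under assumptions: the record keys `human_labels`, `needs_review`, and `confidence`, the 0.6 cutoff, and the function names are hypothetical and are not read from the UI code.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def is_priority(shot: Dict, low_confidence: float = 0.6) -> bool:
    """True if a shot is unannotated, flagged for review, or low-confidence."""
    return (
        not shot.get("human_labels")
        or bool(shot.get("needs_review"))
        or shot.get("confidence", 1.0) < low_confidence
    )


def prioritize(shots: Iterable[Dict]) -> List[Dict]:
    """Move priority shots to the front; sorted() is stable, so the
    original order is preserved within each group."""
    return sorted(shots, key=lambda s: not is_priority(s))


def match_source_video(candidates: Iterable[Path], video_id: str) -> Optional[Path]:
    """Return the first candidate whose filename stem equals video_id."""
    for path in candidates:
        if path.stem == video_id:
            return path
    return None


shots = [
    {"shot_id": "a", "human_labels": {"tone_class": "low_key"}, "needs_review": False, "confidence": 0.9},
    {"shot_id": "b", "human_labels": None, "needs_review": False, "confidence": 0.9},
    {"shot_id": "c", "human_labels": {"tone_class": "mid_key"}, "needs_review": True, "confidence": 0.9},
]
print([s["shot_id"] for s in prioritize(shots)])
print(match_source_video([Path("videos/ep01.mp4"), Path("videos/ep02.mov")], "ep02"))
```

In practice the candidates would come from something like `Path(video_dir).iterdir()`; passing them in as an iterable keeps the matching rule testable without touching the filesystem.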

Export surfaces

Structured outputs currently include:

shot JSON / CSV / Parquet
segment JSON / CSV / Parquet
calibration markdown summaries
marker CSV and rich marker CSV
FCPXML sidecar outputs

These are designed for both analysis and downstream editorial inspection.
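
As a rough illustration of what a marker-oriented export involves, the sketch below turns shot records into a marker CSV with non-drop-frame timecodes. The column names (`timecode`, `name`, `note`), the record keys (`shot_id`, `start_frame`), and the function names are assumptions made for this example; the actual marker CSV schema is defined by the exporter in this repository, not by this snippet.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def frames_to_timecode(frame: int, fps: int) -> str:
    """Non-drop-frame HH:MM:SS:FF timecode for an integer frame rate."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def write_marker_csv(shots: Iterable[Dict], fps: int, out: TextIO) -> None:
    """Write one marker row per shot, placed at the shot's first frame."""
    writer = csv.DictWriter(out, fieldnames=["timecode", "name", "note"])
    writer.writeheader()
    for shot in shots:
        writer.writerow(
            {
                "timecode": frames_to_timecode(shot["start_frame"], fps),
                "name": shot["shot_id"],
                "note": f"tone={shot['tone_class']}; motion={shot['motion_class']}",
            }
        )


buffer = io.StringIO()
write_marker_csv(
    [{"shot_id": "shot_0001", "start_frame": 48, "tone_class": "low_key", "motion_class": "static"}],
    fps=24,
    out=buffer,
)
print(buffer.getvalue())
```

Integer frame rates keep the timecode arithmetic exact; fractional rates such as 23.976 would need drop-frame or rational-time handling that this sketch leaves out.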

Documentation map

If you are new to the repository, start here:

README.md: installation, demo path, and repo overview
docs/PUBLIC_DEMO_GUIDE.md: public demo flows using generated or committed assets only
docs/ANNOTATION_AND_CALIBRATION_WORKFLOW.md: which UI view and which output serve which task
examples/README.md: where public sample artifacts live

Core docs:

docs/LABEL_SPEC.md
docs/EVIDENCE_FIELD_OVERVIEW.md
docs/PRODUCT_VISION.md
docs/ROADMAP_PHASE1.md

Calibration and modeling evolution:

docs/CALIBRATION_DOMAIN_DRIFT_AND_CLASS_COLLAPSE.md
docs/TREE_MODEL_INTEGRATION_PLAN.md
docs/NEURAL_NETWORK_JOINT_OPTIMIZATION_ROADMAP.md
docs/VLM_AGENT_LAYER_FUTURE_WORK.md

Maintainer-only flows are intentionally split out:

docs/MAINTAINER_WORKFLOWS.md

Project governance:

CONTRIBUTING.md
CODE_OF_CONDUCT.md
SECURITY.md

Current limitations

Important current limitations:

the current analysis core is primarily classical CV
performance and calibration can drift across titles or styles
some labels remain conservative and intentionally route ambiguous cases to review
public demo media is synthetic or generated; real production media stays outside the repository

These are known design realities, not hidden caveats.

Future direction

The intended evolution path is staged rather than abrupt:

maintain the current structured classical-CV backbone
add better calibration and collapse-aware monitoring
build multi-title calibration pools instead of calibrating against one series only
introduce tree models over structured evidence fields
later add neural embeddings and multi-task optimization
add a VLM / agent semantic layer above the structured pipeline

This keeps the repository interpretable while still leaving room for stronger learned decision layers and agent-style workflow support.
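
One simple reading of collapse-aware monitoring is to watch for label fields where a single class absorbs nearly every shot. The sketch below is a hedged illustration of that idea only: the 0.9 share cutoff and the name `find_collapsed_fields` are hypothetical, and the approach described in docs/CALIBRATION_DOMAIN_DRIFT_AND_CLASS_COLLAPSE.md may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def find_collapsed_fields(
    records: Iterable[Mapping[str, str]],
    fields: Sequence[str],
    max_share: float = 0.9,  # hypothetical cutoff
) -> Dict[str, Tuple[str, float]]:
    """Return {field: (dominant_label, share)} for every field where one
    class accounts for more than `max_share` of the labeled shots."""
    records = list(records)
    collapsed = {}
    for field in fields:
        counts = Counter(r[field] for r in records if r.get(field) is not None)
        total = sum(counts.values())
        if total == 0:
            continue  # nothing labeled for this field
        label, count = counts.most_common(1)[0]
        share = count / total
        if share > max_share:
            collapsed[field] = (label, share)
    return collapsed


records = [
    {"tone_class": "mid_key", "motion_class": "static" if i < 5 else "active"}
    for i in range(10)
]
print(find_collapsed_fields(records, ["tone_class", "motion_class"]))
```

A dominant class is not always an error, since a title can be uniformly low-key by design, so a flag like this is a prompt for human calibration review rather than an automatic correction.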

Validation

. .venv/bin/activate
python3 -m unittest discover -s tests -v
python3 -m src.main --help

License

MIT
