VisualGrammar is a shot-level visual language analysis system for animation. It translates Bruce Block style concepts into computable evidence, reviewable labels, and editor-ready outputs.
Most video tooling stops at either low-level CV signals or high-level black-box semantics. Shot Analyzer is designed to sit in the middle:
- measure visual elements at the shot level
- preserve a structured label contract instead of only free-form descriptions
- support human review and calibration instead of pretending the first output is always correct
- export results into downstream editorial workflows
This repository is therefore both a research-and-tooling codebase and a practical workflow prototype.
Current end-to-end workflow:
- detect shots from an input video
- extract structured visual evidence
- assign shot-level labels and review flags
- inspect results in a local review UI
- correct labels through human annotation
- export data and marker-oriented outputs for downstream use
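As an illustration of the first step only, here is a minimal sketch of threshold-based shot detection. It assumes a per-frame difference score has already been computed (for example a mean absolute luma difference); the function name, the score definition, and the default threshold are hypothetical and are not taken from the repository's actual detector.

```python
from typing import List, Tuple


def detect_shots(frame_diffs: List[float], threshold: float = 12.0) -> List[Tuple[int, int]]:
    """Split a clip into shots from per-frame difference scores.

    frame_diffs[i] is an assumed difference score between frame i and
    frame i + 1. A score above `threshold` is treated as a hard cut.
    Returns (start_frame, end_frame) pairs with inclusive bounds.
    """
    n_frames = len(frame_diffs) + 1
    shots = []
    start = 0
    for i, diff in enumerate(frame_diffs):
        if diff > threshold:
            # The cut falls between frame i and frame i + 1.
            shots.append((start, i))
            start = i + 1
    # Close the final shot at the last frame.
    shots.append((start, n_frames - 1))
    return shots


if __name__ == "__main__":
    print(detect_shots([1, 2, 30, 1, 1, 40, 2]))
```

A single global threshold is the simplest possible rule; it will miss dissolves and can over-segment fast motion, which is one reason the later review steps exist.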
Current supported runtime surfaces include:
- shot analysis over video files
- local review web UI
- shot-level 8-field annotation persistence
- segment-level rhythm review persistence
- calibration-note generation
- marker CSV and rich marker CSV export
- FCPXML sidecar export for editorial inspection
Phase 1 core labels:
- duration_class: short, medium, long
- tone_class: high_key, mid_key, low_key
- temperature_class: warm, cool, neutral
- saturation_class: low_saturation, mid_saturation, high_saturation
- motion_class: static, slow, active
Current runtime extensions:
- line_dominance_class: horizontal, vertical, diagonal, curved, mixed
- shape_family_class: rectilinear, curvilinear, mixed, indeterminate
- space_depth_class: flat, layered, deep
Each runtime extension also exposes confidence and/or review-oriented fields so the system can support calibration instead of only one-pass auto labeling.
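One way to treat these label lists as a contract is to validate every record against a fixed vocabulary. The sketch below copies the field names and values from the lists above; the `LABEL_VOCAB` name and the `validate_labels` helper are hypothetical and may not match how the repository enforces the contract.

```python
from typing import Dict, List, Tuple

# Allowed values per field, copied from the label lists above.
LABEL_VOCAB: Dict[str, Tuple[str, ...]] = {
    "duration_class": ("short", "medium", "long"),
    "tone_class": ("high_key", "mid_key", "low_key"),
    "temperature_class": ("warm", "cool", "neutral"),
    "saturation_class": ("low_saturation", "mid_saturation", "high_saturation"),
    "motion_class": ("static", "slow", "active"),
    "line_dominance_class": ("horizontal", "vertical", "diagonal", "curved", "mixed"),
    "shape_family_class": ("rectilinear", "curvilinear", "mixed", "indeterminate"),
    "space_depth_class": ("flat", "layered", "deep"),
}


def validate_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the labels are valid."""
    problems = []
    for field, value in labels.items():
        allowed = LABEL_VOCAB.get(field)
        if allowed is None:
            problems.append(f"unknown field: {field}")
        elif value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {allowed}")
    return problems


if __name__ == "__main__":
    print(validate_labels({"tone_class": "high_key", "motion_class": "fast"}))
```

Returning a list of problems rather than raising on the first one makes it easier to surface every issue in a review pass.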
At a high level, the repository is organized around three layers:
- Measurement layer
  - shot segmentation
  - classical CV evidence extraction
  - structured shot- and segment-level records
- Decision and review layer
  - rule / threshold / scorecard-based labels
  - confidence and needs_review routing
  - human correction and calibration workflow
- Workflow and export layer
  - review UI
  - JSON / CSV / Parquet outputs
  - marker export
  - FCPXML sidecar export
This layered structure is the backbone of the project.
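To make the decision and review layer concrete, here is a minimal sketch of a threshold rule that turns one measured value into a label, a confidence, and a needs_review flag. The cut points, the margin, the confidence formula, and the `classify_tone` name are all illustrative assumptions, not the repository's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    confidence: float
    needs_review: bool


def classify_tone(
    mean_luma: float,
    low_cut: float = 85.0,      # assumed boundary between low_key and mid_key
    high_cut: float = 170.0,    # assumed boundary between mid_key and high_key
    margin: float = 20.0,       # distance from a boundary that counts as fully confident
    min_confidence: float = 0.5,
) -> Decision:
    """Label a shot's tone from mean luma (0-255) and route near-boundary cases to review."""
    if mean_luma < low_cut:
        label = "low_key"
    elif mean_luma < high_cut:
        label = "mid_key"
    else:
        label = "high_key"
    # Confidence grows with distance from the nearest decision boundary.
    distance = min(abs(mean_luma - low_cut), abs(mean_luma - high_cut))
    confidence = min(distance / margin, 1.0)
    return Decision(label, confidence, confidence < min_confidence)


if __name__ == "__main__":
    print(classify_tone(30.0))
    print(classify_tone(88.0))
```

The point of the sketch is the shape of the output: every label travels with a confidence and a review flag, so ambiguous shots can be routed to a human instead of being silently accepted.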
This repository is public-facing and runnable today, but it is still an active research-and-tooling codebase rather than a fully packaged end-user product. The easiest public demo path uses generated or committed sample media rather than private assets.
macOS / Linux:
python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install -r requirements.txt
Windows (PowerShell):
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
Open the committed minimal demo in the local review UI:
. .venv/bin/activate # or .venv\Scripts\Activate.ps1 on Windows
python3 -m src.main serve \
--data-dir examples/minimal-demo \
--video-dir examples/minimal-demo \
--review-db ./tmp/shot-analyzer-reviews.db
This path requires no asset generation step. It serves a tiny committed sample bundle from the repository.
Run the pipeline on a fully public generated clip:
. .venv/bin/activate
python3 scripts/generate_synthetic_video.py --output ./tmp/shot-analyzer-synthetic.avi
python3 -m src.main analyze \
--input ./tmp/shot-analyzer-synthetic.avi \
--output ./tmp/shot-analyzer-results.json \
--format json \
--threshold 12
python3 -m src.main serve \
--data-dir ./tmp \
--review-db ./tmp/shot-analyzer-reviews.db
Then open the local server shown in the terminal and inspect:
- shot annotation
- segment review
- confidence and review flags
If you want a richer generated sample instead of the tiny synthetic fixture:
. .venv/bin/activate
python3 scripts/generate_sample_animation_10shots.py
This writes:
- examples/generated/sample_animation_10shots.mp4
- examples/generated/sample_animation_10shots.json
- examples/generated/sample_animation_10shots_marker_list.csv
- examples/generated/sample_animation_10shots_marker_list.md
- examples/generated/sample_animation_10shots_markers.fcpxml
For normal public use, you do not need config.py.
Run shot analysis:
. .venv/bin/activate
python3 -m src.main analyze --input /path/to/video.mp4 --output output.json
Start the local review web server:
. .venv/bin/activate
python3 -m src.main serve --data-dir ./output --review-db reviews.db
If you also want in-browser shot preview from the source asset, pass a source video directory:
. .venv/bin/activate
python3 -m src.main serve \
--data-dir ./output \
--review-db reviews.db \
--video-dir /path/to/your/source/videos
The web UI supports these human-in-the-loop workflows:
- click a shot row to open the annotation drawer and save human labels for duration, tone, temperature, saturation, motion, line, shape, and space
- click 优先标注 ("Prioritize annotation") in the shot filter bar to move unannotated, needs-review, and low-confidence shots to the front
- if --video-dir is configured and the source filename stem matches video_id, the shot drawer can jump to and play the current shot directly in the browser
- click a segment row to open the segment rhythm review drawer and save human_rhythm_type
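The prioritization and filename matching described above can be sketched as follows. The record keys (`annotated`, `needs_review`, `confidence`, `shot_index`), the 0.5 confidence cutoff, and both function names are hypothetical; only the stem-equals-video_id rule comes from the description above.

```python
from pathlib import Path
from typing import Dict, List, Optional


def priority_key(shot: Dict) -> tuple:
    """Sort key that moves shots needing attention to the front.

    A shot needs attention if it is unannotated, flagged for review,
    or below an assumed confidence cutoff of 0.5.
    """
    needs_attention = (
        not shot.get("annotated", False)
        or shot.get("needs_review", False)
        or shot.get("confidence", 1.0) < 0.5
    )
    # Ties keep timeline order via shot_index.
    return (0 if needs_attention else 1, shot["shot_index"])


def find_source_video(filenames: List[str], video_id: str) -> Optional[str]:
    """Return the first filename whose stem (name without extension) equals video_id."""
    for name in filenames:
        if Path(name).stem == video_id:
            return name
    return None


if __name__ == "__main__":
    shots = [
        {"shot_index": 0, "annotated": True, "confidence": 0.9},
        {"shot_index": 1, "annotated": False},
        {"shot_index": 2, "annotated": True, "needs_review": True},
    ]
    print([s["shot_index"] for s in sorted(shots, key=priority_key)])
    print(find_source_video(["ep01.mp4", "ep02.mov"], "ep02"))
```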
Structured outputs currently include:
- shot JSON / CSV / Parquet
- segment JSON / CSV / Parquet
- calibration markdown summaries
- marker CSV and rich marker CSV
- FCPXML sidecar outputs
These are designed for both analysis and downstream editorial inspection.
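For readers who want to picture what a marker-oriented export involves, here is a minimal sketch that turns shot records into a marker CSV. The column names, the record keys, the marker naming, and the non-drop-frame timecode at an assumed 24 fps are all illustrative; the repository's actual marker CSV schema may differ.

```python
import csv
import io
from typing import Dict, Iterable


def seconds_to_timecode(seconds: float, fps: int = 24) -> str:
    """Convert seconds to a non-drop-frame HH:MM:SS:FF timecode string."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def write_marker_csv(shots: Iterable[Dict], fps: int = 24) -> str:
    """Render one marker row per shot and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timecode", "name", "note"])  # hypothetical column names
    for shot in shots:
        writer.writerow([
            seconds_to_timecode(shot["start_seconds"], fps),
            f"shot_{shot['shot_index']:03d}",
            f"{shot['tone_class']}/{shot['motion_class']}",
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    demo = [{"shot_index": 1, "start_seconds": 2.0,
             "tone_class": "low_key", "motion_class": "static"}]
    print(write_marker_csv(demo))
```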
If you are new to the repository, start here:
- README.md: installation, demo path, and repo overview
- docs/PUBLIC_DEMO_GUIDE.md: public demo flows using generated or committed assets only
- docs/ANNOTATION_AND_CALIBRATION_WORKFLOW.md: which UI and output is for which task
- examples/README.md: where public sample artifacts live
Core docs:
- docs/LABEL_SPEC.md
- docs/EVIDENCE_FIELD_OVERVIEW.md
- docs/PRODUCT_VISION.md
- docs/ROADMAP_PHASE1.md
Calibration and modeling evolution:
- docs/CALIBRATION_DOMAIN_DRIFT_AND_CLASS_COLLAPSE.md
- docs/TREE_MODEL_INTEGRATION_PLAN.md
- docs/NEURAL_NETWORK_JOINT_OPTIMIZATION_ROADMAP.md
- docs/VLM_AGENT_LAYER_FUTURE_WORK.md
Maintainer-only flows are intentionally split out:
- docs/MAINTAINER_WORKFLOWS.md
Project governance:
- CONTRIBUTING.md
- CODE_OF_CONDUCT.md
- SECURITY.md
Important current limitations:
- the current analysis core is primarily classical CV
- performance and calibration can drift across titles or styles
- some labels remain conservative and intentionally route ambiguous cases to review
- public demo media is synthetic or generated; real production media stays outside the repository
These are known design realities, not hidden caveats.
The intended evolution path is staged rather than abrupt:
- maintain the current structured classical-CV backbone
- add better calibration and collapse-aware monitoring
- build multi-title calibration pools instead of calibrating against one series only
- introduce tree models over structured evidence fields
- later add neural embeddings and multi-task optimization
- add a VLM / agent semantic layer above the structured pipeline
This keeps the repository interpretable while still leaving room for stronger learned decision layers and agent-style workflow support.
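As a rough idea of what collapse-aware monitoring could look like, the sketch below flags a label field as collapsed when a single class takes almost all of the predictions for a title. The 0.9 share cutoff and the function names are hypothetical; this is not the repository's monitoring logic.

```python
from collections import Counter
from typing import List, Tuple


def dominant_share(labels: List[str]) -> Tuple[str, float]:
    """Return the most common label and its share of all labels."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


def is_collapsed(labels: List[str], max_share: float = 0.9) -> bool:
    """Flag a field as collapsed when one class reaches max_share of the shots."""
    if not labels:
        return False
    _, share = dominant_share(labels)
    return share >= max_share


if __name__ == "__main__":
    print(is_collapsed(["warm"] * 19 + ["cool"]))
    print(is_collapsed(["warm", "cool", "neutral"]))
```

A high dominant share is not always an error, since some titles really are mostly warm or mostly static, so a check like this is better used to trigger a calibration review than to reject output.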
. .venv/bin/activate
python3 -m unittest discover -s tests -v
python3 -m src.main --help
License: MIT