Leslie-ller/VisualGrammar

VisualGrammar: Shot-Level Visual Language Analysis for Animation

VisualGrammar is a shot-level visual language analysis system for animation. It translates Bruce Block-style visual concepts into computable evidence, reviewable labels, and editor-ready outputs.

Why this project exists

Most video tooling stops at either low-level CV signals or high-level black-box semantics. VisualGrammar is designed to sit in the middle:

  • measure visual elements at the shot level
  • preserve a structured label contract instead of only free-form descriptions
  • support human review and calibration instead of pretending the first output is always correct
  • export results into downstream editorial workflows

This repository is therefore both a research-and-tooling codebase and a practical workflow prototype.

What the system does today

Current end-to-end workflow:

  1. detect shots from an input video
  2. extract structured visual evidence
  3. assign shot-level labels and review flags
  4. inspect results in a local review UI
  5. correct labels through human annotation
  6. export data and marker-oriented outputs for downstream use
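
As an illustration of step 1, here is a minimal sketch of threshold-based cut detection. It is not the repository's detector, which this README does not describe. The function names, the per-frame difference scores, and the assumption that a cut is declared when a score exceeds a threshold are all hypothetical; the `--threshold` flag used in the demo commands may or may not work this way.

```python
from typing import List, Sequence, Tuple


def detect_cuts(frame_diffs: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices where a new shot starts.

    frame_diffs[i] is a hypothetical dissimilarity score between frame i
    and frame i + 1; a score above the threshold is treated as a cut.
    """
    return [i + 1 for i, diff in enumerate(frame_diffs) if diff > threshold]


def cuts_to_shots(cuts: Sequence[int], frame_count: int) -> List[Tuple[int, int]]:
    """Turn cut positions into (start_frame, end_frame_exclusive) ranges."""
    boundaries = [0, *cuts, frame_count]
    return [(start, end) for start, end in zip(boundaries, boundaries[1:]) if end > start]


# Six frames, so five frame-to-frame differences (made-up values).
diffs = [1.0, 2.5, 40.0, 3.0, 18.0]
cuts = detect_cuts(diffs, threshold=12)
shots = cuts_to_shots(cuts, frame_count=6)
print(shots)  # [(0, 3), (3, 5), (5, 6)]
```

Keeping detection and range-building separate means the same shot ranges could be built from any other source of cut positions.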

Current supported runtime surfaces include:

  • shot analysis over video files
  • local review web UI
  • shot-level 8-field annotation persistence
  • segment-level rhythm review persistence
  • calibration-note generation
  • marker CSV and rich marker CSV export
  • FCPXML sidecar export for editorial inspection

Visual-element coverage

Phase 1 core labels:

  • duration_class: short, medium, long
  • tone_class: high_key, mid_key, low_key
  • temperature_class: warm, cool, neutral
  • saturation_class: low_saturation, mid_saturation, high_saturation
  • motion_class: static, slow, active

Current runtime extensions:

  • line_dominance_class: horizontal, vertical, diagonal, curved, mixed
  • shape_family_class: rectilinear, curvilinear, mixed, indeterminate
  • space_depth_class: flat, layered, deep

Each runtime extension also exposes confidence fields, review-oriented fields, or both, so the system supports calibration rather than only one-pass automatic labeling.
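
The label lists above can be treated as a small contract. The sketch below shows one way to check a label record against it. The allowed values are copied from the lists above; the `LABEL_CONTRACT` name, the `validate_labels` helper, and the flat-dict record shape are assumptions for illustration, not the repository's actual schema (see docs/LABEL_SPEC.md for that).

```python
from typing import Dict, FrozenSet, List

# Allowed values copied from the label lists in this README.
LABEL_CONTRACT: Dict[str, FrozenSet[str]] = {
    "duration_class": frozenset({"short", "medium", "long"}),
    "tone_class": frozenset({"high_key", "mid_key", "low_key"}),
    "temperature_class": frozenset({"warm", "cool", "neutral"}),
    "saturation_class": frozenset({"low_saturation", "mid_saturation", "high_saturation"}),
    "motion_class": frozenset({"static", "slow", "active"}),
    "line_dominance_class": frozenset({"horizontal", "vertical", "diagonal", "curved", "mixed"}),
    "shape_family_class": frozenset({"rectilinear", "curvilinear", "mixed", "indeterminate"}),
    "space_depth_class": frozenset({"flat", "layered", "deep"}),
}


def validate_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems: List[str] = []
    for field, allowed in LABEL_CONTRACT.items():
        value = labels.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value not in allowed:
            problems.append(f"invalid value for {field}: {value!r}")
    for field in labels:
        if field not in LABEL_CONTRACT:
            problems.append(f"unknown field: {field}")
    return problems


# Hypothetical record with one bad value and one missing field.
record = {
    "duration_class": "short",
    "tone_class": "bright",
    "temperature_class": "warm",
    "saturation_class": "mid_saturation",
    "motion_class": "static",
    "line_dominance_class": "diagonal",
    "shape_family_class": "mixed",
}
problems = validate_labels(record)
print(problems)
```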

System architecture

At a high level, the repository is organized around three layers:

  1. Measurement layer
    • shot segmentation
    • classical CV evidence extraction
    • structured shot- and segment-level records
  2. Decision and review layer
    • rule / threshold / scorecard-based labels
    • confidence and needs_review routing
    • human correction and calibration workflow
  3. Workflow and export layer
    • review UI
    • JSON / CSV / Parquet outputs
    • marker export
    • FCPXML sidecar export

This layered structure is the backbone of the project.
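
To make the decision and review layer concrete, the sketch below shows threshold-based labeling with confidence and needs_review routing for a single field. The cut-off values, the margin-based confidence formula, and the `LabelDecision` and `classify_duration` names are all assumptions chosen for illustration; the repository's actual rules and thresholds are not stated in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelDecision:
    label: str
    confidence: float
    needs_review: bool


def classify_duration(
    seconds: float,
    short_max: float = 2.0,       # hypothetical cut-off between short and medium
    long_min: float = 6.0,        # hypothetical cut-off between medium and long
    margin: float = 0.5,          # distance from a cut-off that counts as fully confident
    min_confidence: float = 0.6,  # below this, route the shot to human review
) -> LabelDecision:
    """Assign duration_class and flag shots that sit close to a cut-off."""
    if seconds < short_max:
        label = "short"
    elif seconds >= long_min:
        label = "long"
    else:
        label = "medium"
    # Confidence grows with distance to the nearest cut-off, capped at 1.0.
    distance = min(abs(seconds - short_max), abs(seconds - long_min))
    confidence = min(distance / margin, 1.0)
    return LabelDecision(label, confidence, needs_review=confidence < min_confidence)


clear_case = classify_duration(1.0)   # far from both cut-offs
border_case = classify_duration(2.1)  # just past the short/medium cut-off
print(clear_case)
print(border_case)
```

The point of the design is that ambiguous cases are labeled but flagged, so a reviewer sees them instead of the system silently committing to a borderline label.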

Quick start

This repository is public-facing and runnable today, but it is still an active research-and-tooling codebase rather than a fully packaged end-user product. The easiest public demo path uses generated or committed sample media rather than private assets.

Install

macOS / Linux:

python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install -r requirements.txt

Windows (PowerShell):

python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt

Fastest demo

Open the committed minimal demo in the local review UI:

. .venv/bin/activate   # or .venv\Scripts\Activate.ps1 on Windows
python3 -m src.main serve \
  --data-dir examples/minimal-demo \
  --video-dir examples/minimal-demo \
  --review-db ./tmp/shot-analyzer-reviews.db

This path requires no asset generation step. It serves a tiny committed sample bundle from the repository.

End-to-end generated demo

Run the pipeline on a fully public generated clip:

. .venv/bin/activate
python3 scripts/generate_synthetic_video.py --output ./tmp/shot-analyzer-synthetic.avi
python3 -m src.main analyze \
  --input ./tmp/shot-analyzer-synthetic.avi \
  --output ./tmp/shot-analyzer-results.json \
  --format json \
  --threshold 12
python3 -m src.main serve \
  --data-dir ./tmp \
  --review-db ./tmp/shot-analyzer-reviews.db

Then open the local server shown in the terminal and inspect:

  • shot annotation
  • segment review
  • confidence and review flags

If you want a richer generated sample instead of the tiny synthetic fixture:

. .venv/bin/activate
python3 scripts/generate_sample_animation_10shots.py

This writes:

  • examples/generated/sample_animation_10shots.mp4
  • examples/generated/sample_animation_10shots.json
  • examples/generated/sample_animation_10shots_marker_list.csv
  • examples/generated/sample_animation_10shots_marker_list.md
  • examples/generated/sample_animation_10shots_markers.fcpxml

Basic usage

For normal public use, you do not need config.py.

Run shot analysis:

. .venv/bin/activate
python3 -m src.main analyze --input /path/to/video.mp4 --output output.json

Start the local review web server:

. .venv/bin/activate
python3 -m src.main serve --data-dir ./output --review-db reviews.db

If you also want in-browser shot preview from the source asset, pass a source video directory:

. .venv/bin/activate
python3 -m src.main serve \
  --data-dir ./output \
  --review-db reviews.db \
  --video-dir /path/to/your/source/videos
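
In-browser preview depends on pairing each video_id with a source file whose filename stem matches it. The sketch below shows one way such a lookup could work. The `find_source_video` helper is hypothetical and is not the server's actual resolution code.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_source_video(video_dir: Path, video_id: str) -> Optional[Path]:
    """Return the first file in video_dir whose stem equals video_id."""
    for candidate in sorted(video_dir.iterdir()):
        if candidate.is_file() and candidate.stem == video_id:
            return candidate
    return None


# Demonstrate with throwaway empty files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "episode_01.mp4").touch()
    (root / "notes.txt").touch()
    match = find_source_video(root, "episode_01")
    match_name = match.name if match else None
    missing = find_source_video(root, "episode_02")

print(match_name, missing)
```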

The web UI supports these human-in-the-loop workflows:

  • click a shot row to open the annotation drawer and save human labels for duration, tone, temperature, saturation, motion, line, shape, and space
  • click 优先标注 ("Prioritize annotation") in the shot filter bar to move unannotated, needs-review, and low-confidence shots to the front
  • if --video-dir is configured and the source filename stem matches video_id, the shot drawer can jump to and play the current shot directly in the browser
  • click a segment row to open the segment rhythm review drawer and save human_rhythm_type
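
The prioritization behavior described in the list above can be sketched as a sort key. The field names (`human_labels`, `needs_review`, `confidence`, `shot_index`) and the 0.6 low-confidence cut-off are assumptions for illustration, not the UI's actual data model.

```python
from typing import Any, Dict, List, Tuple


def review_priority(shot: Dict[str, Any], low_confidence: float = 0.6) -> Tuple[int, int]:
    """Sort key: shots needing attention first, original order preserved within each group."""
    unannotated = shot.get("human_labels") is None
    needs_review = bool(shot.get("needs_review"))
    confidence = shot.get("confidence")
    is_low_confidence = confidence is not None and confidence < low_confidence
    flagged = unannotated or needs_review or is_low_confidence
    return (0 if flagged else 1, shot["shot_index"])


# Hypothetical shot rows.
shots: List[Dict[str, Any]] = [
    {"shot_index": 0, "human_labels": {"tone": "low_key"}, "needs_review": False, "confidence": 0.9},
    {"shot_index": 1, "human_labels": None, "needs_review": False, "confidence": 0.9},
    {"shot_index": 2, "human_labels": {"tone": "mid_key"}, "needs_review": False, "confidence": 0.4},
    {"shot_index": 3, "human_labels": {"tone": "high_key"}, "needs_review": True, "confidence": 0.8},
]
ordered = [shot["shot_index"] for shot in sorted(shots, key=review_priority)]
print(ordered)  # [1, 2, 3, 0]
```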

Export surfaces

Structured outputs currently include:

  • shot JSON / CSV / Parquet
  • segment JSON / CSV / Parquet
  • calibration markdown summaries
  • marker CSV and rich marker CSV
  • FCPXML sidecar outputs

These are designed for both analysis and downstream editorial inspection.
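
As a rough illustration of a marker-oriented export, the sketch below writes one CSV row per shot with a non-drop-frame timecode. The column names, the marker naming scheme, the shot field names, and the integer frame rate are all assumptions; the repository's actual marker CSV layout is not specified in this README.

```python
import csv
import io
from typing import Any, Dict, List


def frames_to_timecode(frame: int, fps: int) -> str:
    """Convert a frame index to HH:MM:SS:FF (integer fps, non-drop-frame)."""
    seconds, frames = divmod(frame, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def shots_to_marker_csv(shots: List[Dict[str, Any]], fps: int) -> str:
    """Render shots as CSV text with hypothetical timecode/name/note columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timecode", "name", "note"])
    for shot in shots:
        note = f"tone={shot['tone_class']}; motion={shot['motion_class']}"
        writer.writerow([
            frames_to_timecode(shot["start_frame"], fps),
            f"shot_{shot['shot_index']:03d}",
            note,
        ])
    return buffer.getvalue()


# Hypothetical shot records.
example_shots = [
    {"shot_index": 0, "start_frame": 0, "tone_class": "high_key", "motion_class": "static"},
    {"shot_index": 1, "start_frame": 1500, "tone_class": "low_key", "motion_class": "active"},
]
csv_text = shots_to_marker_csv(example_shots, fps=24)
print(csv_text)
```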

Documentation map

If you are new to the repository, start here:

  • README.md: installation, demo path, and repo overview
  • docs/PUBLIC_DEMO_GUIDE.md: public demo flows using generated or committed assets only
  • docs/ANNOTATION_AND_CALIBRATION_WORKFLOW.md: which UI and which output to use for each task
  • examples/README.md: where public sample artifacts live

Core docs:

  • docs/LABEL_SPEC.md
  • docs/EVIDENCE_FIELD_OVERVIEW.md
  • docs/PRODUCT_VISION.md
  • docs/ROADMAP_PHASE1.md

Calibration and modeling evolution:

  • docs/CALIBRATION_DOMAIN_DRIFT_AND_CLASS_COLLAPSE.md
  • docs/TREE_MODEL_INTEGRATION_PLAN.md
  • docs/NEURAL_NETWORK_JOINT_OPTIMIZATION_ROADMAP.md
  • docs/VLM_AGENT_LAYER_FUTURE_WORK.md

Maintainer-only flows are intentionally split out:

  • docs/MAINTAINER_WORKFLOWS.md

Project governance:

  • CONTRIBUTING.md
  • CODE_OF_CONDUCT.md
  • SECURITY.md

Current limitations

Important current limitations:

  • the current analysis core is primarily classical CV
  • performance and calibration can drift across titles or styles
  • some labels remain conservative and intentionally route ambiguous cases to review
  • public demo media is synthetic or generated; real production media stays outside the repository

These are known design realities, not hidden caveats.

Future direction

The intended evolution path is staged rather than abrupt:

  1. maintain the current structured classical-CV backbone
  2. add better calibration and collapse-aware monitoring
  3. build multi-title calibration pools instead of calibrating against one series only
  4. introduce tree models over structured evidence fields
  5. later add neural embeddings and multi-task optimization
  6. add a VLM / agent semantic layer above the structured pipeline

This keeps the repository interpretable while still leaving room for stronger learned decision layers and agent-style workflow support.

Validation

. .venv/bin/activate
python3 -m unittest discover -s tests -v
python3 -m src.main --help

License

MIT
