Turn any dance video into a top-down formation map — built for student choreographers at large-scale showcases like Cal Poly's Illuminate, LanternFest, and CultureFest.
Cal Poly's showcases bring together dozens of non-audition dance groups on one stage. Student choreographers are managing 50–80 people with no professional tools — drawing diagrams by hand, pausing and rewinding videos, guessing at spacing. We fix that.
- Python 3.12+
- Node.js 20.17+
- ffmpeg (required for video merging)
- uv (Python package manager; install commands below)
Install ffmpeg on Windows:

```bash
winget install ffmpeg
```
Install uv:

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
```

Run the backend with uv:

```bash
cd backend
uv sync
uv run python main.py
```

Or with pip:

```bash
cd backend
pip install -r requirements.txt
python main.py
```

The backend runs on http://localhost:8000. The YOLOv11 model (~7 MB) downloads automatically on first startup.
Note: If you see `YOLO warmup failed` on startup with pip, run:

```bash
pip install torch torchvision "ultralytics>=8.4.0"
```
```bash
cd frontend
npm install
npm run dev
```

The frontend runs on http://localhost:5173.
Copy the example env file:
```bash
cp frontend/.env.example frontend/.env
```

The default points to http://localhost:8000; no changes needed for local dev.
The app includes auto-detection of stable formations using multi-signal analysis:
- Motion detection (frame difference)
- People counting (YOLO)
- Scene cut detection (edge analysis)
- Temporal stability requirements
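As a rough illustration of the first signal, here is a minimal frame-difference sketch in Python with OpenCV. The threshold and window values are placeholders, not the app's real defaults (those live in DETECTION_TUNING.md), and the actual detector fuses all four signals:

```python
# Illustrative sketch of the motion signal only: mean absolute frame
# difference. A span counts as a candidate formation when motion stays
# low for a sustained window. Threshold/window values are placeholders.
import cv2

def stable_spans(video_path: str,
                 motion_threshold: float = 2.0,
                 min_stable_seconds: float = 3.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    min_frames = int(min_stable_seconds * fps)

    spans, prev_gray, run_start, idx = [], None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            if cv2.absdiff(gray, prev_gray).mean() < motion_threshold:
                run_start = idx if run_start is None else run_start
            else:
                if run_start is not None and idx - run_start >= min_frames:
                    spans.append((run_start / fps, idx / fps))
                run_start = None
        prev_gray = gray
        idx += 1
    # Flush a stable run that extends to the end of the video.
    if run_start is not None and idx - run_start >= min_frames:
        spans.append((run_start / fps, idx / fps))
    cap.release()
    return spans  # [(start_sec, end_sec), ...] candidate formations
```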
Choose from three presets in the UI:
- Strict - Fewer false positives, best for clean practice videos
- Balanced - Good default for most videos (recommended)
- Loose - Catches more formations, best for fast choreography
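To show the shape of the trade-off (the real parameter names and values are documented in DETECTION_TUNING.md; these numbers are invented), a preset is essentially a bundle of thresholds like the ones in the sketch above:

```python
# Hypothetical preset values, reusing the placeholder parameters from
# the sketch above; see DETECTION_TUNING.md for the real knobs.
PRESETS = {
    "strict":   {"motion_threshold": 1.5, "min_stable_seconds": 4.0},
    "balanced": {"motion_threshold": 2.0, "min_stable_seconds": 3.0},
    "loose":    {"motion_threshold": 3.0, "min_stable_seconds": 2.0},
}
```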
Getting too many false positives? See DETECTION_TUNING.md for:
- Parameter explanations
- Troubleshooting guide
- Example configurations for different video types
- How to customize detection behavior
Track dancers throughout the entire video with persistent IDs and occlusion handling:
- ✅ Persistent IDs - Same dancer keeps same ID from start to finish
- ✅ Occlusion Handling - Infers positions when dancers are hidden behind others
- ✅ Trajectory Tracking - Records complete movement paths
- ✅ Re-identification - Matches dancers after temporary occlusion
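Conceptually this rests on YOLO detection plus the BoT-SORT tracker that ships with Ultralytics. Here is a minimal sketch of persistent-ID tracking using that API directly; the model file, video path, and confidence value are illustrative, not the app's exact configuration:

```python
# Sketch: persistent-ID tracking with Ultralytics' built-in BoT-SORT.
# persist=True carries tracker state across frames so each dancer
# keeps one integer ID; stream=True yields results frame by frame.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # illustrative model file; downloads on first use

for result in model.track("practice_video.mp4", tracker="botsort.yaml",
                          conf=0.3, persist=True, stream=True):
    if result.boxes.id is None:
        continue  # no confirmed tracks in this frame
    ids = result.boxes.id.int().tolist()
    centers = result.boxes.xywh[:, :2].tolist()
    for dancer_id, (x, y) in zip(ids, centers):
        print(f"dancer {dancer_id}: center=({x:.0f}, {y:.0f})")
```

In the app itself, the same flow runs server-side through the tracking API: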
```
# Start tracking (takes 2-5 minutes for a 4-minute video)
POST /tracking/track/{session_id}
{
  "sample_rate": 5,
  "confidence": 0.3,
  "tracker": "botsort"
}

# Get dancers at any timestamp with consistent IDs
POST /tracking/dancers-at-timestamp/{session_id}
{
  "timestamp": 45.5
}

# Create a visualization video with IDs and trajectories
POST /tracking/visualize/{session_id}
```

Use Per-Frame Detection (current default) when:
- Quick results needed (seconds)
- Only specific timestamps matter
- No occlusions in video
Use Full-Video Tracking when:
- Need consistent IDs across video
- Dancers frequently occluded
- Analyzing full choreography
- Can wait 2-5 minutes
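To script the full-video flow against the endpoints above, here is a sketch using `requests`; the session ID is a placeholder for whatever the upload step returned:

```python
# Sketch: drive the tracking endpoints from a script. The session_id
# placeholder comes from the earlier video-ingest step.
import requests

BASE = "http://localhost:8000"
session_id = "YOUR_SESSION_ID"

# Kick off full-video tracking; expect this call to take minutes.
requests.post(
    f"{BASE}/tracking/track/{session_id}",
    json={"sample_rate": 5, "confidence": 0.3, "tracker": "botsort"},
).raise_for_status()

# Query dancer positions at a timestamp with consistent IDs.
resp = requests.post(
    f"{BASE}/tracking/dancers-at-timestamp/{session_id}",
    json={"timestamp": 45.5},
)
print(resp.json())
```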
See TRACKING_SYSTEM.md for complete documentation.
- Accept a YouTube URL as input
- Download video server-side (yt-dlp)
- Extract and store video metadata (title, duration, thumbnail)
- Auto-detect formation timestamps (stable groupings held 3+ seconds)
- Allow users to manually select/override timestamps from the video timeline
- Generate a screenshot (JPEG) for each selected timestamp
- Scan the full video to count total number of dancers
- Tag each dancer with a consistent ID across frames (number + AI-generated description, e.g. "Dancer 3 — red shirt, left side")
- Generate a name-to-person mapping (ID → visual description)
- Display tagged screenshots so users can verify accuracy
- Identify the stage/floor plane from the video (used to calibrate depth perception)
- Extract camera angle and perspective reference points
- For each timestamp screenshot, detect dancer positions (x, y in frame)
- Apply a perspective transform (homography) to convert the front-facing camera view into a top-down bird's-eye view (see the sketch after this list)
- Render top-down formation diagram with numbered dancer dots
- Store all data locally per session
- Export as JSON: `{ timestamp, screenshot_path, formation: [{ id, label, x, y }] }`
- Allow download of JSON + associated JPEGs as a zip
- Deploy frontend on Vercel
- Deploy backend (FastAPI) on Railway or Render
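For the perspective step flagged in the list above, here is a minimal sketch of the homography idea with OpenCV. The four stage-corner pixel coordinates and the 600x400 top-down canvas are invented placeholders; in the pipeline these reference points come from the stage-plane and camera-angle detection steps:

```python
# Sketch: map camera-frame positions to a top-down stage view with a
# planar homography. All coordinates below are illustrative.
import cv2
import numpy as np

# Stage-floor corners as seen by the camera (pixels), clockwise from
# upstage-left, and their targets on a 600x400 bird's-eye canvas.
camera_pts = np.float32([[420, 310], [1500, 310], [1820, 1020], [120, 1020]])
topdown_pts = np.float32([[0, 0], [600, 0], [600, 400], [0, 400]])

H = cv2.getPerspectiveTransform(camera_pts, topdown_pts)

# Project detected dancer foot positions (x, y in the frame).
feet = np.float32([[960, 880], [700, 940]]).reshape(-1, 1, 2)
mapped = cv2.perspectiveTransform(feet, H).reshape(-1, 2)
for x, y in mapped:
    print(f"top-down position: ({x:.0f}, {y:.0f})")
```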
- Stage dimension input (e.g. 20ft x 30ft) to scale the top-down view accurately
- Multi-video support (compare formations across rehearsals)
- Formation timeline scrubber — animated playback of formation changes
- Export to PDF formation sheet (printable for rehearsal)
- Shareable links per formation set
- Support for other video sources (direct upload, Vimeo)
- Choreographer notes per formation
- Multi-group support for full showcase planning (multiple acts, shared stage)
| Layer | Tech |
|---|---|
| Frontend | React + Tailwind, deployed on Vercel |
| Backend | FastAPI (Python), deployed on Railway |
| Video Download | yt-dlp (server-side) |
| Pose Estimation | YOLOv8-pose or MediaPipe |
| Dancer Tracking | ByteTrack (consistent IDs across frames) |
| Depth / Perspective | Depth Anything v2 API + OpenCV homography |
| Storage | Local filesystem per session (S3 later) |
| Export | JSON + JPEG zip download |
| Person | Responsibility |
|---|---|
| 1 | Frontend — UI, timeline, formation canvas renderer |
| 2 | Backend — API, video pipeline, file storage, deployment |
| 3 | ML — pose estimation, dancer tracking, ID assignment |
| 4 | CV — environment detection, perspective transform, top-down mapping |
- Paste a YouTube link to a K-pop or Illuminate-style practice video
- App processes video, detects dancers, assigns IDs
- User sees timestamped formation list — clicks a timestamp
- Side-by-side view: original screenshot (left) + top-down formation map (right)
- Download JSON + images
"Cal Poly's Illuminate, LanternFest, and CultureFest bring together dozens of dance groups on one stage — non-audition, open to everyone, celebrating culture and community. The student choreographers making it happen are managing 50–80 people with no professional tools. We built the tool that gives them a professional workflow."