Soccer Video Processing Pipeline

First time? Start here: START_HERE.md — a step-by-step guide that gets you from zero to your first goalkeeper reel.

Intended use: This software is designed for youth and amateur soccer analytics and personal experimentation. It is not affiliated with, endorsed by, or approved by any professional league, federation, or governing body.

Automatic event detection system that processes soccer match recordings and produces:

Goalkeeper Reel — saves, goal kicks, distribution, corners, one-on-ones
Highlights Reel — shots, goals, near-misses

Drop in a video, set your team colors, and get a reel back. No manual tagging.

How It Works

The detection pipeline has three phases:

Motion + Audio — dense frame-differencing finds activity spikes; audio analysis detects whistles and crowd surges. Together they produce ~300 candidate moments per match.
VLM Classification — a vision-language model (Qwen3-VL, Claude, or any OpenAI-compatible endpoint) classifies each candidate: shot, save, goal, corner, goal kick, throw-in, etc.
Structural Inference — post-VLM phases use restart patterns to refine classifications. Shot followed by corner = parry save. Shot followed by quick goal kick = miss. Shot with no restart = catch. Kickoff after shot = goal.

After detection, clips are cut with event-specific padding, merged when overlapping, and assembled into reels via FFmpeg.

Prerequisites

Everything below is what you need to provide. The rest is included in the repo or auto-installed on first run.

Required

Resource	What	Notes
Docker Desktop	Runs Redis, and optionally the worker + API	docker.com/products/docker-desktop
FFmpeg	Frame extraction and reel assembly	Auto-installed in Docker mode; on Mac, `setup.sh` installs via Homebrew if missing
Video files	MP4 recordings of soccer matches	Sideline camera at ~50m, 1080p or 4K. Phone/GoPro/camcorder all work
~30 GB scratch space	Per concurrent job, for frame extraction and temporary files	Configurable via `WORKING_DIR` (default: `/tmp/soccer-pipeline`)

Vision-Language Model (VLM) — Recommended

The pipeline uses motion detection and audio analysis to find candidate moments, then a VLM classifies them (shot, save, goal, corner, etc.). Without a VLM, you get unclassified motion candidates only — the reel will include many false positives.

You need one of the following VLM backends:

Option	Cost	Hardware	Accuracy	Setup
Anthropic API (Claude)	~$2-5/match	None (cloud)	High	Set `ANTHROPIC_API_KEY` in `.env`
Self-hosted vLLM	Free (after hardware)	2x RTX 3090 or equivalent (~48 GB VRAM)	Highest	Run vLLM with a vision model, point `VLLM_URL` at it
Ollama	Free	1x GPU with >=16 GB VRAM	Medium	Run Ollama, set `VLLM_URL=http://localhost:11434/v1`
Any OpenAI-compatible API	Varies	Varies	Varies	Set `VLLM_URL` to the endpoint
None	Free	None	Low	Set `VLLM_ENABLED=false` and `VLM_ENABLED=false`

Which should I pick?

Easiest: Anthropic API — sign up at console.anthropic.com, get an API key, paste it in .env. No GPU needed. A full match costs a few dollars.
Best accuracy, free: Self-hosted vLLM with Qwen/Qwen3-VL-32B-Instruct-FP8 on 2x RTX 3090 (tensor-parallel). This is what the pipeline was tuned on.
Budget GPU: Ollama with a smaller vision model (e.g., llava or minicpm-v) on a single GPU. Lower accuracy but functional.

The VLM backend is configured entirely through environment variables — no code changes needed:

# Option A: Anthropic API (easiest)
VLM_ENABLED=true
ANTHROPIC_API_KEY=sk-ant-...
VLLM_ENABLED=false

# Option B: Self-hosted vLLM or any OpenAI-compatible endpoint
VLLM_ENABLED=true
VLLM_URL=http://your-server:8000      # default: http://localhost:8000
VLLM_MODEL=Qwen/Qwen3-VL-32B-Instruct-FP8
VLM_ENABLED=false                      # disable Claude fallback (optional)

# Option C: Ollama (OpenAI-compatible mode)
VLLM_ENABLED=true
VLLM_URL=http://localhost:11434/v1
VLLM_MODEL=llava
VLM_ENABLED=false

# Option D: No VLM (motion + audio only)
VLLM_ENABLED=false
VLM_ENABLED=false

When both vLLM and Anthropic are configured, the pipeline tries vLLM first and falls back to Claude if it's unavailable.

Hardware Tiers

The pipeline auto-detects your hardware on make deploy:

Tier	Detection	What runs where	Expected speed
NVIDIA GPU	`nvidia-smi` + Docker toolkit	Everything in Docker with GPU passthrough	~60 min / match
Apple Silicon (M1-M4)	MPS detection	Redis in Docker, worker + API native with MPS acceleration	~60 min / match
CPU only	Fallback	Everything in Docker, CPU inference	~120 min / match

GPU acceleration affects YOLO object detection (Phase 1). The VLM runs on its own backend — your local GPU is not used for VLM inference unless you host it locally.

Storage Layout

Source directory (read-only)     Output directory (writable)
├── game1.mp4                    ├── {job-id}/
├── game2.mp4                    │   ├── goalkeeper_reel.mp4
└── ...                          │   └── highlights_reel.mp4
                                 └── ...

These can be local directories, NAS mounts (NFS/SMB), or any mounted filesystem. Set them via:

NAS_MOUNT_PATH=/path/to/your/videos      # required
NAS_OUTPUT_PATH=/path/to/your/output      # required

Quick Start

# 1. Deploy (auto-detects GPU, generates .env, starts stack)
make deploy

# 2. Open the web dashboard
open http://localhost:8080/ui

# 3. Or submit via CLI
python infra/scripts/pipeline_cli.py submit game.mp4 --reel keeper,highlights

# 4. Monitor progress
python infra/scripts/pipeline_cli.py status <job_id> --watch

The web dashboard at http://localhost:8080/ui lets you submit jobs, pick jerseys, set game start time, and download reels — no terminal needed after initial setup.

Architecture

Source video         Detection Pipeline                        Output
                     ┌─────────────────────────────────┐
  game.mp4  ──────►  │  Phase 1: Motion scan            │
                     │  (frame-differencing, ~300 spikes)│
                     │  ↓                                │
                     │  Phase 2: Audio boost             │
                     │  (whistles, crowd surges)         │
                     │  ↓                                │
                     │  Phase 3: VLM classification      │
                     │  (Qwen3-VL / Claude / Ollama)     │
                     │  ↓                                │  ──►  goalkeeper_reel.mp4
                     │  Phases 3a-3g: Structural         │  ──►  highlights_reel.mp4
                     │  inference (goals, saves, corners) │
                     │  ↓                                │
                     │  Segmentation + Assembly (FFmpeg)  │
                     └─────────────────────────────────┘

Detection Phases

Phase	Name	What it does
1	Motion scan	Dense frame-differencing to find activity spikes
2	Audio boost	Whistle + crowd surge detection, fills motion gaps
2b-2d	Match structure	Halftime detection, gap filling, spot-check probes
3	VLM classify	Single-pass classification of each candidate
3a.5	Save reclass	Goal kicks with save evidence in reasoning -> save
3a.6	Shot reclass	Rejected verdicts describing shots -> shot
3b	Goal inference	Kickoff rescan to upgrade shots to goals
3c	Set-piece inference	Corner/goal-kick rescan after shots
3c.5	Restart reclass	Shot + corner = parry, shot + quick goal kick = miss
3d	Corner scan	Independent corner detection in coverage gaps
3e	Reverse restart	Work backwards from restarts to find missed shots
3f	Shot scan	Binary re-probe of remaining rejected candidates
3g	Catch scan	Structural catch inference + VLM probe for GK holding ball

See docs/event-detection-reference.md for full details on each phase.

API

FastAPI app at http://localhost:8080:

Endpoint	Description
`POST /jobs`	Submit a job (idempotent via SHA-256)
`GET /jobs`	List all jobs
`GET /jobs/{id}/status`	Progress + status
`POST /jobs/{id}/pause`	Pause processing
`POST /jobs/{id}/cancel`	Cancel processing
`GET /events/{id}/list`	List detected events
`PUT /events/{id}/{event_id}`	Override an event classification
`POST /events/{id}/assemble`	Re-assemble reels after edits
`GET /reels/{id}/{type}`	Download a reel
`GET /ui`	Web dashboard
`GET /health`	Liveness check

Event Types

17 event types defined in src/detection/models.py. Each has per-type clip padding, confidence thresholds, and reel target mappings. Key types:

Type	Reel	Detection method
`goal`	both	VLM + kickoff inference
`shot_on_target`	both	VLM + shot reclassification
`shot_off_target`	highlights	Restart reclassification + reverse restart
`shot_stop_diving`	goalkeeper	VLM catch + parry inference (shot + corner)
`catch`	goalkeeper	Structural (no restart) + VLM probe
`corner_kick`	goalkeeper	VLM + independent corner scan
`goal_kick`	goalkeeper	VLM + set-piece inference

Development

make setup              # Install Python deps
make deploy             # Auto-detect hardware, generate .env, start stack
make up                 # Start stack (assumes .env exists)
make down               # Stop all services
make test-unit          # Fast tests, no infra needed (354 tests, <1s)
make test-integration   # Requires Docker
make generate-fixtures  # Create synthetic test videos
make check-gpu          # Verify NVIDIA GPU + container toolkit

Run a single test:

pytest tests/unit/test_clipper.py::test_merge_overlapping_clips -m unit -v

Key Design Decisions

Streaming-first: Video is never fully loaded into RAM — chunked FFmpeg processing throughout
GPU-optional: NVIDIA CUDA, Apple MPS, or CPU — auto-detected at deploy time
NAS-safe: Source files are never modified; output is atomic-write with retry
VLM-agnostic: Any OpenAI-compatible vision endpoint works; Claude API as fallback
Structural over visual: Post-VLM inference uses restart patterns (corner, goal kick, kickoff) rather than trying to visually identify save types at 50m distance

Project Structure

src/
  api/           FastAPI app, routes, Celery worker, web UI
  detection/     Motion scan, audio detection, VLM verifier, pipeline orchestrator
  ingestion/     File intake, SHA-256 hashing, job store, NAS watcher
  segmentation/  Clip boundary computation, padding, merging
  assembly/      FFmpeg clip extraction, reel composition
  tracking/      ByteTrack player tracking (used by spatial filter)
infra/
  scripts/       setup.sh, clean-resubmit.sh, pipeline_cli.py
  docker-compose*.yml
  Dockerfile.*
docs/            Architecture, event detection reference, runbooks
tests/           Unit (354), integration, E2E

Disclaimer

This software is intended for youth and amateur soccer analytics and personal experimentation. It is not affiliated with, endorsed by, or approved by any professional league, federation, or governing body.

This project does not grant rights to process or redistribute copyrighted video content. Users are responsible for compliance with applicable rights and league policies.

Name		Name	Last commit message	Last commit date
Latest commit History 203 Commits
.github/workflows		.github/workflows
agents		agents
docs		docs
infra		infra
scripts		scripts
src		src
state		state
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
PIPELINE_STATUS.md		PIPELINE_STATUS.md
README.md		README.md
START_HERE.md		START_HERE.md
calibration.json		calibration.json
calibration_refined.json		calibration_refined.json
pytest.ini		pytest.ini
requirements.txt		requirements.txt
setup-team.sh		setup-team.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Soccer Video Processing Pipeline

How It Works

Prerequisites

Required

Vision-Language Model (VLM) — Recommended

Hardware Tiers

Storage Layout

Quick Start

Architecture

Detection Phases

API

Event Types

Development

Key Design Decisions

Project Structure

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Soccer Video Processing Pipeline

How It Works

Prerequisites

Required

Vision-Language Model (VLM) — Recommended

Hardware Tiers

Storage Layout

Quick Start

Architecture

Detection Phases

API

Event Types

Development

Key Design Decisions

Project Structure

Disclaimer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages