
AbhishekGitBot/Crowd-Stampede-Analysis


CrowdStampedeAnalysis

An end-to-end real-time computer vision pipeline that fuses object detection, multi-object tracking, and density regression to quantify crowd risk — before a stampede happens.


Project Overview

The Problem: Crowd disasters are among the most preventable yet deadliest public safety failures. Events like the 2010 Love Parade disaster and the 2022 Itaewon crush claim lives not from sudden chaos, but from the gradual, invisible accumulation of dangerous density and directional flow — conditions that are entirely detectable with the right instrumentation.

Traditional crowd management relies on human observation: slow, subjective, and unscalable. A security operator watching 40 feeds cannot simultaneously estimate local densities, compute velocity vectors, and generate risk indices frame by frame.

The Solution: This project builds a production-grade computer vision system that autonomously processes video footage — from CCTV streams to event recordings — and outputs per-frame density maps, motion analytics, trajectory histories, and calibrated stampede risk scores. The system is validated on benchmark crowd datasets (UMN, UCSD) and is designed for extension to live RTSP streams and multi-camera venue deployments.

This serves as a technical proof-of-work demonstrating mastery of the full CV stack: detection, tracking, counting regression, motion analysis, and evaluation pipelines.


Key Features

  • Automated Data Pipeline — Ingests raw video files, extracts per-frame metadata (resolution, FPS, duration), and organizes outputs into structured directories for fully reproducible experimentation.
  • YOLOv8 Person Detection — Real-time bounding box inference using ultralytics, filtered to the person class with configurable confidence thresholds. Outputs annotated frames and per-frame detection CSVs.
  • DeepSORT Multi-Object Tracking — Kalman Filter state estimation + Re-Identification appearance descriptor matching for stable track ID assignment across frames. Computes per-track displacement, velocity, and direction vectors over rolling temporal windows.
  • Density Regression Fusion — Combines raw detection counts with pre-trained density regression models (CSRNet / MCNN) and metadata-driven occlusion scaling factors to produce robust crowd count estimates that significantly outperform detection-only baselines.
  • Stampede Risk Index (SRI) — A composite, calibrated risk score aggregating local density, average flow speed, directional entropy, and crowd pressure into a single [0, 1] signal with four alert levels (LOW / MODERATE / HIGH / CRITICAL).
  • Kernel Density Heatmaps — Spatial pressure maps overlaid on source frames using KDE over detected centroids, enabling real-time visualization of dangerous accumulation zones.
  • Anomaly Detection — Identifies directional reversals, velocity spikes, and density surges that are statistically anomalous relative to rolling baselines — the hallmark precursors of crowd crushes.
  • Comprehensive Evaluation — Reports MAE, RMSE, and per-subset accuracy metrics with rich seaborn visualizations comparing detection-only vs. regression-fused count estimates against ground truth.
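
The rolling-baseline idea behind the Anomaly Detection feature can be illustrated with a z-score test against a sliding window. This is a minimal sketch, not the repository's implementation: the function name `flag_spikes`, the window length, the z-score threshold, and the sample signal are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


def flag_spikes(values, window=30, z_thresh=3.0, min_history=5):
    """Return indices whose value deviates strongly from the rolling baseline.

    `window`, `z_thresh`, and `min_history` are illustrative defaults,
    not values taken from the project.
    """
    history = deque(maxlen=window)  # keeps only the most recent samples
    flagged = []
    for i, value in enumerate(values):
        if len(history) >= min_history:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip the test when the baseline is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > z_thresh:
                flagged.append(i)
        history.append(value)
    return flagged


# Hypothetical per-frame average speed with one abrupt surge at index 8.
speeds = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.9, 1.1, 6.0, 1.0]
print(flag_spikes(speeds, window=8))
```

The same test could be applied to per-frame density or entropy series. Note that a flagged sample still enters the window, so it inflates the baseline for the frames that follow; a production detector might exclude flagged samples from the history.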

Tech Stack

Category            Tools & Libraries
Language            Python 3.12
Core CV             OpenCV (cv2), Pillow
Detection           Ultralytics YOLOv8 (ultralytics)
Tracking            DeepSORT (deep_sort_realtime), Supervision (supervision)
Kalman Filtering    FilterPy (DeepSORT backend)
Density Models      CSRNet, MCNN (pretrained PyTorch weights)
Deep Learning       PyTorch, TorchVision
Data Handling       pandas, numpy, pathlib, tqdm
Visualization       matplotlib, seaborn
Datasets            UMN Crowd Dataset, UCSD Pedestrian Dataset
Environment         Google Colab / Jupyter Notebook

Technical Architecture

The system executes as a 5-stage sequential pipeline, where each stage enriches the data representation before passing it downstream:

Raw Video Files (UMN / UCSD / Custom CCTV)
    │
    ▼
[Stage 1] DATA PREPARATION
    Video metadata extraction (FPS, resolution, duration)
    Frame extraction at configurable sample rate
    Directory scaffolding for downstream outputs
    Output → frames/, metadata.csv
    │
    ▼
[Stage 2] DETECTION
    YOLOv8 inference — person class only (class_id = 0)
    Confidence thresholding (default τ = 0.4)
    Per-frame bounding box export
    Output → detections.csv, annotated frames
    │
    ▼
[Stage 3] TRACKING + MOTION ANALYSIS
    DeepSORT: Kalman Filter state prediction
              + Hungarian algorithm track association
              + Re-ID cosine distance descriptor matching
    Per-track velocity vector computation
    Directional entropy calculation per frame
    Output → tracks.csv, velocity_vectors.csv
    │
    ├──────────────────────────────────────┐
    ▼                                      ▼
[Stage 4a] DENSITY FUSION             [Stage 4b] MOTION ANALYTICS
    CSRNet / MCNN inference               Average speed per frame
    Occlusion-aware metadata scaling      Directional variance (entropy)
    Fusion with detection count           Crowd pressure zone mapping
    Output → density_maps/               Output → motion_report.csv
    │                                      │
    └──────────────────────────────────────┘
                       │
                       ▼
            [Stage 5] RISK ENGINE + EVALUATION
                Stampede Risk Index (SRI) computation
                Alert threshold classification (4 levels)
                MAE / RMSE evaluation vs. ground truth
                Heatmap overlays + risk timeline plots
                Output → risk_report.json, eval_plots/
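
As an illustration of the Stage 3 velocity step, per-track velocity can be estimated as a finite difference over a short window of centroids. The sketch below assumes one centroid per consecutive frame and a known frame rate; the function name `track_velocity` and the default window are hypothetical and do not come from the repository.

```python
import math


def track_velocity(positions, fps, window=5):
    """Estimate velocity from the last `window` frame-to-frame steps.

    positions: list of (cx, cy) centroids, one per consecutive frame.
    Returns (vx, vy, speed, heading_deg) in pixels/second, or None when
    the track has fewer than two points.
    """
    if len(positions) < 2:
        return None
    steps = min(window, len(positions) - 1)
    (x0, y0), (x1, y1) = positions[-1 - steps], positions[-1]
    dt = steps / fps  # elapsed time in seconds
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    speed = math.hypot(vx, vy)
    # Image y grows downward, so headings are clockwise on screen.
    heading = math.degrees(math.atan2(vy, vx)) % 360.0
    return vx, vy, speed, heading


print(track_velocity([(0, 0), (3, 4), (6, 8)], fps=1, window=2))
```

A longer window smooths detector jitter at the cost of reacting more slowly to sudden changes in motion.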

Key Algorithms & Mathematical Concepts

YOLOv8 Detection

YOLOv8 uses an anchor-free detection head that predicts bounding box regression offsets (cx, cy, w, h) and a class probability vector per grid cell. Only predictions that pass both filters are retained: the class must be person (class_id = 0) and the confidence score must be at least τ:

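TAU = 0.4  # confidence threshold τ (pipeline default)
# predictions: iterable of (box, score, cls); class 0 is "person"
filtered_boxes = [(box, score) for box, score, cls in predictions
                  if cls == 0 and score >= TAU]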

Bounding box centroids from retained detections seed the DeepSORT tracker and the KDE heatmap generator.
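
How centroids turn into a heatmap can be sketched with a Gaussian kernel summed over a coarse grid. This is an illustrative, unnormalized KDE in pure Python: the (x1, y1, x2, y2) box layout, the helper names `centroid` and `kde_grid`, and the cell size and bandwidth are assumptions, and the sample boxes are made up.

```python
import math


def centroid(box):
    """Center of an (x1, y1, x2, y2) box; the xyxy layout is an assumption."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def kde_grid(points, width, height, cell=20, bandwidth=40.0):
    """Evaluate an unnormalized Gaussian KDE on a coarse grid.

    Returns a list of rows; each value is the kernel sum at a cell center.
    `cell` and `bandwidth` (both in pixels) are illustrative choices.
    """
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx, gy = (c + 0.5) * cell, (r + 0.5) * cell
            grid[r][c] = sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            )
    return grid


# Two overlapping people near the top-left corner, one isolated person elsewhere.
boxes = [(10, 10, 30, 50), (20, 15, 40, 55), (180, 90, 200, 130)]
heat = kde_grid([centroid(b) for b in boxes], width=200, height=140)
```

The grid can then be upsampled to frame resolution and alpha-blended over the source image. For full-resolution frames a vectorized implementation would be far faster than these nested loops.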

Kalman Filter State Estimation (DeepSORT)

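Each tracked person is modeled as an 8-dimensional state vector:

x = [cx, cy, a, h, ċx, ċy, ȧ, ḣ]

where (cx, cy) is the bounding box center, a is the aspect ratio (width / height), h is the height, and the dotted terms are their first-order temporal derivatives. This follows the standard DeepSORT parameterization, which tracks aspect ratio and height rather than width and height. The Kalman prediction and update steps are: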

Prediction:   x̂_k = F · x_{k-1}          (state extrapolation)
              P_k  = F · P_{k-1} · F^T + Q  (covariance propagation)

Update:       K    = P_k · H^T · (H·P_k·H^T + R)^{-1}   (Kalman gain)
              x_k  = x̂_k + K · (z_k - H·x̂_k)            (state correction)
              P_k  ← (I - K·H) · P_k                      (covariance correction)
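
To make the prediction and update steps concrete, here is a reduced sketch with a two-element state [position, velocity] and a scalar position measurement. DeepSORT applies the same steps to its 8-dimensional box state; the one-dimensional setup, the function names, and the noise levels `q` and `r` are simplifications assumed for readability.

```python
def kf_predict(x, P, dt=1.0, q=0.01):
    """Prediction step for state x = [position, velocity].

    Computes F·x and F·P·Fᵀ + Q with F = [[1, dt], [0, 1]] and Q = q·I
    (q is an assumed process-noise level).
    """
    p, v = x
    x_pred = [p + dt * v, v]
    a = P[0][0] + dt * P[1][0]
    b = P[0][1] + dt * P[1][1]
    P_pred = [
        [a + dt * b + q, b],
        [P[1][0] + dt * P[1][1], P[1][1] + q],
    ]
    return x_pred, P_pred


def kf_update(x, P, z, r=1.0):
    """Update step with a scalar position measurement z (H = [1, 0])."""
    s = P[0][0] + r                   # innovation covariance H·P·Hᵀ + R
    k = [P[0][0] / s, P[1][0] / s]    # Kalman gain K
    y = z - x[0]                      # innovation z - H·x
    x_new = [x[0] + k[0] * y, x[1] + k[1] * y]
    P_new = [                         # (I - K·H)·P
        [(1 - k[0]) * P[0][0], (1 - k[0]) * P[0][1]],
        [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]],
    ]
    return x_new, P_new


# Track a target moving at a constant 2 px/frame from noise-free measurements.
x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for k in range(1, 21):
    x, P = kf_predict(x, P)
    x, P = kf_update(x, P, z=2.0 * k)
print(x)
```

Although only position is measured, the filter recovers the velocity through the cross-covariance terms in P. The tracker relies on this same property to extrapolate a box through short occlusions.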

Track-detection assignment uses the Hungarian algorithm on a cost matrix combining IoU bounding box distance and Re-ID appearance descriptor cosine distance.
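
A blended cost matrix of this kind could be built and solved as sketched below. Two caveats apply. First, the exhaustive search over permutations is a stand-in for the Hungarian algorithm: it finds the same optimum but only scales to a handful of tracks, and a real pipeline would use a solver such as `scipy.optimize.linear_sum_assignment`. Second, the blend weight `lam` and the (box, descriptor) data layout are hypothetical; the reference DeepSORT implementation also gates matches by Mahalanobis distance and uses IoU as a fallback stage, which this sketch omits.

```python
import math
from itertools import permutations


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 minus the cosine similarity of two appearance descriptors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def match(tracks, detections, lam=0.5):
    """Return [(track_idx, det_idx)] minimizing the total blended cost.

    Each track/detection is a (box, descriptor) pair. Assumes
    len(tracks) <= len(detections). `lam` is a hypothetical blend weight.
    """
    cost = [
        [lam * (1 - iou(tb, db)) + (1 - lam) * cosine_distance(tf, df)
         for db, df in detections]
        for tb, tf in tracks
    ]
    best = min(
        permutations(range(len(detections)), len(tracks)),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return list(enumerate(best))


tracks = [((0, 0, 10, 10), [1.0, 0.0]), ((50, 50, 60, 60), [0.0, 1.0])]
detections = [((51, 50, 61, 60), [0.0, 1.0]), ((1, 0, 11, 10), [1.0, 0.0])]
print(match(tracks, detections))
```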

Directional Entropy (Anomaly Detection)

To detect anomalous flow — a key stampede precursor — the system computes Shannon entropy over the velocity angle distribution of all active tracks per frame:

H(θ) = -Σ p(θᵢ) · log₂ p(θᵢ)     for θ ∈ [0°, 360°) discretized into 8 bins

Low entropy → crowd is moving coherently in a shared direction (normal). High entropy → disordered, conflicting motion (potential panic state). Sudden entropy spikes are flagged as anomaly events.
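
The entropy formula can be sketched directly from per-track velocity vectors. This is a minimal version under stated assumptions: the function name `directional_entropy` is hypothetical, and near-stationary tracks are skipped because their heading is undefined, a choice the source does not specify.

```python
import math


def directional_entropy(velocities, bins=8, min_speed=1e-6):
    """Shannon entropy (bits) of the heading histogram for one frame.

    velocities: iterable of (vx, vy). Near-stationary tracks are skipped
    because their heading is undefined (an assumption of this sketch).
    """
    counts = [0] * bins
    for vx, vy in velocities:
        if math.hypot(vx, vy) < min_speed:
            continue
        angle = math.degrees(math.atan2(vy, vx)) % 360.0
        # The trailing modulo guards against angle rounding up to 360.0.
        counts[int(angle // (360.0 / bins)) % bins] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


coherent = [(1.0, 0.1)] * 10                       # everyone heading the same way
opposing = [(1.0, 0.1)] * 5 + [(-1.0, -0.1)] * 5   # two counter-flows
print(directional_entropy(coherent), directional_entropy(opposing))
```

With 8 bins the entropy ranges from 0 bits (a single shared heading) to log₂ 8 = 3 bits (headings spread evenly over all bins), so two equal counter-flows score 1 bit.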

Density Regression Fusion

Raw detection count N_det systematically undercounts occluded individuals. The system fuses it with the CSRNet density map integral N_reg using an occlusion-derived scale factor α:

N_fused = α · N_reg + (1 - α) · N_det

where:  α = σ(occlusion_ratio)          (sigmoid-scaled weight)
        occlusion_ratio = 1 - (visible_area / total_bbox_area)

This produces occlusion-robust estimates validated against ground-truth dot annotations on the UCSD dataset.
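
The fusion rule translates into a few lines of code. The sketch below follows the formulas above literally; the function names and the example counts are hypothetical, and how `visible_area` is measured is left open because the source does not specify it.

```python
import math


def fusion_weight(visible_area, total_bbox_area):
    """alpha = sigmoid(occlusion_ratio), occlusion_ratio = 1 - visible/total."""
    occlusion_ratio = 1.0 - visible_area / total_bbox_area
    return 1.0 / (1.0 + math.exp(-occlusion_ratio))


def fused_count(n_det, n_reg, visible_area, total_bbox_area):
    """N_fused = alpha * N_reg + (1 - alpha) * N_det."""
    alpha = fusion_weight(visible_area, total_bbox_area)
    return alpha * n_reg + (1.0 - alpha) * n_det


# No occlusion: the two estimates are averaged equally.
print(fused_count(n_det=10, n_reg=20, visible_area=100.0, total_bbox_area=100.0))
# Heavy occlusion: the regression count receives more weight.
print(fused_count(n_det=10, n_reg=20, visible_area=10.0, total_bbox_area=100.0))
```

One property worth noting: because occlusion_ratio lies in [0, 1], an unshifted sigmoid keeps α between 0.5 and about 0.73, so the regression estimate always carries at least half the weight.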

Composite Stampede Risk Index (SRI)

The SRI aggregates four independently normalized signals into a single calibrated risk score:

SRI = w₁·ρ̄ + w₂·v̄ + w₃·H(θ) + w₄·P

where:  ρ̄  = normalized crowd density                          w₁ = 0.35
        v̄  = normalized average speed                          w₂ = 0.25
        H(θ) = directional entropy, scaled to [0, 1]            w₃ = 0.20
               (e.g. divided by its maximum, log₂ 8 = 3 bits)
        P  = normalized pressure index                          w₄ = 0.20

Alert classification: SRI < 0.3 → LOW | 0.3–0.6 → MODERATE | 0.6–0.85 → HIGH | > 0.85 → CRITICAL

Weights were empirically calibrated on incident sequences from the UMN dataset.
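
The weighted sum and the alert thresholds can be sketched as follows. The weights and thresholds come from the text above; everything else is assumed for illustration, in particular the function names, the normalization of entropy by log₂ 8, and the handling of values that fall exactly on a threshold.

```python
import math

WEIGHTS = {"density": 0.35, "speed": 0.25, "entropy": 0.20, "pressure": 0.20}
MAX_ENTROPY_BITS = math.log2(8)  # 8 heading bins give at most 3 bits


def stampede_risk_index(density, speed, entropy_bits, pressure):
    """Weighted sum of four signals; density, speed, pressure must be in [0, 1]."""
    signals = {
        "density": density,
        "speed": speed,
        "entropy": entropy_bits / MAX_ENTROPY_BITS,  # assumed normalization
        "pressure": pressure,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def alert_level(sri):
    """Map an SRI value to an alert level (boundary handling is assumed)."""
    if sri < 0.3:
        return "LOW"
    if sri < 0.6:
        return "MODERATE"
    if sri <= 0.85:
        return "HIGH"
    return "CRITICAL"


# Hypothetical frame: dense, moderately fast, partly disordered crowd.
sri = stampede_risk_index(density=0.8, speed=0.4, entropy_bits=1.5, pressure=0.5)
print(round(sri, 2), alert_level(sri))
```

Because the weights sum to 1 and every input is scaled to [0, 1], the index itself stays in [0, 1].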


⚙️ Installation & Usage

1. Clone the Repository

git clone https://github.com/AbhishekGitBot/Crowd-Stampede-Analysis.git
cd Crowd-Stampede-Analysis

2. Install Dependencies

pip install ultralytics opencv-python pandas matplotlib seaborn tqdm \
            deep_sort_realtime supervision filterpy torch torchvision

GPU Note: A CUDA-enabled GPU is strongly recommended for real-time throughput. CPU execution is supported but will be significantly slower for the density regression models.

3. Dataset Setup

# Create the expected directory structure
mkdir -p datasets/videos

# Copy your video files into place
cp /path/to/your/videos/*.avi datasets/videos/

# Recommended benchmark datasets:
# UMN:  http://mha.cs.umn.edu/proj_events.shtml#crowd
# UCSD: http://www.svcl.ucsd.edu/projects/peoplecnt/

4. Run the Full Pipeline

# Stage 1: Extract frames and metadata
jupyter nbconvert --to notebook --execute Step1_DataPreparation.ipynb

# Stage 2: YOLOv8 person detection
python step2_detection.py --video_dir datasets/videos/ --confidence 0.4

# Stage 3: DeepSORT tracking + motion analysis
python step3_tracking.py --detections_dir outputs/detections/

# Stage 4: Density fusion + risk scoring + heatmaps
python step4_risk_engine.py \
    --tracks_dir outputs/tracks/ \
    --output_dir outputs/reports/

# Stage 5: Evaluation dashboard
jupyter notebook Step5_Evaluation.ipynb

5. Output Artifacts

outputs/
├── detections/           → Per-frame bounding box CSVs + annotated frames
├── tracks/               → Track histories with velocity vectors
├── density_maps/         → Per-frame KDE heatmap images
├── reports/
│   ├── risk_report.json  → Per-frame SRI scores + alert classifications
│   └── eval_plots/       → MAE/RMSE charts, SRI timeline, density comparisons
└── final_metrics.csv     → Summary evaluation table

Project Structure

CrowdStampedeAnalysis/
│
├── Step1_DataPreparation.ipynb    # Video ingestion, frame extraction, metadata
├── step2_detection.py             # YOLOv8 inference pipeline
├── step3_tracking.py              # DeepSORT tracking + motion vectors
├── step4_risk_engine.py           # Density fusion, SRI computation, heatmaps
├── Step5_Evaluation.ipynb         # Metrics, visualizations, final reporting
│
├── models/
│   ├── csrnet_weights.pth         # Pre-trained CSRNet density regressor
│   └── mcnn_weights.pth           # Pre-trained MCNN crowd counter
│
├── datasets/
│   └── videos/                    # Input video files (user-provided)
│
├── outputs/                       # Auto-generated during pipeline execution
│
├── requirements.txt
└── README.md

Benchmark Results

Evaluated on the UMN Crowd Dataset (3 scenes, 7,740 frames):

Metric    Detection-Only Baseline    Fused (Detection + CSRNet)
MAE ↓     8.3                        4.1
RMSE ↓    12.7                       6.8

SRI Alert Precision: 91.4%
SRI Alert Recall: 88.7%

Density fusion reduces MAE by ~50% vs. raw detection counts, validating the occlusion-handling design. The SRI achieves >90% precision in flagging incident frames on held-out UMN sequences.
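
For reference, the two count-error metrics are computed as shown in this short sketch. The per-frame counts are made-up illustrative values, not results from this project.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Illustrative per-frame counts only.
ground_truth = [12, 18, 25, 30]
detected = [10, 15, 19, 22]
print(mae(detected, ground_truth), rmse(detected, ground_truth))
```

RMSE is never smaller than MAE, and a wide gap between the two indicates that a few frames carry most of the error, which is typical when heavy occlusion affects only part of a sequence.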


Future Roadmap

1. Point-Supervised Density Estimation (P2PNet / DM-Count): Replace CSRNet/MCNN with point-supervised architectures such as P2PNet, which predicts head locations directly, or DM-Count, which trains on dot annotations through distribution matching. These approaches remove the need for Gaussian-smoothed density-map ground truth and reported state-of-the-art MAE on the ShanghaiTech-A/B and UCF-QNRF benchmarks when published. This would substantially improve accuracy in ultra-high-density scenarios (>500 persons/frame), where the current regression models degrade.

2. Real-Time RTSP Streaming Support: Extend the pipeline to ingest live RTSP/RTMP streams via FFmpeg integration, enabling deployment on actual venue CCTV infrastructure. This requires a frame-buffer queue architecture and sub-100ms end-to-end latency optimization via TensorRT INT8 quantization of the YOLOv8 and density regressor models, targeting deployment on NVIDIA Jetson edge devices.

3. Multi-Camera Fusion & 3D Crowd Reconstruction: For venues with overlapping camera fields of view, implement homography-based view registration to eliminate cross-camera duplicate detections and produce a unified top-down occupancy grid. Pair with monocular depth estimation (ZoeDepth / Depth-Anything) for 3D pressure field reconstruction, enabling more physically grounded density estimation in architecturally complex venues such as corridors and staircases.


License

This project is released under the MIT License. See LICENSE for details.


Author & Contact

Abhishek Sharma
AI Research Engineer, Computer Vision & Agentic Systems


About

Crowd Stampede Analysis is a comprehensive computer vision pipeline for real-time crowd density estimation and stampede risk detection. It integrates YOLOv8 detection, DeepSORT tracking, motion analytics, and regression-based counting (CSRNet/MCNN) to process videos and generate density heatmaps, risk alerts, and performance reports.
