Skip to content

hassan-hfk/SAM2-Video-Tracker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SAM2 Universal Video Tracker

A desktop application for tracking any object in video using SAM 2 (Segment Anything Model 2), with two input modes:

  1. Box prompt — draw a bounding box on any frame; SAM2 propagates the mask through the entire video.
  2. Text prompt — type free-form class names ("helmet, vest, person"); GroundingDINO detects boxes automatically, which are passed to SAM2 for pixel-accurate tracking.

Features

  • Box-prompt and text-prompt tracking on video
  • Multi-object tracking with distinct colors per object
  • Add / remove / refine prompts on any frame — not just frame 0
  • Live preview with play / pause / seek / frame-step
  • Export options: annotated video (MP4), mask video, per-frame masks (PNG), COCO-style JSON, CSV of bounding boxes
  • Thread-safe — all heavy inference runs off the UI thread
  • Persistent settings (last-used paths, model checkpoints, device)

Why This Design

SAM 2 is promptable by points, boxes, and masks — it does not accept text natively. The open-vocabulary text entry point is added via GroundingDINO (the standard "Grounded SAM 2" pattern). This repo keeps the two models decoupled so either side can be swapped independently.


Stack

  • UI: PyQt6
  • Tracking: SAM2 (Meta) + GroundingDINO
  • Video I/O: decord (read), imageio-ffmpeg (write), opencv-python (draw)
  • Runtime: PyTorch ≥ 2.3 — CUDA strongly recommended, CPU works

Install

1. Clone and create environment

git clone https://github.com/hassan-hfk/sam2-video-tracker.git
cd sam2-video-tracker
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

2. Install dependencies

pip install -r requirements.txt

Note: ffmpeg must be on your system PATH for video export.

3. Download SAM2 checkpoint

Choose one and place it anywhere on your system — you'll set the path in Settings:

# Small — fast, 46M params (recommended for most use cases)
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.pt

# Base Plus — balanced
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_base_plus.pt

# Large — best quality
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt

Or download from the official repo: https://github.com/facebookresearch/sam2

4. Download GroundingDINO checkpoint (text-prompt mode only)

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

Run

python -m app.main

On first launch, go to Settings to set your SAM2 checkpoint path and (optionally) GroundingDINO checkpoint path.


Project Structure

sam2_tracker/
├── app/              ← Entry point + application shell
├── core/             ← Model wrappers, tracker orchestrator, export
├── ui/               ← PyQt6 widgets
├── utils/            ← Video I/O, drawing, config
└── assets/           ← Icons, QSS stylesheet

Hardware Requirements

Mode Min GPU VRAM Notes
SAM2 Small (box prompt) 4 GB CPU also works, slower
SAM2 Large (box prompt) 8 GB Best mask quality
SAM2 + GroundingDINO (text prompt) 8 GB Both models loaded simultaneously

CUDA 11.8+ recommended. CPU inference works but is significantly slower for long videos.


License

MIT — free to use, modify, and distribute.

About

Desktop app for video object tracking using SAM2 + GroundingDINO : box and text prompts

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages