A desktop application for tracking any object in video using SAM 2 (Segment Anything Model 2), with two input modes:
- Box prompt: draw a bounding box on any frame; SAM2 propagates the mask through the entire video.
- Text prompt: type free-form class names ("helmet, vest, person"); GroundingDINO detects boxes automatically, which are passed to SAM2 for pixel-accurate tracking.
- Box-prompt and text-prompt tracking on video
- Multi-object tracking with distinct colors per object
- Add / remove / refine prompts on any frame — not just frame 0
- Live preview with play / pause / seek / frame-step
- Export options: annotated video (MP4), mask video, per-frame masks (PNG), COCO-style JSON, CSV of bounding boxes
- Thread-safe — all heavy inference runs off the UI thread
- Persistent settings (last-used paths, model checkpoints, device)
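To make the export formats listed above more concrete, here is a minimal sketch of how one tracked mask could be turned into a bounding box and a COCO-style annotation record. It is not the exporter used by this app: the names `mask_to_bbox` and `to_coco_annotation`, the nested-list mask representation, and the choice to store the object id in `category_id` are all assumptions made for illustration.

```python
import json
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values; hypothetical stand-in for a mask array


def mask_to_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x, y, width, height) around the foreground pixels, or None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    return x0, y0, x1 - x0 + 1, y1 - y0 + 1


def to_coco_annotation(ann_id: int, frame_idx: int, object_id: int,
                       mask: Mask) -> Optional[Dict[str, object]]:
    """Build one COCO-style annotation dict; bbox uses the [x, y, w, h] convention."""
    bbox = mask_to_bbox(mask)
    if bbox is None:
        return None  # object not visible on this frame
    area = sum(1 for row in mask for v in row if v)
    return {
        "id": ann_id,
        "image_id": frame_idx,      # assumption: one COCO "image" per video frame
        "category_id": object_id,   # assumption: tracked object id used as category
        "bbox": list(bbox),
        "area": area,
        "iscrowd": 0,
    }


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
annotation = to_coco_annotation(ann_id=1, frame_idx=0, object_id=7, mask=example_mask)
print(json.dumps(annotation))
```

The same `(x, y, width, height)` tuple could be written as one CSV row per object per frame.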
SAM 2 is promptable by points, boxes, and masks — it does not accept text natively. The open-vocabulary text entry point is added via GroundingDINO (the standard "Grounded SAM 2" pattern). This repo keeps the two models decoupled so either side can be swapped independently.
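The decoupling described above can be sketched with two small interfaces: one for anything that turns text into boxes, and one for anything that tracks from box prompts. This is an illustrative outline only, not the repository's actual classes; `BoxDetector`, `BoxTracker`, `track_from_text`, and the two stand-in implementations are hypothetical names, and no real SAM 2 or GroundingDINO calls are made.

```python
from typing import Dict, List, Protocol, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


class BoxDetector(Protocol):
    """Anything that maps a frame plus a text query to boxes (e.g. a GroundingDINO wrapper)."""

    def detect(self, frame: object, text: str) -> List[Box]: ...


class BoxTracker(Protocol):
    """Anything that accepts box prompts and propagates them (e.g. a SAM 2 wrapper)."""

    def add_box(self, frame_idx: int, obj_id: int, box: Box) -> None: ...

    def propagate(self) -> Dict[int, Dict[int, object]]: ...


def track_from_text(detector: BoxDetector, tracker: BoxTracker, frame: object,
                    frame_idx: int, text: str) -> Dict[int, Dict[int, object]]:
    """Detect boxes for `text` on one frame, then pass them to the tracker as box prompts."""
    for obj_id, box in enumerate(detector.detect(frame, text)):
        tracker.add_box(frame_idx, obj_id, box)
    return tracker.propagate()


# --- Stand-ins so the sketch runs without any model weights ---
class FixedDetector:
    def detect(self, frame: object, text: str) -> List[Box]:
        # One dummy box per comma-separated class name.
        names = [n.strip() for n in text.split(",") if n.strip()]
        return [(10.0 * i, 0.0, 10.0 * i + 5.0, 5.0) for i, _ in enumerate(names)]


class EchoTracker:
    def __init__(self) -> None:
        self.prompts: Dict[int, Dict[int, Box]] = {}

    def add_box(self, frame_idx: int, obj_id: int, box: Box) -> None:
        self.prompts.setdefault(frame_idx, {})[obj_id] = box

    def propagate(self) -> Dict[int, Dict[int, object]]:
        # A real tracker would return masks for every frame; this only echoes the prompts.
        return {f: dict(objs) for f, objs in self.prompts.items()}


result = track_from_text(FixedDetector(), EchoTracker(), frame=None,
                         frame_idx=0, text="helmet, vest, person")
print(sorted(result[0]))  # object ids assigned on frame 0
```

Because `track_from_text` depends only on the two interfaces, either the detector or the tracker could be replaced without touching the other side.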
- UI: PyQt6
- Tracking: SAM2 (Meta) + GroundingDINO
- Video I/O: `decord` (read), `imageio-ffmpeg` (write), `opencv-python` (draw)
- Runtime: PyTorch ≥ 2.3; CUDA strongly recommended, CPU works
git clone https://github.com/hassan-hfk/sam2-video-tracker.git
cd sam2-video-tracker
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Note: `ffmpeg` must be on your system PATH for video export.
- Windows: https://ffmpeg.org/download.html
- Linux: `sudo apt install ffmpeg`
- Mac: `brew install ffmpeg`
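A quick way to confirm the PATH requirement before attempting an export is to look the binary up with the standard library. The helper names `find_ffmpeg` and `require_ffmpeg` below are hypothetical and are not part of this app; the sketch only shows the check itself.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the absolute path of the binary if it is on PATH, otherwise None."""
    return shutil.which(executable)


def require_ffmpeg() -> str:
    """Return the ffmpeg path or raise a clear error if it cannot be found."""
    path = find_ffmpeg()
    if path is None:
        raise RuntimeError("ffmpeg was not found on PATH; install it before exporting video.")
    return path


if __name__ == "__main__":
    print(find_ffmpeg() or "ffmpeg not found on PATH")
```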
Choose one SAM 2 checkpoint and place it anywhere on your system; you'll set its path in Settings:
# Small — fast, 46M params (recommended for most use cases)
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.pt
# Base Plus — balanced
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_base_plus.pt
# Large — best quality
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt
Or download from the official repo: https://github.com/facebookresearch/sam2
# GroundingDINO Swin-T (optional, only needed for text prompts)
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
python -m app.main
On first launch, go to Settings to set your SAM2 checkpoint path and (optionally) your GroundingDINO checkpoint path.
sam2_tracker/
├── app/ ← Entry point + application shell
├── core/ ← Model wrappers, tracker orchestrator, export
├── ui/ ← PyQt6 widgets
├── utils/ ← Video I/O, drawing, config
└── assets/ ← Icons, QSS stylesheet
| Mode | Min GPU VRAM | Notes |
|---|---|---|
| SAM2 Small (box prompt) | 4 GB | CPU also works, slower |
| SAM2 Large (box prompt) | 8 GB | Best mask quality |
| SAM2 + GroundingDINO (text prompt) | 8 GB | Both models loaded simultaneously |
CUDA 11.8+ recommended. CPU inference works but is significantly slower for long videos.
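The table above can be read as a simple lookup from available VRAM to the modes that should fit. The sketch below encodes exactly those three rows; the function name `modes_that_fit` is hypothetical, and the thresholds are the table's stated minimums rather than measured values.

```python
from typing import Dict, List

# Minimum GPU VRAM in GB, copied from the table above.
MIN_VRAM_GB: Dict[str, int] = {
    "SAM2 Small (box prompt)": 4,
    "SAM2 Large (box prompt)": 8,
    "SAM2 + GroundingDINO (text prompt)": 8,
}


def modes_that_fit(vram_gb: float) -> List[str]:
    """Return the modes whose stated minimum VRAM is met, in table order."""
    return [mode for mode, needed in MIN_VRAM_GB.items() if vram_gb >= needed]


for gb in (2, 6, 8):
    fits = modes_that_fit(gb)
    print(gb, "GB:", fits if fits else "no GPU mode fits; fall back to CPU")
```

With PyTorch installed and a CUDA device present, the total VRAM in bytes could be read from `torch.cuda.get_device_properties(0).total_memory` and divided by `1024 ** 3` before calling the helper.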
MIT — free to use, modify, and distribute.