🎯 Real-Time Object Detection & Tracking

YOLOv8 + Deep SORT — A Production-Grade Multi-Object Tracking System

Detect objects in real-time video streams and track them with persistent IDs — built for speed, designed for clarity.

🎬 Demo

Real-time multi-object tracking on a live webcam feed — each object is detected by YOLOv8, assigned a persistent ID by Deep SORT, and annotated with a unique color-coded bounding box.

⚡ TL;DR: This project combines YOLOv8's blazing-fast object detection with Deep SORT's identity-preserving tracker to deliver frame-by-frame detections with cross-frame object continuity — the foundation of any serious computer vision pipeline.

📖 About The Project

This project is a complete, production-ready real-time object detection and multi-object tracking system developed as Task 4 of the CodeAlpha Computer Vision Internship. It processes live video streams — whether from a webcam or a video file — and outputs annotated frames where every detected object is bounded, classified, and tracked with a unique, persistent ID.

Object detection alone tells you what is in a frame. Tracking tells you who is where across time. Bridging the two is what transforms a simple detector into a system capable of powering surveillance and security monitoring, autonomous vehicle perception, crowd analytics, sports player tracking, and smart retail insights. This project demonstrates that bridge — cleanly architected, thoroughly documented, and optimized for real-time performance.

🏗️ Technical Architecture

 ┌──────────────────┐      ┌──────────────────────┐      ┌───────────────────────┐      ┌──────────────────────┐
 │                  │      │                      │      │                       │      │                      │
 │   Video Source   │─────▶│    YOLOv8 Detector   │─────▶│    Deep SORT Tracker  │─────▶│   Annotated Output   │
 │  (Webcam / MP4)  │      │   (Per-Frame Boxes)  │      │   (Persistent IDs)    │      │  (Boxes + IDs + FPS) │
 │                  │      │                      │      │                       │      │                      │
 └──────────────────┘      └──────────────────────┘      └───────────────────────┘      └──────────────────────┘
         │                          │                             │                            │
         │                          │                             │                            │
    cv2.VideoCapture         Bounding Boxes (xywh)      Kalman Prediction             cv2.rectangle()
    Frame-by-frame           Confidence Scores           Hungarian Assignment          cv2.putText()
    BGR image                Class IDs                   Appearance Embeddings         Color-coded IDs

Pipeline Breakdown

Video Source — OpenCV captures frames from a webcam or a video file and feeds them into the pipeline one at a time.
YOLOv8 Detection — Each frame is independently processed by the YOLOv8 Nano model, which outputs bounding boxes, confidence scores, and class labels for all 80 COCO categories. Detections below the confidence threshold are discarded immediately.
Deep SORT Tracking — The filtered detections are passed to Deep SORT, which matches them to existing tracks using a combination of Kalman Filter motion prediction and cosine-distance appearance embeddings. New objects receive tentative tracks; confirmed tracks retain their IDs across frames.
Annotated Output — Confirmed tracks are rendered with color-coded bounding boxes, class names, tracking IDs, and confidence percentages. A smoothed FPS counter is overlaid in the top-left corner.
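The hand-off between the detection and tracking stages can be sketched as follows. This is a minimal illustration, not the project's code: `Detection` and `to_tracker_input` are hypothetical names, and the `(left, top, width, height)` output layout is an assumption about what the tracker expects.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one detector output in corner format."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int


def to_tracker_input(
    detections: List[Detection], conf_threshold: float = 0.5
) -> List[Tuple[List[float], float, int]]:
    """Drop low-confidence boxes and convert corners to (left, top, w, h)."""
    kept = []
    for det in detections:
        if det.confidence < conf_threshold:
            continue  # discarded before the tracker ever sees it
        width, height = det.x2 - det.x1, det.y2 - det.y1
        kept.append(([det.x1, det.y1, width, height], det.confidence, det.class_id))
    return kept


frame_detections = [
    Detection(10, 20, 110, 220, 0.91, 0),  # confident detection, kept
    Detection(300, 40, 360, 90, 0.32, 2),  # below the threshold, dropped
]
tracker_input = to_tracker_input(frame_detections, conf_threshold=0.5)
```

Filtering before the tracker keeps weak detections from spawning short-lived tentative tracks.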

✨ Key Features

🚀 OOP Architecture — The entire pipeline is encapsulated in a single ObjectTracker class with clean method separation (process_frame, draw_annotations, run), making it trivial to extend or integrate into larger systems.
⚡ Real-Time FPS Counter — An exponential moving average FPS readout ensures a stable, jitter-free performance metric displayed on every frame.
🎯 Deep SORT Re-Identification — A MobileNet-based appearance embedder allows the tracker to re-associate objects after brief occlusions, surviving up to 30 missed frames (max_age=30) before retiring an ID.
🎨 Color-Coded Tracking IDs — Each unique track ID maps to a deterministic color from a curated 20-color palette, providing instant visual continuity across frames.
🛡️ Robust Error Handling — Missing video files, unavailable cameras, and per-frame processing failures are all caught and reported gracefully — the system never crashes silently.
📐 Coordinate Clamping — All bounding box coordinates are clamped to frame boundaries before rendering, eliminating OpenCV drawing errors on edge cases.
🔧 Flexible Input — Supports both live webcam feeds and offline video files via a simple argparse CLI, with automatic source-type detection.
🧠 Configurable Confidence — Detection threshold is adjustable at runtime, letting you trade precision for recall depending on your use case.
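Three of the features above (the smoothed FPS counter, the deterministic ID colors, and coordinate clamping) are small enough to sketch in a few lines. The helper names, the smoothing factor `alpha`, and the generated palette are all assumptions for illustration; the project's actual palette and constants are not listed in this README.

```python
from typing import Optional, Tuple

# Hypothetical 20-entry BGR palette; the real curated colors may differ.
PALETTE = [((37 * i) % 256, (91 * i + 60) % 256, (173 * i + 120) % 256) for i in range(20)]


def color_for_id(track_id: int) -> Tuple[int, int, int]:
    """Map a track ID to the same palette color every time it is seen."""
    return PALETTE[track_id % len(PALETTE)]


def clamp_box(x1, y1, x2, y2, frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """Clamp box corners to the valid pixel range of the frame."""
    x1 = max(0, min(int(x1), frame_w - 1))
    y1 = max(0, min(int(y1), frame_h - 1))
    x2 = max(0, min(int(x2), frame_w - 1))
    y2 = max(0, min(int(y2), frame_h - 1))
    return x1, y1, x2, y2


class FpsMeter:
    """Exponential moving average of the instantaneous frame rate."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # weight of the newest sample (assumed value)
        self.fps: Optional[float] = None

    def update(self, frame_seconds: float) -> float:
        instant = 1.0 / max(frame_seconds, 1e-6)  # guard against zero division
        if self.fps is None:
            self.fps = instant  # first frame seeds the average
        else:
            self.fps = self.alpha * instant + (1 - self.alpha) * self.fps
        return self.fps


meter = FpsMeter(alpha=0.5)
first_reading = meter.update(0.1)    # a 100 ms frame
second_reading = meter.update(0.05)  # a 50 ms frame, blended with the history
clamped = clamp_box(-5, 10, 700, 500, frame_w=640, frame_h=480)
```

A smaller `alpha` gives a steadier readout at the cost of reacting more slowly to real changes in throughput.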

🧠 Deep Dive: Detection vs. Tracking

Understanding the distinction between detection and tracking is fundamental to appreciating this system's design.

YOLOv8 — Frame-by-Frame Detection

YOLOv8 is a single-frame detector. It processes each image independently: a convolutional backbone extracts multi-scale feature maps, an anchor-free head predicts a bounding box and class scores at each feature-map location, and non-maximum suppression removes duplicate boxes. The result is a set of anonymous detections: bounding boxes with class labels and confidence scores, but no memory between frames. The same person detected at position (100, 200) in frame N and at (105, 202) in frame N+1 produces, to YOLO, two completely unrelated observations. YOLO answers the question: "What objects exist in this frame, and where?"
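The non-maximum suppression step mentioned above can be illustrated with a greedy, class-agnostic sketch in plain Python. It is not the implementation YOLOv8 ships with; the function names and the `iou_threshold` value are assumptions, and production detectors usually apply the same idea per class on tensors.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(100, 200, 200, 400), (105, 202, 205, 402), (400, 100, 460, 180)]
scores = [0.9, 0.8, 0.7]
kept_indices = nms(boxes, scores)  # the second box overlaps the first and is dropped
```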

Deep SORT — Identity-Preserving Tracking

Deep SORT transforms anonymous detections into persistent identities by solving the data association problem across time. It operates through three tightly integrated mechanisms:

| Component | Role | How It Works |
| --- | --- | --- |
| Kalman Filter | Motion prediction | Models each track's position and velocity as a linear dynamical system. Before each frame, it predicts where every existing track should appear, then updates its state once a measurement is matched. This allows the tracker to maintain continuity through brief occlusions where no detection is available. |
| Hungarian Algorithm | Optimal assignment | Computes a cost matrix between all predicted track positions and all new detections, then solves for the globally optimal one-to-one matching. The cost is a weighted combination of Mahalanobis distance (motion consistency) and cosine distance (appearance similarity). Unmatched detections spawn new tentative tracks; unmatched tracks age toward deletion. |
| Appearance Embeddings | Re-identification | A lightweight MobileNet CNN extracts a compact feature vector from each detection's image patch. These embeddings are stored in a gallery per track and compared against new detections using cosine similarity. This is what allows Deep SORT to distinguish two visually similar objects moving close together, the critical advantage over pure motion-based trackers. |
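To make the association step concrete, here is a small sketch that builds an appearance-only cost matrix from cosine distances and finds the minimum-cost one-to-one matching. It makes several simplifying assumptions: the Mahalanobis (motion) term is omitted, an exhaustive search over permutations stands in for the Hungarian algorithm (it finds the same optimum but only scales to a handful of objects), and `best_assignment` and the `max_cost` gate of 0.2 are illustrative names and values.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def best_assignment(cost: List[List[float]], max_cost: float) -> Dict[int, int]:
    """Return {track_index: detection_index} with minimum total cost.

    Exhaustive stand-in for the Hungarian algorithm; this sketch assumes
    there are at least as many detections as tracks.
    """
    n_tracks, n_dets = len(cost), len(cost[0])
    if n_tracks > n_dets:
        raise ValueError("sketch assumes no more tracks than detections")
    best = min(
        permutations(range(n_dets), n_tracks),
        key=lambda perm: sum(cost[t][d] for t, d in enumerate(perm)),
    )
    # Pairs whose cost exceeds the gate are treated as unmatched.
    return {t: d for t, d in enumerate(best) if cost[t][d] <= max_cost}


track_embeddings = [[1.0, 0.0], [0.0, 1.0]]
detection_embeddings = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]
cost = [[cosine_distance(t, d) for d in detection_embeddings] for t in track_embeddings]

matches = best_assignment(cost, max_cost=0.2)
unmatched_detections = [d for d in range(len(detection_embeddings)) if d not in matches.values()]
```

In this toy input the two tracks swap positions in the detection list and are still matched correctly, while the third detection stays unmatched and would start a new tentative track.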

The result: once a track is confirmed (after being matched in n_init=3 consecutive frames), it is reported with a unique integer ID that persists through partial occlusions, motion blur, and temporary detection dropouts, as long as the object is matched again within the max_age=30 frame window. If it stays unmatched for longer, the track is retired and a returning object receives a new ID. This is what transforms a detector into a tracker, and what powers applications that require counting distinct objects or following individual trajectories over time.
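The confirmation and retirement rules can be pictured as a small state machine. The sketch below is an assumption-laden illustration: `TrackState` is a hypothetical class, and the counting convention (creation counts as the first hit, tentative tracks are dropped on their first miss) follows the original Deep SORT reference behavior as commonly described, which the library used here may implement slightly differently.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    """Hypothetical bookkeeping for one track's lifecycle."""
    track_id: int
    hits: int = 1        # matched frames so far, counting the creating detection
    misses: int = 0      # consecutive frames without a match
    confirmed: bool = False
    deleted: bool = False

    def step(self, matched: bool, n_init: int = 3, max_age: int = 30) -> None:
        """Advance the track by one frame."""
        if matched:
            self.hits += 1
            self.misses = 0
            if self.hits >= n_init:
                self.confirmed = True
        else:
            self.misses += 1
            # Tentative tracks die on the first miss; confirmed tracks are
            # retired once they exceed max_age consecutive misses.
            if not self.confirmed or self.misses > max_age:
                self.deleted = True


track = TrackState(track_id=7)
track.step(matched=True)
track.step(matched=True)           # third consecutive matched frame
confirmed_after_three = track.confirmed

for _ in range(30):                # an occlusion lasting exactly max_age frames
    track.step(matched=False)
alive_after_30_misses = not track.deleted

track.step(matched=False)          # one frame beyond the window
retired_after_31_misses = track.deleted
```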

🚀 Getting Started

Prerequisites

Python 3.9+
pip — Comes bundled with Python
A webcam (for live demo) or an MP4/AVI video file

Installation

# 1. Clone the repository
git clone https://github.com/<your-username>/realtime-object-tracking.git
cd realtime-object-tracking

# 2. Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate          # Linux / macOS
# venv\Scripts\activate           # Windows

# 3. Install dependencies
pip install -r requirements.txt

Note: On first run, YOLOv8 will automatically download the yolov8n.pt weights file (~6 MB) and Deep SORT will download its MobileNet embedder. An internet connection is required for the initial setup only.

🎮 Usage

Run with a Webcam

python object_tracker.py --source 0 --confidence 0.5

Run with a Video File

python object_tracker.py --source ./videos/sample.mp4 --confidence 0.4

All CLI Options

| Flag | Default | Description |
| --- | --- | --- |
| `--source` | `0` | Video source: `0` for the default webcam, or a file path to an MP4/AVI video |
| `--confidence` | `0.5` | Minimum detection confidence threshold (0.0–1.0) |
| `--model` | `yolov8n.pt` | YOLOv8 weights file: `yolov8n` (fastest), `yolov8s`, `yolov8m`, `yolov8l` |
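For readers integrating the tracker elsewhere, the flags above could be declared with `argparse` roughly as follows. This is a sketch rather than the script's actual parser: `parse_source` and `build_parser` are hypothetical names, and treating an all-digit `--source` value as a camera index is one plausible way to implement the automatic source-type detection.

```python
import argparse
from typing import Union


def parse_source(value: str) -> Union[int, str]:
    """Treat an all-digit string as a camera index, anything else as a path."""
    return int(value) if value.isdigit() else value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="YOLOv8 + Deep SORT tracker")
    parser.add_argument("--source", type=parse_source, default=0,
                        help="camera index or path to a video file")
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="minimum detection confidence, between 0.0 and 1.0")
    parser.add_argument("--model", default="yolov8n.pt",
                        help="YOLOv8 weights file")
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--source", "./videos/sample.mp4", "--confidence", "0.4"])
webcam_args = build_parser().parse_args(["--source", "0"])
```

Here `args.source` stays a string path while `webcam_args.source` becomes the integer `0`, which is the form `cv2.VideoCapture` expects for a camera index.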

Switching Models

# Balanced speed/accuracy
python object_tracker.py --source 0 --model yolov8s.pt

# Higher accuracy (slower; a GPU is recommended for real-time use)
python object_tracker.py --source 0 --model yolov8m.pt

Tip: Press q at any time to cleanly exit the tracking loop and release all resources.

📂 Project Structure

realtime-object-tracking/
│
├── object_tracker.py        # Main script — ObjectTracker class + CLI entry point
├── requirements.txt         # Python dependencies
├── README.md                # You are here
├── LICENSE                  # MIT License
│
└── assets/
    └── demo.gif             # Placeholder for demo animation

👨‍💻 Author

Meraj Basiri and Mani Alagheband

🎓 Computer Vision Intern @ CodeAlpha

🔬 Passionate about building intelligent systems that perceive, reason, and act in the real world.

"The intersection of perception and intelligence is where the future is being built — one frame at a time."

🙏 Acknowledgements

CodeAlpha — For the internship opportunity and structured learning path that made this project possible.
Ultralytics — For the state-of-the-art YOLOv8 framework that democratizes real-time object detection.
Deep SORT Realtime — For the seamless Python integration of the Deep SORT tracking algorithm.

Built with ❤️ and curiosity by Meraj Basiri and Mani Alagheband
