
qatre-ai/CodeAlpha_Object_Detection_and_Tracking


🎯 Real-Time Object Detection & Tracking

YOLOv8 + Deep SORT: A Production-Grade Multi-Object Tracking System

Detect objects in real-time video streams and track them with persistent IDs. Built for speed, designed for clarity.



🎬 Demo

Real-time multi-object tracking on a live webcam feed: each object is detected by YOLOv8, assigned a persistent ID by Deep SORT, and annotated with a bounding box color-coded by track ID.

⚡ TL;DR: This project combines YOLOv8's fast object detection with Deep SORT's identity-preserving tracker to deliver frame-by-frame detections with cross-frame object continuity, the foundation of a multi-object tracking pipeline.


📖 About The Project

This project is a complete, production-ready real-time object detection and multi-object tracking system developed as Task 4 of the CodeAlpha Computer Vision Internship. It processes live video streams, whether from a webcam or a video file, and outputs annotated frames where every detected object is bounded, classified, and tracked with a unique, persistent ID.

Object detection alone tells you what is in a frame. Tracking tells you who is where across time. Bridging the two is what transforms a simple detector into a system capable of powering surveillance and security monitoring, autonomous vehicle perception, crowd analytics, sports player tracking, and smart retail insights. This project demonstrates that bridge: cleanly architected, thoroughly documented, and optimized for real-time performance.


๐Ÿ—๏ธ Technical Architecture

 โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
 โ”‚                  โ”‚      โ”‚                      โ”‚      โ”‚                       โ”‚      โ”‚                      โ”‚
 โ”‚   Video Source   โ”‚โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚    YOLOv8 Detector   โ”‚โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚    Deep SORT Tracker  โ”‚โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚   Annotated Output   โ”‚
 โ”‚  (Webcam / MP4)  โ”‚      โ”‚   (Per-Frame Boxes)  โ”‚      โ”‚   (Persistent IDs)    โ”‚      โ”‚  (Boxes + IDs + FPS) โ”‚
 โ”‚                  โ”‚      โ”‚                      โ”‚      โ”‚                       โ”‚      โ”‚                      โ”‚
 โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚                          โ”‚                             โ”‚                            โ”‚
         โ”‚                          โ”‚                             โ”‚                            โ”‚
    cv2.VideoCapture         Bounding Boxes (xywh)      Kalman Prediction             cv2.rectangle()
    Frame-by-frame           Confidence Scores           Hungarian Assignment          cv2.putText()
    BGR image                Class IDs                   Appearance Embeddings         Color-coded IDs

Pipeline Breakdown

  1. Video Source: OpenCV captures frames from a webcam or a video file and feeds them into the pipeline one at a time.
  2. YOLOv8 Detection: Each frame is independently processed by the YOLOv8 Nano model, which outputs bounding boxes, confidence scores, and class labels across the 80 COCO categories. Detections below the confidence threshold are discarded immediately.
  3. Deep SORT Tracking: The filtered detections are passed to Deep SORT, which matches them to existing tracks using a combination of Kalman Filter motion prediction and cosine-distance appearance embeddings. New objects receive tentative tracks; confirmed tracks retain their IDs across frames.
  4. Annotated Output: Confirmed tracks are rendered with color-coded bounding boxes, class names, tracking IDs, and confidence percentages. A smoothed FPS counter is overlaid in the top-left corner.
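
The hand-off between steps 2 and 3 can be sketched in plain Python. This is a minimal illustration, not the project's code: `prepare_detections` and the `RawDetection` layout are hypothetical names, and the output layout `([left, top, width, height], confidence, class_name)` is assumed here because it is the tuple format the deep-sort-realtime package documents for its tracker input.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for one raw detector output:
# (x1, y1, x2, y2, confidence, class_name) in corner format.
RawDetection = Tuple[float, float, float, float, float, str]

# Assumed tracker input layout: ([left, top, width, height], confidence, class_name).
TrackerInput = Tuple[List[float], float, str]


def prepare_detections(
    raw: Sequence[RawDetection], min_confidence: float = 0.5
) -> List[TrackerInput]:
    """Drop low-confidence boxes and convert corners to left/top/width/height."""
    prepared: List[TrackerInput] = []
    for x1, y1, x2, y2, conf, name in raw:
        if conf < min_confidence:
            continue  # step 2: below-threshold detections never reach the tracker
        prepared.append(([x1, y1, x2 - x1, y2 - y1], conf, name))
    return prepared


raw = [
    (10.0, 20.0, 110.0, 220.0, 0.91, "person"),
    (300.0, 40.0, 360.0, 90.0, 0.32, "dog"),  # filtered out at the 0.5 threshold
]
detections = prepare_detections(raw, min_confidence=0.5)
print(detections)
```

Filtering before tracking keeps weak boxes from spawning tentative tracks, at the cost of missing objects the detector is unsure about.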

✨ Key Features

  • 🚀 OOP Architecture: The entire pipeline is encapsulated in a single ObjectTracker class with clean method separation (process_frame, draw_annotations, run), making it straightforward to extend or integrate into larger systems.
  • ⚡ Real-Time FPS Counter: An exponential moving average smooths the FPS readout displayed on every frame, keeping it stable rather than jittery.
  • 🎯 Deep SORT Re-Identification: A MobileNet-based appearance embedder allows the tracker to re-associate objects after brief occlusions, surviving up to 30 missed frames (max_age=30) before retiring an ID.
  • 🎨 Color-Coded Tracking IDs: Each track ID maps to a deterministic color from a curated 20-color palette, providing instant visual continuity across frames.
  • 🛡️ Robust Error Handling: Missing video files, unavailable cameras, and per-frame processing failures are all caught and reported, so the system never fails silently.
  • 📐 Coordinate Clamping: All bounding box coordinates are clamped to frame boundaries before rendering, eliminating OpenCV drawing errors on edge cases.
  • 🔧 Flexible Input: Supports both live webcam feeds and offline video files via a simple argparse CLI, with automatic source-type detection.
  • 🧠 Configurable Confidence: Detection threshold is adjustable at runtime, letting you trade precision for recall depending on your use case.
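
Three of the features above (the smoothed FPS counter, the ID-to-color mapping, and coordinate clamping) are small enough to sketch without OpenCV. The names `SmoothedFPS`, `color_for_id`, and `clamp_box`, the smoothing factor, and the 4-color palette are all illustrative assumptions; the project's own helpers and its 20-color palette may differ.

```python
from typing import Optional, Tuple


class SmoothedFPS:
    """Exponential moving average over instantaneous FPS readings."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # hypothetical smoothing factor; smaller = smoother
        self.value: Optional[float] = None

    def update(self, frame_seconds: float) -> float:
        instant = 1.0 / max(frame_seconds, 1e-6)  # guard against zero-length frames
        if self.value is None:
            self.value = instant  # the first reading seeds the average
        else:
            self.value = self.alpha * instant + (1.0 - self.alpha) * self.value
        return self.value


# Hypothetical 4-color BGR palette standing in for the 20-color one.
PALETTE = [(255, 56, 56), (56, 255, 56), (56, 56, 255), (255, 178, 29)]


def color_for_id(track_id: int) -> Tuple[int, int, int]:
    """The same ID always yields the same color; IDs wrap around the palette."""
    return PALETTE[track_id % len(PALETTE)]


def clamp_box(
    x1: float, y1: float, x2: float, y2: float, width: int, height: int
) -> Tuple[int, int, int, int]:
    """Clamp box corners to valid pixel coordinates of a width x height frame."""
    return (
        max(0, min(int(x1), width - 1)),
        max(0, min(int(y1), height - 1)),
        max(0, min(int(x2), width - 1)),
        max(0, min(int(y2), height - 1)),
    )


fps = SmoothedFPS(alpha=0.5)
fps.update(0.10)            # 10 FPS seeds the average
smoothed = fps.update(0.05)  # a 20 FPS frame pulls the average halfway up
box = clamp_box(-5, 10, 700, 500, width=640, height=480)
print(round(smoothed, 2), color_for_id(5), box)
```

Because the palette is indexed by `track_id % len(PALETTE)`, two tracks can share a color once there are more live IDs than palette entries.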

🧠 Deep Dive: Detection vs. Tracking

Understanding the distinction between detection and tracking is fundamental to appreciating this system's design.

YOLOv8: Frame-by-Frame Detection

YOLOv8 is a single-frame detector. It processes each image independently: it predicts bounding boxes and class probabilities at every cell of its multi-scale feature-map grids, then applies non-maximum suppression to remove duplicates. The result is a set of anonymous detections: bounding boxes with class labels and confidence scores, but no memory between frames. To YOLO, a person detected at position (100, 200) in frame N and the same person at (105, 202) in frame N+1 are two completely unrelated observations. YOLO answers the question: "What objects exist in this frame, and where?"
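
The duplicate-removal step mentioned above can be illustrated with a greedy non-maximum suppression routine. This is a simplified, class-agnostic sketch in pure Python, not YOLOv8's internal implementation; the function names and the 0.5 IoU threshold are assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Keep the highest-scoring box of every overlapping group; return kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # A box survives only if it does not overlap an already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # the second box overlaps the first (IoU about 0.68) and is dropped
```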

Deep SORT: Identity-Preserving Tracking

Deep SORT transforms anonymous detections into persistent identities by solving the data association problem across time. It operates through three tightly integrated mechanisms:

| Component | Role | How It Works |
| --- | --- | --- |
| Kalman Filter | Motion prediction | Models each track's position and velocity as a linear dynamical system. Before each frame, it predicts where every existing track should appear, then updates its state once a measurement is matched. This allows the tracker to maintain continuity through brief occlusions where no detection is available. |
| Hungarian Algorithm | Optimal assignment | Computes a cost matrix between all predicted track positions and all new detections, then solves for the globally optimal one-to-one matching. The cost is a weighted combination of Mahalanobis distance (motion consistency) and cosine distance (appearance similarity). Unmatched detections spawn new tentative tracks; unmatched tracks age toward deletion. |
| Appearance Embeddings | Re-identification | A lightweight MobileNet CNN extracts a compact feature vector from each detection's image patch. These embeddings are stored in a gallery per track and compared against new detections using cosine similarity. This is what allows Deep SORT to distinguish two visually similar objects moving close together, the critical advantage over pure motion-based trackers. |
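
The assignment step in the table can be sketched on a toy example. Several simplifications are assumed: the motion term is a scaled Euclidean distance between box centers rather than a true Mahalanobis distance, the weight, scale, and cost cutoff are arbitrary illustrative values, and the optimal matching is found by exhaustive search over permutations, which returns the same result as the Hungarian algorithm on small inputs but does not scale. All function names are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def motion_distance(predicted: Point, detected: Point, scale: float = 100.0) -> float:
    """Stand-in for Mahalanobis distance: center distance over an assumed scale."""
    return math.dist(predicted, detected) / scale


def build_cost(tracks, detections, weight: float = 0.5) -> List[List[float]]:
    """cost[i][j] = weight * motion + (1 - weight) * appearance for track i, detection j."""
    return [
        [
            weight * motion_distance(t_pos, d_pos)
            + (1.0 - weight) * cosine_distance(t_emb, d_emb)
            for d_pos, d_emb in detections
        ]
        for t_pos, t_emb in tracks
    ]


def best_assignment(cost: List[List[float]], max_cost: float = 0.7) -> List[Tuple[int, int]]:
    """Minimum-total-cost one-to-one matching (assumes tracks <= detections)."""
    n_rows, n_cols = len(cost), len(cost[0])
    best = min(
        permutations(range(n_cols), n_rows),
        key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)),
    )
    # Pairs that are still too expensive are rejected rather than forced.
    return [(r, c) for r, c in enumerate(best) if cost[r][c] <= max_cost]


# (predicted center, appearance embedding) per track; toy 2-D embeddings.
tracks = [((100.0, 100.0), (1.0, 0.0)), ((300.0, 120.0), (0.0, 1.0))]
# (detected center, appearance embedding) per detection.
detections = [
    ((305.0, 118.0), (0.1, 0.9)),
    ((104.0, 103.0), (0.9, 0.1)),
    ((500.0, 400.0), (0.7, 0.7)),
]

matches = best_assignment(build_cost(tracks, detections))
matched = {d for _, d in matches}
unmatched = [j for j in range(len(detections)) if j not in matched]
print(matches, unmatched)  # unmatched detections would spawn tentative tracks
```

For real workloads, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation search.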

The result: Once a track is confirmed (after surviving n_init=3 consecutive frames), it receives a unique integer ID that persists across the video, through partial occlusions, motion blur, and temporary detection dropouts, as long as the object reappears within the max_age=30 frame window. This is what transforms a detector into a tracker, and what powers applications that require counting distinct objects or following individual trajectories over time.
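
The confirmation and retirement rules can be modeled as a small state machine. The sketch below is an assumption-laden illustration of the n_init and max_age behavior described above, not the tracker library's own class: `TrackState` and its methods are hypothetical names, and dropping a tentative track on its first miss follows the usual Deep SORT convention rather than anything stated in this README.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    track_id: int
    n_init: int = 3     # consecutive matches needed to confirm
    max_age: int = 30   # consecutive misses a confirmed track survives
    hits: int = 0
    misses: int = 0
    confirmed: bool = False
    deleted: bool = False

    def mark_matched(self) -> None:
        """A detection was assigned to this track in the current frame."""
        self.hits += 1
        self.misses = 0
        if self.hits >= self.n_init:
            self.confirmed = True

    def mark_missed(self) -> None:
        """No detection was assigned to this track in the current frame."""
        if not self.confirmed:
            self.deleted = True  # assumed: tentative tracks are dropped immediately
            return
        self.misses += 1
        if self.misses > self.max_age:
            self.deleted = True  # the ID is retired after max_age + 1 misses


track = TrackState(track_id=1)
for _ in range(3):
    track.mark_matched()   # confirmed on the third consecutive match
for _ in range(30):
    track.mark_missed()    # a 30-frame occlusion is survived
survived = track.confirmed and not track.deleted
track.mark_matched()       # the object reappears and the miss counter resets
print(survived, track.misses)
```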


🚀 Getting Started

Prerequisites

  • Python 3.9+
  • pip (bundled with Python)
  • A webcam (for live demo) or an MP4/AVI video file

Installation

# 1. Clone the repository
git clone https://github.com/<your-username>/realtime-object-tracking.git
cd realtime-object-tracking

# 2. Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate          # Linux / macOS
# venv\Scripts\activate           # Windows

# 3. Install dependencies
pip install -r requirements.txt

Note: On first run, YOLOv8 will automatically download the yolov8n.pt weights file (~6 MB) and Deep SORT will download its MobileNet embedder. An internet connection is required for the initial setup only.


🎮 Usage

Run with a Webcam

python object_tracker.py --source 0 --confidence 0.5

Run with a Video File

python object_tracker.py --source ./videos/sample.mp4 --confidence 0.4

All CLI Options

| Flag | Default | Description |
| --- | --- | --- |
| --source | 0 | Video source: 0 for the default webcam, or a file path to an MP4/AVI video |
| --confidence | 0.5 | Minimum detection confidence threshold (0.0–1.0) |
| --model | yolov8n.pt | YOLOv8 weights file: yolov8n (fastest), yolov8s, yolov8m, yolov8l |
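
A CLI with these three flags could be wired up with argparse roughly as follows. This is a sketch under assumptions: `parse_source` and `build_parser` are hypothetical names, and treating an all-digit value as a camera index is one plausible way to implement the automatic source-type detection, not necessarily how object_tracker.py does it.

```python
import argparse
from typing import Union


def parse_source(value: str) -> Union[int, str]:
    """All-digit values are read as a camera index; anything else as a file path."""
    return int(value) if value.isdigit() else value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Real-time object detection and tracking")
    parser.add_argument("--source", type=parse_source, default=0,
                        help="0 for the default webcam, or a path to a video file")
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="minimum detection confidence (0.0-1.0)")
    parser.add_argument("--model", default="yolov8n.pt",
                        help="YOLOv8 weights file")
    return parser


# An explicit argument list is passed so the example does not read sys.argv.
args = build_parser().parse_args(["--source", "./videos/sample.mp4", "--confidence", "0.4"])
print(args.source, args.confidence, args.model)
```

Returning an int for camera indices matters because OpenCV's `cv2.VideoCapture` distinguishes a device index from a filename by argument type.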

Switching Models

# Balanced speed/accuracy
python object_tracker.py --source 0 --model yolov8s.pt

# Higher accuracy (requires stronger GPU)
python object_tracker.py --source 0 --model yolov8m.pt

Tip: Press q at any time to cleanly exit the tracking loop and release all resources.


📂 Project Structure

realtime-object-tracking/
│
├── object_tracker.py        # Main script: ObjectTracker class + CLI entry point
├── requirements.txt         # Python dependencies
├── README.md                # You are here
├── LICENSE                  # MIT License
│
└── assets/
    └── demo.gif             # Placeholder for demo animation

๐Ÿ‘จโ€๐Ÿ’ป Author

Meraj Basiri Mani Alagheband

๐ŸŽ“ Computer Vision Intern @ CodeAlpha

๐Ÿ”ฌ Passionate about building intelligent systems that perceive, reason, and act in the real world.

"The intersection of perception and intelligence is where the future is being built โ€” one frame at a time."



๐Ÿ™ Acknowledgements

  • CodeAlpha โ€” For the internship opportunity and structured learning path that made this project possible.
  • Ultralytics โ€” For the state-of-the-art YOLOv8 framework that democratizes real-time object detection.
  • Deep SORT Realtime โ€” For the seamless Python integration of the Deep SORT tracking algorithm.

Built with โค๏ธ and curiosity by Meraj Basiri and Mani Alagheband
