BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
This repository addresses the fragmented nature of the multi-object tracking (MOT) field by providing a standardized collection of pluggable, state-of-the-art trackers. Designed to seamlessly integrate with segmentation, object detection, and pose estimation models, the repository streamlines the adoption and comparison of MOT methods. For trackers employing appearance-based techniques, we offer a range of automatically downloadable state-of-the-art re-identification (ReID) models, from heavyweight (CLIPReID) to lightweight options (LightMBN, OSNet). Additionally, clear and practical examples demonstrate how to effectively integrate these trackers with various popular models, enabling versatility across diverse vision tasks.
| Tracker | Status | HOTA↑ | MOTA↑ | IDF1↑ | FPS |
|---|---|---|---|---|---|
| boosttrack | ✅ | 69.253 | 75.914 | 83.206 | 25 |
| botsort | ✅ | 68.885 | 78.222 | 81.344 | 46 |
| strongsort | ✅ | 68.05 | 76.185 | 80.763 | 17 |
| deepocsort | ✅ | 67.796 | 75.868 | 80.514 | 12 |
| bytetrack | ✅ | 67.68 | 78.039 | 79.157 | 1265 |
| ocsort | ✅ | 66.441 | 74.548 | 77.899 | 1483 |
NOTES: Evaluation was conducted on the second half of the MOT17 training set, as the validation set is not publicly available and the ablation detector was trained on the first half. We employed pre-generated detections and embeddings. Each tracker was configured using the default parameters from its official repository.
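For readers unfamiliar with the metric columns, the sketch below computes MOTA and IDF1 from raw error counts using their standard definitions. HOTA is left out because it averages over localization thresholds and does not reduce to a single closed-form ratio. The function names are hypothetical and the counts are made-up illustrative numbers, not values from the MOT17 evaluation above.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with GT the total number of ground-truth boxes."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), computed on identity-level matches."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Illustrative counts only (assumed, not taken from any benchmark run).
example_mota = mota(false_negatives=150, false_positives=50, id_switches=10, num_gt=1000)
example_idf1 = idf1(idtp=800, idfp=100, idfn=200)

# The table reports these metrics as percentages.
print(f"MOTA = {100 * example_mota:.1f}")
print(f"IDF1 = {100 * example_idf1:.1f}")
```

MOTA penalizes detection errors and identity switches together, while IDF1 focuses on how consistently identities are preserved, which is why the two columns can rank trackers differently.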
BoxMOT seamlessly integrates ultra-lightweight, motion-only trackers for efficient high-FPS tracking on CPUs, as well as hybrid methods, which combine motion cues with deep ReID embeddings for improved accuracy at higher computational costs. Evaluate your trackers directly against state-of-the-art methods across diverse public benchmarks including MOT17 (moderately crowded scenes), MOT20 (extremely dense crowds), and DanceTrack (complex human motions). Our integrated workflow provides scripts designed for rapid experimentation, enabling users to save detections and embeddings once and reuse them with any tracking algorithm. This eliminates redundant computations, significantly reducing compute time and ensuring fair comparisons across hardware setups.
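To make the difference between motion-only and hybrid trackers concrete, here is a minimal sketch of an association cost between one track and one detection. It is a generic illustration, not the rule used by any tracker listed above: the weighted sum, the `appearance_weight` parameter, and all function names are assumptions made for this example.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def association_cost(track_box, det_box, track_emb=None, det_emb=None, appearance_weight=0.5):
    """Lower is better. Falls back to motion only when no embeddings are given."""
    motion_cost = 1.0 - iou(track_box, det_box)
    if track_emb is None or det_emb is None:
        return motion_cost  # motion-only tracker: no ReID model needed
    appearance_cost = cosine_distance(track_emb, det_emb)
    return (1.0 - appearance_weight) * motion_cost + appearance_weight * appearance_cost


track_box, det_box = (0, 0, 10, 10), (5, 0, 15, 10)
motion_only = association_cost(track_box, det_box)
hybrid = association_cost(track_box, det_box, track_emb=(1.0, 0.0), det_emb=(1.0, 0.0))
print(motion_only, hybrid)
```

In this toy case the boxes overlap only partially, so the motion-only cost is high, while matching embeddings pull the hybrid cost down. The extra accuracy comes at the price of running a ReID network on every detection.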
Install the boxmot package, including all requirements, in a Python>=3.9 environment:
pip install boxmot
BoxMOT provides a unified CLI boxmot with the following subcommands:
Usage: boxmot COMMAND [ARGS]...
Commands:
track Run tracking only
generate Generate detections and embeddings
eval Evaluate tracking performance using the official trackeval repository
tune Tune tracker hyperparameters based on selected detections and embeddings
If you want to contribute to this package, check how to contribute here
Tracking
$ boxmot track --yolo-model rf-detr-base.pt # bboxes only
$ boxmot track --yolo-model yolox_s.pt # bboxes only
$ boxmot track --yolo-model yolo12n.pt # bboxes only
$ boxmot track --yolo-model yolo11n.pt # bboxes only
$ boxmot track --yolo-model yolov10n.pt # bboxes only
$ boxmot track --yolo-model yolov9c.pt # bboxes only
$ boxmot track --yolo-model yolov8n.pt # bboxes only
$ boxmot track --yolo-model yolov8n-seg.pt # bboxes + segmentation masks
$ boxmot track --yolo-model yolov8n-pose.pt # bboxes + pose estimation
Tracking methods
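$ boxmot track --tracking-method deepocsort
$ boxmot track --tracking-method strongsort
$ boxmot track --tracking-method ocsort
$ boxmot track --tracking-method bytetrack
$ boxmot track --tracking-method botsort
$ boxmot track --tracking-method boosttrack
Tracking sources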
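Tracking can be run on most video formats:
$ boxmot track --source 0 # webcam
$ boxmot track --source img.jpg # image
$ boxmot track --source vid.mp4 # video
$ boxmot track --source path/ # directory
$ boxmot track --source path/*.jpg # glob
$ boxmot track --source 'https://youtu.be/Zgi9g1ksQHc' # YouTube
$ boxmot track --source 'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream
Select ReID model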
Some tracking methods combine appearance description and motion during tracking. For those that use appearance, you can choose a ReID model based on your needs from this ReID model zoo. These models can be further optimized for your needs with the reid_export.py script:
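$ boxmot track --source 0 --reid-model lmbn_n_cuhk03_d.pt # lightweight
$ boxmot track --source 0 --reid-model osnet_x0_25_market1501.pt
$ boxmot track --source 0 --reid-model mobilenetv2_x1_4_msmt17.engine
$ boxmot track --source 0 --reid-model resnet50_msmt17.onnx
$ boxmot track --source 0 --reid-model osnet_x1_0_msmt17.pt
$ boxmot track --source 0 --reid-model clip_market1501.pt # heavy
$ boxmot track --source 0 --reid-model clip_vehicleid.pt
...
Filter tracked classes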
By default the tracker tracks all MS COCO classes.
If you want to track a subset of the classes that your model predicts, add their corresponding indices after the --classes flag:
$ boxmot track --source 0 --yolo-model yolov8s.pt --classes 15 16 # COCO YOLOv8 model. Track cats and dogs only
Here is a list of all the possible objects that a YOLOv8 model trained on MS COCO can detect. Note that class indexing in this repo starts at zero, so cat is 15 and dog is 16.
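The effect of class filtering can be sketched in a few lines of Python. This only illustrates what selecting classes means for the detections that reach the tracker; it is not how the CLI implements the flag. The detection layout `(x1, y1, x2, y2, confidence, class_id)` and the function name are assumptions for this example, while the class ids follow the zero-indexed 80-class MS COCO label list.

```python
# Zero-indexed ids from the 80-class MS COCO label list.
COCO_CAT, COCO_DOG = 15, 16


def filter_by_class(detections, keep_classes):
    """Keep only detections whose class id is in keep_classes.

    Each detection is assumed to be (x1, y1, x2, y2, confidence, class_id).
    """
    keep = set(keep_classes)
    return [det for det in detections if int(det[5]) in keep]


# Hypothetical detections for a single frame.
detections = [
    (10, 20, 110, 220, 0.91, 0),   # person
    (30, 40, 90, 100, 0.88, 15),   # cat
    (50, 60, 150, 160, 0.75, 16),  # dog
    (70, 80, 300, 260, 0.66, 17),  # horse
]

pets = filter_by_class(detections, [COCO_CAT, COCO_DOG])
print([det[5] for det in pets])  # [15, 16]
```

Off-by-one mistakes are easy to make here: with zero-based indexing, 16 and 17 select dogs and horses rather than cats and dogs.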
Evaluation
Evaluate a combination of detector, tracking method and ReID model on a standard MOT dataset or your own custom one with:
# reproduce MOT17 README results
$ boxmot eval --yolo-model yolox_x_MOT17_ablation.pt --reid-model lmbn_n_duke.pt --tracking-method boosttrack --source MOT17-ablation --verbose
# MOT20 results
$ boxmot eval --yolo-model yolox_x_MOT20_ablation.pt --reid-model lmbn_n_duke.pt --tracking-method boosttrack --source MOT20-ablation --verbose
# Dancetrack results
$ boxmot eval --yolo-model yolox_x_dancetrack_ablation.pt --reid-model lmbn_n_duke.pt --tracking-method boosttrack --source dancetrack-ablation --verbose
# metrics on custom dataset
$ boxmot eval --yolo-model yolov8n.pt --reid-model osnet_x0_25_msmt17.pt --tracking-method deepocsort --source ./assets/MOT17-mini/train --verbose
Add --gsi to your command to postprocess the MOT results with Gaussian-smoothed interpolation. Detections and embeddings are stored for the selected YOLO and ReID model respectively. They can then be loaded into any tracking algorithm, avoiding the overhead of repeatedly generating this data.
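The idea behind Gaussian-smoothed interpolation can be sketched as two steps: fill the frames where a track has no box, then smooth the resulting trajectory. The version below is a simplified stand-in that uses linear interpolation followed by a fixed Gaussian kernel. It is not the implementation behind --gsi, which may differ (for example by fitting a regression model), and the function names, the `sigma` parameter, and the track layout are assumptions for this example.

```python
import math


def interpolate_track(observations):
    """Linearly fill missing frames.

    observations maps frame index -> (x1, y1, x2, y2) for one track id.
    """
    frames = sorted(observations)
    if not frames:
        return {}
    filled = {}
    for start, end in zip(frames, frames[1:]):
        box_start, box_end = observations[start], observations[end]
        for frame in range(start, end):
            t = (frame - start) / (end - start)
            filled[frame] = tuple(a + t * (b - a) for a, b in zip(box_start, box_end))
    filled[frames[-1]] = tuple(float(v) for v in observations[frames[-1]])
    return filled


def gaussian_smooth(track, sigma=1.0):
    """Replace each box by a Gaussian-weighted average of boxes at nearby frames."""
    frames = sorted(track)
    smoothed = {}
    for frame in frames:
        weights = [math.exp(-((frame - other) ** 2) / (2 * sigma ** 2)) for other in frames]
        total = sum(weights)
        smoothed[frame] = tuple(
            sum(w * track[other][i] for w, other in zip(weights, frames)) / total
            for i in range(4)
        )
    return smoothed


# Hypothetical track observed at frames 1 and 5 only (frames 2 to 4 were missed).
observations = {1: (0, 0, 10, 10), 5: (40, 0, 50, 10)}
filled = interpolate_track(observations)
smoothed = gaussian_smooth(filled, sigma=1.0)
print(sorted(filled))  # [1, 2, 3, 4, 5]
print(filled[3])       # (20.0, 0.0, 30.0, 10.0)
```

Filling gaps mainly recovers missed detections, while the smoothing step reduces frame-to-frame jitter. A plain kernel average like this one pulls the first and last boxes slightly inward, which is one reason a real implementation may prefer a fitted model.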
Evolution
We use a fast and elitist multiobjective genetic algorithm for tracker hyperparameter tuning. By default the objectives are HOTA, MOTA and IDF1. Run it with:
# saves dets and embs under ./runs/dets_n_embs separately for each selected yolo and reid model
$ boxmot generate --source ./assets/MOT17-mini/train --yolo-model yolov8n.pt yolov8s.pt --reid-model weights/osnet_x0_25_msmt17.pt
# evolve parameters for specified tracking method using the selected detections and embeddings generated in the previous step
$ boxmot tune --dets yolov8n --embs osnet_x0_25_msmt17 --n-trials 9 --tracking-method botsort --source ./assets/MOT17-mini/train
The set of hyperparameters leading to the best HOTA result is written to the tracker's config file.
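With three objectives there is usually no single best candidate, so an elitist multi-objective algorithm (the description above matches the family commonly known as NSGA-II) carries the non-dominated candidates forward between generations. The sketch below shows only that dominance test, applied to the (HOTA, MOTA, IDF1) triples from the results table in this README. It is not the tuner that `boxmot tune` runs, and the function names are hypothetical.

```python
# (HOTA, MOTA, IDF1) per tracker, copied from the results table in this README.
RESULTS = {
    "boosttrack": (69.253, 75.914, 83.206),
    "botsort": (68.885, 78.222, 81.344),
    "strongsort": (68.05, 76.185, 80.763),
    "deepocsort": (67.796, 75.868, 80.514),
    "bytetrack": (67.68, 78.039, 79.157),
    "ocsort": (66.441, 74.548, 77.899),
}


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one.

    All objectives are maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(results):
    """Names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, scores in results.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in results.items()
            if other_name != name
        )
    )


print(pareto_front(RESULTS))  # ['boosttrack', 'botsort']
```

Here boosttrack leads on HOTA and IDF1 while botsort leads on MOTA, so neither dominates the other. Picking the final configuration by best HOTA, as described above, is one way to choose a single point from such a front.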
Export
We support ReID model export to ONNX, OpenVINO, TorchScript and TensorRT:
# export to ONNX
$ python3 boxmot/appearance/reid_export.py --include onnx --device cpu
# export to OpenVINO
$ python3 boxmot/appearance/reid_export.py --include openvino --device cpu
# export to TensorRT with dynamic input
$ python3 boxmot/appearance/reid_export.py --include engine --device 0 --dynamic

| Example Description | Notebook |
|---|---|
| Torchvision bounding box tracking with BoxMOT | |
| Torchvision pose tracking with BoxMOT | |
| Torchvision segmentation tracking with BoxMOT | |
For BoxMOT bugs and feature requests please visit GitHub Issues. For business inquiries or professional support requests please send an email to: box-mot@outlook.com
