🤖 MMHD-project - Multimodality Human Detection

MMHD (Multimodality Human Detection) is a research-oriented project developed for the Robotics and Machine Learning course at the University of Coimbra.
The goal is to explore and compare the performance of YOLOv8 models for human detection using RGB, thermal, and depth image modalities, both individually and in a fusion-based setup.

🎓🏆 Approved with the Highest Honors!


🎯 Objectives

  • ✅ Train YOLOv8 models for RGB, thermal, and depth images
  • ✅ Evaluate and compare performances across modalities
  • ✅ Perform inference and visualize predictions
  • ✅ Implement and assess late fusion strategies (e.g., NMS, Voting)
  • ✅ Prepare for possible sensor fusion in real-world applications
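The two late-fusion strategies named above can be sketched in plain Python. This is a minimal illustration, assuming each modality yields `(x1, y1, x2, y2, score)` boxes; the box format, IoU threshold, and vote count are assumptions, not the project's actual implementation:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms_fusion(detections, iou_thr=0.5):
    """NMS fusion: pool detections from all modalities, keep the
    highest-score boxes, suppress overlapping lower-score ones."""
    pooled = sorted((d for dets in detections for d in dets),
                    key=lambda d: d[4], reverse=True)
    kept = []
    for d in pooled:
        if all(iou(d[:4], k[:4]) < iou_thr for k in kept):
            kept.append(d)
    return kept

def voting_fusion(detections, iou_thr=0.5, min_votes=2):
    """Voting fusion: keep a fused box only if at least `min_votes`
    modalities produced an overlapping detection."""
    out = []
    for d in nms_fusion(detections, iou_thr):
        votes = sum(any(iou(d[:4], o[:4]) >= iou_thr for o in dets)
                    for dets in detections)
        if votes >= min_votes:
            out.append(d)
    return out
```

With three per-modality lists (e.g. `voting_fusion([rgb_boxes, thermal_boxes, depth_boxes])`), voting discards detections seen by only one sensor, while NMS keeps every non-overlapping detection from any sensor.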

🛠️ Model Details

| Modality | Model Used | Input Size | Dataset               |
|----------|------------|------------|-----------------------|
| RGB      | YOLOv8s    | 640×512    | MID-3K RGB subset     |
| Thermal  | YOLOv8s    | 640×512    | MID-3K Thermal subset |
| Depth    | YOLOv8s    | 640×512    | MID-3K Depth subset   |

📁 Project Structure

MMHD-project/
│
├── README.md                                   # 📌 This file!
│
├── AAU-VAP-trimodal-dataset/                   # 🗂️ Trimodal dataset with aligned RGB, thermal, and depth images
├── MID-3K/                                     # 🗂️ Base dataset
│   └── dataset/
│       ├── rgb/                                # 🌈 RGB images and labels (split & full)
│       ├── thermal/                            # 🌡️ Thermal images and labels (split & full)
│       └── depth/                              # 🕳️ Depth images and labels (split & full)
│
├── Project/                                    # 💻 Scripts and project-related files
│   ├── RML-2025-report-FelipeTassariAveiro.pdf # 📝 Final report!
│   ├── dataset-separator.py                    # ✂️ Python script to split the dataset
│   ├── RML-project-MMHD.py                     # 🏋️ Python script to train the models
│   ├── label-identifier.py                     # 🏷️ Python script to organize the AAU VAP Trimodal dataset labels
│   ├── Depth.png                               # 📊 Normalized confusion matrices for the depth modality
│   ├── RGB.png                                 # 📊 Normalized confusion matrices for the RGB modality
│   ├── Thermal.png                             # 📊 Normalized confusion matrices for the thermal modality
│   └── README-MultimodalISRDataset.md          # 📘 README describing the MID-3K dataset
│
├── RML-project-MMHD/                           # 📁 Training outputs for each modality
│   ├── rgb/                                    # 🌈 YOLO training results for the RGB modality (metrics, predictions, weights)
│   ├── thermal/                                # 🌡️ YOLO training results for the thermal modality (metrics, predictions, weights)
│   ├── depth/                                  # 🕳️ YOLO training results for the depth modality (metrics, predictions, weights)
│   ├── test_eval_rgb/                          # 🧪 Test-set metrics and plots for the RGB-trained model
│   ├── test_eval_AAU_rgb/                      # 🧪 Test-set metrics and plots for the RGB-trained model (AAU VAP Trimodal dataset)
│   ├── test_eval_thermal/                      # 🧪 Test-set metrics and plots for the thermal-trained model
│   ├── test_eval_AAU_thermal/                  # 🧪 Test-set metrics and plots for the thermal-trained model (AAU VAP Trimodal dataset)
│   ├── test_eval_depth/                        # 🧪 Test-set metrics and plots for the depth-trained model
│   └── test_eval_AAU_depth/                    # 🧪 Test-set metrics and plots for the depth-trained model (AAU VAP Trimodal dataset)
│
├── predictions/                                # 🧠 Inference results
│   └── late_fusion/                            # 🤖 Late-fusion inference results
│       ├── comparison/                         # 🧩 Prediction comparison between Voting and NMS
│       ├── eval_summary/                       # 📈 Computed metrics
│       ├── nms/                                # 🧮 NMS predictions
│       └── voting/                             # ⚖️ Voting predictions
│
├── rgb.yaml                                    # ⚙️ Configuration file for training with RGB images
├── thermal.yaml                                # ⚙️ Configuration file for training with thermal images
├── depth.yaml                                  # ⚙️ Configuration file for training with depth images
│
├── yolo11n.pt                                  # 🤖 Pretrained YOLO11n model weights
├── yolov8s.pt                                  # 🤖 Pretrained YOLOv8s model weights
│
└── MMHD_project.ipynb                          # 👨‍💻 Colab notebook
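The per-modality configuration files listed above (`rgb.yaml`, `thermal.yaml`, `depth.yaml`) follow the standard Ultralytics data-YAML layout. A minimal sketch for the RGB modality — the paths and class list are assumptions for illustration, not copied from the repository:

```yaml
# Hypothetical sketch of rgb.yaml (paths and split names are assumptions)
path: MID-3K/dataset/rgb   # dataset root
train: images/train        # training images, relative to `path`
val: images/val            # validation images
test: images/test          # test images
names:
  0: person                # single-class human detection
```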

🧪 Setup & Requirements

  • ✅ Google Colab environment
  • ✅ Ultralytics YOLOv8
  • ✅ Dataset preprocessing and splits handled automatically
  • ✅ Custom YAML configurations for each modality
  • ✅ Compatible with fusion-based inference
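The automatic dataset split mentioned above can be sketched as follows. The 70/20/10 ratios and fixed seed are assumptions for illustration; the repository's `dataset-separator.py` is the actual implementation:

```python
import random

def split_dataset(image_paths, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle image paths reproducibly and split them into
    train/val/test subsets according to `ratios`."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed -> reproducible split
    n_train = int(len(paths) * ratios[0])
    n_val = int(len(paths) * ratios[1])
    return {
        "train": paths[:n_train],
        "val": paths[n_train:n_train + n_val],
        "test": paths[n_train + n_val:],
    }
```

Reusing the same seed for every modality keeps the RGB, thermal, and depth splits aligned image-for-image, which matters when the fused predictions are compared later.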

👨‍💻 Felipe Tassari Aveiro

Master's in Mechanical Engineering, Robotics and Machine Learning (2025)

🎓🏆 Approved with the Highest Honors

University of Coimbra


