End-to-end medical computer vision system for parasite microscopy image analysis, combining multi-class classification and an optional object detection branch for localization and crop-based workflows.
This project was built as an applied AI engineering system rather than a model-training exercise alone. It is designed to support practical microscopy analysis where speed, repeatability, and clear evaluation are critical.
The project began as an entry in the IEEE MANSB Victoris 2 Final Competition hosted on Kaggle and was later extended into a broader end-to-end AI workflow.
The system addresses the problem of manual parasite identification by providing:
- Structured image preprocessing and augmentation to improve robustness.
- Multi-architecture classification experiments (Custom CNN and transfer learning models).
- Optional YOLOv8-based detection pipeline for parasite localization and crop generation.
- Evaluation artifacts for model comparison and operational decision-making.
Real-world use cases:
- Laboratory screening support and triage assistance.
- Research workflows for parasite species analysis at scale.
- Educational and clinical prototyping for AI-assisted diagnostics.
The platform follows a staged ML system design:
- Input Layer: Microscopy images from curated parasite datasets.
- Processing Layer: Data cleaning, resizing, augmentation, and dataset balancing.
- Modeling Layer: Classification training (Custom CNN, ResNet50, ResNet50V2, Xception) and optional YOLOv8 detection.
- Evaluation Layer: History curves, confusion matrices, AUC plots, and detection metrics.
- Inference Layer: Class prediction and optional object-level crop extraction.
System flow:
Microscopy Images -> Preprocessing/Augmentation -> Model Training -> Evaluation -> Inference Outputs
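The preprocessing/augmentation stage above can be sketched in miniature. This is an illustrative pure-Python version, assuming images arrive as nested lists of 0-255 pixel intensities; the actual notebooks operate on OpenCV/NumPy arrays, and the function names here are not from the repository.

```python
# Sketch of the Processing Layer: normalization plus one label-preserving
# augmentation (horizontal flip). Names and shapes are illustrative only.

def normalize(image):
    """Scale 0-255 pixel values into the [0, 1] range expected by the models."""
    return [[px / 255.0 for px in row] for row in image]

def hflip(image):
    """Horizontal flip, one common label-preserving augmentation."""
    return [list(reversed(row)) for row in image]

def preprocess(image, augment=False):
    out = normalize(image)
    if augment:
        out = hflip(out)
    return out

# Example: a 2x3 grayscale patch
patch = [[0, 128, 255],
         [64, 32, 16]]
print(preprocess(patch, augment=True)[0])  # [1.0, ~0.502, 0.0]
```

In the real pipeline the same idea is applied per-batch, with resizing and additional augmentations, before images reach the classification models.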
The repository includes the architecture diagram used for this staged ML flow.
- Multi-class parasite image classification across 21 labeled classes.
- Transfer learning support using ResNet50, ResNet50V2, and Xception.
- Custom CNN baseline for architecture comparison.
- Preprocessing and augmentation workflow to improve generalization.
- YOLOv8 detection extension for object localization and crop extraction.
- Experiment artifacts for each model family (history, AUC, confusion matrix, samples).
- Notebook-first reproducible workflow from preparation to inference.
Dataset sources used across the original workflow:
Primary classification label mapping (21 classes):
| Label | Description |
|---|---|
| 0 | Ascariasis |
| 1 | Babesia |
| 2 | Capillaria philippinensis |
| 3 | Enterobius vermicularis |
| 4 | Epidermophyton floccosum |
| 5 | Fasciolopsis buski |
| 6 | Hookworm egg |
| 7 | Hymenolepis diminuta |
| 8 | Hymenolepis nana |
| 9 | Leishmania |
| 10 | Leukocyte 400X |
| 11 | Leukocyte 1000X |
| 12 | Opisthorchis viverrine |
| 13 | Paragonimus spp |
| 14 | Plasmodium |
| 15 | Trichophyton rubrum (T. rubrum) |
| 16 | Taenia spp |
| 17 | Toxoplasma |
| 18 | Trichomonad |
| 19 | Trichuris trichiura |
| 20 | Trypanosome |
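The 21-class mapping above can be expressed as a simple lookup table for decoding model outputs. This is a sketch; the notebooks may encode labels differently, and the spelling of each name follows the table as-is.

```python
# Integer label -> class name, mirroring the mapping table above.
CLASS_NAMES = [
    "Ascariasis", "Babesia", "Capillaria philippinensis",
    "Enterobius vermicularis", "Epidermophyton floccosum",
    "Fasciolopsis buski", "Hookworm egg", "Hymenolepis diminuta",
    "Hymenolepis nana", "Leishmania", "Leukocyte 400X", "Leukocyte 1000X",
    "Opisthorchis viverrine", "Paragonimus spp", "Plasmodium",
    "Trichophyton rubrum (T. rubrum)", "Taenia spp", "Toxoplasma",
    "Trichomonad", "Trichuris trichiura", "Trypanosome",
]

def decode(label: int) -> str:
    """Map an integer model output back to its class name."""
    return CLASS_NAMES[label]

print(decode(14))  # Plasmodium
```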
- Designed as a full pipeline from data preparation to inference, not a single notebook experiment.
- Used transfer learning as a deliberate design choice to improve sample efficiency on specialized medical imagery.
- Maintained both classification and detection capabilities to support multiple operational modes.
- Preserved evaluation transparency through architecture-specific visual diagnostics stored in-repo.
- Structured code and notebooks to support extension into API-based or cloud-based serving.
- Baseline model: Custom CNN for controlled comparison.
- Transfer models: ResNet50, ResNet50V2, Xception.
- Detection model: YOLOv8 (custom dataset training in parasite detection/TrainYolov8CustomDataset.ipynb).
- Evaluation approach:
  - Classification: confusion matrix, AUC, training history, sample predictions.
  - Detection: precision, recall, mAP@50, and mAP@50-95 tracked per epoch.
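The classification evaluation step can be sketched as follows. The notebooks use scikit-learn's `confusion_matrix`; this pure-Python version (illustrative names, not from the repository) shows the underlying computation for the matrix and per-class recall.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in range(n_classes)] for t in range(n_classes)]

def per_class_recall(cm):
    """Recall for class i = correct predictions / total true samples of i."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=3)
print(cm)                    # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
print(per_class_recall(cm))  # [0.5, 1.0, 1.0]
```

The same matrix, computed over the 21 parasite classes, is what the in-repo confusion-matrix visuals summarize.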
- Language: Python
- Notebooks: Jupyter
- Deep Learning: TensorFlow/Keras, PyTorch
- Computer Vision: OpenCV, Ultralytics YOLOv8
- Data and Analysis: NumPy, pandas, scikit-learn, Matplotlib
- Experiment Tracking: TensorBoard logs (YOLO training runs)
- Cloud Readiness: Microsoft Azure / Azure Machine Learning (integration-ready workflow)
- Data preprocessing, feature engineering, and dataset balancing.
- Computer vision and image processing with OpenCV.
- Transfer learning and deep learning model development.
- Classification modeling and predictive analytics.
- Data analysis with NumPy and pandas.
- Statistics-driven model evaluation and reporting.
- API design mindset for downstream integration.
- Cloud-aligned ML workflows (Azure and Azure ML readiness).
- Python 3.10+
- pip
- Jupyter Notebook or VS Code with Jupyter extension
git clone https://github.com/<your-username>/Parasite-Image-Classifier.git
cd Parasite-Image-Classifier
python -m venv .venv
.venv\Scripts\activate          (Windows)
source .venv/bin/activate       (macOS/Linux)
pip install --upgrade pip
pip install jupyter numpy pandas matplotlib scikit-learn opencv-python tensorflow torch torchvision ultralytics

Open notebooks in this sequence:
1 - Perpare Data.ipynb
2 - Data Anaylsis.ipynb
3 - Deep Learning Model.ipynb
4 - Transfer Learning.ipynb
5 - Parasite Image Classifier.ipynb

jupyter notebook

- Training notebook: parasite detection/TrainYolov8CustomDataset.ipynb
- Inference notebook: parasite detection/Detection.ipynb
- Detection utility class: parasite detection/Main.py
- Trained weights examples:
  - parasite detection/train/80-epochs/weights/best.pt
  - parasite detection/train/200-epochs/weights/best.pt
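The crop-extraction step that the detection branch enables can be sketched as plain slicing. In the real pipeline, Ultralytics returns boxes in (x1, y1, x2, y2) pixel coordinates and the crops are cut from an OpenCV image array; this pure-Python version (illustrative, not the repository's `Main.py`) shows only the indexing logic.

```python
def crop(image, box):
    """image: rows of pixels; box: (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = (int(v) for v in box)
    return [row[x1:x2] for row in image[y1:y2]]

# 6-wide x 4-tall dummy frame where each "pixel" records its (row, col)
frame = [[(r, c) for c in range(6)] for r in range(4)]
patch = crop(frame, (1, 0, 4, 2))
print(len(patch), len(patch[0]))  # 2 3
```

With NumPy/OpenCV arrays the same operation is `image[y1:y2, x1:x2]`, applied once per detected box to produce the per-parasite crops.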
The repository includes architecture-level evaluation outputs under:
- assets/Custom CNN
- assets/ResNet50
- assets/ResNet50V2
- assets/Xception
Each folder contains model history, confusion matrix, AUC, and sample output visuals for technical review.
Metrics extracted from tracked training logs:
| Run | Best mAP@50 | Epoch | Best mAP@50-95 | Epoch | Final Precision | Final Recall | Final mAP@50 | Final mAP@50-95 |
|---|---|---|---|---|---|---|---|---|
| 80 epochs | 0.97071 | 47 | 0.81999 | 47 | 0.87429 | 0.89445 | 0.93801 | 0.79261 |
| 200 epochs | 0.99442 | 90 | 0.84442 | 90 | 0.93633 | 0.90635 | 0.93236 | 0.76508 |
Engineering interpretation:
- Longer training improved peak mAP performance.
- The final-epoch drop in mAP@50-95 on the 200-epoch run suggests overfitting late in training; weights should be selected from the best validation epoch rather than the final epoch.
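The best-epoch selection recommended above can be sketched as a scan over per-epoch metric rows, as logged in YOLO's results.csv. The rows below are illustrative, not the actual training log.

```python
# Pick the checkpoint epoch with the highest mAP@50-95 rather than
# taking whatever the final epoch produced. Values are made up.
rows = [
    {"epoch": 46, "map50": 0.965, "map50_95": 0.810},
    {"epoch": 47, "map50": 0.971, "map50_95": 0.820},
    {"epoch": 80, "map50": 0.938, "map50_95": 0.793},  # final epoch, lower
]

best = max(rows, key=lambda r: r["map50_95"])
print(best["epoch"])  # 47
```

Ultralytics already saves such a checkpoint as best.pt alongside last.pt, which is why the trained-weights examples above point at best.pt.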

