This project focuses on developing a robust Computer Vision system to classify pulmonary diseases from chest X-ray images. By utilising Deep Learning architectures, the goal is to create an interpretable model that assists in medical diagnostics through automated image analysis.
The project followed an iterative research path to overcome significant data imbalance and overfitting:
| Model Configuration | Validation Accuracy | Test Accuracy | Status |
|---|---|---|---|
| Frozen ResNet50 (Baseline) | 87.5% (n=16) | 59.78% | Overfit / Discarded |
| Fine-Tuned ResNet50 (v2) | 98.29% (n=~800) | 80.61% | Current Best |
Key Breakthrough: Transitioning from a frozen backbone to full-model fine-tuning and expanding the validation set roughly 50x (from 16 to ~800 samples) were the primary drivers of the 20.8 percentage-point gain in test accuracy (59.78% to 80.61%).
- Objective: Accurate classification of pulmonary conditions (e.g., Pneumonia) using convolutional neural networks (CNNs).
- Architecture: Developed using PyTorch within a Miniconda environment for reproducible research.
- Dataset: Medical X-ray datasets processed via OpenCV and Torchvision.
- Image Processing & Augmentation: Implementing preprocessing pipelines to normalise medical datasets and improve model robustness through resizing, grayscale conversion, and data augmentation.
- Performance Optimisation: Utilising Scikit-learn and Matplotlib in Jupyter Notebooks to evaluate model accuracy through confusion matrices and loss curves.
- Development Environment: Managed through VS Code and Miniconda to ensure a clean, isolated dependency structure.
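To illustrate why confusion matrices matter for an imbalanced dataset, here is a dependency-free sketch of what such a matrix holds (rows are true labels, columns are predictions, the same layout scikit-learn's `confusion_matrix` uses). The labels and predictions are made-up toy values, not project results, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def recall_per_class(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Made-up toy data: 0 = Normal (minority), 1 = Pneumonia (majority)
y_true = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(y_true)
recall = recall_per_class(matrix)

print(matrix)    # [[1, 2], [0, 7]]
print(accuracy)  # 0.8
print(recall)    # recall for class 0 is 1/3, for class 1 is 1.0
```

In this toy case overall accuracy is 80%, yet only one of the three Normal samples is classified correctly. The per-class view exposes this, which is why a single accuracy figure can be misleading when classes are imbalanced.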
- .gitignore: Standardised configuration to exclude local environments and large datasets from version control.
- environment.yml: Miniconda configuration file for seamless environment replication.
- data/: (Local Only) Local directory for medical image datasets; excluded from Git to maintain privacy and performance.
- med_ai_env/: (Local Only) The dedicated Conda environment containing PyTorch, Torchvision, and CUDA-optimized libraries for the RTX 4090.
- models/: (Local Only) Stores trained PyTorch weights (.pth files). Note: These files are excluded from the repository due to size.
- notebooks/: Contains detailed Jupyter Notebooks tracking experimental iterations and hyperparameter tuning.
- scripts/: Python utilities for hardware verification and environment setup (e.g., check_gpu.py).
- src/: Modular source code for model architecture, training loops, and evaluation pipelines.
I am currently maintaining detailed technical documentation to track experimental iterations and prepare findings for academic review at Queen Mary University of London. The project follows professional Data Governance standards mirrored from my industry experience at NovoPart to ensure data integrity and IP security.
- This project is an independent research endeavour.
- All methodologies follow standard research ethics regarding medical data handling and algorithmic transparency.
- Email: monica.duarte@monicaduarte.com
- Portfolio: monicaduarte.com
- LinkedIn: linkedin.com/in/monicaduarteai
- GitHub: github.com/monicaduarteai
This project is a core component of my development in Computer Vision and Artificial Intelligence.