GitHub - amankumar12S/Optimizing-Deep-Learning-for-Brain-Tumor-Classification: Official implementation of the paper "Optimizing Deep Learning for Brain Tumor Classification: A Comparative Ablation Study of Preprocessing and Augmentation Strategies". Includes patient-level data splitting, preprocessing pipelines (skull-stripping, CLAHE, bilateral filtering), augmentation workflows, VGG16 training code, and complete experiment logs.

Project Overview

This project investigates how image preprocessing and data augmentation strategies affect the performance of deep learning models for brain tumor classification (Glioma, Meningioma, Pituitary).

Most prior work relies on slice-wise random splitting, which lets slices from the same patient appear in both the training and test sets. This patient-level data leakage inflates performance estimates. We address it by enforcing a strict patient-level split, so that test results better reflect generalization to unseen patients.

A controlled 8-experiment ablation study was conducted using a fixed VGG16 architecture to isolate the effects of preprocessing and augmentation.

Full Jupyter Notebook

If GitHub cannot render the full notebook due to its size, the complete ablation study (with logs, plots, and outputs) is available on Google Colab:

Link: https://colab.research.google.com/drive/1_LDk5Kr7Fo2emU5qR0cuDmMw01Q13K39?usp=sharing

Dataset Link: Brain Tumor Dataset on Figshare

Key Findings

Preprocessing alone is insufficient: without augmentation, accuracy saturates at ~88% (87.92% to 88.98% across Experiments 1, 3, 4, and 5).
Aggressive (“destructive”) preprocessing harms performance: skull-stripping or CLAHE, each combined with augmentation, leads to underfitting (~82% accuracy).
Denoising combined with augmentation yields the highest overall accuracy, but with important statistical nuance:
- 94.00% Accuracy
- 0.991 AUC
- However, McNemar’s test indicates that the improvement over augmentation alone (92.16%) is not statistically significant, suggesting a global accuracy gain rather than a consistent per-sample advantage.

Key takeaway: Data augmentation is the dominant factor for generalization, while denoising provides a modest average improvement but does not uniformly outperform augmentation alone on all test samples.

Methodology

1. Dataset & Splitting

Source: Figshare Brain Tumor Dataset (Cheng et al.)
Data Size: 3,064 T1-weighted contrast-enhanced MRI images from 233 patients
Tumor Classes: Glioma, Meningioma, Pituitary
Leakage Prevention: Custom patient-level splitting algorithm
- Training: 70% (163 patients)
- Validation: 15% (35 patients)
- Testing: 15% (35 patients)
Result: Zero patient overlap across splits
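
A patient-level split of the kind described above can be sketched as follows. This is a minimal illustration and not the repository's custom algorithm: the function name `patient_level_split`, the seed, and the rounding rule are assumptions. The idea is to shuffle unique patient IDs, cut the patient list by the 70/15/15 fractions, and only then assign each slice to the split of its patient.

```python
import random


def patient_level_split(patient_ids, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return {"train": [...], "val": [...], "test": [...]} of slice indices.

    patient_ids[i] is the patient ID of slice i. Splitting is done over
    unique patients, so no patient can appear in more than one split.
    The seed and rounding rule are illustrative assumptions.
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)

    n = len(unique)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)

    # Map each patient to exactly one split; the remainder goes to test.
    patient_to_split = {}
    for pid in unique[:n_train]:
        patient_to_split[pid] = "train"
    for pid in unique[n_train:n_train + n_val]:
        patient_to_split[pid] = "val"
    for pid in unique[n_train + n_val:]:
        patient_to_split[pid] = "test"

    # Every slice inherits the split of its patient.
    splits = {"train": [], "val": [], "test": []}
    for idx, pid in enumerate(patient_ids):
        splits[patient_to_split[pid]].append(idx)
    return splits
```

With 233 unique patients, these fractions round to 163, 35, and 35 patients, matching the counts listed above. The number of slices per split will generally not follow 70/15/15 exactly, because patients contribute different numbers of slices.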

Duplicate Verification

A perceptual hash (imagehash) check identified 26 visually identical images. Each set of duplicates was confined to a single split, confirming no cross-split leakage.
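
The cross-split check can be sketched as below. This is a hedged illustration, not the notebook's code: it assumes each image's perceptual hash has already been computed and converted to a string (for example with `str(imagehash.phash(PIL.Image.open(path)))`), and the function name `find_cross_split_duplicates` and the record layout are hypothetical.

```python
from collections import defaultdict


def find_cross_split_duplicates(records):
    """Group images by hash and flag groups that span more than one split.

    records: iterable of (image_id, hash_str, split_name) tuples.
    The hash strings are assumed to be precomputed perceptual hashes.
    Returns (duplicates, leaking), both dicts keyed by hash string.
    """
    by_hash = defaultdict(list)
    for image_id, hash_str, split_name in records:
        by_hash[hash_str].append((image_id, split_name))

    # A duplicate group is any hash shared by two or more images.
    duplicates = {h: items for h, items in by_hash.items() if len(items) > 1}

    # A group leaks if its members fall into different splits.
    leaking = {
        h: items
        for h, items in duplicates.items()
        if len({split_name for _, split_name in items}) > 1
    }
    return duplicates, leaking
```

An empty `leaking` dictionary corresponds to the outcome reported above: duplicates exist, but none crosses a split boundary. Exact hash equality only catches identical hashes; near-duplicates would need a Hamming-distance threshold.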

Patient-Level Split Summary

Set	Patients	Percentage
Train	163	70%
Validation	35	15%
Test	35	15%

Zero patient overlap.

Model Architecture (VGG16)

Pretrained VGG16 (ImageNet), frozen convolutional base
Custom classification head:
- Flatten → Dense(512) → Dropout(0.6)
- Dense(256) → Dropout(0.5)
- Softmax (3 classes)
Regularization: Dropout + EarlyStopping
Identical architecture and hyperparameters across all experiments
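
The architecture listed above could be expressed in Keras roughly as follows. This is a sketch under stated assumptions: the input size, ReLU activations, Adam optimizer, learning rate, loss, and early-stopping patience are not given in this README and are chosen only for illustration. The function name `build_vgg16_classifier` is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_vgg16_classifier(input_shape=(224, 224, 3), num_classes=3,
                           weights="imagenet"):
    """Frozen VGG16 base with the Dense(512)/Dense(256) head described above."""
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # frozen convolutional base

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)  # activation is an assumption
    x = layers.Dropout(0.6)(x)
    x = layers.Dense(256, activation="relu")(x)  # activation is an assumption
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    # Optimizer, learning rate, and loss are illustrative assumptions.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Early stopping as a regularizer; monitor and patience are assumptions.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
```

Because the base is frozen, only the three Dense layers in the head are trained. Passing `weights=None` builds the same graph without downloading ImageNet weights, which is convenient for a quick shape check.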

Experimental Setup

Preprocessing strategies tested:

Skull-Stripping (Otsu thresholding + morphological operations)
CLAHE (contrast enhancement)
Bilateral Filtering (edge-preserving denoising)
Raw images (baseline)

Each configuration was evaluated with and without data augmentation under identical conditions.

Results Summary

Exp	Strategy	Augmentation	Accuracy	AUC	Outcome
1	Baseline	No	87.92%	0.970	Weak generalization
2	Aug Only	Yes	92.16%	0.992	Major improvement
3	Skull-Stripping	No	88.77%	0.968	Minor change
4	CLAHE	No	88.98%	0.977	Minor change
5	Denoising	No	88.14%	0.973	No improvement
6	Skull-Strip + Aug	Yes	81.78%	0.960	Underfitting
7	CLAHE + Aug	Yes	82.63%	0.955	Underfitting
8	Denoise + Aug	Yes	94.00%	0.991	Highest accuracy

Statistical note: McNemar’s test was used to compare the experiments at the 0.05 level. As stated under Key Findings, the improvement of Denoise + Aug (Exp 8) over Aug Only (Exp 2) was not statistically significant.
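
McNemar's test looks only at the test samples on which two models disagree, which is why a higher overall accuracy does not automatically produce a significant result. The sketch below uses the exact (binomial) form of the test. The README does not say which variant the notebook uses, so this choice, along with the function name `mcnemar_exact`, is an assumption.

```python
import math


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where
      b = samples model A classifies correctly and model B does not,
      c = samples model B classifies correctly and model A does not.
    Under the null hypothesis, each discordant sample is equally likely
    to favor either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree

    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)
```

Samples that both models get right, or both get wrong, do not affect the p-value. Two models can therefore differ by a couple of accuracy points while the discordant counts `b` and `c` remain too balanced to reject the null hypothesis.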

Visualizations

Training & validation curves
Confusion matrices
ROC curves
Grad-CAM visual explanations

All outputs are available in the master Jupyter notebook included in this repository.

Installation

pip install tensorflow opencv-python-headless scikit-learn pandas seaborn matplotlib tqdm mat73

Authors & Contributions

Latchan Chhetri

LinkedIn: https://www.linkedin.com/in/latchan-chhetri-704380297/

Contributions:

Methodology & experimental design
Patient-level data splitting
Preprocessing pipeline
Model training & ablation study

Aman Kumar

LinkedIn: https://www.linkedin.com/in/aman-kumar-4a415a287/

Contributions:

Conceptualization
Administrative assistance
Analysis, visualization, and writing
Manuscript approval
