In this project, our team tackled one of the biggest challenges in computational pathology: leveraging massive histopathological images for accurate cancer subtype prediction. The full pipeline runs in the following sequence:
- Data Preprocessing: Tiling whole slide images into patches.
- Data Cleaning: Filtering out white (background) and faulty patches from our dataset.
- Feature Extraction: Extracting high-quality features using state-of-the-art (SOTA) models such as Conch 1.5.
- Tissue Classifier: Training a tissue classifier on the CRC100K dataset to cluster the patches, then averaging over the clusters to get a slide-level embedding.
- Slide Classification: Making the final prediction and evaluating it.
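
To make the cleaning and aggregation steps more concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the project's actual code: the function names `is_mostly_white` and `slide_embedding`, the thresholds `white_threshold` and `max_white_fraction`, and the choice to average the per-cluster means with equal weight are all hypothetical. Patch features are assumed to be already extracted, and tissue labels are assumed to come from the tissue classifier.

```python
import numpy as np


def is_mostly_white(patch, white_threshold=220, max_white_fraction=0.8):
    """Return True if an RGB patch (H x W x 3, uint8) is mostly background.

    Both thresholds are illustrative defaults, not values from the project.
    """
    # A pixel counts as white when all three channels are bright.
    white_pixels = (patch >= white_threshold).all(axis=-1)
    return bool(white_pixels.mean() > max_white_fraction)


def slide_embedding(features, tissue_labels):
    """Aggregate patch features into one slide-level vector.

    features: (n_patches, dim) array of patch embeddings.
    tissue_labels: (n_patches,) array of predicted tissue classes.

    Assumed interpretation: average the features inside each tissue
    cluster, then average those cluster means with equal weight.
    """
    features = np.asarray(features, dtype=float)
    tissue_labels = np.asarray(tissue_labels)
    if features.shape[0] == 0:
        raise ValueError("slide has no patches left after cleaning")
    cluster_means = [
        features[tissue_labels == label].mean(axis=0)
        for label in np.unique(tissue_labels)
    ]
    return np.mean(cluster_means, axis=0)
```

With this equal-weight averaging, a tissue type that covers only a few patches contributes as much to the slide embedding as a dominant one, which differs from a plain mean over all patches. Whether that is desirable depends on the downstream slide classifier.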