MMCLFMI - Multimodal Contrastive Learning for Medical Imaging

MMCLFMI is a bachelor thesis project on medical vision-language learning using MIMIC-CXR chest X-ray images and associated radiology reports. The project explores CLIP/CLOOB-style contrastive learning for aligning image and text representations, with downstream experiments in zero-shot chest X-ray classification.

The work was developed as part of the Bachelor’s program in Artificial Intelligence at Johannes Kepler University Linz.


Project motivation

Radiology datasets often contain both medical images and free-text reports. Contrastive image-text learning can use these paired modalities to learn representations that connect visual findings with clinical language.

This project investigates whether contrastive vision-language models can support medical image understanding by aligning:

  • chest X-ray images from MIMIC-CXR
  • radiology reports and report-derived labels
  • image embeddings from CLIP/CLOOB-style models
  • text embeddings derived from medical report content

The project was inspired by CheXzero, which demonstrated zero-shot chest X-ray classification using contrastive language-image learning.


Project summary

The pipeline covers the following workflow:

  1. Prepare the MIMIC-CXR dataset
  2. Convert DICOM chest X-ray images to JPEG format
  3. Extract report-derived labels using NegBio
  4. Preprocess metadata and image-report mappings
  5. Generate image and text embeddings with CLIP and CLOOB
  6. Run zero-shot classification experiments
  7. Compare prediction behavior across findings and symptom groups

Key features

  • Medical imaging: chest X-ray representation learning on MIMIC-CXR
  • Vision-language learning: image-report alignment with contrastive learning
  • Zero-shot classification: finding-level classification using learned image-text similarity
  • CLIP/CLOOB experiments: comparison of contrastive representation learning approaches
  • Report-derived labels: label extraction from radiology reports using NegBio
  • Preprocessing pipeline: DICOM conversion, metadata preparation, embedding generation, and evaluation notebooks

Dataset

This project uses MIMIC-CXR, a large publicly available database of chest radiographs and associated radiology reports.

Because MIMIC-CXR contains clinical data, it is not included in this repository. Users must obtain access through the official credentialed process and place the dataset locally before running the notebooks.

Expected local dataset location:

MultiModel_Contrastive_Learning_for_Medical_Imaging/
└── data/
    └── MIMIC_CXR/

Note: older internal folder names may still use the spellings MultiModel or MIMIC_CRX. These refer to the same project and dataset; read them as "multimodal" and MIMIC-CXR.


Method overview

The project follows a contrastive image-text learning setup.

Chest X-ray image
      |
      v
Image encoder
      |
      v
Image embedding
      |
      | similarity
      v
Text embedding
      ^
      |
Text encoder
      ^
      |
Radiology report / finding prompt
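
The diagram above can be read as a batch-level objective: matching image-report pairs should score higher than mismatched ones. The following is a minimal NumPy sketch of a CLIP-style symmetric contrastive (InfoNCE) loss, not the training code of this repository. The function names and the temperature value of 0.07 are assumptions made for illustration. CLOOB replaces InfoNCE with the InfoLOOB objective and adds Hopfield retrieval, which this sketch does not cover.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of N paired embeddings of shape (N, D).

    Row i of image_emb is assumed to belong to row i of text_emb.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=np.float64))
    txt = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    logits = img @ txt.T / temperature  # (N, N) pairwise similarities
    diag = np.arange(logits.shape[0])   # matching pairs sit on the diagonal
    loss_image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image)


if __name__ == "__main__":
    toy = np.eye(4)  # four perfectly separated toy embeddings
    print("aligned pairs:   ", clip_style_loss(toy, toy))
    print("misaligned pairs:", clip_style_loss(toy, np.roll(toy, 1, axis=0)))
```

With the toy input, the aligned batch yields a loss close to zero, while shifting the text rows by one position yields a much larger loss.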

The core idea is to place images and their corresponding report-derived text representations in a shared embedding space. During zero-shot classification, disease or finding prompts can be compared against image embeddings to estimate the presence of specific findings.
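
As a rough illustration of prompt-based scoring, the sketch below compares each image embedding with a positive and a negative prompt embedding per finding and turns the two similarities into a probability. This follows the positive/negative prompt idea popularized by CheXzero; whether the notebooks in this repository use exactly this scheme is an assumption. The function names, the temperature parameter, and the toy vectors are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def zero_shot_scores(image_emb, pos_text_emb, neg_text_emb, temperature=1.0):
    """Return an (N, F) array of scores in (0, 1) for N images and F findings.

    pos_text_emb[f] would come from a prompt such as "<finding>",
    neg_text_emb[f] from a prompt such as "no <finding>".
    """
    img = l2_normalize(np.asarray(image_emb, dtype=np.float64))     # (N, D)
    pos = l2_normalize(np.asarray(pos_text_emb, dtype=np.float64))  # (F, D)
    neg = l2_normalize(np.asarray(neg_text_emb, dtype=np.float64))  # (F, D)
    pos_sim = img @ pos.T / temperature
    neg_sim = img @ neg.T / temperature
    # A softmax over two logits equals a sigmoid of their difference.
    return 1.0 / (1.0 + np.exp(-(pos_sim - neg_sim)))


if __name__ == "__main__":
    images = np.array([[1.0, 0.0], [0.0, 1.0]])  # two toy image embeddings
    positive = np.array([[1.0, 0.0]])            # one finding, positive prompt
    negative = np.array([[0.0, 1.0]])            # same finding, negative prompt
    print(zero_shot_scores(images, positive, negative))
```

A score above 0.5 means the image is closer to the positive prompt than to the negative one; the decision threshold itself remains a design choice.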


Getting started

1. Prepare the dataset

Copy the MIMIC-CXR dataset into the expected data directory:

MultiModel_Contrastive_Learning_for_Medical_Imaging/data/MIMIC_CXR/

2. Run NegBio

Run NegBio on the radiology reports to obtain finding labels.

Instructions and related code are located in:

NegBio/
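
Once labels are available, they usually need to be turned into binary targets for evaluation. The sketch below assumes a CSV with a study_id column and one column per finding, encoded in the CheXpert-style convention (1.0 positive, 0.0 negative, -1.0 uncertain, blank not mentioned). The actual NegBio output format in this repository may differ, and the function names, column names, and the choice to treat blank and uncertain entries as negative are assumptions.

```python
import csv
import io


def to_binary_label(raw, uncertain_as=0):
    """Map one raw label cell to 0 or 1.

    Blank (not mentioned) and 0.0 (negated) become 0, 1.0 becomes 1,
    and -1.0 (uncertain) becomes `uncertain_as`.
    """
    text = (raw or "").strip()
    if text == "":
        return 0
    value = float(text)
    if value == 1.0:
        return 1
    if value == -1.0:
        return uncertain_as
    return 0


def load_labels(csv_text, findings, uncertain_as=0):
    """Parse CSV text into {study_id: {finding: 0 or 1}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["study_id"]: {
            finding: to_binary_label(row.get(finding), uncertain_as)
            for finding in findings
        }
        for row in reader
    }


if __name__ == "__main__":
    # Hypothetical rows, not real MIMIC-CXR data.
    example = "study_id,Edema,Pneumonia\n1001,1.0,\n1002,-1.0,0.0\n"
    print(load_labels(example, ["Edema", "Pneumonia"]))
```

Passing uncertain_as=1 instead treats uncertain mentions as positive, which can change finding-level results noticeably.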

3. Create the environment

Create the conda environment:

conda env create -f environment.yml
conda activate mmclfmi

If the environment name differs in environment.yml, use the name defined there.

4. Convert DICOM images

Run the DICOM conversion script:

python converter.py

This converts DICOM chest X-ray images to JPEG format and stores them in the configured jpg folder.
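
For orientation, a conversion of this kind can be sketched as follows. This is not the content of converter.py: the function names, the min-max scaling, and the JPEG quality are assumptions, and the real script may apply additional steps such as resizing. The sketch relies on pydicom and Pillow, which are imported only when a file is actually converted.

```python
import numpy as np


def to_uint8(pixels, photometric="MONOCHROME2"):
    """Min-max scale a raw pixel array to 0..255.

    MONOCHROME1 images store high values as dark, so they are inverted
    to match the usual bright-bone rendering.
    """
    arr = np.asarray(pixels, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi > lo:
        arr = (arr - lo) / (hi - lo)
    else:
        arr = np.zeros_like(arr)  # constant image: avoid division by zero
    if photometric == "MONOCHROME1":
        arr = 1.0 - arr
    return np.round(arr * 255.0).astype(np.uint8)


def dicom_to_jpeg(dicom_path, jpeg_path, quality=95):
    """Read one DICOM file and write it as an 8-bit grayscale JPEG."""
    import pydicom          # third-party: pip install pydicom
    from PIL import Image   # third-party: pip install pillow

    ds = pydicom.dcmread(dicom_path)
    photometric = getattr(ds, "PhotometricInterpretation", "MONOCHROME2")
    image = Image.fromarray(to_uint8(ds.pixel_array, photometric))
    image.save(jpeg_path, quality=quality)


if __name__ == "__main__":
    raw = np.array([[0, 2048], [4095, 1024]])  # toy 12-bit pixel values
    print(to_uint8(raw))
    print(to_uint8(raw, photometric="MONOCHROME1"))
```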

5. Run preprocessing notebooks

Run the preprocessing notebooks in order:

visualization.ipynb
meta_data_pre.ipynb

These notebooks prepare metadata, image paths, labels, and intermediate files used for embedding generation and evaluation.
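
The image-report mapping can be illustrated with path helpers. The sketch assumes the dataset under data/MIMIC_CXR keeps the official PhysioNet layout, where images live in files/pXX/pSUBJECT/sSTUDY/DICOM_ID.dcm and each report is stored next to its study folder as sSTUDY.txt. The function names and the identifiers in the example are hypothetical, and the notebooks may build these paths differently.

```python
from pathlib import Path


def study_dir(root, subject_id, study_id):
    """Folder holding all images of one study."""
    subject = f"p{subject_id}"
    # Subjects are grouped by the first two digits of their ID, e.g. p10.
    return Path(root) / "files" / subject[:3] / subject / f"s{study_id}"


def dicom_path(root, subject_id, study_id, dicom_id):
    """Path of a single DICOM image inside its study folder."""
    return study_dir(root, subject_id, study_id) / f"{dicom_id}.dcm"


def report_path(root, subject_id, study_id):
    """Path of the free-text report belonging to a study."""
    return study_dir(root, subject_id, study_id).with_suffix(".txt")


if __name__ == "__main__":
    root = "data/MIMIC_CXR"
    # Placeholder identifiers, chosen only to show the path pattern.
    print(dicom_path(root, 10000001, 50000001, "example-dicom-id"))
    print(report_path(root, 10000001, 50000001))
```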

6. Generate embeddings

Run the embedding notebooks:

embedding_cloob.ipynb
embedding_clip.ipynb

These generate image and text embeddings for the CLOOB and CLIP experiments.

7. Run zero-shot classification

Choose one of the zero-shot evaluation notebooks:

zero_shot_findings.ipynb
zero_shot_all_symptoms.ipynb

Use zero_shot_findings.ipynb for finding-level experiments and zero_shot_all_symptoms.ipynb for broader symptom-based experiments.
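
Finding-level results are often summarized with the area under the ROC curve (AUROC), the metric also reported by CheXzero. Whether the notebooks use this metric is an assumption; the sketch below only shows how zero-shot scores and binary labels can be compared per finding. The function names and toy values are hypothetical, and the pairwise computation is quadratic, so it suits small examples rather than the full dataset.

```python
def auroc(labels, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half a win. Returns NaN if one class is missing.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return float("nan")
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def per_finding_auroc(labels_by_finding, scores_by_finding):
    """Compute one AUROC value per finding name."""
    return {
        finding: auroc(labels_by_finding[finding], scores_by_finding[finding])
        for finding in labels_by_finding
    }


if __name__ == "__main__":
    labels = {"Edema": [0, 0, 1, 1]}
    scores = {"Edema": [0.1, 0.4, 0.35, 0.8]}
    print(per_finding_auroc(labels, scores))  # {'Edema': 0.75}
```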


Repository structure

.
├── MultiModel_Contrastive_Learning_for_Medical_Imaging/
│   ├── clip_/
│   ├── data/
│   │   ├── checkpoints/
│   │   ├── csv/
│   │   ├── embedding/
│   │   ├── jpg/
│   │   ├── MIMIC_CXR/
│   │   └── model_config/
│   ├── df_prepearing.ipynb
│   ├── preprocess_dicoms/
│   ├── Visualization_preprocessing/
│   └── Zero_shot/
├── NegBio/
└── README.md

Main folders

Folder                        Purpose
clip_                         CLIP-related model and embedding code
data/checkpoints              model checkpoints
data/csv                      metadata and processed CSV files
data/embedding                generated CLIP/CLOOB embeddings
data/jpg                      converted JPEG chest X-ray images
data/MIMIC_CXR                local MIMIC-CXR dataset location
data/model_config             model configuration files
preprocess_dicoms             DICOM preprocessing and conversion code
Visualization_preprocessing   visualization and metadata preprocessing notebooks
Zero_shot                     zero-shot classification notebooks
NegBio                        report label extraction tooling

Results and outputs

The notebooks generate intermediate and final outputs such as:

  • processed metadata tables
  • report-derived labels
  • image embeddings
  • text embeddings
  • zero-shot classification scores
  • finding-level evaluation outputs
  • visualizations for dataset inspection and analysis

Exact result values depend on local preprocessing choices, dataset access, model configuration, and notebook parameters.


Limitations

This repository is a bachelor thesis research project and should not be interpreted as a clinical diagnostic tool.

Current limitations include:

  • MIMIC-CXR data is not included because it requires credentialed access
  • labels derived from radiology reports can contain noise
  • zero-shot classification performance depends strongly on prompt design and preprocessing
  • experiments are notebook-driven rather than packaged as a production pipeline
  • findings are evaluated retrospectively and not clinically validated

Acknowledgements

This thesis was supervised and graded by Sepp Hochreiter at the Institute for Machine Learning, Johannes Kepler University Linz.

I also received co-supervision and guidance from:

  • Elisabeth Rumetshofer: https://github.com/elirum
  • Andreas Fürst: https://github.com/fuersta

Their work on the CLOOB repository formed an important basis for this project.


References

  • MIMIC-CXR: https://physionet.org/content/mimic-cxr/
  • CLIP: https://arxiv.org/abs/2103.00020
  • CLOOB: https://arxiv.org/abs/2110.11316
  • NegBio: https://arxiv.org/abs/1712.05898
  • CheXzero: https://doi.org/10.1038/s41551-022-00936-9

Contact

Maximilian Hageneder
Email: max.hageneder@gmail.com
LinkedIn: https://www.linkedin.com/in/maximilian-hageneder-ai
