MMCLFMI - Multimodal Contrastive Learning for Medical Imaging

MMCLFMI is a bachelor's thesis project on medical vision-language learning using MIMIC-CXR chest X-ray images and their associated radiology reports. The project explores CLIP/CLOOB-style contrastive learning for aligning image and text representations, with downstream experiments in zero-shot chest X-ray classification.

The work was developed as part of the Bachelor’s program in Artificial Intelligence at Johannes Kepler University Linz.

Project motivation

Radiology datasets often contain both medical images and free-text reports. Contrastive image-text learning can use these paired modalities to learn representations that connect visual findings with clinical language.

This project investigates whether contrastive vision-language models can support medical image understanding by aligning:

- chest X-ray images from MIMIC-CXR
- radiology reports and report-derived labels
- image embeddings from CLIP/CLOOB-style models
- text embeddings derived from medical report content

The project was inspired by CheXzero, which demonstrated zero-shot chest X-ray classification using contrastive language-image learning.

Project summary

The pipeline covers the following workflow:

1. Prepare the MIMIC-CXR dataset
2. Convert DICOM chest X-ray images to JPEG format
3. Extract report-derived labels using NegBio
4. Preprocess metadata and image-report mappings
5. Generate image and text embeddings with CLIP and CLOOB
6. Run zero-shot classification experiments
7. Compare prediction behavior across findings and symptom groups

Key features

- Medical imaging: chest X-ray representation learning on MIMIC-CXR
- Vision-language learning: image-report alignment with contrastive learning
- Zero-shot classification: finding-level classification using learned image-text similarity
- CLIP/CLOOB experiments: comparison of contrastive representation learning approaches
- Report-derived labels: label extraction from radiology reports using NegBio
- Preprocessing pipeline: DICOM conversion, metadata preparation, embedding generation, and evaluation notebooks

Dataset

This project uses MIMIC-CXR, a large publicly available database of chest radiographs and associated radiology reports.

Because MIMIC-CXR contains clinical data, it is not included in this repository. Users must obtain credentialed access through PhysioNet (see References) and place the dataset locally before running the notebooks.

Expected local dataset location:

MultiModel_Contrastive_Learning_for_Medical_Imaging/
└── data/
    └── MIMIC_CXR/

Note: some folder names still use the older spellings MultiModel or MIMIC_CRX. They refer to the same project and dataset, that is, multimodal learning on MIMIC-CXR.
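
A quick check before running the notebooks can catch a misplaced dataset early. The sketch below only tests that the expected directory exists and is not empty. The helper name `check_dataset_dir` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Expected location, taken from the layout shown above.
DATA_DIR = Path("MultiModel_Contrastive_Learning_for_Medical_Imaging") / "data" / "MIMIC_CXR"


def check_dataset_dir(path: Path = DATA_DIR) -> bool:
    """Return True if `path` is an existing directory with at least one entry."""
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    if check_dataset_dir():
        print(f"Found dataset directory: {DATA_DIR}")
    else:
        print(f"Missing or empty: {DATA_DIR} (copy MIMIC-CXR there first)")
```

Run it from the repository root so the relative path resolves correctly.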

Method overview

The project follows a contrastive image-text learning setup.

Chest X-ray image
      |
      v
Image encoder
      |
      v
Image embedding
      |
      | similarity
      v
Text embedding
      ^
      |
Text encoder
      ^
      |
Radiology report / finding prompt

The core idea is to map images and the text of their reports into a shared embedding space in which matching pairs lie close together. For zero-shot classification, a text prompt describing a disease or finding is embedded with the text encoder and compared against the image embeddings; the similarity is used as a score for the presence of that finding.
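
The scoring step can be illustrated with a minimal sketch. It assumes a CheXzero-style setup in which each finding has a positive prompt (for example "pneumonia") and a negative prompt ("no pneumonia"), and a two-way softmax over the two similarities gives a probability. The function names, the temperature of 0.07, and the random stand-in embeddings are assumptions for illustration. This is not the code used in the notebooks.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_probs(image_emb, pos_text_emb, neg_text_emb, temperature=0.07):
    """Probability of each finding per image via a softmax over a prompt pair.

    image_emb:    (n_images, d)
    pos_text_emb: (n_findings, d), e.g. embeddings of "pneumonia"
    neg_text_emb: (n_findings, d), e.g. embeddings of "no pneumonia"
    """
    img = l2_normalize(image_emb)
    pos_logits = img @ l2_normalize(pos_text_emb).T / temperature
    neg_logits = img @ l2_normalize(neg_text_emb).T / temperature
    # Two-way softmax written as a sigmoid of the logit difference.
    return 1.0 / (1.0 + np.exp(neg_logits - pos_logits))


# Random stand-ins for encoder outputs (4 images, 3 findings, 16 dimensions).
rng = np.random.default_rng(0)
images = rng.normal(size=(4, 16))
positive = rng.normal(size=(3, 16))
negative = rng.normal(size=(3, 16))
probs = zero_shot_probs(images, positive, negative)
print(probs.shape)  # (4, 3)
```

Because no labeled training examples enter this computation, the result depends entirely on the quality of the embeddings and the wording of the prompts.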

Getting started

1. Prepare the dataset

Copy the MIMIC-CXR dataset into the expected data directory:

MultiModel_Contrastive_Learning_for_Medical_Imaging/data/MIMIC_CXR/

2. Run NegBio

Run NegBio on the radiology reports to obtain finding labels.

Instructions and related code are located in:

NegBio/
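
Once labels are available, they usually have to be mapped to binary targets before evaluation. The sketch below assumes the common encoding of report-derived labels (1.0 positive, 0.0 negative, -1.0 uncertain, empty cell when the finding is not mentioned) and a CSV with a `study_id` column. The sample data, column names, and function names are hypothetical, and the output format of the NegBio run in this repository may differ.

```python
import csv
import io

# Hypothetical excerpt of a label table; real column names may differ.
SAMPLE = """study_id,Pneumonia,Edema
s1,1.0,
s2,-1.0,0.0
s3,0.0,1.0
"""


def to_binary(value: str, uncertain_as: int = 0) -> int:
    """Map one raw label cell to 0/1.

    Assumed encoding: 1.0 positive, 0.0 negative, -1.0 uncertain,
    empty cell = finding not mentioned in the report.
    """
    if value.strip() == "":
        return 0
    number = float(value)
    if number == -1.0:
        return uncertain_as
    return int(number == 1.0)


def load_labels(text: str, uncertain_as: int = 0) -> dict:
    """Return {study_id: {finding: 0 or 1}} from CSV text."""
    labels = {}
    for row in csv.DictReader(io.StringIO(text)):
        study = row.pop("study_id")
        labels[study] = {k: to_binary(v, uncertain_as) for k, v in row.items()}
    return labels


print(load_labels(SAMPLE))
```

How uncertain mentions are treated is a design choice. Passing `uncertain_as=1` counts them as positive instead, which changes the label distribution and therefore the evaluation.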

3. Create the environment

Create the conda environment:

conda env create -f environment.yml
conda activate mmclfmi

If the environment name differs in environment.yml, use the name defined there.

4. Convert DICOM images

Run the DICOM conversion script:

python converter.py

This converts DICOM chest X-ray images to JPEG format and stores them in the configured jpg folder.
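
For orientation, a DICOM-to-JPEG conversion typically reads the pixel array, rescales it to 8 bits, and saves it as a grayscale image. The sketch below shows one way to do this with pydicom and Pillow. It is an illustration under those assumptions, not the contents of converter.py, and the function names are hypothetical.

```python
import numpy as np


def to_uint8(pixels: np.ndarray, invert: bool = False) -> np.ndarray:
    """Min-max scale raw pixel values to the 0..255 range of an 8-bit JPEG."""
    data = pixels.astype(np.float64)
    span = data.max() - data.min()
    if span == 0:
        # Constant image: nothing to scale, return black.
        return np.zeros(data.shape, dtype=np.uint8)
    scaled = (data - data.min()) / span
    if invert:
        # MONOCHROME1 stores high values as dark, so flip for display.
        scaled = 1.0 - scaled
    return np.round(scaled * 255).astype(np.uint8)


def dicom_to_jpeg(src: str, dst: str) -> None:
    """Read one DICOM file and write it as a grayscale JPEG."""
    # Imported here so the scaling helper works without these packages.
    import pydicom
    from PIL import Image

    ds = pydicom.dcmread(src)
    invert = ds.get("PhotometricInterpretation", "") == "MONOCHROME1"
    Image.fromarray(to_uint8(ds.pixel_array, invert)).save(dst, format="JPEG")
```

Min-max scaling is the simplest choice. Windowing or histogram equalization would change image contrast and could affect downstream embeddings.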

5. Run preprocessing notebooks

Run the preprocessing notebooks in order:

visualization.ipynb
meta_data_pre.ipynb

These notebooks prepare metadata, image paths, labels, and intermediate files used for embedding generation and evaluation.
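
Building image paths from metadata can be sketched as follows. The example assumes the folder layout in which MIMIC-CXR is distributed (files/pXX/pSUBJECT/sSTUDY/DICOM_ID) and that the converted JPEGs mirror it. The function name `image_path` is hypothetical, and the notebooks may organize paths differently.

```python
from pathlib import Path


def image_path(root: Path, subject_id: int, study_id: int, dicom_id: str,
               suffix: str = ".jpg") -> Path:
    """Build the file path for one image from its metadata identifiers."""
    subject = f"p{subject_id}"
    # Patients are grouped into folders named after the first two digits of the ID.
    group = subject[:3]
    return root / "files" / group / subject / f"s{study_id}" / f"{dicom_id}{suffix}"


# Hypothetical metadata rows: (subject_id, study_id, dicom_id).
rows = [(10000032, 50414267, "img_a"), (11000011, 51234567, "img_b")]
for subject_id, study_id, dicom_id in rows:
    print(image_path(Path("data/jpg"), subject_id, study_id, dicom_id))
```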

6. Generate embeddings

Run the embedding notebooks:

embedding_cloob.ipynb
embedding_clip.ipynb

These generate image and text embeddings for the CLOOB and CLIP experiments.
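
The general pattern of such a step is to encode inputs in batches, normalize the vectors, and store them for later evaluation. The sketch below uses a toy `fake_encoder` in place of a real CLIP or CLOOB model. All names, the batch size, and the file name are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def embed_in_batches(items, encode, batch_size=2):
    """Encode `items` batch by batch and return unit-length row vectors."""
    chunks = []
    for start in range(0, len(items), batch_size):
        chunks.append(encode(items[start:start + batch_size]))
    emb = np.concatenate(chunks, axis=0)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def fake_encoder(batch):
    """Stand-in for a CLIP/CLOOB encoder: maps each string to an 8-d vector."""
    return np.array([[len(s) + i + 1 for i in range(8)] for s in batch], dtype=float)


reports = ["no acute findings", "small left pleural effusion", "cardiomegaly"]
embeddings = embed_in_batches(reports, fake_encoder)

# Store once, reload in the evaluation step.
out_file = Path(tempfile.mkdtemp()) / "text_embeddings.npy"
np.save(out_file, embeddings)
reloaded = np.load(out_file)
print(reloaded.shape)  # (3, 8)
```

Storing normalized embeddings means the zero-shot step can compute cosine similarity with a plain matrix product.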

7. Run zero-shot classification

Choose one of the zero-shot evaluation notebooks:

zero_shot_findings.ipynb
zero_shot_all_symptoms.ipynb

Use zero_shot_findings.ipynb for finding-level experiments and zero_shot_all_symptoms.ipynb for broader symptom-based experiments.
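
Finding-level results are commonly summarized with one AUROC per finding. The sketch below computes AUROC as the fraction of positive-negative pairs that are ranked correctly. The metric choice, function names, and toy data are assumptions, and the notebooks may report different metrics.

```python
import numpy as np


def auroc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined without both classes
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def per_finding_auroc(labels, scores, findings):
    """labels and scores have shape (n_images, n_findings)."""
    return {name: auroc(labels[:, j], scores[:, j]) for j, name in enumerate(findings)}


# Toy data: 4 images, 2 findings.
y = np.array([[1, 0], [0, 0], [1, 1], [0, 1]])
p = np.array([[0.9, 0.2], [0.1, 0.6], [0.8, 0.4], [0.3, 0.7]])
print(per_finding_auroc(y, p, ["Pneumonia", "Edema"]))
```

The pairwise comparison needs memory proportional to the number of positives times the number of negatives, so for large test sets a library routine such as scikit-learn's `roc_auc_score` is the more practical choice.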

Repository structure

.
├── MultiModel_Contrastive_Learning_for_Medical_Imaging/
│   ├── clip_/
│   ├── data/
│   │   ├── checkpoints/
│   │   ├── csv/
│   │   ├── embedding/
│   │   ├── jpg/
│   │   ├── MIMIC_CXR/
│   │   └── model_config/
│   ├── df_prepearing.ipynb
│   ├── preprocess_dicoms/
│   ├── Visualization_preprocessing/
│   └── Zero_shot/
├── NegBio/
└── README.md

Main folders

| Folder | Purpose |
| --- | --- |
| `clip_` | CLIP-related model and embedding code |
| `data/checkpoints` | model checkpoints |
| `data/csv` | metadata and processed CSV files |
| `data/embedding` | generated CLIP/CLOOB embeddings |
| `data/jpg` | converted JPEG chest X-ray images |
| `data/MIMIC_CXR` | local MIMIC-CXR dataset location |
| `data/model_config` | model configuration files |
| `preprocess_dicoms` | DICOM preprocessing and conversion code |
| `Visualization_preprocessing` | visualization and metadata preprocessing notebooks |
| `Zero_shot` | zero-shot classification notebooks |
| `NegBio` | report label extraction tooling |

Results and outputs

The notebooks generate intermediate and final outputs such as:

- processed metadata tables
- report-derived labels
- image embeddings
- text embeddings
- zero-shot classification scores
- finding-level evaluation outputs
- visualizations for dataset inspection and analysis

Exact result values depend on local preprocessing choices, dataset access, model configuration, and notebook parameters.

Limitations

This repository is a bachelor's thesis research project and should not be interpreted as a clinical diagnostic tool.

Current limitations include:

- MIMIC-CXR data is not included because it requires credentialed access
- labels derived from radiology reports can contain noise
- zero-shot classification performance depends strongly on prompt design and preprocessing
- experiments are notebook-driven rather than packaged as a production pipeline
- findings are evaluated retrospectively and are not clinically validated

Acknowledgements

This thesis was supervised and graded by Sepp Hochreiter at the Institute for Machine Learning, Johannes Kepler University Linz.

I also received co-supervision and guidance from:

Elisabeth Rumetshofer: https://github.com/elirum
Andreas Fürst: https://github.com/fuersta

Their work on the CLOOB repository formed an important basis for this project.

References

MIMIC-CXR: https://physionet.org/content/mimic-cxr/
CLIP: https://arxiv.org/abs/2103.00020
CLOOB: https://arxiv.org/abs/2110.11316
NegBio: https://arxiv.org/abs/1712.05898
CheXzero: https://doi.org/10.1038/s41551-022-00936-9

Contact

Maximilian Hageneder
Email: max.hageneder@gmail.com
LinkedIn: https://www.linkedin.com/in/maximilian-hageneder-ai
