MMCLFMI is a bachelor's thesis project on medical vision-language learning using MIMIC-CXR chest X-ray images and their associated radiology reports. The project explores CLIP/CLOOB-style contrastive learning for aligning image and text representations, with downstream experiments in zero-shot chest X-ray classification.
The work was developed as part of the Bachelor’s program in Artificial Intelligence at Johannes Kepler University Linz.
Radiology datasets often contain both medical images and free-text reports. Contrastive image-text learning can use these paired modalities to learn representations that connect visual findings with clinical language.
This project investigates whether contrastive vision-language models can support medical image understanding by aligning:
- chest X-ray images from MIMIC-CXR
- radiology reports and report-derived labels
- image embeddings from CLIP/CLOOB-style models
- text embeddings derived from medical report content
The project was inspired by CheXzero, which demonstrated zero-shot chest X-ray classification using contrastive language-image learning.
The pipeline covers the following workflow:
- Prepare the MIMIC-CXR dataset
- Convert DICOM chest X-ray images to JPEG format
- Extract report-derived labels using NegBio
- Preprocess metadata and image-report mappings
- Generate image and text embeddings with CLIP and CLOOB
- Run zero-shot classification experiments
- Compare prediction behavior across findings and symptom groups
The repository covers the following areas:
- Medical imaging: chest X-ray representation learning on MIMIC-CXR
- Vision-language learning: image-report alignment with contrastive learning
- Zero-shot classification: finding-level classification using learned image-text similarity
- CLIP/CLOOB experiments: comparison of contrastive representation learning approaches
- Report-derived labels: label extraction from radiology reports using NegBio
- Preprocessing pipeline: DICOM conversion, metadata preparation, embedding generation, and evaluation notebooks
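The two contrastive approaches named in this list differ mainly in how the matching pair enters the objective. The following is a minimal NumPy sketch of the published InfoNCE (CLIP) and InfoLOOB (CLOOB) objectives in simplified form. It is not this repository's training code: the function names and the temperature value are illustrative assumptions, and CLOOB's Hopfield retrieval step is omitted.

```python
import numpy as np


def _normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def info_nce(img, txt, temperature=0.07):
    """Symmetric InfoNCE: the matching pair is part of the denominator."""
    logits = _normalize(img) @ _normalize(txt).T / temperature
    pos = np.diag(logits)  # similarity of each image with its own report
    loss_i2t = np.mean(_logsumexp(logits, axis=1) - pos)
    loss_t2i = np.mean(_logsumexp(logits, axis=0) - pos)
    return 0.5 * (loss_i2t + loss_t2i)


def info_loob(img, txt, temperature=0.07):
    """Symmetric InfoLOOB: the matching pair is left out of the denominator.

    Needs at least two pairs, otherwise no negatives remain.
    """
    logits = _normalize(img) @ _normalize(txt).T / temperature
    pos = np.diag(logits)
    # Mask the diagonal so only non-matching pairs act as negatives.
    masked = np.where(np.eye(len(logits), dtype=bool), -np.inf, logits)
    loss_i2t = np.mean(_logsumexp(masked, axis=1) - pos)
    loss_t2i = np.mean(_logsumexp(masked, axis=0) - pos)
    return 0.5 * (loss_i2t + loss_t2i)
```

Both functions expect two arrays of shape `(batch, dim)` in which row `i` of `img` and row `i` of `txt` belong to the same study. Averaging the two directions is one convention; published implementations may scale the terms differently.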
This project uses MIMIC-CXR, a large publicly available database of chest radiographs and associated radiology reports.
Because MIMIC-CXR contains clinical data, it is not included in this repository. Users must obtain access through the official credentialed process and place the dataset locally before running the notebooks.
Expected local dataset location:
MultiModel_Contrastive_Learning_for_Medical_Imaging/
└── data/
    └── MIMIC_CXR/
Note: some internal folder names still use the older spellings MultiModel or MIMIC_CRX. They refer to the same project and should be read as multimodal learning on MIMIC-CXR.
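A notebook can fail early with a clear message if the dataset folder is missing. A minimal sketch, assuming the code is started from the repository root; `find_dataset_dir` is a hypothetical helper and not part of the repository:

```python
from pathlib import Path

# Folder names taken from the expected layout above.
PROJECT_DIR = Path("MultiModel_Contrastive_Learning_for_Medical_Imaging")
DATASET_DIR = PROJECT_DIR / "data" / "MIMIC_CXR"


def find_dataset_dir(repo_root: Path) -> Path:
    """Return the local MIMIC-CXR folder, or raise if it is missing."""
    candidate = repo_root / DATASET_DIR
    if not candidate.is_dir():
        raise FileNotFoundError(
            f"Expected MIMIC-CXR at {candidate}; obtain credentialed access "
            "and copy the dataset there first."
        )
    return candidate
```

Calling `find_dataset_dir(Path("."))` at the top of a notebook would surface a missing dataset before any preprocessing starts.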
The project follows a contrastive image-text learning setup.
Chest X-ray image
|
v
Image encoder
|
v
Image embedding
|
| similarity
v
Text embedding
^
|
Text encoder
^
|
Radiology report / finding prompt
The core idea is to place images and their corresponding report-derived text representations in a shared embedding space. During zero-shot classification, disease or finding prompts can be compared against image embeddings to estimate the presence of specific findings.
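The zero-shot step described above can be sketched with plain NumPy. The sketch assumes one positive and one negative prompt per finding (for example "Pneumonia" and "No pneumonia"), which is similar in spirit to CheXzero; the prompts, the temperature, and the function names are illustrative assumptions, and the embeddings would come from the CLIP or CLOOB encoders.

```python
import numpy as np


def l2_normalize(x):
    # Unit-length vectors make the dot product equal to cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_probs(image_emb, pos_text_emb, neg_text_emb, temperature=0.07):
    """Per-finding probability from positive vs. negative prompt similarity.

    image_emb:    (n_images, d)
    pos_text_emb: (n_findings, d), embedding of the prompt naming the finding
    neg_text_emb: (n_findings, d), embedding of the prompt negating the finding
    returns:      (n_images, n_findings) values between 0 and 1
    """
    img = l2_normalize(image_emb)
    pos_logits = img @ l2_normalize(pos_text_emb).T / temperature
    neg_logits = img @ l2_normalize(neg_text_emb).T / temperature
    # A softmax over the two prompts equals a sigmoid of the logit difference.
    return 1.0 / (1.0 + np.exp(neg_logits - pos_logits))
```

Scoring each finding independently keeps the setup multi-label, which matches chest X-rays where several findings can be present in the same image.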
Copy the MIMIC-CXR dataset into the expected data directory:
MultiModel_Contrastive_Learning_for_Medical_Imaging/data/MIMIC_CXR/
Run NegBio on the radiology reports to obtain finding labels.
Instructions and related code are located in:
NegBio/
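NegBio-style label files, such as the one published with MIMIC-CXR, commonly encode each finding as 1.0 (positive), 0.0 (negated), -1.0 (uncertain), or empty (not mentioned). The sketch below shows one way to turn such values into binary targets. The column names, the sample rows, and the choice to treat unmentioned findings as negative are assumptions for illustration, not the repository's actual preprocessing.

```python
import csv
import io


def to_binary(value: str, uncertain_as: int = 0) -> int:
    """Map a NegBio-style cell to 0/1.

    Empty cells (finding not mentioned) are treated as negative here,
    which is an assumption; uncertain mentions map to `uncertain_as`.
    """
    value = value.strip()
    if value == "":
        return 0
    label = float(value)
    if label == -1.0:
        return uncertain_as
    return int(label == 1.0)


def load_labels(handle, findings):
    """Read a label CSV into {study_id: {finding: 0 or 1}}."""
    reader = csv.DictReader(handle)
    return {
        row["study_id"]: {name: to_binary(row[name]) for name in findings}
        for row in reader
    }


# Hypothetical rows, not real patient data.
SAMPLE = "study_id,Pneumonia,Edema\n1,1.0,\n2,-1.0,0.0\n"
labels = load_labels(io.StringIO(SAMPLE), ["Pneumonia", "Edema"])
```

How uncertain mentions are handled changes the label distribution noticeably, so `uncertain_as` is kept as an explicit parameter rather than a hidden default.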
Create the conda environment:
conda env create -f environment.yml
conda activate mmclfmi
If the environment name differs in environment.yml, use the name defined there.
Run the DICOM conversion script:
python converter.py
This converts DICOM chest X-ray images to JPEG format and stores them in the configured jpg folder.
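The core of such a conversion is rescaling the raw pixel values to 8 bits. The sketch below is not the contents of converter.py; it assumes `pydicom` and `Pillow` are installed, uses simple min-max scaling, and the function names and JPEG quality are illustrative choices.

```python
import numpy as np


def to_uint8(pixels, photometric="MONOCHROME2"):
    """Min-max scale raw pixel values to 0..255, inverting MONOCHROME1 images."""
    arr = np.asarray(pixels, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    scaled = np.zeros_like(arr) if hi == lo else (arr - lo) / (hi - lo)
    if photometric == "MONOCHROME1":
        # In MONOCHROME1 the minimum value is displayed as white, so flip it.
        scaled = 1.0 - scaled
    return np.round(scaled * 255).astype(np.uint8)


def dicom_to_jpeg(dicom_path, jpeg_path, quality=95):
    """Read one DICOM file and write it as an 8-bit grayscale JPEG."""
    # Imported lazily so that to_uint8 can be used without these packages.
    import pydicom
    from PIL import Image

    ds = pydicom.dcmread(dicom_path)
    photometric = getattr(ds, "PhotometricInterpretation", "MONOCHROME2")
    image = Image.fromarray(to_uint8(ds.pixel_array, photometric))
    image.save(jpeg_path, quality=quality)
```

Min-max scaling is the simplest option. Windowing based on DICOM metadata or histogram equalization are common alternatives and may affect downstream embeddings.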
Run the preprocessing notebooks in order:
visualization.ipynb
meta_data_pre.ipynb
These notebooks prepare metadata, image paths, labels, and intermediate files used for embedding generation and evaluation.
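Mapping metadata rows to image files relies on the folder layout of the public MIMIC-CXR release, where images live under `files/pXX/pSUBJECT/sSTUDY/`. A minimal sketch of that mapping follows; `image_relpath` is a hypothetical helper, and whether the local jpg folder mirrors this layout is an assumption.

```python
from pathlib import Path


def image_relpath(subject_id: int, study_id: int, dicom_id: str, ext: str = ".jpg") -> Path:
    """Build the relative path files/pXX/pSUBJECT/sSTUDY/DICOM_ID.ext.

    pXX is the letter "p" followed by the first two digits of the subject id.
    """
    subject = str(subject_id)
    return (
        Path("files")
        / f"p{subject[:2]}"
        / f"p{subject}"
        / f"s{study_id}"
        / f"{dicom_id}{ext}"
    )


# Made-up identifiers for illustration only.
example = image_relpath(12345678, 87654321, "example-dicom-id")
```

Joining this relative path with the local dataset or jpg root gives the full file location for each metadata row.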
Run the embedding notebooks:
embedding_cloob.ipynb
embedding_clip.ipynb
These generate image and text embeddings for the CLOOB and CLIP experiments.
Choose one of the zero-shot evaluation notebooks:
zero_shot_findings.ipynb
zero_shot_all_symptoms.ipynb
Use zero_shot_findings.ipynb for finding-level experiments and zero_shot_all_symptoms.ipynb for broader symptom-based experiments.
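Zero-shot scores are typically compared against the report-derived labels one finding at a time. The sketch below computes AUROC per finding with NumPy; AUROC is a common choice for this kind of evaluation, but the metric actually used in the notebooks is not stated here, and the function names are illustrative.

```python
import numpy as np


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Uses O(n_pos * n_neg) memory, so it suits small evaluation sets.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined when only one class is present
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


def auroc_per_finding(label_matrix, score_matrix, findings):
    """Return {finding: AUROC} for (n_images, n_findings) label and score arrays."""
    label_matrix = np.asarray(label_matrix)
    score_matrix = np.asarray(score_matrix)
    return {
        name: auroc(label_matrix[:, j], score_matrix[:, j])
        for j, name in enumerate(findings)
    }
```

Reporting the metric per finding rather than as a single average makes it easier to see which findings the prompts capture well and which they do not.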
.
├── MultiModel_Contrastive_Learning_for_Medical_Imaging/
│   ├── clip_/
│   ├── data/
│   │   ├── checkpoints/
│   │   ├── csv/
│   │   ├── embedding/
│   │   ├── jpg/
│   │   ├── MIMIC_CXR/
│   │   └── model_config/
│   ├── df_prepearing.ipynb
│   ├── preprocess_dicoms/
│   ├── Visualization_preprocessing/
│   └── Zero_shot/
├── NegBio/
└── README.md
| Folder | Purpose |
|---|---|
| `clip_` | CLIP-related model and embedding code |
| `data/checkpoints` | model checkpoints |
| `data/csv` | metadata and processed CSV files |
| `data/embedding` | generated CLIP/CLOOB embeddings |
| `data/jpg` | converted JPEG chest X-ray images |
| `data/MIMIC_CXR` | local MIMIC-CXR dataset location |
| `data/model_config` | model configuration files |
| `preprocess_dicoms` | DICOM preprocessing and conversion code |
| `Visualization_preprocessing` | visualization and metadata preprocessing notebooks |
| `Zero_shot` | zero-shot classification notebooks |
| `NegBio` | report label extraction tooling |
The notebooks generate intermediate and final outputs such as:
- processed metadata tables
- report-derived labels
- image embeddings
- text embeddings
- zero-shot classification scores
- finding-level evaluation outputs
- visualizations for dataset inspection and analysis
Exact result values depend on local preprocessing choices, dataset access, model configuration, and notebook parameters.
This repository is a bachelor's thesis research project and should not be interpreted as a clinical diagnostic tool.
Current limitations include:
- MIMIC-CXR data is not included because it requires credentialed access
- labels derived from radiology reports can contain noise
- zero-shot classification performance depends strongly on prompt design and preprocessing
- experiments are notebook-driven rather than packaged as a production pipeline
- findings are evaluated retrospectively and not clinically validated
This thesis was supervised and graded by Sepp Hochreiter at the Institute for Machine Learning, Johannes Kepler University Linz.
I also received co-supervision and guidance from:
- Elisabeth Rumetshofer: https://github.com/elirum
- Andreas Fürst: https://github.com/fuersta
Their work on the CLOOB repository formed an important basis for this project.
- MIMIC-CXR: https://physionet.org/content/mimic-cxr/
- CLIP: https://arxiv.org/abs/2103.00020
- CLOOB: https://arxiv.org/abs/2110.11316
- NegBio: https://arxiv.org/abs/1712.05898
- CheXzero: https://doi.org/10.1038/s41551-022-00936-9
Maximilian Hageneder
Email: max.hageneder@gmail.com
LinkedIn: https://www.linkedin.com/in/maximilian-hageneder-ai
