
RadVLM: Vision Language Models for Radiology Report Generation - A Reasoning and Knowledge Graph Retrieval Augmented Generation Approach


This is the official repository for "RadVLM: Vision Language Models for Radiology Report Generation" by the DBIS group at RWTH Aachen University (Yongli Mou*, Antonia Gustke, and Stefan Decker).

1. Overview

RadVLM is a research project focused on enhancing radiology report generation using Vision-Language Models (VLMs). It integrates reasoning and knowledge graph retrieval to improve accuracy and contextual understanding. The repository provides tools for dataset preprocessing, model training, and evaluation, along with pre-trained models and benchmarks.
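The overall idea can be sketched in a few lines of pseudocode. This is a conceptual illustration only, assuming a generic retrieve-then-generate loop; the object and method names below are hypothetical and are not the repository's actual API.

def generate_report_with_kg(image, vlm, kg):
    # Hypothetical sketch of a knowledge-graph retrieval-augmented pipeline:
    findings = vlm.describe(image)             # 1) preliminary visual findings from the VLM
    facts = kg.retrieve(findings)              # 2) retrieve related radiology knowledge from the graph
    return vlm.generate(image, context=facts)  # 3) generate the report conditioned on the retrieved facts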

2. Installation

git clone https://github.com/MouYongli/RadVLM.git
cd RadVLM

Anaconda

Because DeepSeek-VL2 and Qwen2.5-VL require different versions of their dependencies, we install them in separate conda environments.

export PROJECT_ROOT=$(pwd)
1. DeepSeek-VL2
cd $PROJECT_ROOT/baselines
mkdir deepseek
# Clone the DeepSeek-VL2 repository
git clone https://github.com/deepseek-ai/DeepSeek-VL2.git
mv DeepSeek-VL2/* deepseek
rm -rf DeepSeek-VL2
# Update requirements.txt in the deepseek folder
cp requirements.deepseek.txt deepseek/requirements.txt
# Update pyproject.toml in the deepseek folder
cp pyproject.deepseek.toml deepseek/pyproject.toml
# Install dependencies and install the deepseek-vl2 package
cd deepseek
# Create a new conda environment for DeepSeek-VL2
conda create --name deepseekenv python=3.10
conda activate deepseekenv
pip install -r requirements.txt
pip install -e .
# Install PyTorch with CUDA 12.6
# For other versions, refer to https://pytorch.org/get-started/locally, for example:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install torch torchvision torchaudio
2. Qwen2.5-VL
cd $PROJECT_ROOT/baselines
mkdir qwen
conda create --name qwenenv python=3.10
conda activate qwenenv
cp requirements.qwen.txt qwen/requirements.txt
cd qwen
pip install -r requirements.txt
3. Our project and dependencies
cd $PROJECT_ROOT
conda activate deepseekenv
pip install -e .
conda activate qwenenv
pip install -e .
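
As a quick sanity check (assuming the package name radvlm, which matches the import used in the Usage section below), you can verify that the editable install works in both environments:

# Optional check: the radvlm package should import in both environments
conda activate deepseekenv
python -c "import radvlm"
conda activate qwenenv
python -c "import radvlm"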

3. Datasets

Download datasets

4. Usage

Here's an example of how to use the model:

from radvlm.models.modeling_radvlm import RadVLM
model = RadVLM.load_pretrained("base")
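
For a fuller picture, a typical inference call might look like the sketch below; the image file name and the generate_report method are hypothetical placeholders used for illustration, not a documented part of the RadVLM API.

from PIL import Image
from radvlm.models.modeling_radvlm import RadVLM

model = RadVLM.load_pretrained("base")
image = Image.open("chest_xray.png")   # hypothetical input image path
report = model.generate_report(image)  # hypothetical method name, for illustration only
print(report)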

5. Project Structure

📦 RadVLM
├── 📁 data             # Sample datasets and preprocessing scripts
├── 📁 models           # Pre-trained models and checkpoints
├── 📁 notebooks        # Jupyter notebooks with tutorials
├── 📁 docs             # Documentation and API references
├── 📁 experiments      # Experimental configurations, logs and results
├── 📁 src              # Core implementation of foundation models
└── README.md           # Project description

6. Benchmark Results

| Model    | Accuracy |
|----------|----------|
| Baseline | xx       |
| Ours     | xx       |
More benchmarks are available in the research paper.

7. License

This project is licensed under the MIT License. See the LICENSE file for details.

8. Citation

If you use this project in your research, please cite:

@article{mou2025radvlm,
  author  = {Yongli Mou and Antonia Gustke and Stefan Decker},
  title   = {XXX},
  journal = {XXX},
  year    = {202X}
}
