Sapphirine/202512-6-ChestXray-TextGuided-Detection

YOLO + LLM-CLIP (PadChest ROI) — Training Pipeline

This repository contains a three-stage training pipeline:

  1. Baseline YOLO training (standard Ultralytics training)
  2. Contrastive pretraining (YOLO image encoder + LLM text encoder, CLIP-style)
  3. LLM-guided fine-tuning (modified Ultralytics trainer)

0. Environment

  • Python 3.8+
  • PyTorch
  • Ultralytics (the official package for Stage 1, plus a modified version for Stage 3)

Install dependencies (example):

pip install ultralytics torch torchvision torchaudio

1. Baseline Training (Ultralytics Standard)

Train a vanilla YOLO detector using the official Ultralytics pipeline.

Code Example

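from ultralytics import YOLO

# Build a YOLO11-m model from its config file (randomly initialized weights).
model = YOLO("yolo11m.yaml")

# Train on the dataset described by chest.yaml.
train_results = model.train(
    data="/chest.yaml",  # dataset config: train, val, names
    epochs=500,
    imgsz=640,
    device="0",  # GPU index 0
)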

Notes

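  • chest.yaml must define the train and val image locations and the class names (the train, val, and names keys).
  • yolo11m.yaml can be replaced with another YOLO config, for example a smaller one such as yolo11n.yaml or yolo11s.yaml when GPU memory is limited.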
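
To illustrate the first note, here is a minimal sketch that writes a dataset config with the required keys. The dataset root, the split directories, the output file name, and the class names are placeholders (assumptions, not taken from this repository); replace them with the real paths and labels of your dataset.

```python
import tempfile
from pathlib import Path


def write_dataset_yaml(out_path, root, class_names):
    """Write a minimal Ultralytics-style dataset config and return its text."""
    lines = [
        f"path: {root}",        # dataset root (placeholder)
        "train: images/train",  # train images, relative to path (assumed layout)
        "val: images/val",      # val images, relative to path (assumed layout)
        "names:",
    ]
    # One "index: name" entry per class.
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    text = "\n".join(lines) + "\n"
    Path(out_path).write_text(text, encoding="utf-8")
    return text


# Hypothetical values for illustration only.
out_file = Path(tempfile.gettempdir()) / "chest_example.yaml"
yaml_text = write_dataset_yaml(out_file, "/path/to/dataset", ["class_0", "class_1"])
print(yaml_text)
```

The resulting file can then be passed to model.train through the data argument.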

2. Contrastive Pretraining (YOLO + LLM CLIP)

We perform CLIP-style contrastive pretraining between image features (YOLO backbone) and text features (LLM encoder) using PadChest ROI-level data.
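
For readers unfamiliar with the CLIP-style objective, the sketch below computes a symmetric contrastive loss over a batch of paired image and text feature vectors. It is written in plain Python so it runs without any dependencies. It is not the code in pretraining.py, which presumably uses PyTorch tensors, and the function names here are hypothetical. The default temperature of 0.07 matches the example command in this section.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits against a target index."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th image is paired with the i-th text."""
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each row should peak on the diagonal.
    loss_i2t = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should peak on the diagonal.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = sum(cross_entropy_row(columns[k], k) for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Matching pairs yield a loss near zero, while swapped pairs are penalized heavily. This is the signal that pulls ROI image features and their sentence features together.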

Run Script

python pretraining.py

Example Command

python pretraining.py \
  --csv_path /root/autodl-tmp/dataset/PadChest-GR-yolo-6labels/roi256/roi256_box_sentence.csv \
  --batch_size 16 \
  --epochs 20 \
  --lr 5e-5 \
  --weight_decay 1e-2 \
  --temperature 0.07 \
  --device cuda:0 \
  --textencoder llama2 \
  --llama_rep Llama-2-7b-chat-hf \
  --context True \
  --context_length 8 \
  --n_prompts 2
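
When running several Stage 2 configurations, it can be convenient to assemble the command from a dictionary. The sketch below does this with the flags from the example command above. The helper name build_pretraining_command is hypothetical, and the values are kept as strings so they reach the script exactly as written.

```python
import shlex
import sys


def build_pretraining_command(args, script="pretraining.py"):
    """Turn a {flag: value} mapping into an argument list for the script."""
    cmd = [sys.executable, script]
    for flag, value in args.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


# Flags and values copied from the example command.
stage2_args = {
    "csv_path": "/root/autodl-tmp/dataset/PadChest-GR-yolo-6labels/roi256/roi256_box_sentence.csv",
    "batch_size": "16",
    "epochs": "20",
    "lr": "5e-5",
    "weight_decay": "1e-2",
    "temperature": "0.07",
    "device": "cuda:0",
    "textencoder": "llama2",
    "llama_rep": "Llama-2-7b-chat-hf",
    "context": "True",
    "context_length": "8",
    "n_prompts": "2",
}

cmd = build_pretraining_command(stage2_args)
print(shlex.join(cmd))
# To launch the run: subprocess.run(cmd, check=True)
```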

3. LLM-Guided Fine-tuning (Modified Ultralytics Trainer)

Fine-tune the YOLO detector using LLM-guided features.

Run Script

python train.py

Notes

  • The Ultralytics trainer has been modified to support text-guided modules.
  • Pretrained weights from Stage 2 can be loaded for initialization.
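
A common way to initialize one model from another is to copy only the parameters whose names and shapes match. The sketch below shows that filtering step under the assumption that the Stage 2 checkpoint is a name-to-tensor mapping; how train.py actually loads the weights is not documented here. The helper name and the FakeTensor stand-in are hypothetical and exist only so the example runs without PyTorch.

```python
class FakeTensor:
    """Stand-in for a tensor; only the shape attribute is needed here."""

    def __init__(self, *shape):
        self.shape = tuple(shape)


def select_matching_weights(pretrained, target):
    """Keep entries whose name exists in the target and whose shape agrees."""
    kept, skipped = {}, []
    for name, tensor in pretrained.items():
        if name in target and tuple(tensor.shape) == tuple(target[name].shape):
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped


# Hypothetical parameter names for illustration only.
stage2_weights = {
    "backbone.conv.weight": FakeTensor(16, 3, 3, 3),
    "backbone.head.weight": FakeTensor(10, 16),  # shape differs in the detector
    "text_proj.weight": FakeTensor(8, 8),        # not present in the detector
}
detector_weights = {
    "backbone.conv.weight": FakeTensor(16, 3, 3, 3),
    "backbone.head.weight": FakeTensor(6, 16),
}

kept, skipped = select_matching_weights(stage2_weights, detector_weights)
print(sorted(kept), sorted(skipped))
```

With PyTorch, the two mappings would typically come from torch.load and model.state_dict(), and the kept entries would be applied with model.load_state_dict(kept, strict=False).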

Recommended Workflow

  1. Baseline training
  2. Contrastive pretraining
  3. LLM-guided fine-tuning

Quick Start

python -c "from ultralytics import YOLO; YOLO('yolo11m.yaml').train(data='/chest.yaml', epochs=500, imgsz=640, device='0')"
python pretraining.py --device cuda:0
python train.py

About

Semantic Alignment and LLM-Guided Detection for Chest X-Ray Understanding
