Synthetic medical VQA dataset generation and VLM fine-tuning pipeline.
119K medical images are annotated by two frontier VLMs (Qwen 3.5 and Kimi K2.5) and cross-validated at 93% agreement, producing 110K training records. Fine-tuning three small models (2-3B parameters) on this data improves exact match on all three benchmarks (VQA-RAD, PathVQA, SLAKE); the best model gains +15.0% in average exact match over its base.
Blog post: SynthVision: Building a 110K Synthetic Medical VQA Dataset
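As a rough illustration of the cross-validation step summarized above, the sketch below computes an agreement rate from validator verdicts and keeps only the accepted annotations. It is a minimal sketch under assumptions: the `Verdict` record, its field names, and the accept/reject framing are hypothetical and not taken from the repository, which may measure agreement differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical record: one model's review of the other model's annotation."""
    image_id: str
    annotator: str   # model that wrote the annotation
    validator: str   # model that reviewed it
    accepted: bool   # True if the validator agreed with the annotation


def agreement_rate(verdicts: list[Verdict]) -> float:
    """Fraction of reviewed annotations that the validator accepted."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(v.accepted for v in verdicts) / len(verdicts)


def keep_accepted(verdicts: list[Verdict]) -> list[str]:
    """Image IDs whose annotation survived validation, in input order."""
    return [v.image_id for v in verdicts if v.accepted]


# Toy data for illustration only; not drawn from the real dataset.
verdicts = [
    Verdict("img-001", "qwen", "kimi", True),
    Verdict("img-002", "qwen", "kimi", True),
    Verdict("img-003", "kimi", "qwen", False),
    Verdict("img-004", "kimi", "qwen", True),
]
rate = agreement_rate(verdicts)
kept = keep_accepted(verdicts)
```

Filtering on the validator's verdict is one plausible way for a raw image pool to shrink to a smaller set of training records.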
# Requires Python 3.11+
uv sync

Aggregate and deduplicate images from 4 open medical datasets (ROCO, MultiCaRe, PathVQA, VQA-RAD):
uv run python scripts/build_seeds.py \
--config configs/datasets.yaml \
--output data/seeds

Send seed images to a VLM API for multi-turn clinical annotation. The annotation pipeline supports batch inference via Doubleword:
uv run python -c "
from openmed.annotation.pipeline import AnnotationPipeline
pipeline = AnnotationPipeline('configs/annotation.yaml')
pipeline.run(seeds_dir='data/seeds', annotated_dir='data/annotated')
"

Merge validated annotations, deduplicate, and convert to ShareGPT JSONL:
uv run python scripts/prepare_training.py --output data/training

LoRA fine-tuning with assistant-only label masking. Each model family has its own config:
# Qwen2.5-VL-3B
uv run accelerate launch --num_processes=4 --mixed_precision=bf16 \
scripts/finetune.py --model qwen --data data/training --config configs/qwen2vl_v6.yaml
# Qwen3.5-2B (best model)
uv run accelerate launch --num_processes=4 --mixed_precision=bf16 \
scripts/finetune.py --model qwen35 --data data/training --config configs/qwen35_d.yaml
# Ministral-3B
uv run accelerate launch --num_processes=4 --mixed_precision=bf16 \
scripts/finetune.py --model ministral --data data/training --config configs/ministral_d.yaml

Benchmark on VQA-RAD, PathVQA, and SLAKE using vLLM batched inference:
# Base model
uv run python scripts/evaluate.py \
--model base --model-id Qwen/Qwen2.5-VL-3B-Instruct \
--benchmarks all --output data/eval/base.json
# Fine-tuned model
uv run python scripts/evaluate.py \
--model OpenMed/Qwen2.5-3B-MedVL \
--benchmarks all --output data/eval/finetuned.json

| Model | VQA-RAD | PathVQA | SLAKE | Avg EM | vs Base |
|---|---|---|---|---|---|
| Qwen3.5-2B (D) | 0.5521 | 0.4748 | 0.6880 | 0.5716 | +15.0% |
| Qwen2.5-VL-3B (v6) | 0.5211 | 0.3468 | 0.6032 | 0.4904 | +8.9% |
| Ministral-3B (D) | 0.4789 | 0.3669 | 0.5664 | 0.4707 | +9.6% |
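
The scores above are exact-match (EM) rates, and each Avg EM value is consistent with an unweighted mean of the three benchmark columns. The sketch below shows one way such a metric could be computed. The normalization (lowercasing, stripping ASCII punctuation, collapsing whitespace) is an assumption for illustration; the scoring in `scripts/evaluate.py` is not shown here and may differ.

```python
import string

# Translation table that deletes ASCII punctuation (assumed normalization).
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    return " ".join(answer.lower().translate(_PUNCT).split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean across benchmarks."""
    return sum(scores.values()) / len(scores)


# Per-benchmark scores for Qwen3.5-2B (D), copied from the table above.
qwen35 = {"VQA-RAD": 0.5521, "PathVQA": 0.4748, "SLAKE": 0.6880}
print(round(macro_average(qwen35), 4))  # 0.5716, matching the Avg EM column
```

An unweighted mean gives each benchmark equal influence regardless of how many questions it contains.
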

| Asset | Link |
|---|---|
| Seed images | OpenMed/synthvision-seeds |
| Qwen 3.5 annotations | OpenMed/synthvision-annotated-qwen |
| Kimi K2.5 annotations | OpenMed/synthvision-annotated-kimi |
| Qwen validated by Kimi | OpenMed/synthvision-validated-qwen-by-kimi |
| Kimi validated by Qwen | OpenMed/synthvision-validated-kimi-by-qwen |
| Training data | OpenMed/synthvision-training |
| Qwen2.5-3B-MedVL | OpenMed/Qwen2.5-3B-MedVL |
| Qwen3.5-2B-MedVL | OpenMed/Qwen3.5-2B-MedVL |
| Ministral-3B-MedVL | OpenMed/Ministral-3B-MedVL |

License: Apache-2.0
