This repository provides the ONNX-converted version of the qwen3-vl-2b multimodal model, optimized for efficient image-to-text generation. The model supports inference on single images with a fixed input resolution of 224×224 and outputs descriptive text based on visual content.
- Model Type: ONNX-exported multimodal large language model (vision-language)
- Input Specification: Single RGB image (224×224 resolution, 3 channels)
- Output: High-quality natural language description of the input image
- Conversion Source: Original qwen3-vl-2b (PyTorch) → ONNX format
- Qwen3.5-VL ONNX: Qwen3.5 can also be converted to ONNX, but its inference speed is slower than PyTorch's. This is mainly because the `torch_chunk_gated_delta_rule` function in Qwen3.5 uses a large number of dynamic slicing operations and Python loops, so tracing produces a very large static computation graph in ONNX.
- Example input: Single RGB image (224×224) of a lemon.
- Prompt: Describe this image.
This image shows a single, yellow, spherical object that appears to be a small, smooth, and rounded lemon. It is placed on a light-colored, possibly white or off-white, surface with a wood grain texture. The lemon has a rounded, slightly flattened top and a smooth surface. The lighting is even, and the object is the central focus of the image.
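The graph-size blow-up noted above for Qwen3.5 comes from trace-based export unrolling Python loops: each loop iteration becomes a separate set of static nodes in the exported graph. The sketch below is purely illustrative (the op count per chunk is a made-up constant, not the real `torch_chunk_gated_delta_rule` graph); it only shows how the node count scales with sequence length when a per-chunk loop is unrolled at export time.

```python
# Illustrative sketch only: why tracing a per-chunk Python loop at export
# time yields a graph whose size grows with sequence length. OPS_PER_CHUNK
# is hypothetical, not the real torch_chunk_gated_delta_rule op count.

OPS_PER_CHUNK = 6  # hypothetical: slice, matmul, gate, delta update, ...

def traced_graph_size(seq_len: int, chunk_size: int) -> int:
    """Node count of a graph whose chunk loop is unrolled at trace time."""
    num_chunks = (seq_len + chunk_size - 1) // chunk_size
    # Trace-based ONNX export replays each loop iteration as separate
    # static nodes, so the graph grows linearly with the number of chunks.
    return num_chunks * OPS_PER_CHUNK

# Doubling the sequence length doubles the unrolled graph:
print(traced_graph_size(1024, 64))  # 16 chunks * 6 ops = 96
print(traced_graph_size(2048, 64))  # 32 chunks * 6 ops = 192
```

In eager PyTorch the same loop re-executes a small amount of code per chunk, which is why the exported static graph can be much slower to build and run than the original model.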
- Support images of different sizes
- Compare Torch and ONNX inference speed
- Convert ONNX to TensorRT to further improve inference speed
- Convert more models from Torch to ONNX
```shell
conda create -n onnx python=3.10 -y
conda activate onnx
pip install -r requirements.txt
```
```shell
git clone https://github.com/garlic-byte/Qwen3_VL_Export_ONNX_and_TensorRT.git
cd Qwen3_VL_Export_ONNX_and_TensorRT

# Download the model
mkdir qwen3-vl-2b
hf download Qwen/Qwen3-VL-2B-Instruct --local-dir=qwen3-vl-2b/

# Export to ONNX
python qwen3_vl_export_onnx.py

# Run ONNX inference
python inference_onnx.py
```
- CUDA 12.8
- TensorRT Debian local repo: nv-tensorrt-local-repo-ubuntu2404-10.9.0-cuda-12.8_1.0-1_amd64.deb
- Python TensorRT wheel: tensorrt-10.9.0.34
```shell
bash build_engine.sh
python inference_trt.py
```
- The ONNX model is exported from the original PyTorch implementation of qwen3-vl-2b.
- Input resolution is fixed at 224×224 (consistent with the model's training configuration).
- For optimal performance, use ONNX Runtime with GPU acceleration (install `onnxruntime-gpu` instead of `onnxruntime`).
- The model retains the original qwen3-vl-2b's visual understanding and text generation capabilities.
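Because the input resolution is fixed at 224×224, images of other sizes must be resized before inference. A minimal preprocessing sketch is shown below; the resize here is nearest-neighbor index sampling as a stand-in for a real resizer, and the float32 NCHW layout in [0, 1] is an assumption — the actual normalization used by `inference_onnx.py` may differ.

```python
# Minimal preprocessing sketch for the fixed 224x224 input. Assumption:
# the exported model expects a float32 NCHW tensor; check inference_onnx.py
# for the exact normalization used by the repo.
import numpy as np

def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an HxWx3 uint8 image to a (1, 3, size, size) float32 array in [0, 1]."""
    h, w, _ = image.shape
    # Nearest-neighbor resize via index sampling (stand-in for a real resizer).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols[None, :]]          # (size, size, 3)
    chw = resized.transpose(2, 0, 1).astype(np.float32) / 255.0  # HWC -> CHW
    return chw[None]                                       # add batch dim -> NCHW

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
x = preprocess(img)
print(x.shape, x.dtype)  # (1, 3, 224, 224) float32
```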
| Metric | Type | Value |
|---|---|---|
| Latency (1000 runs) | Torch (fp32) | 44.46 sec |
| Latency (1000 runs) | ONNX (fp32) | 26.78 sec |
| Latency (1000 runs) | ONNX (fp16) | 18.13 sec |
| Latency (1000 runs) | TensorRT (fp16) | 13.77 sec |
| Generation speed (10 runs, fp16) | Qwen3-vl (Torch) | 19.378385 tokens/sec (1103 tokens generated) |
| Generation speed (10 runs, fp16) | ONNX | 38.667467 tokens/sec (1062 tokens generated) |
| Generation speed (10 runs, fp16) | TensorRT | 66.579019 tokens/sec (842 tokens generated) |
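From the benchmark numbers above, the relative speedups work out as follows (simple arithmetic on the reported values, nothing measured here):

```python
# Speedups derived from the benchmark table above.
torch_latency, trt_latency = 44.46, 13.77  # sec per 1000 runs
torch_tps, onnx_tps, trt_tps = 19.378385, 38.667467, 66.579019  # tokens/sec

print(f"Latency speedup (TensorRT fp16 vs Torch fp32): {torch_latency / trt_latency:.2f}x")
print(f"Generation speedup (ONNX vs Torch):            {onnx_tps / torch_tps:.2f}x")
print(f"Generation speedup (TensorRT vs Torch):        {trt_tps / torch_tps:.2f}x")
```

That is, TensorRT fp16 cuts latency to roughly a third of Torch fp32, and roughly triples generation throughput.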
The model is licensed under the same license as the original qwen3-vl-2b (see Qwen Official Repository for details).
- Original qwen3-vl-2b model developed by Alibaba Cloud.
- ONNX conversion leverages PyTorch's `torch.onnx.export` API and ONNX Runtime for inference optimization.
