Skip to content

Latest commit

 

History

History
107 lines (88 loc) · 6.62 KB

File metadata and controls

107 lines (88 loc) · 6.62 KB
sidebar-title Profile with MMVU Dataset

Profile with MMVU Dataset

AIPerf supports benchmarking using the MMVU dataset, an expert-level video understanding benchmark that tests multi-discipline reasoning over video content. Each sample contains a video URL and a question (multiple-choice or open-ended) that requires watching the video to answer.

This guide covers profiling OpenAI-compatible video language models using the MMVU public dataset.


Start a vLLM Server

Launch a vLLM server with a video-capable vision language model:

docker pull vllm/vllm-openai:latest
docker run --gpus all -p 8000:8000 -e HF_TOKEN vllm/vllm-openai:latest \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --enforce-eager

Verify the server is ready:

timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"Qwen/Qwen2-VL-2B-Instruct\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "vLLM not ready after 15min"; exit 1; }

Profile with MMVU Dataset

AIPerf loads the MMVU dataset from HuggingFace, combines each question with its multiple-choice options, attaches the video URL, and sends each pair as a single-turn video request. The prompt format matches vLLM's own MMVU benchmark format.

aiperf profile \
    --model Qwen/Qwen2-VL-2B-Instruct \
    --endpoint-type chat \
    --streaming \
    --url localhost:8000 \
    --public-dataset mmvu \
    --request-count 5 \
    --concurrency 2 \
    --output-tokens-mean 128

Sample Output (Successful Run):

                                     NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃           Metric ┃        avg ┃        min ┃        max ┃        p99 ┃        p90 ┃        p50 ┃        std ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│    Time to First │ 236,267.00 │   2,967.98 │ 535,809.00 │ 528,246.99 │ 460,180.00 │ 292,846.13 │ 206,874.00 │
│       Token (ms) │            │            │            │            │            │            │            │
│   Time to Second │ 157,173.00 │     113.08 │ 473,750.00 │ 467,270.27 │ 408,951.00 │     127.74 │ 199,053.00 │
│       Token (ms) │            │            │            │            │            │            │            │
│  Request Latency │ 476,346.00 │ 297,204.97 │ 841,020.00 │ 829,081.39 │ 721,631.00 │ 350,652.38 │ 200,572.00 │
│             (ms) │            │            │            │            │            │            │            │
│      Inter Token │   3,631.07 │     106.31 │  11,204.46 │  11,020.23 │   9,362.19 │     127.14 │   4,543.17 │
│     Latency (ms) │            │            │            │            │            │            │            │
│     Output Token │       5.19 │       0.09 │       9.41 │       9.37 │       9.01 │       7.87 │       4.17 │
│   Throughput Per │            │            │            │            │            │            │            │
│             User │            │            │            │            │            │            │            │
│     (tokens/sec) │            │            │            │            │            │            │            │
│  Output Sequence │      58.00 │      32.00 │     128.00 │     125.04 │      98.40 │      42.00 │      35.84 │
│  Length (tokens) │            │            │            │            │            │            │            │
│   Input Sequence │      26.00 │       9.00 │      67.00 │      65.72 │      54.20 │      10.00 │      22.79 │
│  Length (tokens) │            │            │            │            │            │            │            │
│     Output Token │       0.24 │        N/A │        N/A │        N/A │        N/A │        N/A │        N/A │
│       Throughput │            │            │            │            │            │            │            │
│     (tokens/sec) │            │            │            │            │            │            │            │
│    Request Count │       5.00 │        N/A │        N/A │        N/A │        N/A │        N/A │        N/A │
│       (requests) │            │            │            │            │            │            │            │
└──────────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘

Note: High TTFT variance (3s min, 536s max) is expected — the model server fetches each video URL from HuggingFace during inference, and fetch time varies with video size and network conditions.


Notes

  • The video column in MMVU contains HTTPS URLs pointing to .mp4 files hosted on HuggingFace. AIPerf passes these URLs directly to the model server, which fetches the video during inference.
  • For multiple-choice questions, choices are appended to the question in the format A.option B.option .... Open-ended questions use the question text only.
  • The dataset has a validation split with samples spanning multiple academic disciplines (Art, Science, Engineering, etc.).