feat(perf): add --runtime [ort|openvino] to compare ORT vs OpenVINO by xieofxie · Pull Request #960 · microsoft/winml-cli

xieofxie · 2026-06-24T08:25:38Z

What

Adds a --runtime [ort|openvino] flag to winml perf so the same ONNX file can be benchmarked on ONNX Runtime vs OpenVINO Runtime for a side-by-side comparison.

winml perf -m model.onnx --runtime openvino --device gpu
winml perf -m model.onnx --runtime ort --ep cpu   # ORT-native baseline

Default is ort — existing behavior is unchanged.
--runtime openvino reads the raw ONNX directly via OpenVINO Runtime (no quantize/optimize/compile build), which is the fair, simple comparison on the same graph. ONNX input only.

How

OpenVINOSession (session/openvino/openvino_session.py) mirrors the subset of WinMLSession the perf engine uses — compile() / run() / perf() plus io_config / device / ep_name / running_model_path. It reuses get_io_config, load_onnx, PerfStats, and WinMLSession._get_precision, so I/O metadata and reports match the ORT path. No model-specific logic.
_OpenVINOModel adapter in perf.py exposes the _single surface the benchmark engine reads, so _run_single / _run_benchmark* / reporting are untouched. PerfBenchmark._load_model() branches to it and skips WinMLAutoModel + ORT EP resolution entirely (OpenVINO is independent of ORT's EPs).
--device maps cpu/gpu/npu/auto → OpenVINO CPU/GPU/NPU/AUTO. compile() fails fast against Core().available_devices with a readable message instead of a raw backend stack trace.
RuntimeName Literal + RUNTIME_NAMES in constants.py (mirrors CompilerName) — the CLI choice list and the typed config field derive from one source.
CLI guards: --runtime openvino requires a .onnx input and rejects --module.
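
The single-source pattern for runtime names and the two CLI guards can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `validate_openvino_args` is a hypothetical helper, the exact shape of `RuntimeName` is assumed, and the sketch raises `ValueError` where the real CLI raises `click.UsageError`.

```python
from typing import Literal, Optional, get_args

# Assumed shape of the constants; the real ones live in constants.py.
RuntimeName = Literal["ort", "openvino"]
# Derive the choice list from the Literal so the two cannot drift apart.
RUNTIME_NAMES: tuple = get_args(RuntimeName)


def validate_openvino_args(runtime: str, model: str, module_class: Optional[str]) -> None:
    """Hypothetical guard mirroring the CLI checks described above."""
    if runtime not in RUNTIME_NAMES:
        raise ValueError(f"--runtime must be one of {RUNTIME_NAMES}, got: {runtime}")
    if runtime != "openvino":
        return  # the ORT path keeps its existing validation
    if not model.lower().endswith(".onnx"):
        raise ValueError(f"--runtime openvino requires a .onnx file, got: {model}")
    if module_class:
        raise ValueError("--runtime openvino does not support --module benchmarking.")
```

Deriving `RUNTIME_NAMES` with `get_args` means adding a runtime only requires touching the `Literal`.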

Verified locally

--runtime openvino runs on CPU and GPU end-to-end; latency/throughput populated.
--monitor works on CPU and GPU (HW utilization via PDH; falls back to NullEPMonitor like most EPs — no OV-specific ep_proof telemetry yet).
Absent device (NPU here) → friendly error: OpenVINO device 'NPU' ... is not available. OpenVINO sees: ['CPU', 'GPU'].
New unit tests (tests/unit/session/test_openvino_session.py, gated on importorskip("openvino")) + CLI guard tests; all existing perf tests pass; ruff clean.

Notes / follow-ups

--ep and quant/optimize flags are intentional no-ops under --runtime openvino (raw ONNX) — documented in the flag help.
On machines where the WinML registry installs the OpenVINO EP, --runtime ort --device cpu already routes ORT→OpenVINO EP; use --runtime ort --ep cpu for a true ORT-native baseline.
Possible follow-up: surface EXECUTION_DEVICES in the report so AUTO-mode fallbacks are visible.

Closes #948

🤖 Generated with Claude Code

- Add RuntimeName Literal + RUNTIME_NAMES to constants (mirrors CompilerName), thread it through BenchmarkConfig and the perf CLI instead of bare str.
- Fail fast in OpenVINOSession.compile() when the requested device is absent from Core().available_devices, with a readable message instead of a raw backend stack trace. AUTO is exempt; matches plain (GPU) and indexed (GPU.0) device names.
- Add a hardware-independent unit test for the unavailable-device path.
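
A minimal sketch of the device mapping and the fail-fast check, under some assumptions: the function names are hypothetical, the list of available devices is passed in (in the PR it comes from `Core().available_devices`), plain and indexed names are compared by the part before the dot, and the error wording is illustrative.

```python
from typing import Sequence

# Mapping stated in the PR description: cpu/gpu/npu/auto -> CPU/GPU/NPU/AUTO.
_DEVICE_MAP = {"cpu": "CPU", "gpu": "GPU", "npu": "NPU", "auto": "AUTO"}


def to_ov_device(device: str) -> str:
    """Translate the CLI --device value into an OpenVINO device name."""
    try:
        return _DEVICE_MAP[device.lower()]
    except KeyError:
        raise ValueError(f"Unknown device: {device!r}") from None


def check_device_available(ov_device: str, available: Sequence[str]) -> None:
    """Raise a readable error if ov_device is not among the available devices."""
    if ov_device == "AUTO":
        return  # AUTO is exempt: OpenVINO picks the target itself
    # Assumption: compare base names so "GPU" matches "GPU.0" and vice versa.
    # This does not verify that a specific index such as "GPU.1" exists.
    base = ov_device.split(".", 1)[0]
    seen = {name.split(".", 1)[0] for name in available}
    if base not in seen:
        raise ValueError(
            f"OpenVINO device {ov_device!r} is not available. "
            f"OpenVINO sees: {list(available)}"
        )
```

Because the check takes the device list as an argument, it can be unit-tested without any OpenVINO hardware or package present.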

…ssing Wrap the openvino import in OpenVINOSession.compile() so an absent package raises a clear install hint (pip install winml-cli[openvino]) instead of a bare ModuleNotFoundError. Add a unit test that simulates the missing module.
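
The wrapped import might look something like this. `require_module` is a hypothetical helper and the message wording is an assumption; only the install hint itself comes from the commit message.

```python
import importlib
from types import ModuleType


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import a module lazily, turning a missing package into an install hint."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        raise ImportError(
            f"The '{name}' package is required for this runtime. "
            f"Install it with: {install_hint}"
        ) from exc


# Hypothetical call site inside compile():
# ov = require_module("openvino", "pip install winml-cli[openvino]")
```

Importing inside `compile()` rather than at module top level keeps the default ORT path usable when the optional dependency is not installed.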

DingmaomaoBJTU

Overall the PR is well-structured: the adapter pattern cleanly separates OpenVINO from the ORT pipeline, error handling is thoughtful (lazy import, device pre-check, file existence), and test coverage is solid. Three findings below.

DingmaomaoBJTU · 2026-06-25T02:27:43Z

+                f"(not a HuggingFace model ID), got: {hf_model}"
+            )
+        if module_class:
+            raise click.UsageError("--runtime openvino does not support --module benchmarking.")


Silent discard of --ep / --ep-options when --runtime openvino — no user feedback

When --runtime openvino is combined with --ep cuda (or any EP), the value is forwarded into BenchmarkConfig but silently ignored because _load_model() returns before _resolve_device_ep() runs. A pattern already used in this file (--shape-config warning in --module mode) would work here — emit a yellow console warning so users know the flag had no effect.
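
One way to surface the ignored flags, sketched with the standard library only. `ignored_flag_warnings` is a hypothetical helper; it assumes an unset flag arrives as `None`, and the caller would print each message through whatever console styling the file already uses for the --shape-config warning.

```python
from typing import List, Optional


def ignored_flag_warnings(
    runtime: str, ep: Optional[str], ep_options: Optional[str]
) -> List[str]:
    """Return one warning per EP flag that has no effect under OpenVINO."""
    if runtime != "openvino":
        return []
    passed = {"--ep": ep, "--ep-options": ep_options}
    return [
        f"Warning: {flag} {value!r} has no effect with --runtime openvino."
        for flag, value in passed.items()
        if value is not None
    ]
```

Returning the messages instead of printing them keeps the check easy to unit-test alongside the other CLI guard tests.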

DingmaomaoBJTU · 2026-06-25T02:27:43Z

+        return self._io_config
+
+    @property
+    def device(self) -> str:


device property returns input string, not resolved OpenVINO device

After AUTO compilation self._ov_device holds the concrete target (e.g. 'GPU.0'), but this property still returns self._device (e.g. 'auto'). Since the perf engine reads model.device for report labelling, an AUTO-mode run will appear as device='auto' in JSON output rather than the true hardware. Consider return self._ov_device or self._device once compiled.
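
The suggested fallback can be illustrated with a stripped-down stand-in class. `SessionSketch` is hypothetical and its `compile()` simply takes the resolved name as an argument; only the `_device` / `_ov_device` attribute names come from the review comment.

```python
from typing import Optional


class SessionSketch:
    """Stand-in for the session, showing only the device property."""

    def __init__(self, device: str) -> None:
        self._device = device                    # what the user asked for, e.g. "auto"
        self._ov_device: Optional[str] = None    # set once compilation resolves a target

    def compile(self, resolved_device: str) -> None:
        # In the real session the resolved name would come from the backend.
        self._ov_device = resolved_device

    @property
    def device(self) -> str:
        # Prefer the resolved target; fall back to the requested value
        # while the session is still uncompiled.
        return self._ov_device or self._device
```

With this shape, a report generated after compilation would be labelled with the concrete device, while an uncompiled session still reports the requested value.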

DingmaomaoBJTU · 2026-06-25T02:27:43Z

+        # output-name normalization mismatches (the order of model.outputs
+        # matches the ONNX graph output order get_io_config reads).
+        out_names = self.io_config["output_names"]
+        return {name: np.asarray(result[i]) for i, name in enumerate(out_names)}


Output index mapping assumes io_config and compiled output count agree

If the two disagree (e.g. optional outputs interpreted differently), the dict comprehension silently truncates or raises an IndexError with no context. A defensive length check before the comprehension would make failures easier to diagnose.
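
A defensive version of the mapping could look like the sketch below. `map_outputs` is a hypothetical free function, the error wording is an assumption, and the `np.asarray` conversion from the diff is omitted so the example runs without NumPy.

```python
from typing import Any, Dict, Sequence


def map_outputs(out_names: Sequence[str], result: Sequence[Any]) -> Dict[str, Any]:
    """Pair output names with results by index, failing loudly on a count mismatch."""
    if len(result) != len(out_names):
        raise RuntimeError(
            f"Output count mismatch: io_config lists {len(out_names)} outputs "
            f"{list(out_names)}, but the compiled model returned {len(result)}."
        )
    return {name: result[i] for i, name in enumerate(out_names)}
```

Without the check, extra results are silently dropped and missing results surface as a bare `IndexError`; with it, both cases report the two counts.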

xieofxie · 2026-06-25T08:33:09Z

Pros

Lets us check our top 200 models internally on ORT vs. native
- Should the cert team do this?
In ISV cases, they want to know the difference between ORT and native

Cons

What if a user finds that native is better than ORT?
Are we measuring correctly?
It will lead to many different implementations for IHVs

xieofxie added 2 commits June 24, 2026 16:04

add ov runtime

b476788

xieofxie requested a review from a team as a code owner June 24, 2026 08:25

DingmaomaoBJTU reviewed Jun 25, 2026

xieofxie wants to merge 3 commits into main from hualxie/run_ov
