feat(perf): add --runtime [ort|openvino] to compare ORT vs OpenVINO#960
xieofxie wants to merge 3 commits into
- Add `RuntimeName` Literal + `RUNTIME_NAMES` to constants (mirrors `CompilerName`), thread it through `BenchmarkConfig` and the perf CLI instead of bare `str`.
- Fail fast in `OpenVINOSession.compile()` when the requested device is absent from `Core().available_devices`, with a readable message instead of a raw backend stack trace. `AUTO` is exempt; matches plain (`GPU`) and indexed (`GPU.0`) device names.
- Add a hardware-independent unit test for the unavailable-device path.
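The fail-fast device check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `is_device_available` and `require_device` are hypothetical names, the list of devices is passed in rather than read from `Core().available_devices`, and the error wording only approximates the message quoted in the PR description. How the PR treats an indexed request against a plain entry is not shown, so that case is left out.

```python
from typing import List


def is_device_available(requested: str, available: List[str]) -> bool:
    """Return True if `requested` can be satisfied by one of `available`."""
    req = requested.upper()
    if req == "AUTO":
        # AUTO is exempt: the runtime picks a device itself.
        return True
    for dev in (d.upper() for d in available):
        if dev == req:
            return True
        # A plain request ("GPU") also matches an indexed entry ("GPU.0").
        if "." not in req and dev.split(".", 1)[0] == req:
            return True
    return False


def require_device(requested: str, available: List[str]) -> None:
    """Raise a readable error before compilation if the device is absent."""
    if not is_device_available(requested, available):
        raise RuntimeError(
            f"OpenVINO device {requested.upper()!r} is not available. "
            f"OpenVINO sees: {available}."
        )
```

Taking the device list as a parameter keeps the check testable on machines without the hardware, which is in the spirit of the hardware-independent unit test mentioned above.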
…ssing Wrap the openvino import in OpenVINOSession.compile() so an absent package raises a clear install hint (pip install winml-cli[openvino]) instead of a bare ModuleNotFoundError. Add a unit test that simulates the missing module.
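A wrapped optional import of this kind might look like the sketch below. `import_optional` is a hypothetical helper, and the exception type and message wording are assumptions; only the install hint `pip install winml-cli[openvino]` comes from the commit message.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import `module_name`, turning a missing package into an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here.
        raise ImportError(
            f"The '{module_name}' package is required but not installed. "
            f"Install it with: {install_hint}"
        ) from exc
```

Inside `compile()` this would be called as `import_optional("openvino", "pip install winml-cli[openvino]")`, so importing the session module itself never requires OpenVINO.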
DingmaomaoBJTU left a comment
Overall the PR is well-structured: the adapter pattern cleanly separates OpenVINO from the ORT pipeline, error handling is thoughtful (lazy import, device pre-check, file existence), and test coverage is solid. Three findings below.
| f"(not a HuggingFace model ID), got: {hf_model}" | ||
| ) | ||
| if module_class: | ||
| raise click.UsageError("--runtime openvino does not support --module benchmarking.") |
Silent discard of --ep / --ep-options when --runtime openvino — no user feedback
When --runtime openvino is combined with --ep cuda (or any EP), the value is forwarded into BenchmarkConfig but silently ignored because _load_model() returns before _resolve_device_ep() runs. A pattern already used in this file (--shape-config warning in --module mode) would work here — emit a yellow console warning so users know the flag had no effect.
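One possible shape for that warning, as a sketch only: `ignored_flag_warnings` is a hypothetical helper that returns the messages so the CLI can print them in yellow with whatever console object the file already uses. The message wording is an assumption.

```python
from typing import List, Optional


def ignored_flag_warnings(
    runtime: str, ep: Optional[str], ep_options: Optional[str]
) -> List[str]:
    """Collect warnings for ORT-only flags that have no effect under OpenVINO."""
    if runtime != "openvino":
        return []
    warnings: List[str] = []
    if ep:
        warnings.append(f"--ep {ep} is ignored with --runtime openvino.")
    if ep_options:
        warnings.append("--ep-options is ignored with --runtime openvino.")
    return warnings
```

Returning the messages instead of printing them keeps the check easy to cover in the existing CLI guard tests.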
| return self._io_config | ||
|
|
||
| @property | ||
| def device(self) -> str: |
device property returns input string, not resolved OpenVINO device
After AUTO compilation `self._ov_device` holds the concrete target (e.g. `'GPU.0'`), but this property still returns `self._device` (e.g. `'auto'`). Since the perf engine reads `model.device` for report labelling, an AUTO-mode run will appear as `device='auto'` in JSON output rather than the true hardware. Consider `return self._ov_device or self._device` once compiled.
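The suggested fallback can be illustrated with a stripped-down stand-in for the session class. `SessionSketch` and the `resolved_device` argument are hypothetical; only the `_device` / `_ov_device` attribute names and the `or` fallback come from the comment above.

```python
from typing import Optional


class SessionSketch:
    """Minimal stand-in showing only the device bookkeeping."""

    def __init__(self, device: str) -> None:
        self._device = device                    # what the user asked for, e.g. "auto"
        self._ov_device: Optional[str] = None    # concrete target, set on compile

    def compile(self, resolved_device: str) -> None:
        # In the real session this value would come from the compiled model.
        self._ov_device = resolved_device

    @property
    def device(self) -> str:
        # Prefer the resolved device once known; fall back to the request.
        return self._ov_device or self._device
```

Before `compile()` the property still reports the requested name, so callers that read it early keep working.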
| # output-name normalization mismatches (the order of model.outputs | ||
| # matches the ONNX graph output order get_io_config reads). | ||
| out_names = self.io_config["output_names"] | ||
| return {name: np.asarray(result[i]) for i, name in enumerate(out_names)} |
Output index mapping assumes io_config and compiled output count agree
If the two disagree (e.g. optional outputs interpreted differently), the dict comprehension silently truncates or raises an IndexError with no context. A defensive length check before the comprehension would make failures easier to diagnose.
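A defensive check along these lines might look like the following sketch. `map_outputs` is a hypothetical helper, plain sequences stand in for the OpenVINO result object, and the `np.asarray` conversion from the original line is omitted to keep the example dependency-free.

```python
from typing import Dict, Sequence


def map_outputs(out_names: Sequence[str], result: Sequence[object]) -> Dict[str, object]:
    """Pair output names with positional results, failing loudly on a mismatch."""
    if len(out_names) != len(result):
        raise ValueError(
            f"io_config lists {len(out_names)} outputs {list(out_names)} "
            f"but the compiled model returned {len(result)}."
        )
    return {name: result[i] for i, name in enumerate(out_names)}
```

The error names both counts and the expected output names, which is the context a bare `IndexError` or a silently shorter dict would not give.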
What
Adds a `--runtime [ort|openvino]` flag to `winml perf` so the same ONNX file can be benchmarked on ONNX Runtime vs OpenVINO Runtime for a side-by-side comparison.

- `winml perf -m model.onnx --runtime openvino --device gpu`
- `winml perf -m model.onnx --runtime ort --ep cpu` (ORT-native baseline)

- `ort`: existing behavior is unchanged.
- `--runtime openvino` reads the raw ONNX directly via OpenVINO Runtime (no quantize/optimize/compile build), which is the fair, simple comparison on the same graph. ONNX input only.

How
- `OpenVINOSession` (`session/openvino/openvino_session.py`) mirrors the subset of `WinMLSession` the perf engine uses: `compile()` / `run()` / `perf()` plus `io_config` / `device` / `ep_name` / `running_model_path`. It reuses `get_io_config`, `load_onnx`, `PerfStats`, and `WinMLSession._get_precision`, so I/O metadata and reports match the ORT path. No model-specific logic.
- `_OpenVINOModel` adapter in `perf.py` exposes the `_single` surface the benchmark engine reads, so `_run_single` / `_run_benchmark*` / reporting are untouched.
- `PerfBenchmark._load_model()` branches to it and skips `WinMLAutoModel` + ORT EP resolution entirely (OpenVINO is independent of ORT's EPs).
- `--device` maps `cpu`/`gpu`/`npu`/`auto` → OpenVINO `CPU`/`GPU`/`NPU`/`AUTO`. `compile()` fails fast against `Core().available_devices` with a readable message instead of a raw backend stack trace.
- `RuntimeName` Literal + `RUNTIME_NAMES` in `constants.py` (mirrors `CompilerName`): the CLI choice list and the typed config field derive from one source.
- `--runtime openvino` requires a `.onnx` input and rejects `--module`.

Verified locally
- `--runtime openvino` runs on CPU and GPU end-to-end; latency/throughput populated.
- `--monitor` works on CPU and GPU (HW utilization via PDH; falls back to `NullEPMonitor` like most EPs; no OV-specific `ep_proof` telemetry yet).
- Unavailable-device message: `OpenVINO device 'NPU' ... is not available. OpenVINO sees: ['CPU', 'GPU'].`
- Unit tests (`tests/unit/session/test_openvino_session.py`, gated on `importorskip("openvino")`) + CLI guard tests; all existing perf tests pass; ruff clean.

Notes / follow-ups
- `--ep` and quant/optimize flags are intentional no-ops under `--runtime openvino` (raw ONNX), documented in the flag help.
- `--runtime ort --device cpu` already routes ORT→OpenVINO EP; use `--runtime ort --ep cpu` for a true ORT-native baseline.
- Follow-up: `EXECUTION_DEVICES` in the report so AUTO-mode fallbacks are visible.

Closes #948
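For readers unfamiliar with the single-source pattern mentioned under How, here is a small sketch of how a `Literal` type and a choice tuple can share one definition, together with the `--device` mapping. Deriving `RUNTIME_NAMES` via `typing.get_args` is an assumption about the approach, and `OV_DEVICE_MAP` / `to_openvino_device` are hypothetical names; the PR's `constants.py` may be written differently.

```python
from typing import Dict, Literal, Tuple, get_args

# One definition drives both the static type and the runtime choice list.
RuntimeName = Literal["ort", "openvino"]
RUNTIME_NAMES: Tuple[str, ...] = get_args(RuntimeName)

# CLI device names mapped to OpenVINO device names, as listed in the description.
OV_DEVICE_MAP: Dict[str, str] = {
    "cpu": "CPU",
    "gpu": "GPU",
    "npu": "NPU",
    "auto": "AUTO",
}


def to_openvino_device(device: str) -> str:
    """Translate a CLI --device value into an OpenVINO device name."""
    try:
        return OV_DEVICE_MAP[device.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported device {device!r}; expected one of {sorted(OV_DEVICE_MAP)}."
        ) from None
```

With this layout, adding a runtime means editing only the `Literal`, and both the type checker and the CLI choice list pick up the change.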
🤖 Generated with Claude Code