perf: investigate prewarmed latency variance and backend-dominant generation cost #26

This follow-up should isolate the remaining prewarmed latency variance and the backend-dominant generation cost, and attribute each to the correct subsystem.

Source report: #22

## Problem
Prewarmed latency is still noisy enough to make regressions hard to interpret, and benchmark evidence still shows backend generation dominating wall time.

## Evidence
- April 18, 2026 validation report: #22
- `clone_regression.json` classified the slowdown source as `overlay_refactor`
- `current_runtime_current_helper_total_s = 30.988`
- `current_runtime_old_helper_total_s = 26.3212`
- `perf_results.json` identified `total_backend`, `generation`, and `collect_generation` as the top bottlenecks
- `pro_custom_prewarmed_samples.json` recorded wall-time CV `0.6238`
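
For context on the numbers above, here is a minimal sketch of the arithmetic behind them. It assumes CV means standard deviation divided by mean, and it uses the sample standard deviation; the harness may compute it differently. The function names are hypothetical and are not part of `scripts/harness_lib`.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return statistics.stdev(samples) / mean


def relative_slowdown(current_s, baseline_s):
    """Fractional increase of current_s over baseline_s."""
    return (current_s - baseline_s) / baseline_s


# Totals copied from the clone_regression.json evidence above.
current_helper_total_s = 30.988
old_helper_total_s = 26.3212

delta_s = current_helper_total_s - old_helper_total_s
slowdown = relative_slowdown(current_helper_total_s, old_helper_total_s)
print(f"helper delta: {delta_s:.4f} s ({slowdown:.1%})")
# prints "helper delta: 4.6668 s (17.7%)"
```

Under that definition, a wall-time CV of `0.6238` means the spread of the prewarmed samples is about 62% of their mean, which is larger than the roughly 17.7% helper delta this issue is trying to interpret.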

## Current ownership
- `scripts/harness_lib/bench_runner.py`
- `Sources/QwenVoiceNativeRuntime/NativeMLXMacEngine.swift`
- `Sources/QwenVoiceNative/XPCNativeEngineClient.swift`

## Acceptance
This issue can be closed when the team shows either a stable, reproducible explanation for the remaining variance or a measurable improvement in the smallest relevant benchmark category touched by the fix.

## Focused rerun
- `python3 scripts/harness.py validate`
- `python3 scripts/harness.py bench --category latency --runs 3 --output-dir <dir>`
- `python3 scripts/harness.py bench --category clone_regression --runs 3 --output-dir <dir>` only if helper-overlay work is touched
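
The rerun sequence above could be scripted roughly as follows. This is a sketch only: `focused_rerun_commands` and the `bench-out` directory are hypothetical, while the subcommands and flags are copied from the list above.

```python
import shlex


def focused_rerun_commands(output_dir, helper_overlay_touched=False, runs=3):
    """Build the argv lists for the focused rerun, in the order listed above."""
    harness = ["python3", "scripts/harness.py"]
    bench_tail = ["--runs", str(runs), "--output-dir", output_dir]
    commands = [
        harness + ["validate"],
        harness + ["bench", "--category", "latency"] + bench_tail,
    ]
    # The clone_regression category is only needed when helper-overlay work is touched.
    if helper_overlay_touched:
        commands.append(harness + ["bench", "--category", "clone_regression"] + bench_tail)
    return commands


# "bench-out" is a placeholder for <dir>.
for argv in focused_rerun_commands("bench-out", helper_overlay_touched=True):
    print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` from the repository root so that a failing step stops the sequence.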

