Performance follow-up should isolate the remaining latency variance and backend-dominant cost without blaming the wrong subsystem.
Source report: #22
Problem
Prewarmed latency remains noisy enough to obscure regression interpretation, while benchmark evidence still shows backend generation dominating wall time.
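As a rough illustration of how noise of this kind is usually quantified, the sketch below computes a coefficient of variation (CV) over a list of wall-time samples. The sample values are hypothetical, and using the sample standard deviation is an assumption; the harness may compute its CV differently.

```python
import statistics


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean wall time is zero; CV is undefined")
    return statistics.stdev(samples) / mean


# Hypothetical prewarmed wall times in seconds (not taken from the report).
samples = [2.0, 4.0, 6.0]
cv = coefficient_of_variation(samples)
print(f"wall-time CV = {cv:.4f}")
```

A CV near or above 0.5 means the run-to-run spread is comparable to the mean itself, which is why a modest regression is hard to read out of a handful of runs.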
Evidence
- clone_regression.json classified the slowdown source as overlay_refactor
- current_runtime_current_helper_total_s = 30.988
- current_runtime_old_helper_total_s = 26.3212
- perf_results.json identified total_backend, generation, and collect_generation as the top bottlenecks
- pro_custom_prewarmed_samples.json recorded wall-time CV 0.6238
Current ownership
- scripts/harness_lib/bench_runner.py
- Sources/QwenVoiceNativeRuntime/NativeMLXMacEngine.swift
- Sources/QwenVoiceNative/XPCNativeEngineClient.swift
Acceptance
The team can either show a stable reproduced explanation for the remaining variance or a measurable improvement in the smallest relevant benchmark category touched by the fix.
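One way to judge whether a change counts as measurable is to compare its relative size against the recorded noise. The sketch below does this with the two helper totals and the wall-time CV quoted in this report; the function name and the comparison rule are illustrative assumptions, not part of the harness.

```python
# Values quoted in the Evidence section of this report.
CURRENT_HELPER_TOTAL_S = 30.988
OLD_HELPER_TOTAL_S = 26.3212
PREWARMED_WALL_TIME_CV = 0.6238


def relative_slowdown(current_s, baseline_s):
    """Fractional change of current_s relative to baseline_s."""
    return (current_s - baseline_s) / baseline_s


slowdown = relative_slowdown(CURRENT_HELPER_TOTAL_S, OLD_HELPER_TOTAL_S)
print(f"helper slowdown: {slowdown:.1%}")

# Illustrative rule of thumb (an assumption): a delta smaller than the
# run-to-run CV cannot be trusted from a single pair of runs.
within_noise = abs(slowdown) < PREWARMED_WALL_TIME_CV
print(f"within recorded noise: {within_noise}")
```

Under these numbers the helper slowdown is smaller than the recorded CV, so either the variance has to be explained and reduced or the improvement has to be shown over enough runs to stand clear of it.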
Focused rerun
- python3 scripts/harness.py validate
- python3 scripts/harness.py bench --category latency --runs 3 --output-dir <dir>
- python3 scripts/harness.py bench --category clone_regression --runs 3 --output-dir <dir> (only if helper-overlay work is touched)