Validate GoldenMatch native kernels on ER + reference entry by benzsevern · Pull Request #8 · benseverndev-oss/dqbench

Ben Severn (benzsevern) · 2026-05-25T20:25:57Z

Summary

Validated GoldenMatch's native Rust acceleration kernels (goldenmatch._native) against the DQBench ER composite, and recorded the result on the reference (ungated) board.

Ran the ER benchmark twice in one environment on the same deterministic datasets, once in pure-Python mode (GOLDENMATCH_NATIVE=0) and once with native kernels (GOLDENMATCH_NATIVE=1):

| Mode | Composite | T1 | T2 | T3 | T4 (diag, wt 0) |
|---|---|---|---|---|---|
| Python (`=0`) | 92.03 | 0.8929 | 0.9836 | 0.8707 | 0.9195 |
| Native (`=1`) | 92.03 | 0.8929 | 0.9836 | 0.8707 | 0.9195 |
| delta | 0.00 | 0 | 0 | 0 | 0 |

≥ 91.04: PASS (92.03, higher than the v1.12 ship number because this run uses goldenmatch 1.19.0).
Exact equality: PASS. Results are bit-identical at every tier, including precision/recall, TP/FP/FN, and B³; only wall-time and memory differ.
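
For reviewers who want to re-check the two gate criteria by hand, here is a minimal sketch of the comparison. The scores are copied from the table above; `check_parity` and the dict layout are hypothetical and are not part of dqbench or goldenmatch. It deliberately uses `==` rather than a tolerance, since the claim is exact equality.

```python
# Per-tier scores copied from the results table in this PR.
PYTHON_RUN = {"composite": 92.03, "T1": 0.8929, "T2": 0.9836, "T3": 0.8707, "T4": 0.9195}
NATIVE_RUN = {"composite": 92.03, "T1": 0.8929, "T2": 0.9836, "T3": 0.8707, "T4": 0.9195}

COMPOSITE_FLOOR = 91.04  # the ">= 91.04" criterion stated in this PR


def check_parity(baseline: dict, candidate: dict, floor: float) -> dict:
    """Hypothetical helper: per-metric deltas plus the two pass/fail flags."""
    if baseline.keys() != candidate.keys():
        raise ValueError("runs report different metrics")
    deltas = {name: candidate[name] - baseline[name] for name in baseline}
    return {
        "deltas": deltas,
        "meets_floor": candidate["composite"] >= floor,
        # Exact comparison on purpose: the claim is bit-identical, not "close".
        "exact_equal": all(candidate[name] == baseline[name] for name in baseline),
    }


report = check_parity(PYTHON_RUN, NATIVE_RUN, COMPOSITE_FLOOR)
print(report["meets_floor"], report["exact_equal"])
```

The same check could be pointed at the per-tier evidence JSONs, assuming they expose these metrics under comparable keys.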

Coverage caveat (verified)

On DQBench's small tiers the zero-config controller selects the polars-direct backend. The native clustering kernels therefore fire heavily (connected_components, cluster_confidence, severe_bridge_count; thousands of calls per tier), but block scoring (score_block_pairs) never runs: it recorded 0 calls because the bucket backend is never selected. Block-scoring parity still rests on unit tests and needs a separate bucket-backend workload.
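
One way to surface this kind of coverage gap is to count calls per kernel during a run. The sketch below is illustrative only: the two functions are toy stand-ins that borrow kernel names from this PR, the real `goldenmatch._native` signatures are not shown here, and `counted` and `run_polars_direct_like` are hypothetical.

```python
from collections import Counter
from functools import wraps

call_counts: Counter = Counter()


def counted(fn):
    """Wrap fn so each call increments call_counts[fn.__name__]."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@counted
def connected_components(edges):
    """Toy union-find stand-in: group node ids that are linked by edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


@counted
def score_block_pairs(block):
    """Toy stand-in for block scoring; never reached on the path below."""
    return []


def run_polars_direct_like(batches):
    # Mimics a backend that clusters every batch but never scores blocks.
    return [connected_components(edges) for edges in batches]


clusters = run_polars_direct_like([[(1, 2), (2, 3)], [(4, 5)]])
tracked = ("connected_components", "score_block_pairs")
uncovered = [name for name in tracked if call_counts[name] == 0]
print(dict(call_counts), uncovered)
```

A kernel that lands in `uncovered` was never exercised by the workload, which is the situation described for score_block_pairs.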

Changes

leaderboard/reference/er.json + leaderboard/submissions/er-goldenmatch-zeroconfig-native.json (gated:false): new reference ER entry "GoldenMatch (zero-config, native kernels)" @ 92.03. The entry is ungated because the native extension must be built from the goldenmatch repo (Rust 1.94.1), which dqbench's pip-only CI gate can't reproduce. The CI changed job skips gated:false manifests, so this does not break the verify gate.
docs/validation/2026-05-25-goldenmatch-native-er.md + per-tier evidence JSONs.
dqbench/submission.py: generalized the reference-board warning to cover both reasons an entry is ungated (non-determinism or an unbuildable native extension), since this entry is provably bit-exact, not non-deterministic.
CHANGELOG.md: Unreleased note.
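
As a rough illustration of the two behaviours described above (skipping gated:false manifests and warning for either ungated reason), here is a hedged sketch. Only the `gated` field and the native manifest's file name come from this PR. The `ungated_reason` key, the second manifest, the warning wording, the default of treating a missing `gated` as true, and both function names are assumptions, not the actual code in dqbench/submission.py.

```python
import json

# Hypothetical manifest contents keyed by file name.
MANIFESTS = {
    "er-goldenmatch-zeroconfig-native.json": '{"gated": false, "ungated_reason": "native-extension"}',
    "er-example-gated.json": '{"gated": true}',
}

# Assumed wording; the real warning text is not shown in this PR.
REASON_TEXT = {
    "non-deterministic": "results are not reproducible run to run",
    "native-extension": "requires a native extension the pip-only gate cannot build",
}


def manifests_to_verify(manifests: dict) -> list:
    """Names a verify step would re-run; gated:false manifests are skipped."""
    return sorted(
        name for name, raw in manifests.items()
        if json.loads(raw).get("gated", True)  # assumption: missing means gated
    )


def reference_warning(manifest: dict):
    """Return a warning string for an ungated entry, or None for a gated one."""
    if manifest.get("gated", True):
        return None
    reason = REASON_TEXT.get(manifest.get("ungated_reason"), "reason not recorded")
    return f"Reference (ungated) entry: {reason}."


to_verify = manifests_to_verify(MANIFESTS)
native = json.loads(MANIFESTS["er-goldenmatch-zeroconfig-native.json"])
print(to_verify)
print(reference_warning(native))
```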

Test plan

dqbench publish --check — store valid + board in sync
pytest — 251 passed
ruff check . — clean
CI Leaderboard workflow green (validate passes; verify matrix skips the ungated manifest)

https://claude.ai/code/session_01R3fgyVcYdMBX3sh94NwGH6

Generated by Claude Code

Validated goldenmatch._native Rust kernels against the DQBench ER composite: native (GOLDENMATCH_NATIVE=1) and pure-Python (=0) are bit-exact (composite 92.03 both ways, identical per-tier F1/P/R/B3), clearing the >=91.04 and exact-equality gate criteria. Recorded as a reference (ungated) ER entry since the native extension must be built from the goldenmatch repo and can't pass the pip-only CI gate. On DQBench's small tiers the controller picks polars-direct, so the clustering kernel is exercised but block-scoring is not. Generalize the reference-board warning to cover non-buildable (native) entries, not just non-deterministic ones. https://claude.ai/code/session_01R3fgyVcYdMBX3sh94NwGH6

Ben Severn (benzsevern) wants to merge 1 commit into main from claude/goldenmatch-dqbench-native-validation-aZGfH
