
Validate GoldenMatch native kernels on ER + reference entry #8

Draft
Ben Severn (benzsevern) wants to merge 1 commit into main from claude/goldenmatch-dqbench-native-validation-aZGfH

Conversation

@benzsevern
Collaborator

Summary

Validated GoldenMatch's native Rust acceleration kernels (goldenmatch._native) against the DQBench ER composite, and recorded the result on the reference (ungated) board.

Ran the ER benchmark twice in one env on the same deterministic datasets — pure-Python (GOLDENMATCH_NATIVE=0) vs native kernels (GOLDENMATCH_NATIVE=1):

| Mode | Composite | T1 | T2 | T3 | T4 (diag, wt 0) |
|---|---|---|---|---|---|
| Python (=0) | 92.03 | 0.8929 | 0.9836 | 0.8707 | 0.9195 |
| Native (=1) | 92.03 | 0.8929 | 0.9836 | 0.8707 | 0.9195 |
| delta | 0.00 | 0 | 0 | 0 | 0 |

  • Composite ≥ 91.04: PASS (92.03; higher than the v1.12 ship number because this run uses goldenmatch 1.19.0).
  • Exact equality: PASS. Scores are bit-identical at every tier, including precision/recall, TP/FP/FN, and B³; only wall time and memory differ.
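
For reviewers who want to re-check the two criteria by hand, here is a minimal sketch. It is not dqbench's gate code: the `check_parity` helper and the dict layout are hypothetical, and the scores are copied from the table in this PR.

```python
# Hypothetical helper for illustration; dqbench's real gate logic may differ.
MIN_COMPOSITE = 91.04  # threshold quoted in this PR


def check_parity(python_run: dict, native_run: dict,
                 min_composite: float = MIN_COMPOSITE) -> dict:
    """Evaluate the two criteria: composite threshold and exact equality."""
    return {
        "threshold": native_run["composite"] >= min_composite,
        # Deliberately an exact comparison, not a tolerance check:
        # the claim under test is that the scores are identical.
        "exact_equality": python_run == native_run,
    }


# Scores as reported in the table (assumed dict layout).
python_run = {"composite": 92.03, "T1": 0.8929, "T2": 0.9836,
              "T3": 0.8707, "T4": 0.9195}
native_run = dict(python_run)  # the native run reported the same values

result = check_parity(python_run, native_run)
print(result)
```

An exact `==` is used on purpose: a tolerance-based comparison would hide the small drift that a bit-exactness claim is meant to rule out.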

Coverage caveat (verified)

On DQBench's small tiers the zero-config controller selects the polars-direct backend. The native clustering kernels therefore fire heavily (connected_components, cluster_confidence, severe_bridge_count; thousands of calls per tier), but block scoring (score_block_pairs) never runs: it recorded 0 calls because the bucket backend is never selected. Block-scoring parity still rests on unit tests and needs a separate workload that exercises the bucket backend.
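
One way to obtain per-kernel call counts like these is to wrap each kernel in a counting decorator. The sketch below only illustrates that idea and is not how this PR collected its numbers: the kernel bodies are stand-ins that reuse the names above, and `counted` and `call_counts` are hypothetical.

```python
from collections import Counter
from functools import wraps

call_counts = Counter()


def counted(fn):
    """Wrap fn so every invocation increments call_counts[fn.__name__]."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


# Stand-in kernels: the names match this PR, the bodies are placeholders.
@counted
def connected_components(edges):
    return edges


@counted
def score_block_pairs(block):
    return block


# Simulate a workload that only reaches the clustering path.
for _ in range(3):
    connected_components([(0, 1)])

print(call_counts["connected_components"])  # 3
print(call_counts["score_block_pairs"])     # 0, never called
```

A count of zero is the signal that matters here: it shows a kernel was never reached, so an identical score says nothing about that kernel's parity.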

Changes

  • leaderboard/reference/er.json + leaderboard/submissions/er-goldenmatch-zeroconfig-native.json (gated:false): new reference ER entry "GoldenMatch (zero-config, native kernels)" @ 92.03. Ungated because the native ext must be built from the goldenmatch repo (Rust 1.94.1) — dqbench's pip-only CI gate can't reproduce it. The CI changed job skips gated:false manifests, so this does not break the verify gate.
  • docs/validation/2026-05-25-goldenmatch-native-er.md + per-tier evidence JSONs.
  • dqbench/submission.py: generalized the reference-board warning to cover both reasons an entry is ungated (non-determinism or an unbuildable native extension), since this entry is provably bit-exact, not non-deterministic.
  • CHANGELOG.md: Unreleased note.
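
The skip behaviour for `gated:false` manifests can be pictured with the sketch below. It is an assumption-laden illustration rather than the actual CI code: only the `gated` key comes from this PR, while the `gated_manifests` helper, the directory layout, and the default for a missing key are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def gated_manifests(submissions_dir: Path) -> list[Path]:
    """Return the manifests a verify job would re-run, skipping gated:false."""
    selected = []
    for path in sorted(submissions_dir.glob("*.json")):
        manifest = json.loads(path.read_text())
        # Assumption: a manifest without a "gated" key counts as gated.
        if manifest.get("gated", True):
            selected.append(path)
    return selected


# Demo with two throwaway manifests; "er-example-gated.json" is made up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "er-example-gated.json").write_text(json.dumps({"gated": True}))
    (root / "er-goldenmatch-zeroconfig-native.json").write_text(
        json.dumps({"gated": False})
    )
    names = [p.name for p in gated_manifests(root)]

print(names)  # ['er-example-gated.json']
```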

Test plan

  • dqbench publish --check — store valid + board in sync
  • pytest — 251 passed
  • ruff check . — clean
  • CI Leaderboard workflow green (validate passes; verify matrix skips the ungated manifest)

https://claude.ai/code/session_01R3fgyVcYdMBX3sh94NwGH6


Generated by Claude Code

Validated goldenmatch._native Rust kernels against the DQBench ER composite:
native (GOLDENMATCH_NATIVE=1) and pure-Python (=0) are bit-exact (composite
92.03 both ways, identical per-tier F1/P/R/B3), clearing the >=91.04 and
exact-equality gate criteria. Recorded as a reference (ungated) ER entry since
the native extension must be built from the goldenmatch repo and can't pass the
pip-only CI gate. On DQBench's small tiers the controller picks polars-direct,
so the clustering kernel is exercised but block-scoring is not.

Generalize the reference-board warning to cover non-buildable (native) entries,
not just non-deterministic ones.

https://claude.ai/code/session_01R3fgyVcYdMBX3sh94NwGH6