Validate GoldenMatch native kernels on ER + reference entry #8
Draft
Ben Severn (benzsevern) wants to merge 1 commit into
Validated goldenmatch._native Rust kernels against the DQBench ER composite: native (GOLDENMATCH_NATIVE=1) and pure-Python (=0) are bit-exact (composite 92.03 both ways, identical per-tier F1/P/R/B3), clearing the >=91.04 and exact-equality gate criteria. Recorded as a reference (ungated) ER entry since the native extension must be built from the goldenmatch repo and can't pass the pip-only CI gate. On DQBench's small tiers the controller picks polars-direct, so the clustering kernel is exercised but block-scoring is not. Generalized the reference-board warning to cover non-buildable (native) entries, not just non-deterministic ones. https://claude.ai/code/session_01R3fgyVcYdMBX3sh94NwGH6
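The gate criteria above reduce to two checks: the native composite must clear 91.04, and every reported number must be identical between the two runs. A minimal sketch of that comparison, assuming each run's results have already been loaded into a plain dict holding the composite and per-tier metrics; the dict layout, the tier names, and check_native_parity itself are hypothetical and not part of dqbench's API.

```python
from typing import Any

# Minimum composite score from the gate criteria stated in this PR.
GATE_THRESHOLD = 91.04


def check_native_parity(
    pure: dict[str, Any],
    native: dict[str, Any],
    threshold: float = GATE_THRESHOLD,
) -> list[str]:
    """Compare a pure-Python run with a native-kernel run.

    Returns a list of failure messages; an empty list means both gate
    criteria hold. Hypothetical result layout for each run:
        {"composite": float,
         "tiers": {tier_name: {"f1": ..., "precision": ..., "recall": ..., "b3": ...}}}
    """
    failures: list[str] = []

    # Criterion 1: the native composite must clear the threshold.
    if native["composite"] < threshold:
        failures.append(f"native composite {native['composite']} < {threshold}")

    # Criterion 2: bit-exact parity, so floats are compared with ==, no tolerance.
    if pure["composite"] != native["composite"]:
        failures.append(
            f"composite differs: {pure['composite']!r} vs {native['composite']!r}"
        )

    pure_tiers, native_tiers = pure["tiers"], native["tiers"]
    # Iterate over the union of tier names so a tier missing from either run is reported too.
    for tier in sorted(pure_tiers.keys() | native_tiers.keys()):
        a, b = pure_tiers.get(tier), native_tiers.get(tier)
        if a != b:
            failures.append(f"tier {tier!r} differs: {a!r} vs {b!r}")

    return failures
```

In the setup described here, `pure` and `native` would come from the same ER benchmark run with GOLDENMATCH_NATIVE=0 and GOLDENMATCH_NATIVE=1 respectively; how dqbench actually serializes those results is not shown in this PR. Using plain equality instead of a tolerance is the point: any drift, however small, fails the bit-exact criterion.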
Summary
Validated GoldenMatch's native Rust acceleration kernels (goldenmatch._native) against the DQBench ER composite, and recorded the result on the reference (ungated) board.
Ran the ER benchmark twice in one environment on the same deterministic datasets, pure-Python (GOLDENMATCH_NATIVE=0) vs native kernels (GOLDENMATCH_NATIVE=1): the composite was 92.03 both ways, with identical per-tier F1/P/R/B3.
Coverage caveat (verified)
On DQBench's small tiers the zero-config controller selects the polars-direct backend, so the native clustering kernel fires heavily (connected_components, cluster_confidence and severe_bridge_count, thousands of calls per tier) but block scoring (score_block_pairs) never runs (0 calls, because the bucket backend is never selected). Block-scoring parity therefore still rests on unit tests and needs a separate bucket-backend workload.
Changes
- leaderboard/reference/er.json + leaderboard/submissions/er-goldenmatch-zeroconfig-native.json (gated:false): new reference ER entry "GoldenMatch (zero-config, native kernels)" @ 92.03. Ungated because the native extension must be built from the goldenmatch repo (Rust 1.94.1), which dqbench's pip-only CI gate can't reproduce. The CI "changed" job skips gated:false manifests, so this entry does not break the verify gate.
- docs/validation/2026-05-25-goldenmatch-native-er.md + per-tier evidence JSONs.
- dqbench/submission.py: generalized the reference-board warning to cover both reasons an entry can be ungated (non-determinism or an unbuildable native extension), since this entry is provably bit-exact, not non-deterministic.
- CHANGELOG.md: Unreleased note.
Test plan
- dqbench publish --check (store valid, board in sync)
- pytest (251 passed)
- ruff check . (clean)
- Leaderboard workflow green (validate passes; verify matrix skips the ungated manifest)
https://claude.ai/code/session_01R3fgyVcYdMBX3sh94NwGH6
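The last test-plan item relies on the verify matrix skipping gated:false manifests. A minimal sketch of that selection step, assuming each file under leaderboard/submissions/ is a JSON object whose optional "gated" key is a boolean; treating a missing key as gated is an assumption, and load_manifests / split_by_gate are hypothetical helpers, not the actual workflow code.

```python
import json
from pathlib import Path
from typing import Any


def load_manifests(directory: Path) -> dict[str, dict[str, Any]]:
    """Load every *.json manifest in `directory`, keyed by file name."""
    return {
        p.name: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(directory.glob("*.json"))
    }


def split_by_gate(manifests: dict[str, dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (gated, ungated) manifest names, preserving input order.

    Only an explicit "gated": false marks an entry as ungated; a missing
    key is treated as gated (an assumption made for this sketch).
    """
    gated: list[str] = []
    ungated: list[str] = []
    for name, manifest in manifests.items():
        if manifest.get("gated", True) is False:
            ungated.append(name)
        else:
            gated.append(name)
    return gated, ungated
```

Under these assumptions, er-goldenmatch-zeroconfig-native.json lands in the ungated list, and a verify matrix built only from the gated list never tries to rebuild the native extension. Checking `is False` rather than truthiness means only an explicit false opts an entry out of the gate.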
Generated by Claude Code