fix(ci): rule-3 accepts competitors/*.md as competitive-benchmark doc surface #1052
Merged
… surface corpus-refresh + competitor tasks update both the reading in novel/competitive-benchmark/src/competitors.ts (code) AND the narrative in competitors/<name>.md (doc), but rule-3-doc-first only recognized user-stories/*.md and the package README — so every corpus refresh failed the gate in CI (it's skipped locally without a pr-body, so worker --stage=full passed). Teach rule-3 that competitors/*.md IS the doc surface for the competitive-benchmark package. Extracted packageDocError helper to keep complexity ≤10; added 3 paired tests (corpus-refresh passes, scoping holds, no-doc still fails). Unblocks #1047/#1050 and the whole corpus-refresh class.
fyodoriv added a commit that referenced this pull request on Jun 2, 2026
…-swe-agent (#1050)

* chore: refresh swe-agent SWE-bench Verified reading to mini-swe-agent 0.74 (corpus-refresh-swe-agent)

Replace the stale 2024 NeurIPS SWE-agent + GPT-4 reading (0.125, full-split proxy, asOf 2024-10-01, 604 days very-stale) with the SWE-agent project's current flagship scaffold mini-swe-agent: Gemini 3 Pro at 0.74 on the SWE-bench Verified 500-instance split, submitted 2026-02-26 to the official swebench.com "Bash Only" leaderboard (primary statement at mini-swe-agent.com). This is a true Verified-split number, so the prior full-split/Lite proxy caveat is dropped.

asOf 2026-02-26 is 96 days old -> freshness status moves very-stale -> stale (3 days outside the 90-day "fresh" bucket). Per rule #9 and the task Pivot, the most authoritative primary-source date for the project's own scaffold is used rather than a fabricated fresher date.

- competitors.ts: refreshed citation, asOf, value for the swe-agent entry
- competitors/swe-agent.md: Scorecard readings table + superseded-reading history note + Last reviewed date
- competitors/scorecard.md: updated the two swe-agent rows in the (now hand-maintained) static snapshot to match

Hypothesis self-grade:
Predicted: refreshing swe-agent to a publication <=90 days old returns "fresh".
Observed: freshest cleanly-attributable primary source is 2026-02-26 (96d) -> "stale", not "fresh"; value 0.125 -> 0.74.
Match: partial (very-stale -> stale, not fresh; the 90-day bar is missed by 3 days; no honest <=90d project-scaffold Verified source exists).
Lesson: the SWE-agent project's last dated Verified-split scaffold submission is 2026-02-26; a strict 90-day fresh bar can be unmeetable without fabricating a date, which the Pivot forbids.

* chore(ci): re-trigger rule-3 against fixed main (#1052) for corpus-refresh-swe-agent
fyodoriv added a commit that referenced this pull request on Jun 2, 2026
…-openhands (#1047)

* chore: refresh openhands corpus reading to 0.728 SWE-bench Verified (corpus-refresh-openhands)

Refresh the openhands competitor reading from the 406-day-stale 2025-04-15 65.8% (0.658) inference-time-scaling number to the current first-party 72.8% (0.728), citing the OpenHands Software Agent SDK paper (arXiv:2511.03690v2, 2026-04-22, Table 4 section 5.4 - Claude Sonnet 4.5 + extended thinking on the V1 SDK). The v2 revision (41 days old) flips the corpus-freshness bucket from "very-stale" to "fresh".

Hypothesis self-grade:
Predicted: check-corpus-freshness returns "fresh" (<=90d) for openhands.
Observed: status="fresh", ageDays=41 (asOf 2026-04-22).
Match: yes
Lesson: the vendor's newest exact-number publication (SDK paper v2) supersedes the Apr-2025 reading; refreshing to the real higher number is honest, not masking - Pivot's "stale-by-vendor" clause does not apply when the vendor actively publishes.

* chore(ci): re-trigger rule-3 against fixed main (#1052) for corpus-refresh-openhands
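The freshness arithmetic quoted in the two commit messages above can be reproduced with a short sketch. This is not the repository's check-corpus-freshness script: the function names, the return shape, and the 365-day "very-stale" cutoff are assumptions made for illustration. Only the 90-day "fresh" bar and the 96-day and 41-day ages come from the commit messages. The reference date is assumed to be the commit date shown in the timeline, Jun 2, 2026.

```javascript
// Hypothetical sketch of a corpus-freshness bucket check.
// FRESH_MAX_DAYS comes from the commit messages; VERY_STALE_MIN_DAYS is an
// ASSUMPTION (the source never states where "stale" becomes "very-stale").
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const FRESH_MAX_DAYS = 90;
const VERY_STALE_MIN_DAYS = 365; // assumed cutoff, not from the source

// Whole days between two ISO dates (YYYY-MM-DD). Date-only ISO strings are
// parsed as UTC, so the difference is an exact multiple of one day.
function ageDays(asOf, today) {
  return Math.floor((Date.parse(today) - Date.parse(asOf)) / MS_PER_DAY);
}

// Classify a reading by the age of its asOf date.
function freshness(asOf, today) {
  const days = ageDays(asOf, today);
  let status = "stale";
  if (days <= FRESH_MAX_DAYS) status = "fresh";
  else if (days >= VERY_STALE_MIN_DAYS) status = "very-stale";
  return { status, ageDays: days };
}

// Readings from the two commits, evaluated on the assumed reference date.
console.log(freshness("2026-02-26", "2026-06-02")); // swe-agent: 96 days, "stale"
console.log(freshness("2026-04-22", "2026-06-02")); // openhands: 41 days, "fresh"
```

Under these assumptions the swe-agent reading lands in "stale" and the openhands reading in "fresh", matching the Observed lines in both self-grades.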
Why this is needed
Every corpus-refresh / competitor task fails the rule-3-doc-first CI gate, which blocks ci (it aggregates rule-3). Root cause: those tasks update both the reading in novel/competitive-benchmark/src/competitors.ts (code) AND the narrative + provenance in competitors/<name>.md (doc), but rule-3 only recognized user-stories/*.md and the package README.md as doc surfaces, not competitors/*.md. The gate is skipped locally (no pr-body), so worker --stage=full passed while CI failed. This silently blocked #1047 (corpus-refresh-openhands) and #1050 (corpus-refresh-swe-agent).
What changed
- competitors/<name>.md now satisfies the doc clause for the novel/competitive-benchmark package (those files ARE the human-facing corpus docs).
- Scoping holds: a competitors/*.md touch does NOT excuse code changes in other packages.
- Extracted the packageDocError helper to keep checkRule3DocFirst under the complexity gate.
- Added 3 paired tests (corpus-refresh passes, scoping holds, no-doc still fails) for competitors/*.md.
Verification
- pnpm exec vitest run scripts/check-rule-3-doc-first.test.mjs → 22 passed.
- biome check --error-on-warnings on both files → exit 0.
- … competitors.ts + competitors/openhands.md) → rule-3 PASS.
Hypothesis self-grade
Vision trace
Security & privacy
No new attack surface. Pure-function change to a lint over the PR diff; reads no new files, writes nothing, binds no ports. vision.md § 13 minimum-bar reviewed.
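As a rough illustration of the pure-function shape described above, the doc-surface rule might look like the sketch below. The names packageDocError and checkRule3DocFirst appear in this PR, but their signatures and bodies here are assumptions. packageOf, isUserStory, isCompetitorDoc and BENCHMARK_PKG are hypothetical helpers. The sketch also assumes that package code lives under <package>/src/, that competitors/ and user-stories/ sit at the repository root, and that any user-stories/*.md touch satisfies the doc clause.

```javascript
// Hypothetical sketch of the rule-3 doc-first check; NOT the repository's code.
// Input is only the list of changed paths in the PR diff: nothing is read or written.
const BENCHMARK_PKG = "novel/competitive-benchmark";

// ASSUMPTION: a code file belongs to the package rooted just before "/src/".
function packageOf(file) {
  const idx = file.indexOf("/src/");
  return idx === -1 ? null : file.slice(0, idx);
}

// ASSUMPTION: both doc directories live at the repository root.
function isUserStory(file) {
  return /^user-stories\/[^/]+\.md$/.test(file);
}
function isCompetitorDoc(file) {
  return /^competitors\/[^/]+\.md$/.test(file);
}

// Returns an error string when `pkg` changed code without a recognized doc, else null.
// competitors/*.md counts only for the competitive-benchmark package (the scoping rule).
function packageDocError(pkg, changedFiles) {
  const hasDoc = changedFiles.some(
    (f) =>
      isUserStory(f) ||
      f === `${pkg}/README.md` ||
      (pkg === BENCHMARK_PKG && isCompetitorDoc(f))
  );
  return hasDoc ? null : `rule-3: ${pkg} changed code without a doc update`;
}

// Collects one error per package whose code changed without a matching doc change.
function checkRule3DocFirst(changedFiles) {
  const pkgs = new Set(changedFiles.map(packageOf).filter(Boolean));
  return [...pkgs]
    .map((pkg) => packageDocError(pkg, changedFiles))
    .filter(Boolean);
}
```

The three paired cases map onto this sketch directly: a diff touching competitors.ts plus competitors/openhands.md yields no errors, the same doc alongside code in another package still yields an error for that package, and a code-only change to the benchmark package still fails.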
🤖 Written by an agent, not Fyodor. Ping me if this looks off.