fix(ci): rule-3 accepts competitors/*.md as competitive-benchmark doc surface by fyodoriv · Pull Request #1052 · fyodoriv/minsky

fyodoriv · 2026-06-02T15:27:15Z

Why this is needed

Every corpus-refresh / competitor task fails the rule-3-doc-first CI gate, which blocks ci (it aggregates rule-3). Root cause: those tasks update both the reading in novel/competitive-benchmark/src/competitors.ts (code) AND the narrative + provenance in competitors/<name>.md (doc), but rule-3 only recognized user-stories/*.md and the package README.md as doc surfaces — not competitors/*.md. The gate is skipped locally (no pr-body), so worker --stage=full passed while CI failed. This silently blocked #1047 (corpus-refresh-openhands) and #1050 (corpus-refresh-swe-agent).

What changed

competitors/<name>.md now satisfies the doc clause for the novel/competitive-benchmark package (those files ARE the human-facing corpus docs).
Scoped to that package only — a competitors/*.md touch does NOT excuse code changes in other packages.
Extracted a packageDocError helper to keep checkRule3DocFirst under the complexity gate.
3 paired tests: corpus-refresh passes, scoping holds (budget-guard still fails), no-doc still fails with a hint that names competitors/*.md.
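
For readers who want the scoping rule in concrete form, here is a minimal sketch of the doc-clause check described above. It is not the code in the PR: the function names (`packageOf`, `isDocFor`, `packageDocErrorSketch`, `checkRule3Sketch`), the `<group>/<name>/src/` package layout, the `novel/budget-guard` path, the error wording, and the assumption that any user-stories/*.md touch counts for any package are all illustrative guesses.

```javascript
// Sketch only: names, paths, and messages are assumptions for illustration,
// not the contents of the real rule-3 checker.
const COMPETITIVE_BENCHMARK = "novel/competitive-benchmark";

// Assumed layout: package code lives under <group>/<name>/src/.
function packageOf(file) {
  const match = /^([^/]+\/[^/]+)\/src\//.exec(file);
  return match ? match[1] : null;
}

// Does this changed file satisfy the doc clause for the given package?
function isDocFor(pkg, file) {
  if (/^user-stories\/[^/]+\.md$/.test(file)) return true;
  if (file === `${pkg}/README.md`) return true;
  // competitors/*.md counts only for the competitive-benchmark package.
  return pkg === COMPETITIVE_BENCHMARK && /^competitors\/[^/]+\.md$/.test(file);
}

// Returns null when the package has a doc in the diff, else an error string.
function packageDocErrorSketch(pkg, changedFiles) {
  if (changedFiles.some((file) => isDocFor(pkg, file))) return null;
  const hint = pkg === COMPETITIVE_BENCHMARK ? ", or competitors/*.md" : "";
  return `${pkg}: code changed without a doc (user-stories/*.md, ${pkg}/README.md${hint})`;
}

// Collects one error per code-touching package that lacks a doc.
function checkRule3Sketch(changedFiles) {
  const packages = new Set(changedFiles.map(packageOf).filter(Boolean));
  return [...packages]
    .map((pkg) => packageDocErrorSketch(pkg, changedFiles))
    .filter(Boolean);
}

// The three paired cases from the list above:
const corpusRefresh = checkRule3Sketch([
  "novel/competitive-benchmark/src/competitors.ts",
  "competitors/openhands.md",
]); // no errors: the competitor doc accompanies the code
const otherPackage = checkRule3Sketch([
  "novel/budget-guard/src/index.ts", // hypothetical path
  "competitors/openhands.md",
]); // one error: competitors/*.md does not excuse another package
const noDoc = checkRule3Sketch([
  "novel/competitive-benchmark/src/competitors.ts",
]); // one error, with a hint naming competitors/*.md
```

Keeping the per-package decision in its own small helper mirrors the stated motivation for extracting packageDocError: the top-level check stays a simple map and filter.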

Verification

pnpm exec vitest run scripts/check-rule-3-doc-first.test.mjs → 22 passed.
biome check --error-on-warnings on both files → exit 0.
Simulated corpus-refresh diff (competitors.ts + competitors/openhands.md) → rule-3 PASS.

Hypothesis self-grade

Predicted: recognizing competitors/*.md as the competitive-benchmark doc surface turns the rule-3 gate green for corpus-refresh PRs, unblocking #1047/#1050 and the whole class, with zero false-negatives elsewhere.
Observed: 22 tests green including the scoping test (budget-guard still fails); the simulated corpus-refresh diff passes; biome clean.
Match: yes
Lesson: a doc-discipline gate must enumerate every legitimate doc surface — competitors/*.md is the corpus's real documentation, and omitting it turned an honest doc update into a phantom violation.

Vision trace

Vision goal: rule #3 (doc-first) + rule #10 (deterministic gate ratchet). The gate should pass exactly when a real doc accompanies code, no more, no less.
User story: as a minsky operator running corpus refreshes, my PR passes rule-3 because I updated the competitor's doc, without a phantom user-story requirement.
Competitor prior art: N/A — internal CI-gate correctness fix.

Security & privacy

No new attack surface. Pure-function change to a lint over the PR diff; reads no new files, writes nothing, binds no ports. vision.md § 13 minimum-bar reviewed.

🤖 Written by an agent, not Fyodor. Ping me if this looks off.

… surface corpus-refresh + competitor tasks update both the reading in novel/competitive-benchmark/src/competitors.ts (code) AND the narrative in competitors/<name>.md (doc), but rule-3-doc-first only recognized user-stories/*.md and the package README — so every corpus refresh failed the gate in CI (it's skipped locally without a pr-body, so worker --stage=full passed). Teach rule-3 that competitors/*.md IS the doc surface for the competitive-benchmark package. Extracted packageDocError helper to keep complexity ≤10; added 3 paired tests (corpus-refresh passes, scoping holds, no-doc still fails). Unblocks #1047/#1050 and the whole corpus-refresh class.

…-swe-agent (#1050)

* chore: refresh swe-agent SWE-bench Verified reading to mini-swe-agent 0.74 (corpus-refresh-swe-agent)

Replace the stale 2024 NeurIPS SWE-agent + GPT-4 reading (0.125, full-split proxy, asOf 2024-10-01, 604 days very-stale) with the SWE-agent project's current flagship scaffold mini-swe-agent: Gemini 3 Pro at 0.74 on the SWE-bench Verified 500-instance split, submitted 2026-02-26 to the official swebench.com "Bash Only" leaderboard (primary statement at mini-swe-agent.com). This is a true Verified-split number, so the prior full-split/Lite proxy caveat is dropped.

asOf 2026-02-26 is 96 days old -> freshness status moves very-stale -> stale (3 days outside the 90-day "fresh" bucket). Per rule #9 and the task Pivot, the most authoritative primary-source date for the project's own scaffold is used rather than a fabricated fresher date.

- competitors.ts: refreshed citation, asOf, value for the swe-agent entry
- competitors/swe-agent.md: Scorecard readings table + superseded-reading history note + Last reviewed date
- competitors/scorecard.md: updated the two swe-agent rows in the (now hand-maintained) static snapshot to match

Hypothesis self-grade:
Predicted: refreshing swe-agent to a publication <=90 days old returns "fresh".
Observed: freshest cleanly-attributable primary source is 2026-02-26 (96d) -> "stale", not "fresh"; value 0.125 -> 0.74.
Match: partial (very-stale -> stale, not fresh; the 90-day bar is missed by 3 days; no honest <=90d project-scaffold Verified source exists).
Lesson: the SWE-agent project's last dated Verified-split scaffold submission is 2026-02-26; a strict 90-day fresh bar can be unmeetable without fabricating a date, which the Pivot forbids.

* chore(ci): re-trigger rule-3 against fixed main (#1052) for corpus-refresh-swe-agent

…-openhands (#1047)

* chore: refresh openhands corpus reading to 0.728 SWE-bench Verified (corpus-refresh-openhands)

Refresh the openhands competitor reading from the 406-day-stale 2025-04-15 65.8% (0.658) inference-time-scaling number to the current first-party 72.8% (0.728), citing the OpenHands Software Agent SDK paper (arXiv:2511.03690v2, 2026-04-22, Table 4 section 5.4 - Claude Sonnet 4.5 + extended thinking on the V1 SDK). The v2 revision (41 days old) flips the corpus-freshness bucket from "very-stale" to "fresh".

Hypothesis self-grade:
Predicted: check-corpus-freshness returns "fresh" (<=90d) for openhands.
Observed: status="fresh", ageDays=41 (asOf 2026-04-22).
Match: yes
Lesson: the vendor's newest exact-number publication (SDK paper v2) supersedes the Apr-2025 reading; refreshing to the real higher number is honest, not masking - Pivot's "stale-by-vendor" clause does not apply when the vendor actively publishes.

* chore(ci): re-trigger rule-3 against fixed main (#1052) for corpus-refresh-openhands
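
The freshness arithmetic quoted in the two refresh commits above can be reproduced with a small sketch. This is an illustration, not the check-corpus-freshness implementation: the function name `freshness`, its signature, and the `veryStaleAfterDays` cutoff of 365 are assumptions (the commits state only the 90-day "fresh" bar), and the reference date is taken to be the merge date of this PR.

```javascript
// Sketch only: the function name, signature, and the very-stale cutoff are
// assumptions; only the 90-day "fresh" bar comes from the commit messages.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const FRESH_MAX_DAYS = 90;

// asOf and today are ISO dates (YYYY-MM-DD), which Date.parse reads as UTC.
function freshness(asOf, today, veryStaleAfterDays = 365) {
  const ageDays = Math.floor((Date.parse(today) - Date.parse(asOf)) / MS_PER_DAY);
  let status = "very-stale";
  if (ageDays <= FRESH_MAX_DAYS) status = "fresh";
  else if (ageDays <= veryStaleAfterDays) status = "stale";
  return { status, ageDays };
}

const openhands = freshness("2026-04-22", "2026-06-02");
// { status: "fresh", ageDays: 41 }
const sweAgent = freshness("2026-02-26", "2026-06-02");
// { status: "stale", ageDays: 96 }
```

Parsing date-only ISO strings keeps both timestamps at UTC midnight, so the day difference is an exact integer and does not shift with the local timezone.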

fyodoriv merged commit 3dae1a1 into main Jun 2, 2026
93 checks passed

fyodoriv deleted the fix/rule-3-accept-competitor-md branch June 2, 2026 15:33

fyodoriv added a commit that referenced this pull request Jun 2, 2026

chore(ci): re-trigger rule-3 against fixed main (#1052) for corpus-refresh-openhands

06f741c

fyodoriv added a commit that referenced this pull request Jun 2, 2026

chore(ci): re-trigger rule-3 against fixed main (#1052) for corpus-refresh-swe-agent

e9cba03

fyodoriv merged 1 commit into main from fix/rule-3-accept-competitor-md
