
[Platform] CI hygiene: Makefile, pre-commit, feedback bot, Definition of Done #12

Merged
mariagorskikh merged 1 commit into main from platform/ci-hygiene on May 26, 2026

@mariagorskikh

Collaborator

What this PR does

Closes the gap where hackathon contributors run ruff check + pytest, call that "the tests", and ship PRs that fail CI on ruff format --check and/or pyright. Five-file change, all additive — no participant code touched, no behavior change to the existing CI workflow.

Files

  1. Makefile (new)

    • make ci-local runs the exact CI command sequence in order: uv sync && uv run ruff check . && uv run ruff format --check . && uv run pyright && uv run pytest -v. One target, hard-fails on the first red command (Make's default behavior; no set -e gymnastics needed because each command is its own recipe line).
    • make hooks installs pre-commit hooks via uv run --with pre-commit pre-commit install (no separate global install required).
    • make help is the default goal — running bare make prints the menu.
  2. .pre-commit-config.yaml (new)

    • astral-sh/ruff-pre-commit@v0.15.14: ruff (with --fix) and ruff-format. Versions pinned to what uv sync resolves on main today.
    • RobertCraigie/pyright-python@v1.1.409: invoked with pass_filenames: false so it runs the whole workspace (strict-mode type errors cross file boundaries; per-file is unreliable). Strict config picked up from [tool.pyright] in pyproject.toml.
    • default_stages: [pre-commit] — hooks only fire on local commits. Auto-fix happens locally; CI never runs pre-commit (it runs the real ruff/pyright/pytest commands directly via ci.yml), so there is no path by which this config can mutate files in CI.
  3. CONTRIBUTING.md: new "Definition of Done" section pinned to the very top, above all existing content. Lists, in a single code block, the five exact commands every contributor MUST run before pushing, with a one-line "why" for each (calling out in particular the ruff format --check and pyright traps that bit 5/10 hackathon agents).

  4. .github/workflows/ci-feedback.yml (new)

    • Triggers on workflow_run: completed of the existing CI workflow, gated to conclusion == 'failure' AND event == 'pull_request'.
    • Downloads the failing run's log zip via gh api .../actions/runs/{id}/logs, unzips, and runs an embedded Python extractor that produces ≤40-line excerpts per failing check (ruff format diff snippet, pyright error list, pytest failure summary).
    • Idempotent: paginates issues/{pr}/comments, finds any comment containing the stable marker <!-- ci-feedback-bot -->, and PATCHes it in place if found; otherwise POSTs a new one. The bot never spams — at most one comment per PR, updated on every subsequent failure.
    • Permissions scoped exactly as required: pull-requests: write, actions: read, contents: read. No write to contents, no checks API, no secrets beyond GITHUB_TOKEN.
    • Comment body always ends with a link to CONTRIBUTING.md#definition-of-done plus the make ci-local one-liner.
  5. README.md — small "Before you push" callout below the hello-world block linking to CONTRIBUTING.md#definition-of-done.
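
The excerpt step of the feedback workflow can be pictured roughly as follows. This is a minimal sketch, not the extractor embedded in ci-feedback.yml: the names `excerpt`, `build_comment` and `MAX_LINES`, the truncation notice, and the wording of the section headers are assumptions made for illustration. Only the 40-line cap, the marker string, and the trailing link and one-liner come from the description above.

```python
MARKER = "<!-- ci-feedback-bot -->"  # stable marker named in the PR description
MAX_LINES = 40  # cap per failing check, per the description


def excerpt(log_text: str, max_lines: int = MAX_LINES) -> str:
    """Return at most max_lines lines of log_text, noting any truncation."""
    lines = log_text.strip().splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    # Reserve the final line for a truncation notice so the cap still holds.
    kept = lines[: max_lines - 1]
    kept.append(f"... ({len(lines) - len(kept)} more lines truncated)")
    return "\n".join(kept)


def build_comment(failures: dict[str, str]) -> str:
    """Assemble one comment body: marker, one excerpt per failing check, footer."""
    parts = [MARKER]
    for check, log_text in failures.items():
        # Header wording is hypothetical; the real bot's layout may differ.
        parts.append(f"{check} failed:\n{excerpt(log_text)}")
    parts.append("Reproduce locally with: make ci-local")
    parts.append("See CONTRIBUTING.md#definition-of-done")
    return "\n\n".join(parts)
```

Keeping the cap inside one helper means every check (ruff format, pyright, pytest) is truncated the same way, so a single noisy log cannot crowd the others out of the comment.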

Design decisions

  • Single Make target for the full sequence, not five sub-targets. The whole point is that contributors today cherry-pick the steps that match their mental model of "tests". A single target removes that footgun. There is intentionally no make lint / make test shortcut, because any such shortcut would reintroduce it.
  • Pre-commit pyright runs on the whole workspace, not staged files. Strict-mode errors propagate across files; per-file checking gives false greens. Slightly slower commits, but matches what CI sees.
  • Feedback bot edits a single comment per PR keyed off an HTML marker, not actor == 'github-actions[bot]'. The marker is stable across renames and survives someone hand-editing the body. It is also unambiguous in body | contains(...) jq filters.
  • The extractor is a single embedded Python heredoc, not a separate script file. Keeps the workflow self-contained — no risk of a participant's PR breaking the bot by modifying a shared script.
  • CI sequence in Makefile matches ci.yml line-for-line. When ci.yml changes, both files MUST be updated together; this is intentional friction so the local and remote sequences cannot drift silently.
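
The single-target decision amounts to running a fixed list of commands in order and stopping at the first non-zero exit. A rough Python equivalent of that behavior is sketched below, for illustration only: the real target is a Makefile recipe, and `run_sequence`, the injectable `runner` parameter, and the failure message are hypothetical names and wording introduced here so the logic can be exercised without invoking uv.

```python
import shlex
import subprocess
from collections.abc import Callable, Sequence

# The five commands listed in the PR description, in CI order.
CI_SEQUENCE = [
    "uv sync",
    "uv run ruff check .",
    "uv run ruff format --check .",
    "uv run pyright",
    "uv run pytest -v",
]


def _run(cmd: str) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(shlex.split(cmd), check=False).returncode


def run_sequence(
    commands: Sequence[str] = CI_SEQUENCE,
    runner: Callable[[str], int] = _run,
) -> int:
    """Run commands in order; return the first non-zero exit code, else 0."""
    total = len(commands)
    for i, cmd in enumerate(commands, start=1):
        print(f">>> [{i}/{total}] {cmd}")
        code = runner(cmd)
        if code != 0:
            # Assumed wording; the Makefile's own failure output may differ.
            print(f"ci-local: step {i} failed (exit {code}).")
            return code  # fail fast: later steps never run
    print(f"ci-local: all {total} checks passed. Safe to push.")
    return 0
```

Because the sequence is one list consumed by one loop, there is no way to run "just the tests" without editing the list itself, which is the property the single Make target is meant to enforce.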

How to test the feedback bot

Once this PR is merged, open a follow-up test PR that deliberately violates one (or all) of the three categories — for example:

git checkout -b test/ci-feedback-bot
# Introduce a format-only violation (top-level and unindented, so the file still parses):
python -c "open('packages/nest-core/nest_core/__init__.py','a').write('\n\n\n\nbadly_formatted   = 1\n')"
git commit -am "test: trigger ci-feedback bot with a format violation"
git push -u origin test/ci-feedback-bot
# Open PR; CI will fail on `ruff format --check`; ci-feedback workflow should
# post one comment with the format diff excerpt and the reproduce command.
# Then add a pyright error in a second commit and confirm the SAME comment
# is edited in place (no second comment appears).

Expected behavior:

  • One PR comment appears within ~30s of CI going red.
  • Comment contains the ruff-format diff snippet, the reproduction command, and a link to CONTRIBUTING.md#definition-of-done.
  • On a follow-up failing push, the comment is edited in place — no new comment is added.
  • On a green push, no new comment is added and the stale failure comment is left as-is (out of scope to delete; it remains as a historical record).
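
The gating and edit-in-place behavior listed above can be captured as a small pure function. This is a sketch under assumptions, not the workflow's actual code: `plan_action` and its return tuples are hypothetical, and the comment dicts only mimic the `id` and `body` fields returned by GitHub's issue-comments REST endpoint. The real workflow performs the lookup with gh api and jq.

```python
MARKER = "<!-- ci-feedback-bot -->"  # stable marker named in the PR description


def plan_action(conclusion: str, event: str, comments: list[dict], new_body: str):
    """Decide what the bot should do for one completed CI run.

    Returns ("skip", None), ("create", body) or ("update", comment_id, body).
    """
    # Gate: only failed runs triggered by pull requests produce feedback.
    if conclusion != "failure" or event != "pull_request":
        return ("skip", None)
    # Make sure the body carries the marker so the next run can find it.
    body = new_body if MARKER in new_body else f"{MARKER}\n{new_body}"
    # Look the comment up by marker, not by author, so it survives hand edits.
    for comment in comments:
        if MARKER in (comment.get("body") or ""):
            return ("update", comment["id"], body)  # would become a PATCH
    return ("create", body)  # would become a POST
```

A green run falls through the gate and returns "skip", which matches the last bullet: the stale failure comment is neither edited nor deleted.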

Verification

Ran make ci-local on this branch before pushing. All five steps exit 0:

>>> [1/5] uv sync                       # ok
>>> [2/5] uv run ruff check .           # All checks passed!
>>> [3/5] uv run ruff format --check .  # 94 files already formatted
>>> [4/5] uv run pyright                # 0 errors, 0 warnings, 0 informations
>>> [5/5] uv run pytest -v              # 259 passed, 1 warning in 13.97s
ci-local: all 5 checks passed. Safe to push.

Practicing what we preach.


Generated by Claude Code

…n of Done

Closes the gap where contributors run "ruff check" + pytest, call that "the
tests", and ship PRs that fail CI on ruff format --check and/or pyright.

- Makefile: `ci-local` runs the exact CI sequence (uv sync, ruff check,
  ruff format --check, pyright, pytest -v) and hard-fails on the first red
  command. `hooks` installs pre-commit. `help` is the default goal.
- .pre-commit-config.yaml: ruff-check + ruff-format (auto-fix locally) and
  pyright in strict mode (versions pinned to what `uv sync` resolves).
- CONTRIBUTING.md: Definition of Done section at the very top with the five
  required pre-push commands and a one-line rationale for each.
- README.md: "Before you push" callout pointing at `make ci-local` and the
  Definition of Done.
- .github/workflows/ci-feedback.yml: triggers on the existing CI workflow's
  failure, downloads logs, extracts per-check excerpts (ruff format diff,
  pyright errors, pytest summary; ~40 lines each), and posts/edits a single
  PR comment keyed off the marker `<!-- ci-feedback-bot -->`. Permissions
  scoped to pull-requests:write, actions:read, contents:read.

Verified locally: `make ci-local` exits 0 on this branch (5/5 green).

@sourcery-ai sourcery-ai Bot left a comment


Sorry @mariagorskikh, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

mariagorskikh added a commit that referenced this pull request May 26, 2026
Integration of 5 platform tracks built in parallel by specialist agents:

- platform/ci-hygiene (PR #12): Makefile + pre-commit + idempotent CI feedback bot + CONTRIBUTING Definition of Done
- platform/open-problems (PR #13): 10 differentiated open problems across 10 layers, charter, judging doc
- platform/judge-panel (PR #14): rubric, anthropic + openai providers, run_all CLI, real-diff fixture, live gpt-5.5 scoreboard for PRs #2-#11
- platform/research-harness (PR #15): conditions matrix, claude-CLI live runner, collect + analyze, dry-run fixtures + tests
- platform/marketplace-ui (PR #16): /hackathon Next.js section with author tags, judge scores, layer browser; Python data adapter

Schema reconciled end-to-end (rubric -> scores.json -> adapter -> TS types -> UI) on the 6-dim 1-5 scale with totals in [6, 30].

Local CI: 341 passed, 1 skipped (matplotlib gated), 1 deselected (live marker).

Live judge scoreboard top:
  #2  harvard-phd     trust       26.0/30  (EigenTrust + checkable invariants)
  #7  coinbase-crypto payments    26.0/30  (HTLC escrow)
  #6  stanford-ml-phd trust       25.0/30
  #11 google-staff    transport   25.0/30
@mariagorskikh mariagorskikh merged commit 14e59ed into main May 26, 2026
4 checks passed

Collaborator Author

Superseded by #17 (now merged to main at 1771cdb). Closing — the content of this PR is part of that integration merge.


Generated by Claude Code
