[Platform] Open problems + charter + judging doc #13
Merged
Ship a participant-facing brief for the month-long NEST hackathon: a charter, 10 differentiated open problems spanning 10 of the 12 layers, and a six-dimension judging rubric. The 10 problems are chosen to avoid the obvious-pick collapse seen in the first round (3x EigenTrust, 4x latency transport): no plain EigenTrust problem, no plain in_memory-transport-latency problem, and every problem cites the specific reference file lines that prove the gap is real.
The participant-facing judging doc described a 0-10 per-dimension scale with `final = sum/6`, and an "adversarial validator fails → zero on correctness" hard floor. Neither matches what the judge panel (`scripts/judge/judge_pr.py` + `scripts/judge/rubric.md`) actually does. The rubric scores six dimensions each on a 1-5 integer scale; the headline total is the sum (in [6, 30]). The judges read the PR body, diff, and checks summary; they have no mechanism to evaluate an "adversarial validator", so the zero-on-correctness rule was writing a check the code can't cash.

- Rewrite to declare `scripts/judge/rubric.md` as the source of truth and explicitly defer to it for anchor descriptions.
- Replace the 0-10 / sum-divided-by-six description with the actual 1-5 per-dimension scale and `[6, 30]` sum total, matching what gets written to `docs/hackathon/scores.json` as the `median` field.
- Drop the "adversarial validator zero" claim and the cross-runner / seed-bank description, both of which describe machinery that does not exist in the judging pipeline.
- Document the real flow: CI gates → N independent LLM judges via `judge_pr.py` → per-dim medians + `median_low(per-judge totals)` → consensus narrative → `scores.json` → marketplace UI.
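The aggregation step in that flow can be illustrated with a minimal sketch. It is not the code in `judge_pr.py`: the dimension names, the `aggregate` function, and the `per_dimension` key are hypothetical placeholders, and using `statistics.median` for the per-dimension values is an assumption. Only the 1-5 integer scale, the six dimensions, `median_low` over per-judge totals, and the `median` field name come from the description above.

```python
from statistics import median, median_low

# Placeholder names: the real six dimensions are defined in scripts/judge/rubric.md.
DIMENSIONS = ("dim_1", "dim_2", "dim_3", "dim_4", "dim_5", "dim_6")


def aggregate(judge_scores: list[dict[str, int]]) -> dict:
    """Combine N judges' 1-5 scores into per-dimension medians and a headline total."""
    if not judge_scores:
        raise ValueError("at least one judge is required")
    for scores in judge_scores:
        if set(scores) != set(DIMENSIONS):
            raise ValueError("each judge must score exactly the six dimensions")
        for dim, value in scores.items():
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{dim} must be an integer in [1, 5], got {value!r}")

    # Assumption: plain median per dimension (may be x.5 with an even judge count).
    per_dimension = {dim: median(s[dim] for s in judge_scores) for dim in DIMENSIONS}

    # Each judge's total is a sum of six 1-5 scores, so it lies in [6, 30].
    totals = [sum(s.values()) for s in judge_scores]

    # median_low always returns one of the actual per-judge totals.
    return {"per_dimension": per_dimension, "median": median_low(totals)}


if __name__ == "__main__":
    judges = [
        dict.fromkeys(DIMENSIONS, 4),
        dict.fromkeys(DIMENSIONS, 5),
        {**dict.fromkeys(DIMENSIONS, 4), "dim_1": 3, "dim_4": 5},
    ]
    print(aggregate(judges))
```

Because `median_low` picks the lower of the two middle values when the judge count is even, the headline total is always a score some judge actually gave rather than an interpolated one.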
mariagorskikh added a commit that referenced this pull request on May 26, 2026
Integration of 5 platform tracks built in parallel by specialist agents:

- platform/ci-hygiene (PR #12): Makefile + pre-commit + idempotent CI feedback bot + CONTRIBUTING Definition of Done
- platform/open-problems (PR #13): 10 differentiated open problems across 10 layers, charter, judging doc
- platform/judge-panel (PR #14): rubric, anthropic + openai providers, run_all CLI, real-diff fixture, live gpt-5.5 scoreboard for PRs #2-#11
- platform/research-harness (PR #15): conditions matrix, claude-CLI live runner, collect + analyze, dry-run fixtures + tests
- platform/marketplace-ui (PR #16): /hackathon Next.js section with author tags, judge scores, layer browser; Python data adapter

Schema reconciled end-to-end (rubric -> scores.json -> adapter -> TS types -> UI) on the 6-dim 1-5 scale with totals in [6, 30].

Local CI: 341 passed, 1 skipped (matplotlib gated), 1 deselected (live marker).

Live judge scoreboard top:

- #2 harvard-phd trust 26.0/30 (EigenTrust + checkable invariants)
- #7 coinbase-crypto payments 26.0/30 (HTLC escrow)
- #6 stanford-ml-phd trust 25.0/30
- #11 google-staff transport 25.0/30
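As a rough illustration of the reconciled scale, the sketch below derives the total bounds from the 6-dimension, 1-5 rubric and renders a scoreboard line in the same shape as the entries above. The function names `check_total` and `scoreboard_line` are hypothetical and are not taken from the adapter or the UI code.

```python
N_DIMS = 6               # six rubric dimensions
SCALE_MIN, SCALE_MAX = 1, 5  # integer score range per dimension


def check_total(total: float) -> float:
    """Return the total as a float if it lies in [6, 30], else raise ValueError."""
    lowest, highest = N_DIMS * SCALE_MIN, N_DIMS * SCALE_MAX
    if not lowest <= total <= highest:
        raise ValueError(f"total {total} is outside [{lowest}, {highest}]")
    return float(total)


def scoreboard_line(pr: int, author: str, layer: str, total: float) -> str:
    """Format one entry as '#<pr> <author> <layer> <total>/<max>'."""
    highest = N_DIMS * SCALE_MAX
    return f"#{pr} {author} {layer} {check_total(total):.1f}/{highest}"


if __name__ == "__main__":
    print(scoreboard_line(2, "harvard-phd", "trust", 26))
```

A total produced by the old 0-10 scale averaged over six dimensions could fall below 6, so a range check like this is one cheap way to catch a stale record before it reaches the UI.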
Superseded by #17 (now merged to main at …).
Generated by Claude Code
What this PR does
Ships the participant-facing scaffolding for the NEST hackathon:
- `docs/hackathon/charter.md`: the brief (what NEST is, what to build, the rules)
- `docs/hackathon/problems/`: 10 differentiated open problems
- `docs/hackathon/judging.md`: six-dimension rubric and scoreboard explanation
- `README.md`: top-level "Hackathon" section linking to the above

No code outside `docs/` or the top of `README.md` is touched. All five CI checks pass locally: `uv sync && uv run ruff check . && uv run ruff format --check . && uv run pyright && uv run pytest -v`.

How I chose the problems
(… `packages/nest-plugins-reference/`) to find which layers have the weakest defaults: the gaps where a participant can ship something materially better with one weekend of work.

Layer coverage
10 of 12 layers covered (skipped trust and transport — both 3x covered in round 1, deliberately excluded to avoid a fourth pile-on).
Difficulty distribution
Matches the requested 4/4/2 split.
Anti-overlap with PRs #2-#11
Explicit exclusions stated in the charter and reiterated in problem files:
CI
Locally on this branch:
- `uv sync`: 0
- `uv run ruff check .`: 0 (All checks passed!)
- `uv run ruff format --check .`: 0 (94 files already formatted)
- `uv run pyright`: 0 (0 errors, 0 warnings, 0 informations)
- `uv run pytest -v`: 0 (259 passed, 1 warning, 14.28s)

Out of scope
- … `packages/` was touched.
- `scripts/judge/rubric.md` does not exist yet (parallel judge-panel track); `judging.md` defers to it when it lands and writes the rubric inline meanwhile.

Generated by Claude Code