[Platform] Open problems + charter + judging doc by mariagorskikh · Pull Request #13 · projnanda/nandatown

mariagorskikh · 2026-05-26T20:04:06Z

What this PR does

Ships the participant-facing scaffolding for the NEST hackathon:

docs/hackathon/charter.md — the brief (what NEST is, what to build, the rules)
docs/hackathon/problems/ — 10 differentiated open problems
docs/hackathon/judging.md — six-dimension rubric and scoreboard explanation
README.md — top-level "Hackathon" section linking to the above

No code outside docs/ or the top of README.md is touched. All five CI checks pass locally: uv sync && uv run ruff check . && uv run ruff format --check . && uv run pyright && uv run pytest -v.

How I chose the problems

Read every reference plugin (packages/nest-plugins-reference/) to find which layers have the weakest defaults — the gaps where a participant can ship something materially better with one weekend of work.
Read PRs #2-#11 to see what's already taken. First-round picks collapsed: 3x EigenTrust (PRs #2, #3, #6), 4x latency-transport (PRs #8, #10, #11 + adjacent), 1x DPoP auth (#9), 1x HTLC payments (#7), 1x sealed-bid coord (#5), 1x semantic memory (#4). Six layers (comms, identity, registry, negotiation, privacy, datafacts) had zero PRs.
Wrote problems that target the gaps, citing specific files and line numbers in each motivation so participants can't argue the gap isn't real. Where the obvious pick was already overrepresented, wrote a harder, sharper version (e.g. coordination = partition-tolerant BFT with view-change, not sealed-bid).
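The overlap tally described above can be reproduced with a few lines of Python. This is only an illustrative sketch, not part of the PR: the PR-to-layer mapping is transcribed from the list above, the twelve layer names are taken from the Layer coverage table plus the two skipped layers (trust and transport), and the "adjacent" transport PR is left out because it is not identified here.

```python
from collections import Counter

# The twelve layers: ten from the "Layer coverage" table plus the two skipped ones.
LAYERS = [
    "comms", "memory", "payments", "auth", "identity", "registry",
    "negotiation", "datafacts", "privacy", "coordination", "trust", "transport",
]

# First-round PR number -> layer, transcribed from the list above.
# The unnamed "adjacent" transport PR is intentionally omitted.
FIRST_ROUND = {
    2: "trust", 3: "trust", 6: "trust",
    8: "transport", 10: "transport", 11: "transport",
    9: "auth", 7: "payments", 5: "coordination", 4: "memory",
}


def tally(pr_layers, layers):
    """Return (PR count per layer, layers with no PR at all)."""
    counts = Counter(pr_layers.values())
    uncovered = [layer for layer in layers if counts[layer] == 0]
    return counts, uncovered


counts, uncovered = tally(FIRST_ROUND, LAYERS)
print(dict(counts))
print(uncovered)
```

With the mapping as transcribed, `uncovered` holds the six layers that received no first-round PR: comms, identity, registry, negotiation, datafacts, and privacy.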

Layer coverage

Layer	Problem #	Difficulty	First-round overlap?
comms	01	easy	none
memory	02	easy	PR #4 added semantic recall but kept LWW; CRDT story still open
payments	03	easy	PR #7 added HTLC (one-shot); streaming is orthogonal
auth	04	easy	PR #9 added DPoP; delegation+revocation is orthogonal
identity	05	medium	none
registry	06	medium	none
negotiation	07	medium	none
datafacts	08	medium	none
privacy	09	hard	none
coordination	10	hard	PR #5 added sealed-bid; BFT consensus under partition is a different problem

10 of 12 layers covered (skipped trust and transport — both 3x covered in round 1, deliberately excluded to avoid a fourth pile-on).

Difficulty distribution

Easy: 4 (comms, memory, payments, auth)
Medium: 4 (identity, registry, negotiation, datafacts)
Hard: 2 (privacy, coordination)

Matches the requested 4/4/2 split.

Anti-overlap with PRs #2-#11

Explicit exclusions stated in the charter and reiterated in problem files:

No plain "implement EigenTrust" problem at all. Trust layer is excluded entirely from the 10.
No plain "add latency to in_memory transport" problem. Transport layer is excluded entirely from the 10.
Problem 03 (streaming payments) cites PR #7 and says "streams aren't HTLCs."
Problem 04 (delegated auth) cites PR #9 and says "don't rebuild PR #9's DPoP."
Problem 02 (CRDT memory) cites PR #4 and says "PR #4 added semantic recall but kept LWW semantics."
Problem 10 (BFT consensus) cites PR #5 and says "PR #5 added sealed-bid coordination, useful for mechanism design but not a consensus protocol."

CI

Locally on this branch:

uv sync — 0
uv run ruff check . — 0 (All checks passed!)
uv run ruff format --check . — 0 (94 files already formatted)
uv run pyright — 0 (0 errors, 0 warnings, 0 informations)
uv run pytest -v — 0 (259 passed, 1 warning, 14.28s)
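For anyone who wants to reproduce the exit codes above in one pass, here is a minimal sketch of a runner. It assumes `uv` is on the PATH and that it is called from the repository root; the helper name `run_checks` and its `run` parameter are hypothetical and do not exist in the repo.

```python
import subprocess

# The five checks listed above, in the same order.
CHECKS = [
    ["uv", "sync"],
    ["uv", "run", "ruff", "check", "."],
    ["uv", "run", "ruff", "format", "--check", "."],
    ["uv", "run", "pyright"],
    ["uv", "run", "pytest", "-v"],
]


def run_checks(checks, run=subprocess.run):
    """Run each command in order and stop at the first non-zero exit, like `&&`.

    Returns a list of (command string, exit code) for every command that ran.
    The `run` parameter exists only so the runner can be exercised with a stub.
    """
    results = []
    for cmd in checks:
        code = run(cmd).returncode
        results.append((" ".join(cmd), code))
        if code != 0:
            break
    return results
```

Calling `run_checks(CHECKS)` returns one `(command, exit code)` pair per check that ran; a final code of `0` on all five corresponds to the listing above. If `uv` is not installed, `subprocess.run` raises `FileNotFoundError` rather than returning an exit code.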

Out of scope

No code in packages/ was touched.
No other platform tracks were touched.
scripts/judge/rubric.md does not exist yet (parallel judge-panel track); judging.md defers to it when it lands and writes the rubric inline meanwhile.

Generated by Claude Code

Ship a participant-facing brief for the month-long NEST hackathon: a charter, 10 differentiated open problems spanning 10 of the 12 layers, and a six-dimension judging rubric. The 10 problems are chosen to avoid the obvious-pick collapse seen in the first round (3x EigenTrust, 4x latency transport): no plain EigenTrust problem, no plain in_memory-transport-latency problem, and every problem cites the specific reference file lines that prove the gap is real.

The participant-facing judging doc described a 0-10 per-dimension scale with `final = sum/6`, and an "adversarial validator fails → zero on correctness" hard floor. Neither matches what the judge panel (`scripts/judge/judge_pr.py` + `scripts/judge/rubric.md`) actually does. The rubric scores six dimensions each on a 1-5 integer scale; the headline total is the sum (in [6, 30]). The judges read the PR body, diff, and checks summary; they have no mechanism to evaluate an "adversarial validator", so the zero-on-correctness rule was writing a check the code can't cash.

- Rewrite to declare `scripts/judge/rubric.md` as the source of truth and explicitly defer to it for anchor descriptions.
- Replace the 0-10 / sum-divided-by-six description with the actual 1-5 per-dimension scale and `[6, 30]` sum total, matching what gets written to `docs/hackathon/scores.json` as the `median` field.
- Drop the "adversarial validator zero" claim and the cross-runner / seed-bank description, both of which describe machinery that does not exist in the judging pipeline.
- Document the real flow: CI gates → N independent LLM judges via `judge_pr.py` → per-dim medians + `median_low(per-judge totals)` → consensus narrative → `scores.json` → marketplace UI.
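The aggregation step in that flow can be sketched as follows. This is an assumption-laden illustration, not the code in `scripts/judge/judge_pr.py`: the dimension names and the `per_dimension` key are hypothetical placeholders (the real anchors live in `scripts/judge/rubric.md`), and using `statistics.median` for the per-dimension values is a guess, since the text only says "per-dim medians". Only the 1-5 integer scale, the [6, 30] total, `median_low` over per-judge totals, and the `median` field name come from the description above.

```python
from statistics import median, median_low

# Hypothetical dimension names; the real six are defined in scripts/judge/rubric.md.
DIMENSIONS = ["dim_1", "dim_2", "dim_3", "dim_4", "dim_5", "dim_6"]


def aggregate(judge_scores):
    """Combine per-judge scores into per-dimension medians and a headline median.

    judge_scores: list of dicts, one per judge, mapping each dimension to an
    integer in 1..5. Each judge's total is therefore in [6, 30].
    """
    for scores in judge_scores:
        for dim in DIMENSIONS:
            value = scores[dim]
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{dim}={value!r} is outside the 1-5 integer scale")
    per_dim = {dim: median(s[dim] for s in judge_scores) for dim in DIMENSIONS}
    totals = [sum(s[dim] for dim in DIMENSIONS) for s in judge_scores]
    # median_low always returns one of the actual totals, never an average.
    return {"per_dimension": per_dim, "median": median_low(totals)}


# Three made-up judges, for illustration only.
judges = [
    dict(zip(DIMENSIONS, [4, 4, 4, 4, 4, 4])),
    dict(zip(DIMENSIONS, [5, 4, 4, 3, 4, 5])),
    dict(zip(DIMENSIONS, [3, 4, 5, 4, 4, 4])),
]
result = aggregate(judges)
print(result["median"])  # 24
```

The per-judge totals here are 24, 25, and 24, so the headline value is 24. With an even number of judges, `median_low` picks the lower of the two middle totals, which keeps the headline an integer that some judge actually awarded.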

Integration of 5 platform tracks built in parallel by specialist agents:

- platform/ci-hygiene (PR #12): Makefile + pre-commit + idempotent CI feedback bot + CONTRIBUTING Definition of Done
- platform/open-problems (PR #13): 10 differentiated open problems across 10 layers, charter, judging doc
- platform/judge-panel (PR #14): rubric, anthropic + openai providers, run_all CLI, real-diff fixture, live gpt-5.5 scoreboard for PRs #2-#11
- platform/research-harness (PR #15): conditions matrix, claude-CLI live runner, collect + analyze, dry-run fixtures + tests
- platform/marketplace-ui (PR #16): /hackathon Next.js section with author tags, judge scores, layer browser; Python data adapter

Schema reconciled end-to-end (rubric -> scores.json -> adapter -> TS types -> UI) on the 6-dim 1-5 scale with totals in [6, 30].

Local CI: 341 passed, 1 skipped (matplotlib gated), 1 deselected (live marker).

Live judge scoreboard top:

- #2 harvard-phd trust 26.0/30 (EigenTrust + checkable invariants)
- #7 coinbase-crypto payments 26.0/30 (HTLC escrow)
- #6 stanford-ml-phd trust 25.0/30
- #11 google-staff transport 25.0/30

mariagorskikh · 2026-05-26T22:06:49Z

Superseded by #17 (now merged to main at 1771cdb). Closing — the content of this PR is part of that integration merge.

Generated by Claude Code

sourcery-ai Bot reviewed May 26, 2026

mariagorskikh mentioned this pull request May 26, 2026

[Platform] Integration v2 #17

Merged

mariagorskikh merged commit 15d8d1d into main May 26, 2026
4 checks passed

mariagorskikh merged 2 commits into main from platform/open-problems
