[Platform] Open problems + charter + judging doc #13

Merged
mariagorskikh merged 2 commits into main from platform/open-problems
May 26, 2026
@mariagorskikh

Collaborator

What this PR does

Ships the participant-facing scaffolding for the NEST hackathon:

  • docs/hackathon/charter.md — the brief (what NEST is, what to build, the rules)
  • docs/hackathon/problems/ — 10 differentiated open problems
  • docs/hackathon/judging.md — six-dimension rubric and scoreboard explanation
  • README.md — top-level "Hackathon" section linking to the above

No code outside docs/ or the top of README.md is touched. All five CI checks pass locally: uv sync && uv run ruff check . && uv run ruff format --check . && uv run pyright && uv run pytest -v.

How I chose the problems

  1. Read every reference plugin (packages/nest-plugins-reference/) to find which layers have the weakest defaults — the gaps where a participant can ship something materially better with one weekend of work.
  2. Read PRs #2-#11 to see what's already taken. First-round picks collapsed: 3x EigenTrust (PRs #2, #3, #6), 4x latency-transport (PRs #8, #10, #11 + adjacent), 1x DPoP auth (#9), 1x HTLC payments (#7), 1x sealed-bid coord (#5), 1x semantic memory (#4). Six layers (comms, identity, registry, negotiation, privacy, datafacts) had zero PRs.
  3. Wrote problems that target the gaps, citing specific files and line numbers in each motivation so participants can't argue the gap isn't real. Where the obvious pick was already overrepresented, wrote a harder, sharper version (e.g. coordination = partition-tolerant BFT with view-change, not sealed-bid).

Layer coverage

| Layer | Problem # | Difficulty | First-round overlap? |
|---|---|---|---|
| comms | 01 | easy | none |
| memory | 02 | easy | PR #4 added semantic recall but kept LWW; CRDT story still open |
| payments | 03 | easy | PR #7 added HTLC (one-shot); streaming is orthogonal |
| auth | 04 | easy | PR #9 added DPoP; delegation+revocation is orthogonal |
| identity | 05 | medium | none |
| registry | 06 | medium | none |
| negotiation | 07 | medium | none |
| datafacts | 08 | medium | none |
| privacy | 09 | hard | none |
| coordination | 10 | hard | PR #5 added sealed-bid; BFT consensus under partition is a different problem |

10 of 12 layers covered. Trust and transport are skipped: round 1 already covered them 3x and 4x respectively, so both are deliberately excluded to avoid another pile-on.

Difficulty distribution

  • Easy: 4 (comms, memory, payments, auth)
  • Medium: 4 (identity, registry, negotiation, datafacts)
  • Hard: 2 (privacy, coordination)

Matches the requested 4/4/2 split.
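
The coverage and difficulty claims above can be re-checked mechanically. A minimal sketch, assuming the layer/difficulty pairs are copied by hand from the layer coverage table; the names `PROBLEMS`, `SKIPPED`, and `difficulty_split` are illustrative and do not exist in the repository:

```python
from collections import Counter

# Layer -> difficulty, transcribed from the layer coverage table in this PR.
PROBLEMS = {
    "comms": "easy",
    "memory": "easy",
    "payments": "easy",
    "auth": "easy",
    "identity": "medium",
    "registry": "medium",
    "negotiation": "medium",
    "datafacts": "medium",
    "privacy": "hard",
    "coordination": "hard",
}

# Layers deliberately left without a problem (already crowded in round 1).
SKIPPED = {"trust", "transport"}


def difficulty_split(problems):
    """Return (easy, medium, hard) counts for a layer -> difficulty mapping."""
    counts = Counter(problems.values())
    return counts["easy"], counts["medium"], counts["hard"]


split = difficulty_split(PROBLEMS)
covered = len(PROBLEMS)
total_layers = len(PROBLEMS.keys() | SKIPPED)
print(f"split={split}, covered={covered} of {total_layers} layers")
```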

Anti-overlap with PRs #2-#11

Explicit exclusions stated in the charter and reiterated in problem files:

CI

Locally on this branch:

  • uv sync — 0
  • uv run ruff check . — 0 (All checks passed!)
  • uv run ruff format --check . — 0 (94 files already formatted)
  • uv run pyright — 0 (0 errors, 0 warnings, 0 informations)
  • uv run pytest -v — 0 (259 passed, 1 warning, 14.28s)

Out of scope

  • No code in packages/ was touched.
  • No other platform tracks were touched.
  • scripts/judge/rubric.md does not exist yet (parallel judge-panel track); judging.md defers to it when it lands and writes the rubric inline meanwhile.

Generated by Claude Code

Ship a participant-facing brief for the month-long NEST hackathon:
a charter, 10 differentiated open problems spanning 10 of the 12
layers, and a six-dimension judging rubric. The 10 problems are
chosen to avoid the obvious-pick collapse seen in the first round
(3x EigenTrust, 4x latency transport): no plain EigenTrust problem,
no plain in_memory-transport-latency problem, and every problem
cites the specific reference file lines that prove the gap is real.

@sourcery-ai sourcery-ai Bot left a comment

Sorry @mariagorskikh, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

The participant-facing judging doc described a 0-10 per-dimension scale
with `final = sum/6`, and an "adversarial validator fails → zero on
correctness" hard floor. Neither matches what the judge panel
(`scripts/judge/judge_pr.py` + `scripts/judge/rubric.md`) actually
does. The rubric scores six dimensions each on a 1-5 integer scale; the
headline total is the sum (in [6, 30]). The judges read the PR body,
diff, and checks summary — they have no mechanism to evaluate an
"adversarial validator", so the zero-on-correctness rule was writing a
check the code can't cash.

- Rewrite to declare `scripts/judge/rubric.md` as the source of truth
  and explicitly defer to it for anchor descriptions.
- Replace the 0-10 / sum-divided-by-six description with the actual
  1-5 per-dimension scale and `[6, 30]` sum total, matching what gets
  written to `docs/hackathon/scores.json` as the `median` field.
- Drop the "adversarial validator zero" claim and the cross-runner /
  seed-bank description, both of which describe machinery that does
  not exist in the judging pipeline.
- Document the real flow: CI gates → N independent LLM judges via
  `judge_pr.py` → per-dim medians + `median_low(per-judge totals)` →
  consensus narrative → `scores.json` → marketplace UI.
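
The aggregation step described above (six dimensions scored 1-5, per-dimension medians, and `median_low` over per-judge totals) can be sketched as follows. This is an illustration under assumptions, not the code in `scripts/judge/judge_pr.py`: the function names, the dictionary keys other than `median`, and the sample scores are all hypothetical, and the real rubric's dimension names are not reproduced here.

```python
from statistics import median, median_low

N_DIMS = 6        # six rubric dimensions
LO, HI = 1, 5     # integer scale per dimension


def judge_total(scores):
    """Validate one judge's six scores and return their sum (in [6, 30])."""
    if len(scores) != N_DIMS:
        raise ValueError(f"expected {N_DIMS} scores, got {len(scores)}")
    for s in scores:
        if not isinstance(s, int) or not LO <= s <= HI:
            raise ValueError(f"score {s!r} is not an integer in [{LO}, {HI}]")
    return sum(scores)


def aggregate(panel):
    """Combine N judges' score lists into per-dimension medians and a total.

    The headline total is median_low of the per-judge totals, so it is
    always a total that some judge actually produced.
    """
    totals = [judge_total(scores) for scores in panel]
    per_dim = [median(column) for column in zip(*panel)]
    return {"per_dim_median": per_dim, "median": median_low(totals)}


# Hypothetical panel of three judges (made-up numbers, not real scoreboard data).
panel = [
    [4, 5, 3, 4, 4, 5],  # total 25
    [5, 5, 4, 4, 3, 5],  # total 26
    [4, 4, 4, 5, 4, 5],  # total 26
]
result = aggregate(panel)
print(result)
```

With an even number of judges, `statistics.median` averages the two middle values and can return a non-integer per-dimension score, whereas `median_low` always returns one of the observed totals. Whether the real pipeline handles the per-dimension case that way is not stated in this PR.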
mariagorskikh added a commit that referenced this pull request May 26, 2026
Integration of 5 platform tracks built in parallel by specialist agents:

- platform/ci-hygiene (PR #12): Makefile + pre-commit + idempotent CI feedback bot + CONTRIBUTING Definition of Done
- platform/open-problems (PR #13): 10 differentiated open problems across 10 layers, charter, judging doc
- platform/judge-panel (PR #14): rubric, anthropic + openai providers, run_all CLI, real-diff fixture, live gpt-5.5 scoreboard for PRs #2-#11
- platform/research-harness (PR #15): conditions matrix, claude-CLI live runner, collect + analyze, dry-run fixtures + tests
- platform/marketplace-ui (PR #16): /hackathon Next.js section with author tags, judge scores, layer browser; Python data adapter

Schema reconciled end-to-end (rubric -> scores.json -> adapter -> TS types -> UI) on the 6-dim 1-5 scale with totals in [6, 30].

Local CI: 341 passed, 1 skipped (matplotlib gated), 1 deselected (live marker).

Live judge scoreboard top:
  #2  harvard-phd     trust       26.0/30  (EigenTrust + checkable invariants)
  #7  coinbase-crypto payments    26.0/30  (HTLC escrow)
  #6  stanford-ml-phd trust       25.0/30
  #11 google-staff    transport   25.0/30
@mariagorskikh mariagorskikh merged commit 15d8d1d into main May 26, 2026
4 checks passed

Collaborator Author

Superseded by #17 (now merged to main at 1771cdb). Closing — the content of this PR is part of that integration merge.


Generated by Claude Code
