Skip to content

feat(core/reviews): parse kind:38000, materialize aggregates, rank by bayesian damping#31

Merged
orveth merged 8 commits into
v2from
feat/core-ranking
Apr 17, 2026
Merged

feat(core/reviews): parse kind:38000, materialize aggregates, rank by bayesian damping#31
orveth merged 8 commits into
v2from
feat/core-ranking

Conversation

@orveth
Copy link
Copy Markdown

@orveth orveth commented Apr 17, 2026

Summary

PR #5 of 8 for the v2 rebuild. Implements the reviews pipeline end-to-end:

  • Parse layer (src/reviews/parse.ts) — kind:38000 parser emitting ReviewRow. Rating precedence (data-model-v1.md §4): structured ["rating","N","5"] > legacy ["rating","N"] > content N/5 or N/10 > leading star-emoji run > null. Pure (no d-tag gate) so callable from log views / pagination dedup.
  • Aggregate layer (src/reviews/aggregate.ts) — Bayesian formula avgRating * log10(ratedCount + 1) (null-avg → 0 for sort-index friendliness); recomputeAggregateInTx runs inside the same Dexie transaction as the review write.
  • Upsert wrapper (src/reviews/upsert.ts) — upsertReviewWithAggregate wraps cache CAS + aggregate recompute atomically, so concurrent readers never see a review without its aggregate reflection. Recompute skipped on rejected-stale / rejected-invalid.
  • Rank export (src/reviews/rank.ts) — rankMints(db, limit = 50) via the bayesianScore index.

Schema

v3 additive migration:

  • bayesianRankbayesianScore (new index)
  • averageRatingavgRating (new index)
  • adds ratedCount
  • retains updatedAt index
  • v3 upgrade callback clears mintAggregate (payload rename — see Fixup pass below)

Scheduler wiring

Kind:38000 branch now delegates to the new module. Layer A bot-spam gate lives in upsertReview with a Fedimint (k=38173) sibling shape check — 38172 Cashu-mint reviews with 16-char legacy d-tags and 38173 Fedimint reviews with non-hex junk d-tags both firewall at the edge.

Test plan

  • bun run test — 224 passed (219 baseline + 5 fixup regressions)
  • bun --filter='*' run test — green workspace-wide
  • bunx biome check packages/ — 0 errors (16 pre-existing warnings for non-null assertions in test files)
  • Real-corpus smoke test against 5 audit-dump kind:38000 events (reviews/corpus.test.ts)
  • Integration test updated: 2 fixture reviews with 16-char d-tags now firewall (rejectedByLayerA: 5→7, accepted: 11→9)

Coverage summary

File Tests Focus
parse.test.ts 31 all 4 rating formats, precedence, null fallback, k/u tags, malformed tag fallthrough
aggregate.test.ts 12 formula edges + recompute scenarios
upsert.test.ts 13 CAS + aggregate-in-sync + Layer A gate + concurrent CAS race
rank.test.ts 8 sort order + limit + bounds (0 / Infinity)
corpus.test.ts 1 real-relay smoke
schema.test.ts 5 v2→v3 upgrade migration clears mintAggregate

Fixup pass

Post-review hardening from the three-reviewer triangulation (code-reviewer + silent-failure-hunter + pr-test-analyzer). No correctness regressions found — these are observability and edge-case firewalls.

  • 19b46ca fix(core/cache): narrow k=38173 d-tag shape check — the prior free-pass on kind:38173 let any short/junk d-tag land as long as the k tag claimed Fedimint. Every real federation ID in the audit corpus is 64-char lowercase hex. Added isValidFedimintDTag as a sibling shape gate in nip87/dtag.ts; applied in both upsertAnnouncement and upsertReview.
  • f8b385b fix(core/scheduler): wrap onEvent cases in try/catch and count handler errors — the scheduler header comment claimed "each kind's handler does its own try/catch" but the 38000 / 38172 / 38173 / 0 / 10002 branches didn't. A thrown Dexie transaction would escape as an unhandled rejection and ingest would look healthy while stats froze. Each case body now wraps a try/catch that bumps a new stats.handlerErrors counter, logs [scheduler] handler error with kind+eventId. Also adds stats.rejectedByParse for parseReview-null drops on the 38000 branch so those silent reject paths become observable.
  • 6c57602 fix(core/cache): clear mintAggregate on v2 → v3 upgrade — Dexie auto-migrates the v3 schema indexes but NOT row PAYLOADS. A dev with v2 IndexedDB would have rows shaped {d, averageRating, bayesianRank, updatedAt} that fail every v3 query shape. Added .upgrade(tx => tx.table("mintAggregate").clear()) — no prod data, the aggregate re-derives from reviews on next upsert, so a clean wipe is the correct migration.
  • 1a9aa49 fix(core/reviews): explicit precedence fallthrough in parseRatingFromContent — the N/5 and N/10 content blocks didn't return when they matched with an out-of-range number; behaviour was correct-by-accident. 7/5 ⭐⭐⭐⭐⭐ would previously fall through to the emoji and report 5★. Now each numeric block commits to its format on match and returns (including undefined for out-of-range).
  • a7d9a45 test(core/reviews): concurrent CAS race, malformed tags, rankMints bounds — three new tests: (1) two concurrent upsertReviewWithAggregate calls for the same d with distinct pubkeys converge across 5 shuffled trials, aggregate sees both; (2) table-driven malformed rating tag coverage (foo, "", missing, null-as-string) all land on rating: null; (3) rankMints with limit=0 returns [], limit=Infinity returns the full ranked list.
  • a1918ea style(core/reviews): inline single-arg parseReview call to match biome format — mechanical formatter fix.

Deviations

None — all three must-fix items, all three should-fix items, and all three new tests landed. The audit/data-model-v1.md §13 spec doc update lives outside the repo (keeper's local audit corpus) and is recorded in the keeper memory, not the PR diff.

🤖 Generated with Claude Code

… bayesian damping

Reviews pipeline split across three modules under `src/reviews/`:

- `parse.ts` — kind:38000 parser emitting `ReviewRow` directly; rating
  precedence per data-model-v1.md §4 (structured `["rating","N","5"]` >
  legacy `["rating","N"]` > content `N/5` or `N/10` > leading star emoji
  run > null). Parse stays pure (no d-tag gate) so it's usable from log
  views and pagination dedup.
- `aggregate.ts` — `bayesianScore(avg, n) = avg * log10(n+1)` (null-avg →
  0 for sort-index friendliness) and `recomputeAggregateInTx(db, d)`
  scoped to the reviews+mintAggregate transaction.
- `upsert.ts` — `upsertReviewWithAggregate` wraps cache CAS + aggregate
  recompute in one `rw` transaction so concurrent reads never see a
  review without its aggregate reflection. Recompute skipped on
  rejected-stale and rejected-invalid.
- `rank.ts` — `rankMints(db, limit=50)` using the `bayesianScore` index.

Schema v3 bump: `bayesianRank` → `bayesianScore`, `averageRating` →
`avgRating`, adds `ratedCount`; indexes `bayesianScore` + `avgRating` +
`updatedAt`. Additive migration via Dexie `.version(3).stores({...})`.

Scheduler's kind:38000 branch now delegates to the reviews module and
counts `rejected-invalid` against `rejectedByLayerA`. The Layer A
bot-spam gate moves into `upsertReview` with a Fedimint (k=38173)
bypass, so 38172 Cashu-mint reviews with bogus 16-char d-tags get
firewalled at the edge, while federation IDs (k=38173) pass through
unconstrained per research.

Tests: +60 added across 6 files — parse branches (30), aggregate formula
+ recompute, CAS+aggregate-in-sync, rank sort + limit, and a 5-event
real-corpus smoke from audit/relay-data/recs-38000.json. Integration
test updated: 2 of 5 fixture reviews now firewall on d-tag shape, which
bumps `rejectedByLayerA` from 5→7 and `accepted` from 11→9.

Full workspace: 219 tests passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
bitcoinmints Error Error Apr 17, 2026 5:42pm

Request Review

orveth and others added 6 commits April 17, 2026 10:34
The prior free-pass on kind=38173 let any short/junk d-tag land in Dexie
as long as the `k` tag claimed Fedimint. Every real federation ID in the
audit corpus (fedimint-observer.md + reviews/corpus.test.ts) is lowercase
64-char hex — bot spam is not. Add isValidFedimintDTag as a sibling shape
gate so Layer A catches the Fedimint branch at the same choke point as
Cashu. Apply in both upsertAnnouncement and upsertReview.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r errors

The prior code relied on an outdated claim (now-removed doc comment) that
each kind's handler did its own try/catch — it didn't. A thrown Dexie
transaction (QuotaExceeded, schema collision, unexpected disk state) in
the 38000/38172/38173/0/10002 branch would escape as an unhandled
rejection at the `void onEvent(event)` subscription boundary and stats
would silently freeze at last-good while ingest continued to look
healthy. Wrap each case body so branch-local exceptions count into a new
`stats.handlerErrors` counter and log with a stable `[scheduler]` prefix.
Also add `stats.rejectedByParse` for the kind:38000 branch so
`parseReview`-null drops are observable rather than silent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Dexie auto-migrates the v3 schema indexes (`bayesianScore`, `avgRating`
replacing `bayesianRank`), but does NOT transform row PAYLOADS. A dev
with a v2 IndexedDB would have rows shaped `{d, averageRating,
bayesianRank, updatedAt}` — field names renamed in v3 — which fail every
v3 query shape. No prod data yet and the aggregate re-derives from live
review traffic, so the correct migration is a clean wipe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Content

The N/5 and N/10 content blocks previously didn't return when they
matched but were out of range — the function silently fell through to
the emoji regex. Behaviour was correct-by-accident: `0/10 total trash`
would yield no-rating because there were no leading stars after, but
`7/5 ⭐⭐⭐⭐⭐` would latch onto the emoji run and report 5★ despite the
numeric prefix saying otherwise.

Make the precedence explicit — when an N/5 or N/10 prefix MATCHES the
content, we commit to that format and return (even when the parsed
number is out of range, in which case we return undefined so the null
fallback in parseReview kicks in). Add a header comment naming the
rule so future readers don't re-introduce the silent fall-through.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…unds

- reviews/upsert.test.ts: two concurrent upserts for the same d with
  distinct pubkeys converge — both reviews land AND the aggregate sees
  both (5 shuffled trials). Mirrors the announcement-side regression
  in cache/upsert.test.ts and guards the aggregate-in-same-transaction
  invariant against a pre-write-snapshot recompute.
- reviews/parse.test.ts: table-driven coverage for malformed rating tag
  shapes (non-numeric, empty, missing, null-as-string). All fall through
  to `rating: null`.
- reviews/rank.test.ts: `limit=0` returns [], `limit=Infinity` returns
  every mint in score-descending order — covers the boundary conditions
  the Dexie limit() clamp handles.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e format

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…k typecheck

TS2783: `...over` spread already supplies both required fields.
@orveth orveth merged commit e0af01d into v2 Apr 17, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant