feat: three-gate auto-moderation (date/relevance/POI) + retroactive domain block by fatherlinux · Pull Request #447 · crunchtools/rotv

fatherlinux · 2026-05-29T12:42:41Z

Summary

Restructures news/event auto-moderation around three independent gates — Date, Relevance, POI — auto-publishing only when all three pass, and showing each gate's verdict in the admin queue so a pending item reveals exactly which gate needs a human. Spec: .specify/specs/030-moderation-gates/.

- Migration 070: moderation_gates JSONB on poi_news/poi_events; settings moderation_date_floor_year (2010), moderation_sweep_batch_size (50, was hardcoded 20/type).
- Date gate: passes when a date is present, not future, plausible (≥ floor), and trustworthy (consensus ≥ threshold OR trusted domain). Search-engine date weight raised 3→4 (proven reliable); rescore path no longer drops searchDate.
- Relevance gate: 3-vote consensus, prompt loosened to keep on-topic evergreen content (trail/history/destination) and reject only off-topic / out-of-region. Unanimous-NO auto-rejects.
- POI gate (three-tier): about-assigned-POI → reassign to owner (owner_id) or smallest containing boundary → review. Deny-listed POIs are filtered out of reassignment candidates so a reassignment can never land on a blocked POI.
- Retroactive domain block: blocklist_urls is now a hard-reject deny list (was collection-time skip only), so blocking a domain cleans up already-collected items on the next sweep.
- Admin UI: three gate badges (pass/review/fail) with reasons in ModerationExtras.
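
The decision rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in moderationService: the function names `relevanceVerdict` and `combineGates`, the vote shape `{ relevant: boolean }`, the strict-majority pass rule, and the rule that any `fail` verdict rejects are all assumptions. Only "publish when all three gates pass" and "unanimous NO auto-rejects" are taken from the description.

```javascript
// Hypothetical sketch: names, vote shape, and the "any fail rejects" rule are assumptions.

// Turn relevance votes into a gate verdict.
function relevanceVerdict(votes) {
  if (votes.length === 0) return 'review';      // no votes: needs a human
  const yes = votes.filter(v => v.relevant).length;
  if (yes === 0) return 'fail';                 // unanimous NO auto-rejects (from the PR)
  if (yes * 2 > votes.length) return 'pass';    // assumed: strict majority YES passes
  return 'review';
}

// Combine the three gate verdicts into one moderation decision.
function combineGates(gates) {
  const verdicts = [gates.date, gates.relevance, gates.poi];
  if (verdicts.includes('fail')) return 'reject';           // assumed
  if (verdicts.every(v => v === 'pass')) return 'publish';  // from the PR
  return 'pending';                                         // at least one gate needs review
}

const votes = [{ relevant: true }, { relevant: true }, { relevant: false }];
const decision = combineGates({
  date: 'pass',
  relevance: relevanceVerdict(votes),
  poi: 'review',
});
console.log(decision); // pending
```

Keeping the three verdicts separate until the last step is what lets the admin queue show which gate is holding an item back.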

Verified live against production data: ~80% of a 50-item sample auto-cleared (78% published, 2% rejected), with the remainder held for review on POI/missing-date — and date trust validated against real article bylines.

Test plan

- GHA build green
- Migration 070 applies on prod
- Moderation sweep populates gate badges; auto-publish requires all three gates
- Blocking a domain retroactively rejects stored items from it

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…omain block

Restructures news/event auto-moderation around three independent gates (Date, Relevance, and POI), auto-publishing only when all three pass, and surfacing each gate's verdict in the admin queue so a pending item shows exactly which gate needs a human.

- migration 070: moderation_gates JSONB on poi_news/poi_events; settings moderation_date_floor_year (2010) and moderation_sweep_batch_size (50)
- moderationService: evaluateDateGate / evaluatePoiGate / relevance gate, combine into the publish/reject/pending decision, persist moderation_gates; configurable sweep batch (was hardcoded 20); getQueue returns moderation_gates
- POI gate is three-tier: about-assigned-POI -> reassign to owner/containing-boundary -> review; deny-listed POIs are filtered out of reassignment candidates so a reassignment can never land on a blocked POI
- geoService: getReassignmentCandidates (owner via owner_id + smallest containing boundary POI)
- relevance prompt loosened to accept on-topic evergreen content (trail/history/destination), rejecting only off-topic or out-of-region; folds an about_poi vote
- dateExtractor: search-engine date weight 3 -> 4 (proven reliable in moderation); rescore path no longer drops searchDate, so SE dates survive a rescore
- filterLists: blocklist_urls is now a retroactive hard-reject deny list (was collection-time skip only), so blocking a domain cleans up already-collected items
- ModerationExtras: three gate badges (pass/review/fail) with reasons
- tests for the date gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
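
The four date-gate conditions (present, not future, at or above the floor year, trustworthy) might look like this in outline. This is a hedged sketch, not the signature or body of evaluateDateGate: the function name `dateGateSketch`, the item fields `date`, `consensusScore`, and `domain`, the default `threshold` of 3, and the choice to return `review` for every unmet condition are assumptions. The floor year default of 2010 comes from the migration setting.

```javascript
// Hypothetical sketch of the four date checks. Field names, the default
// threshold, and the "review on any miss" behaviour are assumptions.
function dateGateSketch(item, opts) {
  const { now, floorYear = 2010, threshold = 3, trustedDomains = new Set() } = opts;

  if (!item.date) {
    return { verdict: 'review', reason: 'No date found' };
  }
  if (item.date.getTime() > now.getTime()) {
    return { verdict: 'review', reason: 'Date is in the future' };
  }
  if (item.date.getUTCFullYear() < floorYear) {
    return { verdict: 'review', reason: `Date is before ${floorYear}` };
  }
  // Trust comes from either signal consensus or a trusted source domain.
  const trusted = item.consensusScore >= threshold || trustedDomains.has(item.domain);
  if (!trusted) {
    return { verdict: 'review', reason: 'Date is not trustworthy' };
  }
  return { verdict: 'pass', reason: 'Date present, plausible, and trusted' };
}

const now = new Date('2026-05-29T00:00:00Z');
const opts = { now, trustedDomains: new Set(['example.org']) };

console.log(dateGateSketch(
  { date: new Date('2025-03-01T00:00:00Z'), consensusScore: 4, domain: 'other.test' },
  opts
).verdict); // pass

console.log(dateGateSketch(
  { date: null, consensusScore: 0, domain: 'example.org' },
  opts
).verdict); // review
```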

gemini-code-assist

Code Review

This pull request implements a three-gate auto-moderation system (Date, Relevance, and POI) for news and events, persisting gate verdicts in a new JSONB column and updating the admin UI to display these badges. It also introduces a configurable sweep batch size and a retroactive URL blocklist. The review feedback highlights two key improvements: addressing a potential bypass vulnerability in the URL blocklist prefix matching by enforcing domain/path boundaries, and optimizing the POI gate logic with a guard clause to prevent redundant database and LLM calls when no relevance votes are available.

gemini-code-assist · 2026-05-29T12:44:36Z

+    matches: (row, prefixes) => {
+      const norm = normalizeUrlForPrefix(row.source_url);
+      if (!norm) return false;
+      return prefixes.some(p => typeof p === 'string' && p.trim() && norm.startsWith(normalizeBlocklistPrefix(p)));
+    },
+    sweepFragment: (prefixes) => {
+      const valid = prefixes.filter(p => typeof p === 'string' && p.trim());
+      if (!valid.length) return null;
+      const conds = valid.map((_, i) => `regexp_replace(lower(source_url), '^https?://(www\\.)?', '') LIKE $${i + 1}`).join(' OR ');
+      return { sql: `source_url IS NOT NULL AND (${conds})`, params: valid.map(p => normalizeBlocklistPrefix(p) + '%') };
+    }


Using startsWith and LIKE 'prefix%' without boundary checks can lead to false positives or bypasses. For example, if example.com is blocklisted, it will match example.com.attacker.com because example.com.attacker.com starts with example.com. To prevent this, we should ensure that the prefix match respects domain/path boundaries (i.e., the prefix must match exactly or be followed by a /).

    matches: (row, prefixes) => {
      const norm = normalizeUrlForPrefix(row.source_url);
      if (!norm) return false;
      return prefixes.some(p => {
        if (typeof p !== 'string' || !p.trim()) return false;
        const prefix = normalizeBlocklistPrefix(p);
        // Match the prefix exactly, or only when it is followed by a path separator.
        return norm === prefix || norm.startsWith(prefix + '/');
      });
    },
    sweepFragment: (prefixes) => {
      const valid = prefixes.filter(p => typeof p === 'string' && p.trim());
      if (!valid.length) return null;
      const conds = valid.map((_, i) => {
        const idx = i + 1;
        // Same boundary rule in SQL: equal to the prefix, or the prefix followed by '/'.
        return `regexp_replace(lower(source_url), '^https?://(www\\.)?', '') = $${idx} OR regexp_replace(lower(source_url), '^https?://(www\\.)?', '') LIKE $${idx} || '/%'`;
      }).join(' OR ');
      return { sql: `source_url IS NOT NULL AND (${conds})`, params: valid.map(p => normalizeBlocklistPrefix(p)) };
    }
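
To see the difference the boundary check makes, here is a small self-contained comparison. The `normalize` helper is a hypothetical stand-in for normalizeUrlForPrefix and normalizeBlocklistPrefix, whose real implementations are not shown in this PR excerpt; it is assumed to lowercase, strip the scheme, a leading `www.`, and trailing slashes.

```javascript
// Hypothetical stand-in for the PR's normalizers (assumption, not the real code):
// lowercase, strip scheme, strip leading "www.", strip trailing slashes.
function normalize(url) {
  return String(url).trim().toLowerCase()
    .replace(/^https?:\/\//, '')
    .replace(/^www\./, '')
    .replace(/\/+$/, '');
}

// Plain prefix test: the behaviour the review comment flags.
function naiveMatch(url, prefix) {
  return normalize(url).startsWith(normalize(prefix));
}

// Boundary-aware test: exact match, or the prefix followed by "/".
function boundaryMatch(url, prefix) {
  const norm = normalize(url);
  const p = normalize(prefix);
  return norm === p || norm.startsWith(p + '/');
}

console.log(naiveMatch('https://example.com.attacker.com/a', 'example.com'));    // true
console.log(boundaryMatch('https://example.com.attacker.com/a', 'example.com')); // false
console.log(boundaryMatch('https://www.example.com/news/1', 'example.com'));     // true
```

Under this rule a URL such as `example.com?x=1` would not match `example.com`; whether that matters depends on how the real normalizer treats query strings.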

gemini-code-assist · 2026-05-29T12:44:36Z

+export async function evaluatePoiGate(pool, row, votes, deniedPoiIds = new Set()) {
+  const total = votes.length;
+  const aboutCount = votes.filter(v => v.about_poi).length;
+  if (total > 0 && aboutCount * 2 >= total) {
+    return { verdict: 'pass', tier: 1, reason: `About assigned POI (${aboutCount}/${total} votes)`, reassigned_from: null, reassigned_to: null, newPoiId: null };
+  }


If votes is empty (e.g., relevance voting failed or returned no results), evaluatePoiGate will still try to run Tier 2 reassignment, which makes unnecessary database queries and an extra LLM call. Guarding evaluatePoiGate to immediately return a review verdict when votes is empty avoids these redundant operations.

Suggested change

-export async function evaluatePoiGate(pool, row, votes, deniedPoiIds = new Set()) {
-  const total = votes.length;
-  const aboutCount = votes.filter(v => v.about_poi).length;
-  if (total > 0 && aboutCount * 2 >= total) {
-    return { verdict: 'pass', tier: 1, reason: `About assigned POI (${aboutCount}/${total} votes)`, reassigned_from: null, reassigned_to: null, newPoiId: null };
-  }
+export async function evaluatePoiGate(pool, row, votes, deniedPoiIds = new Set()) {
+  const total = votes.length;
+  if (total === 0) {
+    return { verdict: 'review', tier: 3, reason: 'No relevance votes available', reassigned_from: null, reassigned_to: null, newPoiId: null };
+  }
+  const aboutCount = votes.filter(v => v.about_poi).length;
+  if (aboutCount * 2 >= total) {
+    return { verdict: 'pass', tier: 1, reason: `About assigned POI (${aboutCount}/${total} votes)`, reassigned_from: null, reassigned_to: null, newPoiId: null };
+  }
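
For context, the Tier 2 step that the guard clause avoids could be outlined like this. It is only a sketch of the candidate selection described in the PR summary (owner, else smallest containing boundary, with deny-listed POIs filtered out). The function name `pickReassignmentTarget`, the candidate shape `{ id, kind, area }`, and the owner-first ordering are assumptions, and the database lookup and LLM call mentioned in the review comment are left out.

```javascript
// Hypothetical sketch of Tier 2 candidate selection. The candidate shape and
// the owner-before-boundary ordering are assumptions; the real code also
// involves a database query and an LLM call that are omitted here.
function pickReassignmentTarget(candidates, deniedPoiIds = new Set()) {
  // Never reassign onto a deny-listed POI.
  const allowed = candidates.filter(c => !deniedPoiIds.has(c.id));

  const owner = allowed.find(c => c.kind === 'owner');
  if (owner) return owner.id;

  // Otherwise take the smallest containing boundary, if any.
  const boundaries = allowed
    .filter(c => c.kind === 'boundary')
    .sort((a, b) => a.area - b.area);

  return boundaries.length > 0 ? boundaries[0].id : null; // null: fall through to review
}

const candidates = [
  { id: 10, kind: 'owner', area: 0 },
  { id: 20, kind: 'boundary', area: 500 },
  { id: 30, kind: 'boundary', area: 120 },
];

console.log(pickReassignmentTarget(candidates));                    // 10
console.log(pickReassignmentTarget(candidates, new Set([10])));     // 30
console.log(pickReassignmentTarget(candidates, new Set([10, 30]))); // 20
```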

…klist SQL trailing-slash)

- migration 071: GiST index on pois.boundary_geom for ST_Contains reassignment lookups (PR #447 review)
- moderationService: consolidate the two post-gate UPDATE paths into one with COALESCE(publication_date) (PR #447 review)
- filterLists: align blocklist sweep SQL normalization with JS (strip trailing slash) (PR #447 review)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

searchDate is now weighted 4 (on par with JSON-LD): alone scores 4, beats weak signals, and a conflict with JSON-LD is a 4-4 tie -> score 0 (review). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
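
One scoring rule that reproduces the three behaviours stated in this commit message (alone scores 4, beats weak signals, 4-4 conflict scores 0) is "best total weight minus runner-up total weight". The sketch below uses that rule purely as an assumption; the actual formula in dateExtractor is not shown in this PR excerpt. The weights of 4 for searchDate and JSON-LD come from the message, while the weak-signal weight of 1, the signal shape, and the name `scoreDates` are hypothetical.

```javascript
// Hypothetical scoring rule: margin between the best-supported date and the
// runner-up. The formula, signal shape, and the weak-signal weight are assumptions.
function scoreDates(signals) {
  // Sum the weights of all signals that agree on the same date string.
  const totals = new Map();
  for (const s of signals) {
    totals.set(s.date, (totals.get(s.date) || 0) + s.weight);
  }
  const ranked = [...totals.entries()].sort((a, b) => b[1] - a[1]);
  if (ranked.length === 0) return { date: null, score: 0 };

  const [bestDate, bestWeight] = ranked[0];
  const runnerUp = ranked.length > 1 ? ranked[1][1] : 0;
  const score = bestWeight - runnerUp;
  // A score of 0 means the evidence is tied, so no date is chosen.
  return { date: score > 0 ? bestDate : null, score };
}

const search = { source: 'searchDate', date: '2026-05-01', weight: 4 };
const jsonLd = { source: 'jsonLd', date: '2026-04-15', weight: 4 };
const weak = { source: 'weakSignal', date: '2026-03-01', weight: 1 }; // weight assumed

console.log(scoreDates([search]).score);         // 4
console.log(scoreDates([search, weak]).score);   // 3
console.log(scoreDates([search, jsonLd]).score); // 0
```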

fatherlinux and others added 2 commits May 29, 2026 05:56

spec: three-gate auto-moderation (date/relevance/POI)

6b04167

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed May 29, 2026

fatherlinux and others added 5 commits May 29, 2026 08:47

fix: inline single-use URL normalizer to satisfy Gourmand (PR #447 re…

afd7edb

…view) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into feature/030-moderat…

65c16da

…ion-gates

fix: rename generic 'result' var in userSettings.js mcp-token route (…

9da4571

…Gourmand gate) Pre-existing violation from #445 surfaced by the merge; rename to tokenRow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fatherlinux merged commit 1603437 into master May 29, 2026
3 checks passed

fatherlinux deleted the feature/030-moderation-gates branch May 29, 2026 13:05

fatherlinux merged 7 commits into master from feature/030-moderation-gates
