
release(spam-only): comment spam defense → production (excludes 七日書) #4849

Merged
mashbean merged 11 commits into master from release/comment-spam-only on Jun 15, 2026



@mashbean

Contributor

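A release dedicated to comment spam defense, cherry-picked from develop and excluding 七日書 (campaign-discussion #4841 / quote-wall #4842), so that spam defense is not tied to the 七日書 schedule (plan B, approved by the user on 2026-06-15).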

Contents (cherry-picked from develop, comment spam only)

Excludes

  • ❌ campaign-discussion / quote-wall (七日書) and their migration / schema changes
  • Verified: relative to master, this branch touches only spam files, commentService has no campaign references, and schema/types are unchanged from master.

Nature of change

After deployment

  • ops sets MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true (only needed to enable collapsing)
  • the report_reason migration will run
  • L2 SQS/S3/worker are already provisioned, so capture starts on deploy

🤖 Generated with Claude Code

mashbean and others added 11 commits June 15, 2026 21:35
Comments currently share the short-content (moment) spam model. Add a dedicated
commentSpamDetectionApiUrl (MATTERS_COMMENT_SPAM_DETECTION_API_URL) so comments
score against the comment-specific model (e5-small + logreg, recall 0.88 vs 0.68
on unseen templates). Falls back to the short-content URL when unset, so
behaviour is unchanged until ops points it at the new endpoint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
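A minimal TypeScript sketch of the fallback described above. Only `commentSpamDetectionApiUrl` comes from this commit. `shortContentSpamDetectionApiUrl` and `resolveCommentSpamDetectionApiUrl` are hypothetical names. Treating a blank string as unset is also an assumption, not confirmed behaviour.

```typescript
// Hypothetical config shape: only commentSpamDetectionApiUrl is named in the commit.
interface SpamDetectionConfig {
  commentSpamDetectionApiUrl?: string // MATTERS_COMMENT_SPAM_DETECTION_API_URL
  shortContentSpamDetectionApiUrl?: string // hypothetical name for the existing moment-model URL
}

// Use the comment-specific model when configured; otherwise fall back to the
// short-content model so behaviour is unchanged until ops sets the new URL.
function resolveCommentSpamDetectionApiUrl(
  config: SpamDetectionConfig
): string | undefined {
  const commentUrl = config.commentSpamDetectionApiUrl?.trim()
  if (commentUrl) {
    return commentUrl
  }
  // Assumption: a blank value is treated the same as an unset one.
  return config.shortContentSpamDetectionApiUrl?.trim() || undefined
}
```

This design lets the new endpoint be switched on by configuration alone. Rolling back means unsetting one variable rather than redeploying.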
Comments are scored by the dedicated comment spam model (#4838) but nothing
acts on the score yet. Articles already auto-demote via excludeSpam; this adds
the comment equivalent, using the softer 'collapsed' state (folded but still
expandable in-thread) per the '不刪除,只是不再被看見' ("not deleted, just no longer seen") governance principle.

When MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true, detectSpam collapses an active
comment whose spamScore reaches the tunable system spam threshold, skipping
authors on the bypassSpamDetection whitelist (same carve-out as articles).
Default off → scoring stays observe-only until ops opts in (zero-downtime,
same rollout pattern as #4838). Collapse is reversible; no deletion.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
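The collapse rule above reduces to a pure predicate. This is only a sketch. `shouldAutoCollapse`, `CommentLike`, and `AutoCollapseContext` are illustrative names. The real `detectSpam` also performs the state update and reads the threshold and whitelist from its own sources.

```typescript
// Illustrative shapes; the real comment row and settings come from the database.
interface CommentLike {
  state: string // e.g. 'active' or 'collapsed'
  spamScore: number | null // null until the comment spam model has scored it
  authorId: string
}

interface AutoCollapseContext {
  autoCollapseEnabled: boolean // MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true
  spamThreshold: number // the tunable system spam threshold
  bypassAuthorIds: ReadonlySet<string> // authors on the bypassSpamDetection whitelist
}

// Decide whether detectSpam should move a comment to 'collapsed'.
function shouldAutoCollapse(comment: CommentLike, ctx: AutoCollapseContext): boolean {
  if (!ctx.autoCollapseEnabled) return false // default: observe-only scoring
  if (comment.state !== 'active') return false // only active comments are touched
  if (comment.spamScore === null) return false // nothing to act on yet
  if (ctx.bypassAuthorIds.has(comment.authorId)) return false // whitelisted author
  return comment.spamScore >= ctx.spamThreshold // "reaches" = at or above
}
```

Keeping the decision separate from the write makes each case easy to check in isolation: at or above threshold, below threshold, whitelisted author, and flag off. The collapse itself stays a reversible state change.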
The GraphQL ReportReason enum declares community_watch_porn_ad and
community_watch_spam_ad, but report_reason_check (from 20231221154057) was never
updated to permit them. submitReport inserts the raw reason, so any report with
those values failed in production with a report_reason_check violation
(INTERNAL_SERVER_ERROR) — surfaced by the coastguard bot's Tier-1 reports.

Realign the DB constraint with the schema. communityWatchRemoveComment is
unaffected (it syncs illegal_advertising, not these values).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
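A hedged sketch of the statements such a migration might run. The real migration is a `.js` file whose contents are not shown here. Postgres names a column CHECK constraint `<table>_<column>_check` by default, so `report_reason_check` suggests table `report` and column `reason`. That mapping is an assumption, as are the helper name `buildReportReasonCheckSql` and the `existing_reason` placeholder used for the values already permitted.

```typescript
// Hypothetical helper: realign a Postgres CHECK constraint with the full set of
// values the GraphQL enum allows. Identifiers are assumed to be trusted literals.
function buildReportReasonCheckSql(
  table: string,
  column: string,
  constraint: string,
  allowed: readonly string[]
): string[] {
  // Escape single quotes so each value is a valid SQL string literal.
  const literals = allowed.map((v) => `'${v.replace(/'/g, "''")}'`).join(', ')
  return [
    `ALTER TABLE ${table} DROP CONSTRAINT IF EXISTS ${constraint}`,
    `ALTER TABLE ${table} ADD CONSTRAINT ${constraint} CHECK (${column} IN (${literals}))`,
  ]
}
```

Running both statements in one transaction avoids a window with no constraint. A matching down migration could rebuild the list without `community_watch_porn_ad` and `community_watch_spam_ad`.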
L2 of the spam-data-retention roadmap: emit de-identified labeled samples to
SQS at the moderation boundary so the spam-model training signal survives later
content deletion that L1's passive DB extraction can't recover —
clearCommunityWatchOriginalContent nulls the snapshot, and account purge erases
content.

- common/notifications/spamSample.ts: enqueueSpamSample, mirrors
  enqueueReportAlert (best-effort SQS, never throws, no-op when unconfigured).
  Ids are HMAC-SHA256(salt) at emit so no raw user/content ids enter the queue;
  only the text the model trains on is carried verbatim.
- wired: communityWatchRemoveComment (confirmed spam at removal),
  clearCommunityWatchOriginalContent (capture before the snapshot is nulled;
  reversed action -> hard-negative ham).
- env: MATTERS_AWS_SPAM_SAMPLE_QUEUE_URL, MATTERS_SPAM_SAMPLE_HASH_SALT.

A separate Lambda worker consumes the queue and appends de-identified rows to
the S3 training bucket (see spam-detection-scaffold). Off until ops provisions
the queue + salt.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
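A sketch of the emit side under stated assumptions. `hashId`, `buildSpamSampleBody`, `enqueueSpamSampleSketch`, and the payload field names are hypothetical. `send` stands in for the repo's SQS wrapper, whose real signature is not assumed here. The sketch shows the properties the commit lists:

- ids are HMAC-SHA256'd with the salt;
- only the text travels verbatim;
- the send is skipped when the queue URL or salt is missing;
- transport errors are swallowed.

Nulling the score for ham mirrors what the later unit tests check.

```typescript
import { createHmac } from 'node:crypto'

// Hypothetical input shape for one labeled sample.
interface SpamSampleInput {
  commentId: string
  authorId: string
  text: string
  label: 'spam' | 'ham'
  score: number | null
}

interface SpamSampleConfig {
  queueUrl?: string // MATTERS_AWS_SPAM_SAMPLE_QUEUE_URL
  hashSalt?: string // MATTERS_SPAM_SAMPLE_HASH_SALT
}

// HMAC-SHA256(salt, id): deterministic for one salt, not reversible without it.
function hashId(salt: string, id: string): string {
  return createHmac('sha256', salt).update(id).digest('hex')
}

// Returns the message body, or null when the sample should be skipped.
function buildSpamSampleBody(input: SpamSampleInput, cfg: SpamSampleConfig): string | null {
  if (!cfg.queueUrl || !cfg.hashSalt) return null // unconfigured: no-op
  if (!input.text.trim()) return null // blank text: nothing to train on
  return JSON.stringify({
    commentHash: hashId(cfg.hashSalt, input.commentId), // no raw ids leave the process
    authorHash: hashId(cfg.hashSalt, input.authorId),
    text: input.text, // only the training text is carried verbatim
    label: input.label,
    score: input.label === 'ham' ? null : input.score,
  })
}

// Best-effort send: a transport error is swallowed so moderation never fails.
async function enqueueSpamSampleSketch(
  input: SpamSampleInput,
  cfg: SpamSampleConfig,
  send: (queueUrl: string, body: string) => Promise<unknown>
): Promise<boolean> {
  const body = buildSpamSampleBody(input, cfg)
  if (body === null || !cfg.queueUrl) return false
  try {
    await send(cfg.queueUrl, body)
    return true
  } catch {
    return false
  }
}
```

Because the hash is keyed, the worker can still group samples by author or comment without ever seeing the raw ids. Rotating the salt breaks that linkage for new samples.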
CI lint failed: #-subpath, external, and node: imports must form a single alphabetized
group with no blank lines between them. Reorder the spamSample.ts imports accordingly.
…shot)

The clear mutation now snapshots the action before nulling (axis-2 L2), so its
test context must provide findCommunityWatchActionByUUID. enqueueSpamSample
no-ops without queue/salt env, so nothing is sent in tests.
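The ordering constraint in the clear mutation can be sketched with an in-memory stand-in. `CommunityWatchAction`, `clearOriginalContentSketch`, and the `emit` callback are hypothetical. Mapping a reversed action to ham follows the earlier commit. Labelling a non-reversed action as spam is an assumption.

```typescript
// Hypothetical in-memory stand-in for the stored community-watch action.
interface CommunityWatchAction {
  uuid: string
  originalContent: string | null // the snapshot that the mutation nulls
  reversed: boolean
}

type SampleSink = (text: string, label: 'spam' | 'ham') => void

function clearOriginalContentSketch(
  actions: Map<string, CommunityWatchAction>,
  uuid: string,
  emit: SampleSink
): void {
  // Look the action up first (the role findCommunityWatchActionByUUID plays).
  const action = actions.get(uuid)
  if (!action) {
    throw new Error(`community watch action ${uuid} not found`)
  }
  // 1. Capture the training text while the snapshot still exists.
  if (action.originalContent) {
    // Reversed action -> hard-negative ham; non-reversed -> spam (assumption).
    emit(action.originalContent, action.reversed ? 'ham' : 'spam')
  }
  // 2. Only now null the snapshot; after this the text cannot be recovered.
  action.originalContent = null
}
```

This ordering is why the test context now needs the lookup: the mutation reads the action before it writes.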
Mirror reportAlert.test.ts: payload shape, HMAC de-identification (ids hashed,
never raw + deterministic), null score for ham, and no-op guards (queue unset /
salt unset / blank text) + AWS-error swallowing. Brings spamSample.ts diff
coverage to green.
CI test scripts only run build/{connectors,common/utils,routes,types}; the
common/notifications dir has no script, so the standalone spamSample.test.ts
never ran and spamSample.ts stayed at 38%. Remove that dead test and instead
exercise enqueueSpamSample's full body from communityWatchRemoveComment.test
(common/utils, which IS run): set the queue URL + hash salt, stub
aws.sqsSendMessage, and assert a de-identified sample (hashed ids) is enqueued
on removal.
Integration tests for the spam auto-collapse path: collapses an active comment
at/above the system threshold, leaves it active below threshold, and skips
bypassSpamDetection-whitelisted authors. Sets the spam_detection feature flag
and toggles commentSpamAutoCollapse around each case. Raises diff coverage.
… project)

spamSample.ts was at 76.9% (lines 66, 84-85 uncovered). Add two removal cases:
aws throws -> removal still succeeds (covers the swallow/catch); removed comment
has blank content -> sample skipped (covers the blank-text guard). Brings the
file to ~full coverage so codecov/project no longer dips.
Total repo coverage fluctuates run-to-run (sharded integration suites) and
codecov compares against the nearest ancestor with a coverage upload (develop
merge commits publish none), so PRs show spurious project drops even when their
own diff is 100% covered (e.g. #4846 at -0.46%). Add a 1% project threshold to
absorb that noise; patch stays strict so new code must still be tested.
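A rough model of the project check this commit relaxes. It assumes Codecov's project `threshold` is the allowed drop in percentage points relative to the base commit. `projectStatusPasses` is a hypothetical name. The -0.46 figure is the #4846 example from the message.

```typescript
// Hypothetical model: the project status passes when coverage has not dropped
// by more than thresholdPct percentage points compared with the base commit.
function projectStatusPasses(deltaPct: number, thresholdPct: number): boolean {
  return deltaPct >= -thresholdPct
}
```

With no threshold, `projectStatusPasses(-0.46, 0)` is false. With the 1% threshold, `projectStatusPasses(-0.46, 1)` is true. A genuine drop larger than one point still fails.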
@codecov

codecov Bot commented Jun 15, 2026


Codecov Report

❌ Patch coverage is 90.74074% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.45%. Comparing base (6b2a5c1) to head (4b98624).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/connectors/commentService.ts | 81.25% | 2 Missing and 1 partial ⚠️ |
| ...3000000_alter_report_reason_add_community_watch.js | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4849      +/-   ##
==========================================
+ Coverage   72.40%   72.45%   +0.04%     
==========================================
  Files        1054     1056       +2     
  Lines       20908    20960      +52     
  Branches     4515     4550      +35     
==========================================
+ Hits        15139    15186      +47     
+ Misses       5699     5300     -399     
- Partials       70      474     +404     
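The figures above can be checked by hand. This sketch makes two assumptions:

- Line coverage is hits / (hits + misses + partials).
- The result is truncated rather than rounded. This matches Codecov's default `round: down` as I understand it, and explains why 15139 / 20908 = 72.4077% appears as 72.40% rather than 72.41%.

The 49-of-54 patch split is inferred from "90.74074% with 5 lines missing"; the report does not state it. The function names are hypothetical.

```typescript
// Lines counted by the report: partial lines count against coverage.
function linesOf(hits: number, misses: number, partials: number): number {
  return hits + misses + partials
}

// Coverage % truncated to `precision` decimals. Scaling before the single
// division keeps every operand an exact integer, so float error cannot shift the floor.
function coveragePct(hits: number, lines: number, precision = 2): number {
  const scale = 10 ** precision
  return Math.floor((hits * 100 * scale) / lines) / scale
}

const baseCoverage = coveragePct(15139, linesOf(15139, 5699, 70)) // 72.4 (shown as 72.40%)
const headCoverage = coveragePct(15186, linesOf(15186, 5300, 474)) // 72.45
const patchCoverage = coveragePct(49, 54, 5) // 90.74074 (49 of 54 is inferred)
```

The same arithmetic shows the 5 missing patch lines: 2 + 1 partial in commentService.ts, plus 2 in the migration.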


@mashbean mashbean marked this pull request as ready for review June 15, 2026 13:48
@mashbean mashbean requested a review from a team as a code owner June 15, 2026 13:48
@mashbean mashbean merged commit 89ac7bf into master Jun 15, 2026
4 of 5 checks passed

1 participant