release(spam-only): comment spam defense → production (excludes 七日書) #4849
Merged
Comments currently share the short-content (moment) spam model. Add a dedicated commentSpamDetectionApiUrl (MATTERS_COMMENT_SPAM_DETECTION_API_URL) so comments score against the comment-specific model (e5-small + logreg, recall 0.88 vs 0.68 on unseen templates). Falls back to the short-content URL when unset, so behaviour is unchanged until ops points it at the new endpoint. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
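The fallback described above could look roughly like the following. This is a minimal sketch, not the repository's code: `resolveCommentSpamUrl` and its parameters are hypothetical names, and the short-content URL is passed in as an argument because its env variable name is not given here.

```typescript
// Hypothetical helper: pick the endpoint used to score comments.
// Only MATTERS_COMMENT_SPAM_DETECTION_API_URL is taken from the commit message;
// everything else is an illustrative assumption.
function resolveCommentSpamUrl(
  env: Record<string, string | undefined>,
  shortContentUrl: string | undefined
): string | undefined {
  const dedicated = env['MATTERS_COMMENT_SPAM_DETECTION_API_URL']?.trim()
  // Unset or blank: keep the existing behaviour and use the short-content model.
  return dedicated ? dedicated : shortContentUrl
}
```

With the variable unset the function returns the short-content URL, so nothing changes until ops sets the new endpoint.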
Comments are scored by the dedicated comment spam model (#4838) but nothing acts on the score yet. Articles already auto-demote via excludeSpam; this adds the comment equivalent, using the softer 'collapsed' state (folded but still expandable in-thread) per the 'not deleted, just no longer seen' (不刪除,只是不再被看見) governance principle. When MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true, detectSpam collapses an active comment whose spamScore reaches the tunable system spam threshold, skipping authors on the bypassSpamDetection whitelist (same carve-out as articles). Default off → scoring stays observe-only until ops opts in (zero-downtime, same rollout pattern as #4838). Collapse is reversible; no deletion. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
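The collapse decision above can be sketched as a pure predicate. This is an illustration under stated assumptions: `shouldAutoCollapse` and the `CollapseInput` fields are hypothetical names, and the real detectSpam presumably reads the flag, threshold, and whitelist from its own services.

```typescript
// Hypothetical input shape; field names are assumptions, not the repo's types.
interface CollapseInput {
  autoCollapseEnabled: boolean // MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true
  commentState: string // only 'active' comments are eligible
  spamScore: number | null // score from the comment spam model
  threshold: number | null // tunable system spam threshold
  authorBypassesSpamDetection: boolean // bypassSpamDetection whitelist
}

function shouldAutoCollapse(input: CollapseInput): boolean {
  if (!input.autoCollapseEnabled) return false // default off: observe-only
  if (input.commentState !== 'active') return false
  if (input.authorBypassesSpamDetection) return false // same carve-out as articles
  if (input.spamScore === null || input.threshold === null) return false
  // "Reaches" the threshold: a score equal to the threshold also collapses.
  return input.spamScore >= input.threshold
}
```

Keeping the decision free of side effects makes the at-threshold, below-threshold, and whitelist cases easy to test separately from the database update.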
The GraphQL ReportReason enum declares community_watch_porn_ad and community_watch_spam_ad, but report_reason_check (from 20231221154057) was never updated to permit them. submitReport inserts the raw reason, so any report with those values failed in production with a report_reason_check violation (INTERNAL_SERVER_ERROR) — surfaced by the coastguard bot's Tier-1 reports. Realign the DB constraint with the schema. communityWatchRemoveComment is unaffected (it syncs illegal_advertising, not these values). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
L2 of the spam-data-retention roadmap: emit de-identified labeled samples to SQS at the moderation boundary so the spam-model training signal survives later content deletion that L1's passive DB extraction can't recover (clearCommunityWatchOriginalContent nulls the snapshot, and account purge erases content).

- common/notifications/spamSample.ts: enqueueSpamSample, mirrors enqueueReportAlert (best-effort SQS, never throws, no-op when unconfigured). Ids are HMAC-SHA256(salt) at emit so no raw user/content ids enter the queue; only the text the model trains on is carried verbatim.
- wired: communityWatchRemoveComment (confirmed spam at removal), clearCommunityWatchOriginalContent (capture before the snapshot is nulled; reversed action -> hard-negative ham).
- env: MATTERS_AWS_SPAM_SAMPLE_QUEUE_URL, MATTERS_SPAM_SAMPLE_HASH_SALT.

A separate Lambda worker consumes the queue and appends de-identified rows to the S3 training bucket (see spam-detection-scaffold). Off until ops provisions the queue + salt.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
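The de-identification step can be illustrated with Node's built-in `crypto` module. This is a sketch under assumptions, not the contents of spamSample.ts: `buildSpamSample`, `RawSample`, and `DeidentifiedSample` are hypothetical names, the payload fields are guesses, and the SQS send itself is left out.

```typescript
import { createHmac } from 'node:crypto'

type SpamLabel = 'spam' | 'ham'

// Hypothetical shapes; the real payload fields may differ.
interface RawSample {
  commentId: string
  authorId: string
  text: string
  label: SpamLabel
  score: number | null
}

interface DeidentifiedSample {
  commentIdHash: string
  authorIdHash: string
  text: string
  label: SpamLabel
  score: number | null
}

// Keyed hash: deterministic for a given salt, so rows can still be joined,
// but the raw id cannot be read back from the queue.
const hashId = (id: string, salt: string): string =>
  createHmac('sha256', salt).update(id).digest('hex')

function buildSpamSample(
  raw: RawSample,
  salt: string | undefined
): DeidentifiedSample | null {
  if (!salt) return null // no-op when the hash salt is not configured
  if (raw.text.trim() === '') return null // nothing to train on
  return {
    commentIdHash: hashId(raw.commentId, salt),
    authorIdHash: hashId(raw.authorId, salt),
    text: raw.text, // the only field carried verbatim
    label: raw.label,
    score: raw.label === 'ham' ? null : raw.score, // assumption: ham carries no score
  }
}
```

A caller would pass the value of MATTERS_SPAM_SAMPLE_HASH_SALT as `salt` and enqueue the result only when it is non-null, inside a try/catch so a queue failure never blocks the moderation action.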
CI lint failed: #-subpath/external/node: imports are one alphabetized group with no blank lines. Reorder spamSample.ts imports accordingly.
…shot) The clear mutation now snapshots the action before nulling (axis-2 L2), so its test context must provide findCommunityWatchActionByUUID. enqueueSpamSample no-ops without queue/salt env, so nothing is sent in tests.
Mirror reportAlert.test.ts: payload shape, HMAC de-identification (ids hashed, never raw + deterministic), null score for ham, and no-op guards (queue unset / salt unset / blank text) + AWS-error swallowing. Brings spamSample.ts diff coverage to green.
CI test scripts only run build/{connectors,common/utils,routes,types}; the
common/notifications dir has no script, so the standalone spamSample.test.ts
never ran and spamSample.ts stayed at 38%. Remove that dead test and instead
exercise enqueueSpamSample's full body from communityWatchRemoveComment.test
(common/utils, which IS run): set the queue URL + hash salt, stub
aws.sqsSendMessage, and assert a de-identified sample (hashed ids) is enqueued
on removal.
Integration tests for the spam auto-collapse path: collapses an active comment at/above the system threshold, leaves it active below threshold, and skips bypassSpamDetection-whitelisted authors. Sets the spam_detection feature flag and toggles commentSpamAutoCollapse around each case. Raises diff coverage.
… project) spamSample.ts was at 76.9% (lines 66, 84-85 uncovered). Add two removal cases: aws throws -> removal still succeeds (covers the swallow/catch); removed comment has blank content -> sample skipped (covers the blank-text guard). Brings the file to ~full coverage so codecov/project no longer dips.
Total repo coverage fluctuates run-to-run (sharded integration suites) and codecov compares against the nearest ancestor with a coverage upload (develop merge commits publish none), so PRs show spurious project drops even when their own diff is 100% covered (e.g. #4846 at -0.46%). Add a 1% project threshold to absorb that noise; patch stays strict so new code must still be tested.
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #4849 +/- ##
==========================================
+ Coverage 72.40% 72.45% +0.04%
==========================================
Files 1054 1056 +2
Lines 20908 20960 +52
Branches 4515 4550 +35
==========================================
+ Hits 15139 15186 +47
+ Misses 5699 5300 -399
- Partials 70 474 +404
This was referenced Jun 15, 2026
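Dedicated comment spam defense release, cherry-picked from develop and excluding 七日書 (campaign-discussion #4841 / quote-wall #4842), so that spam defense is not tied to the 七日書 schedule (the user signed off on plan B on 2026-06-15).
Contents (cherry-picked from develop, comment spam only)
codecov.yml (project threshold 1%)
Excluded
Nature
After deployment
MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true (only needed to enable collapsing)
🤖 Generated with Claude Code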