release(spam-only): comment spam defense → production (excludes 七日書) by mashbean · Pull Request #4849 · thematters/matters-server

mashbean · 2026-06-15T13:37:14Z

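A dedicated release for comment spam defense, cherry-picked from develop and excluding 七日書 (campaign-discussion #4841 / quote-wall #4842), so that spam defense is not tied to the 七日書 schedule (the user signed off on plan B on 2026-06-15).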

Contents (cherry-picked from develop, comment spam only)

feat(comment): dedicated comment spam model endpoint #4838: routes comment scoring to the dedicated comment model
feat(comment): auto-collapse spam comments behind env flag #4843: comment auto-collapse (env flag, off by default)
fix(report): allow community_watch_* reasons in DB check constraint #4844: fixes the report_reason DB constraint (includes a migration)
feat(spam): capture moderated comments as training samples (axis-2 L2) #4846: L2 training sample capture (enqueueSpamSample → SQS)
Corresponding tests + codecov.yml (project threshold 1%)

Excluded

❌ campaign-discussion / quote-wall (七日書) and their migration / schema changes
Verified: relative to master, this branch only touches spam files, commentService has no campaign references, and schema/types remain exactly as on master.

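Nature

Comment spam does not change the GraphQL schema (pure service logic + migration + env), so it can be extracted cleanly.
Kept as a draft for now: once both comment and article spam are ready, switch to Ready → merge (which triggers the prod deployment). If 七日書 is also ready by then, everything can instead ship in one batch via develop → master (release: develop → production (留言防治 + quote-wall + campaign-discussion) #4848); choose one of the two.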

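After deployment

ops sets MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true (only needed to enable collapsing)
the report_reason migration will run
L2 SQS/S3/worker are already provisioned, so sample capture starts on deploy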

🤖 Generated with Claude Code

Comments currently share the short-content (moment) spam model. Add a dedicated commentSpamDetectionApiUrl (MATTERS_COMMENT_SPAM_DETECTION_API_URL) so comments score against the comment-specific model (e5-small + logreg, recall 0.88 vs 0.68 on unseen templates). Falls back to the short-content URL when unset, so behaviour is unchanged until ops points it at the new endpoint. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
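The fallback described in this commit can be sketched as a small resolver. This is an illustration under assumptions, not the repository's code: the `SpamEndpoints` shape, the `shortContentSpamDetectionApiUrl` field name, and treating a blank value as unset are all hypothetical; only `commentSpamDetectionApiUrl` and `MATTERS_COMMENT_SPAM_DETECTION_API_URL` come from the commit message.

```typescript
// Sketch only: field names are assumptions, not the repository's config shape.
interface SpamEndpoints {
  // Assumed name for the existing short-content (moment) model URL.
  shortContentSpamDetectionApiUrl: string
  // Populated from MATTERS_COMMENT_SPAM_DETECTION_API_URL; may be unset.
  commentSpamDetectionApiUrl?: string
}

// Prefer the dedicated comment endpoint; fall back to the short-content one.
// Treating a blank string as "unset" is an assumption of this sketch.
function resolveCommentSpamUrl(endpoints: SpamEndpoints): string {
  const dedicated = endpoints.commentSpamDetectionApiUrl?.trim()
  return dedicated ? dedicated : endpoints.shortContentSpamDetectionApiUrl
}
```

With only the short-content URL configured, the resolver returns that URL, which matches the "behaviour is unchanged until ops points it at the new endpoint" claim.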

Comments are scored by the dedicated comment spam model (#4838) but nothing acts on the score yet. Articles already auto-demote via excludeSpam; this adds the comment equivalent, using the softer 'collapsed' state (folded but still expandable in-thread) per the '不刪除，只是不再被看見' governance principle. When MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true, detectSpam collapses an active comment whose spamScore reaches the tunable system spam threshold, skipping authors on the bypassSpamDetection whitelist (same carve-out as articles). Default off → scoring stays observe-only until ops opts in (zero-downtime, same rollout pattern as #4838). Collapse is reversible; no deletion. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
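The collapse rule in this commit reduces to a single predicate. The sketch below assumes a hypothetical `CollapseInput` shape and function name; the actual `detectSpam` implementation in `commentService` is not shown in this PR page and may be structured differently.

```typescript
// Sketch only: input shape and names are hypothetical.
interface CollapseInput {
  autoCollapseEnabled: boolean // i.e. MATTERS_COMMENT_SPAM_AUTO_COLLAPSE=true
  commentState: string // only 'active' comments are eligible
  spamScore: number | null // null when the comment has not been scored
  spamThreshold: number | null // the tunable system spam threshold
  authorBypassesSpamDetection: boolean // bypassSpamDetection whitelist
}

// Returns true when an active comment should be moved to 'collapsed'.
function shouldAutoCollapse(input: CollapseInput): boolean {
  if (!input.autoCollapseEnabled) return false // default off: observe-only
  if (input.authorBypassesSpamDetection) return false // same carve-out as articles
  if (input.commentState !== "active") return false
  if (input.spamScore === null || input.spamThreshold === null) return false
  return input.spamScore >= input.spamThreshold // "reaches" means at or above
}
```

Keeping the decision as a pure function makes the three cases named in the tests (at or above threshold, below threshold, whitelisted author) easy to check without a database.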

The GraphQL ReportReason enum declares community_watch_porn_ad and community_watch_spam_ad, but report_reason_check (from 20231221154057) was never updated to permit them. submitReport inserts the raw reason, so any report with those values failed in production with a report_reason_check violation (INTERNAL_SERVER_ERROR) — surfaced by the coastguard bot's Tier-1 reports. Realign the DB constraint with the schema. communityWatchRemoveComment is unaffected (it syncs illegal_advertising, not these values). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
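One way to realign a check constraint with an enum is to drop it and recreate it from the full list of allowed values. The helper below is a sketch, not the migration in this PR: the table and column names passed in the usage note are guesses from the constraint name, and the reason list is deliberately partial (only the three values mentioned in this commit message).

```typescript
// Sketch only: builds the two PostgreSQL statements a migration could run.
function buildReasonCheckSql(
  table: string,
  column: string,
  reasons: readonly string[]
): string[] {
  const constraint = `${table}_${column}_check`
  // Quote each value as a SQL string literal, doubling embedded single quotes.
  const values = reasons.map((r) => `'${r.replace(/'/g, "''")}'`).join(", ")
  return [
    `ALTER TABLE ${table} DROP CONSTRAINT IF EXISTS ${constraint}`,
    `ALTER TABLE ${table} ADD CONSTRAINT ${constraint} CHECK (${column} IN (${values}))`,
  ]
}
```

For example, `buildReasonCheckSql("report", "reason", ["illegal_advertising", "community_watch_porn_ad", "community_watch_spam_ad"])` yields a drop statement followed by an add statement for `report_reason_check`. A real migration would pass the complete `ReportReason` list, which is not reproduced here.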

L2 of the spam-data-retention roadmap: emit de-identified labeled samples to SQS at the moderation boundary so the spam-model training signal survives later content deletion that L1's passive DB extraction can't recover (clearCommunityWatchOriginalContent nulls the snapshot, and account purge erases content).

- common/notifications/spamSample.ts: enqueueSpamSample, mirrors enqueueReportAlert (best-effort SQS, never throws, no-op when unconfigured). Ids are HMAC-SHA256(salt) at emit so no raw user/content ids enter the queue; only the text the model trains on is carried verbatim.
- wired: communityWatchRemoveComment (confirmed spam at removal), clearCommunityWatchOriginalContent (capture before the snapshot is nulled; reversed action -> hard-negative ham).
- env: MATTERS_AWS_SPAM_SAMPLE_QUEUE_URL, MATTERS_SPAM_SAMPLE_HASH_SALT.

A separate Lambda worker consumes the queue and appends de-identified rows to the S3 training bucket (see spam-detection-scaffold). Off until ops provisions the queue + salt.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
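The de-identification step can be sketched with Node's built-in `crypto` module. Everything except the HMAC-SHA256 hashing, the null score for ham, and the salt and blank-text guards is an assumption: the `SpamSample` field names and the `buildSpamSample` helper are hypothetical, and the SQS send itself is left out.

```typescript
import { createHmac } from "node:crypto"

// Hypothetical payload shape; the real field names are not shown in this PR.
interface SpamSample {
  commentHash: string
  authorHash: string
  text: string
  label: "spam" | "ham"
  score: number | null
}

// Keyed hash of an id, so the same id always maps to the same opaque token.
const hashId = (id: string, salt: string): string =>
  createHmac("sha256", salt).update(id).digest("hex")

// Returns null (nothing to enqueue) when the salt is unset or the text is blank.
function buildSpamSample(
  input: { commentId: string; authorId: string; text: string; label: "spam" | "ham"; score?: number },
  salt: string | undefined
): SpamSample | null {
  if (!salt) return null
  if (!input.text.trim()) return null
  return {
    commentHash: hashId(input.commentId, salt),
    authorHash: hashId(input.authorId, salt),
    text: input.text, // the only field carried verbatim
    label: input.label,
    score: input.label === "ham" ? null : input.score ?? null,
  }
}
```

Because the hash is keyed and deterministic, the worker can still group samples by author or comment without ever seeing a raw id.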

CI lint failed: #-subpath/external/node: imports are one alphabetized group with no blank lines. Reorder spamSample.ts imports accordingly.

…shot) The clear mutation now snapshots the action before nulling (axis-2 L2), so its test context must provide findCommunityWatchActionByUUID. enqueueSpamSample no-ops without queue/salt env, so nothing is sent in tests.

Mirror reportAlert.test.ts: payload shape, HMAC de-identification (ids hashed, never raw + deterministic), null score for ham, and no-op guards (queue unset / salt unset / blank text) + AWS-error swallowing. Brings spamSample.ts diff coverage to green.

CI test scripts only run build/{connectors,common/utils,routes,types}; the common/notifications dir has no script, so the standalone spamSample.test.ts never ran and spamSample.ts stayed at 38%. Remove that dead test and instead exercise enqueueSpamSample's full body from communityWatchRemoveComment.test (common/utils, which IS run): set the queue URL + hash salt, stub aws.sqsSendMessage, and assert a de-identified sample (hashed ids) is enqueued on removal.

Integration tests for the spam auto-collapse path: collapses an active comment at/above the system threshold, leaves it active below threshold, and skips bypassSpamDetection-whitelisted authors. Sets the spam_detection feature flag and toggles commentSpamAutoCollapse around each case. Raises diff coverage.

… project) spamSample.ts was at 76.9% (lines 66, 84-85 uncovered). Add two removal cases: aws throws -> removal still succeeds (covers the swallow/catch); removed comment has blank content -> sample skipped (covers the blank-text guard). Brings the file to ~full coverage so codecov/project no longer dips.

Total repo coverage fluctuates run-to-run (sharded integration suites) and codecov compares against the nearest ancestor with a coverage upload (develop merge commits publish none), so PRs show spurious project drops even when their own diff is 100% covered (e.g. #4846 at -0.46%). Add a 1% project threshold to absorb that noise; patch stays strict so new code must still be tested.

codecov · 2026-06-15T13:46:55Z

Codecov Report

❌ Patch coverage is 90.74074% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.45%. Comparing base (6b2a5c1) to head (4b98624).
⚠️ Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
src/connectors/commentService.ts	81.25%	2 Missing and 1 partial ⚠️
...3000000_alter_report_reason_add_community_watch.js	80.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4849      +/-   ##
==========================================
+ Coverage   72.40%   72.45%   +0.04%     
==========================================
  Files        1054     1056       +2     
  Lines       20908    20960      +52     
  Branches     4515     4550      +35     
==========================================
+ Hits        15139    15186      +47     
+ Misses       5699     5300     -399     
- Partials       70      474     +404


mashbean and others added 11 commits June 15, 2026 21:35

style: fix import/order in spamSample (eslint required check)

e24f06c


mashbean mentioned this pull request Jun 15, 2026

release: develop → production (留言防治 + quote-wall + campaign-discussion) #4848

Draft

mashbean marked this pull request as ready for review June 15, 2026 13:48

mashbean requested a review from a team as a code owner June 15, 2026 13:48

mashbean merged commit 89ac7bf into master Jun 15, 2026
4 of 5 checks passed

This was referenced Jun 15, 2026

chore: back-merge master → develop (dedupe spam release commits) #4850

Merged

deploy(comment-spam): 3-tier telegram alerting to prod (notify-only, curated) #4852

Merged

mashbean merged 11 commits into master from release/comment-spam-only
