fix(channel): keep spam out of topic channel feeds #4864

Merged
mashbean merged 1 commit into master from codex/prod-channel-spam-exclusion
Jun 21, 2026

@mashbean

Contributor

Why

Topic channel feeds could surface spam articles because findTopicChannelArticles filtered only on article.is_spam and ignored rows with a high spam_score whose is_spam is still null. The channel assignment path also reset article.is_spam to false whenever an article was added to or re-enabled in a channel, which could override the moderation or model state.

This is especially risky after the article spam model was updated and deployed, because high spam_score should now be honored consistently in channel feeds.

What changed

  • Use the shared excludeSpam modifier in topic channel article queries, so unpinned channel feeds honor both is_spam and spam_score >= spam_threshold.
  • Stop clearing article.is_spam when assigning topic channels.
  • Update tests to lock the new behavior and cover the high-score channel exclusion case.
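
The exclusion rule described above can be sketched as a plain predicate. This is a minimal illustration, not the actual excludeSpam modifier: the ArticleSpamState shape, the isExcludedAsSpam name, and the 0.5 threshold are hypothetical, and the sketch assumes that a persisted is_spam = false is treated as ham and that a missing threshold (for example, the feature flag being off) disables score-based exclusion.

```typescript
// Hypothetical row shape; only the two columns named in this PR are modeled.
interface ArticleSpamState {
  isSpam: boolean | null // article.is_spam (null = no persisted decision)
  spamScore: number | null // article.spam_score
}

// Returns true when an article should be hidden from an unpinned channel feed.
function isExcludedAsSpam(
  article: ArticleSpamState,
  spamThreshold: number | null
): boolean {
  // A persisted spam decision always excludes the article.
  if (article.isSpam === true) return true
  // Assumption: a persisted "not spam" decision wins over the score.
  if (article.isSpam === false) return false
  // Assumption: without a threshold or a score, nothing is excluded by score.
  if (spamThreshold === null || article.spamScore === null) return false
  // No persisted decision: fall back to the model score.
  return article.spamScore >= spamThreshold
}

const feed: ArticleSpamState[] = [
  { isSpam: null, spamScore: 0.1 }, // kept: low score, no decision
  { isSpam: null, spamScore: 0.9 }, // excluded: high score, no decision
  { isSpam: true, spamScore: 0.2 }, // excluded: persisted spam decision
  { isSpam: false, spamScore: 0.9 }, // kept: persisted ham decision
]

const visible = feed.filter((a) => !isExcludedAsSpam(a, 0.5))
```

The second row is the case the old query missed: it filtered on is_spam alone, so a null decision with a high score passed through.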

Production audit SQL

with spam_threshold as (
  select value::numeric as threshold
  from feature_flag
  where name = 'spam_detection'
    and flag = 'on'
  limit 1
)
select
  tc.name as channel_name,
  count(*) as leaked_count,
  count(*) filter (where a.is_spam = false and a.spam_score >= st.threshold) as manual_false_high_score_count,
  count(*) filter (where a.is_spam is null and a.spam_score >= st.threshold) as null_high_score_count
from topic_channel_article tca
join topic_channel tc on tc.id = tca.channel_id
join article a on a.id = tca.article_id
cross join spam_threshold st
where tca.enabled = true
  and a.state = 'active'
  and a.channel_enabled = true
  and tc.name in ('生活', '書影音', '時事', '還有')
  and a.created_at >= now() - interval '7 days'
  and (
    a.is_spam = true
    or (a.is_spam is null and a.spam_score >= st.threshold)
    or (a.is_spam = false and a.spam_score >= st.threshold)
  )
group by tc.name
order by leaked_count desc;

Verification

  • npm run build
  • npm run lint
  • npx prettier --check src/connectors/channel/channelService.ts src/connectors/__test__/channelService/topicChannel.test.ts src/connectors/__test__/channelService/findTopicChannelArticles.test.ts

Connector Jest was attempted but blocked in this local environment because Postgres is not running, no MATTERS_PG_* env vars are set, and Docker is unavailable. The failure occurs during genConnections() setup before test assertions run.

@mashbean mashbean requested a review from a team as a code owner June 20, 2026 11:24
@mashbean mashbean force-pushed the codex/prod-channel-spam-exclusion branch from 72d5d4a to 19b7eba Compare June 20, 2026 14:08
@codecov

codecov Bot commented Jun 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.12%. Comparing base (89f67f5) to head (73dd7a6).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4864      +/-   ##
==========================================
- Coverage   73.12%   73.12%   -0.01%     
==========================================
  Files        1081     1081              
  Lines       21644    21641       -3     
  Branches     4735     4735              
==========================================
- Hits        15828    15825       -3     
  Misses       5339     5339              
  Partials      477      477              

@mashbean mashbean force-pushed the codex/prod-channel-spam-exclusion branch from 19b7eba to 73dd7a6 Compare June 20, 2026 14:55
@mashbean

Contributor Author

Hotfix exception approval recorded on 2026-06-21 Asia/Taipei.

Reason: production topic channel feeds can expose spam-scored articles, and channel assignment can clear existing spam state. User-facing topic channel surfaces including 生活, 書影音, 時事, and 還有 are affected.

SOP evidence:

  • Base branch: master
  • Head commit: 73dd7a6
  • Regression guard after rebasing onto latest origin/master: TARGET_AHEAD=0, HEAD_AHEAD=1, hot-zone empty, deletion-only sentinel scan clean
  • GitHub checks: WIP pass, build / build pass for push and pull_request runs, codecov/patch pass, codecov/project pass
  • Production action boundary: approval covers hotfix exception and merge path; post-merge production smoke should remain read-only unless a separate moderation mutation is explicitly approved.

@mashbean mashbean merged commit f5df615 into master Jun 21, 2026
5 checks passed
@mashbean

Contributor Author

Clarification after reviewing publicationService.ts line 627:

The automatic post-publication path already computes spam from score before channel classification and returns early when the article is spam, so high-score spam should not be sent to automatic channel classification from that path. _detectSpam persists spam_score; the _isSpam decision in post-processing is local unless another path has persisted article.is_spam.
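
The control flow described in this paragraph can be sketched as follows. This is not the code in publicationService.ts: runPostPublication and the PostPublicationHooks interface are hypothetical names, the sketch is synchronous for brevity while the real path is presumably asynchronous, and comparing with >= against the threshold is an assumption carried over from the feed query.

```typescript
// Hypothetical hooks standing in for the persistence and classification steps.
interface PostPublicationHooks {
  persistSpamScore: (score: number) => void
  classifyChannels: () => void
}

// Returns the local spam decision; classification runs only for non-spam.
function runPostPublication(
  score: number,
  threshold: number,
  hooks: PostPublicationHooks
): boolean {
  // The score itself is persisted (article.spam_score).
  hooks.persistSpamScore(score)
  // The decision is local to this run; article.is_spam is not written here.
  const isSpam = score >= threshold
  if (isSpam) {
    // Early return: spam never reaches automatic channel classification.
    return true
  }
  hooks.classifyChannels()
  return false
}

const steps: string[] = []
const hooks: PostPublicationHooks = {
  persistSpamScore: () => {
    steps.push('persist')
  },
  classifyChannels: () => {
    steps.push('classify')
  },
}

const spamResult = runPostPublication(0.9, 0.5, hooks) // classification skipped
const hamResult = runPostPublication(0.1, 0.5, hooks) // classification runs
```

Because only the score is persisted on this path, such an article ends up with is_spam still null, which is why the feed query has to check the score as well.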

The remaining production risks addressed by this PR are narrower and still valid:

  • topic channel feed queries must exclude is_spam IS NULL AND spam_score >= threshold for articles that reach topic_channel_article through other paths
  • admin/manual topic-channel writes and topic-channel feedback acceptance must not clear an already persisted article.is_spam = true to false
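
The second point can be illustrated with a write path that only touches the channel link and leaves the article's spam state alone. This is a sketch under stated assumptions: ArticleRow, ChannelLink, and assignToChannel are hypothetical names over in-memory data, not the channelService.ts implementation.

```typescript
// Hypothetical in-memory shapes for article and topic_channel_article rows.
interface ArticleRow {
  id: string
  isSpam: boolean | null
  spamScore: number | null
}

interface ChannelLink {
  articleId: string
  channelId: string
  enabled: boolean
}

// Adds or re-enables a channel link. The article is taken as Readonly, so
// this path cannot reset isSpam the way the old assignment code did.
function assignToChannel(
  links: readonly ChannelLink[],
  article: Readonly<ArticleRow>,
  channelId: string
): ChannelLink[] {
  const matches = (l: ChannelLink) =>
    l.articleId === article.id && l.channelId === channelId

  if (links.some(matches)) {
    // Re-enable the existing link instead of inserting a duplicate.
    return links.map((l) => (matches(l) ? { ...l, enabled: true } : l))
  }
  return [...links, { articleId: article.id, channelId, enabled: true }]
}

const flagged: ArticleRow = { id: 'a1', isSpam: true, spamScore: 0.95 }
const before: ChannelLink[] = [
  { articleId: 'a1', channelId: 'c1', enabled: false },
]

const reEnabled = assignToChannel(before, flagged, 'c1')
const added = assignToChannel(reEnabled, flagged, 'c2')
```

After both calls flagged.isSpam is still true, so the feed-side exclusion continues to hide the article even though it now has enabled channel links.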

This also means historical rows with is_spam = false AND spam_score >= threshold may need a separate read-only audit before any data correction, because the query intentionally treats persisted is_spam = false as ham.
