ARIA: extend promotional sender filter (prefixes, domains, clickbait) #352
Open
holoduke wants to merge 4 commits into
Drift audit 2026-05-20 flagged that the "Active threads" section of the
working-memory prompt was being polluted by promotional/automated
streams (currently the AutoScout24 "Nieuwe matches voor je Zoekopdracht"
("New matches for your search") newsletter was the *only* active thread
shown). Real signal-to-noise on that section had dropped to 0%.
Defense in depth:
- working-memory.ts: introduce isNewsletterParticipant() and reject new
threads whose sender/chat matches noreply / no-reply / notifications. /
newsletter / savedsearches / mailings. / updates@ / bounce, or known
one-way notification domains (autoscout24, schoolkassa, rdw, anwb
notifications). Also sweep any pre-existing newsletter threads on every
update tick, which fixes the currently-stuck AutoScout24 entry.
- brain-prompt.ts: filter active threads at render time using the same
helper, so even if a newsletter slips past the write-time guard via
another path, the prompt stays clean.
Intent-summary: Newsletter and automation senders were being promoted to "active conversation threads" in the working-memory prompt, crowding out real conversations Gillis is in.
Intent-tokens: newsletter, noise, active-threads, prompt-pollution, working-memory, automation-sender, filter
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
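The participant check and sweep described in this commit could look roughly like the sketch below. The pattern values come from the commit message; the constant names, the Thread shape, sweepThreads(), and the choice to match domains as whole DNS labels are assumptions, since the diff itself is not shown here. The "anwb notifications" entry is left out because its exact form is not stated.

```typescript
// Values taken from the commit message; constant names and the exact
// matching rules are assumptions made for this sketch.
const NEWSLETTER_SUBSTRINGS: readonly string[] = [
  "noreply", "no-reply", "notifications.", "newsletter",
  "savedsearches", "mailings.", "updates@", "bounce",
];

// Matched as whole DNS labels so that "rdw" does not hit "hardware.example".
const NOTIFICATION_DOMAIN_LABELS: readonly string[] = [
  "autoscout24", "schoolkassa", "rdw",
];

function isNewsletterParticipant(participant: string): boolean {
  const p = participant.toLowerCase();
  if (NEWSLETTER_SUBSTRINGS.some((s) => p.includes(s))) return true;
  // Pull the host out of "Name <user@host>" or a bare "user@host".
  const domain = p.match(/@([a-z0-9.-]+)/)?.[1];
  if (!domain) return false;
  const labels = domain.split(".");
  return NOTIFICATION_DOMAIN_LABELS.some((d) => labels.includes(d));
}

// Hypothetical thread shape, for illustration only.
interface Thread {
  participant: string;
  topic: string;
}

// Sweep step: drop pre-existing newsletter threads on every update tick.
function sweepThreads(threads: Thread[]): Thread[] {
  return threads.filter((t) => !isNewsletterParticipant(t.participant));
}
```

Matching short domain tokens as plain substrings would be cheaper to write, but a three-letter token such as "rdw" also occurs inside unrelated hostnames, which is why this sketch compares whole labels instead.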
…agent run summaries
The reflect-tick commitment-review surface was mining intra-run scratch
text from sa_moltbook sub-agent run summaries/details (e.g. "I'll reply
to the 6 highest-signal comments", "let me write a helper that handles
verification") and surfacing them as personal commitments needing
follow-through. Those phrases were sub-agent self-narration about
actions it already executed within that same run, not promises to a
human channel.
Fix in buildCommitmentsBlock (backend/brain-prompt.ts): the Moltbook
activity coming from getRecentMoltbookActivity() is sourced entirely
from sub-agent run summary/details fields, so stop running
extractAndClassifyCommitments() over it. Still show the activity for
context, but explicitly label the section as "already executed, NOT
personal commitments" and add an action-line note telling reflect not
to treat those phrases as promises.
extractAndClassifyCommitments() is still applied to
recentOutgoingActivity (whatsapp DMs, email, brain messages to human
channels), where the audience is actually a human and commitment
language is meaningful.
Intent-summary: phantom commitments were being surfaced from sub-agent intra-run narrative text because the commitment extractor did not distinguish sub-agent task transcripts from real human-channel messages.
Intent-tokens: subagent, commitment, attribution, phantom, moltbook, transcript, narration
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
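As a rough illustration of the split described in this commit, the sketch below runs a commitment extractor only over human-channel activity and renders sub-agent activity as labeled context. The ActivityEntry shape, the section headings, and extractCommitments() are hypothetical stand-ins: the real extractAndClassifyCommitments() signature and logic are not shown in the source, and the regex here is only a toy.

```typescript
// Hypothetical entry shape; the real types in backend/brain-prompt.ts
// are not shown in the source.
interface ActivityEntry {
  text: string;
}

// Toy stand-in for extractAndClassifyCommitments(): flags first-person
// promise phrasing. The real extractor is assumed to be more elaborate.
function extractCommitments(entries: ActivityEntry[]): string[] {
  const promise = /\b(i'll|i will|let me)\b/i;
  return entries.filter((e) => promise.test(e.text)).map((e) => e.text);
}

function buildCommitmentsBlock(
  recentOutgoingActivity: ActivityEntry[],
  moltbookActivity: ActivityEntry[],
): string {
  const lines: string[] = [];

  // Human channels: commitment language is meaningful, so extract it.
  lines.push("Commitments made to human channels:");
  for (const c of extractCommitments(recentOutgoingActivity)) {
    lines.push(`- ${c}`);
  }

  // Sub-agent run summaries: shown for context only, never extracted.
  lines.push("Moltbook activity (already executed, NOT personal commitments):");
  for (const m of moltbookActivity) {
    lines.push(`- ${m.text}`);
  }
  return lines.join("\n");
}
```

The point of the design is that the source of a sentence, not its wording, decides whether it can count as a promise: "I'll reply to the 6 highest-signal comments" is identical in form to a real commitment and only its origin in a run summary marks it as narration.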
…, 150 chars
The commitment-review block in buildCommitmentsBlock() listed all ~10
sa_moltbook sub-agent run summaries at 300 chars each. These are
explicitly context-only / non-actionable (per 75a8129), so the full
list wastes ~3KB of reflect-prompt budget every cycle without changing
any decision.
Slice to the 3 most recent entries and truncate each to 150 chars.
Non-actionable labeling is unchanged; display volume only.
Intent-summary: Non-actionable sub-agent run summaries flooded the reflect prompt, crowding out decision-relevant context with redundant noise.
Intent-tokens: prompt, noise, truncation, moltbook, reflect, context, verbosity
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
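The budget arithmetic behind this change is small enough to sketch: 10 entries at 300 chars is about 3,000 chars per cycle, while 3 entries at 150 chars is at most 450. The helper below is an illustrative assumption, not the code from the commit; compactSummaries() and the constant names are hypothetical, and it assumes the input list is already ordered most recent first.

```typescript
const MAX_ENTRIES = 3; // keep only the most recent summaries
const MAX_CHARS = 150; // per-entry display limit

// Assumes `summaries` is ordered most recent first.
function compactSummaries(summaries: string[]): string[] {
  return summaries
    .slice(0, MAX_ENTRIES)
    .map((s) => (s.length <= MAX_CHARS ? s : s.slice(0, MAX_CHARS)));
}

// Total characters the block would spend on these entries.
function totalChars(entries: string[]): number {
  return entries.reduce((sum, s) => sum + s.length, 0);
}
```

If the activity list is actually ordered oldest first, the slice would need to take the tail instead (for example `summaries.slice(-MAX_ENTRIES)`).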
Today an AliExpress promo blast ("AliExpress <promotion@aliexpress.com>:
Uw voertuig wacht op u") still showed up as an active thread in the
working-memory prompt. The newsletter filter from 942b427 didn't catch
it because (a) the gmail-side PROMOTIONAL_SENDER_PATTERNS was a tiny
allowlist of exact regexes and (b) the working-memory NEWSLETTER_*
patterns didn't cover generic promotional mailbox prefixes or known
bulk-marketing domains.
Extensions:
- backend/integrations/gmail.ts — isPromotionalSender now also matches
strong promotional local-parts (promotion / promotions / marketing /
newsletter / news / deals / offers / mailing), known bulk-marketing
domains (aliexpress.com, temu.com, shein.com, wish.com, banggood.com,
matched with subdomain support), and weak prefixes (info / hello / hi)
*only when* the subject is clickbait. The clickbait subject patterns
cover both NL and EN: "wacht op u", "klik hier", "X% korting",
"laatste kans", "limited time", "act now", "alleen vandaag", etc.
Subject extraction was moved above the filter so it can be passed in.
- backend/memory/working-memory.ts — NEWSLETTER_SUBSTRINGS now also
matches promotion@ / promotions@ / marketing@ / news@ / newsletter@ /
deals@ / offers@ / info@ / mailing@ prefixes inside participant
strings, and NEWSLETTER_DOMAINS adds the same bulk-marketing
retailers. New isClickbaitTopic() helper applies the same clickbait
patterns to thread topics (first 60 chars of obs text, which for
emails starts with "[EMAIL] Subject: …"), used both at write time
and in the sweep step so already-stuck promo threads get evicted on
the next update tick.
- backend/brain-prompt.ts — render-time active-threads filter also
drops threads whose topic matches a clickbait pattern, even if the
sender slipped past the participant filter.
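The tiered sender check described for gmail.ts in the first bullet could be sketched as follows. The local-parts, domains, and sample clickbait phrases are taken from the commit message; the constant names, the parseAddress() helper, the two-argument isPromotionalSender() signature, and the concrete regexes are assumptions, because the actual diff is not shown.

```typescript
// Lists taken from the commit message; names and regexes are assumptions.
const STRONG_LOCAL_PARTS = [
  "promotion", "promotions", "marketing", "newsletter",
  "news", "deals", "offers", "mailing",
];
const WEAK_LOCAL_PARTS = ["info", "hello", "hi"];
const BULK_MARKETING_DOMAINS = [
  "aliexpress.com", "temu.com", "shein.com", "wish.com", "banggood.com",
];
const CLICKBAIT_SUBJECT_PATTERNS: RegExp[] = [
  /wacht op u/i, /klik hier/i, /\d+\s?%\s?(korting|off)/i,
  /laatste kans/i, /limited time/i, /act now/i, /alleen vandaag/i,
];

function isClickbaitSubject(subject: string): boolean {
  return CLICKBAIT_SUBJECT_PATTERNS.some((re) => re.test(subject));
}

// Hypothetical helper: extracts local-part and domain from a From header.
function parseAddress(from: string): { local: string; domain: string } | null {
  const m = from.toLowerCase().match(/([a-z0-9._%+-]+)@([a-z0-9.-]+)/);
  return m ? { local: m[1], domain: m[2] } : null;
}

function isPromotionalSender(from: string, subject: string): boolean {
  const addr = parseAddress(from);
  if (!addr) return false;
  // 1. Strong promotional local-parts are rejected outright.
  if (STRONG_LOCAL_PARTS.includes(addr.local)) return true;
  // 2. Bulk-marketing domains, including any subdomain.
  const onBulkDomain = BULK_MARKETING_DOMAINS.some(
    (d) => addr.domain === d || addr.domain.endsWith("." + d),
  );
  if (onBulkDomain) return true;
  // 3. Weak prefixes count only when the subject is clickbait.
  return WEAK_LOCAL_PARTS.includes(addr.local) && isClickbaitSubject(subject);
}
```

Requiring a leading dot in the suffix test keeps a lookalike such as nottemu.com from matching temu.com, and the third tier is what lets ordinary mail from a small business's info@ mailbox through.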
Defense in depth: same patterns applied at intake (gmail.ts),
thread-write (working-memory.ts), and prompt-render (brain-prompt.ts).
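A minimal sketch of how the topic-based layers might fit together, covering only the clickbait-topic check (the participant filter is omitted). The isClickbaitTopic() name and the 60-character topic come from the commit message; the ConversationThread shape, topicOf(), updateThreads(), renderActiveThreads(), and the pattern list are assumptions for illustration.

```typescript
// Hypothetical thread shape; the real one in working-memory.ts is not shown.
interface ConversationThread {
  participant: string;
  topic: string;
}

// A few of the NL/EN patterns named in the commit message.
const CLICKBAIT_TOPIC_PATTERNS: RegExp[] = [
  /wacht op u/i, /klik hier/i, /laatste kans/i,
  /limited time/i, /act now/i, /alleen vandaag/i,
];

// The topic is the first 60 chars of the observation text, which for
// emails starts with "[EMAIL] Subject: ...".
function topicOf(obsText: string): string {
  return obsText.slice(0, 60);
}

function isClickbaitTopic(topic: string): boolean {
  return CLICKBAIT_TOPIC_PATTERNS.some((re) => re.test(topic));
}

// Thread-write layer: sweep stuck promo threads, then guard the new one.
function updateThreads(
  existing: ConversationThread[],
  incoming: ConversationThread | null,
): ConversationThread[] {
  const kept = existing.filter((t) => !isClickbaitTopic(t.topic));
  if (incoming && !isClickbaitTopic(incoming.topic)) kept.push(incoming);
  return kept;
}

// Prompt-render layer: filter again in case a thread arrived another way.
function renderActiveThreads(threads: ConversationThread[]): string {
  return threads
    .filter((t) => !isClickbaitTopic(t.topic))
    .map((t) => `- ${t.participant}: ${t.topic}`)
    .join("\n");
}
```

One limitation of matching on a truncated topic: a clickbait phrase that starts after the 60th character of the observation text is not seen by this layer, so the sender-based checks remain the primary guard.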
Note: observer.ts and history.ts were listed as candidate target files
but neither contains email-filtering logic — the real plumbing lives in
gmail.ts (intake) and working-memory.ts (active-threads). No changes
needed there.
Intent-summary: Promotional email blasts from generic mailbox prefixes (promotion@, marketing@) and mixed-use bulk-marketing domains (aliexpress.com, temu.com) were slipping past the newsletter-sender filter and showing up as active conversation threads in the prompt.
Intent-tokens: promotional, marketing, active-threads, prompt-pollution, clickbait, aliexpress, mailbox-prefix, filter
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Today an AliExpress promo blast (AliExpress <promotion@aliexpress.com>: Subject: Uw voertuig wacht op u, i.e. "Your vehicle is waiting for you") showed up as an "active thread" in the working-memory prompt. The newsletter filter from #341 (commit 942b427) didn't catch it: the gmail-side PROMOTIONAL_SENDER_PATTERNS was a tiny allowlist of exact regexes, and the working-memory NEWSLETTER_* lists didn't cover generic promotional mailbox prefixes or known bulk-marketing domains.
Extends the filter in three places (defense in depth):
- backend/integrations/gmail.ts: isPromotionalSender now also matches:
  - strong promotional local-parts (promotion@ / promotions@ / marketing@ / newsletter@ / news@ / deals@ / offers@ / mailing@),
  - known bulk-marketing domains (aliexpress.com, temu.com, shein.com, wish.com, banggood.com, with subdomain support),
  - weak prefixes (info@ / hello@ / hi@) only when the subject is clickbait, so real correspondence from a small business's info@ mailbox isn't dropped.
  Clickbait patterns cover NL+EN: wacht op u, klik hier, X% korting/off, laatste kans, limited time, act now, alleen vandaag, etc. Subject is now extracted before the filter so it can be passed in.
- backend/memory/working-memory.ts: NEWSLETTER_SUBSTRINGS and NEWSLETTER_DOMAINS extended with the same prefixes and retail domains. New isClickbaitTopic() helper applies the same clickbait patterns to thread topics. Applied at write-time (new threads) and in the sweep step (evicts already-stuck promo threads on the next tick, which fixes the current AliExpress entry without manual cleanup).
- backend/brain-prompt.ts: render-time active-threads filter also rejects topics matching clickbait patterns, even if a sender slips past the participant filter.
Why this layer matters
Active threads in the prompt are valuable context budget: they should only contain human conversations Gillis is currently engaged in. A promotional blast from promotion@aliexpress.com titled "Uw voertuig wacht op u" (he doesn't own that vehicle; it's pure clickbait) crowds out signal.
Notes
observer.ts and history.ts were listed as candidate target files but neither contains email-filtering logic. The real plumbing lives in gmail.ts (intake) and working-memory.ts (active-threads). No changes needed there.
Test plan
- npx tsc --noEmit from /app: clean
- … wm.conversationThreads on next update tick
- info@<localbusiness> correspondence (non-clickbait subjects) still flows through
🤖 Generated with Claude Code