
ARIA: extend promotional sender filter (prefixes, domains, clickbait)#352

Open
holoduke wants to merge 4 commits into main from aria/extend-promotional-filter


@holoduke

Owner

Summary

Today an AliExpress promo blast (AliExpress <promotion@aliexpress.com>: Subject: Uw voertuig wacht op u, Dutch for "Your vehicle is waiting for you") showed up as an "active thread" in the working-memory prompt. The newsletter filter from #341 (commit 942b427) didn't catch it: the gmail-side PROMOTIONAL_SENDER_PATTERNS was a short list of exact-match regexes, and the working-memory NEWSLETTER_* lists didn't cover generic promotional mailbox prefixes or known bulk-marketing domains.

Extends the filter in three places (defense in depth):

  • backend/integrations/gmail.ts: isPromotionalSender now also matches:

    • strong promotional local-parts (promotion@ / promotions@ / marketing@ / newsletter@ / news@ / deals@ / offers@ / mailing@),
    • known bulk-marketing domains (aliexpress.com, temu.com, shein.com, wish.com, banggood.com, with subdomain support),
    • weak prefixes (info@ / hello@ / hi@) only when the subject is clickbait — so real correspondence from a small business's info@ mailbox isn't dropped.
      Clickbait patterns cover NL+EN: wacht op u, klik hier, X% korting/off, laatste kans, limited time, act now, alleen vandaag, etc. Subject is now extracted before the filter so it can be passed in.
  • backend/memory/working-memory.ts: NEWSLETTER_SUBSTRINGS and NEWSLETTER_DOMAINS are extended with the same prefixes and retail domains. A new isClickbaitTopic() helper applies the same clickbait patterns to thread topics. It is applied at write time (new threads) and in the sweep step, which evicts already-stuck promo threads on the next tick and so fixes the current AliExpress entry without manual cleanup.

  • backend/brain-prompt.ts — render-time active-threads filter also rejects topics matching clickbait patterns, even if a sender slips past the participant filter.
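
The three matching tiers described for isPromotionalSender can be sketched roughly as follows. This is a minimal illustration, not the code in gmail.ts: the constant names, the parseAddress and domainMatches helpers, the (from, subject) signature, and the shortened pattern list are all assumptions made for the example.

```typescript
// Hypothetical sketch of the tiered sender check; names and lists are
// illustrative assumptions, not the actual contents of gmail.ts.
const STRONG_PREFIXES = new Set([
  "promotion", "promotions", "marketing", "newsletter",
  "news", "deals", "offers", "mailing",
]);
const WEAK_PREFIXES = new Set(["info", "hello", "hi"]);
const BULK_DOMAINS = [
  "aliexpress.com", "temu.com", "shein.com", "wish.com", "banggood.com",
];

// A subset of the NL+EN clickbait patterns named in the description.
const CLICKBAIT_PATTERNS: RegExp[] = [
  /wacht op u/i,
  /klik hier/i,
  /\d+\s?%\s?(korting|off)/i,
  /laatste kans/i,
  /limited time/i,
  /act now/i,
  /alleen vandaag/i,
];

function isClickbaitSubject(subject: string): boolean {
  return CLICKBAIT_PATTERNS.some((re) => re.test(subject));
}

// Accepts a bare address or the "Name <local@domain>" form.
function parseAddress(from: string): { local: string; domain: string } | null {
  const match = from.match(/<?([^<>\s@]+)@([^<>\s@]+)>?\s*$/);
  if (!match) return null;
  const [, local = "", domain = ""] = match;
  return { local: local.toLowerCase(), domain: domain.toLowerCase() };
}

// Exact match or any subdomain, e.g. "mail.aliexpress.com".
function domainMatches(domain: string, known: string): boolean {
  return domain === known || domain.endsWith("." + known);
}

function isPromotionalSender(from: string, subject: string): boolean {
  const addr = parseAddress(from);
  if (!addr) return false;
  // Tier 1: strong promotional local-parts are rejected outright.
  if (STRONG_PREFIXES.has(addr.local)) return true;
  // Tier 2: known bulk-marketing domains, including subdomains.
  if (BULK_DOMAINS.some((d) => domainMatches(addr.domain, d))) return true;
  // Tier 3: weak prefixes count only when the subject is clickbait.
  return WEAK_PREFIXES.has(addr.local) && isClickbaitSubject(subject);
}
```

Matching the domain with an exact-or-dot-suffix check rather than a plain substring test keeps an unrelated domain such as notaliexpress.com from being caught by accident.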

Why this layer matters

Active threads in the prompt are valuable context budget — they should only contain human conversations Gillis is currently engaged in. A promotional blast from promotion@aliexpress.com titled "Uw voertuig wacht op u" (he doesn't own that vehicle; it's pure clickbait) crowds out signal.

Notes

observer.ts and history.ts were listed as candidate target files but neither contains email-filtering logic — the real plumbing lives in gmail.ts (intake) and working-memory.ts (active-threads). No changes needed there.

Test plan

  • npx tsc --noEmit from /app — clean
  • After deploy: confirm the AliExpress thread is evicted from wm.conversationThreads on next update tick
  • Confirm legitimate info@<localbusiness> correspondence (non-clickbait subjects) still flows through
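
The two post-deploy checks could also be rehearsed locally with a small sketch like the one below. It assumes a simplified thread shape (participant plus topic) and a standalone sweepThreads function; both are hypothetical, as are the shortened lists. Only the names isNewsletterParticipant, isClickbaitTopic, NEWSLETTER_SUBSTRINGS and NEWSLETTER_DOMAINS come from this PR, and their real implementations may differ.

```typescript
// Hypothetical thread shape; the real type in working-memory.ts may differ.
interface ConversationThread {
  participant: string;
  topic: string;
}

// Illustrative subsets only, not the full lists from working-memory.ts.
const NEWSLETTER_SUBSTRINGS = [
  "noreply", "no-reply", "promotion@", "promotions@", "marketing@", "newsletter@",
];
const NEWSLETTER_DOMAINS = ["aliexpress.com", "temu.com", "shein.com"];
const CLICKBAIT_TOPIC_PATTERNS: RegExp[] = [
  /wacht op u/i, /klik hier/i, /laatste kans/i, /limited time/i, /act now/i,
];

function isNewsletterParticipant(participant: string): boolean {
  const p = participant.toLowerCase();
  return (
    NEWSLETTER_SUBSTRINGS.some((s) => p.includes(s)) ||
    NEWSLETTER_DOMAINS.some((d) => p.includes(d))
  );
}

function isClickbaitTopic(topic: string): boolean {
  return CLICKBAIT_TOPIC_PATTERNS.some((re) => re.test(topic));
}

// Assumed sweep step: keep only threads that pass both filters.
function sweepThreads(threads: ConversationThread[]): ConversationThread[] {
  return threads.filter(
    (t) => !isNewsletterParticipant(t.participant) && !isClickbaitTopic(t.topic),
  );
}

// Example input: one stuck promo thread and one legitimate info@ thread.
const conversationThreads: ConversationThread[] = [
  {
    participant: "AliExpress <promotion@aliexpress.com>",
    topic: "[EMAIL] Subject: Uw voertuig wacht op u",
  },
  {
    participant: "Bakkerij Jansen <info@bakkerij.example>",
    topic: "[EMAIL] Subject: Uw bestelling staat klaar",
  },
];

const remaining = sweepThreads(conversationThreads);
```

In this sketch the promo thread is dropped by both the participant check and the topic check, while the info@ thread with an ordinary subject survives.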

🤖 Generated with Claude Code

ARIA and others added 4 commits May 20, 2026 22:06
Drift audit 2026-05-20 flagged that the "Active threads" section of the
working-memory prompt was being polluted by promotional/automated streams
(currently the AutoScout24 "Nieuwe matches voor je Zoekopdracht" newsletter
was the *only* active thread shown). Real signal-to-noise on that section
had dropped to 0%.

Defense in depth:
- working-memory.ts: introduce isNewsletterParticipant() and reject new
  threads whose sender/chat matches noreply / no-reply / notifications. /
  newsletter / savedsearches / mailings. / updates@ / bounce, or known
  one-way notification domains (autoscout24, schoolkassa, rdw, anwb
  notifications). Also sweep any pre-existing newsletter threads on every
  update tick — fixes the currently-stuck AutoScout24 entry.
- brain-prompt.ts: filter active threads at render time using the same
  helper, so even if a newsletter slips past the write-time guard via
  another path, the prompt stays clean.

Intent-summary: Newsletter and automation senders were being promoted to "active conversation threads" in the working-memory prompt, crowding out real conversations Gillis is in.
Intent-tokens: newsletter, noise, active-threads, prompt-pollution, working-memory, automation-sender, filter

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…agent run summaries

The reflect-tick commitment-review surface was mining intra-run scratch
text from sa_moltbook sub-agent run summaries/details (e.g. "I'll reply
to the 6 highest-signal comments", "let me write a helper that handles
verification") and surfacing them as personal commitments needing
follow-through. Those phrases were sub-agent self-narration about
actions it already executed within that same run — not promises to a
human channel.

Fix in buildCommitmentsBlock (backend/brain-prompt.ts): the Moltbook
activity coming from getRecentMoltbookActivity() is sourced entirely
from sub-agent run summary/details fields, so stop running
extractAndClassifyCommitments() over it. Still show the activity for
context, but explicitly label the section as "already executed, NOT
personal commitments" and add an action-line note telling reflect not
to treat those phrases as promises.

extractAndClassifyCommitments() is still applied to
recentOutgoingActivity (whatsapp DMs, email, brain messages to human
channels), where the audience is actually a human and commitment
language is meaningful.

Intent-summary: phantom commitments were being surfaced from sub-agent intra-run narrative text because the commitment extractor did not distinguish sub-agent task transcripts from real human-channel messages.
Intent-tokens: subagent, commitment, attribution, phantom, moltbook, transcript, narration

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…, 150 chars

The commitment-review block in buildCommitmentsBlock() listed all ~10
sa_moltbook sub-agent run summaries at 300 chars each. These are
explicitly context-only / non-actionable (per 75a8129), so the full
list wastes ~3KB of reflect-prompt budget every cycle without changing
any decision. Slice to the 3 most recent entries and truncate each to
150 chars. Non-actionable labeling is unchanged; display volume only.

Intent-summary: Non-actionable sub-agent run summaries flooded the reflect prompt, crowding out decision-relevant context with redundant noise.
Intent-tokens: prompt, noise, truncation, moltbook, reflect, context, verbosity

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Today an AliExpress promo blast ("AliExpress <promotion@aliexpress.com>:
Uw voertuig wacht op u") still showed up as an active thread in the
working-memory prompt. The newsletter filter from 942b427 didn't catch
it because (a) the gmail-side PROMOTIONAL_SENDER_PATTERNS was a tiny
allowlist of exact regexes and (b) the working-memory NEWSLETTER_*
patterns didn't cover generic promotional mailbox prefixes or known
bulk-marketing domains.

Extensions:

- backend/integrations/gmail.ts — isPromotionalSender now also matches
  strong promotional local-parts (promotion / promotions / marketing /
  newsletter / news / deals / offers / mailing), known bulk-marketing
  domains (aliexpress.com, temu.com, shein.com, wish.com, banggood.com,
  matched with subdomain support), and weak prefixes (info / hello / hi)
  *only when* the subject is clickbait. The clickbait subject patterns
  cover both NL and EN: "wacht op u", "klik hier", "X% korting",
  "laatste kans", "limited time", "act now", "alleen vandaag", etc.
  Subject extraction was moved above the filter so it can be passed in.

- backend/memory/working-memory.ts — NEWSLETTER_SUBSTRINGS now also
  matches promotion@ / promotions@ / marketing@ / news@ / newsletter@ /
  deals@ / offers@ / info@ / mailing@ prefixes inside participant
  strings, and NEWSLETTER_DOMAINS adds the same bulk-marketing
  retailers. New isClickbaitTopic() helper applies the same clickbait
  patterns to thread topics (first 60 chars of obs text, which for
  emails starts with "[EMAIL] Subject: …"), used both at write time
  and in the sweep step so already-stuck promo threads get evicted on
  the next update tick.

- backend/brain-prompt.ts — render-time active-threads filter also
  drops threads whose topic matches a clickbait pattern, even if the
  sender slipped past the participant filter.

Defense in depth: same patterns applied at intake (gmail.ts),
thread-write (working-memory.ts), and prompt-render (brain-prompt.ts).

Note: observer.ts and history.ts were listed as candidate target files
but neither contains email-filtering logic — the real plumbing lives in
gmail.ts (intake) and working-memory.ts (active-threads). No changes
needed there.

Intent-summary: Promotional email blasts from generic mailbox prefixes (promotion@, marketing@) and mixed-use bulk-marketing domains (aliexpress.com, temu.com) were slipping past the newsletter-sender filter and showing up as active conversation threads in the prompt.
Intent-tokens: promotional, marketing, active-threads, prompt-pollution, clickbait, aliexpress, mailbox-prefix, filter

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>