
Remove email summarization from ReadEmailsTool and fix Zoho client issues #861

Closed
alifeinbinary wants to merge 44 commits into jaredlockhart:main from alifeinbinary:zoho-cleanup

Conversation

@alifeinbinary
Contributor

@alifeinbinary alifeinbinary commented Mar 25, 2026


ReadEmailsTool was running fetched emails through Ollama summarization, adding latency and losing detail. The agent already has the full email content in context and can answer questions directly.

Changes:

  • Remove OllamaClient and user_query params from ReadEmailsTool
  • Return raw email content joined with separators instead of summary
  • Remove ReadEmailsArgs Pydantic model (use kwargs directly)
  • Remove EMAIL_SUMMARIZE

This also modifies penny/commands/email.py and read_emails.py, FYI. I'm getting better summaries, faster, with this approach.

@alifeinbinary
Contributor Author

I'll fix the failing test and update the PR with the next commit.


penny-team bot left a comment


Hey Andrew, removing the Ollama summarization layer is a solid win — faster and the agent already has the content in context. The Zoho client changes have some regressions though. See inline comments for details. Consider splitting into two PRs (ReadEmailsTool cleanup vs Zoho changes).


penny-team bot left a comment


(see inline comments below)

- args = ReadEmailsArgs(**kwargs)
- email_ids = args.email_ids
+ """Read emails and return their content."""
+ email_ids = kwargs["email_ids"]


Project convention (from CLAUDE.md): "every Tool.execute(**kwargs) must validate through a Pydantic args model as its first line." ReadEmailsArgs should be kept:

args = ReadEmailsArgs(**kwargs)
email_ids = args.email_ids
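A minimal sketch of the convention being cited, assuming a ReadEmailsArgs model with a single email_ids field (the real model may carry more fields and validators):

```python
# Hypothetical sketch of the "validate through a Pydantic args model first"
# convention. ReadEmailsArgs and its field are assumptions from the snippet
# above, not the project's actual model.
from pydantic import BaseModel


class ReadEmailsArgs(BaseModel):
    email_ids: list[str]


def execute(**kwargs) -> list[str]:
    args = ReadEmailsArgs(**kwargs)  # first line: validate, raise on bad input
    return args.email_ids
```

With the model in place, a malformed tool call fails loudly at the boundary instead of surfacing as a KeyError deeper in the tool.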

"Dec",
]

class _HTMLTextExtractor(html.parser.HTMLParser):


This duplicates the shared penny/html_utils.py module which already has an identical _HTMLTextExtractor and strip_html. Please keep using from penny.html_utils import strip_html instead of copying the implementation here.

# Use 'entire:' for full-text search across all email content
# Quote if contains spaces for exact phrase matching
if " " in text:
search_parts.append(f'entire:"{text}"')


Removing the .replace('"', "") sanitization is a regression. If the LLM passes a query containing double quotes, it'll break the Zoho search syntax. The existing quote escaping should be preserved for text, from_addr, and subject.
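A sketch of the escaping the review asks to preserve. The helper name and exact Zoho syntax details here are assumptions; the point is that embedded double quotes must be stripped before the value is wrapped in quotes:

```python
def build_search_part(field: str, value: str) -> str:
    """Hypothetical sketch of quote-safe Zoho search-part building.

    Zoho's search syntax quotes multi-word phrases, so a stray double
    quote from an LLM-supplied value would break the query. Function
    name is illustrative, not the project's actual API.
    """
    value = value.replace('"', "")  # the sanitization the PR removed
    if " " in value:
        return f'{field}:"{value}"'  # quote for exact phrase matching
    return f"{field}:{value}"
```

Applied to text, from_addr, and subject, this keeps a query like `say "hello" world` from producing unbalanced quotes in the search string.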

if zoho_date:
search_parts.append(f"toDate:{zoho_date}")
search_parts.append(f"subject:{subject}")
# Note: Zoho date format is DD-MMM-YYYY (e.g., 12-Sep-2017)


This removes the ISO 8601 to Zoho date conversion (_to_zoho_date / _convert_to_zoho_date). LLMs naturally produce ISO dates like 2026-01-15, but now those will silently fail the _is_valid_zoho_date check and date filtering won't work. The conversion logic should be kept.
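A sketch of the conversion being defended, assuming ISO 8601 input and Zoho's DD-MMM-YYYY target format from the comment above (the real _convert_to_zoho_date may handle more shapes):

```python
from datetime import datetime


def to_zoho_date(value: str) -> str:
    """Convert ISO 8601 (2026-01-15) to Zoho's DD-MMM-YYYY (15-Jan-2026).

    Illustrative stand-in for the removed conversion helper; non-ISO
    input is passed through unchanged.
    """
    try:
        dt = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return value  # already Zoho-shaped or unparseable: pass through
    return dt.strftime("%d-%b-%Y")
```

Without this step, an LLM-produced `2026-01-15` fails the Zoho-format validity check and the date filter is silently dropped.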


return f"{folder_id}:{message_id}"

def _parse_email_id(self, email_id: str) -> tuple[str | None, str]:


_parse_email_id() is defined but never called anywhere. Dead code — hold it until the caller is ready.

return ""
try:
ts_int = int(ts)
from datetime import UTC, datetime


Moving from datetime import UTC, datetime from top-level into a method body is non-standard. Keep it at the top with the other imports.

if text_body and re.search(r"<[a-zA-Z][^>]*>", text_body):
text_body = strip_html(text_body)
# Strip HTML if content appears to be HTML
if text_body and "<" in text_body:


Changed from re.search(r"<[a-zA-Z][^>]*>", text_body) (matches actual HTML tags) to "<" in text_body (matches any less-than sign). This will false-positive on email content with math expressions, comparisons, or angle brackets. Keep the regex check.
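The difference is easy to demonstrate with the regex from the diff; the helper name below is illustrative:

```python
import re

# Matches an actual opening tag like <p> or <div class="x">,
# not a bare less-than sign.
_TAG_RE = re.compile(r"<[a-zA-Z][^>]*>")


def looks_like_html(text: str) -> bool:
    """Illustrative wrapper around the tag regex kept in the review."""
    return bool(_TAG_RE.search(text))
```

On `"price < 5 and x<y"` the regex check returns False, while the `"<" in text` check would wrongly trigger HTML stripping; on `"<p>Hello</p>"` both agree.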

- mock_agent_instance.run.assert_called_once_with(
-     "what packages am I expecting", max_steps=email_context.config.email_max_steps
- )
+ mock_agent_instance.run.assert_called_once_with("what packages am I expecting")


email.py line 67 still passes max_steps=context.config.email_max_steps to agent.run(), so this assertion will fail — the mock receives max_steps but the assertion doesn't expect it.
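A minimal sketch of why the changed assertion fails, using unittest.mock; the message text is from the diff above, and the max_steps value is a simplified stand-in for the config lookup:

```python
from unittest.mock import MagicMock

agent = MagicMock()
# Stand-in for email.py still calling agent.run with max_steps:
agent.run("what packages am I expecting", max_steps=5)

# The PR's new assertion omits max_steps, so it raises AssertionError:
try:
    agent.run.assert_called_once_with("what packages am I expecting")
    omitting_kwarg_failed = False
except AssertionError:
    omitting_kwarg_failed = True

# Mirroring every kwarg the code under test passes makes it pass:
agent.run.assert_called_once_with("what packages am I expecting", max_steps=5)
```

assert_called_once_with compares the full call signature, so every positional and keyword argument must match exactly.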

penny-team bot and others added 17 commits March 30, 2026 14:20
…liest (jaredlockhart#863)

_find_unrolled_weeks used get_recent(limit=1) which returns the most
recent daily entry, but treated it as the earliest. When the most recent
entry is in the current week, first_monday == current_monday and the
scan loop never executes — so no completed weeks are ever found.

- Add get_earliest() to HistoryStore (ASC ordering)
- Use get_earliest() in _find_unrolled_weeks
- Update test to seed current-week entries alongside past weeks

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
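The ordering bug above can be sketched with a plain list standing in for the SQL-backed HistoryStore (names mirror the commit message; everything else is simplified):

```python
# get_recent(limit=1) returns the MOST recent daily entry, but the week
# scan needs the EARLIEST. Seeding the scan from get_recent() when the
# latest entry is in the current week makes first_monday equal
# current_monday, so the completed-week loop never executes.
entries = ["2026-03-02", "2026-03-16", "2026-03-30"]  # daily entry dates


def get_recent(limit: int = 1) -> list[str]:
    return sorted(entries, reverse=True)[:limit]  # DESC ordering


def get_earliest(limit: int = 1) -> list[str]:
    return sorted(entries)[:limit]  # ASC ordering — what the scan needs
```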
…wns (jaredlockhart#864)

* Improve notification scoring, thinking distribution, and topic cooldowns

- Normalize novelty and sentiment scores to [0,1] via min-max scaling before
  applying weights, so both dimensions contribute proportionally instead of
  novelty dominating due to its ~4x larger raw range
- Add per-topic 24h notification cooldown: once a preference (or free thought)
  is notified, that topic is excluded from candidates for 24 hours
- Add MAX_UNNOTIFIED_THOUGHTS config param (default 20) — thinking agent skips
  cycles when unnotified thoughts reach the cap
- Replace random-roll thinking mode selection with distribution-based steering:
  compare actual free/seeded ratio against target probabilities and pick
  whichever type is underrepresented
- Add ThoughtStore.count_unnotified() and count_unnotified_free() queries
- Add THOUGHT_TOPIC_COOLDOWN_SECONDS constant (86400)
- 12 new tests covering normalization, cooldown, cap, and distribution logic
- All existing tests updated to monkeypatch probability constants for
  determinism independent of production values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move thinking distribution constants to runtime config params

FREE_THINKING_PROBABILITY and NEWS_THINKING_PROBABILITY are now runtime-
configurable via /config instead of hardcoded constants. The seeded
probability is implicit (1 - free - news). Tests pass probabilities
through make_config() instead of monkeypatching PennyConstants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
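The min-max scaling described above can be sketched as follows; the 0.5 fallback for a degenerate range is an assumption, not from the commit:

```python
def min_max(values: list[float]) -> list[float]:
    """Scale raw scores into [0, 1] so novelty and sentiment contribute
    proportionally after weighting. Sketch only; the production scorer
    may handle edge cases differently."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # assumed tie handling
    return [(v - lo) / (hi - lo) for v in values]


novelty = min_max([0.2, 1.8, 4.0])    # ~4x larger raw range
sentiment = min_max([0.4, 0.6, 0.9])
# Both dimensions now span [0, 1], so a weighted sum like
# 0.5 * novelty[i] + 0.5 * sentiment[i] treats them evenly.
```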
…ockhart#865)

* Move scoring weights to runtime config params (default 50/50)

NOVELTY_WEIGHT and SENTIMENT_WEIGHT are now runtime-configurable via
/config instead of hardcoded constants. Default changed from 40/60 to
50/50 for equal weighting now that normalization makes both dimensions
comparable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix thinking agent flooding logs when at unnotified cap

When MAX_UNNOTIFIED_THOUGHTS is reached, get_prompt returned None which
made execute_for_user return False. The scheduler treated that as "no
work" and retried every tick (~1s), flooding the log.

Move the cap check to execute_for_user and return True when skipping,
so the scheduler calls mark_complete and waits for the next interval.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rt#866)

Append -site: exclusions for blocked domains (facebook, instagram, tiktok)
to the Serper query so Google filters them server-side. Previously we only
filtered after download, so queries dominated by these domains returned no
image at all.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The THINKING_REPORT_PROMPT was producing thoughts framed as corrections
or debunking ("it turns out X is NOT Y"), which sounds wrong in
spontaneous notifications where there's nothing to correct. Updated the
prompt to frame findings as standalone new discoveries and to discard
searches that only found something doesn't exist.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add topic context intro to notify prompt

Thought notifications were jumping straight into details without
establishing what the topic is, leaving the reader confused (e.g.,
"Kokoroko's new RSD-2026 vinyl..." with no mention that Kokoroko is
a band). Updated NOTIFY_SYSTEM_PROMPT to instruct the model to open
with a brief identifying phrase before diving into details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add full system prompt assertions for news and checkin notify modes

Extends the test coverage pattern to all three notification modes.
ThoughtMode already had a full prompt assertion; now NewsMode and
CheckinMode do too, catching structural drift in prompt composition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…redlockhart#871)

* Make thought notifications conversational instead of report-style

The thinking report prompt produces structured content (bullets, headers,
tables) which the notify agent was regurgitating verbatim. Changed the
instruction from "Share what's in it — the thought IS the substance"
to "Retell it conversationally — no bullet lists, no headers, no tables"
so notifications read like a friend explaining what they found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix flaky schedule test by polling for expected message content

Replace wait_for_message (returns last message, vulnerable to race
conditions) with wait_until + _has_message pattern that polls for
the specific expected content. This matches the convention used by
the rest of the test suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lockhart#872)

* Steer thinking agent away from troubleshooting/support content

The thinking agent was searching for bug reports and support articles
(e.g., "UAD plugin glitch") and surfacing them as interesting
discoveries. Added guidance to look for releases, creative work, and
discoveries while avoiding troubleshooting guides and bug reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add casual greeting to proactive notifications

Notifications were jumping straight into content without a greeting.
Added "Start with a casual greeting" to NOTIFY_SYSTEM_PROMPT,
matching the pattern already used by the news notification prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aredlockhart#873)

NEWS_NOTIFY_MAX_STEPS was 1, but the agent base class strips tools on
the final step. With only 1 step, fetch_news could never execute —
the model's tool call was discarded as "hallucinated on final step"
and every news attempt produced an empty response that got
disqualified. Bumped to 3 steps so the model can call the tool and
format results.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
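Why max_steps=1 could never fetch news can be sketched with a toy agent loop; the loop shape is an assumption, not the project's actual agent base class:

```python
def run(max_steps: int) -> list[str]:
    """Toy agent loop: tools are stripped on the final step, so a 1-step
    run has no step where a tool call is allowed."""
    events = []
    for step in range(1, max_steps + 1):
        tools_available = step < max_steps  # final step: tools stripped
        events.append("tool_call" if tools_available else "final_answer")
    return events


run(1)  # only a final answer — fetch_news never executes
run(3)  # two tool-call steps, then a final answer with results in context
```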
…ckhart#874)

* Use thought title as image search fallback for notifications

When a thought notification has no tool calls (model retells thought
context directly), the image search fell back to using the full
message text, producing bad image results. Now ThoughtMode extracts
the first bold headline from the thought content as the image query
(e.g., "Bad Cat Era 30 – A Hand-Wired EL84 Head"), which is a much
better match for finding a relevant product/topic image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Strip generic prefixes from thought titles for image search

Thought titles like "Briefing: Tone King Royalist" or "Here is
something interesting I learned about the Vox AC15HWR1" had generic
prefixes that diluted image search results. Added _clean_thought_title
that strips common prefixes (Briefing:, Detailed Briefing:, etc.)
and filters out completely generic titles. Tested against 100 recent
thoughts: 97/100 produce good image queries after cleaning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Reduce fuzzy duplicate preference extraction

The preference extraction prompt was creating near-duplicate preferences
like "Tubesteader Eggnog user reviews" and "Tubesteader Eggnog 12AX7
pre-amp" when "Tubesteader pedals" already existed. These slipped past
both TCR and embedding dedup because short strings with slightly
different wording produce low similarity scores.

Added explicit guidance that asking about reviews, specs, or details of
a known item is engagement with the existing preference, not a new one.
Added a concrete example matching the observed failure pattern.

Dry-ran against the actual prompt that produced the duplicate — 3/3 runs
correctly classified it as existing instead of new.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Skip questions and tasks in preference extraction

The model was extracting questions and troubleshooting requests as
preferences (e.g., "Running preamp into front of amp", "preamp output
confusion", "pedals powered via XLink Out"). Added explicit guidance
to skip questions, tasks, and troubleshooting requests.

Dry-ran against 4 prompts that produced task preferences — all 4
previously-bad extractions are now suppressed or significantly reduced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Cache embeddings on thought and messagelog tables

Thoughts and outgoing messages were being re-embedded from scratch on
every dedup check and novelty comparison. Added embedding BLOB columns
to both tables so embeddings are computed once and reused.

- Migration 0014: adds embedding column to thought and messagelog
- ThinkingAgent: embeds and stores at thought creation time, uses
  cached embeddings in dedup (skips thoughts without embeddings)
- NotifyAgent: uses cached message embeddings for novelty scoring,
  backfills on first access
- Startup backfill job extended to populate thought embeddings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Embed messages at insert time and backfill at startup

Messages were being lazily backfilled in the notify agent on read.
Moved embedding to send_response (insert time) so every outgoing
message gets its embedding cached immediately. Added startup backfill
for existing messages without embeddings, and a test assertion that
thoughts get embeddings stored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The NOTIFY_NEWS prompt said "the source in parentheses" which the
model interpreted as the outlet name (e.g., "(New York Times)") rather
than the actual URL from the tool results. Changed to "the source URL
from the tool results" so URLs are included.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chat was instructed to "Focus on ONE topic per response" and "go deep"
which produced narrow answers that missed important angles (e.g.,
trauma/immune question only covered physical trauma, ignored PTSD).
Changed to "Go WIDE: cover as many angles as possible" with multiple
search queries and follow-up searches for comprehensive answers.

Thinking mode stays go-deep (autonomous exploration of one thread).
Chat mode is now go-wide (user wants the full picture).

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#879)

* Skip daily history entries already covered by weekly rollups

The history context was including both weekly rollups AND their
constituent daily entries, causing duplicate topics in the system
prompt. Now _format_daily_entries checks each day against the weekly
rollup date ranges and skips days that fall within a completed week.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add test for daily/weekly history overlap filtering

Verifies that daily entries within a weekly rollup's date range are
excluded from the history context, while daily entries outside the
range are still included.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changed THINKING_REPORT_PROMPT from structured report format (tables,
headers, 500 words) to conversational message format (casual greeting,
details, URL, 300 words). Thoughts are now stored in the shape they'll
be shared, cutting context size in half.

Loosened NOTIFY_SYSTEM_PROMPT to relay the thought as-is instead of
re-summarizing. Old prompt: "Retell it conversationally, no bullets/
headers/tables." New prompt: "Share it with the user, don't compress
or summarize, just relay in your own voice."

Tested end-to-end on 3 examples: new pipeline produces notifications
with equal or better detail than the original two-step process.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

The "No greetings, no sign-offs" rule was in PENNY_IDENTITY which is
shared by all agents, causing proactive notifications to skip greetings
even though the notify prompt said to include one. Moved the rule to
CONVERSATION_PROMPT so it only applies when responding to user messages.

Also removed the greeting from THINKING_REPORT_PROMPT since the notify
agent now handles greetings — the stored thought shouldn't include one.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
penny-team bot and others added 26 commits March 30, 2026 14:20
…redlockhart#882)

* Score thoughts by cached embedding before generating notification

Previously generated N candidates through the model, then scored them.
Now scores the raw thoughts using cached embeddings (novelty +
sentiment), picks the winner, then runs only the winner through the
notify agent. With NOTIFY_CANDIDATES=5, this cuts model calls from
5 to 1 per notification cycle.

Possible because thoughts are now stored in notification-ready shape
with pre-computed embeddings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add integration test for embedding-based thought scoring

Tests the full notification flow with 3 thought candidates: seeds DB
with preferences, thoughts with embeddings, and an incoming message,
then runs execute_for_user and asserts a notification was sent and
exactly 1 of 3 thoughts was marked notified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Assert on full call chain in embedding scoring test

Verify every edge of the score-then-generate flow:
- 1 Ollama chat call (winner only, not all candidates)
- 1 embed call (outgoing message at send time, not during scoring)
- 1 serper image search
- Message delivered via Signal
- 2 of 3 thoughts remain unnotified
- 1 thought marked notified in DB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Simplify image search to use thought content directly

The bold-title extraction and prefix-cleaning logic was built for
the old structured report format. With conversational thoughts, bold
titles are rare. Now uses first 300 chars of thought content as the
image query — the subject name consistently appears in the first
sentence or two, and serper is smart enough to extract it.

Removed dead code: _clean_thought_title, _is_generic_title,
_TITLE_STRIP_PREFIXES.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add thought title for dedup and image search

The thinking report prompt now emits a 'Topic: <title>' line that gets
parsed and stored separately. Titles are short (e.g., "Tubesteader
Beekeeper pedal") so they embed closely for duplicates and work well
as image search queries.

Key changes:
- Migration 0015: adds title column to thought table
- THINKING_REPORT_PROMPT: emits 'Topic: ...' on last line
- ThinkingAgent: parses title, embeds title (not content), stores both
- Thought dedup: now global (all thoughts, not per-preference) using
  TCR_OR_EMBEDDING on titles — catches cross-preference duplicates
- Image search: uses thought.title when available
- New runtime config: THOUGHT_DEDUP_TCR_THRESHOLD (default 0.6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Separate title and content embeddings on thoughts

Title embedding for dedup (short string, high discrimination),
content embedding for novelty/sentiment scoring (full message vs
messages/preferences). Both computed at creation time and cached.

- Added title_embedding column to thought table (migration 0015)
- ThinkingAgent stores both embeddings at creation
- Dedup uses title_embedding, scoring uses embedding (content)
- Added THOUGHT_DEDUP_TCR_THRESHOLD runtime config param (0.6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#884)

OR strategy produced false positives from common short words ("2026",
"AI", "agent") matching via TCR on short titles. Switched to AND
(both TCR >= 0.6 AND embedding >= 0.6 required) which eliminates
all false positives while catching real duplicates.

Also lowercase titles before embedding so casing doesn't affect
similarity (e.g., "THE GHOST IN THE SHELL" vs "Ghost in the Shell"
was 0.381, now 0.652 after lowercasing).

Lowered THOUGHT_DEDUP_EMBEDDING_THRESHOLD default from 0.80 to 0.60
since title embeddings score lower than full-content embeddings.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
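The AND rule can be sketched as below. The TCR stand-in (token overlap over the larger token set) and the precomputed embedding similarity are illustrative assumptions, not the project's actual implementations:

```python
def tcr(a: str, b: str) -> float:
    """Crude token-overlap stand-in for TCR; lowercased so casing
    doesn't affect the comparison."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta), len(tb))


def is_duplicate(title_a: str, title_b: str, emb_sim: float,
                 tcr_threshold: float = 0.6,
                 emb_threshold: float = 0.6) -> bool:
    # AND strategy: BOTH signals must clear their thresholds.
    return tcr(title_a, title_b) >= tcr_threshold and emb_sim >= emb_threshold
```

Short common words ("2026", "AI") can push TCR over the line on their own, but under AND the embedding gate still has to agree, which is what eliminated the false positives.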
…ed (jaredlockhart#896)

* Add browser extension with WebSocket server and dev tooling

Browser sidebar extension connects to Penny via WebSocket (echo-only for now).
Adds web-ext dev setup with auto-reload, exposes port 9090 from Docker,
and wires up BROWSER_ENABLED config to start the server alongside Signal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add multi-channel architecture with device routing and shared history

ChannelManager implements MessageChannel as a routing proxy — all agents,
scheduler, and commands interact with it instead of a single channel.
Messages from any device (Signal, browser) resolve to the same user
identity, giving full conversation continuity across channels.

New: Device table + DeviceStore, ChannelManager, BrowserChannel (full
MessageChannel), migration 0016, ChannelType enum, browser sidebar
device registration flow. 418 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add browser HTML formatting, image URLs, reconnect indicator, and single-user fix

BrowserChannel.prepare_outgoing converts markdown to HTML (bold, italic,
code, links, tables-to-bullets). Images use URLs via search_image_url
instead of base64 download, rendered as <img> tags prepended to messages.
Sidebar shows reconnecting spinner. Background agents use get_primary_sender
from UserInfo instead of mining MessageLog for user identity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Set up TypeScript, typed protocol, light/dark theme, and streamlined UI

Converts browser extension to TypeScript with strict mode. Shared
protocol.ts defines typed constants and discriminated unions for the
WebSocket protocol. CSS refactored to custom properties with
prefers-color-scheme for automatic light/dark support. Header removed,
status indicator is now a minimal dot at bottom-right of messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Persist chat history in browser local storage with smart scrolling

Messages stored in browser.storage.local (capped at 200) and rehydrated
on sidebar open. New messages scroll to show the top of the message;
rehydration jumps to bottom instantly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move WebSocket to background script, sidebar uses runtime messaging

Background script owns the server connection and persists across sidebar
open/close. Sidebar communicates via browser.runtime messaging with typed
RuntimeMessage protocol. Connection state synced on sidebar open via port.
Smart scroll: short messages anchor at bottom, long messages show top first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add browse_url tool with hidden tab, content extraction, and domain permissions

First browser tool: browse_url opens a hidden tab with full web engine and
user session, injects a content script to extract visible text, then the
server summarizes it in a sandboxed model call before the agent sees it.

Domain permission flow: unknown domains prompt the user via sidebar dialog,
decisions stored for future calls. Tool available dynamically to chat and
thinking agents when a browser is connected.

Protocol: tool_request/tool_response RPC over WebSocket with correlation IDs.
BrowserChannel resolves asyncio Futures when responses arrive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix single-user identity resolution for commands, reactions, and startup

Commands, reactions, and command logs now resolve device identifiers to the
primary user sender via _resolve_user_sender. Startup announcement uses
get_primary_sender and skips when no message history exists. Tests added
for user sender resolution and startup skip behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix /draw in browser by handling raw base64 and data URI attachments

_prepend_images now supports three attachment formats: HTTP URLs, data URIs,
and raw base64 (wrapped as data:image/png). Previously only HTTP URLs were
rendered, so /draw output was silently dropped in the browser sidebar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add active tab context injection for browser sidebar messages

Background script extracts visible text from the active tab on tab switch
and page load, holds it in a buffer, and attaches it to chat messages.
Server injects it into the chat agent's system prompt as a
"Current Browser Page" context section. Truncated to 5,000 chars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix scroll positioning by re-scrolling after image load

scrollIntoView fires before images render, so offsetHeight is wrong
for messages with images. Now re-scrolls on each img load event to
account for the final dimensions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Replace content extraction with Defuddle, inject page context as synthetic tool call

Content script now uses Defuddle for smart page extraction (strips nav,
sidebars, boilerplate) with CSS heuristic and TreeWalker fallbacks.
Bundled via esbuild since content scripts can't use imports.

Page context injected as a synthetic browse_url tool call + result in the
message history instead of system prompt. The model sees a pre-completed
tool exchange and answers from it directly. System prompt carries a minimal
hint (title + URL) to disambiguate "this page" references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add page context toggle, og:image extraction, and flush image styling

Sidebar shows current page title with checkbox to include page content.
Content script extracts og:image metadata. Responses to page-context
messages show the page image and "In response to" link inside the message
bubble. All images in Penny messages now render flush to bubble edges
with matching border-radius. Input disabled while waiting for response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update browser extension architecture doc with implementation status

Reflects all completed work: multi-channel architecture, device table,
browse_url tool, active tab context, Defuddle extraction, permission
flow, TypeScript protocol, page context toggle, and additional features
not in the original plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add thoughts feed page with new/archive tabs, image URLs, and modal viewer

Feed page renders thoughts as a card grid with images, titles, seed topic
bylines, and HTML-formatted content (via server-side prepare_outgoing).
New/Archive tabs split by notified_at. Clickable cards open a modal with
full content. Sidebar nav bar links to feed page.

image_url stored on Thought model at creation time. Startup backfill
populates existing thoughts in parallel batches. Migration 0017 adds
image_url column. Seed topic resolved from preference FK for bylines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add thought reactions, unnotified count, Font Awesome icons, and periodic polling

Thumbs up/down on feed cards and modal overlay — logs reaction as incoming
message with parent_id to synthetic outgoing (same pipeline as Signal
reactions for preference extraction), marks thought notified, fades card.

Font Awesome installed locally (no CDN). Sidebar nav shows unnotified
thought count. Background polls thoughts every 5 minutes for fresh count.
Reaction buttons float on card corners with hover color effects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Penny logo with transparent background, extension icons, and Signal avatar

penny.png made transparent and resized to 48px/96px for extension icons.
Added to README header. Signal profile picture set via signal-cli-rest-api
PUT /v1/profiles endpoint. New `make signal-avatar` target for setting it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Penny logo, SVG icons, thought reactions, feed polish, and image backfill

Logo: penny.svg traced from PNG via potrace, auto-cropped, rendered to
16/32/48/96px PNGs from SVG for crisp icons at all sizes. Added to
README, sidebar nav, feed page header, and extension manifest.

Feed: thumbs up/down reactions log to preference extraction pipeline,
Font Awesome icons (local), periodic thought polling, unnotified count
in sidebar nav, seed topic bylines, modal viewer with reactions,
server-side markdown-to-HTML for thought content.

Infrastructure: thought.image_url stored at creation time, startup
backfill for existing thoughts, migration 0017, make signal-avatar
target. 5-minute thought poll interval.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update architecture doc with feed page, reactions, logo, and new features

Documents feed page implementation (card grid, new/archive tabs, modal,
reactions pipeline, image URLs at creation time), logo/SVG workflow,
Font Awesome, thought count polling, and updated directory structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update CLAUDE.md with browser extension, multi-channel, and new commands

Documents browser extension directory structure, dev workflow, config
vars (BROWSER_ENABLED/HOST/PORT), make signal-avatar target, single-user
model, and design doc references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update README with browser extension, multi-channel, and feed page

Adds Browser Extension section documenting sidebar chat, active tab
context, browse_url tool, thoughts feed, and multi-device support.
Updates overview to mention browser channel and shared history.
Adds Firefox badge, browser config vars, and make signal-avatar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Use PageContext Pydantic model instead of raw dicts throughout

PageContext defined in channels/base.py (alongside IncomingMessage),
imported by browser/models.py. All page context references use typed
model attributes instead of dict.get() calls. Renamed abbreviated
variable names (ctx → context).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move inline imports to top level and batch seed topic query

All inline imports of penny modules moved to top-level imports.
Inline imports only remain for optional external packages (github_api)
inside try/except guards. Seed topic resolution uses batch get_by_ids
query instead of N individual queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Sanitize all web content at the BrowserChannel boundary

All page content from the browser is sanitized through a sandboxed model
call in BrowserChannel before reaching any downstream consumer. Both
browse_url tool responses and active tab context go through the same
_sanitize_page_content method — comprehensive rewrite preserving URLs,
structure, and details. BrowseUrlTool no longer does its own
summarization; it receives pre-sanitized content from the channel.

Single enforcement point: consumers can't accidentally bypass
sanitization because it happens at the channel boundary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move sanitize prompt and constants to proper files, add favicon, fix title color

PAGE_SANITIZE_PROMPT moved to Prompt class. TOOL_REQUEST_TIMEOUT and
MAX_PAGE_CONTENT_CHARS moved to PennyConstants. Feed page gets favicon
and black title instead of purple.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Start typing indicator before page content sanitization

Typing indicator now fires before the sandboxed summarization step so
the user sees immediate feedback while page content is being processed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Increase tool timeouts for browse_url + sanitization chain

Browser tool request timeout bumped from 30s to 60s. Overall tool
timeout bumped from 60s to 120s to accommodate the full chain:
browser round-trip + page load + content extraction + sanitization
model call. IMDB pages were timing out at 60s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add tests for page content sanitization and BrowseUrlTool passthrough

Tests cover: sandboxed sanitization happy path, fallback when no model
client, fallback on model failure, content truncation at max chars,
BrowseUrlTool returning pre-sanitized content directly, and empty
content handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Show newest thoughts first on the feed page

Added get_newest() method to ThoughtStore that returns newest-first
ordering. Feed page handler uses it instead of reversing get_recent().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Only recheck page context toggle when URL actually changes

Prevents background tab update events from resetting the toggle when
the user unchecked it on the same page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add TODO section to architecture doc for deferred work

Browse_url page headers, sender column cleanup, domain allowlist UI,
and tool rate limiting noted for future PRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lockhart#899)

* Add Likes/Dislikes tabs to browser extension sidebar

Adds two new tabs to the sidebar for managing preferences directly from
the browser. Each tab lists preferences with mention counts and an × to
delete, plus an input at the bottom to add new ones. The connection
status indicator is now in the nav bar so it's visible on all tabs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove sandboxed model summarization step for web page content

The sandboxed model call (40s on 20B) wasn't providing meaningful
security — domain allowlist and no-code-execution already close the
real attack surface. Small models (gemma3:1b, qwen2.5:1.5b) hallucinate
facts making them worse than passing through Defuddle-extracted content
directly. Defuddle already strips nav/boilerplate at the source.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…es (jaredlockhart#900)

* Store thought valence from reactions and filter thinking by preferences

Thumb reactions on thoughts now store valence (1/-1) directly on the
thought row instead of extracting a mention=1 preference. This cleans up
the preference table (which previously had noisy thought-title entries)
and provides a foundation for future thought-based scoring.

The thinking agent now gates new thought storage behind a mention-weighted
preference filter: if qualifying positive preferences exist (mention>1),
a thought must score >= 0 against them before being stored. Inactive
when no signal exists yet.

Notification scoring is simplified to pure novelty (no sentiment) since
the thought loop filter already gates on preference alignment.

Key changes:
- migration 0018: add thought.valence column
- ThoughtStore: set_valence() and get_valenced()
- similarity: replace compute_sentiment_score with compute_mention_weighted_sentiment
- BrowserChannel: store valence on thought, remove synthetic message creation
- HistoryAgent: route thought reactions to set_valence, mark processed immediately
- ThinkingAgent: _passes_preference_filter gates new thought storage
- NotifyAgent: pure novelty scoring (_select_most_novel @staticmethod)
- config_params: remove NOVELTY_WEIGHT and SENTIMENT_WEIGHT

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Backfill thought valence from existing reactions in migration

The migration now walks messagelog to find emoji reactions that point
to notification messages (thought_id IS NOT NULL) and sets the
corresponding thought.valence = 1 or -1. Only fills NULL valence
to avoid overwriting a later reaction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove reaction-based preference extraction from history agent

Preference extraction now runs only on text messages. Reactions are
processed solely for thought valence (set_valence on thought reactions)
and then marked as processed — no LLM call, no preference created.

Removes: ExtractedTopic, ExtractedTopics models, _extract_reaction_preferences,
_build_reaction_items, _extract_reaction_topics, _store_reaction_preferences,
_classify_reaction_emoji, and REACTION_TOPIC_EXTRACTION_PROMPT.

Replaces with: _process_reactions (thought valence only) + _emoji_to_int_valence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Wire PREFERENCE_MENTION_THRESHOLD into sentiment scoring

compute_mention_weighted_sentiment now takes an explicit min_mentions
parameter (no default) so the threshold is always sourced from config.
_passes_preference_filter reads PREFERENCE_MENTION_THRESHOLD and passes
it through, keeping seed-topic eligibility and sentiment filtering in sync.
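
A minimal sketch of a mention-weighted score with the explicit `min_mentions` parameter, assuming preferences reduce to (valence, mentions) pairs — the real scoring inputs are richer:

```python
# Sketch of a mention-weighted sentiment score with an explicit min_mentions
# threshold. The reduction to (valence, mentions) pairs is an assumption; the
# commit specifies only that min_mentions has no default and comes from config.
def compute_mention_weighted_sentiment(
    preferences: list[tuple[int, int]],  # (valence, mentions) per preference
    min_mentions: int,
) -> int:
    """Sum valences of preferences whose mention count exceeds the threshold."""
    return sum(v for v, mentions in preferences if mentions > min_mentions)
```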

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix has_signal gate to check any qualifying preference, add negative-only test

The preference filter gate was only checking positive preferences, meaning
thoughts would slip through unfiltered if a user had only negative prefs
qualifying for the mention threshold. Now checks any qualifying preference
(positive or negative), and adds a test confirming the filter activates with
negative-only qualifying prefs (score = 0 - 1 = -1 → filtered).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix stale docstring in _passes_preference_filter

Gate activates on any qualifying preference (positive or negative),
not just positive ones — updated after the has_signal fix in cae119f.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…hart#901)

- Browser sends a heartbeat to the server on every URL navigation,
  resetting the idle timer so proactive notifications are suppressed
  while the user is actively browsing.

- PeriodicSchedule gains requires_idle flag (default True). History
  and thinking agents set requires_idle=False so they run on their own
  wall-clock timers independent of user activity. Only NotifyAgent
  remains idle-gated.

- BackgroundScheduler.notify_activity() resets _last_message_time
  without touching schedule intervals, used by the heartbeat handler.

- Test fixtures suppress independent schedules via long intervals in
  DEFAULT_TEST_RUNTIME_OVERRIDES (previously the idle gate did this
  implicitly).
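
The heartbeat mechanism above could be sketched as follows. The method names mirror the commit (`notify_activity`, `_last_message_time`); the internals are assumed.

```python
import time

# Minimal sketch of the idle gate: notify_activity() resets the idle clock
# without touching schedule intervals. Field and method names follow the
# commit message; everything else here is an assumption.
class BackgroundScheduler:
    def __init__(self, idle_threshold: float) -> None:
        self.idle_threshold = idle_threshold
        self._last_message_time = time.monotonic()

    def notify_activity(self) -> None:
        """Called on each browser heartbeat; suppresses idle-gated schedules."""
        self._last_message_time = time.monotonic()

    def is_idle(self) -> bool:
        return time.monotonic() - self._last_message_time >= self.idle_threshold
```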

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The backfill fired search_image_url for every thought with a NULL
image_url on startup. On first deploy after migration 0017 it ran
565 concurrent Serper calls, exhausting the API quota and breaking
Signal notification images for the rest of the day.

All existing thoughts now have image_url set (NULL or empty string),
so the backfill was a no-op going forward. New thoughts get image_url
assigned at creation time via ThinkingAgent.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…aredlockhart#905)

* Add browser extension settings panel with icons, domains, and config

- Restructure nav into two-tier header: logo/title/thoughts-link/gear in
  top bar; Chat tab below. Thoughts is now a link button, not a tab.
- Add FontAwesome icons throughout sidebar and feed interaction points
- Add settings panel (gear icon) that takes over the sidebar:
  - Likes/Dislikes tabs (moved from main nav)
  - Domains tab: list, toggle allow/deny, delete, and add new entries
    from browser.storage.local — pure frontend, no backend needed
  - Config tab: all runtime ConfigParams rendered from live Python
    registry (key, description, type, current value, default); edits
    write to runtime_config DB via new config_request/config_update
    WebSocket messages; green toast confirms save
- Animated typing indicator (staggered dots) and two-tier nav CSS
- Fix feed card image corners clipping reaction buttons (border-radius
  on image directly instead of overflow:hidden on card)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Fix ruff import ordering in _handle_config_update

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Add tests for config_request and config_update browser channel handlers

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…lockhart#906)

Proactive messages (thoughts, news, check-ins) have no parent_id and
were being merged into large assistant blobs in the conversation context
window — up to 20K chars from a day's worth of notifications. They
don't belong there: they're already represented via the thought section
in the system prompt, and history rollups cover what was discussed.

Only user messages and direct replies (parent_id set) are now included
in get_messages_since. Conversation turns stay properly ordered since
threaded replies are always logged after the messages they reply to.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…t#907)

* Reflect tool calls to browser UI as live status updates

When Penny is in the agentic loop, tool calls are now surfaced to the
browser sidebar in real-time: "Searching for X...", "Reading https://...",
"Fetching news about Y..." etc., updating the typing indicator in-place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix typing indicator showing blank text before first tool call

BrowserOutgoing.content defaulted to "" which serialized as empty string
over the wire. JS nullish coalescing ("" ?? fallback) doesn't trigger on
empty string, so the sidebar rendered just "..." instead of "Penny is
thinking...". Changed to str | None = None so null coalescing works correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use tool class name attributes instead of hardcoded strings

Replace magic string literals in _format_tool_status and tests with
references to the tool class name attributes (SearchTool.name, etc.).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use class registration pattern for tool action strings

Tool subclasses auto-register via __init_subclass__ when they define a
name. Tool.format_status(tool_name, arguments) dispatches to the right
subclass's to_action_str() with no explicit list of tools anywhere.
BrowserChannel now only imports Tool.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove forward reference quotes from Tool registry type annotation

Ruff UP037 flagged the quoted "Tool" — unquoted since Python 3.14
supports self-referential class annotations at class body scope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…aredlockhart#908)

PeriodicSchedule and BackgroundScheduler were capturing interval and
idle_threshold values at construction time, so changes via /config
had no effect until restart. Changed both to accept Callable[[], float]
so values are read on each tick — consistent with the config system's
guarantee that changes take effect immediately.
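
The live-read pattern could look like this minimal sketch; the real `PeriodicSchedule` carries more state than shown:

```python
from typing import Callable

# Sketch of the fix: the schedule holds a Callable[[], float] and re-reads it
# on each tick, so /config changes take effect without restart.
class PeriodicSchedule:
    def __init__(self, interval: Callable[[], float]) -> None:
        self._interval = interval

    def current_interval(self) -> float:
        return self._interval()  # re-read the config value on every tick

config = {"interval": 60.0}
schedule = PeriodicSchedule(lambda: config["interval"])
```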

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…tch (jaredlockhart#910)

* Fix search hallucination: single-query tool with parallel agent dispatch

Root cause: SearchTool accepted a queries list and concatenated multiple
results into one tool message, which got truncated mid-content. The model
then hallucinated the rest from memory.

Fix: SearchTool.execute() now takes a single query: str. Parallelism moves
to the agent loop — _process_tool_calls uses asyncio.gather() to dispatch
all tool calls concurrently, then appends one tool message per result. This
matches Ollama's native parallel tool call protocol.
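
The dispatch pattern above could be sketched as follows, with the tool-call and message shapes simplified to plain dicts:

```python
import asyncio

# Sketch of the parallel dispatch: each tool call executes concurrently via
# asyncio.gather and yields its own tool message, never concatenated into a
# single truncatable blob. Real tool I/O replaces the sleep placeholder.
async def execute_tool(call: dict) -> str:
    await asyncio.sleep(0)  # placeholder for real tool I/O
    return f"result for {call['query']}"

async def process_tool_calls(calls: list[dict]) -> list[dict]:
    results = await asyncio.gather(*(execute_tool(c) for c in calls))
    # One tool message per result.
    return [{"role": "tool", "content": r} for r in results]

messages = asyncio.run(process_tool_calls([{"query": "a"}, {"query": "b"}]))
```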

Also rewrites CONVERSATION_PROMPT and THINKING_SYSTEM_PROMPT to be
tool-agnostic — search-specific language replaced with neutral equivalents
so the model uses the right tool (search, browse_url, etc.) for the job.

Adds _make_parallel_tool_calls_response to the mock and a new
TestParallelToolCalls test that verifies two tool calls in one turn produce
two separate tool messages in the next Ollama call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Rework MultiTool as fetch with single queries array and URL auto-routing

The previous MultiTool design used separate arrays (queries, urls, news)
and a complex inner-call schema that gpt-oss:20b couldn't reliably follow.
The model kept putting URLs in queries, inventing its own call formats,
or hedging by duplicating entries across arrays.

New design: single queries array — the model dumps everything in one list
and Python routes URLs to browse_url via regex, plain text to search.
This matches the pattern the model already learned from the original
single-query search tool.
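
The routing rule could be sketched like this; the regex itself is an assumption, since the commit only says URLs are auto-detected "via regex":

```python
import re

# Sketch of the single-list routing: URLs go to browse_url, plain text to
# search. The model dumps everything into one queries[] and Python routes.
URL_RE = re.compile(r"^https?://", re.IGNORECASE)

def route_queries(queries: list[str]) -> list[tuple[str, str]]:
    return [
        ("browse_url" if URL_RE.match(q) else "search", q)
        for q in queries
    ]
```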

Key changes:
- MultiTool renamed to "fetch" (avoids name collision with SearchTool)
- Schema simplified to just queries[] — URLs auto-detected and routed
- _create_search_tool returns SearchTool | None (was list for no reason)
- MAX_TOOL_RESULT_CHARS raised from 8k to 50k (web pages need room)
- Chat page context injection uses fetch format (was stale browse_url)
- Browser channel tool status shows cumulative checklist with checkmarks
- CONVERSATION_PROMPT kept tool-agnostic (tool descriptions do the work)
- browse_url retries full tab lifecycle up to 3x on empty content
- Tab load + tool timeouts raised to 60s for JS-heavy pages (e.g. IMDb)
- Test: two 15k-char results both survive into model context without truncation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Hide browse_url tabs with tabHide API

Tabs were visible in the tab bar during page reads because
active: false only prevents focus steal. Now calls
browser.tabs.hide() after creation with graceful fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Give ThinkingAgent its own MultiTool (max_calls=1)

Moves multi_tool support to the base Agent class so both ChatAgent and
ThinkingAgent use MultiTool for tool dispatch. ThinkingAgent gets its own
instance with max_calls=1 (matching the old single-query cap on main).
Both MultiTools share the same browse_url provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Enforce max_calls in MultiTool schema with per-instance maxItems

The model was sending multiple queries from ThinkingAgent because the
schema had no maxItems constraint. Now MultiTool sets description and
parameters per-instance based on max_calls, matching how SearchTool
on main advertised its cap via maxItems in the JSON schema.
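
The per-instance schema could be built roughly as follows; field names beyond `queries` and `maxItems` (which the commits mention) are assumptions:

```python
# Sketch of the per-instance schema: maxItems is derived from max_calls so
# the advertised JSON schema itself enforces the cap on each agent's tool.
class MultiTool:
    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.description = f"Fetch up to {max_calls} queries or URLs."
        self.parameters = {
            "type": "object",
            "properties": {
                "queries": {
                    "type": "array",
                    "items": {"type": "string"},
                    "maxItems": max_calls,  # schema-level cap per instance
                },
            },
            "required": ["queries"],
        }

chat_fetch = MultiTool(max_calls=5)      # ChatAgent's instance (cap assumed)
thinking_fetch = MultiTool(max_calls=1)  # ThinkingAgent's single-query cap
```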

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…aredlockhart#911)

Instead of forwarding text queries to Perplexity, MultiTool now routes
them to the browser as kagi.com/search?q=<query>. The browse_url tool
opens the page in a hidden tab and extracts the content.

Adds a Kagi-specific extractor to extract_text.ts that pulls structured
search results (title, URL, snippet) from .search-result elements,
running before Defuddle which would otherwise return nav chrome.

Also: tabHide permission restored, dead lastPageInfo removed from
sidebar, build-content.mjs probe cleanup.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…edlockhart#912)

Two bugs fixed:

1. _build_strong_nudge grabbed the first user message in conversation
   history instead of the last (current) question. When the agentic loop
   exhausted tool calls and fell back to the nudge, the model answered a
   prior question from history instead of the current one.

2. Kagi search results rendered via JS after page load. The generic
   200-char content threshold accepted page chrome (nav, footer) as valid
   content, returning empty search results. Kagi pages now use only the
   Kagi extractor with a ready flag — pollForContent retries until
   .search-result elements appear in the DOM.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aredlockhart#913)

- Default uncheck "send page content" toggle on URL change
- Tool status typing indicator: animated dots only on in-progress lines,
  completed lines show static checkmark
- Connection dot always visible: orange spinner when disconnected or
  reconnecting, green when connected (was hidden when disconnected)
- Wire on_tool_start callback for background thinking agent so tool
  activity is visible in the addon Penny uses for browsing
- browse_url retries up to 3 fresh tab loads when content extraction
  fails (not-ready pages), instead of returning garbage after one attempt
- Throw error when page never becomes ready instead of falling through

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix browser not registering connection on connect

The server only populated _connections when a chat message arrived from
the browser. If the addon connected but the user never sent a chat
message from it (e.g., using Signal instead), tool requests failed with
"No browser connected" even though the WebSocket was alive and the addon
showed a green connection dot.

Add a register message that the addon sends immediately after receiving
the status:connected response, including its device label. The server
populates _connections on register, making the browser available for
tool requests without requiring a chat message first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Serialize concurrent domain permission prompts

When multiple browse_url requests arrive in parallel (e.g., "read cbc,
bbc, and ap"), each unknown domain triggered a permission dialog
simultaneously. The sidebar dialog got clobbered — only the last
domain's request_id was wired to the buttons, so the first two prompts
never resolved. The user saw one prompt but all three URLs were allowed.

Fix: queue permission prompts via a promise chain so only one dialog
shows at a time. Each queued prompt re-checks the domain (a prior
prompt may have already resolved it via parent-domain matching) before
showing the dialog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each addon can enable/disable tool use via a toggle in the config tab.
The setting is stored locally and sent to the server as a
capabilities_update message on connect and on toggle. The server tracks
ConnectionInfo (ws, tool_use_enabled, last_heartbeat) per connection and
routes tool requests only to addons with tool_use enabled, preferring
the one with the most recent heartbeat.

- New protocol: capabilities_update (browser→server), tool_use_toggle
  and tool_use_state (sidebar↔background runtime messages)
- Server: ConnectionInfo dataclass replaces raw ServerConnection in
  _connections dict; _get_tool_connection filters by tool_use_enabled;
  has_tool_connection checks for any tool-capable addon
- Addon: toggle in config tab, persisted to browser.storage.local,
  state synced to sidebar on connect
- Tests: capabilities_update sets flag, has_tool_connection requires
  enabled, smart routing picks enabled addon, returns None when all
  disabled

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

- Wrench icon in header bar visible when tool use is enabled
- Same icon on the "Allow tool use" label in config tab
- Switch npm run dev to use penny-release Firefox profile

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#917)

Domain permissions now live in a server-side SQLite table instead of
browser.storage.local. All connected addons receive the full list on
connect and after every mutation, keeping them in sync. The local
storage is kept as a read cache for fast permission checks during tool
execution.

- New migration 0019: domain_permission table (domain, permission,
  created_at, updated_at)
- New DomainPermissionStore with get_all, check_domain (parent matching),
  set_permission (upsert), delete
- New protocol: domain_update and domain_delete (browser→server),
  domain_permissions_sync (server→browser broadcast)
- Server syncs full list on register and after every domain mutation
- Sidebar domains tab reads/writes via runtime messages through the
  server instead of directly to browser.storage.local
- Permission dialog approvals also sent to server for persistence
- DomainPermission SQLModel added to models.py

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lockhart#918)

Domain permission checks now happen server-side in send_tool_request
before dispatching to the addon. When a domain is unknown, the server
broadcasts a permission prompt to all connected addons and sends a
Signal message with emoji react (👍/👎). First response from any device
resolves the permission — the domain is stored, synced to all addons,
and remaining dialogs are dismissed.

- Server-side permission check in send_tool_request using urlparse for
  domain extraction
- Concurrent requests for the same domain share one future (dedup)
- BrowserPermissionPrompt/Dismiss/Decision protocol messages
- Signal reaction callback mechanism on SignalChannel for one-shot
  permission reactions
- Signal permission message deleted after resolution (remote delete)
- Client-side permission check removed from addon executeBrowseUrl —
  server is now authoritative
- Tests: 7 browser channel permission tests + 2 signal reaction
  callback tests
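
The same-domain dedup could be sketched with a shared future per domain; the prompt transport itself is elided and all names here are illustrative:

```python
import asyncio

# Sketch of permission-request dedup: concurrent requests for one domain
# share a single future, so only one prompt goes out and every waiter gets
# the same decision.
class PermissionBroker:
    def __init__(self) -> None:
        self._pending: dict = {}

    async def request(self, domain: str) -> bool:
        if domain in self._pending:
            return await self._pending[domain]  # piggyback on in-flight prompt
        future = asyncio.get_running_loop().create_future()
        self._pending[domain] = future
        try:
            return await future  # a prompt would be broadcast here
        finally:
            del self._pending[domain]

    def resolve(self, domain: str, allowed: bool) -> None:
        future = self._pending.get(domain)
        if future and not future.done():
            future.set_result(allowed)

async def _demo() -> list:
    broker = PermissionBroker()
    first = asyncio.create_task(broker.request("cbc.ca"))
    second = asyncio.create_task(broker.request("cbc.ca"))
    await asyncio.sleep(0)  # let both requests reach their await
    broker.resolve("cbc.ca", allowed=True)
    return await asyncio.gather(first, second)

decisions = asyncio.run(_demo())
```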

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#919)

The browser content script already extracts og:image from pages, but it
was dropped in formatResult. Now the addon downloads the image using the
browser's fetch API (which has session cookies for CDN auth), base64
encodes it, and sends it as a separate image field on the tool response.

The server threads it through SearchResult.image_base64 → MultiTool
(first image wins) → ControllerResponse.attachments → send_response.
When attachments are present, the Serper image search fallback is
skipped. Signal receives the base64 data URI directly.

This was necessary because CDN bot detection (e.g., Akamai on CBC)
blocks server-side image downloads via httpx — the browser's fetch has
the authenticated session and cookies needed to pass bot checks.

- Addon: downloadImageAsDataUri in browse_url.ts using browser fetch
- Addon: BrowseResult type with text + image fields
- Addon: sendToolResponse includes image field
- Server: BrowserToolResponse.image field (optional)
- Server: send_tool_request returns tuple[str, str | None]
- Server: BrowseUrlTool.execute returns SearchResult with image
- Server: MultiTool passes first image through combined result

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… Discord event handlers for reconnecting.

Adds extensive logging to debug Discord message reception issues:
- Log intents configuration, bot user ID, gateway latency on ready
- Add on_connect, on_disconnect, on_resumed gateway event handlers
- Log raw MESSAGE_CREATE gateway events via on_socket_raw_receive
- Log ALL messages in on_message before filtering (author, channel, content)
- Log filter decisions (own message, wrong channel) with [DIAG] prefix

Add validate_connectivity()
…sues

ReadEmailsTool was running fetched emails through Ollama summarization, adding latency and losing detail. The agent already has the full email content in context and can answer questions directly.

Changes:
- Remove OllamaClient and user_query params from ReadEmailsTool
- Return raw email content joined with separators instead of summary
- Remove ReadEmailsArgs Pydantic model (use kwargs directly)
- Remove EMAIL_SUMMARIZE
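
The new return path could be sketched as below; the email field names and separator string are assumptions, since the PR only specifies "raw email content joined with separators":

```python
# Sketch of the change: fetched emails are returned as raw content joined
# with separators instead of being run through Ollama summarization.
SEPARATOR = "\n\n---\n\n"

def format_emails(emails: list[dict]) -> str:
    if not emails:
        return "No emails found."
    return SEPARATOR.join(
        f"From: {e['sender']}\nSubject: {e['subject']}\n\n{e['body']}"
        for e in emails
    )
```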