Remove email summarization from ReadEmailsTool and fix Zoho client issues #861

alifeinbinary wants to merge 44 commits into jaredlockhart:main
Conversation
ReadEmailsTool was running fetched emails through Ollama summarization, adding latency and losing detail. The agent already has the full email content in context and can answer questions directly.

Changes:
- Remove OllamaClient and user_query params from ReadEmailsTool
- Return raw email content joined with separators instead of a summary
- Remove ReadEmailsArgs Pydantic model (use kwargs directly)
- Remove EMAIL_SUMMARIZE
I'll fix the failing test and update the PR with the next commit.
Hey Andrew, removing the Ollama summarization layer is a solid win — faster and the agent already has the content in context. The Zoho client changes have some regressions though. See inline comments for details. Consider splitting into two PRs (ReadEmailsTool cleanup vs Zoho changes).
```diff
-        args = ReadEmailsArgs(**kwargs)
-        email_ids = args.email_ids
+        """Read emails and return their content."""
+        email_ids = kwargs["email_ids"]
```
Project convention (from CLAUDE.md): "every Tool.execute(**kwargs) must validate through a Pydantic args model as its first line." ReadEmailsArgs should be kept:

```python
args = ReadEmailsArgs(**kwargs)
email_ids = args.email_ids
```
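For reference, the convention can be sketched as follows. This is a simplified stand-in, not the project's actual tool class: the `execute` signature and return value here are illustrative only, and only the `ReadEmailsArgs` name comes from the PR.

```python
from pydantic import BaseModel, ValidationError


class ReadEmailsArgs(BaseModel):
    """Validated arguments for the tool (simplified stand-in)."""

    email_ids: list[str]


def execute(**kwargs) -> str:
    # Per the convention: validate through the Pydantic args model
    # as the first line, so malformed LLM tool calls fail loudly.
    args = ReadEmailsArgs(**kwargs)
    return f"reading {len(args.email_ids)} emails"
```

With validation in place, a bad call such as `execute(email_ids="not-a-list")` raises `ValidationError` instead of silently propagating a string where a list was expected.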
```diff
     "Dec",
 ]


+class _HTMLTextExtractor(html.parser.HTMLParser):
```
This duplicates the shared `penny/html_utils.py` module, which already has an identical `_HTMLTextExtractor` and `strip_html`. Please keep using `from penny.html_utils import strip_html` instead of copying the implementation here.
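For context, a minimal sketch of what such a shared helper typically looks like; the real `penny/html_utils.py` implementation may differ in details:

```python
from html.parser import HTMLParser


class _HTMLTextExtractor(HTMLParser):
    """Collects text nodes while ignoring tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(html_text: str) -> str:
    """Return the visible text content of an HTML fragment."""
    extractor = _HTMLTextExtractor()
    extractor.feed(html_text)
    return "".join(extractor.parts)
```

Keeping a single copy in `penny/html_utils.py` means fixes (e.g., entity handling) land in one place.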
```python
# Use 'entire:' for full-text search across all email content
# Quote if contains spaces for exact phrase matching
if " " in text:
    search_parts.append(f'entire:"{text}"')
```
Removing the `.replace('"', "")` sanitization is a regression. If the LLM passes a query containing double quotes, it will break the Zoho search syntax. The existing quote escaping should be preserved for `text`, `from_addr`, and `subject`.
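A minimal sketch of the sanitization being asked for, using a hypothetical `build_search_parts` helper; the real client builds more fields (`from_addr`, `subject`, dates) than shown here:

```python
def build_search_parts(text: str) -> list[str]:
    """Build Zoho search clauses with quote sanitization (sketch)."""
    search_parts: list[str] = []
    if text:
        # Strip embedded double quotes so they cannot terminate the
        # entire:"..." phrase early and break Zoho's search syntax.
        text = text.replace('"', "")
        if " " in text:
            # Quote multi-word queries for exact phrase matching.
            search_parts.append(f'entire:"{text}"')
        else:
            search_parts.append(f"entire:{text}")
    return search_parts
```

A query like `say "hello" world` then becomes the safe clause `entire:"say hello world"` instead of malformed syntax.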
```python
if zoho_date:
    search_parts.append(f"toDate:{zoho_date}")
search_parts.append(f"subject:{subject}")
# Note: Zoho date format is DD-MMM-YYYY (e.g., 12-Sep-2017)
```
This removes the ISO 8601 to Zoho date conversion (`_to_zoho_date` / `_convert_to_zoho_date`). LLMs naturally produce ISO dates like `2026-01-15`, but now those will silently fail the `_is_valid_zoho_date` check and date filtering won't work. The conversion logic should be kept.
|
|
```python
        return f"{folder_id}:{message_id}"

    def _parse_email_id(self, email_id: str) -> tuple[str | None, str]:
```
`_parse_email_id()` is defined but never called anywhere. Dead code; drop it until a caller exists.
```python
            return ""
        try:
            ts_int = int(ts)
            from datetime import UTC, datetime
```
Moving `from datetime import UTC, datetime` from the top level into a method body is non-standard. Keep it at the top with the other imports.
```diff
-        if text_body and re.search(r"<[a-zA-Z][^>]*>", text_body):
-            text_body = strip_html(text_body)
+        # Strip HTML if content appears to be HTML
+        if text_body and "<" in text_body:
```
Changed from `re.search(r"<[a-zA-Z][^>]*>", text_body)` (matches actual HTML tags) to `"<" in text_body` (matches any less-than sign). This will false-positive on email content with math expressions, comparisons, or angle brackets. Keep the regex check.
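To illustrate the difference, a small sketch using a hypothetical `looks_like_html` helper built around the same regex:

```python
import re

# Matches an actual opening tag: '<' followed by a letter, e.g. <p> or <div class="x">
_TAG_RE = re.compile(r"<[a-zA-Z][^>]*>")


def looks_like_html(text: str) -> bool:
    """True only when the text contains a real HTML tag,
    not merely a bare '<' character."""
    return bool(_TAG_RE.search(text))
```

On an email body like `3 < 5 and x > y`, the naive `"<" in text` check wrongly triggers HTML stripping, while the regex correctly leaves it alone.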
```diff
-        mock_agent_instance.run.assert_called_once_with(
-            "what packages am I expecting", max_steps=email_context.config.email_max_steps
-        )
+        mock_agent_instance.run.assert_called_once_with("what packages am I expecting")
```
email.py line 67 still passes `max_steps=context.config.email_max_steps` to `agent.run()`, so this assertion will fail: the mock receives `max_steps` but the assertion doesn't expect it.
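A sketch of why the loosened assertion fails, using `unittest.mock` with stand-in values (`max_steps = 5` stands in for `email_context.config.email_max_steps`):

```python
from unittest.mock import MagicMock

mock_agent_instance = MagicMock()
max_steps = 5  # stand-in for email_context.config.email_max_steps

# The call site in email.py still passes max_steps...
mock_agent_instance.run("what packages am I expecting", max_steps=max_steps)

# ...so only an assertion that expects max_steps passes:
mock_agent_instance.run.assert_called_once_with(
    "what packages am I expecting", max_steps=max_steps
)
```

The PR's version, `assert_called_once_with("what packages am I expecting")`, raises `AssertionError` against the same recorded call because `assert_called_once_with` compares the full argument list, keywords included.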
…liest (jaredlockhart#863)

_find_unrolled_weeks used get_recent(limit=1) which returns the most recent daily entry, but treated it as the earliest. When the most recent entry is in the current week, first_monday == current_monday and the scan loop never executes — so no completed weeks are ever found.

- Add get_earliest() to HistoryStore (ASC ordering)
- Use get_earliest() in _find_unrolled_weeks
- Update test to seed current-week entries alongside past weeks

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…wns (jaredlockhart#864)

* Improve notification scoring, thinking distribution, and topic cooldowns

- Normalize novelty and sentiment scores to [0,1] via min-max scaling before applying weights, so both dimensions contribute proportionally instead of novelty dominating due to its ~4x larger raw range
- Add per-topic 24h notification cooldown: once a preference (or free thought) is notified, that topic is excluded from candidates for 24 hours
- Add MAX_UNNOTIFIED_THOUGHTS config param (default 20) — thinking agent skips cycles when unnotified thoughts reach the cap
- Replace random-roll thinking mode selection with distribution-based steering: compare actual free/seeded ratio against target probabilities and pick whichever type is underrepresented
- Add ThoughtStore.count_unnotified() and count_unnotified_free() queries
- Add THOUGHT_TOPIC_COOLDOWN_SECONDS constant (86400)
- 12 new tests covering normalization, cooldown, cap, and distribution logic
- All existing tests updated to monkeypatch probability constants for determinism independent of production values

* Move thinking distribution constants to runtime config params

FREE_THINKING_PROBABILITY and NEWS_THINKING_PROBABILITY are now runtime-configurable via /config instead of hardcoded constants. The seeded probability is implicit (1 - free - news). Tests pass probabilities through make_config() instead of monkeypatching PennyConstants.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
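The min-max normalization described above can be sketched generically. This is a hypothetical helper, not the project's actual code; it shows why two score dimensions with different raw ranges become comparable before weighting:

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Scale values to [0, 1] so dimensions with different raw
    ranges contribute proportionally when weights are applied."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates identical on this dimension: no signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

After normalizing novelty and sentiment separately, a 50/50 weighting actually means equal influence, which is what motivated the later default change in jaredlockhart#865.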
…ockhart#865)

* Move scoring weights to runtime config params (default 50/50)

NOVELTY_WEIGHT and SENTIMENT_WEIGHT are now runtime-configurable via /config instead of hardcoded constants. Default changed from 40/60 to 50/50 for equal weighting now that normalization makes both dimensions comparable.

* Fix thinking agent flooding logs when at unnotified cap

When MAX_UNNOTIFIED_THOUGHTS is reached, get_prompt returned None which made execute_for_user return False. The scheduler treated that as "no work" and retried every tick (~1s), flooding the log. Move the cap check to execute_for_user and return True when skipping, so the scheduler calls mark_complete and waits for the next interval.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rt#866)

Append -site: exclusions for blocked domains (facebook, instagram, tiktok) to the Serper query so Google filters them server-side. Previously we only filtered after download, so queries dominated by these domains returned no image at all.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The THINKING_REPORT_PROMPT was producing thoughts framed as corrections
or debunking ("it turns out X is NOT Y"), which sounds wrong in
spontaneous notifications where there's nothing to correct. Updated the
prompt to frame findings as standalone new discoveries and to discard
searches that only found something doesn't exist.
Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add topic context intro to notify prompt

Thought notifications were jumping straight into details without establishing what the topic is, leaving the reader confused (e.g., "Kokoroko's new RSD-2026 vinyl..." with no mention that Kokoroko is a band). Updated NOTIFY_SYSTEM_PROMPT to instruct the model to open with a brief identifying phrase before diving into details.

* Add full system prompt assertions for news and checkin notify modes

Extends the test coverage pattern to all three notification modes. ThoughtMode already had a full prompt assertion; now NewsMode and CheckinMode do too, catching structural drift in prompt composition.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…redlockhart#871)

* Make thought notifications conversational instead of report-style

The thinking report prompt produces structured content (bullets, headers, tables) which the notify agent was regurgitating verbatim. Changed the instruction from "Share what's in it — the thought IS the substance" to "Retell it conversationally — no bullet lists, no headers, no tables" so notifications read like a friend explaining what they found.

* Fix flaky schedule test by polling for expected message content

Replace wait_for_message (returns last message, vulnerable to race conditions) with wait_until + _has_message pattern that polls for the specific expected content. This matches the convention used by the rest of the test suite.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lockhart#872)

* Steer thinking agent away from troubleshooting/support content

The thinking agent was searching for bug reports and support articles (e.g., "UAD plugin glitch") and surfacing them as interesting discoveries. Added guidance to look for releases, creative work, and discoveries while avoiding troubleshooting guides and bug reports.

* Add casual greeting to proactive notifications

Notifications were jumping straight into content without a greeting. Added "Start with a casual greeting" to NOTIFY_SYSTEM_PROMPT, matching the pattern already used by the news notification prompt.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aredlockhart#873)

NEWS_NOTIFY_MAX_STEPS was 1, but the agent base class strips tools on the final step. With only 1 step, fetch_news could never execute — the model's tool call was discarded as "hallucinated on final step" and every news attempt produced an empty response that got disqualified. Bumped to 3 steps so the model can call the tool and format results.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#874)

* Use thought title as image search fallback for notifications

When a thought notification has no tool calls (model retells thought context directly), the image search fell back to using the full message text, producing bad image results. Now ThoughtMode extracts the first bold headline from the thought content as the image query (e.g., "Bad Cat Era 30 – A Hand-Wired EL84 Head"), which is a much better match for finding a relevant product/topic image.

* Strip generic prefixes from thought titles for image search

Thought titles like "Briefing: Tone King Royalist" or "Here is something interesting I learned about the Vox AC15HWR1" had generic prefixes that diluted image search results. Added _clean_thought_title that strips common prefixes (Briefing:, Detailed Briefing:, etc.) and filters out completely generic titles. Tested against 100 recent thoughts: 97/100 produce good image queries after cleaning.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Reduce fuzzy duplicate preference extraction

The preference extraction prompt was creating near-duplicate preferences like "Tubesteader Eggnog user reviews" and "Tubesteader Eggnog 12AX7 pre-amp" when "Tubesteader pedals" already existed. These slipped past both TCR and embedding dedup because short strings with slightly different wording produce low similarity scores. Added explicit guidance that asking about reviews, specs, or details of a known item is engagement with the existing preference, not a new one. Added a concrete example matching the observed failure pattern. Dry-ran against the actual prompt that produced the duplicate — 3/3 runs correctly classified it as existing instead of new.

* Skip questions and tasks in preference extraction

The model was extracting questions and troubleshooting requests as preferences (e.g., "Running preamp into front of amp", "preamp output confusion", "pedals powered via XLink Out"). Added explicit guidance to skip questions, tasks, and troubleshooting requests. Dry-ran against 4 prompts that produced task preferences — all 4 previously-bad extractions are now suppressed or significantly reduced.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Cache embeddings on thought and messagelog tables

Thoughts and outgoing messages were being re-embedded from scratch on every dedup check and novelty comparison. Added embedding BLOB columns to both tables so embeddings are computed once and reused.

- Migration 0014: adds embedding column to thought and messagelog
- ThinkingAgent: embeds and stores at thought creation time, uses cached embeddings in dedup (skips thoughts without embeddings)
- NotifyAgent: uses cached message embeddings for novelty scoring, backfills on first access
- Startup backfill job extended to populate thought embeddings

* Embed messages at insert time and backfill at startup

Messages were being lazily backfilled in the notify agent on read. Moved embedding to send_response (insert time) so every outgoing message gets its embedding cached immediately. Added startup backfill for existing messages without embeddings, and a test assertion that thoughts get embeddings stored.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The NOTIFY_NEWS prompt said "the source in parentheses" which the model interpreted as the outlet name (e.g., "(New York Times)") rather than the actual URL from the tool results. Changed to "the source URL from the tool results" so URLs are included.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chat was instructed to "Focus on ONE topic per response" and "go deep" which produced narrow answers that missed important angles (e.g., trauma/immune question only covered physical trauma, ignored PTSD). Changed to "Go WIDE: cover as many angles as possible" with multiple search queries and follow-up searches for comprehensive answers.

Thinking mode stays go-deep (autonomous exploration of one thread). Chat mode is now go-wide (user wants the full picture).

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#879)

* Skip daily history entries already covered by weekly rollups

The history context was including both weekly rollups AND their constituent daily entries, causing duplicate topics in the system prompt. Now _format_daily_entries checks each day against the weekly rollup date ranges and skips days that fall within a completed week.

* Add test for daily/weekly history overlap filtering

Verifies that daily entries within a weekly rollup's date range are excluded from the history context, while daily entries outside the range are still included.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changed THINKING_REPORT_PROMPT from structured report format (tables, headers, 500 words) to conversational message format (casual greeting, details, URL, 300 words). Thoughts are now stored in the shape they'll be shared, cutting context size in half.

Loosened NOTIFY_SYSTEM_PROMPT to relay the thought as-is instead of re-summarizing. Old prompt: "Retell it conversationally, no bullets/headers/tables." New prompt: "Share it with the user, don't compress or summarize, just relay in your own voice."

Tested end-to-end on 3 examples: new pipeline produces notifications with equal or better detail than the original two-step process.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The "No greetings, no sign-offs" rule was in PENNY_IDENTITY which is shared by all agents, causing proactive notifications to skip greetings even though the notify prompt said to include one. Moved the rule to CONVERSATION_PROMPT so it only applies when responding to user messages.

Also removed the greeting from THINKING_REPORT_PROMPT since the notify agent now handles greetings — the stored thought shouldn't include one.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…redlockhart#882)

* Score thoughts by cached embedding before generating notification

Previously generated N candidates through the model, then scored them. Now scores the raw thoughts using cached embeddings (novelty + sentiment), picks the winner, then runs only the winner through the notify agent. With NOTIFY_CANDIDATES=5, this cuts model calls from 5 to 1 per notification cycle. Possible because thoughts are now stored in notification-ready shape with pre-computed embeddings.

* Add integration test for embedding-based thought scoring

Tests the full notification flow with 3 thought candidates: seeds DB with preferences, thoughts with embeddings, and an incoming message, then runs execute_for_user and asserts a notification was sent and exactly 1 of 3 thoughts was marked notified.

* Assert on full call chain in embedding scoring test

Verify every edge of the score-then-generate flow:
- 1 Ollama chat call (winner only, not all candidates)
- 1 embed call (outgoing message at send time, not during scoring)
- 1 serper image search
- Message delivered via Signal
- 2 of 3 thoughts remain unnotified
- 1 thought marked notified in DB

* Simplify image search to use thought content directly

The bold-title extraction and prefix-cleaning logic was built for the old structured report format. With conversational thoughts, bold titles are rare. Now uses first 300 chars of thought content as the image query — the subject name consistently appears in the first sentence or two, and serper is smart enough to extract it. Removed dead code: _clean_thought_title, _is_generic_title, _TITLE_STRIP_PREFIXES.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add thought title for dedup and image search

The thinking report prompt now emits a 'Topic: <title>' line that gets parsed and stored separately. Titles are short (e.g., "Tubesteader Beekeeper pedal") so they embed closely for duplicates and work well as image search queries.

Key changes:
- Migration 0015: adds title column to thought table
- THINKING_REPORT_PROMPT: emits 'Topic: ...' on last line
- ThinkingAgent: parses title, embeds title (not content), stores both
- Thought dedup: now global (all thoughts, not per-preference) using TCR_OR_EMBEDDING on titles — catches cross-preference duplicates
- Image search: uses thought.title when available
- New runtime config: THOUGHT_DEDUP_TCR_THRESHOLD (default 0.6)

* Separate title and content embeddings on thoughts

Title embedding for dedup (short string, high discrimination), content embedding for novelty/sentiment scoring (full message vs messages/preferences). Both computed at creation time and cached.

- Added title_embedding column to thought table (migration 0015)
- ThinkingAgent stores both embeddings at creation
- Dedup uses title_embedding, scoring uses embedding (content)
- Added THOUGHT_DEDUP_TCR_THRESHOLD runtime config param (0.6)

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#884)

OR strategy produced false positives from common short words ("2026", "AI", "agent") matching via TCR on short titles. Switched to AND (both TCR >= 0.6 AND embedding >= 0.6 required) which eliminates all false positives while catching real duplicates.

Also lowercase titles before embedding so casing doesn't affect similarity (e.g., "THE GHOST IN THE SHELL" vs "Ghost in the Shell" was 0.381, now 0.652 after lowercasing).

Lowered THOUGHT_DEDUP_EMBEDDING_THRESHOLD default from 0.80 to 0.60 since title embeddings score lower than full-content embeddings.

Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
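The AND strategy described above reduces to a simple predicate over the two similarity signals. A hedged sketch with hypothetical names; in the project the thresholds come from runtime config (THOUGHT_DEDUP_TCR_THRESHOLD, THOUGHT_DEDUP_EMBEDDING_THRESHOLD) and the similarities are computed elsewhere:

```python
def is_duplicate(
    tcr: float,
    embedding_sim: float,
    tcr_threshold: float = 0.6,
    embedding_threshold: float = 0.6,
) -> bool:
    """AND strategy: both signals must agree before a title is
    treated as a duplicate. This suppresses false positives from
    short shared tokens ("2026", "AI") that inflate TCR alone."""
    return tcr >= tcr_threshold and embedding_sim >= embedding_threshold
```

Under the previous OR strategy, either score clearing its threshold alone was enough, which is exactly how a shared token like "2026" could mark two unrelated titles as duplicates.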
…ed (jaredlockhart#896)

* Add browser extension with WebSocket server and dev tooling

Browser sidebar extension connects to Penny via WebSocket (echo-only for now). Adds web-ext dev setup with auto-reload, exposes port 9090 from Docker, and wires up BROWSER_ENABLED config to start the server alongside Signal.

* Add multi-channel architecture with device routing and shared history

ChannelManager implements MessageChannel as a routing proxy — all agents, scheduler, and commands interact with it instead of a single channel. Messages from any device (Signal, browser) resolve to the same user identity, giving full conversation continuity across channels. New: Device table + DeviceStore, ChannelManager, BrowserChannel (full MessageChannel), migration 0016, ChannelType enum, browser sidebar device registration flow. 418 tests passing.

* Add browser HTML formatting, image URLs, reconnect indicator, and single-user fix

BrowserChannel.prepare_outgoing converts markdown to HTML (bold, italic, code, links, tables-to-bullets). Images use URLs via search_image_url instead of base64 download, rendered as <img> tags prepended to messages. Sidebar shows reconnecting spinner. Background agents use get_primary_sender from UserInfo instead of mining MessageLog for user identity.

* Set up TypeScript, typed protocol, light/dark theme, and streamlined UI

Converts browser extension to TypeScript with strict mode. Shared protocol.ts defines typed constants and discriminated unions for the WebSocket protocol. CSS refactored to custom properties with prefers-color-scheme for automatic light/dark support. Header removed, status indicator is now a minimal dot at bottom-right of messages.

* Persist chat history in browser local storage with smart scrolling

Messages stored in browser.storage.local (capped at 200) and rehydrated on sidebar open. New messages scroll to show the top of the message; rehydration jumps to bottom instantly.

* Move WebSocket to background script, sidebar uses runtime messaging

Background script owns the server connection and persists across sidebar open/close. Sidebar communicates via browser.runtime messaging with typed RuntimeMessage protocol. Connection state synced on sidebar open via port. Smart scroll: short messages anchor at bottom, long messages show top first.

* Add browse_url tool with hidden tab, content extraction, and domain permissions

First browser tool: browse_url opens a hidden tab with full web engine and user session, injects a content script to extract visible text, then the server summarizes it in a sandboxed model call before the agent sees it. Domain permission flow: unknown domains prompt the user via sidebar dialog, decisions stored for future calls. Tool available dynamically to chat and thinking agents when a browser is connected. Protocol: tool_request/tool_response RPC over WebSocket with correlation IDs. BrowserChannel resolves asyncio Futures when responses arrive.

* Fix single-user identity resolution for commands, reactions, and startup

Commands, reactions, and command logs now resolve device identifiers to the primary user sender via _resolve_user_sender. Startup announcement uses get_primary_sender and skips when no message history exists. Tests added for user sender resolution and startup skip behavior.

* Fix /draw in browser by handling raw base64 and data URI attachments

_prepend_images now supports three attachment formats: HTTP URLs, data URIs, and raw base64 (wrapped as data:image/png). Previously only HTTP URLs were rendered, so /draw output was silently dropped in the browser sidebar.

* Add active tab context injection for browser sidebar messages

Background script extracts visible text from the active tab on tab switch and page load, holds it in a buffer, and attaches it to chat messages. Server injects it into the chat agent's system prompt as a "Current Browser Page" context section. Truncated to 5,000 chars.

* Fix scroll positioning by re-scrolling after image load

scrollIntoView fires before images render, so offsetHeight is wrong for messages with images. Now re-scrolls on each img load event to account for the final dimensions.

* Replace content extraction with Defuddle, inject page context as synthetic tool call

Content script now uses Defuddle for smart page extraction (strips nav, sidebars, boilerplate) with CSS heuristic and TreeWalker fallbacks. Bundled via esbuild since content scripts can't use imports. Page context injected as a synthetic browse_url tool call + result in the message history instead of system prompt. The model sees a pre-completed tool exchange and answers from it directly. System prompt carries a minimal hint (title + URL) to disambiguate "this page" references.

* Add page context toggle, og:image extraction, and flush image styling

Sidebar shows current page title with checkbox to include page content. Content script extracts og:image metadata. Responses to page-context messages show the page image and "In response to" link inside the message bubble. All images in Penny messages now render flush to bubble edges with matching border-radius. Input disabled while waiting for response.

* Update browser extension architecture doc with implementation status

Reflects all completed work: multi-channel architecture, device table, browse_url tool, active tab context, Defuddle extraction, permission flow, TypeScript protocol, page context toggle, and additional features not in the original plan.

* Add thoughts feed page with new/archive tabs, image URLs, and modal viewer

Feed page renders thoughts as a card grid with images, titles, seed topic bylines, and HTML-formatted content (via server-side prepare_outgoing). New/Archive tabs split by notified_at. Clickable cards open a modal with full content. Sidebar nav bar links to feed page. image_url stored on Thought model at creation time. Startup backfill populates existing thoughts in parallel batches. Migration 0017 adds image_url column. Seed topic resolved from preference FK for bylines.

* Add thought reactions, unnotified count, Font Awesome icons, and periodic polling

Thumbs up/down on feed cards and modal overlay — logs reaction as incoming message with parent_id to synthetic outgoing (same pipeline as Signal reactions for preference extraction), marks thought notified, fades card. Font Awesome installed locally (no CDN). Sidebar nav shows unnotified thought count. Background polls thoughts every 5 minutes for fresh count. Reaction buttons float on card corners with hover color effects.

* Add Penny logo with transparent background, extension icons, and Signal avatar

penny.png made transparent and resized to 48px/96px for extension icons. Added to README header. Signal profile picture set via signal-cli-rest-api PUT /v1/profiles endpoint. New `make signal-avatar` target for setting it.

* Add Penny logo, SVG icons, thought reactions, feed polish, and image backfill

Logo: penny.svg traced from PNG via potrace, auto-cropped, rendered to 16/32/48/96px PNGs from SVG for crisp icons at all sizes. Added to README, sidebar nav, feed page header, and extension manifest. Feed: thumbs up/down reactions log to preference extraction pipeline, Font Awesome icons (local), periodic thought polling, unnotified count in sidebar nav, seed topic bylines, modal viewer with reactions, server-side markdown-to-HTML for thought content. Infrastructure: thought.image_url stored at creation time, startup backfill for existing thoughts, migration 0017, make signal-avatar target. 5-minute thought poll interval.

* Update architecture doc with feed page, reactions, logo, and new features

Documents feed page implementation (card grid, new/archive tabs, modal, reactions pipeline, image URLs at creation time), logo/SVG workflow, Font Awesome, thought count polling, and updated directory structure.

* Update CLAUDE.md with browser extension, multi-channel, and new commands

Documents browser extension directory structure, dev workflow, config vars (BROWSER_ENABLED/HOST/PORT), make signal-avatar target, single-user model, and design doc references.

* Update README with browser extension, multi-channel, and feed page

Adds Browser Extension section documenting sidebar chat, active tab context, browse_url tool, thoughts feed, and multi-device support. Updates overview to mention browser channel and shared history. Adds Firefox badge, browser config vars, and make signal-avatar.

* Use PageContext Pydantic model instead of raw dicts throughout

PageContext defined in channels/base.py (alongside IncomingMessage), imported by browser/models.py. All page context references use typed model attributes instead of dict.get() calls. Renamed abbreviated variable names (ctx → context).

* Move inline imports to top level and batch seed topic query

All inline imports of penny modules moved to top-level imports. Inline imports only remain for optional external packages (github_api) inside try/except guards. Seed topic resolution uses batch get_by_ids query instead of N individual queries.

* Sanitize all web content at the BrowserChannel boundary

All page content from the browser is sanitized through a sandboxed model call in BrowserChannel before reaching any downstream consumer. Both browse_url tool responses and active tab context go through the same _sanitize_page_content method — comprehensive rewrite preserving URLs, structure, and details. BrowseUrlTool no longer does its own summarization; it receives pre-sanitized content from the channel. Single enforcement point: consumers can't accidentally bypass sanitization because it happens at the channel boundary.

* Move sanitize prompt and constants to proper files, add favicon, fix title color

PAGE_SANITIZE_PROMPT moved to Prompt class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TOOL_REQUEST_TIMEOUT and MAX_PAGE_CONTENT_CHARS moved to PennyConstants. Feed page gets favicon and black title instead of purple. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Start typing indicator before page content sanitization Typing indicator now fires before the sandboxed summarization step so the user sees immediate feedback while page content is being processed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Increase tool timeouts for browse_url + sanitization chain Browser tool request timeout bumped from 30s to 60s. Overall tool timeout bumped from 60s to 120s to accommodate the full chain: browser round-trip + page load + content extraction + sanitization model call. IMDB pages were timing out at 60s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add tests for page content sanitization and BrowseUrlTool passthrough Tests cover: sandboxed sanitization happy path, fallback when no model client, fallback on model failure, content truncation at max chars, BrowseUrlTool returning pre-sanitized content directly, and empty content handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Show newest thoughts first on the feed page Added get_newest() method to ThoughtStore that returns newest-first ordering. Feed page handler uses it instead of reversing get_recent(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Only recheck page context toggle when URL actually changes Prevents background tab update events from resetting the toggle when the user unchecked it on the same page. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add TODO section to architecture doc for deferred work Browse_url page headers, sender column cleanup, domain allowlist UI, and tool rate limiting noted for future PRs. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lockhart#899) * Add Likes/Dislikes tabs to browser extension sidebar Adds two new tabs to the sidebar for managing preferences directly from the browser. Each tab lists preferences with mention counts and an × to delete, plus an input at the bottom to add new ones. The connection status indicator is now in the nav bar so it's visible on all tabs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Remove sandboxed model summarization step for web page content The sandboxed model call (40s on 20B) wasn't providing meaningful security — domain allowlist and no-code-execution already close the real attack surface. Small models (gemma3:1b, qwen2.5:1.5b) hallucinate facts making them worse than passing through Defuddle-extracted content directly. Defuddle already strips nav/boilerplate at the source. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…es (jaredlockhart#900) * Store thought valence from reactions and filter thinking by preferences Thumb reactions on thoughts now store valence (1/-1) directly on the thought row instead of extracting a mention=1 preference. This cleans up the preference table (which previously had noisy thought-title entries) and provides a foundation for future thought-based scoring. The thinking agent now gates new thought storage behind a mention-weighted preference filter: if qualifying positive preferences exist (mention>1), a thought must score >= 0 against them before being stored. Inactive when no signal exists yet. Notification scoring is simplified to pure novelty (no sentiment) since the thought loop filter already gates on preference alignment. Key changes: - migration 0018: add thought.valence column - ThoughtStore: set_valence() and get_valenced() - similarity: replace compute_sentiment_score with compute_mention_weighted_sentiment - BrowserChannel: store valence on thought, remove synthetic message creation - HistoryAgent: route thought reactions to set_valence, mark processed immediately - ThinkingAgent: _passes_preference_filter gates new thought storage - NotifyAgent: pure novelty scoring (_select_most_novel @staticmethod) - config_params: remove NOVELTY_WEIGHT and SENTIMENT_WEIGHT Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Backfill thought valence from existing reactions in migration The migration now walks messagelog to find emoji reactions that point to notification messages (thought_id IS NOT NULL) and sets the corresponding thought.valence = 1 or -1. Only fills NULL valence to avoid overwriting a later reaction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Remove reaction-based preference extraction from history agent Preference extraction now runs only on text messages. Reactions are processed solely for thought valence (set_valence on thought reactions) and then marked as processed — no LLM call, no preference created. 
Removes: ExtractedTopic, ExtractedTopics models, _extract_reaction_preferences, _build_reaction_items, _extract_reaction_topics, _store_reaction_preferences, _classify_reaction_emoji, and REACTION_TOPIC_EXTRACTION_PROMPT. Replaces with: _process_reactions (thought valence only) + _emoji_to_int_valence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Wire PREFERENCE_MENTION_THRESHOLD into sentiment scoring compute_mention_weighted_sentiment now takes an explicit min_mentions parameter (no default) so the threshold is always sourced from config. _passes_preference_filter reads PREFERENCE_MENTION_THRESHOLD and passes it through, keeping seed-topic eligibility and sentiment filtering in sync. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix has_signal gate to check any qualifying preference, add negative-only test The preference filter gate was only checking positive preferences, meaning thoughts would slip through unfiltered if a user had only negative prefs qualifying for the mention threshold. Now checks any qualifying preference (positive or negative), and adds a test confirming the filter activates with negative-only qualifying prefs (score = 0 - 1 = -1 → filtered). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix stale docstring in _passes_preference_filter Gate activates on any qualifying preference (positive or negative), not just positive ones — updated after the has_signal fix in cae119f. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…hart#901) - Browser sends a heartbeat to the server on every URL navigation, resetting the idle timer so proactive notifications are suppressed while the user is actively browsing. - PeriodicSchedule gains requires_idle flag (default True). History and thinking agents set requires_idle=False so they run on their own wall-clock timers independent of user activity. Only NotifyAgent remains idle-gated. - BackgroundScheduler.notify_activity() resets _last_message_time without touching schedule intervals, used by the heartbeat handler. - Test fixtures suppress independent schedules via long intervals in DEFAULT_TEST_RUNTIME_OVERRIDES (previously the idle gate did this implicitly). Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
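The idle-gating split can be sketched as follows — `requires_idle` and `notify_activity` are named in the commit above; the surrounding class structure is an assumption:

```python
import time
from dataclasses import dataclass


@dataclass
class PeriodicSchedule:
    name: str
    interval: float
    requires_idle: bool = True   # NotifyAgent keeps the default; history/thinking opt out
    last_run: float = 0.0


class BackgroundScheduler:
    def __init__(self, idle_threshold: float):
        self.idle_threshold = idle_threshold
        self._last_message_time = 0.0

    def notify_activity(self) -> None:
        """Heartbeat handler: reset the idle timer without touching intervals."""
        self._last_message_time = time.monotonic()

    def due(self, schedule: PeriodicSchedule, now: float) -> bool:
        if now - schedule.last_run < schedule.interval:
            return False
        if schedule.requires_idle and now - self._last_message_time < self.idle_threshold:
            return False  # user is actively browsing: suppress idle-gated work
        return True
```

With this shape, a browser heartbeat on navigation suppresses only the idle-gated NotifyAgent while the history and thinking schedules keep their wall-clock timers.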
The backfill fired search_image_url for every thought with a NULL image_url on startup. On first deploy after migration 0017 it ran 565 concurrent Serper calls, exhausting the API quota and breaking Signal notification images for the rest of the day. All existing thoughts now have image_url set (NULL or empty string), so the backfill was a no-op going forward. New thoughts get image_url assigned at creation time via ThinkingAgent. Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…aredlockhart#905) * Add browser extension settings panel with icons, domains, and config - Restructure nav into two-tier header: logo/title/thoughts-link/gear in top bar; Chat tab below. Thoughts is now a link button, not a tab. - Add FontAwesome icons throughout sidebar and feed interaction points - Add settings panel (gear icon) that takes over the sidebar: - Likes/Dislikes tabs (moved from main nav) - Domains tab: list, toggle allow/deny, delete, and add new entries from browser.storage.local — pure frontend, no backend needed - Config tab: all runtime ConfigParams rendered from live Python registry (key, description, type, current value, default); edits write to runtime_config DB via new config_request/config_update WebSocket messages; green toast confirms save - Animated typing indicator (staggered dots) and two-tier nav CSS - Fix feed card image corners clipping reaction buttons (border-radius on image directly instead of overflow:hidden on card) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Fix ruff import ordering in _handle_config_update Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Add tests for config_request and config_update browser channel handlers Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…lockhart#906) Proactive messages (thoughts, news, check-ins) have no parent_id and were being merged into large assistant blobs in the conversation context window — up to 20K chars from a day's worth of notifications. They don't belong there: they're already represented via the thought section in the system prompt, and history rollups cover what was discussed. Only user messages and direct replies (parent_id set) are now included in get_messages_since. Conversation turns stay properly ordered since threaded replies are always logged after the messages they reply to. Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…t#907) * Reflect tool calls to browser UI as live status updates When Penny is in the agentic loop, tool calls are now surfaced to the browser sidebar in real-time: "Searching for X...", "Reading https://...", "Fetching news about Y..." etc., updating the typing indicator in-place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix typing indicator showing blank text before first tool call BrowserOutgoing.content defaulted to "" which serialized as empty string over the wire. JS nullish coalescing ("" ?? fallback) doesn't trigger on empty string, so the sidebar rendered just "..." instead of "Penny is thinking...". Changed to str | None = None so null coalescing works correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Use tool class name attributes instead of hardcoded strings Replace magic string literals in _format_tool_status and tests with references to the tool class name attributes (SearchTool.name, etc.). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Use class registration pattern for tool action strings Tool subclasses auto-register via __init_subclass__ when they define a name. Tool.format_status(tool_name, arguments) dispatches to the right subclass's to_action_str() with no explicit list of tools anywhere. BrowserChannel now only imports Tool. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Remove forward reference quotes from Tool registry type annotation Ruff UP037 flagged the quoted "Tool" — unquoted since Python 3.14 supports self-referential class annotations at class body scope. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…aredlockhart#908) PeriodicSchedule and BackgroundScheduler were capturing interval and idle_threshold values at construction time, so changes via /config had no effect until restart. Changed both to accept Callable[[], float] so values are read on each tick — consistent with the config system's guarantee that changes take effect immediately. Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
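The `Callable[[], float]` pattern in miniature — the config dict and key are illustrative stand-ins for the runtime config registry:

```python
from typing import Callable


class Schedule:
    """Interval supplied as a zero-arg callable so /config changes are
    picked up on the next tick instead of captured at construction time."""

    def __init__(self, get_interval: Callable[[], float]):
        self._get_interval = get_interval

    def interval(self) -> float:
        return self._get_interval()  # re-read on every tick


config = {"THINKING_INTERVAL": 600.0}          # stand-in for runtime_config
schedule = Schedule(lambda: config["THINKING_INTERVAL"])
```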
…tch (jaredlockhart#910) * Fix search hallucination: single-query tool with parallel agent dispatch Root cause: SearchTool accepted a queries list and concatenated multiple results into one tool message, which got truncated mid-content. The model then hallucinated the rest from memory. Fix: SearchTool.execute() now takes a single query: str. Parallelism moves to the agent loop — _process_tool_calls uses asyncio.gather() to dispatch all tool calls concurrently, then appends one tool message per result. This matches Ollama's native parallel tool call protocol. Also rewrites CONVERSATION_PROMPT and THINKING_SYSTEM_PROMPT to be tool-agnostic — search-specific language replaced with neutral equivalents so the model uses the right tool (search, browse_url, etc.) for the job. Adds _make_parallel_tool_calls_response to the mock and a new TestParallelToolCalls test that verifies two tool calls in one turn produce two separate tool messages in the next Ollama call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Rework MultiTool as fetch with single queries array and URL auto-routing The previous MultiTool design used separate arrays (queries, urls, news) and a complex inner-call schema that gpt-oss:20b couldn't reliably follow. The model kept putting URLs in queries, inventing its own call formats, or hedging by duplicating entries across arrays. New design: single queries array — the model dumps everything in one list and Python routes URLs to browse_url via regex, plain text to search. This matches the pattern the model already learned from the original single-query search tool. 
Key changes: - MultiTool renamed to "fetch" (avoids name collision with SearchTool) - Schema simplified to just queries[] — URLs auto-detected and routed - _create_search_tool returns SearchTool | None (was list for no reason) - MAX_TOOL_RESULT_CHARS raised from 8k to 50k (web pages need room) - Chat page context injection uses fetch format (was stale browse_url) - Browser channel tool status shows cumulative checklist with checkmarks - CONVERSATION_PROMPT kept tool-agnostic (tool descriptions do the work) - browse_url retries full tab lifecycle up to 3x on empty content - Tab load + tool timeouts raised to 60s for JS-heavy pages (e.g. IMDb) - Test: two 15k-char results both survive into model context without truncation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Hide browse_url tabs with tabHide API Tabs were visible in the tab bar during page reads because active: false only prevents focus steal. Now calls browser.tabs.hide() after creation with graceful fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Give ThinkingAgent its own MultiTool (max_calls=1) Moves multi_tool support to the base Agent class so both ChatAgent and ThinkingAgent use MultiTool for tool dispatch. ThinkingAgent gets its own instance with max_calls=1 (matching the old single-query cap on main). Both MultiTools share the same browse_url provider. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Enforce max_calls in MultiTool schema with per-instance maxItems The model was sending multiple queries from ThinkingAgent because the schema had no maxItems constraint. Now MultiTool sets description and parameters per-instance based on max_calls, matching how SearchTool on main advertised its cap via maxItems in the JSON schema. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
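The URL auto-routing at the heart of the fetch design can be sketched as below; the regex and tuple shape are illustrative, not the project's exact implementation:

```python
import re

# Rough URL detector: anything that looks like a link goes to browse_url,
# everything else to web search.
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)


def route_queries(queries: list[str]) -> list[tuple[str, str]]:
    """Split a mixed queries[] list into (tool, query) pairs in order."""
    return [
        ("browse_url" if URL_RE.match(q.strip()) else "search", q)
        for q in queries
    ]
```

This matches the pattern the model already learned: it dumps everything into one list and Python, not the model, decides which backend handles each entry.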
…aredlockhart#911) Instead of forwarding text queries to Perplexity, MultiTool now routes them to the browser as kagi.com/search?q=<query>. The browse_url tool opens the page in a hidden tab and extracts the content. Adds a Kagi-specific extractor to extract_text.ts that pulls structured search results (title, URL, snippet) from .search-result elements, running before Defuddle which would otherwise return nav chrome. Also: tabHide permission restored, dead lastPageInfo removed from sidebar, build-content.mjs probe cleanup. Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…edlockhart#912) (jaredlockhart#912) Two bugs fixed: 1. _build_strong_nudge grabbed the first user message in conversation history instead of the last (current) question. When the agentic loop exhausted tool calls and fell back to the nudge, the model answered a prior question from history instead of the current one. 2. Kagi search results rendered via JS after page load. The generic 200-char content threshold accepted page chrome (nav, footer) as valid content, returning empty search results. Kagi pages now use only the Kagi extractor with a ready flag — pollForContent retries until .search-result elements appear in the DOM. Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
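The first fix amounts to scanning history from the end rather than the start; a hypothetical sketch (the real `_build_strong_nudge` wording differs):

```python
def build_strong_nudge(history: list[dict]) -> str:
    """Anchor the nudge on the most recent user question, not the first
    one in history (the bug fixed above)."""
    last_user = next((m for m in reversed(history) if m["role"] == "user"), None)
    question = last_user["content"] if last_user else ""
    return f"Answer the user's current question directly: {question}"
```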
…aredlockhart#913) - Default uncheck "send page content" toggle on URL change - Tool status typing indicator: animated dots only on in-progress lines, completed lines show static checkmark - Connection dot always visible: orange spinner when disconnected or reconnecting, green when connected (was hidden when disconnected) - Wire on_tool_start callback for background thinking agent so tool activity is visible in the addon penny is using for browsing - browse_url retries up to 3 fresh tab loads when content extraction fails (not-ready pages), instead of returning garbage after one attempt - Throw error when page never becomes ready instead of falling through Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix browser not registering connection on connect The server only populated _connections when a chat message arrived from the browser. If the addon connected but the user never sent a chat message from it (e.g., using Signal instead), tool requests failed with "No browser connected" even though the WebSocket was alive and the addon showed a green connection dot. Add a register message that the addon sends immediately after receiving the status:connected response, including its device label. The server populates _connections on register, making the browser available for tool requests without requiring a chat message first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Serialize concurrent domain permission prompts When multiple browse_url requests arrive in parallel (e.g., "read cbc, bbc, and ap"), each unknown domain triggered a permission dialog simultaneously. The sidebar dialog got clobbered — only the last domain's request_id was wired to the buttons, so the first two prompts never resolved. The user saw one prompt but all three URLs were allowed. Fix: queue permission prompts via a promise chain so only one dialog shows at a time. Each queued prompt re-checks the domain (a prior prompt may have already resolved it via parent-domain matching) before showing the dialog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each addon can enable/disable tool use via a toggle in the config tab. The setting is stored locally and sent to the server as a capabilities_update message on connect and on toggle. The server tracks ConnectionInfo (ws, tool_use_enabled, last_heartbeat) per connection and routes tool requests only to addons with tool_use enabled, preferring the one with the most recent heartbeat. - New protocol: capabilities_update (browser→server), tool_use_toggle and tool_use_state (sidebar↔background runtime messages) - Server: ConnectionInfo dataclass replaces raw ServerConnection in _connections dict; _get_tool_connection filters by tool_use_enabled; has_tool_connection checks for any tool-capable addon - Addon: toggle in config tab, persisted to browser.storage.local, state synced to sidebar on connect - Tests: capabilities_update sets flag, has_tool_connection requires enabled, smart routing picks enabled addon, returns None when all disabled Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
) - Wrench icon in header bar visible when tool use is enabled - Same icon on the "Allow tool use" label in config tab - Switch npm run dev to use penny-release Firefox profile Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckhart#917) Domain permissions now live in a server-side SQLite table instead of browser.storage.local. All connected addons receive the full list on connect and after every mutation, keeping them in sync. The local storage is kept as a read cache for fast permission checks during tool execution. - New migration 0019: domain_permission table (domain, permission, created_at, updated_at) - New DomainPermissionStore with get_all, check_domain (parent matching), set_permission (upsert), delete - New protocol: domain_update and domain_delete (browser→server), domain_permissions_sync (server→browser broadcast) - Server syncs full list on register and after every domain mutation - Sidebar domains tab reads/writes via runtime messages through the server instead of directly to browser.storage.local - Permission dialog approvals also sent to server for persistence - DomainPermission SQLModel added to models.py Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lockhart#918) Domain permission checks now happen server-side in send_tool_request before dispatching to the addon. When a domain is unknown, the server broadcasts a permission prompt to all connected addons and sends a Signal message with emoji react (👍/👎). First response from any device resolves the permission — the domain is stored, synced to all addons, and remaining dialogs are dismissed. - Server-side permission check in send_tool_request using urlparse for domain extraction - Concurrent requests for the same domain share one future (dedup) - BrowserPermissionPrompt/Dismiss/Decision protocol messages - Signal reaction callback mechanism on SignalChannel for one-shot permission reactions - Signal permission message deleted after resolution (remote delete) - Client-side permission check removed from addon executeBrowseUrl — server is now authoritative - Tests: 7 browser channel permission tests + 2 signal reaction callback tests Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
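The shared-future dedup described above, as a minimal asyncio sketch (class and method names are assumptions):

```python
import asyncio


class PermissionBroker:
    """Concurrent requests for the same unknown domain share one future,
    so the user answers a single prompt and all callers unblock together."""

    def __init__(self):
        self._pending: dict[str, asyncio.Future] = {}

    async def request(self, domain: str) -> bool:
        loop = asyncio.get_running_loop()
        if domain not in self._pending:
            self._pending[domain] = loop.create_future()
            self._broadcast_prompt(domain)   # dialog to addons + Signal message
        return await self._pending[domain]

    def resolve(self, domain: str, allowed: bool) -> None:
        """First response from any device wins; later calls are no-ops."""
        future = self._pending.pop(domain, None)
        if future and not future.done():
            future.set_result(allowed)

    def _broadcast_prompt(self, domain: str) -> None:
        pass  # placeholder for the real broadcast
```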
…ckhart#919) The browser content script already extracts og:image from pages, but it was dropped in formatResult. Now the addon downloads the image using the browser's fetch API (which has session cookies for CDN auth), base64 encodes it, and sends it as a separate image field on the tool response. The server threads it through SearchResult.image_base64 → MultiTool (first image wins) → ControllerResponse.attachments → send_response. When attachments are present, the Serper image search fallback is skipped. Signal receives the base64 data URI directly. This was necessary because CDN bot detection (e.g., Akamai on CBC) blocks server-side image downloads via httpx — the browser's fetch has the authenticated session and cookies needed to pass bot checks. - Addon: downloadImageAsDataUri in browse_url.ts using browser fetch - Addon: BrowseResult type with text + image fields - Addon: sendToolResponse includes image field - Server: BrowserToolResponse.image field (optional) - Server: send_tool_request returns tuple[str, str | None] - Server: BrowseUrlTool.execute returns SearchResult with image - Server: MultiTool passes first image through combined result Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… Discord event handlers for reconnecting. Adds extensive logging to debug Discord message reception issues: - Log intents configuration, bot user ID, gateway latency on ready - Add on_connect, on_disconnect, on_resumed gateway event handlers - Log raw MESSAGE_CREATE gateway events via on_socket_raw_receive - Log ALL messages in on_message before filtering (author, channel, content) - Log filter decisions (own message, wrong channel) with [DIAG] prefix Add validate_connectivity()
…sues ReadEmailsTool was running fetched emails through Ollama summarization, adding latency and losing detail. The agent already has the full email content in context and can answer questions directly. Changes: - Remove OllamaClient and user_query params from ReadEmailsTool - Return raw email content joined with separators instead of summary - Remove ReadEmailsArgs Pydantic model (use kwargs directly) - Remove EMAIL_SUMMARIZE
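The "joined with separators" return value might look like the sketch below — the separator string and email field names are assumptions, since the commit only describes the shape:

```python
SEPARATOR = "\n\n---\n\n"  # assumed delimiter between emails


def format_emails(emails: list[dict]) -> str:
    """Return raw email content joined with separators instead of an
    Ollama-generated summary; the agent reads it directly from context."""
    return SEPARATOR.join(
        f"From: {e['from']}\nSubject: {e['subject']}\n\n{e['body']}"
        for e in emails
    )
```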
This modifies penny/commands/email.py and read_emails.py as well, FYI. I'm getting better summaries, faster, with this approach.