Fix search hallucination: single-query tool with parallel agent dispatch (jaredlockhart#910)
* Fix search hallucination: single-query tool with parallel agent dispatch
Root cause: SearchTool accepted a queries list and concatenated multiple
results into one tool message, which got truncated mid-content. The model
then hallucinated the rest from memory.
Fix: SearchTool.execute() now takes a single query: str. Parallelism moves
to the agent loop — _process_tool_calls uses asyncio.gather() to dispatch
all tool calls concurrently, then appends one tool message per result. This
matches Ollama's native parallel tool call protocol.
Also rewrites CONVERSATION_PROMPT and THINKING_SYSTEM_PROMPT to be
tool-agnostic — search-specific language replaced with neutral equivalents
so the model uses the right tool (search, browse_url, etc.) for the job.
Adds _make_parallel_tool_calls_response to the mock and a new
TestParallelToolCalls test that verifies two tool calls in one turn produce
two separate tool messages in the next Ollama call.
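
The parallel dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the actual _process_tool_calls implementation: run_tool is a hypothetical stand-in for a tool's execute() method, and the shape of the tool-call and tool-message dicts is assumed for the example.

```python
import asyncio


async def run_tool(name: str, arguments: dict) -> str:
    """Hypothetical stand-in for a real tool's execute() call."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"{name} result for {arguments['query']}"


async def process_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run every tool call concurrently and return one tool message per call."""
    results = await asyncio.gather(
        *(run_tool(call["name"], call["arguments"]) for call in tool_calls),
        return_exceptions=True,  # one failing tool should not discard the others
    )
    messages = []
    # gather() preserves input order, so results line up with tool_calls.
    for call, result in zip(tool_calls, results):
        if isinstance(result, BaseException):
            content = f"Error: {result}"
        else:
            content = result
        # Assumed message shape; the real keys may differ.
        messages.append(
            {"role": "tool", "tool_name": call["name"], "content": content}
        )
    return messages
```

Because each result gets its own message, no result is concatenated with another before any per-message truncation is applied.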
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Rework MultiTool as fetch with single queries array and URL auto-routing
The previous MultiTool design used separate arrays (queries, urls, news)
and a complex inner-call schema that gpt-oss:20b couldn't reliably follow.
The model kept putting URLs in queries, inventing its own call formats,
or hedging by duplicating entries across arrays.
New design: single queries array — the model dumps everything in one list
and Python routes URLs to browse_url via regex, plain text to search.
This matches the pattern the model already learned from the original
single-query search tool.
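
A rough sketch of the routing idea, assuming a simple http(s) pattern. The regex and the route_queries helper are hypothetical and chosen for illustration; the actual pattern used by the fetch tool may differ.

```python
import re

# Assumed pattern: an entry counts as a URL only if it is a single
# whitespace-free token starting with http:// or https://.
URL_PATTERN = re.compile(r"^https?://\S+$", re.IGNORECASE)


def route_queries(queries: list[str]) -> list[tuple[str, str]]:
    """Pair each entry with the tool that should handle it."""
    routed = []
    for raw in queries:
        entry = raw.strip()
        if not entry:
            continue  # skip blank entries
        tool = "browse_url" if URL_PATTERN.match(entry) else "search"
        routed.append((tool, entry))
    return routed
```

For example, route_queries(["https://example.com/a", "python asyncio gather"]) pairs the first entry with browse_url and the second with search.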
Key changes:
- MultiTool renamed to "fetch" (avoids name collision with SearchTool)
- Schema simplified to just queries[] — URLs auto-detected and routed
- _create_search_tool returns SearchTool | None (previously a list, unnecessarily)
- MAX_TOOL_RESULT_CHARS raised from 8k to 50k (web pages need room)
- Chat page context injection uses fetch format (was stale browse_url)
- Browser channel tool status shows cumulative checklist with checkmarks
- CONVERSATION_PROMPT kept tool-agnostic (tool descriptions do the work)
- browse_url retries full tab lifecycle up to 3x on empty content
- Tab load + tool timeouts raised to 60s for JS-heavy pages (e.g. IMDb)
- Test: two 15k-char results both survive into model context without truncation
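
The browse_url retry item in the list above could look something like this. It is a sketch only: read_with_retries and the read_page callable are hypothetical names, and "empty" is assumed to mean whitespace-only content.

```python
import asyncio
from typing import Awaitable, Callable


async def read_with_retries(
    read_page: Callable[[str], Awaitable[str]],
    url: str,
    max_attempts: int = 3,
) -> str:
    """Call read_page until it returns non-empty content or attempts run out.

    read_page is assumed to run the full tab lifecycle (open, load,
    extract, close) on every call.
    """
    for _ in range(max_attempts):
        content = await read_page(url)
        if content.strip():
            return content
    return ""  # every attempt came back empty
```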
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Hide browse_url tabs with tabHide API
Tabs were visible in the tab bar during page reads because
active: false only prevents focus stealing; it does not hide the tab.
Now calls browser.tabs.hide() after creation with graceful fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Give ThinkingAgent its own MultiTool (max_calls=1)
Moves multi_tool support to the base Agent class so both ChatAgent and
ThinkingAgent use MultiTool for tool dispatch. ThinkingAgent gets its own
instance with max_calls=1 (matching the old single-query cap on main).
Both MultiTools share the same browse_url provider.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Enforce max_calls in MultiTool schema with per-instance maxItems
The model was sending multiple queries from ThinkingAgent because the
schema had no maxItems constraint. Now MultiTool sets description and
parameters per-instance based on max_calls, matching how SearchTool
on main advertised its cap via maxItems in the JSON schema.
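
As an illustration of the per-instance schema, here is a minimal sketch. build_fetch_parameters is a hypothetical helper and the description wording is invented for the example; only the queries array and the maxItems cap come from the change described above.

```python
def build_fetch_parameters(max_calls: int) -> dict:
    """Build a JSON-schema parameters object capped at max_calls entries."""
    if max_calls < 1:
        raise ValueError("max_calls must be at least 1")
    return {
        "type": "object",
        "properties": {
            "queries": {
                "type": "array",
                "items": {"type": "string"},
                "minItems": 1,
                "maxItems": max_calls,  # the cap is advertised in the schema
                "description": (
                    f"Search queries or URLs to fetch (at most {max_calls})."
                ),
            }
        },
        "required": ["queries"],
    }
```

Under this sketch, a ThinkingAgent-style instance would be built with build_fetch_parameters(1), so the model sees maxItems: 1 in the tool definition rather than relying on prompt wording alone.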
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>