
feat(gemini-planner): support native Gemini tools (google_search, url_context, code_execution, google_maps)#2

Open
christopherwxyz wants to merge 1 commit into main from upstream-pr/planner-native-tools
@christopherwxyz

Summary

Plumb Gemini's native server-side tool surfaces — google_search, url_context, code_execution, google_maps — through geminiPlannerAgent. Callers opt in by listing the tool name in GeminiConfig.Tools []string (an existing config field that until now was only honoured by the standalone gemini agent type, not the planner).

Motivation

The planner currently builds its Tools list exclusively from registered AX subagents via agentsToTools. There is no way to give the planner web grounding, URL fetching, code execution, or maps without writing a dedicated subagent. With this change, an operator can add a short tools list to ax.yaml and get Gemini-side grounding for free:

gemini:
  tools:
    - google_search
    - url_context

Three correctness points (each backed by a test)

  1. Single-Tool merge. Per Gemini's tool-combination docs, all FunctionDeclarations + the native tool field must live on the same *genai.Tool object. agentsToTools naturally produces one *genai.Tool per registered agent, so process() flattens them and merges native fields onto a single mergedTool. Verified empirically against gemini-3-flash-preview on Vertex — splitting across multiple *genai.Tool entries causes Gemini 3 to emit the native tool's name as a regular function call instead of auto-executing it server-side.
    Test: TestProcess_MergesNativeToolsWithFunctionDeclarations.

  2. Empty-Tool guard. When the registry is empty AND no native tools are configured, the planner must send no tools at all — not Tools: []*genai.Tool{ {} }. Vertex rejects the zero-valued Tool with 400 INVALID_ARGUMENT: Tool must contain at least one of function_declarations, google_search, url_context, code_execution.
    Test: TestProcess_NoToolsWhenRegistryAndNativeBothEmpty.

  3. Native-field preservation from agentsToTools' nativeTools variadic. The merge loop copies .GoogleSearch / .URLContext / .CodeExecution / .GoogleMaps from each raw Tool — not just FunctionDeclarations. First-non-nil wins with a stderr warning on collision. No in-tree caller exercises the variadic yet, but the contract is what existing callers documented, and silently dropping native fields would be a confusing future bug.
    Test: TestProcess_MergePreservesNativeFieldsFromRawTools.

Plus a smoke test that ties (1) and the config plumbing together end-to-end: TestProcess_AppendsNativeToolsFromConfig.
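
The three points above can be sketched together in one small merge function. This is an illustration under stated assumptions, not the code in process(): Tool, FunctionDeclaration, GoogleSearch and URLContext are hypothetical stand-ins for the genai types (only two native fields are shown, and the Note field exists purely to tell instances apart), and mergeTools is an invented name.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical stand-ins for the genai SDK types, trimmed for the sketch.
type FunctionDeclaration struct{ Name string }
type GoogleSearch struct{ Note string } // Note is illustrative only
type URLContext struct{ Note string }   // Note is illustrative only

type Tool struct {
	FunctionDeclarations []*FunctionDeclaration
	GoogleSearch         *GoogleSearch
	URLContext           *URLContext
}

// mergeTools flattens raw tools into at most one Tool.
// It returns nil when nothing is set, so no zero-valued Tool is ever sent.
func mergeTools(raw []*Tool) []*Tool {
	merged := &Tool{}
	for _, t := range raw {
		if t == nil {
			continue
		}
		// Point 1: every declaration ends up on the same Tool.
		merged.FunctionDeclarations = append(merged.FunctionDeclarations, t.FunctionDeclarations...)
		// Point 3: native fields are carried over; the first non-nil wins.
		if t.GoogleSearch != nil {
			if merged.GoogleSearch == nil {
				merged.GoogleSearch = t.GoogleSearch
			} else {
				fmt.Fprintln(os.Stderr, "warning: duplicate google_search, keeping the first")
			}
		}
		if t.URLContext != nil {
			if merged.URLContext == nil {
				merged.URLContext = t.URLContext
			} else {
				fmt.Fprintln(os.Stderr, "warning: duplicate url_context, keeping the first")
			}
		}
	}
	// Point 2: an empty merge produces no tools at all.
	if len(merged.FunctionDeclarations) == 0 && merged.GoogleSearch == nil && merged.URLContext == nil {
		return nil
	}
	return []*Tool{merged}
}

func main() {
	raw := []*Tool{
		{FunctionDeclarations: []*FunctionDeclaration{{Name: "agent_a"}}},
		{FunctionDeclarations: []*FunctionDeclaration{{Name: "agent_b"}}},
		{GoogleSearch: &GoogleSearch{Note: "first"}},
	}
	out := mergeTools(raw)
	fmt.Println(len(out), len(out[0].FunctionDeclarations), out[0].GoogleSearch != nil)
	// prints: 1 2 true
	fmt.Println(len(mergeTools(nil)))
	// prints: 0
}
```

Returning nil rather than a slice holding an empty Tool lets the caller assign the result to the request's tool list directly, without a separate emptiness check.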

Vertex caveat (documented in code, not implemented here)

IncludeServerSideToolInvocations (added in google.golang.org/genai v1.51) is documented and SDK-enforced as Gemini-Developer-API-only. Vertex auto-execution of native tools is gated by Google; downstream consumers work around this with a separate agent-as-tool pattern, but that's out of scope for this PR.

Relationship to the structured-subagent-args PR

This PR is the sibling of feat(proto): pass subagent prompt/history as structured AgentStart fields (#TBD). The two PRs touch different functions in internal/gemini/gemini_planner.go (process here, handleSubagentCall there) and do not conflict at the line level. They can land in either order.

Files touched

internal/gemini/gemini_planner.go      |  87 +++++++++-
internal/gemini/gemini_planner_test.go | 299 +++++++++++++++++++++++++++++++++

No proto changes, no new examples, no SDK bump required (existing google.golang.org/genai v1.43.0 already exposes GoogleSearch / URLContext / ToolCodeExecution / GoogleMaps on *genai.Tool).

Test plan

  • go build ./... clean
  • go test -race ./internal/gemini/... passes (new tests + existing)
  • go test -race ./... passes (full repo)
  • Manual end-to-end against Vertex gemini-3-flash-preview with google_search enabled (covered by author in feature branch; behaviour confirmed before extracting this PR)

The Gemini planner currently builds its Tools list exclusively from
registered AX subagents (via agentsToTools). This commit lets ax.yaml
opt into Gemini's native tool surfaces (`google_search`, `url_context`,
`code_execution`, `google_maps`) by listing them in
GeminiConfig.Tools []string, which already exists in config.go but
was only consumed by the standalone gemini agent type.

Three correctness points the implementation gets right (each driven
by an empirical Vertex error, with a test):

1. Single-Tool merge. Per Gemini's tool-combination docs, all
   FunctionDeclarations + the native tool field must live on the
   SAME *genai.Tool object. agentsToTools naturally produces one
   Tool per registered agent, so we flatten + merge in process().

2. Empty-Tool guard. When both registry and config.Tools are empty,
   don't send Tools: []*genai.Tool{ {} } — Vertex rejects with
   400 INVALID_ARGUMENT ("Tool must contain at least one of
   function_declarations, google_search, url_context, code_execution").

3. Native-field preservation from agentsToTools' nativeTools variadic.
   The merge loop copies .GoogleSearch / .URLContext /
   .CodeExecution / .GoogleMaps from each rawTool, not just
   .FunctionDeclarations. First-non-nil wins with a stderr warning
   on collision. Latent today (no in-tree caller exercises the
   variadic) but the contract is what existing callers documented.

Tests:
  TestProcess_AppendsNativeToolsFromConfig
  TestProcess_MergesNativeToolsWithFunctionDeclarations
  TestProcess_NoToolsWhenRegistryAndNativeBothEmpty
  TestProcess_MergePreservesNativeFieldsFromRawTools

Vertex caveat documented in code: IncludeServerSideToolInvocations
(added in v1.51 of google.golang.org/genai) is documented and SDK-
enforced as Gemini-Developer-API-only. Vertex auto-execution of
native tools is gated by Google; our impl works around this with a
separate agent-as-tool pattern in downstream consumers, but that's
out of scope here.
christopherwxyz added a commit that referenced this pull request May 25, 2026
…earch_agent)

These two examples were Slack-specific / Vertex-specific application
code that has been extracted into the uno-infra monorepo where they
belong (deployed alongside their K8s manifests).

Keeping them in the AX fork's examples/ created merge conflicts on
every upstream pull from google/ax. With them gone, the fork's
divergence from upstream is bounded to:
  - the three PR branches (#1 #2 #3) of generic AX improvements
  - feat/agent-sandbox-backend's tracking branch

examples/python_sandbox_agent/ stays — it's a generic reference
implementation of the agent-sandbox backend pattern, useful upstream.