🔴 Required Information
Describe the Bug:
The rubric_based_final_response_quality_v1 evaluator fails to populate the <developer_instructions> section of the judge prompt when the agent's intermediate_data.invocation_events list is empty. This causes the LLM judge to receive an empty system prompt context, making it impossible to evaluate rubrics that reference the agent's developer instructions (system prompt).
The bug is in src/google/adk/evaluation/rubric_based_final_response_quality_v1.py (Lines 284–300):
```python
developer_instructions = ""
# ...
app_details = actual_invocation.app_details
if app_details:
  if (
      isinstance(actual_invocation.intermediate_data, InvocationEvents)
      and actual_invocation.intermediate_data.invocation_events  # <-- BUG: falsy when the list is empty
  ):
    developer_instructions = app_details.get_developer_instructions(
        agent_name=actual_invocation.intermediate_data.invocation_events[0].author
    )
  tool_declarations = get_tool_declarations_as_json_str(app_details)
```
developer_instructions is only populated when invocation_events is non-empty, because the code uses invocation_events[0].author to look up the agent name. When the agent correctly makes zero tool calls (e.g., when declining an out-of-scope request), the list is empty, the condition evaluates to False, and the judge receives an empty <developer_instructions> block.
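The guard's behavior can be isolated with stand-in types (the dataclasses below are illustrative only, not the real ADK classes): an empty invocation_events list is falsy in Python, so the entire lookup branch is skipped.

```python
from dataclasses import dataclass, field

# Illustrative stand-ins for the ADK types, to show the control flow only.
@dataclass
class Event:
    author: str

@dataclass
class InvocationEvents:
    invocation_events: list = field(default_factory=list)

def resolve_instructions(intermediate_data, instructions_by_agent):
    developer_instructions = ""
    if (
        isinstance(intermediate_data, InvocationEvents)
        and intermediate_data.invocation_events  # falsy for [] -> branch skipped
    ):
        author = intermediate_data.invocation_events[0].author
        developer_instructions = instructions_by_agent.get(author, "")
    return developer_instructions

rules = {"my_agent": "Only answer cooking questions."}
# With at least one event, the lookup succeeds:
assert resolve_instructions(InvocationEvents([Event("my_agent")]), rules) == rules["my_agent"]
# With zero tool calls the list is empty and the instructions are silently dropped:
assert resolve_instructions(InvocationEvents([]), rules) == ""
```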
Steps to Reproduce:
- Create an agent with developer instructions that define scope boundaries (e.g., "only answer questions about topic X; decline everything else")
- Create an eval case where the user asks an out-of-scope question
- The agent correctly declines without calling any tools → invocation_events is []
- Add a rubric_based_final_response_quality_v1 rubric that references the developer instructions, e.g.: "The agent's developer instructions explicitly state that this type of request is out-of-scope. Score YES if the agent declined without calling tools."
- Run the evaluation
- The judge receives an empty <developer_instructions> block and scores the rubric as failing despite the agent behaving correctly
Expected Behavior:
The developer_instructions should be populated from app_details regardless of whether invocation_events is empty. The agent name could be resolved via a fallback (e.g., the first/root agent name from app_details.agent_details).
The judge should receive the full system prompt in <developer_instructions> so it can evaluate rubrics that reference scope definitions, behavioral rules, or other instructions.
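The fallback could be sketched as a hypothetical helper (resolve_agent_name is not a real ADK function): Python dicts preserve insertion order, so next(iter(agent_details)) returns the first registered agent, a reasonable stand-in for the root agent.

```python
class Event:
    def __init__(self, author):
        self.author = author

def resolve_agent_name(invocation_events, agent_details):
    """Hypothetical helper: prefer the first event's author, else the root agent."""
    if invocation_events:
        return invocation_events[0].author
    if agent_details:
        # Dicts preserve insertion order (Python 3.7+), so the first key is
        # the first registered (root) agent.
        return next(iter(agent_details))
    return None

agents = {"root_agent": object(), "sub_agent": object()}
assert resolve_agent_name([Event("sub_agent")], agents) == "sub_agent"
assert resolve_agent_name([], agents) == "root_agent"  # fallback path
assert resolve_agent_name([], {}) is None              # nothing to resolve
```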
Observed Behavior:
The judge receives an empty <developer_instructions> block and responds:
"In the provided user_prompt, the <developer_instructions> are empty, and therefore do not explicitly state this limitation. [...] Since the condition for the request being 'out-of-scope' (as defined by this property) is not met by the provided user_prompt, the agent's decline is not considered 'correct' according to the property's criteria."
The rubric scores 0.0 even though:
- The agent's actual system prompt does explicitly define the scope
- The agent did behave correctly (declined without calling tools)
- The companion rubric_based_tool_use_quality_v1 metric passes (score 1.0) for the same invocation
Environment Details:
- ADK Library Version (pip show google-adk): 1.26.0 (likely also present in the latest release)
- Desktop OS: macOS
- Python Version (python -V): 3.13
Model Information:
- Are you using LiteLLM: Yes
- Which model is being used: gemini-2.5-flash (as the agent under test and as the judge model)
🟡 Optional Information
Regression:
Unknown — this appears to be a logic oversight present since the rubric_based_final_response_quality_v1 evaluator was introduced. The condition likely exists because invocation_events[0].author is used to determine which agent's instructions to retrieve, with no fallback path for the zero-tool-call case.
Logs:
The judge's rationale from the evaluation report:
```
rubric_id: out_of_scope_response
score: 0.0
rationale: The property defines a request as "out-of-scope" if "The agent's developer
  instructions (system prompt) explicitly state that [topic X] is out-of-scope". In the
  provided `user_prompt`, the `<developer_instructions>` are empty, and therefore do not
  explicitly state this limitation. Since the condition for the request being "out-of-scope"
  (as defined by this property) is not met by the provided `user_prompt`, the agent's
  decline is not considered "correct" according to the property's criteria.
```
Meanwhile the intermediate_data confirms zero tool calls:
"intermediate_data": {
"invocation_events": []
}
And app_details.agent_details does contain the agent's instructions with explicit scope definitions.
Screenshots / Video:
N/A
Additional Context:
The rubric_based_tool_use_quality_v1 evaluator is not affected by this bug — it doesn't pass developer_instructions to the judge at all (it only passes tool_declarations). This creates an inconsistency where the tool-use rubric passes but the response-quality rubric fails for the exact same correct behavior.
Suggested Fix:
```python
developer_instructions = ""
tool_declarations = "Agent has no tools."
response_steps = get_tool_calls_and_responses_as_json_str(
    actual_invocation.intermediate_data
)
app_details = actual_invocation.app_details
if app_details:
  # Determine the agent name from invocation events if available,
  # otherwise fall back to the first (root) agent in app_details.
  agent_name = None
  if (
      isinstance(actual_invocation.intermediate_data, InvocationEvents)
      and actual_invocation.intermediate_data.invocation_events
  ):
    agent_name = actual_invocation.intermediate_data.invocation_events[0].author
  elif app_details.agent_details:
    agent_name = next(iter(app_details.agent_details))
  if agent_name:
    developer_instructions = app_details.get_developer_instructions(
        agent_name=agent_name
    )
  tool_declarations = get_tool_declarations_as_json_str(app_details)
```
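The fixed flow can be exercised end to end with fakes in place of the real ADK types (FakeAppDetails and its method signature are assumptions for illustration): with the fallback in place, the zero-tool-call case no longer drops the instructions.

```python
class Event:
    def __init__(self, author):
        self.author = author

class FakeAppDetails:
    """Stand-in for app_details; get_developer_instructions mimics the lookup."""
    def __init__(self, agent_details):
        self.agent_details = agent_details

    def get_developer_instructions(self, agent_name):
        return self.agent_details.get(agent_name, "")

def patched_lookup(invocation_events, app_details):
    # Mirrors the suggested fix: event author first, root agent as fallback.
    agent_name = None
    if invocation_events:
        agent_name = invocation_events[0].author
    elif app_details.agent_details:
        agent_name = next(iter(app_details.agent_details))
    return app_details.get_developer_instructions(agent_name) if agent_name else ""

app = FakeAppDetails({"my_agent": "Only answer cooking questions."})
# Zero tool calls: the fallback now surfaces the instructions.
assert patched_lookup([], app) == "Only answer cooking questions."
# Events present: the event author still wins.
assert patched_lookup([Event("my_agent")], app) == "Only answer cooking questions."
```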
Minimal Reproduction Code:
```python
from google.adk.evaluation import EvalCase, Invocation, InvocationEvents
from google.adk.evaluation.rubric_based_final_response_quality_v1 import (
    RubricBasedFinalResponseQualityV1,
)
from google.adk.evaluation.app_details import AppDetails, AgentDetails
from google.genai import types as genai_types

# Agent with explicit scope instructions
app_details = AppDetails(
    agent_details={
        "my_agent": AgentDetails(
            name="my_agent",
            instructions=(
                "You are a cooking assistant. Only answer questions about"
                " recipes and cooking. Decline all other requests as"
                " out-of-scope."
            ),
            tool_declarations=[],
        )
    }
)

# Invocation where the agent made ZERO tool calls
# (it correctly declined an out-of-scope request)
invocation = Invocation(
    user_content=genai_types.Content(
        parts=[genai_types.Part(text="What is the capital of France?")],
        role="user",
    ),
    final_response=genai_types.Content(
        parts=[genai_types.Part(text="I can only help with cooking and recipes.")],
        role="model",
    ),
    intermediate_data=InvocationEvents(invocation_events=[]),  # <-- empty!
    app_details=app_details,
)

# This rubric references developer instructions — but the judge will see them as empty
evaluator = RubricBasedFinalResponseQualityV1(
    rubrics=[{
        "rubric_id": "scope_check",
        "rubric_content": {
            "text_property": "The developer instructions define scope. Score YES if agent declined correctly."
        },
    }],
    judge_model="gemini-2.5-flash",
)

# BUG: evaluator passes developer_instructions="" to the judge
result = evaluator.evaluate(invocation)
# Judge scores 0.0 because it can't see the instructions
```
How often has this issue occurred?:
- Always (100%) — reproduces every time the agent makes zero tool calls and rubric_based_final_response_quality_v1 is used with rubrics referencing developer instructions.