Hi! I’m trying to understand the intended public flow for rubric-based metrics such as rubric_based_final_response_quality_v1 and rubric_based_tool_use_quality_v1.
I realize these appear to sit on top of experimental ADK evaluator APIs. When running the final-response rubric evaluator through an internal/repo-owned helper path, I see the expected ADK experimental warnings, for example:
[EXPERIMENTAL] RubricBasedFinalResponseQualityV1Evaluator
[EXPERIMENTAL] RubricBasedEvaluator
[EXPERIMENTAL] LlmAsJudge
In that controlled path, I can construct the metric with build_eval_metric(..., rubrics=[...]) and get rubric-based scoring from rubric_based_final_response_quality_v1. For example, a small calibration run with four reviewed cases produced the expected pass/fail outcomes, and an advisory positive trace scored successfully with score: 1.0.
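For concreteness, that internal call looked roughly like this. Only build_eval_metric(..., rubrics=[...]) and the metric name are taken from the repo; the import path and the shape of each rubric entry are my assumptions from reading builtin_metrics.py:

```python
# Rough sketch of the internal/repo-owned path I used. The import path and
# the rubric dict shape are assumptions; only build_eval_metric and the
# metric name come from the repo itself.
from agentevals.builtin_metrics import build_eval_metric

rubrics = [
    {"id": "grounded", "text": "The final response only states facts supported by tool output."},
    {"id": "complete", "text": "The final response addresses every part of the user request."},
]

metric = build_eval_metric(
    "rubric_based_final_response_quality_v1",
    rubrics=rubrics,
)
# Running this metric emits the [EXPERIMENTAL] warnings shown above and
# produced rubric-based scoring (score: 1.0 on my positive trace).
```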
So my question is less “is this broken?” and more: what is the intended public surface for this capability?
From current main, /api/metrics exposes these metrics and marks them as requiring rubrics. I also see rubrics documented on eval-set cases/invocations, and the internal builder accepts rubrics. What I could not find is the supported API/CLI/MCP/config path for supplying those rubrics when running the metrics.
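For reference, my reading of docs/eval-set-format.md is that rubrics can already be declared per case and per invocation, roughly like this. Only the presence of the rubrics fields is from the doc; the surrounding field names are illustrative, not the verbatim schema:

```python
# Hypothetical eval-set case based on my reading of docs/eval-set-format.md.
# The doc documents rubrics on cases/invocations; every other field name
# here is illustrative.
eval_case = {
    "id": "weather-001",
    "rubrics": [  # case-level rubrics
        {"id": "grounded", "text": "Only states facts supported by tool output."},
    ],
    "invocations": [
        {
            "user_content": "What's the weather in Paris today?",
            "rubrics": [  # invocation-level rubrics
                {"id": "tool_choice", "text": "The agent calls the weather tool."},
            ],
        },
    ],
}
```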
This looks like it may simply be a gap in the public surface rather than a disagreement in direction: the metric metadata, eval-set docs, and internal RubricsBasedCriterion construction are already present, while the runner/API/config path does not yet appear to pass rubrics through. If that is the right read, I would be interested in helping fill the gap, but wanted to ask for the preferred design before opening a PR.
Questions:
- Are rubric-based metrics intended to consume rubrics from eval-set case/invocation fields?
- Is a request/config-level rubric field planned for API/CLI/MCP runs?
- Would you prefer config-level rubrics, eval-set rubrics, or both? (One possible config-level shape is sketched after this list.)
- Are these metrics intentionally marked working=false until that public surface is decided?
- Should users treat build_eval_metric(..., rubrics=...) as internal only for now?
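To make the config-level option concrete, here is one possible shape. This is purely illustrative and NOT current behavior; it assumes eval_config_loader.py grew a per-metric rubrics block, and the judge_model key is also hypothetical:

```python
# Purely illustrative proposal, not current behavior: the dict an eval
# config could load in order to pass rubrics through to the metric builder.
# "judge_model" and the nesting are hypothetical; only the metric name and
# the idea of a rubrics list come from the existing code/docs.
eval_config = {
    "metrics": [
        {
            "name": "rubric_based_final_response_quality_v1",
            "judge_model": "gemini-1.5-pro",  # placeholder judge model
            "rubrics": [
                {"id": "grounded", "text": "Only states facts supported by tool output."},
                {"id": "complete", "text": "Addresses every part of the user request."},
            ],
        },
    ],
}
```

If both levels end up supported, the design would also need a precedence rule (e.g. case-level rubrics extending or overriding config-level ones).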
Relevant code/docs I checked:
src/agentevals/api/routes.py
src/agentevals/builtin_metrics.py
src/agentevals/config.py
src/agentevals/eval_config_loader.py
src/agentevals/cli.py
src/agentevals/mcp_server.py
docs/eval-set-format.md
The flow I’m hoping to support eventually (see the sketch after this list) is:
- define rubric criteria with IDs/text
- run rubric_based_final_response_quality_v1 with a configured judge model
- get an overall score and, ideally, per-rubric scores
- do this through a supported API/CLI/MCP/config path rather than reaching into internals
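Concretely, the public path I have in mind would look something like this. run_eval and everything around it are entirely hypothetical names, shown only to make the flow above concrete; none of them exist today as far as I can tell:

```python
# Entirely hypothetical public surface, shown only to illustrate the
# desired flow; no name below is claimed to exist in the repo today.
from agentevals import run_eval  # hypothetical entry point

results = run_eval(
    eval_set="evalsets/weather.json",
    metrics=["rubric_based_final_response_quality_v1"],
    judge_model="gemini-1.5-pro",  # hypothetical parameter
    rubrics=[                      # hypothetical request-level rubrics
        {"id": "grounded", "text": "Only states facts supported by tool output."},
    ],
)

for case_result in results:
    print(case_result.overall_score)      # hypothetical overall score
    for r in case_result.rubric_scores:   # hypothetical per-rubric scores
        print(r.rubric_id, r.score)
```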
I asked a similar question in Discord and wanted to open a tracking issue for the intended public API/config direction.