prompt: add query efficiency guidance to reduce excessive tool calls

slashpai · slashpai · commit fb9ac9adee1e · 2026-03-13T16:57:08.000+05:30
- Add QUERY EFFICIENCY section to ServerPrompt with PromQL aggregation patterns
  (topk, sum by, rate) to prevent agents from querying one entity at a time
- Mark Steps 2-3 (get_label_names, get_label_values) as optional when aggregated
  queries suffice
- Add aggregation reminders to ExecuteInstantQueryPrompt and ExecuteRangeQueryPrompt
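
For illustration only, a minimal Go sketch of the combined pattern the new
QUERY EFFICIENCY section recommends. buildTopKQuery is a hypothetical helper
that is not part of this change, and the metric name http_requests_total and
the pod label are assumed example values:

```go
package main

import "fmt"

// buildTopKQuery is a hypothetical helper (not part of this repository).
// It assembles the combined pattern from the QUERY EFFICIENCY section:
// topk(N, sum by (label) (rate(metric[window]))).
func buildTopKQuery(metric, label, window string, n int) string {
	return fmt.Sprintf("topk(%d, sum by (%s) (rate(%s[%s])))", n, label, metric, window)
}

func main() {
	// One aggregated query covers every pod, instead of one query per pod.
	// The metric and label names here are assumed example values.
	fmt.Println(buildTopKQuery("http_requests_total", "pod", "5m", 5))
}
```

A single execute_instant_query call with the resulting string would rank all
pods at once, which is the behavior the prompt change is meant to encourage.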

Signed-off-by: Jayapriya Pai <janantha@redhat.com>
diff --git a/TOOLS.md b/TOOLS.md
@@ -38,7 +38,7 @@ This MCP server exposes the following tools for interacting with Prometheus/Than
 
 - PREREQUISITE: You MUST call list_metrics first to verify the metric exists
 - WHEN TO USE: - Current state questions: "What is the current error rate?" - Point-in-time snapshots: "How many pods are running?" - Latest values: "Which pods are in Pending state?"
-- The 'query' parameter MUST use metric names that were returned by list_metrics.
+- The 'query' parameter MUST use metric names that were returned by list_metrics. Use aggregation functions (topk, sum by, avg by) to answer in a single query instead of querying individual entities.
 
 **Parameters:**
 
@@ -66,7 +66,7 @@ This MCP server exposes the following tools for interacting with Prometheus/Than
 - PREREQUISITE: You MUST call list_metrics first to verify the metric exists
 - WHEN TO USE: - Trends over time: "What was CPU usage over the last hour?" - Rate calculations: "How many requests per second?" - Historical analysis: "Were there any restarts in the last 5 minutes?"
 - TIME PARAMETERS: - 'duration': Look back from now (e.g., "5m", "1h", "24h") - 'step': Data point resolution (e.g., "1m" for 1-hour duration, "5m" for 24-hour duration)
-- The 'query' parameter MUST use metric names that were returned by list_metrics.
+- The 'query' parameter MUST use metric names that were returned by list_metrics. Use aggregation functions (topk, sum by, rate, increase) to answer in a single query instead of querying individual entities.
 
 **Parameters:**
 
diff --git a/pkg/tools/prompt.go b/pkg/tools/prompt.go
@@ -18,11 +18,13 @@ If the user mentions a specific alert by name, use get_alerts with a filter to r
 - Always pass in a name_regex param to it with a best guess of what the metric would be named like.
 - Search the returned list to find the exact metric name that exists
 
-**STEP 2: Call get_label_names for the metric you found**
+**STEP 2 (optional): Call get_label_names for the metric you found**
 - Discover available labels for filtering (namespace, pod, service, etc.)
+- Skip if you can write an aggregated query directly (e.g., sum by, topk)
 
-**STEP 3: Call get_label_values if you need specific filter values**
+**STEP 3 (optional): Call get_label_values if you need specific filter values**
 - Find exact label values (e.g., actual namespace names, pod names)
+- Skip if using aggregation functions that group across all values
 
 **STEP 4: Execute your query using the EXACT metric name from Step 1**
 - Use execute_instant_query for current state questions
@@ -39,7 +41,18 @@ If the user mentions a specific alert by name, use get_alerts with a filter to r
 ## Query Type Selection
 
 - **execute_instant_query**: Current values, point-in-time snapshots, "right now" questions
-- **execute_range_query**: Trends over time, rate calculations, historical analysis`
+- **execute_range_query**: Trends over time, rate calculations, historical analysis
+
+## QUERY EFFICIENCY
+
+Write PromQL that answers the question in as few queries as possible. Do NOT query one entity at a time (e.g., one query per pod, per namespace, or per node). Instead, use PromQL aggregation to get all results in a single query.
+
+- Use topk/bottomk to find top or bottom N entities
+- Use sum by, avg by to group results by label
+- Use rate/increase for per-second or total-change calculations
+- Combine them: topk(5, sum by (pod) (rate(metric[5m])))
+
+AIM for 1-3 queries per question. If you are making more than 5 query tool calls for a single question, you are likely querying individual entities instead of using aggregation.`
 
 	ListMetricsPrompt = `MANDATORY FIRST STEP: List all available metric names in Prometheus.
 
@@ -68,7 +81,8 @@ WHEN TO USE:
 - Point-in-time snapshots: "How many pods are running?"
 - Latest values: "Which pods are in Pending state?"
 
-The 'query' parameter MUST use metric names that were returned by list_metrics.`
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+Use aggregation functions (topk, sum by, avg by) to answer in a single query instead of querying individual entities.`
 
 	ExecuteRangeQueryPrompt = `Execute a PromQL range query to get time-series data over a period.
 
@@ -83,7 +97,8 @@ TIME PARAMETERS:
 - 'duration': Look back from now (e.g., "5m", "1h", "24h")
 - 'step': Data point resolution (e.g., "1m" for 1-hour duration, "5m" for 24-hour duration)
 
-The 'query' parameter MUST use metric names that were returned by list_metrics.`
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+Use aggregation functions (topk, sum by, rate, increase) to answer in a single query instead of querying individual entities.`
 
 	GetLabelNamesPrompt = `Get all label names (dimensions) available for filtering a metric.