prompt: add query efficiency guidance to reduce excessive tool calls

- Add QUERY EFFICIENCY section to ServerPrompt with PromQL aggregation patterns
  (topk, sum by, rate) to prevent agents from querying one entity at a time
- Mark Steps 2-3 (get_label_names, get_label_values) as optional when aggregated
  queries suffice
- Add aggregation reminders to ExecuteInstantQueryPrompt and ExecuteRangeQueryPrompt

Signed-off-by: Jayapriya Pai <janantha@redhat.com>
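
To illustrate the pattern this commit message describes, here is a minimal Go sketch that contrasts per-entity queries with a single aggregated query. The helper names (`perEntityQueries`, `aggregatedQuery`), the metric `http_requests_total`, and the pod names are hypothetical and are not part of this repository. Only the PromQL shape `topk(5, sum by (pod) (rate(metric[5m])))` comes from the prompt text in the diff.

```go
package main

import "fmt"

// perEntityQueries builds one PromQL query per pod. This is the pattern the
// new QUERY EFFICIENCY guidance discourages: N pods means N tool calls.
// (Hypothetical helper, for illustration only.)
func perEntityQueries(metric, window string, pods []string) []string {
	queries := make([]string, 0, len(pods))
	for _, pod := range pods {
		// %q wraps the pod name in double quotes, as a PromQL label matcher expects.
		queries = append(queries, fmt.Sprintf("rate(%s{pod=%q}[%s])", metric, pod, window))
	}
	return queries
}

// aggregatedQuery builds a single query of the form
// topk(k, sum by (label) (rate(metric[window]))), which ranks all entities at once.
// (Hypothetical helper, for illustration only.)
func aggregatedQuery(metric, label, window string, k int) string {
	return fmt.Sprintf("topk(%d, sum by (%s) (rate(%s[%s])))", k, label, metric, window)
}

func main() {
	pods := []string{"api-0", "api-1", "api-2"} // assumed pod names

	// One query per pod: the call count grows with the number of pods.
	for _, q := range perEntityQueries("http_requests_total", "5m", pods) {
		fmt.Println(q)
	}

	// One query regardless of how many pods exist.
	fmt.Println(aggregatedQuery("http_requests_total", "pod", "5m", 5))
}
```

The aggregated form also removes the need to look up pod names first, which is why the commit marks Steps 2-3 as optional.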
TOOLS.md
2 additions & 2 deletions
@@ -38,7 +38,7 @@ This MCP server exposes the following tools for interacting with Prometheus/Than
 
 - PREREQUISITE: You MUST call list_metrics first to verify the metric exists
 - WHEN TO USE: - Current state questions: "What is the current error rate?" - Point-in-time snapshots: "How many pods are running?" - Latest values: "Which pods are in Pending state?"
-- The 'query' parameter MUST use metric names that were returned by list_metrics.
+- The 'query' parameter MUST use metric names that were returned by list_metrics. Use aggregation functions (topk, sum by, avg by) to answer in a single query instead of querying individual entities.
 
 **Parameters:**
 
@@ -66,7 +66,7 @@ This MCP server exposes the following tools for interacting with Prometheus/Than
 - PREREQUISITE: You MUST call list_metrics first to verify the metric exists
 - WHEN TO USE: - Trends over time: "What was CPU usage over the last hour?" - Rate calculations: "How many requests per second?" - Historical analysis: "Were there any restarts in the last 5 minutes?"
 - TIME PARAMETERS: - 'duration': Look back from now (e.g., "5m", "1h", "24h") - 'step': Data point resolution (e.g., "1m" for 1-hour duration, "5m" for 24-hour duration)
-- The 'query' parameter MUST use metric names that were returned by list_metrics.
+- The 'query' parameter MUST use metric names that were returned by list_metrics. Use aggregation functions (topk, sum by, rate, increase) to answer in a single query instead of querying individual entities.
pkg/tools/prompt.go
20 additions & 5 deletions
@@ -18,11 +18,13 @@ If the user mentions a specific alert by name, use get_alerts with a filter to r
 - Always pass in a name_regex param to it with a best guess of what the metric would be named like.
 - Search the returned list to find the exact metric name that exists
 
-**STEP 2: Call get_label_names for the metric you found**
+**STEP 2 (optional): Call get_label_names for the metric you found**
 - Discover available labels for filtering (namespace, pod, service, etc.)
+- Skip if you can write an aggregated query directly (e.g., sum by, topk)
 
-**STEP 3: Call get_label_values if you need specific filter values**
+**STEP 3 (optional): Call get_label_values if you need specific filter values**
 - Find exact label values (e.g., actual namespace names, pod names)
+- Skip if using aggregation functions that group across all values
 
 **STEP 4: Execute your query using the EXACT metric name from Step 1**
 - Use execute_instant_query for current state questions
@@ -39,7 +41,18 @@ If the user mentions a specific alert by name, use get_alerts with a filter to r
 ## Query Type Selection
 
 - **execute_instant_query**: Current values, point-in-time snapshots, "right now" questions
-- **execute_range_query**: Trends over time, rate calculations, historical analysis`
+- **execute_range_query**: Trends over time, rate calculations, historical analysis
+
+## QUERY EFFICIENCY
+
+Write PromQL that answers the question in as few queries as possible. Do NOT query one entity at a time (e.g., one query per pod, per namespace, or per node). Instead, use PromQL aggregation to get all results in a single query.
+
+- Use topk/bottomk to find top or bottom N entities
+- Use sum by, avg by to group results by label
+- Use rate/increase for per-second or total-change calculations
+- Combine them: topk(5, sum by (pod) (rate(metric[5m])))
+
+AIM for 1-3 queries per question. If you are making more than 5 query tool calls for a single question, you are likely querying individual entities instead of using aggregation.`
 
 ListMetricsPrompt=`MANDATORY FIRST STEP: List all available metric names in Prometheus.
 
@@ -68,7 +81,8 @@ WHEN TO USE:
 - Point-in-time snapshots: "How many pods are running?"
 - Latest values: "Which pods are in Pending state?"
 
-The 'query' parameter MUST use metric names that were returned by list_metrics.`
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+Use aggregation functions (topk, sum by, avg by) to answer in a single query instead of querying individual entities.`
 
 ExecuteRangeQueryPrompt=`Execute a PromQL range query to get time-series data over a period.
 
@@ -83,7 +97,8 @@ TIME PARAMETERS:
 - 'duration': Look back from now (e.g., "5m", "1h", "24h")
 - 'step': Data point resolution (e.g., "1m" for 1-hour duration, "5m" for 24-hour duration)
 
-The 'query' parameter MUST use metric names that were returned by list_metrics.`
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+Use aggregation functions (topk, sum by, rate, increase) to answer in a single query instead of querying individual entities.`
 
 GetLabelNamesPrompt=`Get all label names (dimensions) available for filtering a metric.
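
The TIME PARAMETERS guidance in ExecuteRangeQueryPrompt pairs a 'duration' with a 'step' ("1m" for 1 hour, "5m" for 24 hours). The sketch below estimates how many samples per series each pairing yields, which is one way to sanity-check a step before issuing a range query. It assumes both values can be parsed by Go's `time.ParseDuration`, which accepts the examples in the prompt ("5m", "1h", "24h") but not Prometheus-only units such as "1d". The function `pointsPerSeries` is hypothetical and does not describe how this server parses its parameters.

```go
package main

import (
	"fmt"
	"time"
)

// pointsPerSeries estimates the number of samples per series that a range query
// returns when evaluated from (now - duration) to now at the given step,
// counting both endpoints. (Hypothetical helper, for illustration only.)
func pointsPerSeries(duration, step string) (int, error) {
	d, err := time.ParseDuration(duration)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", duration, err)
	}
	s, err := time.ParseDuration(step)
	if err != nil {
		return 0, fmt.Errorf("invalid step %q: %w", step, err)
	}
	if s <= 0 {
		return 0, fmt.Errorf("step must be positive, got %q", step)
	}
	// Integer division counts the whole steps that fit in the window;
	// adding 1 includes the sample at the start of the window.
	return int(d/s) + 1, nil
}

func main() {
	// The two pairings suggested by the prompt text.
	for _, p := range [][2]string{{"1h", "1m"}, {"24h", "5m"}} {
		n, err := pointsPerSeries(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("duration=%s step=%s -> %d points per series\n", p[0], p[1], n)
	}
}
```

Under these assumptions the suggested pairings give 61 and 289 points per series. A range query that returns many series multiplies that count, which is another reason to aggregate with sum by or topk.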