Commit fb9ac9a

prompt: add query efficiency guidance to reduce excessive tool calls
- Add QUERY EFFICIENCY section to ServerPrompt with PromQL aggregation patterns (topk, sum by, rate) to prevent agents from querying one entity at a time
- Mark Steps 2-3 (get_label_names, get_label_values) as optional when aggregated queries suffice
- Add aggregation reminders to ExecuteInstantQueryPrompt and ExecuteRangeQueryPrompt

Signed-off-by: Jayapriya Pai <janantha@redhat.com>
1 parent d22b1ab commit fb9ac9a

2 files changed

Lines changed: 22 additions & 7 deletions

TOOLS.md

Lines changed: 2 additions & 2 deletions
@@ -38,7 +38,7 @@ This MCP server exposes the following tools for interacting with Prometheus/Than
 
 - PREREQUISITE: You MUST call list_metrics first to verify the metric exists
 - WHEN TO USE: - Current state questions: "What is the current error rate?" - Point-in-time snapshots: "How many pods are running?" - Latest values: "Which pods are in Pending state?"
-- The 'query' parameter MUST use metric names that were returned by list_metrics.
+- The 'query' parameter MUST use metric names that were returned by list_metrics. Use aggregation functions (topk, sum by, avg by) to answer in a single query instead of querying individual entities.
 
 **Parameters:**
 
@@ -66,7 +66,7 @@ This MCP server exposes the following tools for interacting with Prometheus/Than
 - PREREQUISITE: You MUST call list_metrics first to verify the metric exists
 - WHEN TO USE: - Trends over time: "What was CPU usage over the last hour?" - Rate calculations: "How many requests per second?" - Historical analysis: "Were there any restarts in the last 5 minutes?"
 - TIME PARAMETERS: - 'duration': Look back from now (e.g., "5m", "1h", "24h") - 'step': Data point resolution (e.g., "1m" for 1-hour duration, "5m" for 24-hour duration)
-- The 'query' parameter MUST use metric names that were returned by list_metrics.
+- The 'query' parameter MUST use metric names that were returned by list_metrics. Use aggregation functions (topk, sum by, rate, increase) to answer in a single query instead of querying individual entities.
 
 **Parameters:**
 
pkg/tools/prompt.go

Lines changed: 20 additions & 5 deletions
@@ -18,11 +18,13 @@ If the user mentions a specific alert by name, use get_alerts with a filter to r
 - Always pass in a name_regex param to it with a best guess of what the metric would be named like.
 - Search the returned list to find the exact metric name that exists
 
-**STEP 2: Call get_label_names for the metric you found**
+**STEP 2 (optional): Call get_label_names for the metric you found**
 - Discover available labels for filtering (namespace, pod, service, etc.)
+- Skip if you can write an aggregated query directly (e.g., sum by, topk)
 
-**STEP 3: Call get_label_values if you need specific filter values**
+**STEP 3 (optional): Call get_label_values if you need specific filter values**
 - Find exact label values (e.g., actual namespace names, pod names)
+- Skip if using aggregation functions that group across all values
 
 **STEP 4: Execute your query using the EXACT metric name from Step 1**
 - Use execute_instant_query for current state questions
@@ -39,7 +41,18 @@ If the user mentions a specific alert by name, use get_alerts with a filter to r
 ## Query Type Selection
 
 - **execute_instant_query**: Current values, point-in-time snapshots, "right now" questions
-- **execute_range_query**: Trends over time, rate calculations, historical analysis`
+- **execute_range_query**: Trends over time, rate calculations, historical analysis
+
+## QUERY EFFICIENCY
+
+Write PromQL that answers the question in as few queries as possible. Do NOT query one entity at a time (e.g., one query per pod, per namespace, or per node). Instead, use PromQL aggregation to get all results in a single query.
+
+- Use topk/bottomk to find top or bottom N entities
+- Use sum by, avg by to group results by label
+- Use rate/increase for per-second or total-change calculations
+- Combine them: topk(5, sum by (pod) (rate(metric[5m])))
+
+AIM for 1-3 queries per question. If you are making more than 5 query tool calls for a single question, you are likely querying individual entities instead of using aggregation.`
 
 ListMetricsPrompt = `MANDATORY FIRST STEP: List all available metric names in Prometheus.
 
@@ -68,7 +81,8 @@ WHEN TO USE:
 - Point-in-time snapshots: "How many pods are running?"
 - Latest values: "Which pods are in Pending state?"
 
-The 'query' parameter MUST use metric names that were returned by list_metrics.`
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+Use aggregation functions (topk, sum by, avg by) to answer in a single query instead of querying individual entities.`
 
 ExecuteRangeQueryPrompt = `Execute a PromQL range query to get time-series data over a period.
 
@@ -83,7 +97,8 @@ TIME PARAMETERS:
 - 'duration': Look back from now (e.g., "5m", "1h", "24h")
 - 'step': Data point resolution (e.g., "1m" for 1-hour duration, "5m" for 24-hour duration)
 
-The 'query' parameter MUST use metric names that were returned by list_metrics.`
+The 'query' parameter MUST use metric names that were returned by list_metrics.
+Use aggregation functions (topk, sum by, rate, increase) to answer in a single query instead of querying individual entities.`
 
 GetLabelNamesPrompt = `Get all label names (dimensions) available for filtering a metric.
 