Merged
7 changes: 4 additions & 3 deletions TOOLS.md
@@ -13,14 +13,15 @@ This MCP server exposes the following tools for interacting with Prometheus/Than

- YOU MUST CALL THIS TOOL BEFORE ANY OTHER QUERY TOOL
- This tool MUST be called first for EVERY observability question to: 1. Discover what metrics actually exist in this environment 2. Find the EXACT metric name to use in queries 3. Avoid querying non-existent metrics 4. The 'name_regex' parameter should always be provided, and be a best guess of what the metric would be named like. 5. Do not use a blanket regex like .* or .+ in the 'name_regex' parameter. Use specific ones like kube.*, node.*, etc.
- REGEX PATTERN GUIDANCE: - Prometheus metrics are typically prefixed (e.g., 'prometheus_tsdb_head_series', 'kube_pod_status_phase') - To match metrics CONTAINING a substring, use wildcards: '.*tsdb.*' matches 'prometheus_tsdb_head_series' - Without wildcards, the pattern matches EXACTLY: 'tsdb' only matches a metric literally named 'tsdb' (which rarely exists) - Common patterns: 'kube_pod.*' (pods), '.*memory.*' (memory-related), 'node_.*' (node metrics) - If you get empty results, try adding '.*' before/after your search term
Contributor

How common do you think it would be to match a prefix vs. an inner word? We could also direct the LLM to use ^ to match a prefix, and do a "contains" search by default. An exact search would then be achieved with ^tsdb$. That seems hard to do by mistake.
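
The difference between the two behaviours discussed here can be sketched in Go. This is an illustration only: `filterAnchored` and `filterContains` are hypothetical helpers, not functions from this repository, and wrapping the pattern as `^(?:...)$` is an assumption modelled on the fully anchored regex matching that the new guidance describes.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterAnchored mimics fully anchored matching: the pattern must match the
// whole metric name. (Assumed behaviour, based on the PR's guidance text.)
func filterAnchored(names []string, pattern string) ([]string, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, err
	}
	return filter(names, re), nil
}

// filterContains is the hypothetical alternative from the comment: the
// pattern may match anywhere unless the caller adds ^ or $ explicitly.
func filterContains(names []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return filter(names, re), nil
}

// filter returns the names matched by re, preserving input order.
func filter(names []string, re *regexp.Regexp) []string {
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{"prometheus_tsdb_head_series", "tsdb", "kube_pod_status_phase"}

	anchored, _ := filterAnchored(names, "tsdb")
	contains, _ := filterContains(names, "tsdb")
	prefix, _ := filterContains(names, "^kube_pod")
	exact, _ := filterContains(names, "^tsdb$")

	fmt.Println(anchored) // [tsdb]
	fmt.Println(contains) // [prometheus_tsdb_head_series tsdb]
	fmt.Println(prefix)   // [kube_pod_status_phase]
	fmt.Println(exact)    // [tsdb]
}
```

With contains-by-default, a bare term such as `tsdb` already finds prefixed metrics, and the stricter prefix or exact match requires an explicit `^` or `$`.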

Member

@slashpai slashpai Mar 11, 2026

Here is one scenario I tried on a kind cluster: "Are any containers in OOMKilled state?" There was no metric available for this in the cluster.

Tool Call History: "Are any containers in OOMKilled state?"

First Attempt

| # | Tool | Parameters | Result |
| :-- | :--- | :--- | :--- |
| 1 | list_metrics | name_regex: `"oom\|OOM\|terminated"` | [] |
| 2 | list_metrics | name_regex: `"kube_pod_container_status"` | [] |
| 3 | list_metrics | name_regex: `"container_status\|container_state"` | [] |
| 4 | list_metrics | name_regex: `"kube_pod_container"` | [] |
| 5 | list_metrics | name_regex: `".*reason.*"` | `kube_pod_container_status_waiting_reason`, `kube_pod_init_container_status_terminated_reason`, `kube_pod_status_reason` |
| 6 | list_metrics | name_regex: `".*terminated.*"` | `kube_pod_container_status_terminated`, `kube_pod_init_container_status_terminated`, `kube_pod_init_container_status_terminated_reason`, ... |
| 7 | list_metrics | name_regex: `"kube_pod_container_status_last_terminated_reason"` | [] |
| 8 | list_metrics | name_regex: `"kube_pod_container_status_terminated"` | `kube_pod_container_status_terminated` |
| 9 | get_label_names | metric: `"kube_pod_container_status_terminated"` | Error: Aborted |
| 10 | list_metrics | name_regex: `"kube_pod_container_status"` | [] |
| 11 | list_metrics | name_regex: `".*terminated_reason"` | `kube_pod_init_container_status_terminated_reason` |
| 12 | get_label_names | metric: `"kube_pod_init_container_status_terminated_reason"` | `__name__`, `container`, `instance`, `job`, `namespace`, `pod`, `reason`, `uid` |
| 13 | get_label_names | metric: `"kube_pod_container_status_terminated"` | `__name__`, `container`, `instance`, `job`, `namespace`, `pod`, `uid` |
| 14 | list_metrics | name_regex: `"last_terminated"` | [] |
| 15 | get_label_values | metric: `"kube_pod_init_container_status_terminated_reason"`, label: `"reason"` | `["Completed"]` |
| 16 | execute_instant_query | query: `kube_pod_init_container_status_terminated_reason{reason="OOMKilled"}` | [] |
| 17 | execute_instant_query | query: `kube_pod_container_status_terminated{namespace=~".+"} == 1` | [] |
| 18 | list_metrics | name_regex: `"container.*memory\|memory.*container"` | User rejected |
| 19 | list_metrics | name_regex: `"node_vmstat_oom"` | Error: Aborted |

Summary

  • Total: 19 tool calls
  • Outcome: No OOMKilled containers found
  • Note: kube_pod_container_status_last_terminated_reason is not present in this cluster. Only kube_pod_init_container_status_terminated_reason exists (init containers), with reason values ["Completed"].

So for a scenario where no metric corresponding to the question exists, should we alert the user earlier instead of letting the model keep trying more patterns and frustrating them?

Say:

**FAIL FAST:** If list_metrics returns no relevant metric after 2–3 attempts, STOP and inform the user immediately. Do not keep trying more patterns. Say something like: "The metric for [question] does not appear to exist in your cluster—it may not be scraped by your monitoring setup." This avoids frustrating the user with many failed tool calls.

Member

This can vary based on the model used, but maybe we should still have a description for fail fast as well?

Member

Another way could be to have a new parameter, attempt_number, that the server uses to return a “metric not found” hint after N attempts.

wdyt?
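
A rough sketch of how the attempt_number idea might look on the server side, in Go. Everything here is hypothetical: the `listMetrics` function, the `listMetricsResult` type, the `maxAttempts` threshold of 3, and the hint wording are assumptions for illustration and do not reflect the existing handler in this repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxAttempts is an assumed threshold; the thread suggests 2-3 attempts.
const maxAttempts = 3

// listMetricsResult is a hypothetical response shape, not the server's real one.
type listMetricsResult struct {
	Metrics []string
	Hint    string
}

// listMetrics filters the known metric names with a fully anchored regex and
// attaches a "metric not found" hint once the caller-reported attempt number
// reaches maxAttempts without any match.
func listMetrics(known []string, nameRegex string, attemptNumber int) (listMetricsResult, error) {
	re, err := regexp.Compile("^(?:" + nameRegex + ")$")
	if err != nil {
		return listMetricsResult{}, err
	}
	var res listMetricsResult
	for _, name := range known {
		if re.MatchString(name) {
			res.Metrics = append(res.Metrics, name)
		}
	}
	if len(res.Metrics) == 0 && attemptNumber >= maxAttempts {
		res.Hint = "metric not found: stop retrying and tell the user it may not be scraped in this cluster"
	}
	return res, nil
}

func main() {
	known := []string{"kube_pod_container_status_terminated", "kube_pod_status_reason"}
	patterns := []string{"oom|OOM", "last_terminated", ".*oom.*"}
	for i, pattern := range patterns {
		res, err := listMetrics(known, pattern, i+1)
		if err != nil {
			fmt.Println("invalid regex:", err)
			continue
		}
		fmt.Printf("attempt %d: metrics=%v hint=%q\n", i+1, res.Metrics, res.Hint)
	}
}
```

One trade-off of this design is that the server trusts the model to report attempt_number honestly. A prompt-only FAIL FAST rule needs no schema change, but it depends more heavily on the model following instructions.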

- NEVER skip this step. NEVER guess metric names. Metric names vary between environments.
- After calling this tool: 1. Search the returned list for relevant metrics 2. Use the EXACT metric name found in subsequent queries 3. If no relevant metric exists, inform the user

**Parameters:**

| Parameter | Type | Required | Description |
| :----------- | :------- | :------: | :------------------------------------------------------------------------------------------------------------------------------------ |
| `name_regex` | `string` | ✅ | Regex pattern to filter metric names (e.g., 'http_.*', 'node_.*', 'kube.*'). This parameter is required. Don't pass in blanket regex. |
| Parameter | Type | Required | Description |
| :----------- | :------- | :------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name_regex` | `string` | ✅ | Regex pattern to filter metric names. IMPORTANT: Metric names are typically prefixed (e.g., 'prometheus_tsdb_head_series'). Use wildcards to match substrings: '.*tsdb.*' matches any metric containing 'tsdb', while 'tsdb' only matches the exact string 'tsdb'. Examples: 'http_.*' (starts with http_), '.*memory.*' (contains memory), 'node_.*' (starts with node_). This parameter is required. Don't pass in blanket regex like '.*' or '.+'. |

**Output Schema:**

2 changes: 1 addition & 1 deletion pkg/tools/definitions.go
@@ -12,7 +12,7 @@ var (
 		{
 			Name:        "name_regex",
 			Type:        ParamTypeString,
-			Description: "Regex pattern to filter metric names (e.g., 'http_.*', 'node_.*', 'kube.*'). This parameter is required. Don't pass in blanket regex.",
+			Description: "Regex pattern to filter metric names. IMPORTANT: Metric names are typically prefixed (e.g., 'prometheus_tsdb_head_series'). Use wildcards to match substrings: '.*tsdb.*' matches any metric containing 'tsdb', while 'tsdb' only matches the exact string 'tsdb'. Examples: 'http_.*' (starts with http_), '.*memory.*' (contains memory), 'node_.*' (starts with node_). This parameter is required. Don't pass in blanket regex like '.*' or '.+'.",
 			Required:    true,
 		},
 	},
7 changes: 7 additions & 0 deletions pkg/tools/prompt.go
@@ -52,6 +52,13 @@ This tool MUST be called first for EVERY observability question to:
4. The 'name_regex' parameter should always be provided, and be a best guess of what the metric would be named like.
5. Do not use a blanket regex like .* or .+ in the 'name_regex' parameter. Use specific ones like kube.*, node.*, etc.

REGEX PATTERN GUIDANCE:
- Prometheus metrics are typically prefixed (e.g., 'prometheus_tsdb_head_series', 'kube_pod_status_phase')
- To match metrics CONTAINING a substring, use wildcards: '.*tsdb.*' matches 'prometheus_tsdb_head_series'
- Without wildcards, the pattern matches EXACTLY: 'tsdb' only matches a metric literally named 'tsdb' (which rarely exists)
- Common patterns: 'kube_pod.*' (pods), '.*memory.*' (memory-related), 'node_.*' (node metrics)
- If you get empty results, try adding '.*' before/after your search term

NEVER skip this step. NEVER guess metric names. Metric names vary between environments.

After calling this tool: