Summary
We propose adding a Claude Code plugin to the observability-stack repository that teaches Claude Code how to query traces, logs, and metrics from the running stack using PPL, PromQL, and curl commands. The plugin is a set of markdown skill files with no runtime code and no build step. Claude Code loads them as context to gain OpenSearch-native observability capabilities.
No existing public Claude Code skill covers OpenSearch observability or PPL. This fills that gap.
Motivation
Developers using AI coding assistants with the observability stack currently have to:
Manually look up PPL syntax for every trace or log query
Remember the correct curl flags, auth credentials, and API endpoints for OpenSearch and Prometheus
Know which index patterns store traces vs. logs vs. service maps
Construct cross-signal correlation queries (trace-to-log joins) from scratch
Debug stack health issues without structured guidance
Build RED metrics dashboards and SLO/SLI monitoring from scratch
Figure out how to connect to AWS managed services (Amazon OpenSearch Service, Amazon Managed Prometheus) with SigV4 auth
A Claude Code plugin eliminates this friction. When a developer asks "show me the slowest agent invocations in the last hour", "what's the error budget burn rate for the payment service?", or "why is the payment service erroring?", Claude Code can immediately construct and execute the right PPL or PromQL query against the right endpoint with the right auth.
Glossary
Plugin: A collection of CLAUDE.md-compatible markdown skill files placed in a project directory that Claude Code loads as context to gain domain-specific capabilities.
Skill File: A single markdown file with frontmatter (name, description, allowed-tools) and instructional content that teaches Claude Code a specific capability.
PPL: Piped Processing Language, the query language used by OpenSearch for log and trace analytics. Queries are piped commands starting with source=<index>.
PromQL: Prometheus Query Language, used for querying time-series metrics from Prometheus.
OpenSearch: The search and analytics engine that stores traces and logs in this stack, accessible at port 9200 with HTTPS and basic authentication.
Prometheus: The time-series database that stores metrics in this stack, accessible at port 9090.
OTel Collector: The OpenTelemetry Collector that receives telemetry via the OTLP protocol on ports 4317 (gRPC) and 4318 (HTTP) and routes data to Data Prepper and Prometheus.
Data Prepper: The pipeline processor that transforms and enriches logs and traces before writing them to OpenSearch.
Trace Index: The OpenSearch index pattern otel-v1-apm-span-* storing trace span data.
Log Index: The OpenSearch index pattern otel-v1-apm-log-* storing log data.
Service Map Index: The OpenSearch index otel-v2-apm-service-map storing service dependency topology.
Gen AI Attributes: OpenTelemetry semantic convention attributes for generative AI operations, prefixed with gen_ai.* (e.g., gen_ai.operation.name, gen_ai.agent.name, gen_ai.usage.input_tokens).
Stack: The complete observability infrastructure: OTel Collector, Data Prepper, OpenSearch, Prometheus, and OpenSearch Dashboards.
Cross-Signal Correlation: The practice of linking telemetry signals (traces, logs, metrics) using shared identifiers such as traceId and spanId to enable end-to-end investigation.
Exemplar: A Prometheus data structure that links an individual metric sample to a specific trace by carrying trace_id and span_id alongside the measurement value; enables metric-to-trace correlation.
Test Fixture: A YAML file defining a single integration test case with command, expected status code, expected response fields, and tags.
PPL Grammar Source: The official OpenSearch PPL grammar documentation located in the opensearch-project/sql repository under docs/user/ppl/.
RED Metrics: Rate, Errors, Duration: the three golden signals for service-level APM monitoring. Rate measures throughput, Errors measures failure ratio, and Duration measures latency distribution.
SLI: Service Level Indicator: a quantitative measurement of a service's behavior, such as the ratio of successful requests to total requests.
SLO: Service Level Objective: a target value or range for an SLI, such as "99.9% availability over 30 days."
Error Budget: The allowed amount of unreliability derived from an SLO. For a 99.9% SLO, the error budget is 0.1%.
Burn Rate: The speed at which the error budget is being consumed. A burn rate of 1x means the budget will be exhausted exactly at the end of the SLO window.
Recording Rule: A Prometheus configuration that pre-computes and stores the result of a PromQL expression as a new time series, enabling efficient querying of SLI metrics at multiple time windows.
AWS SigV4: AWS Signature Version 4, the authentication protocol used to sign HTTP requests to AWS services, including Amazon OpenSearch Service and Amazon Managed Prometheus.
Architecture
System Context
graph TB
subgraph "Claude Code Plugin"
CM[CLAUDE.md<br/>Entry Point]
subgraph "skills/"
TS[traces.md]
LS[logs.md]
MS[metrics.md]
SH[stack-health.md]
PR[ppl-reference.md]
CR[correlation.md]
AR[apm-red.md]
SL[slo-sli.md]
end
subgraph "tests/"
CF[conftest.py]
TF[test_fixtures.py]
TR[test_runner.py]
FX[fixtures/*.yaml]
end
end
subgraph "Observability Stack"
OS[OpenSearch :9200<br/>HTTPS + Basic Auth]
PM[Prometheus :9090<br/>HTTP]
OC[OTel Collector :4317/:4318]
DP[Data Prepper :21890]
end
CM -->|references| TS
CM -->|references| LS
CM -->|references| MS
CM -->|references| SH
CM -->|references| PR
CM -->|references| CR
CM -->|references| AR
CM -->|references| SL
TS -->|PPL queries via curl| OS
LS -->|PPL queries via curl| OS
CR -->|PPL queries via curl| OS
CR -->|PromQL + exemplars via curl| PM
AR -->|PromQL RED queries via curl| PM
AR -->|PPL RED queries via curl| OS
SL -->|PromQL SLO queries via curl| PM
SH -->|health checks via curl| OS
SH -->|health checks via curl| PM
SH -->|health checks via curl| OC
MS -->|PromQL queries via curl| PM
PR -->|PPL reference for| OS
TR -->|validates commands from| FX
CF -->|checks health of| OS
CF -->|checks health of| PM
Data Flow
flowchart LR
A[User asks Claude Code<br/>an observability question] --> B[Claude Code reads CLAUDE.md]
B --> C{Route by intent}
C -->|trace investigation| D[Load traces.md]
C -->|log search| E[Load logs.md]
C -->|metrics query| F[Load metrics.md]
C -->|stack issues| G[Load stack-health.md]
C -->|PPL syntax help| H[Load ppl-reference.md]
C -->|cross-signal correlation| X[Load correlation.md]
C -->|RED metrics / APM| Y[Load apm-red.md]
C -->|SLO/SLI / error budget| Z[Load slo-sli.md]
D --> I[Execute curl command<br/>against OpenSearch PPL API]
E --> I
F --> J[Execute curl command<br/>against Prometheus API]
G --> K[Execute curl/docker commands<br/>against stack endpoints]
H --> L[Reference for constructing<br/>novel PPL queries]
X --> I
X --> J
Y --> I
Y --> J
Z --> J
What's Included
Eight Skill Files
The plugin ships as a CLAUDE.md entry point plus eight skill files in a skills/ directory: traces.md, logs.md, metrics.md, stack-health.md, ppl-reference.md, correlation.md, apm-red.md, and slo-sli.md.
Every query template is a complete, copy-paste-ready curl command with:
Correct protocol (HTTPS for OpenSearch, HTTP for Prometheus)
Authentication (-u admin:'My_password_123!@#' for OpenSearch, none for Prometheus)
Certificate skip (-k for development)
Proper JSON body with PPL/PromQL query
Backtick escaping for dotted field names in PPL
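Those five components can be sketched in a few lines. This is an illustrative helper (the function names are not part of the plugin) that assembles the same curl shape the skill files ship, including shell-safe quoting of the JSON body and PPL backtick escaping for dotted field names:

```python
import json

# Sketch only: endpoint, credentials, and flags match the stack's local
# defaults documented in this RFC; helper names are illustrative.
OPENSEARCH = "https://localhost:9200"

def backtick(field: str) -> str:
    # PPL requires backtick escaping for dotted field names.
    return f"`{field}`" if "." in field else field

def sh_quote(s: str) -> str:
    # Safe single-quoting for embedding the JSON body in a shell command.
    return "'" + s.replace("'", "'\\''") + "'"

def ppl_curl(query: str) -> str:
    body = json.dumps({"query": query})
    return (
        "curl -sk -u admin:'My_password_123!@#' "
        f"-X POST {OPENSEARCH}/_plugins/_ppl "
        "-H 'Content-Type: application/json' "
        f"-d {sh_quote(body)}"
    )

query = (f"source=otel-v1-apm-span-* | where "
         f"{backtick('attributes.gen_ai.operation.name')} = 'invoke_agent' | head 10")
print(ppl_curl(query))
```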
Requirements
Requirement 1: Plugin Directory Structure
As a developer, I want the plugin organized as a directory of skill files with a top-level CLAUDE.md entry point, so that Claude Code automatically loads the observability capabilities when I work in the project.
The plugin contains a top-level CLAUDE.md that references all skill files
Each skill file includes frontmatter with name, description, and allowed-tools
Requirement 2: Traces Skill
As a developer, I want to query trace data from OpenSearch using PPL, so that I can investigate agent invocations, tool executions, slow spans, error spans, and token usage.
PPL query templates for agent invocation spans (attributes.gen_ai.operation.name = invoke_agent)
PPL query templates for tool execution spans (attributes.gen_ai.operation.name = execute_tool)
Slow span detection where durationInNanos exceeds a configurable threshold
Error span identification where status.code = 2
Token usage aggregation by model and by agent name
Service operation listing with GenAI operation type breakdown
Service map queries for dependency exploration
All GenAI attributes documented with descriptions and example values
Every PPL query includes the complete curl command with endpoint, auth, and escaping
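The acceptance criteria above map to a small set of query shapes. A hedged sketch of three of them follows; the index pattern and the durationInNanos and status.code fields come from the glossary, while the model field name (attributes.gen_ai.request.model) is an assumption drawn from OTel GenAI conventions:

```python
# Illustrative PPL builders for the Requirement 2 query shapes.
TRACE_INDEX = "otel-v1-apm-span-*"

def slow_spans(threshold_nanos: int, limit: int = 10) -> str:
    # Slow span detection above a configurable threshold.
    return (f"source={TRACE_INDEX} | where durationInNanos > {threshold_nanos} "
            f"| sort - durationInNanos | head {limit}")

def error_spans(limit: int = 20) -> str:
    # Error spans carry OTel status code 2 (ERROR).
    return f"source={TRACE_INDEX} | where `status.code` = 2 | head {limit}"

def token_usage_by_model() -> str:
    # Token usage aggregation by model (model field name is an assumption).
    return (f"source={TRACE_INDEX} | stats "
            "sum(`attributes.gen_ai.usage.input_tokens`) as input_tokens "
            "by `attributes.gen_ai.request.model`")
```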
Requirement 3: Logs Skill
As a developer, I want to query log data from OpenSearch using PPL, so that I can search logs by severity, correlate logs with traces, identify error patterns, and analyze log volume.
Severity-based filtering (ERROR, WARN, INFO)
Trace-to-log correlation via traceId
Error pattern identification with stats count() by aggregations
Log volume trending over time with span(time, <interval>)
Full-text body search with string matching or relevance functions
Log Index field reference: severityText, severityNumber, traceId, spanId, serviceName, body, @timestamp
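As a sketch of the log-query shapes listed above (field names follow the Log Index reference; the time-bucket interval and function names are illustrative):

```python
# Illustrative PPL builders for the Requirement 3 query shapes.
LOG_INDEX = "otel-v1-apm-log-*"

def logs_for_trace(trace_id: str) -> str:
    # Trace-to-log correlation via traceId.
    return f"source={LOG_INDEX} | where traceId = '{trace_id}' | sort `@timestamp`"

def error_patterns() -> str:
    # Error pattern identification with a stats count() by aggregation.
    return (f"source={LOG_INDEX} | where severityText = 'ERROR' "
            "| stats count() by serviceName")

def log_volume(interval: str = "1h") -> str:
    # Log volume trending over time, bucketed with span(<time field>, <interval>).
    return f"source={LOG_INDEX} | stats count() by span(`@timestamp`, {interval})"
```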
Requirement 4: Metrics Skill
As a developer, I want to query metrics from Prometheus using PromQL, so that I can monitor HTTP request rates, latency percentiles, error rates, and active connections.
HTTP request rate per second grouped by service
HTTP latency at p95 and p99 by service
HTTP error rate (5xx) as a ratio
Active HTTP connections by service
Database operation latency at p95
Every PromQL query includes the complete curl command targeting localhost:9090/api/v1/query
Note on PPL as alternative for OpenSearch-ingested metrics
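The PromQL shapes behind these criteria can be sketched as string builders. The http_server_duration_seconds_count metric name matches the rate() example given in this RFC's RED rate query; the service_name and http_status_code label names are assumptions about the exporter's output:

```python
# Illustrative PromQL builders for the Requirement 4 query shapes.
def request_rate(service: str, window: str = "5m") -> str:
    # HTTP request rate per second for one service.
    return (f'sum(rate(http_server_duration_seconds_count'
            f'{{service_name="{service}"}}[{window}]))')

def latency_quantile(q: float, window: str = "5m") -> str:
    # p95/p99 via histogram_quantile over histogram buckets.
    return (f"histogram_quantile({q}, sum(rate("
            f"http_server_duration_seconds_bucket[{window}])) by (le, service_name))")

def error_ratio(window: str = "5m") -> str:
    # 5xx responses as a fraction of all responses.
    return (f'sum(rate(http_server_duration_seconds_count'
            f'{{http_status_code=~"5.."}}[{window}])) '
            f"/ sum(rate(http_server_duration_seconds_count[{window}]))")
```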
Requirement 5: Stack Health Skill
As a developer, I want to check the health of all observability stack components and troubleshoot common issues, so that I can verify the stack is operational and diagnose data flow problems.
Health check curl commands for OpenSearch, Prometheus, OTel Collector
Index listing and document count verification
Docker compose commands for container status and logs
Troubleshooting section for common failures: OpenSearch unreachable, no data in indices, Data Prepper pipeline errors, OTel Collector export failures
Port reference: OpenSearch (9200), OTel Collector gRPC (4317), OTel Collector HTTP (4318), Data Prepper (21890), Prometheus (9090), OpenSearch Dashboards (5601)
PPL describe for index mapping inspection
PPL _explain endpoint for query plan debugging
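A minimal sketch of the health-check surface, assuming the standard OpenSearch (_cluster/health) and Prometheus (/-/healthy) health endpoints; ports come from the port reference above, and the collector's self-metrics port 8888 is the one the troubleshooting section points to:

```python
# Illustrative map of health-check endpoints the stack-health skill curls.
HEALTH_CHECKS = {
    "opensearch": "https://localhost:9200/_cluster/health",  # needs -k + basic auth
    "prometheus": "http://localhost:9090/-/healthy",
    "otel-collector": "http://localhost:8888/metrics",       # collector self-metrics
}

def health_commands() -> list[str]:
    cmds = []
    for name, url in HEALTH_CHECKS.items():
        # Only the HTTPS (OpenSearch) endpoint needs auth and cert skip.
        auth = "-k -u admin:'My_password_123!@#' " if url.startswith("https") else ""
        cmds.append(f"curl -s {auth}{url}")
    return cmds
```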
Requirement 6: PPL Reference Skill
As a developer, I want a comprehensive PPL language reference available to Claude Code, so that Claude Code can understand PPL syntax and construct correct queries for any observability question.
Commands (50+): search, source, where, fields, stats, sort, head, eval, dedup, rename, top, rare, table, timechart, chart, bin, trendline, streamstats, eventstats, parse, grok, rex, regex, patterns, spath, join, lookup, graphlookup, subquery, append, appendcol, appendpipe, fillnull, flatten, expand, transpose, convert, replace, reverse, mvexpand, mvcombine, nomv, addcoltotals, addtotals, ad (anomaly detection), kmeans, ml, describe, explain, showdatasources, multisearch, fieldformat
Functions (14 categories), including:
String: CONCAT, LENGTH, LOWER, UPPER, TRIM, SUBSTRING, REPLACE, REGEXP, REGEXP_EXTRACT, REGEXP_REPLACE, and more
System: TYPEOF
API Endpoints:
Query execution: POST /_plugins/_ppl with JSON body {"query": "<ppl_query>"}
Query explain: POST /_plugins/_ppl/_explain
Grammar metadata: GET /_plugins/_ppl/_grammar
Source: Grammar reference sourced from the opensearch-project/sql repository's docs/user/ppl/ directory.
Requirement 7: Skill File Format Compliance
Each skill file is valid markdown with YAML frontmatter delimited by ---
Frontmatter contains name, description, and allowed-tools fields
Top-level CLAUDE.md references each skill file path with a one-line summary
Credentials sourced from .env file (admin / My_password_123!@#), noted as configurable
Requirement 8: Authentication and Connection Details
OpenSearch (local): HTTPS, port 9200, basic auth (admin / My_password_123!@#), -k flag for cert skip
OpenSearch (AWS managed): HTTPS, port 443, AWS SigV4 (--aws-sigv4 "aws:amz:REGION:es")
Prometheus (local): HTTP, port 9090, no auth
Prometheus (AWS managed): HTTPS, port 443, AWS SigV4 (--aws-sigv4 "aws:amz:REGION:aps")
OTel Collector: HTTP, ports 4317 (gRPC) and 4318 (HTTP), no auth
Data Prepper: HTTP, port 21890, no auth
OpenSearch Dashboards: HTTP, port 5601, same credentials as OpenSearch
All credentials are sourced from the repository .env file. The test harness reads .env with fallback to these defaults.
Skill files provide curl command variants for both local and AWS managed endpoints. The CLAUDE.md entry point includes a configuration section where users set $OPENSEARCH_ENDPOINT and $PROMETHEUS_ENDPOINT environment variables to switch between local and managed services. PPL and PromQL query syntax is identical across both profiles; only the endpoint URL and authentication method differ.
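The profile switch described above can be sketched as follows; the env var name comes from this RFC, while the region value and the credential-sourcing comment are illustrative assumptions:

```python
import os

# Sketch of the local-vs-managed endpoint switch for OpenSearch.
def opensearch_endpoint() -> str:
    return os.environ.get("OPENSEARCH_ENDPOINT", "https://localhost:9200")

def opensearch_auth_flags(region: str = "us-east-1") -> list[str]:
    if "localhost" in opensearch_endpoint():
        # Local profile: basic auth plus certificate skip.
        return ["-k", "-u", "admin:My_password_123!@#"]
    # Managed profile: curl signs the request with SigV4; credentials would
    # come from the AWS credential chain or -u ACCESS_KEY:SECRET_KEY.
    return ["--aws-sigv4", f"aws:amz:{region}:es"]

# Switching profiles only changes endpoint and auth, never the query syntax.
os.environ["OPENSEARCH_ENDPOINT"] = "https://search-example.us-east-1.es.amazonaws.com"
print(opensearch_auth_flags())
```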
Requirement 9: PPL Grammar Source Documentation
Grammar reference sourced from the opensearch-project/sql repository's docs/user/ppl/ directory (https://github.com/opensearch-project/sql)
Functions organized into categories matching the source repository
Requirement 10: Cross-Signal Correlation and GenAI Debugging
As a developer, I want the plugin skills to support cross-signal correlation between traces, logs, and metrics, and provide GenAI-specific debugging capabilities, so that I can perform end-to-end observability investigations across all telemetry signals.
Cross-signal correlation:
Trace-to-log joins by matching traceId across Trace Index and Log Index
Log-to-span correlation by spanId
Full trace tree reconstruction by traceId with parentSpanId hierarchy
Latency gap analysis between parent and child spans
Root span identification where parentSpanId is empty or null
GenAI operation types (beyond invoke_agent and execute_tool): chat, embeddings, retrieval, create_agent, text_completion, generate_content
Exception and error querying: exception.type, exception.message, and exception.stacktrace span attributes; error.type for error categorization; correlation back to traces via traceId and spanId
Extended GenAI attributes: gen_ai.agent.id, gen_ai.agent.description, gen_ai.agent.version; gen_ai.conversation.id for multi-turn conversation tracking; gen_ai.tool.call.id, gen_ai.tool.type, gen_ai.tool.call.arguments, gen_ai.tool.call.result
GenAI-specific metrics:
gen_ai_client_token_usage histogram grouped by operation and model
gen_ai_client_operation_duration histogram grouped by operation and model
Requirement 11: Integration Test Harness
As a developer, I want an integration test suite that validates all skill file commands against a running observability stack, so that I can verify the plugin's queries and health checks produce correct results.
Test infrastructure:
pytest test suite in a tests/ directory within the plugin
YAML fixture files defining test cases with command, expected_status_code, expected_fields, and tags
Pydantic model for strict schema validation (extra="forbid")
Session-scoped fixture that checks stack health before tests run
All tests skipped with clear message if stack is not running
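The strict fixture schema can be sketched with the standard library alone. The real harness uses a Pydantic model; this dataclass imitates its extra="forbid" behavior (unknown keys fail at load time) without the dependency:

```python
from dataclasses import dataclass, field, fields

# Stdlib sketch of the strict fixture schema (field names from this RFC).
@dataclass
class Fixture:
    name: str
    description: str
    command: str
    expected_status_code: int
    expected_fields: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    before_test: object = None
    after_test: object = None

def load_fixture(raw: dict) -> Fixture:
    allowed = {f.name for f in fields(Fixture)}
    extra = set(raw) - allowed
    if extra:
        # Mirrors Pydantic's extra="forbid": malformed fixtures fail early.
        raise ValueError(f"unknown fixture keys: {sorted(extra)}")
    return Fixture(**raw)
```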
Test categories:
traces: PPL queries against Trace Index, validate schema and datarows in response
logs: PPL queries against Log Index, validate response structure
metrics: PromQL queries against Prometheus, validate status: "success" and data field
stack-health: Health check commands, validate HTTP 200 status codes
ppl: PPL system commands (describe, _explain), validate response structure
correlation: Cross-signal correlation queries, validate join results and exemplar responses
apm_red: RED metric queries against Prometheus and OpenSearch, validate rate/error/duration responses
slo_sli: SLO/SLI queries against Prometheus, validate recording rule outputs and burn rate calculations
Test execution:
Commands executed via subprocess.run with configurable timeout (default 30s)
JSON response parsing with recursive field lookup for expected_fields
pytest markers for tag-based filtering (pytest -m traces)
before_test and after_test hooks in YAML for setup/teardown scripts
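The recursive expected_fields lookup mentioned above amounts to a small tree search; a sketch (function names are illustrative, not the harness's actual API):

```python
# Recursively search a parsed JSON response for a field name at any depth.
def find_field(obj, name):
    if isinstance(obj, dict):
        return name in obj or any(find_field(v, name) for v in obj.values())
    if isinstance(obj, list):
        return any(find_field(v, name) for v in obj)
    return False

def missing_fields(response, expected):
    # Returned to the test report so failures name exactly the absent fields.
    return [f for f in expected if not find_field(response, f)]
```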
Configuration:
Connection details read from .env with fallback defaults
README documenting how to run tests, prerequisites, and how to add new test cases
Requirement 12: Correlation Skill
As a developer, I want a dedicated correlation skill that teaches Claude Code how to join traces, logs, and metrics across all three telemetry signals using OTel semantic convention correlation fields, so that I can perform end-to-end investigations starting from any signal.
OTel correlation fields (sourced from opentelemetry.io), covering the specification's correlation mechanisms across signals:
Span context carries traceId, spanId, and traceFlags
Log records carry traceId/spanId, enabling direct joins
Exemplars carry trace_id, span_id, and filtered_attributes
Resource attributes: service.name, service.namespace, service.version, service.instance.id
GenAI resource attributes promoted to Prometheus labels in this stack: gen_ai.agent.id, gen_ai.agent.name, gen_ai.provider.name, gen_ai.request.model, gen_ai.response.model (configured in docker-compose/prometheus/prometheus.yml under otlp.promote_resource_attributes)
Trace-to-log correlation (PPL): source=otel-v1-apm-log-* | WHERE traceId = '<id>'; lookup by spanId; join across Trace Index and Log Index on traceId; all logs for a traceId, sorted by timestamp
Log-to-trace correlation (PPL): extract the traceId and query the Trace Index for the full trace tree; extract the spanId and find the exact span that produced the log
Metric-to-trace correlation (PromQL + exemplars): GET /api/v1/query_exemplars?query=<metric>&start=<start>&end=<end>; extract trace_id from the exemplar, then query the Trace Index via PPL; filter metrics by promoted GenAI labels (gen_ai_agent_name, gen_ai_request_model), then correlate to traces
Resource-level correlation: serviceName in traces/logs maps to the service_name label in Prometheus metrics
Investigation workflows that start from any signal and pivot to the others
Requirement 13: APM/RED Metrics Skill
As a developer, I want a dedicated APM skill that teaches Claude Code how to construct RED (Rate, Errors, Duration) metrics queries for any service, so that I can quickly assess service health using the standard APM methodology.
Rate queries: per-service request rate via PromQL (rate(http_server_duration_seconds_count[5m])), per-endpoint rate, and PPL alternative from trace spans
Error queries: error rate as a ratio (5xx / total) via PromQL, error count from trace spans via PPL (status.code = 2)
Duration queries: latency percentiles (p50, p95, p99) via PromQL histogram_quantile and PPL percentile() from trace spans
Combined RED dashboard query set for all services in a single investigation workflow
GenAI-specific RED metrics using gen_ai_client_operation_duration histogram
OTel HTTP semantic convention metrics reference: http.server.request.duration (histogram), http.server.active_requests (gauge), and their Prometheus-exported equivalents
OTel Collector spanmetrics connector documentation for auto-generating RED metrics from traces
Every query template includes the complete curl command with the appropriate endpoint and authentication
Requirement 14: SLO/SLI Skill
As a developer, I want a dedicated SLO/SLI skill that teaches Claude Code how to define SLIs, calculate error budgets, and construct burn rate queries using Prometheus recording rules, so that I can implement and monitor service level objectives for my services.
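The error-budget arithmetic from the glossary can be worked through in a few lines; the 14.4x figure below is the widely used fast-burn alert threshold, shown here as an illustrative example rather than a requirement of this skill:

```python
# Error budget is 1 - SLO; burn rate 1x exhausts the budget exactly at
# the end of the SLO window.
def error_budget(slo: float) -> float:
    return 1.0 - slo

def burn_rate(observed_error_ratio: float, slo: float) -> float:
    return observed_error_ratio / error_budget(slo)

# For a 99.9% SLO, a 1.44% observed error ratio is a 14.4x burn rate.
print(round(burn_rate(0.0144, 0.999), 1))
```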
Example test fixture:
- name: "agent_invocations"
  description: "Query all agent invocation spans"
  command: |
    curl -sk -u admin:'My_password_123!@#' \
      -X POST https://localhost:9200/_plugins/_ppl \
      -H 'Content-Type: application/json' \
      -d '{"query": "source=otel-v1-apm-span-* | WHERE `attributes.gen_ai.operation.name` = '\''invoke_agent'\'' | head 10"}'
  expected_status_code: 200
  expected_fields: ["schema", "datarows"]
  tags: ["traces"]
  before_test: null
  after_test: null
Design Decisions
Why a flat skills/ directory?
Eight files don't need subdirectories. Flat is simpler to reference from CLAUDE.md and easier for contributors to navigate.
Why complete curl commands instead of just query bodies?
Claude Code can execute curl directly via its Bash tool. Including the full command (endpoint, auth, headers, body) means zero assembly required. The skill file is the executable documentation.
Why a dedicated PPL reference file?
The PPL grammar is large (50+ commands, 14 function categories). Inlining it into traces.md or logs.md would bloat those files. As a separate skill, Claude Code loads it on demand when it needs to construct a novel query.
Why YAML test fixtures instead of inline pytest?
Declarative YAML fixtures are easier for contributors to add (no Python knowledge needed to add a test case). The Pydantic schema catches malformed fixtures at load time. This pattern is proven at scale in HolmesGPT's test suite.
Why read credentials from .env?
The observability stack already centralizes configuration in .env. The plugin and test harness reuse the same source of truth rather than duplicating credentials.
Error Handling
Skill File Errors
OpenSearch unreachable: the stack health skill provides diagnostic steps: check docker compose ps, verify port 9200, check the health endpoint
Prometheus unreachable: the stack health skill suggests checking container status and port 9090
Authentication failure: skill files document the correct credentials from .env; the stack health skill suggests verifying credentials
No data in indices: the stack health skill provides index listing commands and document count verification
Data Prepper pipeline errors: the stack health skill suggests checking Data Prepper logs via docker compose logs data-prepper
OTel Collector export failures: the stack health skill suggests checking collector metrics at port 8888 and its logs
Test Harness Errors
Stack not running: the session-scoped fixture detects this and skips all tests with a clear message
Curl command timeout: configurable timeout (default 30s); the test fails with a timeout error
Invalid YAML fixture: the Pydantic model with extra="forbid" raises a validation error at load time
Unexpected JSON response: the test reports which expected_fields were missing from the response
Hook failure: the test reports before_test/after_test hook failures separately from the main command result
Missing .env file: the config loader falls back to hardcoded defaults
Running the Tests
Prerequisites: the observability stack must be running (docker compose up -d).
cd claude-code-observability-plugin/tests
# Install dependencies
pip install -r requirements.txt
# Run all tests
pytest
# Run by category
pytest -m traces
pytest -m logs
pytest -m metrics
pytest -m stack_health
pytest -m ppl
# Verbose output
pytest -v --tb=short
If the stack is not running, all tests are skipped with a clear message.
Open Questions
Plugin location: Should the plugin live at the repo root (claude-code-observability-plugin/) or under a new plugins/ directory?
Versioning: Should the plugin version track the observability stack version, or have its own independent version?
Additional AI assistants: The skill file format is Claude Code-specific (CLAUDE.md convention). Should we also provide equivalent configurations for other AI coding assistants (e.g., Cursor rules, Kiro steering)?
Metrics in OpenSearch: The metrics skill currently targets Prometheus. Should we also include PPL queries for metrics stored in OpenSearch (when metrics are ingested via Data Prepper)?
Example telemetry data: Should the test harness include a script that sends sample telemetry data to the stack, so tests can validate queries return actual results rather than just valid empty responses?
How to Contribute
Adding a new query template to a skill file:
Add the curl command to the appropriate skills/*.md file
Add a corresponding test fixture in tests/fixtures/*.yaml
Run pytest to verify the command works against a running stack
Adding a new test case:
Create a YAML entry in the appropriate tests/fixtures/*.yaml file
Follow the schema: name, description, command, expected_status_code, expected_fields, tags
Run pytest -m <tag> to verify
Feedback Requested
We'd like feedback on:
The skill file organization and routing approach
Which query templates are most valuable for your workflow
The open questions above
Any missing capabilities or query patterns you'd want included
The integration test approach and fixture format
Please comment on this RFC or open an issue with your thoughts.
Summary
We propose adding a Claude Code plugin to the observability-stack repository that teaches Claude Code how to query traces, logs, and metrics from the running stack using PPL, PromQL, and curl commands. The plugin is a set of markdown skill files with no runtime code and no build step. Claude Code loads them as context to gain OpenSearch-native observability capabilities.
No existing public Claude Code skill covers OpenSearch observability or PPL. This fills that gap.
Motivation
Developers using AI coding assistants with the observability stack currently have to:
A Claude Code plugin eliminates this friction. When a developer asks "show me the slowest agent invocations in the last hour", "what's the error budget burn rate for the payment service?", or "why is the payment service erroring?", Claude Code can immediately construct and execute the right PPL or PromQL query against the right endpoint with the right auth.
Glossary
source=<index>.otel-v1-apm-span-*storing trace span data.otel-v1-apm-log-*storing log data.otel-v2-apm-service-mapstoring service dependency topology.gen_ai.*(e.g.,gen_ai.operation.name,gen_ai.agent.name,gen_ai.usage.input_tokens).traceIdandspanIdto enable end-to-end investigation.trace_idandspan_idalongside the measurement value. Enables metric-to-trace correlation.opensearch-project/sqlrepository underdocs/user/ppl/.Architecture
System Context
graph TB subgraph "Claude Code Plugin" CM[CLAUDE.md<br/>Entry Point] subgraph "skills/" TS[traces.md] LS[logs.md] MS[metrics.md] SH[stack-health.md] PR[ppl-reference.md] CR[correlation.md] AR[apm-red.md] SL[slo-sli.md] end subgraph "tests/" CF[conftest.py] TF[test_fixtures.py] TR[test_runner.py] FX[fixtures/*.yaml] end end subgraph "Observability Stack" OS[OpenSearch :9200<br/>HTTPS + Basic Auth] PM[Prometheus :9090<br/>HTTP] OC[OTel Collector :4317/:4318] DP[Data Prepper :21890] end CM -->|references| TS CM -->|references| LS CM -->|references| MS CM -->|references| SH CM -->|references| PR CM -->|references| CR CM -->|references| AR CM -->|references| SL TS -->|PPL queries via curl| OS LS -->|PPL queries via curl| OS CR -->|PPL queries via curl| OS CR -->|PromQL + exemplars via curl| PM AR -->|PromQL RED queries via curl| PM AR -->|PPL RED queries via curl| OS SL -->|PromQL SLO queries via curl| PM SH -->|health checks via curl| OS SH -->|health checks via curl| PM SH -->|health checks via curl| OC MS -->|PromQL queries via curl| PM PR -->|PPL reference for| OS TR -->|validates commands from| FX CF -->|checks health of| OS CF -->|checks health of| PMData Flow
flowchart LR A[User asks Claude Code<br/>an observability question] --> B[Claude Code reads CLAUDE.md] B --> C{Route by intent} C -->|trace investigation| D[Load traces.md] C -->|log search| E[Load logs.md] C -->|metrics query| F[Load metrics.md] C -->|stack issues| G[Load stack-health.md] C -->|PPL syntax help| H[Load ppl-reference.md] C -->|cross-signal correlation| X[Load correlation.md] C -->|RED metrics / APM| Y[Load apm-red.md] C -->|SLO/SLI / error budget| Z[Load slo-sli.md] D --> I[Execute curl command<br/>against OpenSearch PPL API] E --> I F --> J[Execute curl command<br/>against Prometheus API] G --> K[Execute curl/docker commands<br/>against stack endpoints] H --> L[Reference for constructing<br/>novel PPL queries] X --> I X --> J Y --> I Y --> J Z --> JWhat's Included
Eight Skill Files
The plugin ships as a
CLAUDE.mdentry point plus eight skill files in askills/directory:traces.md:9200logs.md:9200metrics.md:9090stack-health.mdppl-reference.mdcorrelation.mdapm-red.mdslo-sli.md:9090Plugin Directory Structure
Skill File Format
Each skill file follows the Claude Code CLAUDE.md convention:
Every query template is a complete, copy-paste-ready curl command with:
-u admin:'My_password_123!@#'for OpenSearch, none for Prometheus)-kfor development)Requirements
Requirement 1: Plugin Directory Structure
As a developer, I want the plugin organized as a directory of skill files with a top-level CLAUDE.md entry point, so that Claude Code automatically loads the observability capabilities when I work in the project.
skills/directoryname,description, andallowed-toolsRequirement 2: Traces Skill
As a developer, I want to query trace data from OpenSearch using PPL, so that I can investigate agent invocations, tool executions, slow spans, error spans, and token usage.
attributes.gen_ai.operation.name = invoke_agent)attributes.gen_ai.operation.name = execute_tool)durationInNanosexceeds a configurable thresholdstatus.code = 2Requirement 3: Logs Skill
As a developer, I want to query log data from OpenSearch using PPL, so that I can search logs by severity, correlate logs with traces, identify error patterns, and analyze log volume.
traceIdstats count() byaggregationsspan(time, <interval>)severityText,severityNumber,traceId,spanId,serviceName,body,@timestampRequirement 4: Metrics Skill
As a developer, I want to query metrics from Prometheus using PromQL, so that I can monitor HTTP request rates, latency percentiles, error rates, and active connections.
localhost:9090/api/v1/queryRequirement 5: Stack Health Skill
As a developer, I want to check the health of all observability stack components and troubleshoot common issues, so that I can verify the stack is operational and diagnose data flow problems.
describefor index mapping inspection_explainendpoint for query plan debuggingRequirement 6: PPL Reference Skill
As a developer, I want a comprehensive PPL language reference available to Claude Code, so that Claude Code can understand PPL syntax and construct correct queries for any observability question.
Commands (50+):
search,source,where,fields,stats,sort,head,eval,dedup,rename,top,rare,tabletimechart,chart,bin,trendline,streamstats,eventstatsparse,grok,rex,regex,patterns,spathjoin,lookup,graphlookup,subquery,append,appendcol,appendpipefillnull,flatten,expand,transpose,convert,replace,reversemvexpand,mvcombine,nomvaddcoltotals,addtotalsad(anomaly detection),kmeans,mldescribe,explain,showdatasources,multisearchfieldformatFunctions (14 categories):
API Endpoints:
POST /_plugins/_pplwith JSON body{"query": "<ppl_query>"}POST /_plugins/_ppl/_explainGET /_plugins/_ppl/_grammarSource: Grammar reference sourced from the
opensearch-project/sqlrepository'sdocs/user/ppl/directory.Requirement 7: Skill File Format Compliance
---name,description, andallowed-toolsfields.envfile (admin /My_password_123!@#), noted as configurableRequirement 8: Authentication and Connection Details
admin/My_password_123!@#),-kflag for cert skip--aws-sigv4 "aws:amz:REGION:es")--aws-sigv4 "aws:amz:REGION:aps")All credentials are sourced from the repository
.envfile. The test harness reads.envwith fallback to these defaults.Skill files provide curl command variants for both local and AWS managed endpoints. The CLAUDE.md entry point includes a configuration section where users set
$OPENSEARCH_ENDPOINTand$PROMETHEUS_ENDPOINTenvironment variables to switch between local and managed services. PPL and PromQL query syntax is identical across both profiles; only the endpoint URL and authentication method differ.Requirement 9: PPL Grammar Source Documentation
opensearch-project/sqlrepository'sdocs/user/ppl/directoryhttps://github.com/opensearch-project/sqlRequirement 10: Cross-Signal Correlation and GenAI Debugging
As a developer, I want the plugin skills to support cross-signal correlation between traces, logs, and metrics, and provide GenAI-specific debugging capabilities, so that I can perform end-to-end observability investigations across all telemetry signals.
Cross-signal correlation:
traceIdacross Trace Index and Log IndexspanIdtraceIdwithparentSpanIdhierarchyparentSpanIdis empty or nullGenAI operation types (beyond invoke_agent and execute_tool):
chat,embeddings,retrieval,create_agent,text_completion,generate_contentException and error querying:
exception.type,exception.message,exception.stacktraceerror.typefor error categorizationtraceIdandspanIdExtended GenAI attributes:
gen_ai.agent.id,gen_ai.agent.description,gen_ai.agent.versiongen_ai.conversation.idfor multi-turn conversation trackinggen_ai.tool.call.id,gen_ai.tool.type,gen_ai.tool.call.arguments,gen_ai.tool.call.resultGenAI-specific metrics:
gen_ai_client_token_usagehistogram grouped by operation and modelgen_ai_client_operation_durationhistogram grouped by operation and modelRequirement 11: Integration Test Harness
As a developer, I want an integration test suite that validates all skill file commands against a running observability stack, so that I can verify the plugin's queries and health checks produce correct results.
Test infrastructure:
tests/directory within the plugincommand,expected_status_code,expected_fields, andtagsextra="forbid")Test categories:
traces: PPL queries against Trace Index, validateschemaanddatarowsin responselogs: PPL queries against Log Index, validate response structuremetrics: PromQL queries against Prometheus, validatestatus: "success"anddatafieldstack-health: Health check commands, validate HTTP 200 status codesppl: PPL system commands (describe,_explain), validate response structurecorrelation: Cross-signal correlation queries, validate join results and exemplar responsesapm_red: RED metric queries against Prometheus and OpenSearch, validate rate/error/duration responsesslo_sli: SLO/SLI queries against Prometheus, validate recording rule outputs and burn rate calculationsTest execution:
subprocess.runwith configurable timeout (default 30s)expected_fieldspytest -m traces)before_testandafter_testhooks in YAML for setup/teardown scriptsConfiguration:
.envwith fallback defaultspytest,pyyaml,pydantic,requests,hypothesisRequirement 12: Correlation Skill
As a developer, I want a dedicated correlation skill that teaches Claude Code how to join traces, logs, and metrics across all three telemetry signals using OTel semantic convention correlation fields, so that I can perform end-to-end investigations starting from any signal.
OTel correlation fields (sourced from opentelemetry.io):
The OTel specification defines three correlation mechanisms across signals:
- Trace context: `traceId`, `spanId`, `traceFlags`
- Log records carry `traceId`/`spanId`, enabling direct joins
- Metric exemplars carry `trace_id`, `span_id`, `filtered_attributes`
- Resource attributes: `service.name`, `service.namespace`, `service.version`, `service.instance.id`

GenAI resource attributes promoted to Prometheus labels in this stack:

- `gen_ai.agent.id`, `gen_ai.agent.name`, `gen_ai.provider.name`, `gen_ai.request.model`, `gen_ai.response.model`
- Configured in `docker-compose/prometheus/prometheus.yml` under `otlp.promote_resource_attributes`

Trace-to-log correlation (PPL):

- `source=otel-v1-apm-log-* | WHERE traceId = '<id>'`
- `source=otel-v1-apm-log-* | WHERE spanId = '<id>'`
- `join` across Trace Index and Log Index on `traceId`
- All logs for a `traceId`, sorted by timestamp

Log-to-trace correlation (PPL):

- Extract the `traceId` from a log record and query the Trace Index for the full trace tree
- Extract the `spanId` and find the exact span that produced the log

Metric-to-trace correlation (PromQL + exemplars):

- `GET /api/v1/query_exemplars?query=<metric>&start=<start>&end=<end>`
- Extract `trace_id` from an exemplar, then query the Trace Index via PPL
- Filter metrics by promoted GenAI labels (e.g., `gen_ai_agent_name`, `gen_ai_request_model`), then correlate to traces

Resource-level correlation:

- `serviceName` in traces/logs maps to the `service_name` label in Prometheus metrics

Investigation workflows:
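A metric-first version of such a workflow might look like the following sketch, chaining the correlation queries above (the metric name comes from this stack's Data Models; `<trace_id>`, `<start>`, and `<end>` are placeholders):

```
# 1. Metric -> trace: pull an exemplar for the slow metric and read its trace_id
GET /api/v1/query_exemplars?query=http_server_duration_seconds_bucket&start=<start>&end=<end>

# 2. Trace: fetch the full trace tree via PPL
source=otel-v1-apm-span-* | where traceId = '<trace_id>' | sort startTime

# 3. Trace -> logs: pull every log line emitted under the same trace
source=otel-v1-apm-log-* | where traceId = '<trace_id>' | sort `@timestamp`
```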
Requirement 13: APM/RED Metrics Skill
As a developer, I want a dedicated APM skill that teaches Claude Code how to construct RED (Rate, Errors, Duration) metrics queries for any service, so that I can quickly assess service health using the standard APM methodology.
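As an illustration, the three RED queries against this stack's exported metric names might look like this sketch (the `payment` service name is a placeholder):

```promql
# Rate: requests per second for one service
sum(rate(http_server_duration_seconds_count{service_name="payment"}[5m]))

# Errors: fraction of requests returning 5xx
sum(rate(http_server_duration_seconds_count{service_name="payment", http_response_status_code=~"5.."}[5m]))
/ sum(rate(http_server_duration_seconds_count{service_name="payment"}[5m]))

# Duration: p95 latency from the histogram buckets
histogram_quantile(0.95, sum by (le) (rate(http_server_duration_seconds_bucket{service_name="payment"}[5m])))
```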
- Rate: per-service request rate (`rate(http_server_duration_seconds_count[5m])`), per-endpoint rate, and a PPL alternative computed from trace spans
- Errors: error rate from spans with `status.code = 2`
- Duration: latency percentiles via `histogram_quantile` and PPL `percentile()` over trace spans
- GenAI operation latency from the `gen_ai_client_operation_duration` histogram
- OTel semantic conventions: `http.server.request.duration` (histogram), `http.server.active_requests` (gauge), and their Prometheus-exported equivalents
- `spanmetrics` connector documentation for auto-generating RED metrics from traces

Requirement 14: SLO/SLI Skill
As a developer, I want a dedicated SLO/SLI skill that teaches Claude Code how to define SLIs, calculate error budgets, and construct burn rate queries using Prometheus recording rules, so that I can implement and monitor service level objectives for my services.
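The error-budget arithmetic behind burn-rate monitoring can be sketched in a few lines (the 99.9% target and 14.4 threshold below are the standard fast-burn example from SRE practice, not values mandated by this RFC):

```python
def error_budget_burn_rate(error_ratio: float, slo_target: float) -> float:
    """Burn rate = observed error ratio / error budget.

    A sustained burn rate of 1.0 spends the budget exactly over the SLO
    window; a sustained 14.4 over 1h spends roughly 2% of a 30-day budget,
    the classic fast-burn paging threshold.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("SLO target must be strictly below 1.0")
    return error_ratio / budget


# 99.9% availability SLO with 1.44% of requests currently failing:
print(error_budget_burn_rate(error_ratio=0.0144, slo_target=0.999))  # ~14.4
```

In PromQL terms, the numerator would come from an error-ratio recording rule over a given window, divided by the constant error budget.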
- Recording rules of the form `sli:http_availability:ratio_rate<window>` and `sli:http_latency:ratio_rate<window>`

Data Models
OpenSearch Trace Index Schema (otel-v1-apm-span-*)
Core span fields: `traceId`, `spanId`, `parentSpanId`, `serviceName`, `name`, `kind`, `startTime`, `endTime`, `durationInNanos`, `status.code`

GenAI span attributes: `attributes.gen_ai.operation.name`, `attributes.gen_ai.agent.name`, `attributes.gen_ai.agent.id`, `attributes.gen_ai.request.model`, `attributes.gen_ai.usage.input_tokens`, `attributes.gen_ai.usage.output_tokens`, `attributes.gen_ai.tool.name`, `attributes.gen_ai.tool.call.id`, `attributes.gen_ai.tool.call.arguments`, `attributes.gen_ai.tool.call.result`, `attributes.gen_ai.conversation.id`

Exception event fields: `events.attributes.exception.type`, `events.attributes.exception.message`, `events.attributes.exception.stacktrace`

OpenSearch Log Index Schema (otel-v1-apm-log-*)
`traceId`, `spanId`, `severityText`, `severityNumber`, `serviceName`, `body`, `@timestamp`

OpenSearch Service Map Index (otel-v2-apm-service-map)
`serviceName`, `destination.domain`, `destination.resource`, `traceGroupName`

Prometheus Metrics
- `http_server_duration_seconds` (labels: `service_name`, `http_response_status_code`)
- `http_server_active_requests` (label: `service_name`)
- `db_client_operation_duration_seconds` (label: `service_name`)
- `gen_ai_client_token_usage` (labels: `gen_ai.operation.name`, `gen_ai.request.model`)
- `gen_ai_client_operation_duration` (labels: `gen_ai.operation.name`, `gen_ai.request.model`)

Connection Profiles
- Local OpenSearch: `https://localhost:9200` (basic auth: `-u admin:'My_password_123!@#' -k`)
- Local Prometheus: `http://localhost:9090`
- Amazon OpenSearch Service: `https://DOMAIN-ID.REGION.es.amazonaws.com` (SigV4: `--aws-sigv4 "aws:amz:REGION:es"`)
- Amazon Managed Prometheus: `https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID` (SigV4: `--aws-sigv4 "aws:amz:REGION:aps"`)

Test Fixture YAML Schema
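A fixture under this schema might look like the following sketch (field names from Requirement 11; the command follows the local connection profile, but exact key spellings are pending the final Pydantic model):

```yaml
- name: traces_slowest_spans
  description: Top 10 slowest spans across all services
  command: >
    curl -sk -u admin:'My_password_123!@#'
    -H 'Content-Type: application/json'
    -X POST https://localhost:9200/_plugins/_ppl
    -d '{"query": "source=otel-v1-apm-span-* | sort - durationInNanos | head 10"}'
  expected_status_code: 200
  expected_fields: [schema, datarows]
  tags: [traces]
```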
Design Decisions
Why a flat `skills/` directory?
Eight files don't need subdirectories. Flat is simpler to reference from CLAUDE.md and easier for contributors to navigate.
Why complete curl commands instead of just query bodies?
Claude Code can execute curl directly via its Bash tool. Including the full command (endpoint, auth, headers, body) means zero assembly required. The skill file is the executable documentation.
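For instance, a traces skill entry might carry a complete command like this sketch, runnable only against the local stack (endpoint and credentials from the local connection profile; the query itself is illustrative):

```bash
# Slowest 10 spans, with the fields needed to drill into a trace
curl -sk -u admin:'My_password_123!@#' \
  -H 'Content-Type: application/json' \
  -X POST https://localhost:9200/_plugins/_ppl \
  -d '{"query": "source=otel-v1-apm-span-* | fields traceId, serviceName, name, durationInNanos | sort - durationInNanos | head 10"}'
```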
Why a dedicated PPL reference file?
The PPL grammar is large (50+ commands, 14 function categories). Inlining it into traces.md or logs.md would bloat those files. As a separate skill, Claude Code loads it on demand when it needs to construct a novel query.
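A few representative PPL commands, as they might appear in that reference (index names from this stack; the `payment` service name is a placeholder):

```
source=otel-v1-apm-span-* | where serviceName = 'payment' | stats avg(durationInNanos) by name
source=otel-v1-apm-span-* | top 5 serviceName
describe otel-v1-apm-span-*
```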
Why YAML test fixtures instead of inline pytest?
Declarative YAML fixtures are easier for contributors to add (no Python knowledge needed to add a test case). The Pydantic schema catches malformed fixtures at load time. This pattern is proven at scale in HolmesGPT's test suite.
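A stdlib-only sketch of that load-time check follows (the real harness would use Pydantic with `extra="forbid"`, per Requirement 11; this stand-in only mimics the unknown-key rejection):

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class TestFixture:
    # Field names taken from the fixture schema in Requirement 11
    name: str
    description: str
    command: str
    expected_status_code: int
    expected_fields: list
    tags: list


def load_fixture(raw: dict[str, Any]) -> TestFixture:
    """Reject fixtures with unknown keys, like Pydantic's extra="forbid"."""
    allowed = {f.name for f in fields(TestFixture)}
    extra = set(raw) - allowed
    if extra:
        raise ValueError(f"unknown fixture keys: {sorted(extra)}")
    return TestFixture(**raw)  # missing required keys raise TypeError here
```

A contributor's malformed fixture (say, a misspelled `expected_staus_code`) then fails before any test runs instead of silently passing.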
Why read credentials from `.env`?
The observability stack already centralizes configuration in `.env`. The plugin and test harness reuse the same source of truth rather than duplicating credentials.

Error Handling
Skill File Errors
- Connection refused: run `docker compose ps`, verify port 9200, check the health endpoint
- Malformed PPL: the `_explain` endpoint helps debug query plans
- Auth failures: credentials come from `.env`; the stack health skill suggests verifying them
- Missing indices: check `docker compose logs data-prepper`

Test Harness Errors
- Malformed fixtures: `extra="forbid"` raises a validation error at load time

Running the Tests
Prerequisites: the observability stack must be running (`docker compose up -d`). If the stack is not running, all tests are skipped with a clear message.
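The skip-if-down behavior could be implemented with a sketch like this (the port check and the conftest wiring are assumptions, not part of the RFC text):

```python
import socket


def stack_running(host: str = "localhost", port: int = 9200,
                  timeout: float = 1.0) -> bool:
    """True if something is listening on the local OpenSearch port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# In tests/conftest.py, a session-scoped autouse fixture could then call
# pytest.skip("observability stack is not running; run `docker compose up -d`")
# whenever stack_running() is False, skipping the whole suite with one message.
```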
Open Questions
Plugin location: Should the plugin live at the repo root (`claude-code-observability-plugin/`) or under a new `plugins/` directory?
Versioning: Should the plugin version track the observability stack version, or have its own independent version?
Additional AI assistants: The skill file format is Claude Code-specific (CLAUDE.md convention). Should we also provide equivalent configurations for other AI coding assistants (e.g., Cursor rules, Kiro steering)?
Metrics in OpenSearch: The metrics skill currently targets Prometheus. Should we also include PPL queries for metrics stored in OpenSearch (when metrics are ingested via Data Prepper)?
Example telemetry data: Should the test harness include a script that sends sample telemetry data to the stack, so tests can validate queries return actual results rather than just valid empty responses?
How to Contribute
Adding a new query template to a skill file:
1. Edit the relevant `skills/*.md` file
2. Add a matching test case in `tests/fixtures/*.yaml`
3. Run `pytest` to verify the command works against a running stack

Adding a new test case:
1. Add an entry to the appropriate `tests/fixtures/*.yaml` file with `name`, `description`, `command`, `expected_status_code`, `expected_fields`, and `tags`
2. Run `pytest -m <tag>` to verify

Feedback Requested
We'd like feedback on the open questions above. Please comment on this RFC or open an issue with your thoughts.