22 changes: 22 additions & 0 deletions .claude-plugin/marketplace.json
@@ -48,6 +48,28 @@
"tags": ["aws", "location", "maps", "geospatial"],
"version": "1.0.0"
},
{
"category": "observability",
"description": "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and automated codebase observability gap analysis.",
"keywords": [
"aws",
"observability",
"cloudwatch",
"monitoring",
"logs",
"metrics",
"alarms",
"application-signals",
"apm",
"cloudtrail",
"security",
"tracing"
],
"name": "aws-observability",
"source": "./plugins/aws-observability",
"tags": ["aws", "observability", "monitoring", "cloudwatch"],
"version": "1.0.0"
},
{
"category": "development",
"description": "Design, build, deploy, test, and debug serverless applications with AWS Serverless services.",
25 changes: 25 additions & 0 deletions plugins/aws-observability/.claude-plugin/plugin.json
@@ -0,0 +1,25 @@
{
"author": {
"name": "Amazon Web Services"
},
"description": "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and automated codebase observability gap analysis for complete monitoring, troubleshooting, and optimization.",
"homepage": "https://github.com/awslabs/agent-plugins",
"keywords": [
"aws",
"observability",
"cloudwatch",
"monitoring",
"logs",
"metrics",
"alarms",
"application-signals",
"apm",
"cloudtrail",
"security",
"tracing"
],
"license": "Apache-2.0",
"name": "aws-observability",
"repository": "https://github.com/awslabs/agent-plugins",
"version": "1.0.0"
}
52 changes: 52 additions & 0 deletions plugins/aws-observability/.mcp.json
@@ -0,0 +1,52 @@
{
"mcpServers": {
"awslabs.cloudwatch-mcp-server": {
"command": "uvx",
"args": [
"awslabs.cloudwatch-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awslabs.cloudwatch-applicationsignals-mcp-server": {
"command": "uvx",
"args": [
"awslabs.cloudwatch-applicationsignals-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awslabs.cloudtrail-mcp-server": {
"command": "uvx",
"args": [
"awslabs.cloudtrail-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awslabs.billing-cost-management-mcp-server": {
"command": "uvx",
"args": [
"awslabs.billing-cost-management-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awsknowledge": {
"type": "http",
"url": "https://knowledge-mcp.global.api.aws"
}
}
}
88 changes: 88 additions & 0 deletions plugins/aws-observability/skills/aws-observability/SKILL.md
@@ -0,0 +1,88 @@
---
name: aws-observability
description: "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, Billing & Cost Management, and automated codebase observability gap analysis. Triggers on phrases like: CloudWatch logs, metrics, alarms, monitoring, observability, application signals, APM, distributed tracing, performance, latency, errors, troubleshooting, root cause analysis, security audit, CloudTrail, log analysis, alerting, SLO, incident response, observability gaps, missing instrumentation, AWS costs, billing, cost anomaly."
---

# AWS Observability

Requires AWS CLI credentials. All stdio MCP servers use `AWS_PROFILE` and `AWS_REGION` from their env config (defaults: `default` profile, `us-east-1`).
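
To point the stdio servers at a different profile or region, the `env` block of each server entry in `plugins/aws-observability/.mcp.json` can be edited by hand. The sketch below shows the same change programmatically, under the assumption that a stdio server is any entry with a `command` key. The function name `with_aws_env` and the profile name `prod-readonly` are hypothetical and used only for illustration.

```python
import copy
import json


def with_aws_env(config: dict, profile: str, region: str) -> dict:
    """Return a copy of an .mcp.json-style config with AWS_PROFILE and
    AWS_REGION overridden for every stdio server (entries with "command")."""
    updated = copy.deepcopy(config)
    for server in updated.get("mcpServers", {}).values():
        if "command" not in server:
            continue  # HTTP entries such as awsknowledge carry no AWS env block
        env = server.setdefault("env", {})
        env["AWS_PROFILE"] = profile
        env["AWS_REGION"] = region
    return updated


# Trimmed-down stand-in for the real .mcp.json (one stdio server, one HTTP server).
example = {
    "mcpServers": {
        "awslabs.cloudwatch-mcp-server": {
            "command": "uvx",
            "args": ["awslabs.cloudwatch-mcp-server@latest"],
            "env": {
                "AWS_PROFILE": "default",
                "AWS_REGION": "us-east-1",
                "FASTMCP_LOG_LEVEL": "ERROR",
            },
        },
        "awsknowledge": {
            "type": "http",
            "url": "https://knowledge-mcp.global.api.aws",
        },
    }
}

# "prod-readonly" is a hypothetical profile name.
print(json.dumps(with_aws_env(example, "prod-readonly", "eu-west-1"), indent=2))
```

The input config is left unmodified, so the shipped defaults remain available for comparison.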

## Capabilities

| Capability | MCP Server | Use When |
| --------------------------- | -------------------------------------------------- | -------------------------------------------------------- |
| CloudWatch Logs | `awslabs.cloudwatch-mcp-server` | Log queries, pattern detection, anomaly analysis |
| Metrics & Alarms | `awslabs.cloudwatch-mcp-server` | Metric data, alarm recommendations, trend analysis |
| Application Signals (APM) | `awslabs.cloudwatch-applicationsignals-mcp-server` | Service health, SLOs, distributed tracing, error budgets |
| CloudTrail Security | `awslabs.cloudtrail-mcp-server` | IAM changes, resource deletions, compliance audits |
| Billing & Cost Management | `awslabs.billing-cost-management-mcp-server` | Cost analysis, forecasting, Compute Optimizer, budgets |
| AWS Documentation | `awsknowledge` (HTTP) | Troubleshooting, best practices, API references |
| Codebase Observability Gaps | _(file analysis, no MCP)_ | Identify missing logging, metrics, tracing in code |

## Workflow Decision Tree

**User reports an incident or error?**
-> Load [Incident Response](references/incident-response.md). Start with `audit_services` wildcard, then correlate alarms + logs + traces + CloudTrail changes.

**User asks about logs or wants to query logs?**
-> Load [Log Analysis](references/log-analysis.md). Use `execute_log_insights_query`. Always include `| limit` in queries.

**User wants to set up or tune alarms?**
-> Load [Alerting Setup](references/alerting-setup.md). Use `get_recommended_metric_alarms` for best-practice thresholds.

**User asks about service performance, latency, or SLOs?**
-> Load [Performance Monitoring](references/performance-monitoring.md). Start with `audit_services`, then `search_transaction_spans` for 100% trace visibility.

**User needs security audit or compliance review?**
-> Load [Security Auditing](references/security-auditing.md). Follow data source priority: CloudTrail Lake > CloudWatch Logs > Lookup Events API.

**User wants to assess codebase observability?**
-> Load [Observability Gap Analysis](references/observability-gap-analysis.md). Analyze logging, metrics, tracing, error handling, health checks.

**User setting up Application Signals for the first time?**
-> Load [Application Signals Setup](references/application-signals-setup.md). Start with `get_enablement_guide`.

**CloudTrail data source priority reference** (loaded by security-auditing.md, not directly):
-> [CloudTrail Data Source Selection](references/cloudtrail-data-source-selection.md)
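
The CloudTrail priority rule (Lake, then CloudWatch Logs, then the Lookup Events API) amounts to a first-available fallback. A minimal sketch of that selection logic follows. The source labels and the function name are hypothetical, and how availability is actually detected is left to the referenced guide.

```python
from typing import Iterable, Optional

# Priority order as stated in the decision tree: most preferred first.
# The string labels are illustrative, not tool or API names.
PRIORITY = ("cloudtrail_lake", "cloudwatch_logs", "lookup_events")


def choose_cloudtrail_source(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority source present in `available`, or None."""
    present = set(available)
    for source in PRIORITY:
        if source in present:
            return source
    return None


# No Lake event data store exists, so CloudWatch Logs is chosen.
print(choose_cloudtrail_source(["lookup_events", "cloudwatch_logs"]))
```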

## Essential Log Query Patterns

### Error Search

```
fields @timestamp, @message, @logStream, level
| filter level = "ERROR"
| sort @timestamp desc
| limit 100
```
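
The decision tree asks for `| limit` in every query. One way to enforce that before a query is submitted is a small guard like the sketch below. The helper name `ensure_limit` and its default of 100 are assumptions for illustration, and the check only recognizes a `limit` command at the end of the query.

```python
import re

# Matches a trailing "| limit N" command, ignoring case and trailing spaces.
_TRAILING_LIMIT = re.compile(r"\|\s*limit\s+\d+\s*$", re.IGNORECASE)


def ensure_limit(query: str, default_limit: int = 100) -> str:
    """Append `| limit N` unless the query already ends with a limit command."""
    stripped = query.rstrip()
    if _TRAILING_LIMIT.search(stripped):
        return stripped
    return f"{stripped}\n| limit {default_limit}"


unbounded = 'fields @timestamp, @message\n| filter level = "ERROR"'
print(ensure_limit(unbounded))
```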

### Performance Analysis

```
stats count() as requestCount,
      avg(duration) as avgDuration,
      pct(duration, 95) as p95Duration,
      pct(duration, 99) as p99Duration
  by endpoint
| filter requestCount > 10
| sort p95Duration desc
| limit 100
```

### Error Rate Over Time

```
stats count() as total,
      sum(statusCode >= 500) as errors,
      (sum(statusCode >= 500) / count()) * 100 as errorRate
  by bin(5m) as timeWindow
| sort timeWindow
| limit 1000
```
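
To make the arithmetic of the error-rate query concrete, the sketch below reproduces it in plain Python over a handful of made-up records. It assumes windows aligned to multiples of five minutes and treats any `statusCode` of 500 or above as an error. The sample data and the function name are illustrative only.

```python
from collections import defaultdict

BIN_SECONDS = 5 * 60  # mirrors bin(5m)


def error_rate_by_window(records):
    """records: iterable of (epoch_seconds, status_code) pairs.

    Returns rows of (window_start, total, errors, error_rate_percent),
    sorted by window start.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for ts, status in records:
        window = ts - (ts % BIN_SECONDS)  # floor to the 5-minute boundary
        totals[window] += 1
        if status >= 500:
            errors[window] += 1
    return [
        (w, totals[w], errors[w], errors[w] / totals[w] * 100)
        for w in sorted(totals)
    ]


# Made-up sample: four requests in the first window, two in the second.
sample = [(0, 200), (60, 500), (120, 200), (299, 503), (300, 200), (450, 200)]
for row in error_rate_by_window(sample):
    print(row)
# (0, 4, 2, 50.0)
# (300, 2, 0, 0.0)
```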

## Key Tool Entry Points

- **Application Signals**: Start with `audit_services` using `[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}]` for wildcard discovery
- **Logs**: Use `describe_log_groups` to discover groups, then `execute_log_insights_query`
- **Metrics**: Use Sum for count metrics, Average for utilization, percentiles for latency
- **CloudTrail**: Check Lake first (`list_event_data_stores`), fall back to CloudWatch Logs, then `lookup_events`
- **Costs**: Use `cost-explorer` tool for spend analysis, `compute-optimizer` for right-sizing
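
The wildcard target for `audit_services` is a JSON array serialized as a string. A small sketch for building it follows. The helper name is hypothetical, the name of the tool parameter that receives this string is not given here, and passing a concrete service name in place of `*` is an assumption about the target format.

```python
import json


def service_target(name: str = "*") -> str:
    """Build the compact JSON target string for a service audit.

    With the default "*", this reproduces the wildcard target shown above.
    """
    targets = [
        {
            "Type": "service",
            "Data": {"Service": {"Type": "Service", "Name": name}},
        }
    ]
    # Compact separators keep the output identical to the documented string.
    return json.dumps(targets, separators=(",", ":"))


print(service_target())
# [{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}]
```

Building the string from a Python structure avoids hand-escaping quotes when the target is embedded in a larger request.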