feat(observability): GrafanaDashboard CR + SLO row, export to grafana-agent by stxkxs · Pull Request #8 · nanohype/competitive-intelligence

stxkxs · 2026-06-24T04:34:44Z

Brings the tenant's observability to its correct prod shape (grafana-operator + grafana-agent → AMP/AMG, not a kube-prometheus sidecar or cluster otel-collector).

- Dashboard → GrafanaDashboard CR. The vendored tenant-chart-base helper now emits the CR (instanceSelector dashboards: external) that the grafana-operator reconciles onto Amazon Managed Grafana. The vendored library is synced to the canonical nanohype skeleton (adds _servicemonitor.tpl; _helpers gains commonLabels for the tagging-governance labels).
- SLO row. The board leads with crawl-availability (availability over 30d, error-budget remaining, burn rate), computed inline over the real competitive_intelligence_crawl_sources_total{outcome} counter and self-contained (no ruler needed).
- Telemetry path. OTLP → grafana-agent.monitoring.svc:4318 (forwards traces→Tempo, metrics→AMP, logs→Loki); NetworkPolicy egress opened to the monitoring namespace.
- Docs. CLAUDE / ARCHITECTURE / RUNBOOK / chart README / Dockerfile / metrics.ts now describe AMP/Tempo/Loki, not Grafana Cloud / Mimir / cluster otel-collector.
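For readers unfamiliar with the three SLO panels, the arithmetic behind them can be sketched as follows. This is a minimal illustration, not code from this PR: the `CrawlCounts` shape, the `sloStatus` name, and the 99% target in the usage line are all assumptions, and burn rate is computed here over the same window as availability, whereas a dashboard would typically use a shorter window for it.

```typescript
// Hypothetical input: crawl attempts over the SLO window, split by outcome.
interface CrawlCounts {
  good: number; // attempts counted as successful
  bad: number;  // attempts counted as failed
}

interface SloStatus {
  availability: number;         // good / total, in [0, 1]
  errorBudgetRemaining: number; // fraction of the budget left; negative when overspent
  burnRate: number;             // observed error rate / allowed error rate
}

// target is the SLO objective as a fraction, e.g. 0.99 (an assumed value).
function sloStatus(counts: CrawlCounts, target: number): SloStatus {
  if (!(target > 0 && target < 1)) {
    throw new RangeError("target must be strictly between 0 and 1");
  }
  const total = counts.good + counts.bad;
  if (total === 0) {
    // No traffic in the window: nothing failed, so the budget is untouched.
    return { availability: 1, errorBudgetRemaining: 1, burnRate: 0 };
  }
  const errorRate = counts.bad / total;
  const budget = 1 - target; // allowed error rate
  return {
    availability: counts.good / total,
    errorBudgetRemaining: 1 - errorRate / budget,
    burnRate: errorRate / budget,
  };
}

// Usage: 995 good and 5 bad crawls against an assumed 99% objective.
const status = sloStatus({ good: 995, bad: 5 }, 0.99);
```

A burn rate of 1 means the budget is being spent exactly as fast as the objective allows; values above 1 exhaust it before the window ends.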

Validated: helm template emits the CR (no stale ConfigMap) with OTLP→grafana-agent; dashboard JSON parses; helm lint clean. The Loki logs row is a follow-up (the Alloy stream-label selector needs live-cluster verification).
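As an illustration of the telemetry path, the per-signal OTLP/HTTP URLs an exporter would target can be derived from the agent address. This is a sketch under assumptions: the `otlpHttpEndpoints` helper is hypothetical, the `http` scheme is assumed (the PR only names the host and port), and the `/v1/traces`, `/v1/metrics`, `/v1/logs` paths are the OTLP/HTTP defaults, which the receiver may be configured to override.

```typescript
type Signal = "traces" | "metrics" | "logs";

// Build the default OTLP/HTTP URL for each signal from a base address.
// Trailing slashes on the base are stripped so paths are not doubled.
function otlpHttpEndpoints(base: string): Record<Signal, string> {
  const root = base.replace(/\/+$/, "");
  return {
    traces: `${root}/v1/traces`,
    metrics: `${root}/v1/metrics`,
    logs: `${root}/v1/logs`,
  };
}

// Host and port are taken from the PR description; the scheme is an assumption.
const endpoints = otlpHttpEndpoints("http://grafana-agent.monitoring.svc:4318");
```

Because all three signals go to the same receiver on port 4318, a single NetworkPolicy egress rule to the monitoring namespace covers them.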

Closes #7.

…O row, export to grafana-agent

Brings the tenant's observability to its correct prod shape: the org runs grafana-operator + grafana-agent (Alloy) → Amazon Managed Prometheus/Grafana, not a kube-prometheus-stack sidecar or a cluster otel-collector.

- Dashboard delivery: the vendored tenant-chart-base helper now emits the GrafanaDashboard CR (instanceSelector dashboards: external) that the grafana-operator reconciles onto Amazon Managed Grafana. This is the portable path that works on both EKS and the local kx cluster. The whole vendored library is synced to the canonical nanohype skeleton (adds _servicemonitor.tpl; _helpers gains commonLabels merging for the tagging-governance labels).
- SLO row: the board leads with a crawl-availability SLO row (availability over 30d, error-budget remaining, and burn rate) computed inline over the real good/bad counter competitive_intelligence_crawl_sources_total{outcome}, self-contained so it renders against AMP with no recording-rule ruler.
- Telemetry path: OTLP exports to the grafana-agent OTLP receiver (grafana-agent.monitoring.svc:4318), which forwards traces → Tempo, metrics → AMP, logs → Loki; the NetworkPolicy allows egress to the monitoring namespace on 4318.
- Docs: CLAUDE.md / ARCHITECTURE / RUNBOOK / chart README / Dockerfile / metrics.ts describe the real backends (AMP/Tempo/Loki via grafana-agent), not Grafana Cloud / Mimir / a cluster otel-collector.

The Loki logs row is a follow-up: the exact Alloy stream-label selector needs verifying on a live cluster.

Closes #7.
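To make "inline, with no recording-rule ruler" concrete, the three panel expressions could be assembled roughly as below. This is a hedged sketch, not the dashboard's actual queries: the `sloQueries` helper, the `outcome="success"` label value, the 99% target, and the 1h burn-rate window are all assumptions; only the metric name and the 30d window come from the PR.

```typescript
// Metric name as given in the PR description.
const METRIC = "competitive_intelligence_crawl_sources_total";

interface SloQueries {
  availability: string;
  errorBudgetRemaining: string;
  burnRate: string;
}

// goodOutcome is the (assumed) label value that marks a successful crawl.
function sloQueries(
  goodOutcome: string,
  target: number,
  window = "30d",
  burnWindow = "1h",
): SloQueries {
  const good = (w: string) =>
    `sum(increase(${METRIC}{outcome="${goodOutcome}"}[${w}]))`;
  const total = (w: string) => `sum(increase(${METRIC}[${w}]))`;
  const availability = `${good(window)} / ${total(window)}`;
  return {
    availability,
    // 1 - (observed error ratio / allowed error ratio) over the SLO window
    errorBudgetRemaining: `1 - ((1 - (${availability})) / (1 - ${target}))`,
    // observed error ratio over a short window / allowed error ratio
    burnRate: `(1 - (${good(burnWindow)} / ${total(burnWindow)})) / (1 - ${target})`,
  };
}

// Usage with assumed values; each string is a plain PromQL expression
// evaluated at query time, so no recording rules are required.
const queries = sloQueries("success", 0.99);
```

Computing everything at query time keeps the dashboard self-contained, at the cost of re-evaluating a 30d range on every refresh.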

stxkxs merged commit 15873a5 into main Jun 24, 2026
9 checks passed

stxkxs deleted the o11y-prod-shape branch June 24, 2026 04:36

stxkxs merged 1 commit into main from o11y-prod-shape
