feat(addons): make the infra addon dashboards reach AMP — pod scrape annotations by stxkxs · Pull Request #61 · nanohype/eks-gitops

stxkxs · 2026-06-24T02:10:51Z

The hub's grafana-agent (Alloy) is annotation-gated — it scrapes only pods with prometheus.io/scrape="true" (plus the static KSM + cAdvisor targets); ServiceMonitor is inert in prod. None of these addon deltas carried a scrape annotation, so their community id-import dashboards were hollow in prod (metrics never reached AMP).
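As a rough illustration of what the annotation gate looks for, a pod template carrying the scrape annotations might look like the fragment below. This is a minimal sketch, not the rendered output of any chart in this PR: the port value is a placeholder, and the `/metrics` path is an assumption (the conventional default), not something the PR states.

```yaml
# Hypothetical pod template fragment: the annotations the scrape filter keys on.
metadata:
  annotations:
    prometheus.io/scrape: "true"    # required: pods without this are dropped
    prometheus.io/port: "8080"      # placeholder; must be the chart's real metrics port
    prometheus.io/path: "/metrics"  # assumed conventional path, only needed when it differs
```

Annotation values are quoted because Kubernetes annotations are strings; an unquoted `true` or `8080` would be parsed as a boolean or integer and rejected.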

Adds the scrape config to every always-on infra addon that ships a dashboard. Each chart's mechanism + port was helm-template-verified (a wrong key/port silently re-hollows the board), catching real gotchas:

| addon | key | port |
| --- | --- | --- |
| karpenter | `podAnnotations` | 8080 |
| external-dns | `podAnnotations` | 7979 |
| external-secrets | `podAnnotations` | 8080 |
| tempo | `podAnnotations` | 3200 (self) |
| loki | `singleBinary.podAnnotations` | 3100 (self) |
| opencost | `opencost.podAnnotations` | 9003 |
| argo-rollouts | `controller.podAnnotations` | 8090 (not 8080=healthz) |
| argo-events | `controller.podAnnotations` | 7777 (not 8082=svc) |
| argo-workflows | `controller.metricsConfig.enabled` + `podAnnotations` | 9090 |
| cilium | `prometheus.enabled` + `operator.prometheus.enabled` + `hubble.relay.prometheus.enabled` (auto-stamps pods) | 9962/9963/9966 |
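To make the key nesting in the table concrete, here is a sketch of how three of the rows could be expressed as Helm values. The keys and ports come from the table; the layout as three separate YAML documents is only for illustration (each addon has its own values file in practice), and the exact annotation set each delta carries is an assumption.

```yaml
# argo-rollouts values: metrics are served on 8090 (8080 is healthz)
controller:
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8090"
---
# loki values: annotations nest under singleBinary; self-metrics on 3100
singleBinary:
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "3100"
---
# cilium values: no hand-written annotations; with these flags on,
# the chart stamps the agent/operator/relay pods itself
prometheus:
  enabled: true
operator:
  prometheus:
    enabled: true
hubble:
  relay:
    prometheus:
      enabled: true
```

Because a misplaced key is silently ignored by Helm, rendering the chart and checking the resulting pod template is what confirms the annotations actually land on the pod.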

No change needed: cert-manager + aws-load-balancer-controller already auto-emit the annotations by default — never the gap.

Not covered (tracked separately): the hubble L7 board reads hubble_http_* on :9965, which cilium only annotates on a headless Service — unreachable by pod-annotation scrape (a pod carries one prometheus.io/port, already :9962). Needs an Alloy service-discovery rule.
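The gap can be pictured with a sketch of the kind of headless Service involved. Everything except the `:9965` port and the fact that the annotations sit on a Service is an assumption here: the Service name, namespace, and selector are illustrative, not copied from the rendered chart.

```yaml
# Illustrative only: annotations live on the Service, not on the pod,
# so a pod-annotation scrape never sees them.
apiVersion: v1
kind: Service
metadata:
  name: hubble-metrics          # assumed name
  namespace: kube-system        # assumed namespace
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9965"
spec:
  clusterIP: None               # headless
  selector:
    k8s-app: cilium             # assumed selector
  ports:
    - name: hubble-metrics
      port: 9965
      targetPort: 9965
```

The agent pod behind this Service already spends its single `prometheus.io/port` annotation on `:9962`, which is why a second target on `:9965` has to come from service or endpoint discovery instead.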

All 10 addon deltas parse with correct nesting and are yamllint-clean.

…annotations

The hub's grafana-agent (Alloy) is annotation-gated: it keeps only pods with prometheus.io/scrape="true" (honoring prometheus.io/port + /path), plus the static kube-state-metrics + cAdvisor targets. ServiceMonitor is inert in prod (no prometheus-operator). None of these addon deltas carried a scrape annotation, so their community id-import dashboards were hollow in prod: the metrics never reached Amazon Managed Prometheus.

Adds the scrape config to every always-on infra addon that ships a dashboard. Each chart's exact mechanism + metrics port was helm-template-verified (a wrong key or port silently re-hollows the board):

- karpenter podAnnotations :8080 (controller)
- external-dns podAnnotations :7979
- external-secrets podAnnotations :8080 (main controller)
- tempo podAnnotations :3200 (self-metrics)
- loki singleBinary.podAnnotations :3100 (self-metrics)
- opencost opencost.podAnnotations :9003 (subchart alias)
- argo-rollouts controller.podAnnotations :8090 (NOT 8080, which is healthz)
- argo-events controller.podAnnotations :7777 (NOT the 8082 service port)
- argo-workflows controller.metricsConfig.enabled + podAnnotations :9090 (the controller serves no metrics until metricsConfig.enabled)
- cilium prometheus.enabled + operator.prometheus.enabled + hubble.relay.prometheus.enabled: the chart auto-stamps the agent/operator/relay pods (:9962/:9963/:9966) when these are on and serviceMonitor is off (which it is in prod)

Two addons need no change: cert-manager and aws-load-balancer-controller already auto-emit the prometheus.io annotations by default (prometheus.enabled, serviceMonitor off), so their boards were never the gap.

Not covered here: the hubble L7 overview reads hubble_http_* on :9965, which the cilium chart only annotates on a headless Service, unreachable by the pod-annotation scrape (a pod can carry only one prometheus.io/port, already used by the agent's :9962). That needs an Alloy service-discovery scrape rule, tracked separately.

github-actions · 2026-06-24T02:11:21Z

CI Results

| Check | Status |
| --- | --- |
| YAML Lint | ✅ |

| Environment | Kustomize Build |
| --- | --- |
| dev | ✅ |
| staging | ✅ |
| production | ✅ |
| hub | ✅ |

All validations passed.

stxkxs merged commit 9a9e915 into main Jun 24, 2026
8 checks passed

stxkxs deleted the addon-scrape-annotations branch June 24, 2026 02:11

stxkxs merged 1 commit into main from addon-scrape-annotations
