O-9: Authentik metrics + ServiceMonitor + 4 app dashboards #48
Merged
… dashboards
- Authentik HR: enable metrics.enabled on server + worker (chart provisions
the metrics Services). Keep chart-side ServiceMonitor disabled — ownership
centralised in _lib/observability/kube-prometheus-stack/ (same pattern as
falco), so cardinality/relabel changes happen in one place.
- New servicemonitor-authentik.yaml: one SM selects authentik-server-metrics
+ authentik-worker-metrics across namespaces via matchExpressions.
- New dashboards/ dir with 4 ConfigMaps (sidecar-discovered via
grafana_dashboard: "1" label):
  - authentik: community grafana.com/14837 r2 (beryju), normalised
  - cloudflared: community grafana.com dashboard, normalised
  - gatus: hand-authored, uptime & latency panels
  - freshrss: hand-authored, service health panels
- yamllint: ignore the 2 community-sourced dashboard YAMLs (embedded markdown
in panel `content:` blocks exceeds the 300-char line cap; pinned to specific
revisions per file header, not hand-edited).
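
The yamllint exemption described in the last bullet could be expressed roughly as below. This is a sketch, not the PR's actual config: the two file names are hypothetical (the commit only says "the 2 community-sourced dashboard YAMLs"), and `extends: default` is an assumption about the rest of the ruleset. The 300-character cap is taken from the commit message.

```yaml
# Sketch of a yamllint config that keeps the 300-char line cap
# but skips the two pinned community dashboards.
extends: default            # assumption: repo builds on the default ruleset

rules:
  line-length:
    max: 300                # cap quoted in the commit message

# gitignore-style patterns, one per line.
# File names below are hypothetical placeholders.
ignore: |
  _lib/observability/kube-prometheus-stack/dashboards/authentik.yaml
  _lib/observability/kube-prometheus-stack/dashboards/cloudflared.yaml
```

Ignoring whole files rather than raising the cap keeps the limit meaningful for hand-authored YAML, which matches the "pinned, not hand-edited" reasoning above.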
Scope discipline: O-10 (postgres-exporter + DB-content panels) is a separate
sprint. Dashboards in this PR show operational metrics only.
Post-merge manual check: Authentik metrics targets up in Prometheus, all 4
dashboards visible in Grafana sidebar.
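
For orientation, a single ServiceMonitor that picks up both Authentik metrics Services might look like the sketch below. Several values are assumptions not stated in the commit: the `authentik` namespace, the `release` label, the label key/values used in `matchExpressions`, and the `metrics` port name. Check them against the real Services (for example with `kubectl get svc --show-labels`) before reusing any of it. The 60s interval comes from the PR description.

```yaml
# Sketch only; assumes the chart already exposes the metrics Services
# (server.metrics.enabled / worker.metrics.enabled set to true).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: authentik
  namespace: monitoring              # Prometheus namespace used in the post-merge check
  labels:
    release: kube-prometheus-stack   # assumption: must match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - authentik                    # assumption: namespace where Authentik runs
  selector:
    matchExpressions:
      # Hypothetical label key and values; one expression covers both Services.
      - key: app.kubernetes.io/component
        operator: In
        values:
          - server-metrics
          - worker-metrics
  endpoints:
    - port: metrics                  # assumption: named port on both Services
      interval: 60s
```

Using `operator: In` with two values is what lets one ServiceMonitor cover both the server and the worker, so relabel or cardinality changes stay in a single file.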
Closes O-9 (App-level dashboards/alerts) from the open-items punch list. Source: `_docs/reviews/home-0ps-review-2026-05-27.md` line 119.
What's in this PR
- `metrics.enabled: true` on both `server` and `worker` (chart provisions the two metrics Services). Chart-side ServiceMonitor stays disabled: ownership of ServiceMonitors is centralised in `_lib/observability/kube-prometheus-stack/` (same pattern as falco), so cardinality/relabel changes happen in one place.
- `servicemonitor-authentik.yaml` (new): selects both `authentik-server-metrics` + `authentik-worker-metrics` Services across namespaces via one `matchExpressions` block. 60s scrape interval.
- `_lib/observability/kube-prometheus-stack/dashboards/`: dashboard ConfigMaps (authentik, cloudflared, gatus, freshrss), sidecar-discovered via the `grafana_dashboard: "1"` label. Community-sourced dashboards normalised (… `__inputs`/`__requires`, replaced `${DS_PROMETHEUS}` with Prometheus datasource, cleared `id`).
- `.yamllint.yaml` updated: ignore the 2 community-sourced dashboard YAMLs (embedded markdown in panel `content:` blocks exceeds the 300-char line cap; pinned to specific revisions per file header, not hand-edited).
Scope discipline
O-10 (postgres-exporter + DB-content panels) is a separate sprint. Dashboards in this PR show operational metrics only — request rates, latency, pod health, scrape targets. Custom DB-content queries (freshrss unread/favorites, authentik login stats) wait for the postgres-exporter rollout in O-10.
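
As an illustration of an operational-only dashboard delivered through the Grafana sidecar, a minimal ConfigMap could look like this. It is a sketch under assumptions, not one of the four dashboards in this PR: the ConfigMap name, dashboard `uid`, the `job="gatus"` selector, and the `prometheus` datasource uid are all hypothetical. Only the `grafana_dashboard: "1"` label and the `monitoring` namespace appear in the PR text.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-gatus      # hypothetical name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"           # label the sidecar watches for
data:
  # The sidecar loads each key as a dashboard JSON file.
  gatus.json: |
    {
      "title": "Gatus (sketch)",
      "uid": "gatus-sketch",
      "schemaVersion": 39,
      "panels": [
        {
          "type": "stat",
          "title": "Scrape target up",
          "datasource": {"type": "prometheus", "uid": "prometheus"},
          "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
          "targets": [{"expr": "up{job=\"gatus\"}"}]
        }
      ]
    }
```

The panel queries only `up`, which any scraped target exposes, so it needs nothing from the postgres-exporter work deferred to O-10.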
Acceptance test
CI runs `.claude/sprints/o-9/accept.sh` via `.github/workflows/sprint-accept.yml`. Passes locally.
Post-merge manual checks (not in accept.sh)
- `kube dev -n monitoring exec deploy/prometheus-kps-prometheus-0 -- wget -qO- localhost:9090/api/v1/targets | grep authentik`: authentik metrics targets up (server + worker)
Background: subagent session-limit + recovery
This PR was driven by the `/sprint-orchestrate` parallel executor as wave 2 alongside O-11. The O-9 subagent completed the implementation but hit the global session limit before committing, so the work was left uncommitted in `/tmp/sprints/o-9`. The orchestrator (me) verified the agent's edits (architecture, ServiceMonitor pattern, dashboard provenance, yamllint workaround), committed them as a single logical commit, rebased, ran the acceptance test, and opened this PR.
Also during recovery: discovered that PRs #46 (H-3) + #47 (O-11) had been merged on GitHub but were missing from `origin/dev`; it looks like a force-push to dev clobbered the merge commits. Recovered both via `git cherry-pick -m 1` of the GitHub merge commits (`9b6881f`, `290677a`) onto current dev, then pushed `eb197cf` + `6f51280`. O-9 was rebased onto the now-correct dev tip before this PR was opened.
Worktree
`/tmp/sprints/o-9` (clean up after merge: `git worktree remove /tmp/sprints/o-9 && git branch -D sprint/o-9`)
🤖 Generated with Claude Code