Skip to content

O-11: live Gatus status badges in repo README#47

Merged
alexrf45 merged 2 commits into
devfrom
sprint/o-11
May 28, 2026
Merged

O-11: live Gatus status badges in repo README#47
alexrf45 merged 2 commits into
devfrom
sprint/o-11

Conversation

@alexrf45

Copy link
Copy Markdown
Owner

Closes

O-11 (Gatus endpoint statuses in repo README) from the open-items punch list. Source: _docs/reviews/home-0ps-review-2026-05-27.md line 121.

What's in this PR

  • New ## Live status block in README.md with two sub-tables (### Applications, ### Infrastructure) surfacing all 6 Gatus-monitored endpoints together
  • Hybrid badge format: shields.io endpoint (style-consistent with the rest of the README's shields) for the Status column, direct Gatus SVG (/uptimes/7d/badge.svg) for the Uptime (7d) column — different signals, both useful
  • Removed the stale Syncthing row from the ## Applications table (spun down 2026-05-15)
  • Bonus: found + fixed a second stale Syncthing reference in .claude/rules/kube-wrapper.md (a k8sop dev stern -n syncthing example → swapped to freshrss)
  • ~5-minute GitHub camo cache caveat documented inline in the README's Live status section (not just the PR body) — visitors hitting stale badge state will see the explanation

Gatus key normalization confirmed live

curl -sf https://dev-status.home-0ps.com/api/v1/endpoints/statuses | jq -r '.[].key'
applications_authentik
applications_freshrss
applications_grafana
applications_homer
infrastructure_truenas
infrastructure_unifi

Rule: <group>_<lowercase(name)> — TrueNAS → truenas, UniFi → unifi.

Acceptance test

CI runs .claude/sprints/o-11/accept.sh via .github/workflows/sprint-accept.yml. Local pass:

[accept:O-11] README.md present
[accept:O-11] all 6 Gatus endpoint badge URLs present
[accept:O-11] dev-status.home-0ps.com references: 7
[accept:O-11] Syncthing reference removed
[accept:O-11] all 6 endpoint tokens surfaced in README copy
[accept:O-11] markdownlint not installed — skipping
[accept:O-11] PASS

Public-surface notes

  • Public hostname dev-status.home-0ps.com is Cloudflare-tunnel-exposed + rate-limited (60 req/min) per terraform/cloudflare-tunnel/main.tf G2 — no infra changes needed for this PR
  • GitHub camo proxies + caches the badge SVGs ~5min; badge state on the README lags live dashboard by that window. Acceptable for repo-visitor signal

Worktree

/tmp/sprints/o-11 (clean up after merge: git worktree remove /tmp/sprints/o-11 && git branch -D sprint/o-11)

Generated by

/sprint-orchestrate H-3 O-9 O-11 — wave 2 (parallel with O-9, which is still running in a separate worktree).

🤖 Generated with Claude Code

alexrf45 added 2 commits May 28, 2026 03:38
…(O-11)

Replaces the static "Active / Dev" Applications table with live status +
7d-uptime badges sourced from Gatus at dev-status.home-0ps.com (already
Cloudflare-tunnel-exposed). Adds an Infrastructure block for TrueNAS + UniFi
so all 6 endpoints monitored by Gatus surface on the repo landing page.

Drops the Syncthing row (spun down 2026-05-15) and updates a stale
`stern -n syncthing` example in the kubeop docs to a live namespace.

Note: GitHub camo caches external SVGs ~5min — README badges lag the live
dashboard by that window. Out of scope to fix.
@alexrf45 alexrf45 merged commit 290677a into dev May 28, 2026
1 check passed
alexrf45 added a commit that referenced this pull request May 28, 2026
## Closes

**O-11 (Gatus endpoint statuses in repo README)** from the open-items
punch list. Source:
[_docs/reviews/home-0ps-review-2026-05-27.md](../blob/dev/_docs/reviews/home-0ps-review-2026-05-27.md#observability-follow-ups)
line 121.

## What's in this PR

- New `## Live status` block in `README.md` with two sub-tables (`###
Applications`, `### Infrastructure`) surfacing all 6 Gatus-monitored
endpoints together
- **Hybrid badge format:** shields.io endpoint (style-consistent with
the rest of the README's shields) for the *Status* column, direct Gatus
SVG (`/uptimes/7d/badge.svg`) for the *Uptime (7d)* column — different
signals, both useful
- Removed the stale **Syncthing** row from the `## Applications` table
(spun down 2026-05-15)
- **Bonus:** found + fixed a second stale Syncthing reference in
`.claude/rules/kube-wrapper.md` (a `k8sop dev stern -n syncthing`
example → swapped to `freshrss`)
- ~5-minute GitHub camo cache caveat documented inline in the README's
Live status section (not just the PR body) — visitors hitting stale
badge state will see the explanation

## Gatus key normalization confirmed live

```
curl -sf https://dev-status.home-0ps.com/api/v1/endpoints/statuses | jq -r '.[].key'
applications_authentik
applications_freshrss
applications_grafana
applications_homer
infrastructure_truenas
infrastructure_unifi
```

Rule: `<group>_<lowercase(name)>` — TrueNAS → truenas, UniFi → unifi.

## Acceptance test

CI runs `.claude/sprints/o-11/accept.sh` via
`.github/workflows/sprint-accept.yml`. Local pass:

```text
[accept:O-11] README.md present
[accept:O-11] all 6 Gatus endpoint badge URLs present
[accept:O-11] dev-status.home-0ps.com references: 7
[accept:O-11] Syncthing reference removed
[accept:O-11] all 6 endpoint tokens surfaced in README copy
[accept:O-11] markdownlint not installed — skipping
[accept:O-11] PASS
```

## Public-surface notes

- Public hostname `dev-status.home-0ps.com` is Cloudflare-tunnel-exposed
+ rate-limited (60 req/min) per `terraform/cloudflare-tunnel/main.tf` G2
— no infra changes needed for this PR
- GitHub camo proxies + caches the badge SVGs ~5min; badge state on the
README lags live dashboard by that window. Acceptable for repo-visitor
signal

## Worktree

`/tmp/sprints/o-11` (clean up after merge: `git worktree remove
/tmp/sprints/o-11 && git branch -D sprint/o-11`)

## Generated by

`/sprint-orchestrate H-3 O-9 O-11` — wave 2 (parallel with O-9, which is
still running in a separate worktree).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
alexrf45 added a commit that referenced this pull request May 28, 2026
## Closes

**O-9 (App-level dashboards/alerts)** from the open-items punch list.
Source:
[_docs/reviews/home-0ps-review-2026-05-27.md](../blob/dev/_docs/reviews/home-0ps-review-2026-05-27.md#observability-follow-ups)
line 119.

## What's in this PR

- **Authentik HR:** enable `metrics.enabled: true` on both `server` and
`worker` (chart provisions the two metrics Services). **Chart-side
ServiceMonitor stays disabled** — ownership of ServiceMonitors is
centralised in `_lib/observability/kube-prometheus-stack/` (same pattern
as falco), so cardinality/relabel changes happen in one place.
- **`servicemonitor-authentik.yaml`** (new): cross-namespace selects
both `authentik-server-metrics` + `authentik-worker-metrics` Services
via one `matchExpressions` block. 60s scrape interval.
- **4 dashboard ConfigMaps** under
`_lib/observability/kube-prometheus-stack/dashboards/` —
sidecar-discovered via `grafana_dashboard: "1"` label:
- **authentik** — community grafana.com/14837 r2 (beryju), normalised
(stripped `__inputs`/`__requires`, replaced `${DS_PROMETHEUS}` with
Prometheus datasource, cleared `id`)
  - **cloudflared** — community dashboard, same normalisation
  - **gatus** — hand-authored — uptime & latency panels
  - **freshrss** — hand-authored — service-health panels
- **`.yamllint.yaml`** updated: ignore the 2 community-sourced dashboard
YAMLs (embedded markdown in panel `content:` blocks exceeds the 300-char
line cap; pinned to specific revisions per file header, not hand-edited)

## Scope discipline

**O-10 (postgres-exporter + DB-content panels) is a separate sprint.**
Dashboards in this PR show *operational* metrics only — request rates,
latency, pod health, scrape targets. Custom DB-content queries (freshrss
unread/favorites, authentik login stats) wait for the postgres-exporter
rollout in O-10.

## Acceptance test

CI runs `.claude/sprints/o-9/accept.sh` via
`.github/workflows/sprint-accept.yml`. Local pass:

```text
[accept:O-9] yamllint 8 files
[accept:O-9] kubectl kustomize _lib/observability/kube-prometheus-stack
[accept:O-9] assert: ServiceMonitor 'authentik' in obs render
[accept:O-9] assert: dashboard ConfigMaps present
[accept:O-9]   found: dashboard-authentik
[accept:O-9]   found: dashboard-cloudflared
[accept:O-9]   found: dashboard-gatus
[accept:O-9]   found: dashboard-freshrss
[accept:O-9] assert: every dashboard ConfigMap has label grafana_dashboard=1
[accept:O-9] assert: Authentik HR has server.metrics.enabled == true
[accept:O-9] assert: Authentik HR has worker.metrics.enabled == true
[accept:O-9] assert: Authentik HR server.metrics.serviceMonitor.enabled == false
[accept:O-9] assert: ServiceMonitor 'authentik' targets authentik ns
[accept:O-9] PASS
```

## Post-merge manual checks (not in accept.sh)

- `kube dev -n monitoring exec deploy/prometheus-kps-prometheus-0 --
wget -qO- localhost:9090/api/v1/targets | grep authentik` — authentik
metrics targets up (server + worker)
- Grafana sidebar: 4 new dashboards visible (Authentik, Cloudflared,
Gatus, FreshRSS)

## Background — subagent session-limit + recovery

This PR was driven by the `/sprint-orchestrate` parallel executor as
wave 2 alongside O-11. The O-9 subagent **completed the implementation**
but hit the global session limit before committing — the work was left
uncommitted in `/tmp/sprints/o-9`. The orchestrator (me) verified the
agent's edits (architecture, ServiceMonitor pattern, dashboard
provenance, yamllint workaround), committed them as a single logical
commit, rebased, ran the acceptance test, and opened this PR.

**Also during recovery:** discovered that PRs #46 (H-3) + #47 (O-11) had
been merged on GitHub but were missing from `origin/dev` — looks like a
force-push to dev clobbered the merge commits. Recovered both via `git
cherry-pick -m 1` of the GitHub merge commits (`9b6881f`, `290677a`)
onto current dev, then pushed `eb197cf` + `6f51280`. O-9 was rebased
onto the now-correct dev tip before this PR was opened.

## Worktree

`/tmp/sprints/o-9` (clean up after merge: `git worktree remove
/tmp/sprints/o-9 && git branch -D sprint/o-9`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@alexrf45 alexrf45 deleted the sprint/o-11 branch June 12, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant