From 50c87fa436e0137f7216ee8061d76cd7883136e5 Mon Sep 17 00:00:00 2001 From: Chris Alfano Date: Mon, 18 May 2026 09:59:42 -0400 Subject: [PATCH 1/6] feat(envoy-gateway): bump civic-cloud v1.9.2 + add Gateway API foundation (phase 1) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bumps civic-cloud blueprint v1.7.7 → v1.9.2 which brings: - cert-manager 1.13.3 → 1.20.2 (Gateway API integration + ListenerSets gate) - Gateway API v1.5.1 CRDs (standard channel) - Envoy Gateway v1.7.3 controller (installs to envoy-gateway-system) - hairpin-proxy removed (Linode LKE now supports LB hairpin natively) - Server-side apply for CRDs in deploy workflow Adds _infra/envoy-gateway/ with the three foundation resources copied from sandbox: GatewayClass `eg` references EnvoyProxy `shared` with mergeGateways enabled (single LB for all Gateways), main-gateway has an HTTP catchall listener used by both cert-manager solver routes and the global HTTP→HTTPS redirect (added in phase 3.5). Traffic still flows through ingress-nginx after this deploys — phase 1 is foundation only. 
Refs: #144 --- .holo/sources/civic-cloud.toml | 2 +- _infra/envoy-gateway/envoyproxy.yaml | 10 ++++++++++ _infra/envoy-gateway/gatewayclass.yaml | 11 +++++++++++ _infra/envoy-gateway/main-gateway.yaml | 17 +++++++++++++++++ 4 files changed, 39 insertions(+), 1 deletion(-) create mode 100644 _infra/envoy-gateway/envoyproxy.yaml create mode 100644 _infra/envoy-gateway/gatewayclass.yaml create mode 100644 _infra/envoy-gateway/main-gateway.yaml diff --git a/.holo/sources/civic-cloud.toml b/.holo/sources/civic-cloud.toml index 94090ea..854bb1e 100644 --- a/.holo/sources/civic-cloud.toml +++ b/.holo/sources/civic-cloud.toml @@ -1,3 +1,3 @@ [holosource] url = "https://github.com/CodeForPhilly/civic-cloud" -ref = "refs/tags/v1.7.7" +ref = "refs/tags/v1.9.2" diff --git a/_infra/envoy-gateway/envoyproxy.yaml b/_infra/envoy-gateway/envoyproxy.yaml new file mode 100644 index 0000000..8125aca --- /dev/null +++ b/_infra/envoy-gateway/envoyproxy.yaml @@ -0,0 +1,10 @@ +apiVersion: gateway.envoyproxy.io/v1alpha1 +kind: EnvoyProxy +metadata: + name: shared + namespace: envoy-gateway-system +spec: + # Collapse all Gateway resources using the `eg` class into one Envoy + # data-plane Deployment + one LoadBalancer. Keeps cost flat regardless + # of how many Gateway resources exist on the cluster. 
+ mergeGateways: true diff --git a/_infra/envoy-gateway/gatewayclass.yaml b/_infra/envoy-gateway/gatewayclass.yaml new file mode 100644 index 0000000..62262b7 --- /dev/null +++ b/_infra/envoy-gateway/gatewayclass.yaml @@ -0,0 +1,11 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: GatewayClass +metadata: + name: eg +spec: + controllerName: gateway.envoyproxy.io/gatewayclass-controller + parametersRef: + group: gateway.envoyproxy.io + kind: EnvoyProxy + name: shared + namespace: envoy-gateway-system diff --git a/_infra/envoy-gateway/main-gateway.yaml b/_infra/envoy-gateway/main-gateway.yaml new file mode 100644 index 0000000..7f7acc8 --- /dev/null +++ b/_infra/envoy-gateway/main-gateway.yaml @@ -0,0 +1,17 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: main-gateway + namespace: envoy-gateway-system +spec: + gatewayClassName: eg + listeners: + - name: http + protocol: HTTP + port: 80 + # No hostname → matches anything. Per-app HTTPRoutes scope to their + # own hostnames; the listener itself stays open so we don't have to + # add a Gateway listener every time a project ships a new HTTPRoute. + allowedRoutes: + namespaces: + from: All From 0db3585898cc9bbed74dcb7e18c0098dd6afe172 Mon Sep 17 00:00:00 2001 From: Chris Alfano Date: Mon, 18 May 2026 10:00:11 -0400 Subject: [PATCH 2/6] feat(cert-manager): add parallel gateway ClusterIssuers (phase 2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds `letsencrypt-prod-gateway` and `letsencrypt-staging-gateway` ClusterIssuers using cert-manager 1.20's gatewayHTTPRoute solver against main-gateway. The existing nginx-solver issuers in cert-manager.issuers.yaml stay untouched so existing Ingress-managed Certs continue to renew normally — clean separation between the two paths until each app cuts over. 
Lesson from sandbox: mutating the existing solver in place couples Ingress and Gateway renewal behavior in a way that's hard to reason about and hard to revert. Parallel is the safer pattern. Refs: #144 --- _infra/cert-manager/issuers-gateway.yaml | 43 ++++++++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 _infra/cert-manager/issuers-gateway.yaml diff --git a/_infra/cert-manager/issuers-gateway.yaml b/_infra/cert-manager/issuers-gateway.yaml new file mode 100644 index 0000000..c6fc873 --- /dev/null +++ b/_infra/cert-manager/issuers-gateway.yaml @@ -0,0 +1,43 @@ +apiVersion: cert-manager.io/v1 +kind: ClusterIssuer +metadata: + name: letsencrypt-staging-gateway +spec: + acme: + email: services@codeforphilly.org + server: https://acme-staging-v02.api.letsencrypt.org/directory + privateKeySecretRef: + name: letsencrypt-staging-gateway + solvers: + # cert-manager 1.20 gatewayHTTPRoute solver. cert-manager creates a + # short-lived HTTPRoute attached to main-gateway for each challenge, + # routing /.well-known/acme-challenge/* on the cert's hostname to its + # solver Pod. Parallel to the existing nginx-based issuers in + # cert-manager.issuers.yaml (root) — those keep serving existing + # Ingress-managed Certs until each app cuts over to its Gateway. 
+ - http01: + gatewayHTTPRoute: + parentRefs: + - group: gateway.networking.k8s.io + kind: Gateway + name: main-gateway + namespace: envoy-gateway-system +--- +apiVersion: cert-manager.io/v1 +kind: ClusterIssuer +metadata: + name: letsencrypt-prod-gateway +spec: + acme: + email: services@codeforphilly.org + server: https://acme-v02.api.letsencrypt.org/directory + privateKeySecretRef: + name: letsencrypt-prod-gateway + solvers: + - http01: + gatewayHTTPRoute: + parentRefs: + - group: gateway.networking.k8s.io + kind: Gateway + name: main-gateway + namespace: envoy-gateway-system From 56a2240ec53f6ebcb3386d4637fc71bb1dbdea8e Mon Sep 17 00:00:00 2001 From: Chris Alfano Date: Mon, 18 May 2026 10:02:24 -0400 Subject: [PATCH 3/6] feat(gateways): pre-populate per-app Gateway + HTTPRoute (phase 3) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit One file per app in _gateways/ — Gateway with per-hostname HTTPS listeners (each with its own cert-manager-managed cert via the letsencrypt-prod-gateway ClusterIssuer added in phase 2) plus a single HTTPRoute matching all the app's hostnames and routing to its backend Service. Cert Secret naming uses `-gw-tls` suffix to avoid collision with existing Ingress-managed `-tls` Certs — both coexist until each app's Ingress is removed (phase 5). Per-app HTTPRoutes attach only to the per-app HTTPS Gateway; HTTP→HTTPS redirect is handled globally on main-gateway (phase 3.5), not per-app. Apex domains (balancerproject.org, choosenativeplants.com, codeforphilly.org, penn-chime.phl.io, vaultwarden.phl.io, bitwarden.phl.io) will not issue certs until their DNS cuts over to Envoy — HTTP-01 challenge needs to reach Envoy. Plan DNS cutover + cert issuance together for each apex. For initial verification per app, the letsencrypt-prod-gateway annotation can be swapped to letsencrypt-staging-gateway to avoid Let's Encrypt rate limits during smoke testing — then flipped back to prod. 
Refs: #144 --- _gateways/balancer.yaml | 36 +++++++++++++++++ _gateways/browserless-chrome.yaml | 36 +++++++++++++++++ _gateways/chime.yaml | 48 +++++++++++++++++++++++ _gateways/choose-native-plants.yaml | 60 +++++++++++++++++++++++++++++ _gateways/code-for-philly.yaml | 60 +++++++++++++++++++++++++++++ _gateways/echo-http.yaml | 36 +++++++++++++++++ _gateways/grafana.yaml | 36 +++++++++++++++++ _gateways/sealed-secrets.yaml | 36 +++++++++++++++++ _gateways/third-places.yaml | 36 +++++++++++++++++ _gateways/vaultwarden.yaml | 48 +++++++++++++++++++++++ 10 files changed, 432 insertions(+) create mode 100644 _gateways/balancer.yaml create mode 100644 _gateways/browserless-chrome.yaml create mode 100644 _gateways/chime.yaml create mode 100644 _gateways/choose-native-plants.yaml create mode 100644 _gateways/code-for-philly.yaml create mode 100644 _gateways/echo-http.yaml create mode 100644 _gateways/grafana.yaml create mode 100644 _gateways/sealed-secrets.yaml create mode 100644 _gateways/third-places.yaml create mode 100644 _gateways/vaultwarden.yaml diff --git a/_gateways/balancer.yaml b/_gateways/balancer.yaml new file mode 100644 index 0000000..94a6cfe --- /dev/null +++ b/_gateways/balancer.yaml @@ -0,0 +1,36 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: balancer + namespace: balancer + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https + protocol: HTTPS + port: 443 + hostname: balancerproject.org + tls: + mode: Terminate + certificateRefs: + - name: balancer-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: balancer + namespace: balancer +spec: + parentRefs: + - name: balancer + hostnames: + - balancerproject.org + rules: + - backendRefs: + - name: balancer + port: 8000 diff --git a/_gateways/browserless-chrome.yaml b/_gateways/browserless-chrome.yaml new file mode 
100644 index 0000000..2e7c58c --- /dev/null +++ b/_gateways/browserless-chrome.yaml @@ -0,0 +1,36 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: browserless-chrome + namespace: browserless-chrome + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https + protocol: HTTPS + port: 443 + hostname: browserless-chrome.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: + - name: browserless-chrome-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: browserless-chrome + namespace: browserless-chrome +spec: + parentRefs: + - name: browserless-chrome + hostnames: + - browserless-chrome.live.k8s.phl.io + rules: + - backendRefs: + - name: browserless-chrome + port: 80 diff --git a/_gateways/chime.yaml b/_gateways/chime.yaml new file mode 100644 index 0000000..d039d05 --- /dev/null +++ b/_gateways/chime.yaml @@ -0,0 +1,48 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: chime + namespace: chime + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https-apex + protocol: HTTPS + port: 443 + hostname: penn-chime.phl.io + tls: + mode: Terminate + certificateRefs: + - name: penn-chime-phl-gw-tls + allowedRoutes: + namespaces: + from: Same + - name: https-subdomain + protocol: HTTPS + port: 443 + hostname: penn-chime.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: + - name: penn-chime-live-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: chime + namespace: chime +spec: + parentRefs: + - name: chime + hostnames: + - penn-chime.phl.io + - penn-chime.live.k8s.phl.io + rules: + - backendRefs: + - name: chime + port: 80 diff --git a/_gateways/choose-native-plants.yaml 
b/_gateways/choose-native-plants.yaml new file mode 100644 index 0000000..a7d3b1d --- /dev/null +++ b/_gateways/choose-native-plants.yaml @@ -0,0 +1,60 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: choose-native-plants + namespace: choose-native-plants + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https-subdomain + protocol: HTTPS + port: 443 + hostname: choose-native-plants.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: + - name: choose-native-plants-gw-tls + allowedRoutes: + namespaces: + from: Same + - name: https-apex + protocol: HTTPS + port: 443 + hostname: choosenativeplants.com + tls: + mode: Terminate + certificateRefs: + - name: choosenativeplants-com-gw-tls + allowedRoutes: + namespaces: + from: Same + - name: https-www + protocol: HTTPS + port: 443 + hostname: www.choosenativeplants.com + tls: + mode: Terminate + certificateRefs: + - name: www-choosenativeplants-com-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: choose-native-plants + namespace: choose-native-plants +spec: + parentRefs: + - name: choose-native-plants + hostnames: + - choose-native-plants.live.k8s.phl.io + - choosenativeplants.com + - www.choosenativeplants.com + rules: + - backendRefs: + - name: choose-native-plants + port: 80 diff --git a/_gateways/code-for-philly.yaml b/_gateways/code-for-philly.yaml new file mode 100644 index 0000000..70583fd --- /dev/null +++ b/_gateways/code-for-philly.yaml @@ -0,0 +1,60 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: code-for-philly + namespace: code-for-philly + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https-apex + protocol: HTTPS + port: 443 + hostname: codeforphilly.org + tls: + mode: Terminate + certificateRefs: + - 
name: codeforphilly-org-gw-tls + allowedRoutes: + namespaces: + from: Same + - name: https-www + protocol: HTTPS + port: 443 + hostname: www.codeforphilly.org + tls: + mode: Terminate + certificateRefs: + - name: www-codeforphilly-org-gw-tls + allowedRoutes: + namespaces: + from: Same + - name: https-subdomain + protocol: HTTPS + port: 443 + hostname: codeforphilly.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: + - name: codeforphilly-live-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: code-for-philly + namespace: code-for-philly +spec: + parentRefs: + - name: code-for-philly + hostnames: + - codeforphilly.org + - www.codeforphilly.org + - codeforphilly.live.k8s.phl.io + rules: + - backendRefs: + - name: code-for-philly-site + port: 80 diff --git a/_gateways/echo-http.yaml b/_gateways/echo-http.yaml new file mode 100644 index 0000000..c26afad --- /dev/null +++ b/_gateways/echo-http.yaml @@ -0,0 +1,36 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: echo-http + namespace: echo-http + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https + protocol: HTTPS + port: 443 + hostname: echo-http.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: + - name: echo-http-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: echo-http + namespace: echo-http +spec: + parentRefs: + - name: echo-http + hostnames: + - echo-http.live.k8s.phl.io + rules: + - backendRefs: + - name: echo-http + port: 80 diff --git a/_gateways/grafana.yaml b/_gateways/grafana.yaml new file mode 100644 index 0000000..55bceb3 --- /dev/null +++ b/_gateways/grafana.yaml @@ -0,0 +1,36 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: grafana + namespace: grafana + annotations: + 
cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https + protocol: HTTPS + port: 443 + hostname: metrics.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: + - name: grafana-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: grafana + namespace: grafana +spec: + parentRefs: + - name: grafana + hostnames: + - metrics.live.k8s.phl.io + rules: + - backendRefs: + - name: grafana + port: 80 diff --git a/_gateways/sealed-secrets.yaml b/_gateways/sealed-secrets.yaml new file mode 100644 index 0000000..5b30b5e --- /dev/null +++ b/_gateways/sealed-secrets.yaml @@ -0,0 +1,36 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: sealed-secrets + namespace: sealed-secrets + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https + protocol: HTTPS + port: 443 + hostname: sealed-secrets.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: + - name: sealed-secrets-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: sealed-secrets + namespace: sealed-secrets +spec: + parentRefs: + - name: sealed-secrets + hostnames: + - sealed-secrets.live.k8s.phl.io + rules: + - backendRefs: + - name: sealed-secrets + port: 8080 diff --git a/_gateways/third-places.yaml b/_gateways/third-places.yaml new file mode 100644 index 0000000..f2fe999 --- /dev/null +++ b/_gateways/third-places.yaml @@ -0,0 +1,36 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: third-places + namespace: third-places + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https + protocol: HTTPS + port: 443 + hostname: third-places.live.k8s.phl.io + tls: + mode: Terminate + certificateRefs: 
+ - name: third-places-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: third-places + namespace: third-places +spec: + parentRefs: + - name: third-places + hostnames: + - third-places.live.k8s.phl.io + rules: + - backendRefs: + - name: third-places + port: 80 diff --git a/_gateways/vaultwarden.yaml b/_gateways/vaultwarden.yaml new file mode 100644 index 0000000..e62ad6b --- /dev/null +++ b/_gateways/vaultwarden.yaml @@ -0,0 +1,48 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: vaultwarden + namespace: vaultwarden + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod-gateway +spec: + gatewayClassName: eg + listeners: + - name: https-vaultwarden + protocol: HTTPS + port: 443 + hostname: vaultwarden.phl.io + tls: + mode: Terminate + certificateRefs: + - name: vaultwarden-phl-gw-tls + allowedRoutes: + namespaces: + from: Same + - name: https-bitwarden + protocol: HTTPS + port: 443 + hostname: bitwarden.phl.io + tls: + mode: Terminate + certificateRefs: + - name: bitwarden-phl-gw-tls + allowedRoutes: + namespaces: + from: Same +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: vaultwarden + namespace: vaultwarden +spec: + parentRefs: + - name: vaultwarden + hostnames: + - vaultwarden.phl.io + - bitwarden.phl.io + rules: + - backendRefs: + - name: vaultwarden + port: 80 From 0385d6b2f44a35b992856d8613f6c1a8abbf6e59 Mon Sep 17 00:00:00 2001 From: Chris Alfano Date: Mon, 18 May 2026 10:02:45 -0400 Subject: [PATCH 4/6] =?UTF-8?q?feat(envoy-gateway):=20add=20global=20HTTP?= =?UTF-8?q?=E2=86=92HTTPS=20redirect=20HTTPRoute=20(phase=203.5)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Single HTTPRoute on main-gateway with a RequestRedirect filter (no hostnames, no path → matches everything that hits the HTTP listener). 
ACME challenges bypass via Gateway API conflict resolution — cert-manager's solver HTTPRoute carries both a hostname filter and pathType: Exact on /.well-known/acme-challenge/, both more specific. Safe to deploy any time after phase 2 — doesn't depend on per-app Gateways being ready. Once DNS cuts over per host, HTTP requests to that host get a 301 to HTTPS instead of falling through to ingress-nginx. Refs: #144 --- _infra/envoy-gateway/http-redirect.yaml | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 _infra/envoy-gateway/http-redirect.yaml diff --git a/_infra/envoy-gateway/http-redirect.yaml b/_infra/envoy-gateway/http-redirect.yaml new file mode 100644 index 0000000..934985e --- /dev/null +++ b/_infra/envoy-gateway/http-redirect.yaml @@ -0,0 +1,19 @@ +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: http-redirect + namespace: envoy-gateway-system +spec: + parentRefs: + - name: main-gateway + # No hostnames → matches anything reaching main-gateway's HTTP listener. + # cert-manager's per-challenge HTTPRoute uses pathType: Exact on + # /.well-known/acme-challenge/ and a hostname filter — both more + # specific than this rule — so ACME validation traffic bypasses the + # redirect and reaches the solver Pod. + rules: + - filters: + - type: RequestRedirect + requestRedirect: + scheme: https + statusCode: 301 From a9a1ab644edf9233ccae17480b8b93741ae668f9 Mon Sep 17 00:00:00 2001 From: Chris Alfano Date: Mon, 18 May 2026 10:53:26 -0400 Subject: [PATCH 5/6] refactor: prefix non-workloads with _ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adopts the convention sandbox settled on: top-level directories under the workspace root use the `_` prefix when they hold infrastructure / glue / admin manifests that aren't tied to a single workload. Workloads stay bare. 
Renames: admins/ → _admins/ docs/ → _docs/ Updates `.holo/branches/docs-site/{,docs/}_cfp-live-cluster.toml` to read from `_docs/`, and the k8s-manifests exclude to skip `_docs/**`. The docs-site branch still publishes `docs/` at root — only the workspace source path moved. Already on the `_` convention: `_infra/`, `_gateways/` (added in the in-flight Envoy Gateway migration on this branch). Refs: cfp-sandbox-cluster@d7af5bd8 + @4763b70e --- .holo/branches/docs-site/_cfp-live-cluster.toml | 2 +- .holo/branches/docs-site/docs/_cfp-live-cluster.toml | 2 +- .holo/branches/k8s-manifests/_cfp-live-cluster.toml | 2 +- {admins => _admins}/choose-native-plants.yaml | 0 {docs => _docs}/echo-service.md | 0 {docs => _docs}/mkdocs.site.yml | 0 6 files changed, 3 insertions(+), 3 deletions(-) rename {admins => _admins}/choose-native-plants.yaml (100%) rename {docs => _docs}/echo-service.md (100%) rename {docs => _docs}/mkdocs.site.yml (100%) diff --git a/.holo/branches/docs-site/_cfp-live-cluster.toml b/.holo/branches/docs-site/_cfp-live-cluster.toml index 419e532..76bc95d 100644 --- a/.holo/branches/docs-site/_cfp-live-cluster.toml +++ b/.holo/branches/docs-site/_cfp-live-cluster.toml @@ -1,5 +1,5 @@ [holomapping] -root = "docs" +root = "_docs" files = [ "mkdocs.*.yml" ] diff --git a/.holo/branches/docs-site/docs/_cfp-live-cluster.toml b/.holo/branches/docs-site/docs/_cfp-live-cluster.toml index 9ae1579..1d11e74 100644 --- a/.holo/branches/docs-site/docs/_cfp-live-cluster.toml +++ b/.holo/branches/docs-site/docs/_cfp-live-cluster.toml @@ -1,5 +1,5 @@ [holomapping] -root = "docs" +root = "_docs" files = [ "**", "!mkdocs.*.yml", diff --git a/.holo/branches/k8s-manifests/_cfp-live-cluster.toml b/.holo/branches/k8s-manifests/_cfp-live-cluster.toml index e51b0eb..4547a9f 100644 --- a/.holo/branches/k8s-manifests/_cfp-live-cluster.toml +++ b/.holo/branches/k8s-manifests/_cfp-live-cluster.toml @@ -2,7 +2,7 @@ files = [ "**", "!.github/**", - "!docs/**", + "!_docs/**", 
"!mkdocs.*.yml", "!README.md" ] diff --git a/admins/choose-native-plants.yaml b/_admins/choose-native-plants.yaml similarity index 100% rename from admins/choose-native-plants.yaml rename to _admins/choose-native-plants.yaml diff --git a/docs/echo-service.md b/_docs/echo-service.md similarity index 100% rename from docs/echo-service.md rename to _docs/echo-service.md diff --git a/docs/mkdocs.site.yml b/_docs/mkdocs.site.yml similarity index 100% rename from docs/mkdocs.site.yml rename to _docs/mkdocs.site.yml From 311f397005efa2e527d9a6055b841c13c90a1443 Mon Sep 17 00:00:00 2001 From: Chris Alfano Date: Mon, 18 May 2026 10:55:01 -0400 Subject: [PATCH 6/6] docs(claude): add repo-local agent instructions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adapted from cfp-sandbox-cluster@fadcf31c. Same structure (projection model, required local-diff QA, guardrails) but rewritten for live's situation: - Migration is in flight (#144), not complete — sandbox is the source for patterns, live trails it - Parallel ClusterIssuers `letsencrypt-{prod,staging}-gateway` coexist with the legacy nginx-solver `letsencrypt-{prod,staging}` at the repo root - Wildcard DNS is `*.live.k8s.phl.io` not `*.sandbox.k8s.phl.io` - Apex domains documented (balancerproject.org, codeforphilly.org, etc.) 
+ the ACME-DNS-cutover dependency for them - No cnpg / shared-cluster — per-app PostgreSQL StatefulSets where needed - ingress-nginx + hairpin-proxy noted as currently-present, scheduled for removal in #144 Refs: cfp-sandbox-cluster@fadcf31c --- .claude/CLAUDE.md | 170 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 170 insertions(+) create mode 100644 .claude/CLAUDE.md diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md new file mode 100644 index 0000000..622578a --- /dev/null +++ b/.claude/CLAUDE.md @@ -0,0 +1,170 @@ +# cfp-live-cluster — agent instructions + +GitOps repo for the CodeForPhilly **live** (production) Kubernetes cluster on Linode LKE. Source-projected via hologit. Don't edit deployed branches directly — change the workspace, project, push the result. + +The Envoy Gateway migration tracked in [#144](https://github.com/CodeForPhilly/cfp-live-cluster/issues/144) is **in flight** — sandbox completed its equivalent migration in May 2026 and this repo is following its lead. Several patterns documented below describe the target post-migration state; current `main` may still be on the ingress-nginx path until that work lands. + +## Source pipeline + +``` +JarvusInnovations/cluster-template + └─ civic-cloud/cluster-template + └─ this repo (cfp-live-cluster) + └─ projected branches → live cluster +``` + +This repo pulls civic-cloud via `.holo/sources/civic-cloud.toml`. + +To refresh a holosource: `git holo source fetch `. **Never** `git fetch ` — that auto-pulls upstream tags into local `refs/tags/` and pollutes the tag namespace. + +## How projection works + +- **Workspace files** are what humans edit. `.holo/branches//` configs map workspace paths to source content. +- `git holo project ` runs the pipeline and prints a tree SHA on the last stdout line. Inspect with `git ls-tree -r ` or `git diff `. 
+- Two branches matter: + - `k8s-manifests` — manifests only + - `k8s-manifests-github` — manifests + GitHub Actions workflows (overlays on top of `k8s-manifests`) +- Deploy lifecycle: push to `main` → `Build k8s-manifests` workflow → `releases/k8s-manifests` → Deploy PR auto-opens → merge → `deploys/k8s-manifests` → `K8s: Deploy k8s-manifests` workflow → `kubectl apply` to cluster. +- The deploy workflow's "Apply manifests: deleted resources" step removes anything that disappears from the projection. Drop a file from the workspace → resource deleted on next deploy. + +### Lenses + +`.holo/lenses/.toml` describes per-source transformations: + +- **helm3** — renders a chart against the app's `release-values.yaml` +- **kustomize** — builds a kustomization +- **k8s-normalize** — routes flat manifests into the `//.yaml` layout + +Cluster-scoped resources land in `_//.yaml`. + +## Directory map + +| Path | Purpose | +|---|---| +| `_infra/` | Cluster-level infra (cert-manager issuers, envoy-gateway config) | +| `_gateways/` | Per-app Gateway + HTTPRoute pairs, one file per app | +| `_admins/`, `_docs/` | Admin RBAC + docs source (workload dirs stay bare) | +| `/` | App workspace — `release-values.yaml` for helm, `kustomization.yaml` for kustomize | +| `.secrets/` | SealedSecrets for that namespace | +| `cert-manager.issuers.yaml` | Legacy nginx-solver ClusterIssuers at repo root — kept parallel to `_infra/cert-manager/issuers-gateway.yaml` until per-app cutover finishes (see [#144](https://github.com/CodeForPhilly/cfp-live-cluster/issues/144)) | +| `echo-http.yaml` | Raw Namespace+Deployment+Service+Ingress for the echo-http probe | +| `.holo/sources/` | Holosource pins (URL + ref) | +| `.holo/branches//` | Holomappings — source content → workspace path | +| `.holo/lenses/` | Lens configs | + +`_` prefix means "not a workload namespace." Workspace convention; projected tree drops it. 
`_infra/` and `_gateways/` were added in the in-flight migration; `_admins/` and `_docs/` followed the convention in a separate refactor. + +Workload roots currently in tree: `balancer/` (kustomize), `browserless-chrome/` (raw yaml), `chime/`, `choose-native-plants/`, `code-for-philly/`, `grafana/`, `sealed-secrets/`, `third-places/`, `vaultwarden/` (all helm). + +## Standing patterns + +Target post-Envoy-migration state. Mimic these for new work. + +### Per-app routing + +Each public-facing app gets `_gateways/.yaml` containing: + +- `Gateway` in the app's namespace with one HTTPS listener per hostname, `cert-manager.io/cluster-issuer: letsencrypt-prod-gateway` annotation, certificateRef to a `-gw-tls` Secret per listener +- `HTTPRoute` with `parentRefs` attached **only** to the per-app Gateway (no `main-gateway`) + +HTTP (port 80) is handled globally by `_infra/envoy-gateway/http-redirect.yaml` — a single `HTTPRoute` on `main-gateway` that 301s everything to HTTPS. ACME challenge paths bypass it via Gateway API conflict resolution (cert-manager creates an `Exact`-path HTTPRoute per challenge). + +### Cert Secret naming + +`-gw-tls` (or `-gw-tls` for multi-hostname apps where each listener gets its own cert) — distinct from legacy `-tls` so the two paths coexist during migration. + +### ClusterIssuers — parallel during migration + +Two pairs exist side by side until the migration finishes: + +- `letsencrypt-{prod,staging}` (nginx-solver) in `cert-manager.issuers.yaml` at the repo root — keeps existing Ingress-managed Certs renewing +- `letsencrypt-{prod,staging}-gateway` (gatewayHTTPRoute solver) in `_infra/cert-manager/issuers-gateway.yaml` — for new Gateway-issued Certs + +New Gateway resources should annotate `letsencrypt-prod-gateway`. The nginx-solver pair gets removed in phase 6 of #144, after all Ingresses are gone. + +### Envoy Gateway + +- `EnvoyProxy` resource has `mergeGateways: true` — every Gateway shares one Envoy data plane and one LoadBalancer. 
**Do not disable.** +- `GatewayClass` is named `eg` +- The shared HTTP `main-gateway` lives in `envoy-gateway-system`; per-app Gateways attach implicitly via the merged data plane. + +## Before pushing a PR — required QA + +Run a local projection and diff it against the deployed tree. **No PR ships without this.** + +```bash +# 1. Commit everything first +git status # must be clean + +# 2. Fetch and project against the deploy branch's layout +git fetch origin +SHA=$(git holo project k8s-manifests-github 2>&1 | tail -1) + +# 3. Diff +git diff --name-status origin/deploys/k8s-manifests "$SHA" +git diff --stat origin/deploys/k8s-manifests "$SHA" + +# 4. Spot-check content for changed files +git show "$SHA": +``` + +If using `git holo project --working` to test uncommitted changes, project `k8s-manifests` (not `-github`) and expect the deploy workflow files to show as deletions — they live in `k8s-manifests-github`, not `k8s-manifests`. Harmless noise. Committing first is usually simpler. + +The diff is the definitive preview. Read it carefully — admission webhooks add defaults that show up here (HTTPRoutes get default `PathPrefix: /` matches, etc.) and side effects of changed helm values can surface as unrelated-looking ConfigMap or Deployment edits (e.g. `ingress.enabled: false` clearing `server.domain` on grafana, `MB_SITE_URL` on metabase if it ever lands here). + +## Common operations + +### Add a new app + +1. Create workspace dir + resources (chart values or kustomize) +2. Add holomapping at `.holo/branches/k8s-manifests//` +3. Add lens config at `.holo/lenses/.toml` if applicable +4. Add `_gateways/.yaml` if it needs external HTTPS (post-migration); pre-migration it still uses an Ingress on the chart +5. Run the projection + diff (above) + +### Bump an upstream chart version + +Edit the version pin in `.holo/sources/.toml` → `git holo source fetch ` → project → diff. 
+ +For chart versions owned by the upstream chain: bump in cluster-template or civic-cloud → wait for release → bump civic-cloud pin here. + +### Disable an Ingress on a helm-managed app + +`/release-values.yaml`: `ingress.enabled: false`. If the chart inferred its public hostname from `ingress.hosts[0]` (historical examples: grafana → `grafana.ini.server.domain`), set it directly via the chart's other values. Verify via render diff. This is phase 5 of #144 — only do it after the app's DNS has cut over to Envoy. + +## Cluster context + +Things not in any single grep-able file: + +- **Wildcard DNS**: `*.live.k8s.phl.io` resolves to the cluster's ingress LB. During the Envoy migration this stays pointed at ingress-nginx; per-hostname A records override the wildcard as each app cuts over to Envoy. +- **Apex domains in tree**: `balancerproject.org`, `choosenativeplants.com` (+ `www.`), `codeforphilly.org` (+ `www.`), `penn-chime.phl.io`, `vaultwarden.phl.io`, `bitwarden.phl.io`. Apex ACME challenges only work once DNS points at Envoy — plan cutover and cert issuance together for these. +- **Linode LKE LoadBalancer hairpin**: now native (in-cluster pods can reach the cluster's LB external IP). hairpin-proxy was the historical workaround and is scheduled for removal as part of #144 (the civic-cloud v1.9.2 bump drops it from the projection). +- **ingress-nginx + hairpin-proxy still present** on this cluster as of writing — both go away in #144 (hairpin-proxy in phase 1, ingress-nginx in phase 5.5). Don't reintroduce after they're gone. +- **No cnpg / shared-cluster** on this cluster yet. If a database is needed, it ships per-app (e.g. vaultwarden runs its own PostgreSQL StatefulSet via the gissilabs chart; chime + third-places similar). 
+ +## Guardrails + +Take these only with explicit user authorization: + +- `kubectl apply/delete/patch` against shared-infra namespaces: `kube-system`, `cert-manager`, `ingress-nginx` (until decommissioned), `envoy-gateway-system` (once present), `sealed-secrets` +- Force-pushes to `releases/k8s-manifests` or `deploys/k8s-manifests` +- Merging upstream release PRs (cluster-template, civic-cloud) — user handles these +- Restarting deployments in shared namespaces +- Bumping `.holo/sources/civic-cloud.toml` (carries cert-manager + Gateway API + Envoy Gateway + hairpin-proxy removal — phase-1 of #144 lives in that bump) + +Editing workspace files in this repo and opening PRs are fine without per-action approval. + +## Known external issues + +- **hologit shallow-clone race** ([JarvusInnovations/hologit#450](https://github.com/JarvusInnovations/hologit/issues/450)) — `Build k8s-manifests` intermittently fails with `fatal: shallow file has changed since we read it`. Rerun the workflow. + +For repo-local issues, check the open issue list directly — anything I'd list here will rot. + +## References + +- Migration umbrella: [#144](https://github.com/CodeForPhilly/cfp-live-cluster/issues/144) +- Sandbox-equivalent (already complete, source for patterns): [cfp-sandbox-cluster#130](https://github.com/CodeForPhilly/cfp-sandbox-cluster/issues/130) +- Sandbox repo (for prior-art on patterns): +- Upstream cluster-template: +- civic-cloud cluster-template: +- Hologit: