balancer: cutover from RDS to shared-cluster cnpg #162

@themightychris

@TineoC — over to you for the DB cutover. Full context below.

balancer on cfp-sandbox-cluster — current state + what's needed next

Now that the cluster-wide migration to Envoy Gateway is complete and the clean balancer design has landed (PR #160), this issue captures where balancer stands and what still needs to happen to actually run it on the shared cnpg cluster.

What's already in place

Infrastructure (cluster-wide)

  • Envoy Gateway controller running, with a shared LoadBalancer (mergeGateways: true)
  • cert-manager 1.20 with the gatewayHTTPRoute HTTP-01 solver
  • Global HTTP→HTTPS redirect on the main Gateway
  • ingress-nginx fully decommissioned
  • hairpin-proxy removed (Linode LKE hairpin works natively now)
  • shared-cluster cnpg PostgreSQL cluster (PG 18 + PostGIS 3.6 + pgvector 0.8) running with 2 replicas

balancer-specific (from PR #160, deployed)

  • balancer/Gateway/balancer — HTTPS listener on balancer.sandbox.k8s.phl.io, Programmed=True
  • balancer/HTTPRoute/balancer — routes traffic to Service/balancer:8000
  • balancer/Certificate/balancer-gw-tls — issued via Let's Encrypt, Ready=True
  • cloudnative-pg/Database/balancer (applied: true): empty database named balancer, owned by the balancer role
  • Pod balancer-99445476d-jqbgw running — still serving traffic via the same balancer-config Secret

Workspace layout in cfp-sandbox-cluster

balancer/
├── kustomization.yaml      # wrapper, resources: [app, cnpg]
├── app/
│   ├── kustomization.yaml  # namespace: balancer
│   └── manifests/          # mapped from balancer-main v1.1.5 base via hologit
└── cnpg/
    ├── kustomization.yaml
    └── database.yaml       # Database CR (namespace: cloudnative-pg)

_gateways/balancer.yaml     # Gateway + HTTPRoute

What's running vs. what's queued

Thing                                                  | Status
balancer pod                                           | running
HTTPS reachability via Envoy                           | working (curl https://balancer.sandbox.k8s.phl.io/ returns 200)
Database connection                                    | still on the old external RDS (balancer-jj.cab6cwkqwif9.us-east-1.rds.amazonaws.com)
Database/balancer on shared-cluster                    | exists, empty, no client connecting
balancer-db-credentials SealedSecret in cloudnative-pg | missing; cnpg has reconciled the balancer role but with no password set

The new infrastructure is fully provisioned, and in principle the old and new paths can run side by side. balancer just isn't using the new DB yet.

What's needed to finish the cutover

1. Create the balancer-db-credentials SealedSecret in cloudnative-pg

Generate a strong password and seal it with kubeseal against the cluster's sealed-secrets controller. The Secret needs to be in the cloudnative-pg namespace (where the cnpg Cluster lives) and carry the username and password keys:

apiVersion: v1
kind: Secret
metadata:
  name: balancer-db-credentials
  namespace: cloudnative-pg
type: kubernetes.io/basic-auth
stringData:
  username: balancer
  password: <generated>
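
A minimal sketch of generating and sealing that Secret, assuming kubeseal can reach the sealed-secrets controller through its default lookup (pass --controller-name and --controller-namespace if this cluster differs). The function names and the output path are hypothetical, not something the repo already defines.

```shell
# Generate a 32-character hex password (128 bits) from the kernel CSPRNG.
gen_password() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Render the Secret client-side only and pipe it straight into kubeseal,
# so the plain-text manifest is never written to disk or applied.
# $1 = password, $2 = output file for the SealedSecret (hypothetical path).
seal_db_credentials() {
  local password="$1" out="$2"
  kubectl create secret generic balancer-db-credentials \
    --namespace cloudnative-pg \
    --type kubernetes.io/basic-auth \
    --from-literal=username=balancer \
    --from-literal=password="$password" \
    --dry-run=client -o yaml |
    kubeseal --format yaml > "$out"
}

# Usage:
#   pw="$(gen_password)"
#   seal_db_credentials "$pw" balancer-db-credentials.sealed.yaml
```

Keep the generated password somewhere safe until step 3, since the same value goes into SQL_PASSWORD. Note that passing it via --from-literal briefly exposes it in the local process list.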

Once this is sealed and applied via GitOps (or kubectl apply), cnpg will pick it up on the next reconciliation and set the balancer role's password to match. Verify with:

kubectl get cluster -n cloudnative-pg shared-cluster -o jsonpath='{.status.managedRolesStatus.passwordStatus.balancer}'

The transactionID should advance after the Secret lands.

2. Migrate the data from RDS to shared-cluster

The balancer database on shared-cluster is empty. Migrating from RDS:

# From a workstation with network access to both
pg_dump \
  -h balancer-jj.cab6cwkqwif9.us-east-1.rds.amazonaws.com \
  -U balancer \
  -d balancer_dev \
  --no-owner --no-privileges \
  > balancer-dump.sql

# Restore to the cnpg-managed database
# (use a port-forward or run from a pod in the cluster)
kubectl port-forward -n cloudnative-pg svc/shared-cluster-rw 5432:5432 &
PGPASSWORD=<from-balancer-db-credentials> psql \
  -h localhost -U balancer -d balancer \
  -f balancer-dump.sql

Schedule this for a low-traffic window: the dump puts read load on the source DB, and any writes that land on RDS after the dump starts will not be carried over. If writes can't be avoided otherwise, schedule a brief read-only window until the cutover completes.
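
One way to sanity-check the restore is to compare per-table row counts on both sides. This is a sketch only: it assumes the application tables live in the public schema, and the function and file names are hypothetical.

```shell
# Print "table count" lines for every table in the public schema.
# All arguments are passed through to psql as connection options.
table_counts() {
  local t
  psql "$@" -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY 1" |
    while read -r t; do
      # stdin is redirected so the inner psql cannot consume the table list
      printf '%s %s\n' "$t" \
        "$(psql "$@" -At -c "SELECT count(*) FROM public.\"$t\"" < /dev/null)"
    done
}

# Exit status 0 only if both listings are identical; prints the diff otherwise.
compare_counts() {
  diff "$1" "$2"
}

# Usage (set PGPASSWORD appropriately before each call):
#   table_counts -h balancer-jj.cab6cwkqwif9.us-east-1.rds.amazonaws.com \
#     -U balancer -d balancer_dev > counts-rds.txt
#   table_counts -h localhost -U balancer -d balancer > counts-cnpg.txt
#   compare_counts counts-rds.txt counts-cnpg.txt
```

Matching counts don't prove the data is identical, but a mismatch is a cheap early signal that the restore was incomplete or that writes happened after the dump.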

3. Update balancer-config SealedSecret

The balancer Deployment reads its database config from Secret/balancer-config in the balancer namespace. After data migration:

Key          | Current                                              | New
SQL_HOST     | balancer-jj.cab6cwkqwif9.us-east-1.rds.amazonaws.com | shared-cluster-rw.cloudnative-pg.svc.cluster.local
SQL_USER     | balancer                                             | balancer (same)
SQL_PASSWORD | <RDS password>                                       | <from balancer-db-credentials>
SQL_DATABASE | balancer_dev                                         | balancer
SQL_PORT     | 5432                                                 | 5432 (same)

Re-seal with kubeseal, commit to cfp-sandbox-cluster, deploy.

4. Restart the balancer pod

After the Secret deploys, the pod needs a restart to pick up the new env vars:

kubectl rollout restart deployment -n balancer balancer

Verify with kubectl logs -n balancer -l app=balancer. The Django startup should connect to the new host without errors.
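
Beyond reading logs, the restart can be checked mechanically. The sketch below assumes the Secret keys are exposed to the container as environment variables under the same names (SQL_HOST etc.) and that the image ships printenv; verify_cutover is a hypothetical helper name.

```shell
# Wait for the restarted Deployment to become ready, then confirm the
# running container sees the new database host.
verify_cutover() {
  local expected="shared-cluster-rw.cloudnative-pg.svc.cluster.local" actual
  kubectl rollout status deployment/balancer -n balancer --timeout=180s || return 1
  actual="$(kubectl exec -n balancer deploy/balancer -- printenv SQL_HOST)"
  if [ "$actual" != "$expected" ]; then
    echo "SQL_HOST is '$actual', expected '$expected'" >&2
    return 1
  fi
  echo "balancer is using $actual"
}

# Usage:
#   kubectl rollout restart deployment -n balancer balancer
#   verify_cutover
```

A non-zero exit here usually means the updated SealedSecret hasn't been unsealed yet or the pod started before the Secret changed, in which case another rollout restart is needed.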

5. Verify + decommission RDS

  • https://balancer.sandbox.k8s.phl.io/admin/ should load and show migrated data
  • Once stable for a few days, the RDS instance can be retired (separate concern, not on the cluster)

Optional follow-ups (lower priority)

  • CORS ConfigMap: PR #143 (feat(balancer): integrate sandbox overlay with CNPG + Gateway API + CORS) was adding balancer-db-config with CORS settings. If the balancer app needs CORS config now, file a separate small PR that adds balancer/app/configmap.yaml and references it from balancer/app/kustomization.yaml. The structure supports it cleanly; it just hasn't been needed yet.

  • balancer-main base manifests on Gateway API — currently balancer-main's base/ingress.yaml is filtered out at the holomapping level. Upstream could ship base manifests with HTTPRoute instead (or no Ingress at all), simplifying the downstream filter. Not blocking.

  • balancerproject.org DNS — the old sandbox.balancerproject.org hostname is no longer reachable (its DNS pointed at the old ingress-nginx LB which is gone). If that hostname is still expected to work, it needs:

    1. An A record pointing at the Envoy LB (139.144.241.4)
    2. A second listener on the balancer Gateway with that hostname + cert annotation

    Or, if it's no longer needed (the secret template documents balancer.sandbox.k8s.phl.io as the canonical URL), the DNS record can be retired.
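
If the old hostname is kept, a quick check that its A record now points at the Envoy LB could look like the sketch below. It assumes dig is available on the workstation; points_at is a hypothetical helper name.

```shell
# Return 0 if the hostname's A records include the expected address.
# $1 = hostname, $2 = expected IPv4 address.
points_at() {
  local host="$1" want="$2"
  dig +short "$host" A | grep -qxF "$want"
}

# Usage:
#   points_at sandbox.balancerproject.org 139.144.241.4 && echo "DNS ok"
```

Recursive resolvers may serve the old record until its TTL expires, so a failure right after the DNS change is not conclusive.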
