fix(providers): harden postgres major upgrades #151
Thanks for this @bherila. The volume-handoff hardening is the right direction, and these are genuinely dangerous paths (a bad handoff = pg18 opening stale pg17 PGDATA), so the "fail early instead of silently reusing incompatible data" framing is exactly right.
A few items to address before merge (CHANGELOG conflict aside):
1. Route the new exec through
   The code being replaced was already a bespoke inline loop, so this isn't introducing the first duplication, but consolidating onto
2. Copy-sidecar leak on the
3. Real pg17→pg18 validation before merge (the important one).
Test changes themselves look good: swapping the
Summary
Extracts the Postgres major-upgrade hardening that was originally discovered while validating MariaDB PITR E2E coverage in #138.
This PR keeps the scope to Postgres upgrade correctness only:
- s3_sources and backups rows in the Postgres upgrade test fixture so postgres_major_upgrades.pre_upgrade_backup_id satisfies the real FK constraint;
- psql SELECT 1, not just Docker health / pg_isready, and create the configured DB if entrypoint initialization has not finished that race yet;
- PGDATA after a supposedly clean copy;
Why this is required
The pg17 to pg18 upgrade path depends on a clean data-volume handoff. During CI validation, the old behavior could leave the live volume or copy sidecar around and then start the target pg18 container against stale pg17 data. That produces misleading readiness states and can make an upgrade appear to advance while the target database is not actually usable.
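As an illustration of the "fail early on stale data" idea, here is a minimal sketch of a guard that inspects a data directory before the target container is started. It relies on the fact that an initialized PostgreSQL data directory contains a PG_VERSION file holding the major version. The names inspect_pgdata and PgDataState are hypothetical and are not taken from this PR; the actual check in temps-providers may look quite different.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Result of inspecting a data directory (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum PgDataState {
    /// No PG_VERSION file was found: the directory is missing or not initialized.
    Uninitialized,
    /// PG_VERSION names the target major version.
    Matches,
    /// PG_VERSION names a different major version, i.e. stale data.
    Stale { found: String },
}

/// Reads `<pgdata>/PG_VERSION` and compares it with the target major version.
/// Assumption: the caller treats `Stale` as a hard error instead of starting
/// the new container on top of the old data.
pub fn inspect_pgdata(pgdata: &Path, target_major: u32) -> io::Result<PgDataState> {
    let marker = pgdata.join("PG_VERSION");
    match fs::read_to_string(&marker) {
        Ok(contents) => {
            // The file holds the major version followed by a newline, e.g. "17\n".
            let found = contents.trim().to_string();
            if found == target_major.to_string() {
                Ok(PgDataState::Matches)
            } else {
                Ok(PgDataState::Stale { found })
            }
        }
        // A missing file (or missing directory) means nothing has been initialized.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(PgDataState::Uninitialized),
        // Any other I/O error is surfaced to the caller unchanged.
        Err(e) => Err(e),
    }
}
```

With a guard like this, a pg18 start against a volume whose PG_VERSION still reads 17 would surface as Stale { found: "17" } rather than as a container that looks healthy but cannot serve queries.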
The fixture update is also required because the tests now run against the real schema: the fake backup id used by the upgrade tests violated the backups FK once the orchestrator persisted it to postgres_major_upgrades.pre_upgrade_backup_id.
Together these changes make the Postgres upgrade tests exercise the real control-plane constraints and make the runtime upgrade path fail early instead of silently reusing incompatible data.
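The readiness requirement from the summary (a real query succeeding, not just container health) can be sketched as a bounded polling loop around a caller-supplied probe. This is only a sketch under stated assumptions: wait_until_ready is a hypothetical helper, and the probe is left abstract because how this codebase execs into the container is not shown here.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Calls `probe` until it succeeds or `timeout` elapses (hypothetical helper).
/// Returns the number of attempts on success, or the last probe error on timeout.
pub fn wait_until_ready<F>(
    mut probe: F,
    timeout: Duration,
    interval: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Result<(), String>,
{
    let deadline = Instant::now() + timeout;
    let mut attempts: u32 = 0;
    loop {
        attempts += 1;
        match probe() {
            // The probe ran a real query successfully: the database is usable.
            Ok(()) => return Ok(attempts),
            Err(err) => {
                // Give up once the deadline has passed, keeping the last error
                // so the failure is explicit rather than a silent hang.
                if Instant::now() >= deadline {
                    return Err(format!("not ready after {} attempts: {}", attempts, err));
                }
                thread::sleep(interval);
            }
        }
    }
}
```

In practice the probe would run something like psql -c "SELECT 1" against the configured database and map a non-zero exit status to Err, so that a container which passes pg_isready but cannot yet answer queries is still reported as not ready.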
Verification
Local:
- cargo fmt --all -- --check
- cargo check -p temps-providers --features docker-tests --lib
- cargo test -p temps-providers --features docker-tests --lib orchestrator_phase_new_container_completes_under_timeout -- --nocapture --test-threads=1 (passes locally; Docker-dependent body skipped because local Docker is unavailable)
I also kicked off the fork GitHub Actions workflow for this branch and will attach the run once it completes.