fix: make consul deploy swaps atomic #78

Merged: tybook merged 1 commit into main from ty/neyn-11598-atomic-consul-deploy on May 28, 2026
@tybook tybook commented May 28, 2026

Summary

  • Defer old ASG drain/delete until every selected Consul pod passes health checks.
  • If any selected Consul pod fails health, delete the new-release ASGs for the whole selected deploy set and keep the currently running release.
  • Bump Stack version to 0.0.94 for the next stable release.
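
The all-or-nothing behavior described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in src/app.ts: the `SwapHooks` interface, the `atomicSwap` function, and every hook name are hypothetical stand-ins for whatever Stack actually calls to check Consul health and manage ASGs.

```typescript
// Hypothetical types and hooks for illustration only; none of these names
// are taken from the Stack codebase.
type PodName = string;

interface SwapHooks {
  checkHealth(pod: PodName): Promise<boolean>;
  deleteNewAsg(pod: PodName): Promise<void>;
  drainAndDeleteOldAsg(pod: PodName): Promise<void>;
}

type SwapResult =
  | { status: "swapped"; pods: PodName[] }
  | { status: "rolled_back"; failedPods: PodName[] };

async function atomicSwap(
  pods: PodName[],
  hooks: SwapHooks,
): Promise<SwapResult> {
  // Phase 1: health-check every selected pod before touching any old ASG.
  // A check that throws is treated the same as an unhealthy pod.
  const results = await Promise.all(
    pods.map(async (pod) => ({
      pod,
      healthy: await hooks.checkHealth(pod).catch(() => false),
    })),
  );
  const failedPods = results.filter((r) => !r.healthy).map((r) => r.pod);

  if (failedPods.length > 0) {
    // Phase 2a: one failure rolls back the whole selected set, including
    // pods that were healthy, so the running release stays in place.
    for (const pod of pods) {
      await hooks.deleteNewAsg(pod);
    }
    return { status: "rolled_back", failedPods };
  }

  // Phase 2b: every pod is healthy, so old ASGs can now be drained and deleted.
  for (const pod of pods) {
    await hooks.drainAndDeleteOldAsg(pod);
  }
  return { status: "swapped", pods };
}
```

The key design point is that no old ASG is touched inside the health-check phase, so a late failure can never leave some pods on the new release and others on the old one.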

Rollout

  1. Merge this PR to merkle-team/stack.
  2. Create and push tag v0.0.94 from the merge commit.
  3. Wait for the Stack release workflow to publish the v0.0.94 GitHub Release asset and Docker image.
  4. Because deploy-stack-action@v1 defaults to the latest stable Stack when stack-version is unset, normal deploy workflows will pick up v0.0.94 on deployers that do not already have stack installed.
  5. For deployers with an existing stack install, run stack update or pin stack-version: v0.0.94 in the workflow to force the new binary. Backend should do this before, or together with, the rollout of the backend migration-failure PR.

Test plan

  • Cursor lints on changed files
  • Unable to run bun run lint locally because bun is not installed in this environment

Part of NEYN-11598.

Made with Cursor

Delay old ASG cleanup until every selected Consul pod passes health checks so one failed pod rolls back the whole deploy set.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings May 28, 2026 17:30
@tybook tybook merged commit b910b6c into main May 28, 2026
1 check passed
@tybook tybook deleted the ty/neyn-11598-atomic-consul-deploy branch May 28, 2026 17:32

Copilot AI left a comment

Pull request overview

Updates the Consul-based deploy flow to behave atomically: only clean up the old ASGs after all selected Consul pods pass health checks, and tear down the new-release ASGs for the whole deploy set if any pod fails. This reduces the chance of ending up in a partially-swapped state during Consul deploys.

Changes:

  • Defers old-ASG cleanup until after all Consul service health checks pass; aborts and deletes new-release ASGs for the full selected pod set on any failure.
  • Removes per-pod old-ASG deletion from the Consul health-check loop to support atomic swaps.
  • Bumps the package version to 0.0.94.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/app.ts | Reworks Consul health-check success/failure handling to make deploy cleanup atomic and postpones old-ASG deletion until all pods are healthy. |
| package.json | Bumps Stack version from 0.0.93 to 0.0.94. |

Comment thread src/app.ts
Comment on lines +405 to +420
// Only perform a swap if there are already running instances.
if (!this.options.applyOnly && alreadyRunningInstances.length) {
  // It's possible the above apply command removed instances, so need to check again
  const currentlyRunningInstances = await this.alreadyRunningInstances([
    ...new Set(podNames).difference(failedPods),
  ]);

  if (currentlyRunningInstances.length) {
    const currentlyRunningInstancesByPod =
      await this.alreadyRunningInstancesByPod([
        ...new Set(podNames).difference(failedPods),
      ]);
    // Run the pre-terminate script for each pod
    const preTerminateScriptExitStatus =
      await this.runPreContainerShutdownScripts(
        [...new Set(podNames).difference(failedPods)],
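
The excerpt above repeats the expression `[...new Set(podNames).difference(failedPods)]` three times to get the de-duplicated pods that did not fail. As a sketch only, the same list could be computed once by a small helper. The name `podsExcludingFailed` is hypothetical and not part of src/app.ts, and the helper assumes `failedPods` is a `Set<string>`, which the excerpt suggests but does not show. It avoids `Set.prototype.difference`, so it also runs on runtimes that lack that method.

```typescript
// Hypothetical helper, not taken from src/app.ts. Returns the unique pod
// names that are not in failedPods, preserving their original order.
function podsExcludingFailed(
  podNames: string[],
  failedPods: ReadonlySet<string>,
): string[] {
  const seen = new Set<string>();
  const remaining: string[] = [];
  for (const pod of podNames) {
    // Skip pods that failed health checks and any repeated names.
    if (failedPods.has(pod) || seen.has(pod)) continue;
    seen.add(pod);
    remaining.push(pod);
  }
  return remaining;
}
```

For example, `podsExcludingFailed(["api", "web", "api", "worker"], new Set(["web"]))` yields `["api", "worker"]`. Computing the list once and passing it to each call would keep the three call sites in agreement if the filtering rule ever changes.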