
bubble up build/deploy failure details #163

Merged
vmelikyan merged 2 commits into main from failure-details
Apr 15, 2026

Conversation

@vmelikyan
Contributor

  • Preserve terminal build/deploy failure reasons in statusMessage
  • Record deploy failures before rethrowing instead of swallowing details
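The "record before rethrowing" pattern described above can be sketched as follows. This is a hypothetical illustration, not the project's actual code: `Deploy`, `recordDeployFailure`, and `cliDeploy` echo names from this PR, but their shapes here are invented for the example.

```typescript
// Minimal in-memory stand-in for the deploy model and its persistence.
interface Deploy {
  id: number;
  status: string;
  statusMessage?: string;
}

const store = new Map<number, Deploy>();

function recordDeployFailure(deploy: Deploy, message: string): void {
  // Persist the terminal failure reason so the UI can surface it,
  // instead of leaving the row in a transient state with no detail.
  store.set(deploy.id, {
    ...deploy,
    status: "DEPLOY_FAILED",
    statusMessage: message,
  });
}

async function cliDeploy(_deploy: Deploy): Promise<void> {
  // Stand-in for the real CLI call; fails for illustration.
  throw new Error("helm upgrade failed: context deadline exceeded");
}

async function deployWithFailureDetails(deploy: Deploy): Promise<void> {
  try {
    await cliDeploy(deploy);
  } catch (err) {
    // Record the failure details first, then rethrow so callers still
    // observe the error; the anti-pattern is swallowing the message here.
    recordDeployFailure(deploy, err instanceof Error ? err.message : String(err));
    throw err;
  }
}
```

The key point is ordering: the catch block writes the terminal status and message before propagating the error, so the failure reason survives even when the caller only sees the rethrown exception.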

@vmelikyan vmelikyan requested a review from a team as a code owner April 15, 2026 19:38
Contributor

@vigneshrajsb vigneshrajsb left a comment


Found 2 issues:

  • [P2] Aurora redeploy failures can still be recorded against a stale run UUID (src/server/services/deploy.ts:403-408, src/server/services/deploy.ts:429-433, src/server/services/deploy.ts:806-810, src/server/services/deploy.ts:865-866)
    deployAurora() writes a fresh runUUID only through deploy.$query().patch(...) at lines 403-408, but it never assigns that UUID back onto the in-memory model. If cliDeploy() then throws, the catch at lines 427-433 passes deploy.runUUID into recordDeployFailure(); when the in-memory instance still holds the older UUID, recordDeployFailure() treats that stale value as authoritative at lines 806-810, and patchAndUpdateActivityFeed() patches by { id, runUUID } at line 865. The error update can therefore miss the row entirely and leave the redeploy stuck in BUILDING.

  • [P2] The classic-mode manifest catch now overwrites every active deploy as failed (src/server/services/build.ts:1426-1440, src/server/services/build.ts:1473-1483)
    In the classic path, deploys is built once as the full list of active classic deploys for the build at lines 1426-1440. If either k8s.applyManifests(build) or k8s.waitForPodReady(build) fails, the catch at lines 1473-1483 calls recordDeployFailure() for every entry in that list with the same DEPLOY_FAILED status and message. A single broken rollout therefore marks unrelated healthy services as failed and loses which service actually triggered the error.
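The first issue above can be sketched as follows, with invented types rather than the project's actual models: after persisting a fresh runUUID, the in-memory deploy must be kept in sync, or a later failure patch keyed by { id, runUUID } matches nothing. (The second issue's fix is a matter of scoping: call recordDeployFailure() only for the deploy whose rollout actually failed, not for every active deploy.)

```typescript
// Hypothetical stand-ins for the deploy row and its table.
interface DeployRow {
  id: number;
  runUUID: string;
  status: string;
  statusMessage?: string;
}

const table: DeployRow[] = [];

// Mirrors patching by { id, runUUID }: a stale runUUID matches no row,
// so the update silently does nothing and returns false.
function patchByIdAndRunUUID(
  id: number,
  runUUID: string,
  patch: Partial<DeployRow>
): boolean {
  const row = table.find((r) => r.id === id && r.runUUID === runUUID);
  if (!row) return false;
  Object.assign(row, patch);
  return true;
}

function startRedeploy(deploy: DeployRow, freshRunUUID: string): void {
  // Persist the fresh runUUID against the current (old) one.
  patchByIdAndRunUUID(deploy.id, deploy.runUUID, {
    runUUID: freshRunUUID,
    status: "BUILDING",
  });
  // The fix: keep the in-memory model in sync with what was persisted,
  // so any later failure path patches by the *current* runUUID.
  deploy.runUUID = freshRunUUID;
}
```

Without the final assignment, a subsequent `patchByIdAndRunUUID(deploy.id, deploy.runUUID, { status: "DEPLOY_FAILED", ... })` would use the stale UUID, find no row, and leave the redeploy stuck in BUILDING.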

Contributor

@vigneshrajsb vigneshrajsb left a comment


Reviewed the latest updates. The previously flagged failure-recording issues are addressed, and the added tests cover those regression paths.

@vmelikyan vmelikyan merged commit 0bec62e into main Apr 15, 2026
1 check passed