fix(#2432): retry merge on 409 after updating PR branch #2434

Open
ralphbean wants to merge 1 commit into main from fix/2432-merge-409-retry

Conversation

@ralphbean

Member

Summary

  • When MergeChangeProposal gets a 409 "Head branch is out of date", it now calls GitHub's PUT .../pulls/{n}/update-branch to sync the PR with the base, waits 3s, then retries the merge (up to 3 attempts).
  • Non-409 errors are returned immediately (no behavior change).
  • Fixes the flaky TestAdminInstallUninstall failure at the enrollment PR merge step.
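
The flow in the Summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `mergeWithRetry`, `errOutOfDate`, and `simulate` are hypothetical names, the merge and update-branch calls are injected as plain functions so the loop runs without a network, and returning the update-branch error (rather than ignoring it) is an assumption of the sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errOutOfDate is a hypothetical sentinel standing in for GitHub's
// 409 "Head branch is out of date" response.
var errOutOfDate = errors.New("409: head branch is out of date")

// mergeWithRetry calls merge up to maxAttempts times. After a 409 it calls
// updateBranch, waits for delay (or context cancellation), and tries again.
// Any other merge error is returned immediately.
func mergeWithRetry(ctx context.Context, merge, updateBranch func(context.Context) error, maxAttempts int, delay time.Duration) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err := merge(ctx)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errOutOfDate) {
			return fmt.Errorf("merge: %w", err)
		}
		if attempt == maxAttempts-1 {
			break // no retry left, so skip the update and the wait
		}
		if err := updateBranch(ctx); err != nil {
			return fmt.Errorf("update branch: %w", err)
		}
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("branch remained out of date after %d attempts", maxAttempts)
}

// simulate runs mergeWithRetry against a fake merge that returns 409 for the
// first `conflicts` calls, and reports how many merge calls were made.
func simulate(conflicts int) (mergeCalls int, err error) {
	merge := func(context.Context) error {
		mergeCalls++
		if mergeCalls <= conflicts {
			return errOutOfDate
		}
		return nil
	}
	update := func(context.Context) error { return nil }
	err = mergeWithRetry(context.Background(), merge, update, 3, time.Millisecond)
	return mergeCalls, err
}

func main() {
	fmt.Println(simulate(1)) // one 409, then success
	fmt.Println(simulate(5)) // 409 persists past the attempt budget
}
```

The delay is a parameter here only so the demo runs quickly; the PR itself waits a fixed 3s.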

Closes #2432

Test plan

  • TestMergeChangeProposal_Success — happy path unchanged
  • TestMergeChangeProposal_409UpdatesBranchAndRetries — 409 triggers update-branch then successful retry
  • TestMergeChangeProposal_NonConflictErrorNotRetried — 422 not retried
  • TestMergeChangeProposal_409PersistsAfterRetries — gives up after max attempts with clear error

🤖 Generated with Claude Code

When MergeChangeProposal gets a 409 "Head branch is out of date", call
GitHub's update-branch endpoint to sync the PR branch with the base,
wait briefly, then retry the merge. Up to 3 attempts before giving up.
Non-409 errors are still returned immediately.

Closes #2432

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@qodo-code-review

PR Summary by Qodo

Retry PR merge after updating branch on 409 “out of date” conflicts
🐞 Bug fix 🧪 Tests 🕐 20-40 Minutes

Description

• Retry squash-merge when GitHub returns 409 “head branch is out of date”.
• Auto-call update-branch, wait briefly, then re-attempt merge (max 3 tries).
• Add focused HTTP tests covering success, retry, non-retry errors, and retry exhaustion.
Diagram

sequenceDiagram
  participant Caller as Caller
  participant GHClient as LiveClient
  participant GitHub as GitHub API

  Caller->>GHClient: MergeChangeProposal(owner, repo, number)
  GHClient->>GitHub: PUT /pulls/{n}/merge (squash)
  alt Merge succeeds
    GitHub-->>GHClient: 200 OK
    GHClient-->>Caller: nil
  else 409 Head out of date
    GitHub-->>GHClient: 409 Conflict
    GHClient->>GitHub: PUT /pulls/{n}/update-branch
    GitHub-->>GHClient: 202 Accepted
    GHClient-->>GHClient: wait 3s (or ctx cancel)
    GHClient->>GitHub: PUT /pulls/{n}/merge (retry, up to 3)
    alt Eventually succeeds
      GitHub-->>GHClient: 200 OK
      GHClient-->>Caller: nil
    else Still 409 after max attempts
      GitHub-->>GHClient: 409 Conflict
      GHClient-->>Caller: error (out of date after retries)
    end
  else Non-409 error
    GitHub-->>GHClient: 4xx/5xx
    GHClient-->>Caller: error (no retry)
  end
High-Level Assessment

The following are alternative approaches to this PR:

1. Poll update-branch completion instead of fixed 3s sleep
  • ➕ More reliable under load/queueing (avoids retrying before branch update finishes).
  • ➕ Can reduce unnecessary delay when updates complete quickly.
  • ➖ Requires an additional API to check update status (or repeated merge attempts anyway).
  • ➖ Adds complexity (poll loop, backoff, timeout semantics).
2. Exponential backoff with jitter for retries
  • ➕ More resilient to variable GitHub processing time and transient errors.
  • ➕ Reduces thundering-herd behavior if many merges hit 409 simultaneously.
  • ➖ Slightly more complex logic and test expectations.
  • ➖ May increase total time-to-fail in persistent conflict cases.
3. Use GitHub Auto-merge (enable and let GitHub merge when up-to-date)
  • ➕ Offloads retry/merge timing to GitHub.
  • ➕ Potentially fewer client-side heuristics (sleep/retry).
  • ➖ May require repo/PR settings and additional permissions.
  • ➖ Behavioral change: merge becomes asynchronous and harder to reason about in workflows.

Recommendation: The PR’s approach is a pragmatic fix for flaky merges: detect the specific 409, trigger update-branch, and retry a small fixed number of times. The main improvement to consider is replacing the fixed 3s delay with a bounded poll/backoff strategy (or backoff+jitter) to better match GitHub’s actual update latency, but the current implementation is a reasonable minimal change given the added test coverage.
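
For reference, the backoff+jitter alternative mentioned in the recommendation might look something like the sketch below. It is illustrative only and not part of the PR: `backoffDelay` and its parameters are hypothetical names, and the "full jitter" variant (a uniform draw between zero and the capped ceiling) is one of several reasonable choices. It assumes `base` is positive.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffDelay returns how long to wait before retry number attempt (0-based).
// The ceiling starts at base, doubles per attempt, and is capped at limit; the
// returned delay is drawn uniformly from [0, ceiling] ("full jitter").
func backoffDelay(attempt int, base, limit time.Duration) time.Duration {
	ceiling := base
	for i := 0; i < attempt && ceiling < limit; i++ {
		ceiling *= 2
	}
	if ceiling > limit {
		ceiling = limit
	}
	// Int63n requires a positive argument, hence the +1 (which also makes
	// the upper bound inclusive).
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

func main() {
	for attempt := 0; attempt < 4; attempt++ {
		d := backoffDelay(attempt, time.Second, 8*time.Second)
		fmt.Printf("attempt %d: wait %v\n", attempt, d)
	}
}
```

In a retry loop this value would replace the fixed 3s passed to `time.After`, at the cost of less predictable test timing.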

Files changed (2) +154 / -5

Bug fix (1) +33 / -5
internal/forge/github/github.go (+33/-5)

Retry merge on 409 by updating PR branch via update-branch endpoint

• Adds a bounded retry loop to 'MergeChangeProposal' that specifically handles HTTP 409 conflicts by calling GitHub’s 'update-branch' endpoint, waiting briefly, and retrying the merge. Non-409 errors still return immediately, and a final error is returned if the branch remains out of date after all attempts.

Tests (1) +121 / -0
internal/forge/github/github_merge_test.go (+121/-0)

Add tests for merge retry/update-branch behavior and retry limits

• Introduces an 'httptest'-based suite validating: normal merge success, 409-triggered update-branch then successful retry, non-409 errors not retried, and persistent 409s eventually failing after retries.

@github-actions

Site preview

Preview: https://f9b65d81-site.fullsend-ai.workers.dev

Commit: fd6cbd9373f64f45ad121e8e9be20146b3c2581a

@qodo-code-review

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (1) 📎 Requirement gaps (0) 📜 Skill insights (0)

Context used
✅ Compliance rules (platform): 51 rules
✅ Skills: writing-user-docs, writing-adrs

Action required

1. range maxAttempts invalid loop 📘 Rule violation ≡ Correctness
Description
MergeChangeProposal uses for attempt := range maxAttempts. Ranging over an integer is valid only from Go 1.22 onward; if the module targets an older Go version, the loop fails compilation, which would cause make go-vet (and therefore linting in CI) to fail for this PR.
Code

internal/forge/github/github.go[2064]

+	for attempt := range maxAttempts {
Evidence
The compliance checklist requires make go-vet and make lint to pass. The new code ranges over an
integer at internal/forge/github/github.go:2064, which compiles only with Go 1.22 or newer. On an
older toolchain or go directive it fails compilation and therefore fails go vet (and linting that includes vet).

Rule 1062050: Go code must pass make go-vet without issues
Rule 1062039: All code changes must pass make lint without failures
internal/forge/github/github.go[2060-2066]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`MergeChangeProposal` contains the loop `for attempt := range maxAttempts { ... }`. Ranging over an integer requires Go 1.22 or newer, so the loop will not compile if the module targets an older Go version.

## Issue Context
This change is intended to retry merge attempts up to `maxAttempts` times, so the loop should iterate `attempt` from 0 to `maxAttempts-1`.

## Fix Focus Areas
- internal/forge/github/github.go[2060-2091]


2. Update-branch errors ignored 🐞 Bug ≡ Correctness
Description
MergeChangeProposal calls the PR update-branch endpoint via do() but never checks the HTTP
status code nor returns updateErr, so it can keep retrying merges even when the branch update was
rejected/rate-limited and ultimately return a misleading “branch remained out of date” error. It
also performs an update-branch call on the final 409 attempt even though no subsequent merge retry
will occur.
Code

internal/forge/github/github.go[R2076-2088]

+		// Update the PR branch to incorporate base branch changes.
+		updateResp, updateErr := c.do(ctx, http.MethodPut, updatePath, map[string]string{})
+		if updateErr == nil {
+			updateResp.Body.Close()
+		}
+
+		if attempt < maxAttempts-1 {
+			select {
+			case <-time.After(3 * time.Second):
+			case <-ctx.Done():
+				return ctx.Err()
+			}
+		}
Evidence
In MergeChangeProposal, the new update-branch request is made with do() and its error/status
are not acted upon. The do() helper explicitly returns responses without status checking, while
checkStatus() is the mechanism used elsewhere to turn non-success statuses into APIErrors—so
skipping it here means update failures are silently ignored.

internal/forge/github/github.go[2056-2092]
internal/forge/github/github.go[95-140]
internal/forge/github/github.go[216-236]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`MergeChangeProposal` uses `c.do()` for `PUT .../update-branch` and ignores both (a) non-2xx HTTP status codes and (b) `updateErr`. Since `do()` does not treat non-2xx as errors, the code can proceed as if the branch update succeeded when it actually failed, and it may perform an unnecessary final `update-branch` call even when no further merge retry will happen.

## Issue Context
- `do()` returns `(*http.Response, nil)` for non-2xx responses; status validation must be done via `checkStatus` (or an equivalent helper).
- `update-branch` commonly returns `202 Accepted`, so the status checker should accept 202.

## Fix Focus Areas
- internal/forge/github/github.go[2059-2092]
- internal/forge/github/github.go[95-165]
- internal/forge/github/github.go[216-236]

## Suggested fix approach
1. Only call `update-branch` when `attempt < maxAttempts-1` (i.e., when a retry will actually occur).
2. After `updateResp, updateErr := c.do(...)`:
  - If `updateErr != nil`, return a wrapped error (e.g., `fmt.Errorf("update pull request #%d branch: %w", number, updateErr)`).
  - Otherwise `defer updateResp.Body.Close()` and validate status with `checkStatus(updateResp, http.StatusAccepted, http.StatusOK, http.StatusNoContent)` (whatever is correct for your usage).
  - If status validation fails, return a wrapped error so the caller sees the real cause.
3. Consider preserving the last merge 409 `err` and including it in the final failure message for better diagnostics (optional but helpful).
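
A standalone sketch of step 2 of the suggested approach, under stated assumptions: the repository's `c.do` and `checkStatus` helpers are not reproduced here because their exact signatures are not shown, so the example uses plain `net/http`. `updateBranch`, `fixedStatus`, and `tryUpdate` are hypothetical names, the URL is a placeholder, and the fake transport exists only so the example runs without contacting GitHub.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// updateBranch PUTs to url and treats any status other than 202 Accepted or
// 200 OK as a failure, so the caller sees the real cause.
func updateBranch(ctx context.Context, client *http.Client, url string, number int) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPut, url, http.NoBody)
	if err != nil {
		return fmt.Errorf("update pull request #%d branch: %w", number, err)
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("update pull request #%d branch: %w", number, err)
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusAccepted, http.StatusOK:
		return nil
	default:
		return fmt.Errorf("update pull request #%d branch: unexpected status %d", number, resp.StatusCode)
	}
}

// fixedStatus is a fake transport that answers every request with one status.
type fixedStatus int

func (s fixedStatus) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: int(s),
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader("")),
		Request:    req,
	}, nil
}

// tryUpdate runs updateBranch against the fake transport.
func tryUpdate(status int) error {
	client := &http.Client{Transport: fixedStatus(status)}
	url := "https://api.example.invalid/repos/o/r/pulls/1/update-branch"
	return updateBranch(context.Background(), client, url, 1)
}

func main() {
	fmt.Println(tryUpdate(http.StatusAccepted))
	fmt.Println(tryUpdate(http.StatusUnprocessableEntity))
}
```

With this shape, a rejected or rate-limited update surfaces as its own error instead of being folded into a later "branch remained out of date" message.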


mergePath := fmt.Sprintf("/repos/%s/%s/pulls/%d/merge", owner, repo, number)
updatePath := fmt.Sprintf("/repos/%s/%s/pulls/%d/update-branch", owner, repo, number)

for attempt := range maxAttempts {

Action required

1. range maxAttempts invalid loop 📘 Rule violation ≡ Correctness

MergeChangeProposal uses for attempt := range maxAttempts. Ranging over an integer is valid only from Go 1.22 onward; if the module targets an older Go version, the loop fails compilation, which would cause make go-vet (and therefore linting in CI) to fail for this PR.

Comment on lines +2076 to +2088
// Update the PR branch to incorporate base branch changes.
updateResp, updateErr := c.do(ctx, http.MethodPut, updatePath, map[string]string{})
if updateErr == nil {
	updateResp.Body.Close()
}

if attempt < maxAttempts-1 {
	select {
	case <-time.After(3 * time.Second):
	case <-ctx.Done():
		return ctx.Err()
	}
}

Action required

2. Update-branch errors ignored 🐞 Bug ≡ Correctness

MergeChangeProposal calls the PR update-branch endpoint via do() but never checks the HTTP
status code nor returns updateErr, so it can keep retrying merges even when the branch update was
rejected/rate-limited and ultimately return a misleading “branch remained out of date” error. It
also performs an update-branch call on the final 409 attempt even though no subsequent merge retry
will occur.

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 18, 2026

🤖 Finished Review · ✅ Success · Started 4:34 PM UTC · Completed 4:56 PM UTC
Commit: fd6cbd9 · View workflow run →

@codecov

codecov Bot commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/forge/github/github.go | 90.00% | 2 Missing ⚠️ |

@fullsend-ai-review

Review

Findings

Medium

  • [error-handling-gap] internal/forge/github/github.go:2076 — The update-branch call's error is silently discarded. If c.do() returns an error (network failure, context cancellation, rate limit exhaustion), the code proceeds to sleep and retry the merge with no indication that the update failed. When all retries are exhausted, the final error message says "branch remained out of date" with no context about update-branch failures, making the issue difficult to diagnose in production.
    Remediation: At minimum, log the update error (if updateErr != nil). Consider including the last update-branch error in the final returned error message when all retries are exhausted.

Low

  • [edge-case] internal/forge/github/github.go:2076 — On the final loop iteration (attempt == maxAttempts-1), when the merge fails with 409, the code still calls update-branch even though the loop will not execute another merge attempt. This is wasted work and an unnecessary API call.
    Remediation: Move the update-branch call inside the if attempt < maxAttempts-1 block.

  • [test-inadequate] internal/forge/github/github_merge_test.go:111 — TestMergeChangeProposal_409PersistsAfterRetries asserts mergeAttempts.Load() > 1 (at least 2), but the contract is exactly 3 attempts (maxAttempts). This weaker assertion would still pass even if the retry logic were accidentally changed to retry only once.
    Remediation: Use assert.Equal(t, int32(3), mergeAttempts.Load()) for precise verification.

  • [test-inadequate] internal/forge/github/github_merge_test.go — No test for update-branch failure (e.g., the update endpoint returning 422 or a network error). This is the scenario where the silent error swallowing in production code matters most.

  • [pattern-inconsistency] internal/forge/github/github.go:2080 — Body close is conditional (if updateErr == nil). The safer defensive pattern is if updateResp != nil { updateResp.Body.Close() }, which handles cases where the response object exists alongside a non-nil error.

  • [doc-style] internal/forge/github/github.go:2056 — The function comment documents the retry behavior but omits the delay duration (3 seconds) and the total potential wait (~6s: with 3 attempts, the wait runs only after the first two). Compare with retryOnTransient, which documents timing details.
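
The exact-count assertion suggested above could be exercised along the lines of the sketch below. It is an illustration under assumptions, not the PR's test file: `mergeUntilGiveUp` is a hypothetical stand-in for the client under test (with the 3s wait omitted and update-branch skipped on the final attempt), `put` and `persistent409Counts` are made-up helper names, and the request paths are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// put sends a body-less PUT and returns the response status code.
func put(client *http.Client, url string) (int, error) {
	req, err := http.NewRequest(http.MethodPut, url, http.NoBody)
	if err != nil {
		return 0, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	return resp.StatusCode, nil
}

// mergeUntilGiveUp is a hypothetical stand-in for the client under test.
func mergeUntilGiveUp(client *http.Client, base string, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		status, err := put(client, base+"/merge")
		if err != nil {
			return err
		}
		if status == http.StatusOK {
			return nil
		}
		if status != http.StatusConflict {
			return fmt.Errorf("merge: unexpected status %d", status)
		}
		if attempt < maxAttempts-1 { // only update when a retry will follow
			if _, err := put(client, base+"/update-branch"); err != nil {
				return err
			}
		}
	}
	return fmt.Errorf("branch remained out of date after %d attempts", maxAttempts)
}

// persistent409Counts runs the stand-in against a server whose merge endpoint
// always answers 409, and returns how often each endpoint was hit.
func persistent409Counts(maxAttempts int) (merges, updates int32, err error) {
	var mergeAttempts, updateCalls atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch {
		case strings.HasSuffix(r.URL.Path, "/merge"):
			mergeAttempts.Add(1)
			w.WriteHeader(http.StatusConflict)
		case strings.HasSuffix(r.URL.Path, "/update-branch"):
			updateCalls.Add(1)
			w.WriteHeader(http.StatusAccepted)
		default:
			w.WriteHeader(http.StatusNotFound)
		}
	}))
	defer srv.Close()
	err = mergeUntilGiveUp(srv.Client(), srv.URL+"/repos/o/r/pulls/1", maxAttempts)
	return mergeAttempts.Load(), updateCalls.Load(), err
}

func main() {
	fmt.Println(persistent409Counts(3))
}
```

In a real test the returned counts would be compared with equality checks rather than `> 1`, which pins both the attempt budget and the number of update-branch calls.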


Labels: PR fixes a bug in the GitHub forge client that caused flaky e2e test failures

@fullsend-ai-review fullsend-ai-review Bot left a comment

See the review comment for full details.

return fmt.Errorf("merge pull request #%d: %w", number, err)
}

// Update the PR branch to incorporate base branch changes.

[medium] error-handling-gap

The update-branch call's error is silently discarded. If c.do() returns an error (network failure, context cancellation, rate limit exhaustion), the code proceeds to sleep and retry the merge with no indication that the update failed. When all retries are exhausted, the final error message says 'branch remained out of date' with no context about update-branch failures.

Suggested fix: At minimum, log the update error. Consider including the last update-branch error in the final returned error message when all retries are exhausted.

return fmt.Errorf("merge pull request #%d: %w", number, err)
}

// Update the PR branch to incorporate base branch changes.

[low] edge-case

On the final loop iteration (attempt == maxAttempts-1), when the merge fails with 409, the code still calls update-branch even though the loop will not execute another merge attempt. Wasted work and unnecessary API call.

Suggested fix: Move the update-branch call inside the if attempt < maxAttempts-1 block.

default:
t.Errorf("unexpected request: %s %s", r.Method, r.URL.Path)
w.WriteHeader(http.StatusNotFound)
}

[low] test-inadequate

TestMergeChangeProposal_409PersistsAfterRetries asserts mergeAttempts.Load() > 1 but the contract is exactly 3 attempts. A weaker assertion would pass even if retry logic changed to only retry once.

Suggested fix: Use assert.Equal(t, int32(3), mergeAttempts.Load()) for precise verification.

updateResp, updateErr := c.do(ctx, http.MethodPut, updatePath, map[string]string{})
if updateErr == nil {
updateResp.Body.Close()
}

[low] pattern-inconsistency

Body close is conditional (if updateErr == nil). The safer defensive pattern is if updateResp != nil { updateResp.Body.Close() }.

@@ -2054,13 +2054,41 @@ func (c *LiveClient) DismissPullRequestReview(ctx context.Context, owner, repo s
}

// MergeChangeProposal squash-merges a pull request by number.

[low] doc-style

Function comment documents retry behavior but omits the delay duration (3 seconds) and the total potential wait (~6s, since the wait is skipped after the final attempt). Compare with retryOnTransient, which documents timing details.

Suggested fix: Add timing details to the comment: 3-second delay between attempts, up to 3 attempts (~6s of waiting in total).

Labels

requires-manual-review: Review requires human judgment
type/bug: Confirmed defect in existing behavior

Development

Successfully merging this pull request may close these issues.

bug(e2e): flaky 409 "Head branch is out of date" when merging enrollment PR

1 participant