Skip to content

Correctness fixes: maxFee, 504 retries, key cache#238

Merged
CassioMG merged 6 commits into
mainfrom
cg-correctness-improvements
May 29, 2026
Merged

Correctness fixes: maxFee, 504 retries, key cache#238
CassioMG merged 6 commits into
mainfrom
cg-correctness-improvements

Conversation

@CassioMG

@CassioMG CassioMG commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Three independent correctness bug fixes, one commit each:

  1. submitWithFeeIncrease now respects maxFee on every retry, not just the first — previously the fee cap was silently dropped after the first retry, letting fees grow unbounded.
  2. submitTransaction 504 retries are now bounded with exponential backoff instead of recursing forever with no delay — avoids hammering Horizon.
  3. KeyManager.removeKey now clears the in-memory cache — previously a removed key lingered in the cache and could still be used to sign when shouldCache was on.

No public API changes. No new dependencies.


1. Forward maxFee across submitWithFeeIncrease retries (1dd8411)

submitWithFeeIncrease rebuilt its params object on each recursive retry but omitted maxFee. As a result the cap was only enforced on the first retry; from the second retry onward maxFee was undefined and the guard if (maxFee && …) short-circuited to false, letting the fee grow unbounded.

  • Forward maxFee in the recursive call.
  • Tighten the guard from maxFee && newFee > maxFee to maxFee !== undefined && newFee > maxFee.

2. Bound submitTransaction 504 retries with exponential backoff (c1dc76f)

On HTTP 504 the method recursed with no retry limit, no delay, and no backoff. Under sustained Horizon 504s this hammered the already-slow server and could eventually exhaust the call stack (RangeError: Maximum call stack size exceeded), since V8 doesn't apply tail-call optimization to await.

  • Convert the recursion to an iterative loop with a fixed five-retry budget.
  • Capped exponential backoff: 1s base, 30s ceiling, additive jitter up to +50%.
  • After exhausting retries, the last 504 is re-thrown.
  • Use optional chaining on e?.response?.status to avoid a latent crash when a non-axios error enters the catch block.

3. Clear KeyManager cache entry on removeKey (e2d9691)

_writeIndexCache's guard if (this.shouldCache && key) skipped the body when key was undefined, so removeKey's _writeIndexCache(id, undefined) call was a no-op. With shouldCache: true, a removed key stayed in the in-memory cache and subsequent signTransaction / fetchAuthToken calls kept returning it.

  • Split the guard so the method writes when given a key and deletes the cache entry when given undefined.

Testing

Each fix ships with a focused unit test:

  • submitWithFeeIncrease cap is enforced across multiple retries (not just the first).
  • submitTransaction 504 handling: retries then succeeds, re-throws non-504 immediately, and gives up after the retry budget.
  • removeKey purges the cache so a subsequent sign attempt misses and fails as expected.

No public API changes. No new dependencies.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 27, 2026 21:38

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Three independent correctness bug fixes to the core SDK and key-management package, each backed by a focused unit test.

Changes:

  • Forward maxFee in submitWithFeeIncrease recursive retries and tighten the guard to maxFee !== undefined so the cap is enforced on every retry, not just the first.
  • Replace unbounded recursion in submitTransaction's 504 handling with a bounded iterative loop (5 retries) using capped exponential backoff + jitter, plus optional chaining on e?.response?.status to harden non-axios error handling.
  • Fix KeyManager._writeIndexCache so it actually deletes the cached entry when key is undefined, ensuring removeKey purges the in-memory cache.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
@stellar/typescript-wallet-sdk/src/walletSdk/Horizon/Stellar.ts Bounded 504 retry loop with backoff; forward maxFee and tighten its guard in submitWithFeeIncrease.
@stellar/typescript-wallet-sdk/test/stellar.test.ts Adds tests for multi-retry maxFee enforcement and the three 504-handling paths (retry+success, non-504 rethrow, exhaustion).
@stellar/typescript-wallet-sdk-km/src/keyManager.ts Split cache guard so _writeIndexCache(id, undefined) deletes the entry.
@stellar/typescript-wallet-sdk-km/test/keyManager.test.ts Adds removeKey cache-purge test; minor formatting cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@CassioMG CassioMG changed the title Correctness fixes: submitWithFeeIncrease maxFee, submitTransaction 504 retries, KeyManager cache Correctness fixes: maxFee, 504 retries, key cache May 27, 2026
CassioMG added 3 commits May 27, 2026 15:08
The recursive call rebuilt the params object without maxFee, so the cap
was enforced on the first retry only. After that, maxFee was undefined
and the guard short-circuited to false, letting the fee grow without
bound. Forward maxFee in the recursive call and tighten the guard from
`maxFee && …` to `maxFee !== undefined && …`.
The previous implementation recursed on every 504 with no delay, no
retry limit, and no backoff. Under sustained Horizon 504s this both
hammered the already-congested server and eventually crashed the
process with RangeError: Maximum call stack size exceeded.

Convert to an iterative loop with capped exponential backoff (1s base,
30s ceiling, additive jitter up to +50%) and a fixed five-retry
budget. After exhausting retries, the last 504 is rethrown.

Also use optional chaining on `e?.response?.status` to avoid a latent
crash when a non-axios error (e.g., TransactionSubmitFailedError from
the `!response.successful` branch) enters the catch.
The guard `if (this.shouldCache && key)` skipped the body when key
was undefined, so removeKey's `_writeIndexCache(id, undefined)` call
was a no-op. The cached Key remained in memory and subsequent
signTransaction / fetchAuthToken calls returned the "deleted" key.

Split the guard so the method writes when given a key and deletes the
cache entry when given undefined.
@CassioMG CassioMG force-pushed the cg-correctness-improvements branch from a216429 to 4593135 Compare May 27, 2026 22:09
@CassioMG CassioMG requested a review from piyalbasu May 27, 2026 22:14
@CassioMG CassioMG self-assigned this May 27, 2026
@CassioMG CassioMG requested a review from Copilot May 27, 2026 22:15

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@piyalbasu piyalbasu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of correctness concerns on the new submitTransaction / submitWithFeeIncrease logic.

const newFee = parseInt(transaction.fee) + baseFeeIncrease;

if (maxFee && newFee > maxFee) {
if (maxFee !== undefined && newFee > maxFee) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxFee: 0 regression. This guard changed from if (maxFee && newFee > maxFee) to if (maxFee !== undefined && newFee > maxFee). The old behavior treated maxFee === 0 as falsy ("no cap"); the new behavior treats it as a hard zero cap that always trips, since newFee = parseInt(transaction.fee) + baseFeeIncrease > 0.

Any caller passing maxFee: 0 (defensive default, numeric coercion of an unset env var, or as a documented no-cap sentinel under the old behavior) will now throw TransactionSubmitWithFeeIncreaseFailedError(0, e) on the very first tx_too_late and never resubmit. Worth a CHANGELOG note flagging the breaking change, or — if 0 was previously undocumented — at least narrowing the type to a positive number.

Separately, this is also bypassed when parseInt(transaction.fee) returns NaN: NaN > maxFee is false, so submitWithFeeIncrease recurses with baseFee: NaN. Worth guarding with Number.isFinite(newFee).

@CassioMG CassioMG May 29, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On maxFee: 0: agreed, reverted in 4939bb2 — the guard is back to if (maxFee && newFee > maxFee), preserving the maxFee: 0 ≈ no cap behavior. The actual bug fix (forwarding maxFee across the recursive call) is what closes the report; the guard tightening was spec creep.

On NaN guard: pushing back. transaction.fee comes from @stellar/stellar-sdk's TransactionBuilder, which sets fee from a numeric value — it's always a numeric string. For parseInt to return NaN here, the SDK contract would have to break, and we already depend on that contract elsewhere (baseFee.toString() in makeFeeBump, the builder constructor, etc.). I'd rather not sprinkle Number.isFinite defenses against contracts we already lean on; happy to revisit if there's a concrete path where transaction.fee could be non-numeric.

if (e.response.status === 504) {
// in case of 504, keep retrying this tx until submission succeeds or we get a different error
let lastError: unknown;
for (let attempt = 0; attempt <= SUBMIT_504_MAX_RETRIES; attempt++) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

504 retry semantics changed from unbounded to ~6 attempts / ~46s. The previous implementation retried recursively forever; this caps at SUBMIT_504_MAX_RETRIES + 1 attempts and then throws the last 504. Two follow-ups worth considering:

  1. Caller cascade: submitWithFeeIncrease catches submitTransaction errors and only handles tx_too_late. A bubbled 504 has getResultCode(e) === '', so it rethrows immediately — even though the original transaction may still land on-chain. A naive caller that retries on its own will double-submit. Worth either documenting this in the JSDoc or special-casing 504 in submitWithFeeIncrease.

  2. Jitter exceeds the documented cap (line 160): cappedDelay = Math.min(BASE * 2**attempt, MAX) is clamped to 30s, but jitter (Math.random() * cappedDelay/2, up to 15s) is then added on top, giving ~45s waits on the final retries — 50% over SUBMIT_504_MAX_DELAY_MS. If the cap is meant to bound caller latency, apply jitter first and Math.min after, or use full-jitter (Math.random() * cappedDelay).

@CassioMG CassioMG May 29, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 1 (caller cascade): you're right that an exhausted-504 bubbles through submitWithFeeIncrease unchanged, and the developer-experience concern stands. Documented in e7aba81 — the new JSDoc on submitTransaction explicitly calls out that an exhausted-504 leaves the on-chain status indeterminate and that callers should poll the tx hash rather than resubmit blindly (and notes that resubmitting the same signed tx will fail with tx_bad_seq once the original lands, so it's a DX problem rather than a fund-loss one — Stellar's sequence numbers prevent the double-submit you're concerned about).

On 2 (jitter exceeds the cap): fixed in ba3cb75 — switched to equal-jitter (cappedDelay/2 + Math.random() * cappedDelay/2), so sleeps now stay in [cap/2, cap]. Retains the herd-smoothing benefit with a deterministic progressive floor, and the named SUBMIT_504_MAX_DELAY_MS actually bounds the wait now.

@piyalbasu piyalbasu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'd say the maxFee comment is worth addressing, but the 504 improvement could maybe be pushed off

CassioMG and others added 3 commits May 28, 2026 18:44
The previous additive jitter (cappedDelay + Math.random()*cappedDelay/2)
let sleeps exceed SUBMIT_504_MAX_DELAY_MS by 50% (up to 45s with the 30s
cap). Switch to equal-jitter: each attempt waits cappedDelay/2 plus a
random amount up to cappedDelay/2, so the total stays in [cap/2, cap].
The deterministic floor keeps the schedule progressive while preserving
the herd-smoothing benefit.

Update the 504 retry tests to match the new (cappedDelay/2)-floor sleeps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous tightening to \`maxFee !== undefined && newFee > maxFee\`
changed behavior for callers passing \`maxFee: 0\` (e.g. as a defensive
default or numeric coercion of an unset env var): the old truthy check
treated 0 as "no cap" and recursed, while the strict check threw on the
very first tx_too_late. Restore the original \`maxFee && newFee > maxFee\`
guard. The actual fix — forwarding \`maxFee\` across the recursive call —
remains in place, which is what closes the underlying bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make the JSDoc describe the actual behavior: a bounded number of 504
retries with exponential backoff, immediate rethrow on non-504 errors,
and an explicit note that an exhausted-504 leaves the transaction's
on-chain status indeterminate so callers should poll the tx hash
instead of resubmitting blindly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CassioMG

CassioMG commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

This looks good to me. I'd say the maxFee comment is worth addressing, but the 504 improvement could maybe be pushed off

@piyalbasu Both your comments have been addressed, thanks for the review!

@CassioMG CassioMG merged commit 07da147 into main May 29, 2026
8 checks passed
@CassioMG CassioMG deleted the cg-correctness-improvements branch May 29, 2026 02:06
@CassioMG CassioMG mentioned this pull request May 29, 2026
3 tasks
CassioMG added a commit that referenced this pull request May 29, 2026
Bug-fix release rolling up #237 (SEP-10 challenge integrity) and #238
(submitWithFeeIncrease maxFee forwarding, bounded submitTransaction 504
retries, KeyManager removeKey cache deletion).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants