A tiny, zero-dependency utility to run asynchronous operations with:
- Per-attempt timeouts
- An overall deadline
- Retry policy with configurable classification
- Jittered backoff (decorrelated jitter)
- AbortSignal composition (abort when any of multiple signals abort)
- Observability hooks
Lean, predictable, and safe for both browser and Node environments. Internally uses a monotonic clock to avoid deadline drift from wall-clock changes.
Version: 1.0.0
- Why
- Features
- Install
- Quick Start
- API
- Class RobustOperation
- Options
- Hooks
- Methods
- Static Utilities
- Helper Functions
- Error Types
- Behavioral Details
- Examples
- Resilient fetch
- Respect Retry-After
- External abort
- Instance cancel
- Custom retry classification
- TypeScript
- Notes on clocks, deadlines, and memory safety
- FAQ
- License
Most retry helpers stop at “retry n times with exponential backoff.” This utility goes further:
- Per-attempt timeouts and an overall deadline
- Jittered backoff tuned to avoid thundering herds
- AbortSignal composition so attempts and sleeps can be canceled immediately
- Sensible default classification for transient errors, with hooks for full control
- Metrics hook (onFinish) for observability
- Monotonic time internally, so wall-clock changes don’t surprise you
- Per-attempt timeout (or disable it with 0 or Infinity)
- Overall deadline across all attempts
- Decorrelated jitter backoff with min/max caps
- Retry classification with good defaults:
- HTTP 408/429/5xx (including 502/504)
- Common Node network error codes (ECONNRESET, ETIMEDOUT, ENOTFOUND, EADDRINUSE, …)
- Fetch-like network failures
- Honors Retry-After for 429/503 when present
- AbortSignal composition, including instance-level cancel (fire-and-forget then cancel later)
- Small surface area; zero dependencies
Use directly as an ES module in your project.
- Local file import:
  import { RobustOperation, secureRandom } from './robust-operation.js';
- If you publish to npm later, usage will look like:
  import { RobustOperation, secureRandom } from 'robust-operation';
This library is ESM-first.
import { RobustOperation, secureRandom } from './robust-operation.js';
const op = new RobustOperation({
  retries: 3,
  timeoutPerAttempt: 8000, // 8s per attempt; 0 or Infinity disables per-attempt timeout
  random: secureRandom,
  onError: (err, attempt, willRetry, ctx) => {
    const nth = attempt + 1;
    console.warn(
      willRetry
        ? `Attempt ${nth} failed: ${err?.message || err}. Retrying in ${ctx.nextDelayMs}ms...`
        : `Attempt ${nth} failed: ${err?.message || err}. No more retries.`
    );
  },
  onFinish: ({ error, attempts, durationMs }) => {
    console.log('Finished', { ok: !error, attempts, durationMs });
  }
});
const result = await op.run(async (signal, { attempt }) => {
  // your async operation here; check `signal.aborted` or pass it to fetch()
  await fetch('https://api.example.com/data', { signal });
  return 'ok';
});

Constructs a policy-bound runner for asynchronous operations.
new RobustOperation(options?)

All numeric values are clamped to sensible ranges. Defaults are shown.
- retries: number = 3
  - Number of retries. Total attempts = retries + 1. Must be >= 0.
- timeoutPerAttempt: number = 15000
  - Milliseconds per attempt. Use 0 or Infinity to disable the per-attempt timeout.
- overallDeadlineMs: number = 0
  - Milliseconds for the entire run. 0 disables the overall deadline.
- minDelay: number = 0
  - Minimum delay between retries, in ms.
- maxDelay: number = 30000
  - Maximum delay between retries, in ms.
- random: () => number = Math.random
  - Random source for jitter. Consider passing secureRandom.
- backoffBase: number = 1000
  - Base delay for the built-in decorrelated jitter strategy.
- backoffStrategy?: () => number
  - Provide your own delay generator. Defaults to decorrelated jitter.
- shouldRetry?: (err, attempt, ctx) => boolean | Promise<boolean>
  - Decide whether to retry on a given error/attempt. See default behavior below.
- getDelay?: (err, attempt, ctx) => number | null | undefined | Promise<number | null | undefined>
  - Optionally override the next delay (e.g., to honor or clamp Retry-After). Return 0 to retry immediately.
- onError?: (err, attempt, willRetry, ctx) => void | Promise<void>
  - Called on each failure; useful for logging/metrics.
- onFinish?: ({ result, error, attempts, durationMs }) => void | Promise<void>
  - Called once when the run finishes (success or failure). Never throws.
The ctx object passed to the hooks (ErrorContext) has these fields:
- attempt: number (zero-indexed attempt number)
- retries: number
- retriesLeft: number
- elapsedMs: number (monotonic duration since run started)
- deadlineAtWall: number (absolute ms since epoch when the overall deadline would expire, for logging; enforcement uses a monotonic clock)
- timeoutPerAttempt: number
- overallDeadlineMs: number
- nextDelayMs?: number (planned delay before the next attempt; present only when willRetry is true)
shouldRetry (default):
- Not retried: aborts, IntegrityError
- Retried: TimeoutError, HTTP 408/429/5xx (including 502/504)
- Retried: common transient Node error codes: ECONNRESET, ECONNREFUSED, ECONNABORTED, ETIMEDOUT, ENETUNREACH, EHOSTUNREACH, EAI_AGAIN, EPIPE, ENOTFOUND, EADDRINUSE
- Retried: fetch-like network errors (TypeError with “Failed to fetch” / “Network request failed”)
- Unknown errors: retried once
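The default classification listed above can be pictured roughly as follows. This is a minimal sketch, not the library's source: the function name isRetryableByDefault is hypothetical, errors are assumed to carry name, status (or response.status), and code properties, and treating other HTTP statuses (such as 404) as non-retryable is an assumption of this sketch.

```javascript
// Hypothetical sketch of a default retry classifier; not the library's code.
const TRANSIENT_CODES = new Set([
  'ECONNRESET', 'ECONNREFUSED', 'ECONNABORTED', 'ETIMEDOUT', 'ENETUNREACH',
  'EHOSTUNREACH', 'EAI_AGAIN', 'EPIPE', 'ENOTFOUND', 'EADDRINUSE',
]);

function isRetryableByDefault(err, attempt) {
  // Aborts and integrity failures are never retried.
  if (err?.name === 'AbortError' || err?.name === 'IntegrityError') return false;
  // Per-attempt timeouts are retried.
  if (err?.name === 'TimeoutError') return true;

  // HTTP 408, 429, and any 5xx are retried.
  // Assumption: every other numeric status is treated as final.
  const status = err?.status ?? err?.response?.status;
  if (typeof status === 'number') {
    return status === 408 || status === 429 || (status >= 500 && status <= 599);
  }

  // Common transient Node network error codes.
  if (TRANSIENT_CODES.has(err?.code)) return true;

  // Fetch-like network failures surface as TypeError.
  if (err instanceof TypeError && /Failed to fetch|Network request failed/i.test(err.message)) {
    return true;
  }

  // Unknown errors are retried once: only after the first attempt (index 0).
  return attempt === 0;
}
```

If the defaults do not fit, the shouldRetry option replaces this decision entirely.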
getDelay (default none):
- If not provided or returns null/undefined/NaN, delay falls back to:
  - Retry-After for 429/503 (seconds or HTTP-date)
  - Backoff strategy (decorrelated jitter)
- minDelay/maxDelay are enforced.
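To illustrate the two Retry-After forms mentioned above (seconds or HTTP-date) and the min/max clamp, here is a small sketch. The helper names parseRetryAfterMs and clampDelay are hypothetical and are not part of the library's API; the library's own parsing may differ.

```javascript
// Hypothetical helpers; a sketch of the technique, not the library's code.

// Convert a Retry-After header value into a delay in milliseconds.
// Returns null when the value is missing or cannot be parsed.
function parseRetryAfterMs(headerValue, nowMs = Date.now()) {
  if (headerValue == null) return null;
  const text = String(headerValue).trim();

  // Form 1: delay-seconds, e.g. "120".
  if (/^\d+$/.test(text)) return Number(text) * 1000;

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(text);
  if (Number.isNaN(dateMs)) return null;

  // A date in the past means "retry now".
  return Math.max(0, dateMs - nowMs);
}

// Keep a delay within [minDelay, maxDelay].
function clampDelay(ms, minDelay, maxDelay) {
  return Math.min(maxDelay, Math.max(minDelay, ms));
}
```

For example, a header of "120" with maxDelay set to 30000 would yield a 30000 ms wait after clamping.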
- run(operation, options?) => Promise<T>
  Runs your async operation(signal, { attempt }). The signal will be aborted on:
  - External abort
  - Instance-level cancel()
  - Per-attempt timeout
  - Overall deadline expiry
  Per-run options can override:
  - retries
  - timeoutPerAttempt
  - overallDeadlineMs
  - signal (external AbortSignal)
- cancel(reason?) => void
  Aborts all in-flight operations started by this instance and prevents future runs from proceeding. Create a new instance to run more operations later.
- signal: AbortSignal (getter)
  An instance-level signal that becomes aborted when cancel() is called. You can pass this to your own code if needed.
- RobustOperation.anySignal(signals?) => { signal: AbortSignal, cleanup: () => void }
  Compose multiple signals into one that aborts when any source aborts. Always call cleanup() when done to remove listeners.
- RobustOperation.abortableSleep(delayMs, signal?) => Promise<void>
  Sleep for delayMs, rejecting early if signal aborts.
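For readers curious how these two utilities can work, the sketch below shows one plausible shape for each. The names anySignalSketch and abortableSleepSketch are hypothetical stand-ins, and the code assumes a runtime where AbortController.abort(reason) and signal.reason are available; the library's actual implementation may differ.

```javascript
// Hypothetical stand-ins for the static utilities; not the library's code.

// Compose several signals into one that aborts when any source aborts.
function anySignalSketch(signals = []) {
  const controller = new AbortController();
  const listeners = [];

  // Remove every listener we attached so sources do not retain us.
  const cleanup = () => {
    for (const [source, handler] of listeners) {
      source.removeEventListener('abort', handler);
    }
    listeners.length = 0;
  };

  for (const source of signals) {
    if (!source) continue;
    if (source.aborted) {
      // Already aborted: propagate immediately and stop wiring listeners.
      controller.abort(source.reason);
      break;
    }
    const handler = () => {
      controller.abort(source.reason);
      cleanup();
    };
    source.addEventListener('abort', handler, { once: true });
    listeners.push([source, handler]);
  }

  if (controller.signal.aborted) cleanup();
  return { signal: controller.signal, cleanup };
}

// Sleep for delayMs, rejecting early with the abort reason if signal aborts.
function abortableSleepSketch(delayMs, signal) {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(signal.reason);
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, delayMs);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}
```

Calling cleanup() matters most when one of the sources is long-lived, since its listener list would otherwise grow with every composition.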
- createDecorrelatedJitter(baseMs, maxMs, random?) => () => number
  Returns a function that yields the next delay using Amazon’s decorrelated jitter strategy.
- secureRandom() => number
  Returns a cryptographically strong random float in [0, 1), falling back to Math.random() if crypto.getRandomValues is not available.
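As a rough illustration of both helpers, the sketch below follows the commonly described decorrelated jitter rule, where each delay is drawn between the base and three times the previous delay and then capped. The names decorrelatedJitterSketch and secureRandomSketch are hypothetical, and the exact formula used by createDecorrelatedJitter is an assumption here.

```javascript
// Hypothetical sketches; the library's own helpers may differ in detail.

// Each call returns the next delay: a random value between baseMs and
// three times the previous delay, capped at maxMs.
function decorrelatedJitterSketch(baseMs, maxMs, random = Math.random) {
  let previous = baseMs;
  return () => {
    const upper = Math.max(baseMs, previous * 3);
    const next = Math.min(maxMs, baseMs + random() * (upper - baseMs));
    previous = next;
    return next;
  };
}

// Random float in [0, 1) from the Web Crypto API when present,
// otherwise from Math.random().
function secureRandomSketch() {
  const cryptoObj = globalThis.crypto;
  if (cryptoObj && typeof cryptoObj.getRandomValues === 'function') {
    const buf = new Uint32Array(1);
    cryptoObj.getRandomValues(buf);
    return buf[0] / 2 ** 32; // 32-bit integer scaled into [0, 1)
  }
  return Math.random();
}
```

Because each delay depends on the previous one rather than on the attempt number, clients that fail at the same moment tend to drift apart instead of retrying in lockstep.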
- TimeoutError
- IntegrityError
- Per-attempt timeout
  - Set timeoutPerAttempt to 0 or Infinity to disable.
  - If the per-attempt timer fires, operation’s signal is aborted and a TimeoutError is raised.
- Overall deadline
  - Applies across all attempts, including sleep between retries.
  - Enforced with a monotonic clock to avoid wall-clock jumps.
  - Sleep is clamped so it won’t overshoot the remaining budget.
- Backoff and jitter
  - Default is decorrelated jitter with backoffBase and maxDelay.
  - minDelay and maxDelay are always enforced.
- Retry classification
  - Sensible defaults; fully customizable via shouldRetry.
- Retry-After
  - For HTTP 429/503, if Retry-After is present, it is used unless getDelay provides an override.
- Abort behavior
  - operation receives a composed signal that aborts on: external signal, instance cancel(), or per-attempt timeout.
  - Sleep between attempts can also be externally or instance-aborted.
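The per-attempt timeout behavior described above (abort the signal, then raise a TimeoutError) can be sketched as below. TimeoutErrorSketch and runWithTimeoutSketch are hypothetical names used only for illustration; the library's internals are not shown in this README and may be structured differently.

```javascript
// Hypothetical sketch of a single attempt with a timeout; not the library's code.
class TimeoutErrorSketch extends Error {
  constructor(ms) {
    super(`Attempt timed out after ${ms}ms`);
    this.name = 'TimeoutError';
  }
}

function runWithTimeoutSketch(operation, timeoutMs) {
  const controller = new AbortController();

  // 0 or Infinity disables the per-attempt timer.
  if (!timeoutMs || timeoutMs === Infinity) {
    return Promise.resolve().then(() => operation(controller.signal));
  }

  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      const err = new TimeoutErrorSketch(timeoutMs);
      controller.abort(err); // tell the operation to stop
      reject(err);           // and fail the attempt right away
    }, timeoutMs);

    Promise.resolve()
      .then(() => operation(controller.signal))
      .then(resolve, reject)
      .finally(() => clearTimeout(timer));
  });
}
```

Note that the attempt is rejected as soon as the timer fires, whether or not the operation has noticed the abort yet.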
import { RobustOperation, secureRandom } from './robust-operation.js';
const roFetch = new RobustOperation({
  retries: 3,
  timeoutPerAttempt: 8000,
  random: secureRandom,
  onError: (err, attempt, willRetry, ctx) => {
    console.warn(
      `Attempt ${attempt + 1} failed: ${err?.message || err}.` +
        (willRetry ? ` Retrying in ${ctx.nextDelayMs}ms...` : ' Giving up.')
    );
  }
});
const result = await roFetch.run(async (signal) => {
  const res = await fetch('https://api.example.com/data', { signal });
  if (!res.ok) {
    const err = new Error(`HTTP ${res.status}`);
    err.status = res.status;
    err.response = res;
    throw err;
  }
  return await res.json();
});
console.log('Got data:', result);

const op = new RobustOperation({
  retries: 5,
  getDelay: (err) => {
    // Clamp any Retry-After (if present) to at most 10s. If not present, return null to use the built-in backoff.
    // Note: built-in logic automatically uses Retry-After for 429/503. This example shows how to add a cap.
    const h = err?.response?.headers?.get?.('Retry-After') ?? err?.retryAfter;
    if (h && /^\d+(\.\d+)?$/.test(String(h))) {
      return Math.min(Number(h) * 1000, 10_000);
    }
    return null; // fall back to built-in inference + jitter strategy
  }
});

const controller = new AbortController();
const op = new RobustOperation({ retries: 10, timeoutPerAttempt: 5000 });
const promise = op.run(async (signal) => {
  // This fetch will be aborted if controller.abort() is called
  const res = await fetch('https://example.com/slow', { signal });
  return res.text();
}, { signal: controller.signal });
// Cancel from outside later:
controller.abort(new Error('No longer needed'));
await promise; // will reject with an abort reason

const op = new RobustOperation({ retries: 100 });
// Kick off work you might cancel later:
const p1 = op.run(work);
const p2 = op.run(work);
// Cancel all runs started by this instance:
op.cancel(new Error('Shutting down'));
// Both p1 and p2 will reject promptly

const op = new RobustOperation({
  retries: 4,
  shouldRetry: (err, attempt, ctx) => {
    // Never retry on 4xx except 408/429; retry on 5xx
    const status = err?.status ?? err?.response?.status;
    if (status && status >= 400 && status < 500 && status !== 408 && status !== 429) {
      return false;
    }
    return attempt < ctx.retries; // equivalent to attempt < 4 here
  }
});
- The codebase is well-annotated with JSDoc, and a lightweight index.d.ts re-exports the module’s symbols to improve TS friendliness:
  export * from './robust-operation.js';
- Usage in TS:
  import { RobustOperation, TimeoutError } from './robust-operation.js';
  const op = new RobustOperation({ retries: 2 });
  const value = await op.run<string>(async (signal) => {
    // ...
    return 'ok';
  });
Note: For the best developer experience in a published package, consider shipping full .d.ts files generated from these JSDoc types.
- Monotonic vs wall-clock:
  - Deadline enforcement uses a monotonic clock (performance.now() or process.hrtime.bigint()), so system time changes won’t extend or cut your deadlines unexpectedly.
  - ErrorContext.deadlineAtWall is provided for logging/observability only.
- Memory safety:
  - anySignal() returns a cleanup() function; always call it (the library does so internally).
  - Long-lived processes: prefer reusing a RobustOperation instance rather than creating thousands of them dynamically.
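The monotonic deadline idea above, together with clamping sleeps to the remaining budget, might look something like this. The names nowMonotonicMs and createDeadlineSketch are hypothetical, and the injectable now parameter exists only to make the sketch easy to exercise; none of this is taken from the library's source.

```javascript
// Hypothetical sketch of monotonic deadline tracking; not the library's code.

// Monotonic milliseconds: performance.now() where available,
// otherwise process.hrtime.bigint() converted from nanoseconds.
const nowMonotonicMs = () =>
  typeof performance !== 'undefined' && typeof performance.now === 'function'
    ? performance.now()
    : Number(process.hrtime.bigint() / 1000000n);

function createDeadlineSketch(overallDeadlineMs, now = nowMonotonicMs) {
  const startedAt = now();

  // Remaining budget in ms; Infinity when the deadline is disabled (0).
  const remainingMs = () =>
    overallDeadlineMs > 0
      ? Math.max(0, overallDeadlineMs - (now() - startedAt))
      : Infinity;

  return {
    remainingMs,
    expired: () => remainingMs() === 0,
    // Never sleep longer than the budget that is left.
    clampSleep: (delayMs) => Math.min(delayMs, remainingMs()),
  };
}
```

Since elapsed time is measured as a difference of monotonic readings, adjusting the system clock mid-run does not change how much budget remains.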
- Why “decorrelated jitter”?
  - It spreads retries across time more effectively than naive exponential backoff, reducing load spikes under failure.
- Does abort really cancel my operation?
  - Your operation receives an AbortSignal. Pass it to APIs that support it (e.g., fetch) or check signal.aborted yourself and stop promptly.
- What if my operation ignores the signal?
  - The library aborts the signal and raises the timeout, but if your operation doesn’t respect abort, any internal work might still continue. Always design your operations to be abort-aware.
- Can I configure total attempts instead of retries + 1?
  - Today you configure retries. A maxAttempts alias may be added in the future for symmetry with some APIs.
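As a small illustration of what "abort-aware" can mean for work that does not go through fetch, the sketch below checks the signal between steps. The function processItemsSketch and its handleItem callback are hypothetical and exist only for this example.

```javascript
// Hypothetical example of an abort-aware operation; names are illustrative.
async function processItemsSketch(items, signal, handleItem) {
  const results = [];
  for (const item of items) {
    // Stop before starting the next unit of work if we were aborted.
    if (signal?.aborted) throw signal.reason ?? new Error('Aborted');
    // Forward the signal so nested async calls can stop early too.
    results.push(await handleItem(item, signal));
  }
  return results;
}
```

An operation written this way stops at the next item boundary after a timeout or cancel, instead of running to completion in the background.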
Please be aware that this is an early-stage project. While I've done my best to make the code functional and correct, it has not yet been validated by an automated test suite.
I encourage you to try it out, but please test it thoroughly within your own application before relying on it for anything critical.
The project's top priority is to add tests, and any contributions in this area are very welcome.
This project is licensed under the MIT License.
Note on Authorship: Portions of this Software may have been generated with the assistance of AI tools. Final authorship, responsibility, and copyright for the Software rest with the copyright holder.
MIT License
Copyright (c) 2025 Edwin Hayward, Genki Productions Ltd
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.