Skip to content

test(context-offloader): raise timeout for sandbox isolation test#2842

Open
notowen333 wants to merge 1 commit into
strands-agents:mainfrom
notowen333:wt/cleanups
Open

test(context-offloader): raise timeout for sandbox isolation test#2842
notowen333 wants to merge 1 commit into
strands-agents:mainfrom
notowen333:wt/cleanups

Conversation

@notowen333

Copy link
Copy Markdown
Contributor

Description

The context-offloader test isolates a shared storage config by agent sandbox flakes during the pre-commit hook, failing intermittently with Test timed out in 5000ms.

The test exercises FileStorage against a real PosixShellSandbox (TestSandbox) on purpose, so it covers the actual sandbox code paths — base64 encoding, shell quoting, metadata sidecar I/O. That fidelity has a cost: every storage operation spawns an sh subprocess, and the test performs ~9 of them across its two agents. Run on its own the test finishes in ~1.4s, comfortably under the 5s default.

The flake only appears under the pre-commit hook, which runs npm run test:coverage — the full unit-node suite in parallel with v8 coverage. Other subprocess-heavy suites (e.g. posix-shell.test.node.ts, 46 such tests) are spawning sh at the same time, so the spawns serialize on CPU and each balloons from ~150ms to ~600ms. Wall time climbs to ~6s and trips the 5s ceiling. Isolating either variable confirms the cause: single-fork runs (with or without coverage) pass consistently at ~1.3s; only the parallel run fails.

Rather than weaken the test by faking the sandbox, this gives the test a 15s timeout — roughly 2.5x the observed ~6s peak under load. That absorbs a slower or busier CI machine while staying low enough that a genuine hang still fails fast.

Related Issues

Documentation PR

No documentation changes required.

Type of Change

Bug fix

Testing

Reproduced the flake by running the exact pre-commit command (vitest run --coverage --project unit-node), which failed on the timeout consistently. After the change, ran the same command repeatedly — the test now lands at ~5.8–6.4s and passes every time, with all 3495 unit tests green. Lint, format, and type-check pass.

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have reviewed and understand every line of code in this PR, including any generated by AI tools, and I can explain why it works
  • My change is focused and reasonably small; I have split unrelated work into separate PRs
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

The FileStorage isolation test drives a real PosixShellSandbox, so every
storage operation spawns an `sh` subprocess (~9 spawns across both agents).
Solo it runs in ~1.4s, but under the parallel coverage run that pre-commit
invokes, the spawns serialize on CPU and push wall time past the 5s default,
making the test flake with a timeout.

Give the test a 15s timeout so the subprocess work fits comfortably under
load without masking a genuine hang.
@github-actions github-actions Bot added size/xs typescript Pull requests that update typescript code area-server Related to strands Server, sandbox, runtime container area-community Related to community and contributor health strands-running chore Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact labels Jun 16, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Assessment: Approve

A focused, well-justified fix. The PR description's root-cause analysis checks out: vitest.config.ts gives the unit-node project no testTimeout, so it inherits Vitest's 5s default (only the integ-* projects set 60s), making this subprocess-heavy test vulnerable under the parallel coverage run. Raising the ceiling rather than mocking the sandbox is the right call — it preserves real PosixShellSandbox coverage of base64/shell-quoting/sidecar paths. The inline comment captures the why well, and 15_000 matches the numeric-separator style used elsewhere in the codebase.

Review notes
  • Correctness: 15s ≈ 2.5x the observed ~6s peak under load — enough headroom for a slow CI box while still failing fast on a genuine hang.
  • Test fidelity: Keeps the test driving a real sandbox instead of weakening it; good tradeoff, clearly documented.
  • Scope/style: Single-file, +5/-1, no API or doc impact; assertions unchanged and appropriate.
  • Optional (non-blocking): The underlying "sh spawns serialize on CPU under coverage" behavior is worth a tracking issue if it recurs across other subprocess-heavy suites, but it shouldn't gate this fix.

Nice diagnosis isolating the parallel-coverage variable as the trigger.

@notowen333 notowen333 enabled auto-merge (squash) June 16, 2026 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-community Related to community and contributor health area-server Related to strands Server, sandbox, runtime container chore Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact size/xs typescript Pull requests that update typescript code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants