[codex] Fix workflow task claim timeout #723

Draft
j-s wants to merge 1 commit into paradigmxyz:main from j-s:codex/workflow-task-claim-timeout

j-s (Collaborator) commented Jun 23, 2026

Summary

  • Increase Centaur workflow task claim timeout from 5 minutes to 30 minutes.
  • Apply the same timeout to every api-rs workflow worker queue through WorkerOptions.claim_timeout.
  • Heartbeat workflow tasks with the 30-minute extension so long bounded agent turns do not fail with $ClaimTimeout before they can finish.
  • Add a regression test that pins the task claim timeout above the old 5-minute boundary and below the point where the heartbeat cadence becomes a risk.
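
The change summarized above can be sketched roughly as follows. This is a minimal illustration, not the actual api-rs code: only the `WorkerOptions.claim_timeout` field name comes from this PR. The struct shape, the constant names, the `worker_options_for` helper, and the queue names are all hypothetical. The regression test checks only the lower bound, because the PR does not state the exact upper bound tied to the heartbeat cadence.

```rust
use std::time::Duration;

/// Previous task-claim window (5 minutes), per the PR summary.
pub const OLD_CLAIM_TIMEOUT: Duration = Duration::from_secs(5 * 60);

/// New task-claim window (30 minutes). The constant name is an assumption.
pub const WORKFLOW_TASK_CLAIM_TIMEOUT: Duration = Duration::from_secs(30 * 60);

/// Hypothetical stand-in for the real `WorkerOptions`.
/// Only the `claim_timeout` field is named in the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerOptions {
    pub queue: String,
    pub claim_timeout: Duration,
}

/// Build options for every queue with the same shared claim timeout,
/// so no queue silently falls back to a shorter default.
pub fn worker_options_for(queues: &[&str]) -> Vec<WorkerOptions> {
    queues
        .iter()
        .map(|q| WorkerOptions {
            queue: q.to_string(),
            claim_timeout: WORKFLOW_TASK_CLAIM_TIMEOUT,
        })
        .collect()
}

/// Regression check in the spirit of the PR: the timeout must stay
/// above the old 5-minute boundary.
#[test]
fn claim_timeout_stays_above_old_boundary() {
    assert!(WORKFLOW_TASK_CLAIM_TIMEOUT > OLD_CLAIM_TIMEOUT);
}
```

Routing every queue through one helper and one constant keeps the heartbeat extension and the initial claim window from drifting apart.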

Root Cause

Agent-backed workflows such as agent_surface_sync can legitimately run for longer than 5 minutes while still being bounded by the agent turn timeout. The api-rs worker was still using the platform default task-claim window of 5 minutes, so a valid workflow could be killed with $ClaimTimeout before the agent turn finished or reported its result.
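
To make the failure mode concrete, here is a small sketch of the timing arithmetic. It assumes, as a simplification, that a claim is treated as timed out once the time since the last claim or heartbeat reaches the claim window. The real platform's expiry rule is not shown in this PR, and `claim_expired` and the 10-minute turn length are hypothetical.

```rust
use std::time::Duration;

/// Assumed expiry rule: a claim is considered timed out once the time
/// since it was last claimed or heartbeated reaches the claim window.
pub fn claim_expired(since_last_heartbeat: Duration, claim_timeout: Duration) -> bool {
    since_last_heartbeat >= claim_timeout
}

/// Hypothetical agent turn that runs for 10 minutes without reporting.
pub fn example_turn_length() -> Duration {
    Duration::from_secs(10 * 60)
}

/// Returns (expired under the old 5-minute window,
///          expired under the new 30-minute window).
pub fn compare_windows() -> (bool, bool) {
    let turn = example_turn_length();
    let old_window = Duration::from_secs(5 * 60);
    let new_window = Duration::from_secs(30 * 60);
    (
        claim_expired(turn, old_window),
        claim_expired(turn, new_window),
    )
}
```

Under these assumptions, a 10-minute turn outlives the 5-minute window but fits within the 30-minute one, which matches the behavior described above.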

Validation

  • cargo test -p centaur-workflows
  • PR CI: Rust API fmt, clippy, and tests passed.
  • Deployed equivalent patch on auto-research-centaur-v1 and verified a manual agent_surface_sync run completed without recent claim-timeout logs.

CI Note

The Publish Images matrix fails on this fork PR because GitHub reports Secret source: None and Depot returns permission_denied: Invalid token. That is a fork-permission/publishing-token limitation, not a Rust build/test failure.
