Reported by: codex
Requested by: jmr.pineda
Priority: P1
Affected surfaces: runtime execution, HTTP transport behavior, resilience controls
Constraints: retries must remain explicit and observable so workflow outcomes stay understandable
Summary
Add transport-level retry controls that go beyond the current readiness policies so transient failures can be handled in a controlled and observable way during workflow execution.
Problem / opportunity
Current readiness policies do not cover the broader retry behavior needed for unstable networks, rate-limited endpoints, or transient remote faults. Without first-class controls, users must work around this at the workflow level or accept brittle execution.
Requested behavior
Users should be able to configure retry behavior for transport-level failures with clear semantics around attempts, delays, and failure reporting.
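To make "attempts, delays, and failure reporting" concrete, here is a minimal sketch of what such a policy could look like. Every name in it (`RetryPolicy`, `AttemptRecord`, `RetryReport`, `run_with_retry`, and all field names) is hypothetical and is not an existing API in this project. Exponential backoff with a cap is one possible strategy, chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class RetryPolicy:
    # Hypothetical configuration fields; names and defaults are assumptions.
    max_attempts: int = 3        # total attempts, including the first one
    base_delay_s: float = 0.5    # wait before the second attempt
    backoff_factor: float = 2.0  # multiplier applied for each later attempt
    max_delay_s: float = 10.0    # upper bound on any single wait

    def delay_before(self, attempt: int) -> float:
        """Seconds to wait before a 1-based attempt; the first has no wait."""
        if attempt <= 1:
            return 0.0
        raw = self.base_delay_s * self.backoff_factor ** (attempt - 2)
        return min(raw, self.max_delay_s)


@dataclass
class AttemptRecord:
    attempt: int
    delay_s: float
    error: Optional[str]  # None when the attempt succeeded


@dataclass
class RetryReport:
    attempts: List[AttemptRecord] = field(default_factory=list)
    succeeded: bool = False


def run_with_retry(
    send: Callable[[], Any],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Any, RetryReport]:
    """Call send() up to max_attempts times, recording every attempt."""
    report = RetryReport()
    for attempt in range(1, policy.max_attempts + 1):
        delay = policy.delay_before(attempt)
        if delay > 0:
            sleep(delay)
        try:
            result = send()
        except (ConnectionError, TimeoutError) as exc:
            # Only transport-level errors are retried; anything else propagates.
            report.attempts.append(AttemptRecord(attempt, delay, repr(exc)))
            continue
        report.attempts.append(AttemptRecord(attempt, delay, None))
        report.succeeded = True
        return result, report
    # Attempts are bounded, so the loop always ends with a visible outcome.
    return None, report
```

The returned `RetryReport` is what an execution report could surface: one record per attempt, the delay applied, the error seen, and the final outcome. Injecting `sleep` keeps the behavior testable without real waits.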
Scope
- In scope: retry controls, transport-failure handling, visibility in execution reporting.
- Out of scope: silent infinite retries or unrelated orchestration redesign.
Acceptance criteria
- Users can configure retry attempts and retry strategy for relevant transport-level failures.
- Execution reports show when retries occurred and what the final outcome was.
- The feature avoids masking deterministic failures that should fail fast.
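The fail-fast criterion implies some classification step that separates transient failures from deterministic ones. A rough sketch follows. The function name `should_retry` and the exact set of statuses treated as transient are assumptions for illustration, not a decision recorded in this issue.

```python
from typing import Optional

# Assumption: HTTP statuses treated as transient in this sketch
# (request timeout, rate limiting, and gateway/availability errors).
TRANSIENT_STATUSES = frozenset({408, 429, 502, 503, 504})


def should_retry(status: Optional[int], connection_failed: bool = False) -> bool:
    """Return True only for failures that are plausibly transient."""
    if connection_failed:
        # No response was received at all (reset, refused, timed out).
        return True
    # Everything else, such as 400, 401, 403, or 404, fails fast.
    return status in TRANSIENT_STATUSES
```

With this shape, a 404 or 401 surfaces immediately as a failure, while a 429 or 503 becomes eligible for the configured retry strategy.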
Notes
- Source: local roadmap at .doc/positioning-and-roadmap.md.
- This issue was created while migrating local backlog ownership to GitHub.