SIF-004: add transport-level retry controls beyond readiness policies #8

@jmrpineda

Description

Reported by: codex
Requested by: jmr.pineda
Priority: P1
Affected surfaces: runtime execution, HTTP transport behavior, resilience controls
Constraints: retries must remain explicit and observable so workflow outcomes stay understandable

Summary

Add transport-level retry controls that go beyond the current readiness policies so transient failures can be handled in a controlled and observable way during workflow execution.

Problem / opportunity

Current readiness policies do not cover the broader retry behavior needed for unstable networks, rate-limited endpoints, or transient remote faults. Without first-class controls, users must work around this at the workflow level or accept brittle execution.

Requested behavior

Users should be able to configure retry behavior for transport-level failures with clear semantics around attempts, delays, and failure reporting.
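One possible shape for such a configuration is sketched below: a policy object with an attempt cap and capped exponential backoff. Every name here (`RetryPolicy`, `max_attempts`, `base_delay_s`, `backoff_factor`, `max_delay_s`, `delay_before`) is hypothetical and chosen for illustration; nothing in this issue fixes the actual configuration surface or the backoff strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical transport retry policy; field names are illustrative only."""

    max_attempts: int = 3        # total attempts, including the first one
    base_delay_s: float = 0.5    # wait before the second attempt
    backoff_factor: float = 2.0  # multiplier applied after each further failure
    max_delay_s: float = 10.0    # upper bound on any single wait

    def __post_init__(self) -> None:
        # Reject unbounded or nonsensical settings up front, so a policy
        # can never describe silent infinite retries.
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        if self.base_delay_s < 0 or self.max_delay_s < 0:
            raise ValueError("delays must be non-negative")
        if self.backoff_factor < 1:
            raise ValueError("backoff_factor must be at least 1")

    def delay_before(self, attempt: int) -> float:
        """Return the wait in seconds before the given 1-based attempt."""
        if attempt <= 1:
            return 0.0  # the first attempt is never delayed
        raw = self.base_delay_s * self.backoff_factor ** (attempt - 2)
        return min(raw, self.max_delay_s)


policy = RetryPolicy(max_attempts=4, base_delay_s=0.5, backoff_factor=2.0, max_delay_s=1.5)
delays = [policy.delay_before(n) for n in range(1, policy.max_attempts + 1)]
print(delays)  # [0.0, 0.5, 1.0, 1.5]
```

Making the policy an explicit, validated value keeps retries visible in configuration and gives execution reporting a single place to read the configured attempts and delays from.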

Scope

  • In scope: retry controls, transport-failure handling, visibility in execution reporting.
  • Out of scope: silent infinite retries or unrelated orchestration redesign.

Acceptance criteria

  1. Users can configure retry attempts and retry strategy for relevant transport-level failures.
  2. Execution reports show when retries occurred and what the final outcome was.
  3. Retries do not mask deterministic failures; those fail fast.
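The three criteria above could be exercised by a loop along the following lines. This is a minimal sketch under stated assumptions: the exception classes, `AttemptRecord`, `ExecutionReport`, and `run_with_retries` are all hypothetical names, and how the runtime actually classifies a failure as transient or deterministic is left open by this issue.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class TransientTransportError(Exception):
    """Hypothetical: a failure worth retrying (timeout, connection reset)."""


class DeterministicError(Exception):
    """Hypothetical: a failure that a retry will not change."""


@dataclass
class AttemptRecord:
    number: int
    outcome: str  # "success", "transient_failure", or "deterministic_failure"
    error: Optional[str] = None


@dataclass
class ExecutionReport:
    attempts: List[AttemptRecord] = field(default_factory=list)
    final_outcome: str = "pending"

    @property
    def retries(self) -> int:
        # Every attempt after the first counts as a retry.
        return max(len(self.attempts) - 1, 0)


def run_with_retries(
    call: Callable[[], None],
    max_attempts: int,
    delay_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> ExecutionReport:
    """Run `call`, retrying only transient failures, and record every attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    report = ExecutionReport()
    for number in range(1, max_attempts + 1):
        try:
            call()
        except DeterministicError as exc:
            # Fail fast: retrying would only hide a stable error.
            report.attempts.append(AttemptRecord(number, "deterministic_failure", str(exc)))
            report.final_outcome = "failed"
            return report
        except TransientTransportError as exc:
            report.attempts.append(AttemptRecord(number, "transient_failure", str(exc)))
            if number < max_attempts:
                sleep(delay_s)
        else:
            report.attempts.append(AttemptRecord(number, "success"))
            report.final_outcome = "succeeded"
            return report
    report.final_outcome = "failed_after_retries"
    return report


calls = {"count": 0}


def flaky() -> None:
    # Fails twice with a transient error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientTransportError("connection reset")


def bad_request() -> None:
    raise DeterministicError("invalid request")


report = run_with_retries(flaky, max_attempts=5)
print(report.final_outcome, report.retries)  # succeeded 2

fast_fail = run_with_retries(bad_request, max_attempts=5)
print(fast_fail.final_outcome, fast_fail.retries)  # failed 0
```

Recording one entry per attempt, rather than only the final result, is what lets a report show both that retries occurred and why the run ended as it did.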

Notes

  • Source: local roadmap at .doc/positioning-and-roadmap.md.
  • This issue was created while migrating local backlog ownership to GitHub.

Metadata

Assignees: No one assigned
Labels: enhancement, priority:P1, roadmap
