feat(agentserver): light up durable-task primitive (core 2.0.0b6 + invocations 1.0.0b5) by RaviPidaparthi · Pull Request #46997 · Azure/azure-sdk-for-python

RaviPidaparthi · 2026-05-19T19:12:36Z

Summary

Lights up the durable-task primitive in azure-ai-agentserver-core
2.0.0b6 (and the matching invocations-protocol sample suite in
azure-ai-agentserver-invocations 1.0.0b5) as a new feature.

The durable-task primitive is a small decorator-driven API that lets a
hosted agent run long operations as named tasks that survive
process crashes, OOM kills, and container redeployments. Tasks pick up
exactly where they were after recovery, without the developer writing
any explicit checkpoint or replay code.

Full developer guide:
sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md.

Scope of THIS PR

- azure-ai-agentserver-core: full durable-task primitive shipping
  for the first time (no prior release of the primitive).
- azure-ai-agentserver-invocations: matching durable sample
  suite (durable_copilot, durable_multiturn, durable_langgraph,
  durable_research) demonstrating the primitive end-to-end on the
  invocations transport. Plus per-sample READMEs, a SHIPPABLE.md
  manifest, a cross-sample DURABLE_SAMPLES.md operational guide, and
  a CI gate (test_samples_shippable_bar.py) that enforces the
  per-sample shippable bar on every PR.

Out of scope of this PR (split into separate PRs)

- azure-ai-agentserver-responses durable orchestration
  → see PR for branch feature/agentserver-responses-spec016
- durable-agent-demo azd-deployable hosted-agent sample
  → see PR for branch feature/agentserver-durable-agent-demo
  (temporary; never-merged demo sample)

What the primitive ships

Tour:

from azure.ai.agentserver.core.durable import task

@task(name="long_research")
async def do_research(ctx, prompt: str) -> dict:
    if ctx.entry_mode == "recovered":
        # Pick up from where you were, using ctx.metadata
        ...
    await ctx.stream({"phase": "searching"})
    ...
    return {"summary": "..."}

# In your handler:
run = await do_research.run(task_id="task-123", input={"prompt": "..."})
async for chunk in run:
    ...
result = await run.result()

Concepts shipping

- @task(...) decorator + the returned Task object with .run(),
  .start(), .options(...), .get_active_run(task_id).
- TaskContext: entry_mode, input, metadata (with auto-flush
  at lifecycle boundaries), cancel (asyncio.Event), cause
  booleans timeout_exceeded / cancel_requested, steering signals
  pending_input_count / is_steered_turn, shutdown,
  retry_attempt, recovery_count. Provides ctx.suspend(output=...),
  ctx.stream(chunk), ctx.exit_for_recovery().
- TaskResult.status: Literal["completed", "suspended"].
  Failure paths surface as exceptions (TaskFailed, TaskCancelled,
  TaskConflictError).
- TaskConflictError: single error type for any "task is busy / not
  available" state (live elsewhere, recovered elsewhere, evicted under
  split-brain protection, terminal with queued steerer). Carries
  current_status so callers can branch.
- RetryPolicy: exponential / fixed / linear backoff presets,
  durable across crash and recovery.
- EntryMode Literal: "fresh" | "resumed" | "recovered".
- Suspended (sentinel for .run() of a suspended task),
  TaskStatus Literal, TaskMetadata, StreamHandler,
  StreamHandlerFactory, QueueStreamHandler.
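
To make the three RetryPolicy backoff presets concrete, here is a minimal sketch of how exponential, fixed, and linear delays could be computed. The helper `backoff_delay` and its `kind` argument are hypothetical and are not the RetryPolicy API; the formulas are assumptions, and only the preset names and the `initial_delay` / `max_delay` / `backoff_coefficient` parameter names come from this PR.

```python
from datetime import timedelta


def backoff_delay(
    attempt: int,
    initial_delay: timedelta,
    max_delay: timedelta,
    kind: str = "exponential",
    backoff_coefficient: float = 2.0,
) -> timedelta:
    """Delay before retry number `attempt` (1-based). Hypothetical helper."""
    if attempt < 1:
        raise ValueError(f"attempt must be >= 1, got {attempt}")
    if kind == "fixed":
        # Same delay before every retry.
        delay = initial_delay
    elif kind == "linear":
        # Delay grows by one initial_delay per attempt.
        delay = initial_delay * attempt
    elif kind == "exponential":
        # Delay is multiplied by the coefficient on each attempt.
        delay = initial_delay * (backoff_coefficient ** (attempt - 1))
    else:
        raise ValueError(f"unknown backoff kind: {kind}")
    # Never wait longer than the configured ceiling.
    return min(delay, max_delay)
```

Because the delay depends only on the attempt number and the policy parameters, a recovered process that reads a persisted attempt counter (such as ctx.retry_attempt) could continue the same sequence rather than restart it.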

Behavior shipping

- Automatic recovery: crashed-mid-task records are detected at
  three layers (startup scan, periodic background scan, inline reclaim
  at scheduling primitives). The developer sees
  ctx.entry_mode == "recovered" and otherwise the same TaskContext
  surface as on a fresh entry.
- Split-brain protection: a new agent process that takes over a
  session cancels stranded executions in the previous process cleanly
  via HTTP 409 binding_mismatch. The previous process cancels its
  execution, suppresses its terminal write, and signals its awaiters
  with TaskConflictError.
- Steering as plain multi-turn: Task.start(...) on an already-active
  steerable task queues the new input. The first turn's
  ctx.suspend(...) call resolves the steerer's .result() with the
  next turn's outcome.
- Per-turn wall-clock durable timeout: @task(timeout=...) is
  anchored to a persisted per-turn-start timestamp. A crash mid-turn
  does NOT reset the budget; the recovered watchdog computes
  remaining budget from the persisted timestamp.
- Metadata auto-flush at lifecycle boundaries: ctx.metadata is
  flushed automatically at every terminal-of-turn boundary.
- Bookkeeping is durable: suspended-resume input patches are
  ETag-protected; steerable input data is cleared at the suspend
  transition (data minimization); the lease owner string incorporates
  both FOUNDRY_AGENT_NAME and session ID so two different agents
  sharing a session ID cannot collide on lease ownership.
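
The per-turn timeout behavior can be illustrated with a small sketch of the arithmetic a recovered watchdog might perform. The function `remaining_budget` and its parameter names are hypothetical; the only thing taken from this PR is the idea that the budget is measured from a persisted turn-start timestamp rather than from process start.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def remaining_budget(
    turn_started_at: datetime,
    timeout: timedelta,
    now: Optional[datetime] = None,
) -> timedelta:
    """Time left in the current turn, measured from the persisted start.

    `turn_started_at` stands in for the timestamp stored with the task
    record, so a crash and recovery does not change the result.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    elapsed = now - turn_started_at
    # Clamp at zero: an exhausted budget means the turn has timed out.
    return max(timeout - elapsed, timedelta(0))
```

For example, a turn with a 10-minute timeout that crashed 4 minutes in would resume with 6 minutes left, not a fresh 10.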

Transport

HostedTaskProvider is built on azure.core.AsyncPipelineClient
with the standard policy chain (request-id, headers, user-agent,
retry, AsyncBearerTokenCredentialPolicy, task-API logging,
distributed tracing). The retry policy retries on 5xx / 408 / 429 only,
never on 409 regardless of body. ContentDecodePolicy is intentionally
excluded; body parsing happens at the call site with defensive
error handling.
httpx is no longer a production dependency.
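
The retry rule described above can be expressed as a simple status-code predicate. This is a sketch of the rule only, assuming nothing about how HostedTaskProvider configures its pipeline; `should_retry` and `RETRYABLE_4XX` are hypothetical names.

```python
# Hypothetical constant: the only 4xx codes treated as transient.
RETRYABLE_4XX = frozenset({408, 429})


def should_retry(status_code: int) -> bool:
    """Retry on 408, 429, and any 5xx; never on 409 or other 4xx codes."""
    return status_code in RETRYABLE_4XX or 500 <= status_code <= 599
```

Excluding 409 matters here because a 409 binding_mismatch is the split-brain signal: retrying it would delay the previous process from standing down.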

Validation

Check	Status
Core pylint	10.00/10 (0 new findings vs origin/main)
Core mypy	0 new errors
Core pyright	Pass
Core sphinx	Pass
Invocations pylint	0 new findings vs origin/main
Core tests	439 passed, 6 skipped
Core durable suite	345 passed, 1 skipped
Invocations tests	244 passed, 2 skipped

…urable-tasks

# Conflicts:
# sdk/agentserver/.gitignore
# sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md
# sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_base.py
# sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_tracing.py
# sdk/agentserver/azure-ai-agentserver-core/samples/selfhosted_invocation/selfhosted_invocation.py
# sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing_e2e.py
# sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md
# sdk/agentserver/azure-ai-agentserver-invocations/azure/ai/agentserver/invocations/_invocation.py
# sdk/agentserver/azure-ai-agentserver-invocations/tests/conftest.py
# sdk/agentserver/azure-ai-agentserver-invocations/tests/test_span_parenting.py
# sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing.py

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Implements spec-009’s “pluggable stream handler” work for the durable task framework by introducing a StreamHandler protocol with a default QueueStreamHandler, plus related durable-task capabilities (retry, resume route, metadata, samples/tests) and extensive formatting/tidying across tests and samples.

Changes:

Added a pluggable streaming abstraction (StreamHandler, QueueStreamHandler, factory type) and wired it into TaskContext.stream() and TaskRun async iteration.
Introduced/expanded durable-task building blocks: TaskResult, RetryPolicy, resume HTTP route, hosted provider client, lease renewal helper, and substantial new test coverage + samples.
Updated docs/changelogs and reformatted various tests/samples for style consistency.

Reviewed changes

Copilot reviewed 88 out of 92 changed files in this pull request and generated 6 comments.

File	Description
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing_e2e.py	Formatting-only adjustments (line wrapping/blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_session_id.py	Formatting-only adjustments (blank lines, wrapped AsyncClient context).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_server_routes.py	Formatting-only adjustments (blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_limits.py	Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_id.py	Formatting-only adjustments.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_multimodal_protocol.py	Minor whitespace cleanup and section spacing.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_invoke.py	Formatting-only adjustments (blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_graceful_shutdown.py	Formatting + wrapped long asserts for readability.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_get_cancel.py	Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_edge_cases.py	Formatting-only adjustments (blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_decorator_pattern.py	Formatting (wrapped JSONResponse returns).
sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/streaming_invoke_agent.py	Reformatted token list for readability.
sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/simple_invoke_agent.py	Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/multiturn_invoke_agent.py	Formatting; JSONResponse construction wrapped.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py	New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/requirements.txt	New sample requirements.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/app.py	New durable multiturn sample host wiring.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/agent.py	New durable multiturn sample agent task.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/store.py	New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/requirements.txt	New sample requirements (LangGraph + deps).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/app.py	New streaming + steering durable LangGraph host sample.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/store.py	New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/requirements.txt	New sample requirements (Copilot SDK, core, Starlette, uvicorn).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/app.py	New durable Copilot host sample with SSE.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/agent.py	New steerable durable Copilot agent sample.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/store.py	New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/requirements.txt	New sample requirements (Anthropic SDK + runtime deps).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/app.py	New durable Claude host sample with SSE.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/agent.py	New steerable durable Claude agent sample.
sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/async_invoke_agent.py	Formatting-only adjustments (wrapped JSON dict literals).
sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md	Changelog updates to mention durable samples + dependency bump.
sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing.py	Formatting-only adjustments.
sdk/agentserver/azure-ai-agentserver-core/tests/test_startup_logging.py	Formatting-only adjustments and wrapped long lines.
sdk/agentserver/azure-ai-agentserver-core/tests/test_server_routes.py	Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-core/tests/test_logger.py	Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-core/tests/test_graceful_shutdown.py	Formatting-only adjustments and wrapped long asserts.
sdk/agentserver/azure-ai-agentserver-core/tests/test_config.py	Formatting for long function signatures.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_task_result.py	New tests for `TaskResult` wrapper behavior + guardrails.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_streaming.py	New tests for pluggable stream handler integration.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_source.py	New tests exercising `source` field persistence.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_retry.py	New tests for `RetryPolicy` and retry integration.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_resume_route.py	New tests for the resume HTTP route behavior.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_models.py	New tests for durable models/exceptions.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_metadata.py	New tests for dict-like `TaskMetadata` + flush semantics.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_local_provider.py	New tests for local durable provider CRUD/listing.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_lifecycle.py	New lifecycle automation tests.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_get.py	New tests for `DurableTask.get()`.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_entry_mode.py	New tests for `ctx.entry_mode` across paths.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_decorator.py	New tests for `@durable_task` decorator/options/type extraction.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_cancellation_timeout.py	New tests for cancellation, timeout, and termination.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_callable_factories.py	New tests for callable factories on tags/description.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/__init__.py	New package init for durable tests.
sdk/agentserver/azure-ai-agentserver-core/tests/conftest.py	Formatting-only adjustments.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/requirements.txt	New durable sample requirements.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/durable_streaming.py	New sample demonstrating streaming with durable tasks.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/requirements.txt	New durable sample requirements.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/durable_source.py	New sample demonstrating `source` usage.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/requirements.txt	New durable sample requirements.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/durable_retry.py	New sample demonstrating retry policies.
sdk/agentserver/azure-ai-agentserver-core/pyproject.toml	Added `httpx` dependency + optional `hosted` extras (azure-identity).
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_stream.py	New StreamHandler protocol + default QueueStreamHandler + factory alias.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_run.py	New TaskRun async-iter streaming integration and lifecycle control methods.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_retry.py	New RetryPolicy implementation and presets.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_resume_route.py	New Starlette route for POST /tasks/resume.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_result.py	New TaskResult wrapper class.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_provider.py	New storage provider protocol for durable subsystem.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_metadata.py	New dict-like TaskMetadata with flush/auto-flush.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_lease.py	New lease identity utilities + renewal loop.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_exceptions.py	New durable exception types (failed/suspended/cancelled/etc.).
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_context.py	New TaskContext with stream support and lifecycle fields.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_client.py	New hosted durable task provider httpx client.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/__init__.py	New public durable API exports.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_middleware.py	Formatting-only adjustments for imports/log calls.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_errors.py	Minor formatting simplification.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_config.py	Minor formatting simplification.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/__init__.py	Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-core/README.md	Added durable-task documentation section + link.
sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md	Large changelog entry documenting durable subsystem and other changes.
sdk/agentserver/.gitignore	Added `.vscode/` ignore.

Comments suppressed due to low confidence (1)

sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py:1

For JSON persistence, it’s better to write/read with an explicit encoding (UTF-8) for cross-platform consistency. Consider using open(fd, "w", encoding="utf-8") (or os.fdopen) and also using read_text(encoding="utf-8") in load() to avoid platform-default encoding surprises.
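
A minimal sketch of what this suggestion could look like in a file-backed JSON store. The `save` and `load` functions below are illustrative and are not the sample's actual store.py; the temp-file-then-rename write is an assumption inferred from the comment's reference to a file descriptor.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def save(path: Path, data: Dict[str, Any]) -> None:
    """Write JSON as UTF-8 via a temp file, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        # Explicit encoding avoids the platform default (e.g. cp1252).
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False)
        os.replace(tmp_name, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load(path: Path) -> Dict[str, Any]:
    """Read the store back as UTF-8; a missing file is an empty store."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

With `ensure_ascii=False`, non-ASCII text is written as-is, which is exactly the case where a platform-default encoding would otherwise produce different bytes on different hosts.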

+        if initial_delay.total_seconds() < 0:
+            raise ValueError(f"initial_delay must be >= 0, got {initial_delay}")
+        if max_attempts < 1 and not (
+            max_attempts == 1 and initial_delay == timedelta(0)
+        ):
+            pass  # allow no_retry preset
+        if backoff_coefficient < 1.0:
+            raise ValueError(
+                f"backoff_coefficient must be >= 1.0, got {backoff_coefficient}"
+            )
+        if max_delay < initial_delay:
+            raise ValueError(
+                f"max_delay ({max_delay}) must be >= initial_delay ({initial_delay})"
+            )
+        if max_attempts < 1:
+            raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")
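
In the checks above, the first `max_attempts` branch has `pass` as its body, so it has no effect whichever way its condition evaluates; the final `max_attempts < 1` check is what rejects invalid values. A minimal sketch of the same checks, in the same order, with that branch removed. `validate_retry_args` is a hypothetical standalone function used here for illustration; the quoted code presumably lives inside the RetryPolicy constructor.

```python
from datetime import timedelta


def validate_retry_args(
    max_attempts: int,
    initial_delay: timedelta,
    max_delay: timedelta,
    backoff_coefficient: float,
) -> None:
    """Raise ValueError for the same inputs the quoted checks reject."""
    if initial_delay.total_seconds() < 0:
        raise ValueError(f"initial_delay must be >= 0, got {initial_delay}")
    if backoff_coefficient < 1.0:
        raise ValueError(
            f"backoff_coefficient must be >= 1.0, got {backoff_coefficient}"
        )
    if max_delay < initial_delay:
        raise ValueError(
            f"max_delay ({max_delay}) must be >= initial_delay ({initial_delay})"
        )
    if max_attempts < 1:
        raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")
```

A single attempt with zero delay (the shape the `no_retry` comment hints at) passes these checks without needing a special case.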


+        except Exception as exc:
+            if "not found" in str(exc).lower():


+        if task_info.payload and "metadata" in task_info.payload:
+            meta_data: dict[str, Any] = task_info.payload["metadata"]
+            for key, value in meta_data.items():
+                self._metadata.set(key, value)


+            and self._flush_callback is not None
+            and self._flush_task is None
+        ):
+            self._flush_task = asyncio.get_event_loop().create_task(


+    except Exception as exc:  # pylint: disable=broad-exception-caught
+        msg = str(exc).lower()
+        if "not found" in msg:
+            return Response(status_code=404)
+        if "not 'suspended'" in msg or "already" in msg or "conflict" in msg:
+            return Response(status_code=409)
+        logger.error("Resume failed for task %s: %s", task_id, exc, exc_info=True)
+        return Response(status_code=500)
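
The handler above picks the HTTP status by searching the exception message for substrings, so a change in wording, or an unrelated error whose message happens to contain "already", would change the response code. One alternative is to branch on exception types instead. The sketch below assumes two hypothetical exception classes, `TaskNotFound` and `TaskNotSuspended`, which are stand-ins and not types from this PR, and returns a bare status code rather than a Starlette Response to stay self-contained.

```python
class TaskNotFound(Exception):
    """Hypothetical: no task record exists for the given ID."""


class TaskNotSuspended(Exception):
    """Hypothetical: the task exists but is not in the 'suspended' state."""


def status_for(exc: BaseException) -> int:
    """Map an exception raised during resume to an HTTP status code."""
    if isinstance(exc, TaskNotFound):
        return 404
    if isinstance(exc, TaskNotSuspended):
        return 409
    # Anything unrecognized is a server error, whatever its message says.
    return 500
```

With this shape, an unrelated failure such as `RuntimeError("connection already closed")` maps to 500 instead of being reported as a conflict.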


+
+### Breaking Changes
+
+- **`source` parameter removed** — The `source` keyword argument has been removed from `@durable_task()`, `.run()`, `.start()`, and `.options()`. Source provenance is now auto-stamped by the framework and cannot be overridden by developers. Use `tags` for custom metadata.


- Pin aiohttp>=3.9.0,<4.0.0 to prevent pre-release 4.0.0a1 from being pulled by --pre flag (fails to compile on Python 3.13)
- Disable mindependency for invocations/responses since azure-ai-agentserver-core>=2.0.0b4 is not yet on PyPI
- Disable apistub for core (tool bug with Generic[Input,Output] on 3.10)
- Change task API route from /storage/tasks to /internal/tasks
- Add durable task overview documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

….0a0

- AgentServerHost lifespan now automatically creates and initializes a DurableTaskManager during startup, and shuts it down on exit. This fixes 'DurableTaskManager not initialized' errors when using @durable_task without manual manager setup.
- Pin aiohttp<4.0.0a0 to exclude pre-release 4.0.0a1 which fails to build (missing longintrepr.h) when CI uses --pre flag for nightly builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Changed HostedDurableTaskProvider base URL from /storage/tasks to /tasks
- Task API integration remains disabled (FOUNDRY_TASK_API_ENABLED=0)
- Includes all durable demo improvements: 12-stage research pipeline, crash recovery, GET reconnect with file fallback, cancel support, supervisor proxy, and updated README with demo script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ointer

Replace hand-crafted @durable_task checkpoint logic with LangGraph StateGraph and AsyncSqliteSaver. This eliminates FileStreamHandler, manual metadata management, and JSONL-based replay.

Key changes:
- agent.py: LangGraph StateGraph with looping research_stage node
- app.py: Simplified HTTP handlers (no durable task framework imports)
- GET handler: replays from checkpoint state instead of JSONL files
- Cancel: asyncio.Event checked at node entry
- requirements.txt: added langgraph, langgraph-checkpoint-sqlite, aiosqlite
- README: updated architecture docs
- .env: committed for deployment config

All 5 test scenarios pass:
- Full 12-stage execution with checkpointing
- Already-complete detection on re-invocation
- Cancel mid-execution (stops at next node boundary)
- Resume after cancel (clears stale cancel flag)
- Unknown thread returns None

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…e checkpointer" This reverts commit 4cf120a.