Skip to content

feat(agentserver): light up durable-task primitive (core 2.0.0b6 + invocations 1.0.0b5)#46997

Open
RaviPidaparthi wants to merge 267 commits into
mainfrom
feature/agentserver-durable-tasks
Open

feat(agentserver): light up durable-task primitive (core 2.0.0b6 + invocations 1.0.0b5)#46997
RaviPidaparthi wants to merge 267 commits into
mainfrom
feature/agentserver-durable-tasks

Conversation

@RaviPidaparthi

@RaviPidaparthi RaviPidaparthi commented May 19, 2026

Copy link
Copy Markdown
Member

Summary

Lights up the durable-task primitive in azure-ai-agentserver-core
2.0.0b6 (and the matching invocations-protocol sample suite in
azure-ai-agentserver-invocations 1.0.0b5) as a new feature.

The durable-task primitive is a small decorator-driven API that lets a
hosted agent run long operations as named tasks that survive
process crashes, OOM kills, and container redeployments. Tasks pick up
exactly where they were after recovery, without the developer writing
any explicit checkpoint or replay code.

Full developer guide:
sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md.

Scope of THIS PR

  • azure-ai-agentserver-core — full durable-task primitive shipping
    for the first time (no prior release of the primitive).
  • azure-ai-agentserver-invocations — matching durable sample
    suite (durable_copilot, durable_multiturn, durable_langgraph,
    durable_research) demonstrating the primitive end-to-end on the
    invocations transport. Plus per-sample READMEs, a SHIPPABLE.md
    manifest, a cross-sample DURABLE_SAMPLES.md operational guide, and
    a CI gate (test_samples_shippable_bar.py) that enforces the
    per-sample shippable bar on every PR.

Out of scope of this PR (split into separate PRs)

  • azure-ai-agentserver-responses durable orchestration
    → see PR for branch feature/agentserver-responses-spec016
  • durable-agent-demo azd-deployable hosted-agent sample
    → see PR for branch feature/agentserver-durable-agent-demo
    (temporary; never-merged demo sample)

What the primitive ships

Tour:

from azure.ai.agentserver.core.durable import task

@task(name="long_research")
async def do_research(ctx, prompt: str) -> dict:
    if ctx.entry_mode == "recovered":
        # Pick up from where you were, using ctx.metadata
        ...
    await ctx.stream({"phase": "searching"})
    ...
    return {"summary": "..."}

# In your handler:
run = await do_research.run(task_id="task-123", input={"prompt": "..."})
async for chunk in run:
    ...
result = await run.result()

Concepts shipping

  • @task(...) decorator + Task returned object with .run(),
    .start(), .options(...), .get_active_run(task_id).
  • TaskContextentry_mode, input, metadata (with auto-flush
    at lifecycle boundaries), cancel (asyncio.Event), cause
    booleans timeout_exceeded / cancel_requested, steering signals
    pending_input_count / is_steered_turn, shutdown,
    retry_attempt, recovery_count. Provides ctx.suspend(output=...),
    ctx.stream(chunk), ctx.exit_for_recovery().
  • TaskResult.status: Literal["completed", "suspended"].
    Failure paths surface as exceptions (TaskFailed, TaskCancelled,
    TaskConflictError).
  • TaskConflictError — single error type for any "task is busy / not
    available" state (live elsewhere, recovered elsewhere, evicted under
    split-brain protection, terminal with queued steerer). Carries
    current_status so callers can branch.
  • RetryPolicy — exponential / fixed / linear backoff presets,
    durable across crash and recovery.
  • EntryMode Literal: "fresh" | "resumed" | "recovered".
  • Suspended (sentinel for .run() of a suspended task),
    TaskStatus Literal, TaskMetadata, StreamHandler,
    StreamHandlerFactory, QueueStreamHandler.

Behavior shipping

  • Automatic recovery — crashed-mid-task records are detected at
    three layers (startup scan, periodic background scan, inline reclaim
    at scheduling primitives). The developer sees
    ctx.entry_mode == "recovered" and otherwise the same TaskContext
    surface as on a fresh entry.
  • Split-brain protection — a new agent process that takes over a
    session cancels stranded executions in the previous process cleanly
    via HTTP 409 binding_mismatch. The previous process cancels its
    execution, suppresses its terminal write, and signals its awaiters
    with TaskConflictError.
  • Steering as plain multi-turnTask.start(...) on an already-
    active steerable task queues the new input. The first turn's
    ctx.suspend(...) call resolves the steerer's .result() with the
    next turn's outcome.
  • Per-turn wall-clock durable timeout@task(timeout=...) is
    anchored to a persisted per-turn-start timestamp. A crash mid-turn
    does NOT reset the budget; the recovered watchdog computes
    remaining budget from the persisted timestamp.
  • Metadata auto-flush at lifecycle boundariesctx.metadata is
    flushed automatically at every terminal-of-turn boundary.
  • Bookkeeping is durable — suspended-resume input patches are
    ETag-protected; steerable input data is cleared at the suspend
    transition (data minimization); the lease owner string incorporates
    both FOUNDRY_AGENT_NAME and session ID so two different agents
    sharing a session ID cannot collide on lease ownership.

Transport

  • HostedTaskProvider is built on azure.core.AsyncPipelineClient
    with the standard policy chain (request-id, headers, user-agent,
    retry, AsyncBearerTokenCredentialPolicy, task-API logging,
    distributed tracing). Retry policy retries on 5xx / 408 / 429 only —
    never on 409 regardless of body. ContentDecodePolicy intentionally
    excluded; body parsing happens at the call site with defensive
    error handling.
  • httpx is no longer a production dependency.

Validation

Check Status
Core pylint 10.00/10 (0 new findings vs origin/main)
Core mypy 0 new errors
Core pyright Pass
Core sphinx Pass
Invocations pylint 0 new findings vs origin/main
Core tests 439 passed, 6 skipped
Core durable suite 345 passed, 1 skipped
Invocations tests 244 passed, 2 skipped

…urable-tasks

# Conflicts:
#	sdk/agentserver/.gitignore
#	sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md
#	sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_base.py
#	sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_tracing.py
#	sdk/agentserver/azure-ai-agentserver-core/samples/selfhosted_invocation/selfhosted_invocation.py
#	sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing_e2e.py
#	sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md
#	sdk/agentserver/azure-ai-agentserver-invocations/azure/ai/agentserver/invocations/_invocation.py
#	sdk/agentserver/azure-ai-agentserver-invocations/tests/conftest.py
#	sdk/agentserver/azure-ai-agentserver-invocations/tests/test_span_parenting.py
#	sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing.py
@github-actions github-actions Bot added the Hosted Agents sdk/agentserver/* label May 19, 2026
@RaviPidaparthi RaviPidaparthi marked this pull request as ready for review May 19, 2026 19:21
@RaviPidaparthi RaviPidaparthi requested a review from ankitbko as a code owner May 19, 2026 19:21
Copilot AI review requested due to automatic review settings May 19, 2026 19:21
@RaviPidaparthi RaviPidaparthi requested a review from vangarp as a code owner May 19, 2026 19:21
@RaviPidaparthi RaviPidaparthi changed the title feat(agentserver): pluggable StreamHandler protocol and stream_handler_factory feat(agentserver): Durable Task Framework for azure-ai-agentserver-core May 19, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Implements spec-009’s “pluggable stream handler” work for the durable task framework by introducing a StreamHandler protocol with a default QueueStreamHandler, plus related durable-task capabilities (retry, resume route, metadata, samples/tests) and extensive formatting/tidying across tests and samples.

Changes:

  • Added a pluggable streaming abstraction (StreamHandler, QueueStreamHandler, factory type) and wired it into TaskContext.stream() and TaskRun async iteration.
  • Introduced/expanded durable-task building blocks: TaskResult, RetryPolicy, resume HTTP route, hosted provider client, lease renewal helper, and substantial new test coverage + samples.
  • Updated docs/changelogs and reformatted various tests/samples for style consistency.

Reviewed changes

Copilot reviewed 88 out of 92 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing_e2e.py Formatting-only adjustments (line wrapping/blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_session_id.py Formatting-only adjustments (blank lines, wrapped AsyncClient context).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_server_routes.py Formatting-only adjustments (blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_limits.py Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_id.py Formatting-only adjustments.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_multimodal_protocol.py Minor whitespace cleanup and section spacing.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_invoke.py Formatting-only adjustments (blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_graceful_shutdown.py Formatting + wrapped long asserts for readability.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_get_cancel.py Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_edge_cases.py Formatting-only adjustments (blank lines).
sdk/agentserver/azure-ai-agentserver-invocations/tests/test_decorator_pattern.py Formatting (wrapped JSONResponse returns).
sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/streaming_invoke_agent.py Reformatted token list for readability.
sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/simple_invoke_agent.py Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/multiturn_invoke_agent.py Formatting; JSONResponse construction wrapped.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/requirements.txt New sample requirements.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/app.py New durable multiturn sample host wiring.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/agent.py New durable multiturn sample agent task.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/store.py New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/requirements.txt New sample requirements (LangGraph + deps).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/app.py New streaming + steering durable LangGraph host sample.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/store.py New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/requirements.txt New sample requirements (Copilot SDK, core, Starlette, uvicorn).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/app.py New durable Copilot host sample with SSE.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/agent.py New steerable durable Copilot agent sample.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/store.py New sample persistence helper (file-backed JSON store).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/requirements.txt New sample requirements (Anthropic SDK + runtime deps).
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/app.py New durable Claude host sample with SSE.
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/agent.py New steerable durable Claude agent sample.
sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/async_invoke_agent.py Formatting-only adjustments (wrapped JSON dict literals).
sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md Changelog updates to mention durable samples + dependency bump.
sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing.py Formatting-only adjustments.
sdk/agentserver/azure-ai-agentserver-core/tests/test_startup_logging.py Formatting-only adjustments and wrapped long lines.
sdk/agentserver/azure-ai-agentserver-core/tests/test_server_routes.py Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-core/tests/test_logger.py Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-core/tests/test_graceful_shutdown.py Formatting-only adjustments and wrapped long asserts.
sdk/agentserver/azure-ai-agentserver-core/tests/test_config.py Formatting for long function signatures.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_task_result.py New tests for TaskResult wrapper behavior + guardrails.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_streaming.py New tests for pluggable stream handler integration.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_source.py New tests exercising source field persistence.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_retry.py New tests for RetryPolicy and retry integration.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_resume_route.py New tests for the resume HTTP route behavior.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_models.py New tests for durable models/exceptions.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_metadata.py New tests for dict-like TaskMetadata + flush semantics.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_local_provider.py New tests for local durable provider CRUD/listing.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_lifecycle.py New lifecycle automation tests.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_get.py New tests for DurableTask.get().
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_entry_mode.py New tests for ctx.entry_mode across paths.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_decorator.py New tests for @durable_task decorator/options/type extraction.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_cancellation_timeout.py New tests for cancellation, timeout, and termination.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_callable_factories.py New tests for callable factories on tags/description.
sdk/agentserver/azure-ai-agentserver-core/tests/durable/init.py New package init for durable tests.
sdk/agentserver/azure-ai-agentserver-core/tests/conftest.py Formatting-only adjustments.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/requirements.txt New durable sample requirements.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/durable_streaming.py New sample demonstrating streaming with durable tasks.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/requirements.txt New durable sample requirements.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/durable_source.py New sample demonstrating source usage.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/requirements.txt New durable sample requirements.
sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/durable_retry.py New sample demonstrating retry policies.
sdk/agentserver/azure-ai-agentserver-core/pyproject.toml Added httpx dependency + optional hosted extras (azure-identity).
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_stream.py New StreamHandler protocol + default QueueStreamHandler + factory alias.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_run.py New TaskRun async-iter streaming integration and lifecycle control methods.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_retry.py New RetryPolicy implementation and presets.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_resume_route.py New Starlette route for POST /tasks/resume.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_result.py New TaskResult wrapper class.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_provider.py New storage provider protocol for durable subsystem.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_metadata.py New dict-like TaskMetadata with flush/auto-flush.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_lease.py New lease identity utilities + renewal loop.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_exceptions.py New durable exception types (failed/suspended/cancelled/etc.).
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_context.py New TaskContext with stream support and lifecycle fields.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_client.py New hosted durable task provider httpx client.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/init.py New public durable API exports.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_middleware.py Formatting-only adjustments for imports/log calls.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_errors.py Minor formatting simplification.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_config.py Minor formatting simplification.
sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/init.py Minor whitespace cleanup.
sdk/agentserver/azure-ai-agentserver-core/README.md Added durable-task documentation section + link.
sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md Large changelog entry documenting durable subsystem and other changes.
sdk/agentserver/.gitignore Added .vscode/ ignore.
Comments suppressed due to low confidence (1)

sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py:1

  • For JSON persistence, it’s better to write/read with an explicit encoding (UTF-8) for cross-platform consistency. Consider using open(fd, \"w\", encoding=\"utf-8\") (or os.fdopen) and also using read_text(encoding=\"utf-8\") in load() to avoid platform-default encoding surprises.

Comment on lines +57 to +72
if initial_delay.total_seconds() < 0:
raise ValueError(f"initial_delay must be >= 0, got {initial_delay}")
if max_attempts < 1 and not (
max_attempts == 1 and initial_delay == timedelta(0)
):
pass # allow no_retry preset
if backoff_coefficient < 1.0:
raise ValueError(
f"backoff_coefficient must be >= 1.0, got {backoff_coefficient}"
)
if max_delay < initial_delay:
raise ValueError(
f"max_delay ({max_delay}) must be >= initial_delay ({initial_delay})"
)
if max_attempts < 1:
raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")
Comment on lines +191 to +192
except Exception as exc:
if "not found" in str(exc).lower():
Comment on lines +210 to +213
if task_info.payload and "metadata" in task_info.payload:
meta_data: dict[str, Any] = task_info.payload["metadata"]
for key, value in meta_data.items():
self._metadata.set(key, value)
and self._flush_callback is not None
and self._flush_task is None
):
self._flush_task = asyncio.get_event_loop().create_task(
Comment on lines +60 to +67
except Exception as exc: # pylint: disable=broad-exception-caught
msg = str(exc).lower()
if "not found" in msg:
return Response(status_code=404)
if "not 'suspended'" in msg or "already" in msg or "conflict" in msg:
return Response(status_code=409)
logger.error("Resume failed for task %s: %s", task_id, exc, exc_info=True)
return Response(status_code=500)

### Breaking Changes

- **`source` parameter removed** — The `source` keyword argument has been removed from `@durable_task()`, `.run()`, `.start()`, and `.options()`. Source provenance is now auto-stamped by the framework and cannot be overridden by developers. Use `tags` for custom metadata.
@RaviPidaparthi RaviPidaparthi changed the title feat(agentserver): Durable Task Framework for azure-ai-agentserver-core feat(agentserver): Durable Tasks for azure-ai-agentserver-core May 19, 2026
RaviPidaparthi and others added 20 commits May 19, 2026 21:48
- Pin aiohttp>=3.9.0,<4.0.0 to prevent pre-release 4.0.0a1 from being
  pulled by --pre flag (fails to compile on Python 3.13)
- Disable mindependency for invocations/responses since
  azure-ai-agentserver-core>=2.0.0b4 is not yet on PyPI
- Disable apistub for core (tool bug with Generic[Input,Output] on 3.10)
- Change task API route from /storage/tasks to /internal/tasks
- Add durable task overview documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
….0a0

- AgentServerHost lifespan now automatically creates and initializes
  a DurableTaskManager during startup, and shuts it down on exit.
  This fixes 'DurableTaskManager not initialized' errors when using
  @durable_task without manual manager setup.

- Pin aiohttp<4.0.0a0 to exclude pre-release 4.0.0a1 which fails to
  build (missing longintrepr.h) when CI uses --pre flag for nightly
  builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed HostedDurableTaskProvider base URL from /storage/tasks to /tasks
- Task API integration remains disabled (FOUNDRY_TASK_API_ENABLED=0)
- Includes all durable demo improvements: 12-stage research pipeline,
  crash recovery, GET reconnect with file fallback, cancel support,
  supervisor proxy, and updated README with demo script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ointer

Replace hand-crafted @durable_task checkpoint logic with LangGraph StateGraph
and AsyncSqliteSaver. This eliminates FileStreamHandler, manual metadata
management, and JSONL-based replay.

Key changes:
- agent.py: LangGraph StateGraph with looping research_stage node
- app.py: Simplified HTTP handlers (no durable task framework imports)
- GET handler: replays from checkpoint state instead of JSONL files
- Cancel: asyncio.Event checked at node entry
- requirements.txt: added langgraph, langgraph-checkpoint-sqlite, aiosqlite
- README: updated architecture docs
- .env: committed for deployment config

All 5 test scenarios pass:
- Full 12-stage execution with checkpointing
- Already-complete detection on re-invocation
- Cancel mid-execution (stops at next node boundary)
- Resume after cancel (clears stale cancel flag)
- Unknown thread returns None

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- POST handler returns 202 immediately (fire-and-forget) with
  invocation_id, session_id, task_id in response body
- GET handler streams SSE with sequential id: N on each event
- Supports last_event_id query param to skip already-seen events
  on reconnect (platform strips non x-client- headers)
- Crash handler returns 202 then exits asynchronously
- Session ID resolution simplified to use framework config
- Demo client (demo-client.sh): POST→GET flow, client-side skip,
  LAST_EVENT_ID tracking, logs command for 3rd terminal
- Verified live: crash mid-stage-5 → reconnect resumes from stage 5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… architecture

- Document POST 202 fire-and-forget flow
- Document GET streaming with last_event_id query param
- Add container logs section (azd ai agent monitor)
- Update manual curl demo steps
- Add 'How it works (client flow)' section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…veness

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When reusing a session for a new task, the stale LAST_EVENT_ID from
the previous run caused the client to skip the first stage header.
Now resets to 0 when POST returns status='started'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n=v1

- task_id = invocation_id (no stale stream file reuse across sessions)
- Reconnect does GET-only with last_event_id (no redundant POST)
- Switch from api-version=2025-11-15-preview to v1
- Add Foundry-Features: HostedAgents=V1Preview header
- Print exact method + endpoint on each call for debugging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- durable_task → task (decorator)
- DurableTask → Task (wrapper class)
- DurableTaskOptions → TaskOptions (config)
- DurableTaskManager → TaskManager
- DurableTaskProvider → TaskProvider
- HostedDurableTaskProvider → HostedTaskProvider
- LocalFileDurableTaskProvider → LocalFileTaskProvider
- _durable_task_ tag prefix → _task_
- agentserver.durable_task source type → agentserver.task
- LIST API: server-side source_type filter (replaces client-side)
- LIST API: cursor-based pagination (limit=100, has_more/after)
- Dockerfile: FOUNDRY_TASK_API_ENABLED=1

All 189 tests pass. E2E validated with 2 crash recoveries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RaviPidaparthi and others added 14 commits June 15, 2026 20:56
…pagation fix

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…agation on cancel + get

Adds 3 tests pinning the contract that the cancel and get invocation
endpoints expose agent_session_id as request.state.session_id (mirror
of the invoke endpoint, fixed in 57ca8f7):

1. test_cancel_propagates_agent_session_id_to_request_state
   - POST /invocations/{id}/cancel?agent_session_id=X
   - Asserts request.state.session_id == "X" inside a custom cancel
     handler.

2. test_get_propagates_agent_session_id_to_request_state
   - GET /invocations/{id}?agent_session_id=X
   - Same assertion for the get endpoint's handler.

3. test_cancel_without_agent_session_id_leaves_request_state_session_id_empty
   - Pins the explicit-empty-string contract when the query param is
     absent (handlers can branch on falsy without an AttributeError).

Verified by reverting the framework fix locally — all 3 tests fail
with AttributeError: 'State' object has no attribute 'session_id'.
With the fix re-applied, all 10 cancel/get tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…var per invocation protocol spec

Per `invocation-protocol-spec.md` §1.2 (GET) and §1.3 (cancel),
neither endpoint has a platform-defined `agent_session_id` query
parameter. The session is implicit: sourced from the
`FOUNDRY_AGENT_SESSION_ID` environment variable the platform sets on
the container (surfaced via `self.config.session_id`).

The previous one-line fix (commit 57ca8f7) read the query param
and stamped it on `request.state.session_id`. That worked for local
dev where the demo's curl loop passes `agent_session_id` as a query
param, but in the hosted contract the platform never sends a query
param, so `request.state.session_id` resolved to the empty string and
the demo's cancel handler had to depend on its own
`app.config.session_id` fallback — defeating the purpose of having
the framework propagate it.

The fix here matches the invoke endpoint's source-precedence:
caller-provided query param wins (callers may still pass one — the
spec forwards any non-platform-defined query params transparently),
env var falls back (the hosted default), empty string when both
absent. Custom cancel/get handlers can now uniformly read
`request.state.session_id` without owning the fallback ladder.

Test coverage (`test_get_cancel.py`):
- `test_cancel_propagates_session_id_from_env_var` (hosted default)
- `test_get_propagates_session_id_from_env_var` (hosted default)
- `test_cancel_caller_query_param_overrides_env_var` (precedence)
- `test_cancel_without_env_var_or_query_param_yields_empty_session_id`
  (graceful no-op)

Verified by reverting the framework change locally: all 4 fail; with
the fix re-applied, all 11 cancel/get tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n_id source for cancel/get

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…anch

The checked-in preview wheels were sitting at sdk/agentserver/wheels/
on the durable-tasks branch even though only the durable-agent-demo
sample consumes them (via its build.sh that copies into the docker
build context). Wheels move to the demo branch where they actually
belong, and where they can also bundle the responses package alongside
core + invocations (the demo's docker image is the only consumer of
this wheels directory).

durable-tasks branch is now wheels-free. Downstream consumers that
need the preview wheels should base off the demo branch where the
build-wheels.sh script + checked-in *.whl files live.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The two standalone "AI-coding-agent skill" files (durable-task-skill.md,
streaming-skill.md) move to the demo branch alongside the preview
wheels they pair with. Their long-form developer guides
(durable-task-guide.md, streaming-guide.md, task-and-streaming-spec.md)
stay here in the core package's docs — those are SOT reference
documentation tied to the package surface, not portable
copy-into-your-project artifacts like the skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The cold-start recovery scan (_recover_stale_tasks) reclaimed a stale
in_progress task via a direct provider.update that discarded the
post-reclaim record and pre-tracked the stale scan etag. The lease
renewal heartbeat (routed through _provider_update_locked with
force_if_match=True) then kept sending the stale pre-reclaim etag. Both
the LocalFileTaskProvider and the hosted task API enforce If-Match
strictly, so the first heartbeat (~one lease half-life, 30s) 412'd and
the renewal loop cancelled the recovered execution as 'lost ownership',
truncating every crash recovery.

Fix: route the scan reclaim through _reclaim_one (which now returns the
post-reclaim TaskInfo and refreshes the tracked etag via
_provider_update_locked), and adopt that record so (a) the heartbeat's
tracked etag matches the store and (b) _start_existing_task sees the
post-reclaim lease generation/instance. Also have
_steering_cleanup_orphan_attachments return its updated record so the
reclaim carries the current etag when cleanup wrote, and correct the
stale _reclaim_one docstring that claimed the local provider ignores
if_match.

Adds tests/durable/test_recovery_lease_etag.py (RED without the fix).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Backport the spec-024 storage-path unification (.durable-tasks ->
.durable via resolve_durable_subdir) to the durable_research
invocation sample on the base branch, so the invocation samples match
across the durable-tasks -> responses-spec016 -> demo stack.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…te-serialization (spec 031)

Behavioral tests through real framework wiring (no mocks): pending_input_count
must reflect the live queued-steering count at the cancel boundary (SOT §13
ordering invariant), and a steer+drain must run the steered turn with no blind
PATCHes (SOT §25.1). The pending_input_count test lands RED (count==0, the
observed defect) ahead of the fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pec 031)

Conformance-restoration: the implementation had drifted from the SOT spec
(docs/task-and-streaming-spec.md). Fixes:

- pending_input_count (FR-001/002, SOT §12/§13): add settable _ActiveTask slot
  _pending_input_count; record it BEFORE ctx.cancel in the same-process enqueue
  and both cross-process steering polls (§13 ordering invariant); reset to the
  remaining backlog on drain. Previously read a never-written attribute -> 0.
- No blind writes (FR-005b, SOT §25.1): the queued-steering-cancel path used a
  bare provider.update with no If-Match (lost-update risk) -> now routes through
  the lock-held update primitive carrying the tracked etag.
- Lock-held update primitive (FR-005a): extract _provider_update_lock_held for
  callers that already hold the non-reentrant per-task lock; _provider_update_locked
  delegates to it.
- Drain read-inside-lock (FR-005, SOT §25.2): the steering drain now performs its
  record read + compute + write under one per-task lock hold (read was lock-free
  before, letting the lease heartbeat invalidate the pinned etag and starve the
  retry budget). Cross-process conflicts retry outside the lock.
- SOT reconciled: §12/§13 (pending_input_count is in-process observed state,
  failure-tolerant, set-before-cancel), §25.1 (no blind writes), §25.2
  (read-inside-lock rationale + two-helper split).

Validated: 646 durable + 746 durable+streaming green; responses consumer green
(2 pre-existing unrelated failures). Tests from prior commit now GREEN.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ry, provider parity

Completes the remaining spec-031 conformance work:
- Drain recovers from cross-process etag conflict (FR-006/SOT §25.3): the drain
  now also catches EtagConflict (consistent with the terminal-write + metadata
  paths) so a conflict landing on its write re-reads + retries instead of being
  abandoned by the caller's bare except. RED test reproduced the exact hosted
  'Failed to drain steering queue' failure, then GREEN.
- Metadata flush conflict retry (FR-006a/SOT §25.3): re-read + retry (≤5,
  last-write-wins on the namespace slot) instead of swallow-and-defer.
- Local/file-provider hosted-parity assertions (FR-008/009): stale-if_match ->
  hosted-identical etag_mismatch/412 classification; lease-only update bumps the
  etag; two providers over one store reproduce a deterministic cross-process
  conflict + recovery.
- Registered the new behavioral tests in the completeness meta-test (FR-016).
- FR-005c if_match-site audit recorded; FR-012 streams conformance confirmed
  already-comprehensive by audit; FR-018 guide already consistent.

Validated: 651 durable + 755 durable+streaming green; pylint unchanged (9.59/10).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n bug (spec 031)

Hosted re-test of the invocations steering demo revealed the real drain failure
is NOT an etag race but a lease-renewal-on-suspended rejection: the multi-turn
turn writes status=suspended before the drain runs, and the hosted task store
rejects the drain PATCH's lease-renewal piggyback with 'lease renewal is only
supported for in_progress tasks'. The local provider was too permissive (allowed
renewal on {in_progress, suspended}), masking it. This commit tightens the local
provider to in_progress-only (hosted parity) + adds the drain status-transition
regression test, landing the bug RED.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…red turn runs (spec 031)

The drain promotes a queued steering input to a NEW turn, but the record was
left 'suspended' by the just-ended turn. The drain's PATCH now sets
status='in_progress' (a valid suspended->in_progress claim that the hosted store
accepts WITH lease params), instead of a bare lease renewal that the store
rejects on a suspended task. Validated on a deployed hosted agent (rapida-0687):
the steered turn now runs the new topic end-to-end; logs show 'Steering drain:
drained next input' with no 'Failed to drain'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…se renewal requires in_progress (spec 031)

Corrects §52 step 14 (drain Phase-1 PATCH MUST set status='in_progress') and
adds the §25.4 rule that the lease-extension trio is a renewal only on an
already-in_progress record, or a claim when the same PATCH flips to in_progress.
This is the doc-level root of the hosted 'Failed to drain steering queue' bug.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RaviPidaparthi and others added 11 commits June 19, 2026 00:04
…ion (data loss)

_compact_on_disk() rewrote the log to a temp file and os.replace()'d it over
self._path, but never reopened self._file — which still pointed at the old,
now-unlinked inode. Every emit()/close() after the first compaction (once the
eviction interval is crossed) therefore wrote to the orphaned inode and was
LOST on the next process lifetime, and the single-writer flock was held on the
dead inode. This breaks the persist-before-publish crash-recovery contract the
file-backed replay backing exists to provide.

Reopen self._file against the live path and re-acquire the flock after the
swap (open+lock the new handle before closing the old so the single-writer
guarantee is never released). Adds regression tests asserting post-compaction
emit + terminal persist to the live file and survive rehydrate (RED before fix).

Found via code review of the durable-tasks branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…yers

Spec 033 Phase 4 (FR-007) core side — promote the internals that the composed
protocol packages (responses/invocations) reach for, so they stop importing core
underscore-private modules:

- TaskRun.is_queued — public queued-steering-input predicate (replaces reaching
  TaskRun._queued_cancel_callback). Narrowly documented; SOT task-and-streaming-spec
  §35 + durable-task-guide §4.6 updated.
- azure.ai.agentserver.core.platform_headers — public re-export of the shared
  platform HTTP/wire-contract constants (header names, isolation keys, error tags).
- core.read_request_id(scope) — public ASGI-scope helper so callers read the
  middleware-resolved request id without depending on REQUEST_ID_STATE_KEY.
- Genericize the durable metadata docstring (drop the named '_responses' wrapper
  example — core must not name a wrapper layer).

RED-first tests: test_public_contract_surface.py (module + helper), TaskRun shape +
behavioral is_queued (queued vs fresh) in test_steering.py. Full core suite green
(866 passed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e 1)

De-brand the 'durable/durability' terminology (collides with Microsoft Durable
Functions / DTF / DTS) to 'resilient/resilience (crash resilience)'. Pure rename /
reframe — zero behaviour change (core suite: 866 passed, 5 skipped, unchanged).

- core/durable/ subpackage -> core/tasks/ (the broad task primitive: long-running,
  multi-turn, steering, retry AND resilience); tests/durable/ -> tests/tasks/.
- docs/durable-task-guide.md -> docs/tasks-guide.md.
- storage_paths: AGENTSERVER_DURABLE_ROOT -> AGENTSERVER_STATE_ROOT, .durable ->
  .agentserver, resolve_durable_* -> resolve_state_*, DurableSubdir -> StateSubdir
  (storage is 'state', not 'resilient'; nothing shipped, no compat needed).
- word swaps durability->resilience / durable->resilient across source, tests,
  docs, README, CHANGELOG; module-path string literals -> tasks.
- removed stale tracked .test-runs/ generated fixtures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Persisted-artifact language must use "persistent/stored/state", never
"resilient" (which denotes the crash-survival property). Corrects three
storage-sense mistranslations introduced by the durable->resilient word swap:
- _manager.py: "NOT resiliently persisted" -> "NOT persisted"; "previously-
  existing resilient backup write" -> "...backup write".
- task-and-streaming-spec.md: "abstraction over the resilient store" ->
  "...task store"; pseudo-code comment "resets retry budget resiliently" ->
  "resets retry budget".
- test_file_backed_replay_event_stream.py docstring: "resiliently persisted"
  -> "persisted".

Property-sense "resiliently" (task runs/continues resiliently) retained.
Comment/docstring-only; zero behaviour change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Phase 4)

Behaviour-preserving governance-doc reframe to de-brand "durable/durability".
- Principle X "Durability Contract Conformance" -> "Resilience Contract
  Conformance"; Principle XII (core primitive) prose reframed to task/resilience
  vocabulary.
- Repoint path references to the renamed files: resilience-contract.md,
  tests/e2e/resilience_contract/, core/tasks/, tests/tasks/,
  resilient-responses-developer-guide.md, resilient_background.
- Version 1.6.0 -> 1.7.0 (renamed principle).

The Constitution is the contract-of-record and must reference no stale name
(FR-034-7): zero durable residue.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… (Spec 034)

The blind durable->resilient word swap turned the real Microsoft product
"Azure Durable Functions" into a non-existent "Resilient Functions" in three
"use another workflow engine instead" references. Drop it, keeping the other
real examples (Temporal, Celery, Orleans) and never naming the competing
product. Comment/docstring-only; zero behaviour change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The durable->resilient word swap's `.durable`->`.agentserver` storage-dir rule
over-matched the logger namespace `azure.ai.agentserver.durable`, producing the
doubled, unintended `azure.ai.agentserver.agentserver`. Correct it to
`azure.ai.agentserver.tasks` (mirroring the core.tasks subpackage) across 8
source loggers + the `.taskapi` child, the C-OBS-2 doc clause, and the matching
test assertion. Loggers stay children of `azure.ai.agentserver`, so propagation
was unaffected and suites passed — caught by the Principle XIII final review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… core branch (Spec 034)

Per the branch-ownership model (invocations samples are owned by the core
branch and reach the demo branch transitively via core->responses->demo), the
invocations samples + tests reframe — previously applied only on the demo
branch — now originates here on core:

- Rename durable_{copilot,langgraph,multiturn,research} -> resilient_*
  (+ e2e/test_durable_* -> test_resilient_*, test_durable_samples_structure
  -> test_resilient_samples_structure); content reframed durable->resilient.
- Sample run UX: guarded sibling imports so each runs directly with
  `python app.py` from its own dir (-m still works); requirements install the
  in-repo preview packages via editable relative paths (core preview isn't on
  PyPI).
- Storage-prose fixes (durable storage -> persistent storage) in
  async_invoke_agent / multiturn_invoke_agent / _crash_harness; CHANGELOG.

Files are byte-identical to the demo branch's versions, so the forward merge
stays conflict-free. invocations suite: 214 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…"running")

A new conversation turn whose text repeated a previous turn would hang forever
in "running": the content-based dedup (compare against the last UserMessage in
Copilot's history) wrongly decided the message was "already sent", skipped
session.send, then waited for a SessionIdle that never fires (no new turn was
started).

Fix: dedup by invocation identity, not content. Persist the Copilot messageId
returned by session.send under the invocation_id as a durable "already
delivered" marker. Only a crash-recovered invocation (same invocation_id,
marker present) skips the re-send; a brand-new turn (resumed) gets a fresh
invocation_id with no marker and therefore always sends — even with identical
text. On the recovery skip-path, if the turn already completed upstream, return
that reply instead of waiting for an idle that won't come.

Also add debug-friendly logging (per-event at DEBUG; send/idle/turn-resolution
milestones at INFO) — the absence of these made this hard to diagnose.

Reproduced + verified end-to-end against the live GitHub Copilot CLI: sending
the same message twice to one session now completes both turns; a recovered
in-flight turn resumes and completes; suite unchanged (214 passed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…copilot dedup

Corrects the previous dedup fix. The marker approach (persist the send()
messageId and skip on its presence) was wrong for crash recovery: send()
returning does NOT guarantee Copilot persisted the message, so a crash between
send() and Copilot's persistence would leave the marker set, skip the resend,
and hang — the message never reached Copilot. (Also verified send()'s messageId
does not correlate to any get_messages() event id, so it can't be used to probe
persistence.)

New approach keeps Copilot's persisted history as the source of truth, but
scopes that content check to crash recovery only:
  copilot_has_message = entry_mode == "recovered" and <message is last UserMessage>
- Fresh / resumed *new* turns always send (fixes the original hang where a
  repeated-text turn was wrongly deduped).
- On recovery, if Copilot's history lacks the message (Copilot never persisted
  it), we RESEND; if it has it and the turn already finished, we return that
  reply instead of waiting for a SessionIdle that won't fire.

Removed the messageId marker; restored the history-based check. Kept the
debug/INFO logging. Basic flow + new-turn case verified live; suite 214 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Hosted Agents sdk/agentserver/*

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants