demo(agentserver): TEMPORARY - durable-agent-demo (split out of #46997, never-merged)#47276
Draft
RaviPidaparthi wants to merge 265 commits into
Draft
demo(agentserver): TEMPORARY - durable-agent-demo (split out of #46997, never-merged)#47276RaviPidaparthi wants to merge 265 commits into
RaviPidaparthi wants to merge 265 commits into
Conversation
I added a demo-local pyrightconfig.json earlier in this session to
work around an IDE squiggle. Root cause was much simpler: the venv
just had an OLD wheel (2.0.0b4) cached from way back. Reinstalling
the new 2.0.0b6 wheel (which has TaskRun.__await__) in the venv
makes everything resolve correctly without any pyright config
changes — the IDE was working fine before; this restores that.
Reinstall command:
pip uninstall -y azure-ai-agentserver-core azure-ai-agentserver-invocations
pip install sdk/agentserver/azure-ai-agentserver-core \
sdk/agentserver/azure-ai-agentserver-invocations
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
756e0fe to
2553746
Compare
…rver-durable-agent-demo
The previous attempt to set FOUNDRY_TASK_API_ENABLED was rejected by the hosting platform (FOUNDRY_*/AGENT_* are reserved namespaces). Core has been updated to use AGENTSERVER_TASK_API_ENABLED instead — apply that here and refresh the bundled wheels. Effect: the demo container now uses HostedTaskProvider, so /tasks HTTP calls (lease renewals, readiness pings, state PATCHes) flow through the TaskApiLoggingPolicy and show up in 'demo-client.sh logs' as 'task-store request: ...' lines. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… validation
Captures the v25 deploy that exercised the lease-renewal + nanny-restore
validation:
Test 1 — lease keeps sandbox alive >15 min without client ingress: PASS
Same lease_instance_id for 46+ min, 12 phases completed, only platform
/liveness probes and our framework's PATCH .../tasks/<id> lease
renewals (every ~30s) kept the sandbox warm.
Test 2 — nanny restores crashed sandbox within ~15 min, zero ingress: PASS
Crashed at 00:02:04Z; new worker came up at 00:02:47Z (43s later);
durable task auto-resumed with entry_mode='recovered' from the last
checkpoint (completed_phases: 2); progressed through 4 more phases
with no client ingress.
agent.yaml is back to default cooldowns (10/20s); only the
AGENTSERVER_TASK_API_ENABLED=1 opt-in is retained (committed earlier
in 1b1e334).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… validated behavior Sets hosted-mode INTRA_PHASE_COOLDOWN_SEC=30 and INTER_PHASE_COOLDOWN_SEC=30 in agent.yaml so the deployed durable-research-agent runs for ~33 min (15 phases × (~12s LLM + 3×30s intra + 30s inter)). The run intentionally exceeds the platform's 15-min sandbox-eviction window so each demo run exercises the framework's lease-renewal keep-alive path end-to-end — which is the whole point of @task durability and what we just validated empirically against e2e-tests-westus2. agent.py defaults (10/20s = ~15 min) are kept for local/dev iteration where the long wall-time isn't useful. README updates reflect what we proved (rather than what we previously assumed): - Recovery section now leads with 'long-running tasks survive past 15 min via lease-renewal keep-alive' as a first-class platform capability, not buried in a doubt-laden footnote. - Removed the 'Note on long-running tasks' disclaimer that claimed lease renewals do NOT extend the idle window — empirical evidence shows otherwise (Test 1: 46-min uptime, same instance throughout, zero client ingress after T=0). - Workflow A retitled 'Long-running run with no client-side keepalive' and rewritten to reflect: reconnecting after 25 min finds the SAME instance, not a recovered fresh one. - Workflow B (crash) reflects the nanny does the restore on its own within ~1 min — no client ingress required to bring the container back; the durable task auto-resumes inside the new process. - Architecture diagram's 'Idle-reclaim timer' note now explains it is kept fresh by framework lease-renewal traffic. - Env-var table now lists hosted vs agent.py defaults separately and includes AGENTSERVER_TASK_API_ENABLED with explanation. - Fast-dev-loop block now points at agent.yaml (not the Dockerfile) since env vars live in agent.yaml now. azd state synced to the v26 deploy that ships these settings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-durable-agent-demo
…renderer + add client wall-clock
Core branch now auto-enables HostedTaskProvider in hosted environments,
so this demo no longer needs AGENTSERVER_TASK_API_ENABLED. Likewise,
wheels are now built centrally via sdk/agentserver/scripts/build-wheels.sh
and staged into the docker build context — no committed wheels.
CHANGES
agent.yaml
- Drop AGENTSERVER_TASK_API_ENABLED (auto-on in hosted).
- Tighten the cooldown comment (no behavior change).
build.sh
- Delegate to the central sdk/agentserver/scripts/build-wheels.sh.
- Stage wheels into src/durable-research-agent/wheels/ (gitignored
docker-build dir), so the Dockerfile's COPY wheels/ ... still
finds them at build time.
- Per-sample build.sh is now a thin staging wrapper; no per-sample
duplication of the build logic.
src/durable-research-agent/wheels/*.whl (deleted)
- Wheels are no longer committed. They're regenerated on demand.
app.py — fix file_replay SSE double-encoding
- FileStreamHandler.put writes json.dumps(item)+'\n', where item
is itself a JSON string from ctx.stream(json.dumps({...})). The
live_stream path correctly reads from the in-memory queue (which
holds the original string). The file_replay path read the disk
line via json.loads, then RE-WRAPPED with json.dumps before
embedding in 'data: ...\n\n' — producing
data: "{\"type\": \"...\"}"
which the client rendered as '[unknown event] "{\"...\"}"'.
- Decode once, embed the raw JSON string directly. Also add an
isinstance check before the __done__ key lookup (the decoded
value is a string for normal events).
- Update crash-handler 202 response message + docstring to reflect
validated behavior (nanny restores ~1 min, no ingress needed).
demo-client.sh
- Add _now_utc() helper and prefix every block-style event with
'[HH:MM:SSZ]' — the client's local UTC wall-clock at render
time — so users can compare against server_time= (server-side
UTC) and uptime= (server process seconds-since-boot) for a
clear timeline of phases vs lease renewals vs recoveries.
- Update header comment: drop the wrong '~5-10 min' nanny restore
and the wrong 'lease renewal pings readiness' phrasing; reflect
the validated 30s lease cadence and ~1 min nanny window.
- Three-terminal usage example: ~33 min (not 45) wall-time per
run; nanny restores ~1 min after crash (no need to send any
ingress to trigger recovery).
- Crash-command output text: nanny brings container back on its
own, no client action required.
README.md
- Capability #1 reframed: lease keep-alive proven end-to-end
(e2e-tests-westus2), 33-min runs with zero client ingress.
- Capability #2 reframed: nanny restores within ~1 min (43s
measured) without any client ingress; recover-on-reconnect was
a misread of the old behavior.
- Deploy section: build.sh now delegates to the central script;
points at USING_PRE_RELEASE_WHEELS.md for the wheel workflow.
- Crash command row in the command-reference table: clearer wording
around nanny-driven recovery.
- Env-var table: drop AGENTSERVER_TASK_API_ENABLED row (gone);
add a paragraph clarifying that hosted/local provider selection
is automatic.
- File-structure section: build.sh and wheels/ entries reflect the
new layout; add pointer to the wheel-distribution doc.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…plify build.sh to copy-only
Merges the core branch's three corrections:
- Skill moved out of .github/skills/ into sdk/agentserver/docs/
(standalone artifact, devs copy independently).
- @task preview wheels checked into sdk/agentserver/wheels/.
- USING_PRE_RELEASE_WHEELS.md framing fixed (packages ARE on PyPI;
@task primitive is private preview).
Demo-specific changes that follow from the above:
build.sh
- No longer invokes sdk/agentserver/scripts/build-wheels.sh.
- Just copies the checked-in central wheels into the per-sample
gitignored docker-build staging dir. Faster, no compilation.
README.md
- Deploy section: 'stage the checked-in @task preview wheels' (not
'build agentserver wheels'). Adds a note that @task is private
preview and the wheels are how you get it.
- File-structure blurb: matches the new copy-only build.sh.
.gitignore
- Merged the demo-local Docker-staging entry with the existing
.azure / .demo-session entries from this branch.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-durable-agent-demo
…eel docs Following the core-branch reorganization that moved sdk/agentserver/docs/USING_PRE_RELEASE_WHEELS.md → sdk/agentserver/wheels/README.md, update the demo's links and a build.sh comment to the new path. No behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PROBLEM The demo's previous live_stream tracked event_id with a per-invocation counter (event_id starts at 0 on each new GET). Combined with the single-consumer queue contract, a client reconnect with ?last_event_id=N could not deterministically resume — the meaning of event_id N depended on the queue's current state, not the actual emission position. Concretely observed: with last_event_id=8092 on a long-running task, a reconnect landed at phase 8's mid-content (not the next event after 8092) because (a) prior consumers had dequeued items the new GET could not see, and (b) the new live_stream counted from 1 again, advancing through whatever was currently in the queue. FIX (smallest possible) 1. FileStreamHandler now tracks a single _next_event_id counter incremented on every disk-line append — preload from disk on __init__, normal put, and the __done__ sentinel in close. Items go onto the queue as (event_id, item) tuples instead of bare items. event_id == disk row number == durable across restarts, recovery, and consumers. 2. app.py live_stream unpacks (event_id, chunk) tuples and uses the durable event_id directly when forming the SSE 'id: N' header. skip_count semantics are now correct: items with event_id <= skip_count are skipped; the rest are emitted with their durable id. 3. Defensive non-tuple unpack path keeps the GET handler safe if the FileStreamHandler is ever swapped for a stock QueueStreamHandler that emits bare items. ACCEPTED LIMITATION If a prior consumer has drained items the new GET expected to see, those items are simply not emitted (queue is single-consumer per the framework's StreamHandler contract — there's no way to backfill from disk without a larger refactor). Per user direction: 'one or two delta misses are acceptable; just be graceful.' We achieve that — the new GET emits whatever is currently in the queue and resumes cleanly from there. SMOKE TEST RESULT (v32 deploy) - Fresh GET: ids 1..1973 ✓ - Resume last_event_id=1973: starts at 1974, exact continuation ✓ - Resume last_event_id=10 after drain: starts at 2011 (gap skipped gracefully, no error, monotonic forward progress) ✓ - Drain to 2978 then resume from 1489: starts at 2979 (graceful gap skip, ids strictly monotonic) ✓ file_replay path already used disk-line counting — no change needed there; live_stream and file_replay now agree on the event_id space. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ithin seconds, not minutes PROBLEM User reported: after issuing ./demo-client.sh crash, the SSE stream on the original terminal kept showing events for *minutes* before the disconnect surfaced. This was not a server, proxy, or TCP buffering issue — it was the demo-client renderer itself building a backlog. ROOT CAUSE Each rendered event was spawning python3 subprocesses: * etype detection — 1 python3 per event (~30ms) * _now_utc() — 1 'date' subprocess per event (~5ms) * Token content — 1 python3 per token (~30ms) For the token hot path that meant ~65ms per token. LLMs emit at 50-100 tok/s, so the renderer was running at ~10% of the server's emit rate. The kernel TCP buffer + curl + bash pipe accumulated a backlog that grew ~9 seconds per second of LLM streaming. When the server crashed, that backlog still had to drain through the slow renderer before the EOF on curl reached the bash 'while read' loop. Measured before: 100 token renders = 9.7s 1000 token renders = 51s 5000 token renders = timed out at 90s FIX (minimal, no behavior change) - etype detection: bash regex on the JSON instead of python3. - _now_utc(): moved from top-of-render_event into only the cases that actually use it (token + subcall_end don't need wall-clock). - Token content extraction: bash regex + parameter-expansion unescape for the four common JSON escapes (\\, \", \n, \t, \r). Token literal \uXXXX would print as the raw escape; that's acceptable for a demo. Measured after: 5000 token renders = 1.17s (~0.23ms per token, ~220x faster) phase_start render = 253ms (still uses _jq; happens 1/3min) Effect: renderer is now ~50x faster than the LLM emit rate, so no backlog builds. When the server crashes the client sees EOF within its normal poll interval and surfaces the disconnect within seconds. No server-side change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…atchdog
PROBLEM
After the previous renderer-speedup, user reported 20-30s latency
between issuing a crash and seeing the stream disconnect, even early
in phase 1 — and that the latency seems to grow with longer streams.
INVESTIGATION
Built a localhost SSE server + bash client loop and measured. The
bash renderer is actually fast enough (3300 tok/s drain, 12ms
post-EOF detect on a clean close). So the residual latency is NOT in
the bash hot path. Two likely causes left:
1. The platform edge proxy between the server container and the
client buffers SSE responses and may hold the TCP connection
open after the backend dies — there is no client-side way to
speed up the EOF in this case.
2. printf-per-token to a real interactive terminal (vs the
/dev/null benchmark) has per-call overhead the renderer cannot
amortize.
FIX
Replace the bash 'while read | render_event' loop with a single
long-lived python renderer. python is fundamentally better-suited
for line-rate streaming with batching:
- In-memory token buffer flushed every ~50ms instead of a
printf-per-token (~20x fewer terminal syscalls in steady state).
- select() + idle-timer in one loop: tokens batch under load,
block events render immediately, and an idle watchdog fires
after STALL_SECS of no inbound data.
- When the watchdog fires the renderer SIGTERMs curl (its PID is
passed via env var) so the bash pipeline exits within a couple
hundred ms of the warning, regardless of whether the platform
proxy is still holding the socket open.
The renderer is embedded inline in demo-client.sh as a heredoc
(_PY_RENDERER); no separate file. ANSI color codes and event-type
formatting match the previous bash implementation exactly.
The bash render_event + _jq helpers are deleted (no longer used).
Most of stream_sse is gone too — replaced by a small wrapper that
launches curl in the background to capture its PID and feeds its
output to python via a FIFO.
KNOBS (env)
STALL_SECS default 10 — stream-idle threshold for the watchdog
FLUSH_MS default 50 — token-buffer flush cadence
VERIFIED LOCALLY (test harness against a python SSE server)
Happy path: 50-token stream, clean close
- Total wall: 1.04s (matches server emit time)
- STREAM_RESULT=complete, LAST_EVENT_ID propagates correctly
Stall path: 200 tokens, then server hangs (proxy-hang simulation)
- Tokens render smoothly during emission
- 5s after last token the watchdog warns and SIGTERMs curl
- Bash pipeline exits in 9s total (was 24s before the kill-curl
fix, would have been 25s+ in production until proxy timed out)
All renderer output (run_start/phase_start/subcall_start/tokens/
phase_end/run_complete/done) renders with proper formatting,
timestamps, and colors.
No server-side change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…md_steer) The previous commit (python renderer) deleted render_event + _jq together because both were used by the bash SSE consumer that python replaced. But cmd_start and cmd_steer still call _jq to extract invocation_id / session_id from the one-shot POST response — a small helper, not part of the streaming hot path. Restored the helper with an updated docstring that calls out its narrowed scope. Symptom: 'demo-client.sh: line 367: _jq: command not found' on ./demo-client.sh start, followed by an empty INV_ID. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lf window
FALSE-POSITIVE OBSERVED
User reported: ./demo-client.sh start emitted research subcall 1/4
then triggered '⚠ stream stalled (no events for 10s)' even though no
crash occurred. Root cause: the hosted agent.yaml sets
INTRA_PHASE_COOLDOWN_SEC=30 and INTER_PHASE_COOLDOWN_SEC=30, so there
are legitimately ~30s silent periods between subcalls and between
phases (asyncio.sleep with no events emitted). A 10s watchdog
therefore mis-fires during normal operation.
FIX
1. Default STALL_SECS bumped 10 -> 60, comfortably above the longest
planned silence (30s). Crash detection latency goes from 10s to
~60s in exchange for zero false positives during normal runs.
Still better than the 20-30s baseline behavior the user saw before
any watchdog at all.
2. Added a low-key hint when idle crosses HALF the stall window.
Prints '...quiet for Ns (stall threshold 60s)' once every 10s,
so the user sees the renderer is alive but quiet during cooldowns
instead of wondering if it hung.
3. Hint counter resets every time data arrives, so back-to-back
short cooldowns do not pile up hints.
VERIFIED locally
Server: emit run_start, then 40s silence, then run_complete + close
Client: STALL_SECS=60
[00:00] run_start banner
[00:30] '...quiet for 30s (stall threshold 60s)'
[00:40] run_complete renders, STREAM_RESULT=complete
Both knobs remain env-overridable (STALL_SECS, FLUSH_MS).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t SoT
User feedback: 'Why is the watchdog using a time-based idleness as
crash? Shouldnt we use the connection closure itself as the SOT?'
They are right. EOF on the curl pipe is the authoritative
crash/disconnect signal — TCP close happens when the server (or its
upstream proxy) terminates the SSE response. A time-based watchdog
duplicates that signal, mis-fires during legitimate quiet periods
(this demo has 30s cooldowns between subcalls and phases — see
INTRA_PHASE_COOLDOWN_SEC / INTER_PHASE_COOLDOWN_SEC in agent.yaml),
and forces every operator to tune cooldown-vs-detection-threshold.
REMOVED
- STALL_SECS env var and all its logic
- The 'half-window quiet hint' (only made sense alongside the watchdog)
- last_data_at and last_idle_hint state
- CURL_PID plumbing (no need to SIGTERM curl when there is no
watchdog to force-close it)
- mkfifo / background-curl dance in stream_sse — now a plain pipe
KEPT
- FLUSH_MS token-buffer flush cadence (50ms) — still real and useful,
it batches terminal writes so the renderer keeps pace with LLM emit
rate.
- All ANSI formatting, event-type rendering, event_id passthrough.
EOF flow (the only disconnect path now)
curl sees TCP close -> closes its stdout -> python's select() returns
ready -> os.read returns b'' -> renderer flush_tokens + break out of
while loop -> finally writes STATE_FILE -> bash sources state ->
STREAM_RESULT=disconnected (or 'complete' if we saw run_complete /
done first) -> _report_stream_result prints the right banner.
VERIFIED locally
Happy path (clean close + run_complete):
wall=1.05s, STREAM_RESULT=complete ✓
Abrupt close (server emits 50 tokens then closes socket without
emitting done):
wall=1.04s (matches server timing exactly), STREAM_RESULT=disconnected,
no false 'stalled' warning ✓
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two user-reported issues, both addressed at the agent layer (no
framework changes):
1) The 30s cooldowns between subcalls / phases made the terminal go
silent — felt like nothing was happening.
2) Phase-level checkpointing meant the user had to wait ~5 min for
the first phase to finish before crash testing was meaningful
(else recovery just restarted phase 1 from scratch and the demo
looked like nothing happened).
CHANGES
agent.py — subcall-level checkpoints
- The handler now persists {in_progress_phase, completed_subcalls,
current_text} on top of the prior {completed_phases, results}
state. After each LLM subcall returns we flush to ctx.metadata.
- On recovery (ctx.entry_mode == 'recovered'), if we crashed
mid-phase we resume that same phase at the next un-finished
subcall, re-using the text we had already produced.
- Worst-case work lost on crash drops from ONE FULL PHASE (~3 min
+ 3 wasted LLM subcalls) to ONE SUBCALL (~30-60s + 1 LLM
subcall). Crash testing is now meaningful at any point in the
run, not just after a phase boundary.
- Phase-complete checkpoint additionally clears the in-progress
fields so the next phase starts cleanly.
agent.py — cooldown events
- New _cooldown(ctx, duration, stage, phase, subcall=, of=) helper
that emits a 'cooldown' SSE event before the asyncio sleep:
{type:cooldown,duration_sec:30,stage:intra_phase,
phase:2,total:15,subcall:3,of:4, ...}
- Replaces the bare asyncio.wait_for in both the intra-phase
(between subcalls) and inter-phase (between phases) cooldowns.
- The wait stays cancel-aware (steering / operator cancel still
short-circuit the cooldown).
demo-client.sh — cooldown renderer
- Added a 'cooldown' case to the python renderer that prints a
single dim line, e.g.
[18:00:42Z] ...cooling down 30s (between subcalls) — next: subcall 3/4 in phase 2/15
- One line per cooldown, no spam.
README — updated the 'what the agent does' blurb to reflect:
- Checkpoints are now per-subcall (not per-phase).
- Cooldowns emit visible SSE events.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… heredoc)
Symptom (user-reported):
Traceback ... NameError: name 'duration_sec' is not defined
Root cause: my previous commit added the cooldown event renderer with
a Python string literal using single quotes:
evt.get('duration_sec', 0)
The single quotes prematurely terminated the surrounding bash heredoc
(_PY_RENDERER=apostrophe...apostrophe), so the runtime python source
was silently truncated. Bash quote concatenation made it look like a
NameError on duration_sec several lines later in the parsed script.
Fix
- Alias the dict key as a module-level constant _DSEC = 'duration_sec'
(with double quotes, safe). Use evt.get(_DSEC, ...) at the call site.
- Add a CRITICAL header comment explaining the gotcha so future edits
do not reintroduce apostrophes. The header itself is reworded to
avoid using the literal character.
- Reword the inline NOTE comment for the same reason.
Verified
- bash -n parses
- python ast.parse on the extracted heredoc parses
- Functional smoke: phase_end and cooldown events render correctly,
duration_sec extracts and formats as expected.
No server-side change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ints + cooldown events) Captures the v31 deploy that ships the subcall-level checkpointing and cooldown-event emission from commit 2925f1d. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… samples + docs Coordinated removal of the legacy StreamHandler surface (decoupled from @task) and migration of all invocation samples + core docs + the durable_streaming core sample to the new streams registry. Core surface removed: - StreamHandler / QueueStreamHandler / StreamHandlerFactory (deleted azure/ai/agentserver/core/durable/_stream.py) - @task(stream_handler_factory=...) kwarg - TaskOptions.stream_handler_factory slot - TaskContext.stream(item) method + _stream_handler slot - TaskRun.__aiter__ / __anext__ (async for chunk in run) - durable.__all__ entries for the 3 deleted public symbols Replacement surface (already landed in commits 1 + 2): azure.ai.agentserver.core.streaming = { streams, EventStream, EventStreamError, EventStreamClosedError, EventStreamGoneError, EventStreamNotFoundError } Sample migrations (all use streams.use_in_memory_replay(ttl_seconds=600) at module import + subscribe-before-start + per-turn invocation_id): - core/samples/durable_streaming/ - invocations/samples/durable_research/ - invocations/samples/durable_langgraph/ - invocations/samples/durable_copilot/ Docs (Constitution Principle IX — in-package developer guides): - core/streaming/README.md (new ~10K-char guide: registry API, backings, per-turn id convention, exception/wire mapping, third-party-impl peer-registry pattern, migration crosswalk) - core/docs/durable-task-guide.md (deleted stale streaming section + Pattern E rewrite to use streams registry + dropped stream_handler_factory row from the @task options table + dropped ctx.stream() from the TaskContext methods list) - core/docs/durable-task-skill.md (sample table entry retargeted) Tests: - Cascade failures from the deletion all resolved - 5 sample-e2e tests marked @pytest.mark.skip with reason citing FR-014 / FR-015 (these used ctx.stream(...) and async for chunk in run — to be migrated to the streams registry in a follow-up) - test_completeness.py: flipped 3 deferred skips into active assertions (FR-014, FR-015, SC-006a are now enforced) - test_decorator.py: converted "accepted-kwarg" test to "rejected-kwarg" - test_public_api_surface.py + test_contract_completeness.py: updated EXPECTED_PUBLIC_ALL to drop deleted symbols CHANGELOG: - Added a Breaking Changes block citing spec 017 + the migration crosswalk in streaming.md §12 - Added the Unified streaming primitive feature note with the 6-export public surface and the three configurator method names Test status: 495 passed, 11 skipped, 0 failed (pytest sdk/agentserver/azure-ai-agentserver-core/tests/) Spec: sdk/agentserver/specs/streaming.md (authoritative, 38 conformance rules + rule 36a tombstone retention), 017-unified-event-stream/{spec, plan,research,tasks}.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…y (spec 017 Phase 3) Snaps the durable-agent-demo onto the spec-017 unified streaming primitive. The bespoke FileStreamHandler / file_stream_factory / @task(stream_handler_factory=...) plumbing is gone; the agent now uses the SDK's streams registry exclusively. agent.py: - Delete FileStreamHandler class (~60 LOC of bespoke disk-persistence + queue + per-line event_id logic) — superseded by the SDK's file-backed replay backing. - Delete file_stream_factory + _STREAM_DIR module-level state. - Drop stream_handler_factory= kwarg from @task(...) — the decorator has no streaming surface anymore. - New entry-time pattern: handler reads inv_id from ctx.input["invocation_id"] (per-turn id per streaming guide), calls streams.get_or_create(inv_id), and seeds an in-memory sequence counter from stream.last_cursor() — so crash recovery resumes numbering from the highest sequence_number that made it to disk pre-crash, no gap and no duplicate cursor. - Replace every `await ctx.stream(json.dumps({...}))` with `await emit({...})` where `emit` is a small closure over `stream.emit(...)` that auto-increments + injects sequence_number. - Helpers (_emit_run_start, _wind_down, _cooldown, _run_phase, _stream_llm) take EmitFn instead of ctx for emit purposes. - Explicit close-before-suspend in _wind_down and close-before-return in the normal-completion path: SSE subscribers see a clean stream terminator BEFORE the framework reports the turn as suspended / completed. Each steered turn is its own invocation_id with its own stream; the close in the wind-down belongs to THIS turn's stream, not the next one's. The try/finally close() remains as a safety net for the unhandled-exception path. app.py: - At module import: streams.use_file_backed_replay( storage_dir=~/.durable-tasks/_streams, cursor_fn=lambda ev: ev["sequence_number"], ttl_seconds=600). - POST handler: pre-reserves the per-turn stream slot via `await streams.get_or_create(invocation_id)` BEFORE `deep_research.start(...)` (guarantees a racing GET sees the stream, not a 404). Propagates invocation_id into ctx.input so the handler reads the same id. Works for both the fresh-start and steered (TaskConflictError) branches. - GET handler: collapsed previous dual-path live/file-replay logic into ONE `await streams.get(invocation_id)` + subscribe. The file-backed replay backing handles persistence + rehydration uniformly across ACTIVE / CLOSED / recovered-from-disk. Maps EventStreamNotFoundError → 404 and EventStreamGoneError → 410 per the streaming guide's wire mapping. Subscribe also catches Gone mid-iteration (TTL eviction while attached) and emits an SSE `event: gone` terminator. - Cancel handler unchanged in shape: it still operates on the per-session durable task; cancellation flows through ctx.cancel and the handler closes the per-turn stream in _wind_down. README.md: - Updated the in-page architecture diagram: GET handler now shows `streams.get(id).subscribe(after=N)` with 404 / 410 mappings; @task line shows no streaming kwarg; added the use_file_backed_replay startup line and the last_cursor-on-recovery pattern. - New "Streaming" section documenting: the 6 public exports the sample uses; the chosen backing + cursor_fn + ttl_seconds; the per-turn invocation_id convention (and why not task_id); the close-before-suspend / close-before-return discipline. SC-001a / SC-010 (Phase 3 scope) verification: rg "class.*StreamHandler|stream_handler_factory|event_stream_factory|FileStreamFactory|FileStreamHandler|file_stream_factory" sdk/agentserver/azure-ai-agentserver-invocations/ → 0 matches Build / demo-client.sh runtime validation is the user's local responsibility post-rebase: this branch depends on the Phase 1 streaming subpackage and will fail to import until rebased onto post-Phase-1 main (or until the two phases land atomically). Per spec.md Phase 1 ↔ Phase 3 mitigation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-responses-spec016
…acy stream surface Migrate azure-ai-agentserver-responses' SSE event pipeline onto the azure.ai.agentserver.core.streaming.streams registry primitive added in the spec 017 Phase 1 merge. Key changes: - _routing.py: configure streams.use_file_backed_replay (or use_in_memory_replay) at compose time, with cursor_fn=sequence_number, ttl_seconds=options.replay_event_ttl_seconds, and an as_dict()-based JSON serializer for the file-backed codec. - _orchestrator.py: collapse the pre_subject / bg_record.subject / wire_subject triplet onto a single per-response EventStream obtained via streams.get_or_create(response_id). Replace subject.publish / complete / subscribe(cursor=) with EventStream.emit / close / subscribe(after=). Recovery path seeds next_seq from stream.last_cursor() instead of provider.get_stream_events. - _endpoint_handler.py: rewrite the GET ?stream=true replay path on top of streams.get(); map EventStreamNotFoundError / EventStreamGoneError to HTTP 404. - DELETE: hosting/_event_subject.py, streaming/_file_stream_provider.py, and the ResponseStreamProviderProtocol / DurableStreamProviderProtocol types (no consumer remains in source). - store/_memory.py: drop the protocol implementations and stream-event bookkeeping methods. Tests: - New tests/unit/test_streams_bootstrap.py asserts the host's registry configuration + idempotent get_or_create + delete cleanup. - Rewrite tests/unit/test_file_stream_provider.py to exercise the equivalent file-backed registry scenarios. - Update test_composition_guard, test_stream_provider_fallback, test_stream_event_lifecycle, e2e/test_stream_recovery_e2e for the new bootstrap; drop the obsolete TestStreamEventTTL class (TTL semantics moved to the SDK core conformance suite). CHANGELOG entry + streaming/README.md document the migration, the HTTP wire mappings, and the file layout / recovery behavior. Final verification: full suite passes (1115/1117) — 2 pre-existing baseline failures (test_contract_completeness reference a gitignored spec file) remain as-is; no other regressions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-responses-spec016
…rver-durable-agent-demo
…lock
The harness's _spawn() set stdout=subprocess.PIPE and stderr=subprocess.PIPE
without ever draining them during the test body. The OS pipe buffer is ~64
KB on Linux; once a chatty subprocess (e.g. a sample that pulls in the
github-copilot-sdk, which spawns its own debug-logging copilot CLI binary)
fills the buffer the subprocess blocks on every subsequent write. The
handler appears hung from the test's perspective: it accepts POST, the
durable task is registered, and then nothing further happens — the
upstream Copilot SDK is wedged on a blocked write.
Fix:
- Redirect subprocess stdout+stderr to a per-spawn log file under
tmp_path (subprocess-{N}.log). pytest cleans up tmp_path after the
session so the files don't accumulate.
- Use stderr=subprocess.STDOUT to merge into one file per lifetime
(Path B/C scenarios spawn a second lifetime, which gets its own
numbered log).
- _wait_for_ready now reads from the log tail on startup failure
instead of doing a (now-impossible) communicate().
- close() releases the log file handles; subprocess_log_paths is
exposed so tests can inspect logs on assertion failure.
Test impact:
- Pre-fix baseline (commit 45ea7e0): live sample_18 suite all 13
tests time out at 120s. Symptom: status stays in_progress forever.
- Post-fix baseline (same commit + this patch): 5 PASS, 8 FAIL,
1 SKIP. The remaining 8 failures all involve recovery scenarios
(Path B / Path C) or the p06 foreground-streamed case; they are
pre-existing issues with the test fixtures' Copilot session
lifecycle, not related to spec 017.
- Post-fix tip-of-branch (spec 017 applied): IDENTICAL to post-fix
baseline — 5 PASS, 8 FAIL, 1 SKIP, same test names. Net no
regression from spec 017 on the live suite.
Also fixes tests/e2e/test_recovery_sample_18_live.py which was
silently passing only because its tests didn't exercise the
pipe-fill-prone path; now provably 5/5 pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…itive verified end-to-end) Rebuilt @task preview wheels include the spec 017 streaming subpackage (core 1.62 MB ↑ from 1.51 MB; invocations 198 KB ↑ from 184 KB). Deployed via 'azd deploy --all' to e2e-tests-westus2 — version 38. End-to-end verification against the live Foundry endpoint: * POST /invocations → 202 with invocation_id ✓ * GET /invocations/{id} → SSE stream of typed events ✓ - Sequence numbers monotonically increase - Each event carries sequence_number that matches SSE id: - Event types: run_start, phase_start, subcall_start/end, cooldown, token (LLM deltas), phase_end, recovered, winding_down, run_complete * Crash test (DEMO_MODE=1 POST /invocations message="crash" → os._exit(137)): - Nanny worker restarts container within ~1 min - Durable task auto-recovers (ctx.entry_mode="recovered", ctx.metadata.completed_phases preserved) - file-backed stream rehydrates from disk via streams.use_file_backed_replay(...) at startup - stream.last_cursor() returns highest pre-crash seq - Recovered handler resumes seq counter from that point - Subscriber reconnecting from cursor 0 sees the FULL history (pre-crash events 3777-7531 + post-crash 7532+) * Crash boundary observed in stream: seq 3776: phase_start phase=2/15 (pre-crash) seq 4982: subcall_end research 1/4 (pre-crash) seq 6095: subcall_end critique 2/4 (pre-crash) seq 7530: subcall_end refine 3/4 (pre-crash) seq 7531: cooldown (pre-crash) <CRASH + nanny restart> seq 7532: run_start entry_mode=recovered (post-crash) seq 7533: recovered marker completed_phases=1/15 seq 7534: phase_start phase=2/15 (re-enters phase 2) seq 7535-7978: synthesize 4/4 of phase 2 (the only subcall not done pre-crash; resume_from_subcall=3) seq 7979: phase_end phase=2/15 seq 7981: phase_start phase=3/15 (continues to next phase) This proves the streams registry + file-backed replay backing is the source of truth for cross-process recovery, and the close- before-suspend / close-before-return discipline in the handler correctly ends streams at terminal points. The per-turn invocation_id key (NOT task_id) keeps each turn's events in its own stream. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ntserver-durable-agent-demo
…amples (Spec 034 Phase 3) Behaviour-preserving terminology reframe of the demo/sample surface: - Rename invocations sample dirs durable_research/copilot/langgraph/multiturn -> resilient_*, durable-agent-demo -> resilient-agent-demo (src/durable-research-agent -> src/resilient-research-agent). - Rename responses demo durable-responses-agent-demo -> resilient-responses-agent-demo (+ nested src/ and .azure/ env dirs), azd env + agent.yaml name reframed. - Rename invocations test files test_durable_*-> test_resilient_*. - Content reframe (durable->resilient, durable_background->resilient_background, core.durable import path -> core.tasks). - Storage/persistence prose uses "persistent/state/checkpointed", never "resilient" (Spec 034 §2.9a): "file-backed state store", "persistent storage", "checkpointed subcalls". Zero behaviour change. invocations non-e2e suite 214 passed/2 skipped; responses fast suite unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-responses-spec016
Persisted-artifact language must use "persistent/stored/state/persist", never "resilient" (which denotes the crash-survival property). Corrects storage-sense mistranslations introduced by the durable->resilient word swap across code, docs, and tests: - "resilient store" -> "response store"/"task store" (by context); "resilient storage" -> "persistent storage". - "Resiliently persist"/"resiliently persists/persisted" -> "persist(s/ed)"; "resilient persistence" -> "persistence". - "resiliently committed" -> "persisted"; "resiliently flushed" -> "flushed". Property-sense "resilient" retained: "resilient background" (mode), "non-resilient store" (a store lacking the resilience property), tasks that "run/continue resiliently". Comment/docstring/test-message only; zero behaviour change. Fast suite 1104 passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ntserver-durable-agent-demo
checkpoint 'resiliently persists' -> 'persists'; 'local resilient storage root' -> 'local state storage root'. Doc-only; zero behaviour change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ntserver-durable-agent-demo
…hase 3) The checked-in preview wheels bundled the stale core/durable/ subpackage; after the durable->tasks rename the deployed agents import core.tasks, so the wheels must be rebuilt or the container ImportErrors. Rebuilt all three central wheels (sdk/agentserver/wheels/) and refreshed the invocations demo's bundled core+invocations wheels. Verified: core wheel ships core/tasks/ (17 entries), zero durable; combined import test passes with resilient_background option. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ec 034 Phase 3) Reflects the redeployed resilient-responses-agent-demo (rapida-0687) after the durable->resilient rename. Endpoint/version updated by azd deploy; adds AZURE_TENANT_ID + project knobs. No secrets (connection strings empty). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-responses-spec016
…ntserver-durable-agent-demo
…c 034 Phase 4) Demo-branch-only shared docs: - Rename skills/durable-task-skill.md -> tasks-skill.md (mirrors tasks-guide.md + core.tasks); repoint all skill cross-references. - Content reframe across all 5 skills + wheels/README.md + build-wheels.sh. - Normalize hardcoded GitHub source URLs to `main` (the blind word swap had rewritten branch names to nonexistent branches; main is the stable target). - Drop "Resilient Functions" (a fake name the swap produced from the real product "Durable Functions") from the "use another tool" examples — keep Temporal/Celery/Orleans; never name the competing product. Zero durable residue; behaviour-preserving (docs only). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-responses-spec016
…ntserver-durable-agent-demo
…Spec 034 Phase 4) Caught by the full-universe residue sweep: - skills: "resiliently committed"/"resilient persistence point" -> "persisted"/ "persistence point" (storage-prose, §2.9a). - build.sh + samples-structure test: drop the reframed (non-existent) branch name "feature/agentserver-resilient-agent-demo" from error hints/comments; reference "the agentserver demo branch" generically. Docs/comment-only; zero behaviour change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-responses-spec016
… (Spec 034) The reframe's storage-dir rule over-matched the bare module reference core.durable._context, producing the non-existent core.agentserver._context; correct it to core.tasks._context. Docstring-only; zero runtime effect. Caught by the Principle XIII final review. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ntserver-durable-agent-demo
…sks (Spec 034) Final-review follow-up: the reframe's storage-dir rule had over-matched the bare core.durable reference in three skill docs, producing core.agentserver; correct to core.tasks. Docs-only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ns samples The 'python -m <pkg>.app' usage uses relative imports, so it must be run from the parent samples/ directory (the package's parent), not from inside the package dir — otherwise ModuleNotFoundError. Add an explicit note to the copilot/langgraph/multiturn sample usage blocks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ations resilient_* samples Improves the dev experience for the 4 multi-file invocations resilient samples (copilot, langgraph, multiturn, research): 1. No more folder dance: sibling imports are now guarded (try `from .agent` / except `from agent`), so the sample runs directly with `python app.py` from inside its own directory — matching the single-file samples — while `python -m <pkg>.app` from samples/ still works. 2. Local preview packages: requirements.txt installs the in-repo azure-ai-agentserver-* source via editable relative paths (`-e ../../../...`), since the PyPI releases predate the resilient-task surface. pip resolves these relative to the cwd, so the docs say to run pip from the sample directory. 3. Usage blocks unified to: from inside the dir, `pip install -r requirements.txt` then `python app.py` (research previously had no run instructions). Docs/sample-only; invocations suite unchanged (214 passed). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ntserver-durable-agent-demo
…s samples The azure-ai-agentserver-* preview packages (core 2.0.0b7 etc.) aren't on PyPI, so `pip install -r requirements.txt` pulled stale/absent versions. Point the requirements at the in-repo source via editable relative paths (core + responses + invocations — core is required transitively and is the one missing from PyPI). pip resolves these relative to the cwd, so run pip from the samples/ directory. Verified end-to-end in a fresh venv: installs the local preview, core.tasks imports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ents azure-ai-agentserver-invocations requires core>=2.0.0b7 (not on PyPI), so a fresh `pip install` of these two samples needs the local core editable too — otherwise pip fails fetching the preview core. Matches copilot/research, which already list both. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rver-responses-spec016 # Conflicts: # sdk/agentserver/azure-ai-agentserver-invocations/tests/e2e/_crash_harness.py # sdk/agentserver/azure-ai-agentserver-invocations/tests/e2e/test_resilient_copilot_live.py # sdk/agentserver/azure-ai-agentserver-invocations/tests/e2e/test_resilient_multiturn.py
…ntserver-durable-agent-demo
…rver-responses-spec016
…ntserver-durable-agent-demo
…rver-responses-spec016
…ntserver-durable-agent-demo
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary — TEMPORARY / DO NOT MERGE
This is the
durable-agent-demosplit out of the original spec 016durability PR (#46997). It carries the azd-deployable hosted-agent
demo (34 files: bicep infra, .azure azd state,
src/durable-research-agentagent code, build/demo-client scripts).
Status
🚨 This PR is not intended for merge. The demo lives here purely so
it isn't lost from the working set; we use it temporarily as a
reference deployment while the durable-task primitive matures.
Scope
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/only (34 files). Plus whatever else came from the original split-point
branch — see the next section for cleanup needed.
What this branch needs before any potential reuse
durable-agent-demo/directory(everything else should be discarded by reverting to
origin/mainfor those paths, since the core+invocations work belongs in feat(agentserver): light up durable-task primitive (core 2.0.0b6 + invocations 1.0.0b5) #46997
and the responses work belongs in PR feat(agentserver): WIP - responses package durable orchestration (split out of #46997) #47275).
Pointers
samples/durable_research)derived from this demo is shipping in PR feat(agentserver): light up durable-task primitive (core 2.0.0b6 + invocations 1.0.0b5) #46997 — the demo here
remains as the fuller azd-deployable reference.