AgentService (TS): emit §6.4 mandatory leading ack before handler runs#100
Conversation
|
Claude finished @M64GitHub's task in 12m 38s —— View job Review
The implementation is correct and the approach is sound. The ack is emitted synchronously before Two findings to address before merge, and one parity note: 1. Python
2. Test resource leak on assertion failure (inline comment on test line 128) If either pre- 3. Silent initial ack failure → handler runs for a gone caller (inline comment on service.ts line 578) Consistent with the existing error-handling pattern ( Items 2 and 3 are non-blocking — item 1 should be resolved (either fixed or explicitly tracked) before this merges. |
| const ackBytes = encodeChunk(ack); | ||
| try { | ||
| msg.respond(ackBytes); | ||
| } catch { |
There was a problem hiding this comment.
If the initial ack fails here (connection already gone), the handler still runs and does potentially expensive work (LLM call, etc.) that no one will receive. All subsequent response.send() calls will also silently throw for the same dead connection.
This is consistent with existing error-handling elsewhere in the file (keepalive and PromptResponse.send() both eat errors), but it's worth a follow-up: an early-exit when the initial ack fails would prevent wasted work. Not a blocker for this PR's scope.
| while (messages.length < 1 && Date.now() < deadline) { | ||
| await new Promise((r) => setTimeout(r, 25)); | ||
| } | ||
| expect(messages[0]).toEqual({ type: "status", status: "ack" }); |
There was a problem hiding this comment.
If either assertion on lines 128-129 fails, releaseHandler!() is never called — the handler stays suspended in handlerCanProceed, and iter is a dangling unresolved promise. afterEach calls service.stop() which stops the NATS microservice, but the handler task itself stays alive until the test runner times it out.
This won't cause false positives (Vitest throws immediately on assertion failure), but it does mean a single unexpected failure here can slow down the suite by the configured test timeout. Consider a try/finally around the polling + pre-release assertions to ensure releaseHandler!() is always called.
Addresses the three reviewer items on PR #100: 1. **Python `AgentService` parity (the main miss).** The Python agent-sdk had the exact same pre-PR pattern — `_keepalive_loop` only, no eager initial ack. Now emits the §6.4 mandatory ack synchronously in `_on_prompt_request` before the handler runs, mirroring the TS change. Brings the Python reference agent and every Python harness consuming `AgentService` into spec compliance automatically. - `service.py`: eager `await request.respond(encode_chunk(...))` before the keep-alive task is created. - `test_keepalive_e2e.py`: renamed + updated existing tests for the new "always one initial ack" semantics; added `test_initial_ack_arrives_before_handler_runs` that gates the handler on `asyncio.Event` and pins the eager behavior. - `test_query_e2e.py`, `test_error_completion_e2e.py`, `test_echo_e2e.py`, `test_attachments_e2e.py`, `test_reference_agent_e2e.py`: account for the additional ack frame in their wire-shape / chunk-count assertions. - `CHANGELOG.md`: `[Unreleased]` entry mirroring the TS one. 2. **TS test resource leak on assertion failure.** Wrapped the pre-`releaseHandler()` polling block in a `try { ... } finally { releaseHandler!(); }` so a failed assertion still releases the suspended handler — keeps the test runner from hanging until its own timeout. 3. **TS CHANGELOG.md prettier formatting.** CI's `format:check` was tripping on indentation in the new `[Unreleased]` block; `prettier --write` reflowed it cleanly. (Was the cause of the `CI — agent-sdk/typescript / Lint + typecheck (pull_request)` failure.) Reviewer's third note (silent initial-ack publish failure → handler runs on a dead reply subject) intentionally not addressed in this commit — it matches the existing best-effort error-handling pattern across `PromptResponse.send()` and the keepalive loop in both languages. Worth a follow-up that adds early-abort across all four sites at once. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude - please review again, I think we fixed everything |
|
Claude finished @M64GitHub's task in 1m 47s —— View job Re-review
All three issues from the previous review have been addressed. Here's the rundown: Issue 1 — Python parity ✅ Fixed
try:
await request.respond(encode_chunk(StatusChunk(status="ack")))
except Exception:
# Best-effort, matching the keepalive loop and PromptStream
# error-handling pattern…
log.exception("initial ack emit failed on %s", request.subject)The implementation is correct: ack fires before the Issue 2 — TS test resource leak ✅ FixedThe polling + assertion block is now wrapped in try {
// poll + assertions
} finally {
releaseHandler!();
}The Python counterpart ( Issue 3 — Silent ack failure informational ✅ AcknowledgedBehavior unchanged (consistent with existing error-handling), which is the right call for this PR's scope. Additional observationsPython test coverage is thorough.
Minor: TS CHANGELOG indentation drift. Two continuation lines in VerdictAll three originally flagged items are resolved. The Python and TypeScript implementations are now in parity, test coverage pins the eager-ack behavior on both sides, and the changelogs are accurate. Ready to merge (minor CHANGELOG formatting aside). |
Spec §6.4 was tightened to require an immediate
`{"type":"status","data":"ack"}` chunk as the first message on the
reply subject, before any latency-inducing handler work
(nats-agent-sdk-docs@b1c6972). The Python SDK adopted this in PR #101
(merged as 0.4.0); this commit brings `@synadia-ai/agent-service`
into parity.
Change: `AgentService.#dispatchPrompt` publishes
`encodeChunk({type:"status",status:"ack"})` synchronously after a
successful envelope decode and before invoking the user-supplied
handler. The pre-existing periodic keep-alive interval is retained
for §6.6 inactivity-timer defense — the spec text no longer mentions
it but the wire shape remains valid; `keepaliveIntervalS` now
controls only the periodic cadence, not the leading ack.
Added integration test pinning the eager behavior: a handler that
blocks on a promise, the test asserts the ack arrives before the
handler is released. try/finally around the pre-release assertions
makes a failed expectation reliably unblock the suspended handler
so the test runner doesn't hang.
Wire-compatible — callers already decode arbitrary `status` chunks
via `@synadia-ai/agents`. Brings the TS reference agent and
`agents/open-agent` into compliance automatically. The other
agent harnesses (pi, openclaw, claude-code-headless, pi-headless)
already comply via their own pre-handler ack publishes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c8879d4 to
5c297d2
Compare
|
@claude - we had to abandon the python stuff, as that is already done meanwhile by PR 101 + PR 102. Can you please check now. |
|
Claude finished @M64GitHub's task in 2m 4s —— View job Re-review (TS-only scope)
The diff is TS-only:
|
Summary
TS-side counterpart to PR #101 (Python SDK, merged) — brings
@synadia-ai/agent-serviceinto parity with the §6.4 spec tightening (nats-agent-sdk-docs@b1c6972), which requires an immediate{"type":"status","data":"ack"}chunk as the first message on the reply subject, before any latency-inducing handler work.What changes
agent-sdk/typescript/src/service.ts—#dispatchPromptnow publishesencodeChunk({type:"status",status:"ack"})synchronously after envelope decode and before invoking the handler. Comment block uses "leading ack" terminology for cross-SDK consistency with the Python side. The pre-existing periodic keep-alive interval is retained as §6.6 inactivity-timer defense;keepaliveIntervalSnow controls only the periodic cadence, not the leading ack.agent-sdk/typescript/test/integration/agent-service.test.ts— new testemits the §6.4 mandatory leading 'ack' as the first chunk before the handler runs: handler blocks on a promise, the test asserts the ack lands before the handler is released.try/finallyaround the pre-release assertions guarantees the suspended handler is unblocked even on assertion failure.agent-sdk/typescript/CHANGELOG.md—[Unreleased]entry mirroring the Python SDK'sagent-sdk/python/CHANGELOG.mdshape.Wire compatibility
Fully compatible. Callers already decode arbitrary
statuschunks via@synadia-ai/agents. The only observable change is one extraackevent at the head of every stream, which clients have always accepted.Test plan
agent-sdk/typescripttypecheck + eslint + prettier--check+ 38 tests (37 existing + 1 new) all green against a local NATS brokerclient-sdk/typescript280 tests still pass — wire decode untouched#dispatchPrompt