docs(agent-sdk): add Ollama LLM streaming example #133

Merged
M64GitHub merged 3 commits into main from examples/ollama-agent
May 30, 2026

Conversation

@M64GitHub
Contributor

What

Adds agent-sdk/typescript/examples/02-ollama.ts, step 2 of the agent-sdk example ladder, after 01-echo.ts.

It's deliberately the echo agent with one thing changed: the AgentService setup, connection resolution, and shutdown are identical; only the onPrompt body differs. Instead of a single echo: <prompt> reply, it forwards the prompt to a local Ollama and streams the model's answer back token by token.

// echo (01):  one reply
await response.send(`echo: ${envelope.prompt}`);

// ollama (02): a stream of replies
for await (const token of ollamaTokens(envelope.prompt)) {
  await response.send(token);
}

The single helper, ollamaTokens(), is an async generator that hits /api/generate with plain fetch (no new dependencies) and reassembles Ollama's newline-delimited JSON into tokens. The SDK owns stream termination, so the handler just returns when the loop ends.
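
For readers who want to see the reassembly step in isolation, here is a minimal sketch of newline-delimited JSON parsing over arbitrary chunk boundaries. It is not the code from 02-ollama.ts: `NdjsonTokenizer` and `tokensFrom` are hypothetical names, and the sketch assumes every Ollama line ends with a newline, so nothing is left in the buffer when the stream closes.

```typescript
// Hypothetical sketch, not the PR's implementation.
type OllamaLine = { response?: string; done?: boolean };

class NdjsonTokenizer {
  private buffer = "";
  private readonly decoder = new TextDecoder();

  // Feed one raw chunk; returns the non-empty tokens that chunk completed.
  push(chunk: Uint8Array): string[] {
    // { stream: true } keeps a multi-byte character that is split
    // across two chunks from being decoded as garbage.
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const tokens: string[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue;
      const token = (JSON.parse(line) as OllamaLine).response ?? "";
      if (token) tokens.push(token);
    }
    return tokens;
  }
}

// Async wrapper in the shape the description gives for ollamaTokens():
// any async iterable of bytes in, one token at a time out.
async function* tokensFrom(
  chunks: AsyncIterable<Uint8Array>,
): AsyncGenerator<string> {
  const tokenizer = new NdjsonTokenizer();
  for await (const chunk of chunks) {
    yield* tokenizer.push(chunk);
  }
}
```

Keeping the buffering in a synchronous class makes the chunk-boundary behaviour easy to check without a running Ollama; the async generator only adds the iteration.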

Config

  • MODEL constant, default llama3.2, overridable via OLLAMA_MODEL
  • Host via OLLAMA_URL (default http://localhost:11434)
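
As an illustration of how these two settings could be resolved, the sketch below applies the defaults listed above. `resolveOllamaConfig` is a hypothetical helper, not a function from the example file, and it takes the environment as a parameter so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper; the defaults are the ones stated in the Config list.
type Env = Record<string, string | undefined>;

function resolveOllamaConfig(env: Env): { model: string; url: string } {
  return {
    // `??` falls back only when the variable is unset; an empty string is kept.
    model: env["OLLAMA_MODEL"] ?? "llama3.2",
    url: env["OLLAMA_URL"] ?? "http://localhost:11434",
  };
}
```

In a Bun or Node script this would typically be called as `resolveOllamaConfig(process.env)`.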

Context

Built as conference-talk material (reThinkConn) demonstrating the Synadia Agent Protocol in three small steps: echo → LLM prompt/response → tool calls. This is step 2; step 3 (tool calls) to follow.

Verification

  • bun run typecheck, bun run lint, bun run format:check — clean
  • Smoke-tested the streaming/parse path against a real local Ollama — tokens stream and reassemble correctly
  • Author confirmed it works end-to-end against the agent

README updated with the new row and a run note. No SDK surface, wire-format, or dependency changes — examples-only.

🤖 Generated with Claude Code

Add `02-ollama.ts` to the agent-sdk TypeScript examples — step 2 of the
example ladder after `01-echo.ts`. Same `AgentService` shape as the echo
agent; the only change is the `onPrompt` body, which forwards each prompt
to a local Ollama (`/api/generate`, plain `fetch`, no new deps) and streams
the model's reply back token by token via repeated `response.send(...)`.

Model is a `MODEL` constant (default `llama3.2`), overridable via
`OLLAMA_MODEL`; host via `OLLAMA_URL`. README updated with the new row and
a short run note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@synadia-claude-reviewer

synadia-claude-reviewer Bot commented May 30, 2026

Claude finished @M64GitHub's task in 1m 50s


PR #133 Review — docs(agent-sdk): add Ollama LLM streaming example

  • Gather context and read changed files
  • Review 02-ollama.ts for correctness, security, and protocol compliance
  • Review README changes
  • Post inline comments and summary

Summary

The overall shape is correct. The streaming logic — TextDecoder with { stream: true }, newline-delimited JSON reassembly, and relying on the SDK's finally block to send the terminator — is sound. No SDK surface, wire-format, or dependency changes; protocol compliance is handled by the AgentService layer.

Two real issues found (both have inline suggestions):

  1. Missing Content-Type: application/json (line 39): The fetch POST sends a JSON body without the Content-Type header. Ollama typically accepts it anyway, but HTTP semantics say a sender should label a JSON body with this header, and some proxies or newer Ollama versions may enforce it.

  2. Spurious empty chunk from the done packet (line 51): Ollama's final {"done":true,"response":""} object causes one extra response.send("") over the wire. Not protocol-breaking, but filtering on done and skipping empty tokens is cleaner.

README update is clean — table row and run note are accurate and well-formatted.

    body: JSON.stringify({ model: MODEL, prompt, stream: true }),
  });
  if (!res.ok || res.body === null) {
    throw new Error(`Ollama request failed: ${res.status} ${res.statusText}`);

Missing Content-Type: application/json header. Ollama happens to work without it in practice, but HTTP semantics call for the header on a POST with a JSON body (RFC 9110 makes it a SHOULD), and some Ollama versions or proxies in front of it may reject the request.

Suggested change
throw new Error(`Ollama request failed: ${res.status} ${res.statusText}`);
body: JSON.stringify({ model: MODEL, prompt, stream: true }),
headers: { "Content-Type": "application/json" },
});

Contributor Author

Fixed in ffc628e — added headers: { "Content-Type": "application/json" } to the POST.
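
To make the shape of the corrected request concrete, here is a small sketch that builds the arguments for `fetch`. `buildGenerateRequest` is a hypothetical helper introduced for illustration; the endpoint, body fields, and header are the ones quoted in this thread.

```typescript
// Hypothetical helper, not part of 02-ollama.ts.
interface GenerateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildGenerateRequest(
  baseUrl: string,
  model: string,
  prompt: string,
): GenerateRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/generate`,
    init: {
      method: "POST",
      // Label the JSON body explicitly rather than relying on server leniency.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: true }),
    },
  };
}
```

A caller would then do something like `const { url, init } = buildGenerateRequest(host, MODEL, prompt)` followed by `await fetch(url, init)`.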

    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim() === "") continue;
      yield (JSON.parse(line) as { response?: string }).response ?? "";

The done packet Ollama sends looks like {"done":true,"response":"","done_reason":"stop",...}. The response field is an empty string, so the generator yields "" one extra time and the handler calls response.send("") — sending a vacuous ResponseChunk over the wire. Not protocol-breaking, but worth filtering:

Suggested change
yield (JSON.parse(line) as { response?: string }).response ?? "";
const parsed = JSON.parse(line) as { response?: string; done?: boolean };
if (parsed.done) break;
const token = parsed.response ?? "";
if (token) yield token;

Contributor Author

Fixed in ffc628e. Verified against a live Ollama: the old path emitted 2 chunks (1 empty, the done packet); the fix yields only non-empty response text, so 1 chunk / 0 empties. To be explicit for anyone reading this thread — send("") was not the §6.5 stream terminator: it encodes to a 29-byte {"type":"response","data":""} chunk, whereas the SDK's terminator is a genuinely zero-byte body emitted automatically after the handler returns. So no early close; this was just trimming a vacuous chunk (also keeps the protocol visualizer clean for the demo).
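
The byte counts in this reply can be checked with a few lines of TypeScript. This is only a sketch of the arithmetic: it assumes the chunk is the compact JSON object quoted above, with the keys in that order, and it models the terminator simply as an empty byte array.

```typescript
const encoder = new TextEncoder();

// The vacuous chunk that send("") would have produced, as quoted in the thread.
const emptyChunk = JSON.stringify({ type: "response", data: "" });
const emptyChunkBytes = encoder.encode(emptyChunk).length;

// The terminator is described as a zero-byte body, modelled here as an empty array.
const terminatorBytes = new Uint8Array(0).length;

console.log(emptyChunk, emptyChunkBytes, terminatorBytes);
// prints: {"type":"response","data":""} 29 0
```

Because the two payloads differ in length (29 bytes versus 0), a receiver that detects the terminator by an empty body cannot mistake the empty chunk for end of stream.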

- Send `Content-Type: application/json` on the Ollama POST — correct HTTP
  semantics; some Ollama versions / fronting proxies may reject without it.
- Skip Ollama's final `{"done":true,"response":""}` packet so the handler
  never streams a vacuous `{"type":"response","data":""}` chunk to the
  caller (verified against a live Ollama: trailing empty chunk eliminated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@M64GitHub
Contributor Author

@claude - please review again

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented May 30, 2026

Claude finished @M64GitHub's task in 1m 19s


PR #133 Re-Review — docs(agent-sdk): add Ollama LLM streaming example

  • Read current state of 02-ollama.ts
  • Verify previous issues are resolved
  • Check for remaining issues
  • Post review summary

Both previous issues are resolved ✅

1. Content-Type header — Fixed at 02-ollama.ts:36:

headers: { "Content-Type": "application/json" },

2. Spurious empty chunk from done packet — Fixed at 02-ollama.ts:54-55:

const token = (JSON.parse(line) as { response?: string }).response ?? "";
if (token) yield token;

This filters at the generator level rather than checking done explicitly — functionally equivalent since Ollama's done packet carries response: "", and the approach is clearly commented.
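
The equivalence claimed here can be sketched by running both filters over the same parsed lines. The function names and the sample data are hypothetical; the sample mirrors the done packet shape quoted earlier in the thread.

```typescript
type OllamaLine = { response?: string; done?: boolean };

// Variant from the original review suggestion: stop at the done packet.
function tokensStoppingAtDone(lines: OllamaLine[]): string[] {
  const out: string[] = [];
  for (const line of lines) {
    if (line.done) break;
    const token = line.response ?? "";
    if (token) out.push(token);
  }
  return out;
}

// Variant that was merged: ignore `done`, skip empty tokens.
function tokensSkippingEmpty(lines: OllamaLine[]): string[] {
  const out: string[] = [];
  for (const line of lines) {
    const token = line.response ?? "";
    if (token) out.push(token);
  }
  return out;
}

// Hypothetical stream ending in a done packet with an empty response.
const sampleStream: OllamaLine[] = [
  { response: "Hi", done: false },
  { response: "!", done: false },
  { response: "", done: true },
];
```

On `sampleStream` both functions return `["Hi", "!"]`. They would diverge only if a done packet carried non-empty text or if further lines followed it, in which case the merged variant would keep that text and the `done`-based variant would drop it.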


One new issue found

README copy-paste bug — wrong subject in nats req example (README.md:55):

The nats req command under ### 02-ollama.ts still targets the echo agent subject:

nats req agents.prompt.echo.<you>.main "hello!" \

The Ollama agent registers as "ollama" (02-ollama.ts:70), so the subject should be:

nats req agents.prompt.ollama.<you>.main "hello!" \

The expected output block at README.md:62-65 also shows the echo agent's verbatim response ("echo: hello!"), which won't match what Ollama returns. A reader following these instructions to smoke-test the Ollama agent would hit the wrong subject and get confused.
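
One way to keep a documented subject in step with the agent name is to derive it rather than hand-copy it. The sketch below is purely illustrative: `promptSubject` is a hypothetical helper, and the meaning assigned to each segment (agent name, owner, session) is an assumption inferred from the two subjects quoted above, not something the SDK documents in this thread.

```typescript
// Hypothetical helper; the segment names are assumptions, not SDK terminology.
function promptSubject(agent: string, owner: string, session: string): string {
  return ["agents", "prompt", agent, owner, session].join(".");
}

// "alice" stands in for the <you> placeholder from the README.
const echoSubject = promptSubject("echo", "alice", "main");
const ollamaSubject = promptSubject("ollama", "alice", "main");
```

Here `echoSubject` is `agents.prompt.echo.alice.main` and `ollamaSubject` is `agents.prompt.ollama.alice.main`, so the two differ only in the agent segment, which is the segment the README copy-paste got wrong.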


Overall

The implementation itself (02-ollama.ts) looks clean and correct after the fixes. The streaming logic, error handling, buffer management, and shutdown wiring are all sound. The README fix above is the only remaining item before merge.

The `02-ollama` section was inserted above echo's "drive it with `nats req`"
block, stranding echo's example (and its `agents.prompt.echo...` subject +
`echo: hello!` output) under the Ollama heading. Restore echo's drive/output
block to the echo section, and give the ollama section its own correct
`agents.prompt.ollama...` invocation with a representative streamed output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@M64GitHub
Contributor Author

Good catch — fixed in 1ad9ec8. Root cause: the 02-ollama heading was inserted above echo's "drive it with nats req" block, which orphaned echo's example (subject + echo: hello! output) under the Ollama heading.

Fix: echo's drive/output block is back under the echo run section, and 02-ollama now has its own self-contained invocation against agents.prompt.ollama.<you>.main with a representative streamed output (one response chunk per token, then the terminator). prettier --check clean.

M64GitHub merged commit 444c37f into main on May 30, 2026. 14 checks passed.