From 9db72dcad04a4ede6073a05b896458f3b859c7db Mon Sep 17 00:00:00 2001
From: Federico Kamelhar
- Build AI workflows that actually ship.
- Oracle Generative AI · Multi-Agent · Reasoning · Orchestrator SDK.
+ Build agents that reason and solve together.
+ The Oracle Gen AI Multi-Agent Reasoning SDK.
@@ -26,14 +26,17 @@
---
-Spin up a **swarm** of specialists. Hand a conversation off across an
-**escalation desk**. Run an **orchestrator** of experts in parallel.
-Wire up a **state graph** that loops until confident. Mesh agents
-**across processes** with A2A. Or just ship one self-correcting agent
-that knows when to stop.
+Reasoning lives inside the loop. **Reflexion** evaluates every turn.
+**Grounding** verifies every claim against its source. **Causal**
+traces a symptom back to its root cause.
-Six multi-agent shapes plus A2A. One Oracle-native runtime. Every
-model on OCI the day it lands.
+Six shapes for six problems, plus A2A. **Compose** linear pipelines.
+**Orchestrate** specialists in parallel. **Swarm** for peer-to-peer
+research. **Handoff** for escalation desks. **StateGraph** loops
+until confident. **Functional** maps across agents. **A2A** meshes
+across processes.
+
+Every model on Oracle Generative AI the day it lands.
```bash
pip install "locus[oci]"
@@ -58,7 +61,7 @@ def book_flight(flight_id: str, customer_id: str) -> dict:
return billing.charge_and_book(flight_id, customer_id)
agent = Agent(
- model="oci:openai.gpt-5.5",
+ model="oci:openai.gpt-5",
tools=[search_flights, book_flight],
system_prompt="You are a travel concierge. Find a flight, then book it.",
reflexion=True, # self-correct mid-run
@@ -191,7 +194,7 @@ def book_meeting(date: str, attendees: list[str]) -> dict:
return calendar.book(date, attendees)
agent = Agent(
- model="oci:openai.gpt-5.5",
+ model="oci:openai.gpt-5",
tools=[get_today_date, book_meeting],
system_prompt="You are a scheduling assistant.",
)
diff --git a/docs/FEATURES.md b/docs/FEATURES.md
index a38f4c0..76d038f 100644
--- a/docs/FEATURES.md
+++ b/docs/FEATURES.md
@@ -1,93 +1,157 @@
-# Locus feature matrix
+# Capabilities
-What ships in `locus`, grouped by area.
+Everything `locus` ships, what it does, and where to find it.
+
+!!! oracle-distinctive "Distinctive to locus"
+ These ship as core primitives, inside the ReAct loop — not as
+ middleware, plugins, or third-party libraries:
+
+ - **Idempotent tools** — `@tool(idempotent=True)` dedupes on `(name, args)` inside the loop. No double-charge, double-book, double-page.
+ - **Reasoning loop nodes** — Reflexion, Grounding, Causal as first-class
+ Think → Execute → **Reflect** → Think nodes, not bolted-on libraries.
+ - **GSAR** — typed-grounding layer from [arXiv:2604.23366](https://arxiv.org/abs/2604.23366) with four-way claim partition + tiered replanning.
+ - **Termination algebra** — `MaxIterations(10) | TextMention("DONE") & ConfidenceMet(0.9)` is real Python (`__or__` / `__and__` operator overloads).
+ - **Six multi-agent shapes plus A2A** — Composition, Orchestrator, Swarm, Handoff, StateGraph, Functional + A2A for cross-process meshes.
+ - **OCI Generative AI day-zero** — two transports (OpenAI-compatible `/openai/v1` and the native OCI SDK), auto-routed by model id.
## Agent core
-| Feature | Surface |
-|---|---|
-| `Agent` + `AgentConfig` + `AgentResult` | `locus.agent` |
-| Composable termination algebra (`MaxIterations \| ToolCalled & ConfidenceMet`) | `locus.core.termination` |
-| Idempotent tools — `@tool(idempotent=True)` dedupes repeat calls | `locus.tools.decorator` |
-| Reflexion (`reflexion=True`) + Grounding (`grounding=True`) | `locus.reasoning` |
-| Causal chains (standalone graph builder) | `locus.reasoning.causal.CausalChain` |
-| Cancel signal (thread-safe `agent.cancel()`) | `Agent.cancel` |
-| Interrupts + resume (HITL) | `agent.run` yields `InterruptEvent`; `agent.resume(...)` |
-| Structured output (`output_schema=` Pydantic) | `locus.agent.config`, `locus.core.structured` |
-| Hooks lifecycle (before/after × invocation × tool × model + iteration) | `locus.hooks.provider` |
-| Plugin bundling (hooks + tools as one unit) | `locus.hooks.plugin` |
-
-## Memory
-
-| Feature | Backends |
-|---|---|
-| Native checkpointers | `MemoryCheckpointer`, `FileCheckpointer`, `HTTPCheckpointer`, `OCIBucketBackend` |
-| Storage-backed (auto-wrapped via `StorageBackendAdapter`) | `SQLiteBackend`, `RedisBackend`, `PostgreSQLBackend`, `OpenSearchBackend`, `OracleBackend` |
-| Conversation managers | `SlidingWindowManager`, `SummarizingManager`, `LLMCompactor` |
-| Long-term key-value store with optimistic locking (`version` counter) | `locus.memory.store` |
+| Feature | What it does | Surface |
+|---|---|---|
+| **Agent** + `AgentConfig` + `AgentResult` | The Think → Execute → Reflect → Terminate loop | `locus.agent` · [Agent loop](concepts/agent-loop.md) |
+| **Termination algebra** | Compose stop conditions with `&` and `\|` operator overloads | `locus.core.termination` · [Termination](concepts/termination.md) |
+| **Idempotent tools** | `@tool(idempotent=True)` dedupes repeat calls inside the loop, so each identical call fires its side effect once per run | `locus.tools.decorator` · [Idempotency](concepts/idempotency.md) |
+| **Reflexion** | Self-evaluation node in the ReAct cycle; rewrites the next turn when the last one was wrong | `Agent(reflexion=True)` · [Reasoning](concepts/reasoning.md) |
+| **Grounding** | LLM-as-judge claim verification against tool results; below-threshold triggers replanning | `Agent(grounding=True)` · [Reasoning](concepts/reasoning.md) |
+| **Causal chains** | Cause-effect graph builder with cycle/contradiction detection | `locus.reasoning.causal.CausalChain` · [Reasoning](concepts/reasoning.md) |
+| **GSAR** | Typed-grounding safety layer (arXiv:2604.23366) — four-way claim partition + tiered replanning | `Agent(gsar=GSARConfig(...))` · [GSAR](concepts/gsar.md) |
+| **Cancel** | Thread-safe abort during a run; emits `TerminateEvent` with reason | `agent.cancel()` · [Agent loop](concepts/agent-loop.md) |
+| **Interrupts (HITL)** | Pause via `InterruptEvent`; resume with `agent.resume(...)` | `locus.core.interrupt` · [Interrupts](concepts/interrupts.md) |
+| **Structured output** | Pass a Pydantic model as `output_schema=`; the final answer is parsed into a typed instance | `locus.agent.config`, `locus.core.structured` · [Structured output](concepts/structured-output.md) |
+| **Hooks** | before/after × invocation × tool × model lifecycle observation + steering | `locus.hooks.provider` · [Hooks](concepts/hooks.md) |
+| **Plugins** | Bundle hooks + tools as one drop-in unit | `locus.hooks.plugin` · [Hooks](concepts/hooks.md) |
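
The termination-algebra row relies on Python operator overloading. Below is a minimal, self-contained sketch of how `|` and `&` can compose stop conditions. `Condition`, the factory functions, and the dict-based run state are hypothetical stand-ins, not the `locus.core.termination` API. One thing the sketch makes concrete: Python's `&` binds tighter than `|`, so `a | b & c` parses as `a | (b & c)`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]  # hypothetical run state, e.g. {"iteration": 3, "text": "..."}


@dataclass(frozen=True)
class Condition:
    """A stop condition: a predicate over the current run state."""

    check: Callable[[State], bool]

    def __call__(self, state: State) -> bool:
        return self.check(state)

    def __or__(self, other: Condition) -> Condition:
        # Stop when either side says stop.
        return Condition(lambda s: self(s) or other(s))

    def __and__(self, other: Condition) -> Condition:
        # Stop only when both sides say stop.
        return Condition(lambda s: self(s) and other(s))


# Factories named to mirror the doc's condition classes (hypothetical).
def MaxIterations(n: int) -> Condition:
    return Condition(lambda s: s["iteration"] >= n)


def TextMention(marker: str) -> Condition:
    return Condition(lambda s: marker in s["text"])


def ConfidenceMet(threshold: float) -> Condition:
    return Condition(lambda s: s.get("confidence", 0.0) >= threshold)


# Parses as MaxIterations(10) | (TextMention("DONE") & ConfidenceMet(0.9)).
stop = MaxIterations(10) | TextMention("DONE") & ConfidenceMet(0.9)
```

Parenthesize explicitly when you mean `(a | b) & c`; overloading an operator does not change its precedence.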
+
+## Multi-agent
+
+| Shape | What it does | Surface |
+|---|---|---|
+| **Composition** | Linear chain · fan-out + merge — the simplest multi-agent shape | `locus.multiagent.composition` · [Composition](concepts/multi-agent/composition.md) |
+| **Orchestrator** | One coordinator dispatches specialists in parallel | `locus.multiagent.orchestrator` · [Orchestrator](concepts/multi-agent/orchestrator.md) |
+| **Swarm** | Open-ended peer-to-peer collaboration | `locus.multiagent.swarm` · [Swarm](concepts/multi-agent/swarm.md) |
+| **Handoff** | Specialist-to-specialist context transfer with chain-of-custody | `locus.multiagent.handoff` · [Handoff](concepts/multi-agent/handoff.md) |
+| **StateGraph** | Cycles, conditional edges, subgraphs — when a DAG isn't enough | `locus.multiagent.graph` · [StateGraph](concepts/multi-agent/graph.md) |
+| **Functional API** | Map / reduce over agents with `@task` and `@entrypoint` | `locus.multiagent.functional` · [Functional](concepts/multi-agent/functional.md) |
+| **A2A** | Cross-process agent meshes — `AgentCard` discovery + HTTP/SSE transport | `locus.a2a` · [A2A](concepts/multi-agent/a2a.md) |
+
+## Reasoning
+
+| Feature | What it does | Surface |
+|---|---|---|
+| **Reflexion** | After each turn, the agent self-evaluates and re-plans on wrong premises | `Agent(reflexion=True)` · [Reasoning](concepts/reasoning.md) |
+| **Grounding** | LLM-as-judge over claims vs the tool results that produced them | `Agent(grounding=True)` · [Reasoning](concepts/reasoning.md) |
+| **Causal** | Build a cause-effect graph from the trace; surface contradictions | `build_causal_chain()` · [Reasoning](concepts/reasoning.md) |
+| **GSAR** | Typed claim partition (cited / supported / unsupported / mismatched) + `proceed`/`regenerate`/`replan`/`abstain` decision | `Agent(gsar=GSARConfig(...))` · [GSAR](concepts/gsar.md) |
## Tools
-| Feature | Surface |
-|---|---|
-| `@tool` decorator with auto JSON-Schema | `locus.tools.decorator` |
-| Sequential / Concurrent / CircuitBreaker executors | `locus.tools.executor` |
-| Tool-result store offload (large outputs) | `locus.tools.result_storage` |
-| MCP — client + server | `locus.integrations.fastmcp` |
-| Path/URL safety helpers | `locus.tools.path_safety`, `locus.tools.url_safety` |
+| Feature | What it does | Surface |
+|---|---|---|
+| `@tool` decorator | Function → JSON-Schema-typed tool the model can call | `locus.tools.decorator` · [Tools](concepts/tools.md) |
+| Idempotent dedup | `@tool(idempotent=True)` skips repeat calls (same args) in the loop | `locus.tools.decorator` · [Idempotency](concepts/idempotency.md) |
+| **Sequential executor** | Run tool calls one at a time | `locus.tools.executor` · [Executors](concepts/executors.md) |
+| **Concurrent executor** | Run tool calls in parallel | `locus.tools.executor` · [Executors](concepts/executors.md) |
+| **CircuitBreaker executor** | Auto-disable a tool after N failures | `locus.tools.executor` · [Executors](concepts/executors.md) |
+| Result-store offload | Move large tool results to object storage; agent sees a pointer | `locus.tools.result_storage` |
+| Path / URL safety | Validate filesystem and network access from tool args | `locus.tools.path_safety`, `locus.tools.url_safety` · [Safety](concepts/safety.md) |
+| **MCP — client + server** | Call tools on external Model Context Protocol servers, or expose your own tools as an MCP server | `locus.integrations.fastmcp` · [MCP](concepts/mcp.md) |
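
The circuit-breaker row describes disabling a tool after N failures. A minimal sketch of that idea: count consecutive failures per tool and refuse further calls once a threshold is reached. `CircuitBreaker`, `CircuitOpenError`, the `max_failures` parameter, and the reset-on-success rule are illustrative assumptions, not the `locus.tools.executor` API.

```python
from collections import defaultdict
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised when a tool has been disabled after too many failures."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: defaultdict[str, int] = defaultdict(int)

    def is_open(self, tool_name: str) -> bool:
        # "Open" circuit = the tool is currently disabled.
        return self.failures[tool_name] >= self.max_failures

    def call(self, tool_name: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        if self.is_open(tool_name):
            raise CircuitOpenError(f"{tool_name} disabled after {self.max_failures} failures")
        try:
            result = fn(**kwargs)
        except Exception:
            self.failures[tool_name] += 1  # count consecutive failures
            raise
        self.failures[tool_name] = 0  # a success resets the counter
        return result
```

A real executor would usually add a cool-down after which the tool is tried again ("half-open"); the sketch leaves that out.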
+
+## Memory — checkpointer backends
+
+| Backend | Best for | Surface |
+|---|---|---|
+| `MemoryCheckpointer` | Tests, REPL — in-process dict | `locus.memory.backends.memory` · [Checkpointers](concepts/checkpointers.md) |
+| `FileCheckpointer` | Local dev — JSON files on disk | `locus.memory.backends.file` |
+| `HTTPCheckpointer` | A remote checkpoint service you already run | `locus.memory.backends.http` |
+| **`OCIBucketBackend`** | OCI-native, lifecycle policies, region replication | `locus.memory.backends.oci_bucket` |
+| `SQLiteBackend` | Single-process durability | `locus.memory.backends.sqlite` |
+| `RedisBackend` | Multi-replica, fast, TTLs | `locus.memory.backends.redis` |
+| `PostgreSQLBackend` | Production DB with metadata queries | `locus.memory.backends.postgresql` |
+| `OpenSearchBackend` | Full-text search across past runs | `locus.memory.backends.opensearch` |
+| `OracleBackend` | Oracle DB with JSON queries | `locus.memory.backends.oracle` |
+
+## Memory — context management
+
+| Feature | What it does | Surface |
+|---|---|---|
+| `SlidingWindowManager` | Keeps the last N messages; drops the rest | `locus.memory.compactor` · [Conversation management](concepts/conversation-management.md) |
+| `SummarizingManager` | LLM rollup of older turns | `locus.memory.compactor` |
+| **`LLMCompactor`** | Budget-aware compaction with head + tail protection | `locus.memory.compactor` |
+| Long-term key-value store | Cross-run user prefs / results with optimistic-locking `version` counter | `locus.memory.store` |
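
The long-term store row mentions an optimistic-locking `version` counter. A minimal in-memory sketch of that technique: every write must quote the version it read, and a write based on a stale version is rejected instead of silently overwriting newer data. `VersionedStore`, `ConflictError`, and the `expected_version` parameter are hypothetical names, not the `locus.memory.store` API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


class ConflictError(Exception):
    """Raised when a write is based on a stale version."""


@dataclass
class Item:
    value: Any
    version: int


class VersionedStore:
    def __init__(self) -> None:
        self._items: dict[str, Item] = {}

    def get(self, key: str) -> Item | None:
        return self._items.get(key)

    def put(self, key: str, value: Any, expected_version: int = 0) -> int:
        """Write `value` if the stored version matches; return the new version.

        `expected_version=0` means "the key must not exist yet".
        """
        current = self._items.get(key)
        current_version = current.version if current else 0
        if current_version != expected_version:
            raise ConflictError(
                f"{key!r}: expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._items[key] = Item(value, new_version)
        return new_version
```

A writer that loses the race re-reads the item, merges its change, and retries with the fresh version.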
## Hooks (built-in)
-`LoggingHook`, `StructuredLoggingHook`, `TelemetryHook` (OpenTelemetry),
-`NoOpTelemetryHook`, `ModelRetryHook`, `GuardrailsHook`,
-`ContentFilterHook`, `SteeringHook` — all import from
-`locus.hooks.builtin`.
+| Hook | What it does | Import |
+|---|---|---|
+| `LoggingHook` / `StructuredLoggingHook` | Stdlib / structured-JSON logs of every event | `locus.hooks.builtin` · [Observability](concepts/observability.md) |
+| **`TelemetryHook`** | OpenTelemetry traces + metrics (counters, histograms) | `locus.hooks.builtin` |
+| `NoOpTelemetryHook` | Opt-out variant for tests | `locus.hooks.builtin` |
+| `ModelRetryHook` | Auto-retry model calls on throttle/empty with exponential backoff | `locus.hooks.builtin` · [Retry](concepts/retry.md) |
+| **`GuardrailsHook`** | Block dangerous tools, redact PII, enforce content/topic policies | `locus.hooks.builtin` · [Safety](concepts/safety.md) |
+| `ContentFilterHook` | Standalone content moderation | `locus.hooks.builtin` |
+| **`SteeringHook`** | LLM-as-judge approval gate on every tool call | `locus.hooks.builtin` · [Safety](concepts/safety.md) |
-## Multi-agent
+## Streaming + Server
-`SequentialPipeline` / `ParallelPipeline` / `LoopAgent`
-(plus `sequential()`, `parallel()`, `loop()` helpers); `Orchestrator` +
-`Specialist`; `Swarm` + `SharedContext`; `Handoff` + `HandoffAgent`;
-`StateGraph` (cycles, conditional edges, subgraphs); Functional API
-(`@task` / `@entrypoint`); `A2AServer` + `A2AClient` + `AgentCard`.
+| Feature | What it does | Surface |
+|---|---|---|
+| **Typed events** | Frozen Pydantic events for `match`-statement consumers | `locus.core.events` · [Events](concepts/events.md) |
+| `StructuredStream` | Incremental Pydantic-partial parsing during streaming | `locus.core.structured` |
+| Console + SSE handlers | Render to terminal or stream over Server-Sent Events | `locus.core.events` · [Streaming](concepts/streaming.md) |
+| **`AgentServer`** | Drop-in FastAPI app: `/invoke`, `/stream`, `GET` / `DELETE /threads/{id}`, `/health` | `locus.server` · [Agent Server](concepts/server.md) |
+| Per-principal threads | Bearer-token auth + thread-id namespacing prevents cross-tenant leaks | `AgentServer(api_key=...)` · [Agent Server](concepts/server.md) |
+| Graph streaming | Multi-agent state-graph event streams | `locus.multiagent.graph` · [Graph streaming](concepts/graph-streaming.md) |
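
The per-principal threads row says thread ids are namespaced by the authenticated bearer principal. A minimal sketch of why that prevents cross-tenant reads: the storage key combines principal and thread id, so the same thread id presented under another token resolves to a different, empty slot. `ThreadStore` and its methods are hypothetical; in a real server the principal would come from verifying the bearer token, never from client input.

```python
from __future__ import annotations


class ThreadStore:
    def __init__(self) -> None:
        # Keyed by (principal, thread_id), never by thread_id alone.
        self._threads: dict[tuple[str, str], list[str]] = {}

    def append(self, principal: str, thread_id: str, message: str) -> None:
        self._threads.setdefault((principal, thread_id), []).append(message)

    def get(self, principal: str, thread_id: str) -> list[str]:
        # Another principal guessing the same thread id sees nothing.
        return list(self._threads.get((principal, thread_id), []))

    def delete(self, principal: str, thread_id: str) -> bool:
        # Returns True only if this principal owned the thread.
        return self._threads.pop((principal, thread_id), None) is not None
```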
## RAG
-Seven vector stores under `locus.rag.stores`: Chroma, in-memory,
-OpenSearch, Oracle 26ai, pgvector, Pinecone, Qdrant. Embeddings:
-`OCIEmbeddings`, `OpenAIEmbeddings`. Multimodal processors:
-`TextProcessor`, `ImageProcessor`, `PDFProcessor`, `AudioProcessor`,
-`MultimodalProcessor`.
+| Component | Options | Surface |
+|---|---|---|
+| Vector stores | Oracle 26ai · OpenSearch · pgvector · Qdrant · Pinecone · Chroma · in-memory | `locus.rag.stores` · [RAG](concepts/rag.md) |
+| Embeddings | `OCIEmbeddings` (Cohere) · `OpenAIEmbeddings` | `locus.rag.embeddings` |
+| Multimodal processors | Text · PDF (text + OCR) · Image (OCR) · Audio (transcription) | `locus.rag.multimodal` |
+| Tool wiring | `create_rag_tool(retriever)` exposes the retriever as a `@tool` | `locus.rag.tools` |
-## Streaming + Server
+## Models
-Typed events (`ThinkEvent`, `ModelChunkEvent`, `ToolStartEvent`,
-`ToolCompleteEvent`, `ReflectEvent`, `GroundingEvent`, `InterruptEvent`,
-`TerminateEvent`); `StructuredStream` (incremental Pydantic partials);
-console + SSE handlers; `AgentServer` with `/invoke`, `/stream`,
-`GET /threads/{id}`, `DELETE /threads/{id}`, `/health` and
-bearer-principal-scoped thread namespaces.
+| Provider | Models | Surface |
+|---|---|---|
+| **OCI Generative AI — V1 transport** | `openai.*`, `meta.*`, `xai.*`, `google.*`, `mistral.*` on OCI, via the OpenAI-compatible `/openai/v1` endpoint | `locus.models.providers.oci.openai_compat` · [OCI](concepts/providers/oci.md) |
+| **OCI Generative AI — SDK transport** | Cohere `command-r-*` series — proprietary chat shape | `locus.models.providers.oci.OCIModel` · [OCI](concepts/providers/oci.md) |
+| OpenAI | All commercial models (gpt-5, o-series, etc.) | `locus.models.providers.openai` · [OpenAI](concepts/providers/openai.md) |
+| Anthropic | Claude 4 / 4.5 / 4.7 / 4.8 — direct API | `locus.models.providers.anthropic` · [Anthropic](concepts/providers/anthropic.md) |
+| Ollama | Local models | `locus.models.providers.ollama` · [Ollama](concepts/providers/ollama.md) |
+| Auto-routing | `get_model("oci:openai.gpt-5")` picks transport from id | `locus.models.registry.get_model` |
+| Decorators | Failover · pooled · cached · rate-limited wrappers over any provider | `locus.models.decorators` |
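
The auto-routing row says `get_model()` picks a transport from the model id. A minimal sketch of that dispatch, based on the families listed in the table: `oci:` ids for `openai.*`, `meta.*`, `xai.*`, `google.*`, and `mistral.*` go to the V1 transport, and Cohere ids go to the native SDK transport. The `route()` function, the `cohere.` prefix, the `<provider>:<model>` form for non-OCI providers, and the returned labels are assumptions for illustration; the real `get_model()` returns model objects.

```python
V1_FAMILIES = ("openai.", "meta.", "xai.", "google.", "mistral.")


def route(model_id: str) -> tuple[str, str]:
    """Return (transport, model_name) for ids like 'oci:openai.gpt-5'."""
    provider, sep, name = model_id.partition(":")
    if not sep:
        raise ValueError(f"expected '<provider>:<model>', got {model_id!r}")
    if provider != "oci":
        return provider, name  # e.g. a direct provider such as 'anthropic'
    if name.startswith(V1_FAMILIES):
        return "oci-v1", name  # OpenAI-compatible /openai/v1 transport
    if name.startswith("cohere."):
        return "oci-sdk", name  # native OCI SDK transport
    raise ValueError(f"no OCI transport known for {name!r}")
```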
## Skills + Playbooks
-Three-tier skill disclosure (`SkillsPlugin`); `PlaybookEnforcer` with
-YAML / JSON / Python loaders; `Skill.from_directory()` activation.
-
-## Models
-
-`OpenAIModel`, `AnthropicModel`, `OllamaModel`, `OCIModel` (native SDK
-transport for Cohere R-series), `OCIOpenAIModel` (`/openai/v1` for
-openai.*/ meta.* / xai.*/ google.* / mistral.* on OCI). `get_model()`
-auto-routes by model id. Failover, pooled, caching, rate-limit
-decorators included.
+| Feature | What it does | Surface |
+|---|---|---|
+| **Skills** | AgentSkills.io progressive disclosure (catalog → instructions → resources) | `locus.skills.SkillsPlugin` · [Skills](concepts/skills.md) |
+| `Skill.from_directory()` | Load a folder of `SKILL.md` bundles | `locus.skills.models.Skill` |
+| **Playbooks** | Numbered execution plans with per-step `PlaybookEnforcer` | `locus.playbooks` · [Playbooks](concepts/playbooks.md) |
+| YAML / JSON / Python loaders | Author playbooks in any of three formats | `locus.playbooks.loader` |
## Evaluation
-`EvalCase`, `EvalRunner`, `EvalReport`, `EvalResult` — pass/score/duration
-reporting, custom evaluators, `expected_tools` / `expected_output_contains`
-matchers.
+| Class | What it does | Surface |
+|---|---|---|
+| `EvalCase` | A single test case — expected tools / output / iteration / duration budgets | `locus.evaluation` · [Evaluation](concepts/evaluation.md) |
+| `EvalRunner` | Runs a list of cases against an agent, returns `EvalReport` | `locus.evaluation` |
+| `EvalResult` | Per-case pass / score / duration + diagnostic checks | `locus.evaluation` |
+| `EvalReport` | Aggregate stats with `summary()` + JSON serialisation | `locus.evaluation` |
-## Source pointers
+## Where to next
-For depth on any feature, the README headlines link to its source
-directory; canonical entry is `src/locus/__init__.py`.
+- **For first-time visitors**: [Quickstart](how-to/quickstart.md) ships a working agent in five minutes.
+- **For architecture**: [Agent loop](concepts/agent-loop.md) is the canonical reference.
+- **For depth on any feature**: every row in this matrix links to its concept page. Source lives at [`src/locus/`](https://github.com/oracle-samples/locus/tree/main/src/locus); canonical entry is [`src/locus/__init__.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/__init__.py).
diff --git a/docs/concepts/hooks.md b/docs/concepts/hooks.md
index e5c8a79..32fc117 100644
--- a/docs/concepts/hooks.md
+++ b/docs/concepts/hooks.md
@@ -1,20 +1,47 @@
# Hooks
-Hooks observe and modify agent behavior at lifecycle points. Every
-hook inherits `HookProvider` and is registered in a `HookRegistry`.
-Events fire at six phases:
+Hooks are how you **observe and modify** agent behaviour at the
+moments that matter: when a run starts and finishes, before / after
+each model call, and before / after each tool call. Every cross-cutting
+concern that *isn't* the agent's primary task lives here: logging,
+telemetry, retry policy, guardrails, PII redaction, LLM-as-judge tool
+approval.
-1. `on_before_invocation` — before the agent starts
-2. `on_after_invocation` — after the agent finishes
-3. `on_before_model_call` — before each model request
-4. `on_after_model_call` — after each model response
-5. `on_before_tool_call` — before each tool runs
-6. `on_after_tool_call` — after each tool completes
+You can use the hooks locus ships (they cover most production needs
+out of the box) or write your own: a hook is a small subclass that
+overrides only the methods it cares about.
-## Writing a hook
+## When to write a hook
+
+| You want… | Use |
+|---|---|
+| Log every tool call to your aggregator | a custom hook |
+| Add OpenTelemetry spans / metrics | the built-in `TelemetryHook` |
+| Retry model calls with backoff | the built-in `ModelRetryHook` |
+| Reject dangerous tool calls or filter unsafe content | the built-in `GuardrailsHook`, `ContentFilterHook`, `SteeringHook` |
+| Add a tool to the registry | not a hook: [`tools=[...]` on Agent](tools.md) |
+| Change the system prompt mid-run | not a hook: hooks can read state but not mutate the prompt; use a [skill](skills.md) instead |
+
+## The six lifecycle phases
+
+A hook can subscribe to any of these. Each method receives a typed,
+write-protected event object.
+
+| Phase | Fires | Useful for |
+|---|---|---|
+| `on_before_invocation` | once, when `agent.run()` starts | initialise per-run state, open spans |
+| `on_after_invocation` | once, after the agent finishes | flush metrics, close spans |
+| `on_before_model_call` | before each request to the model | redact PII, count tokens |
+| `on_after_model_call` | after each response from the model | log usage, retry on empty |
+| `on_before_tool_call` | before each tool body runs | guardrails, audit, approval gates |
+| `on_after_tool_call` | after each tool body completes | log result, update metrics |
+
+## Getting started
+
+### 1. Subclass `HookProvider`
```python
-from locus.hooks.provider import HookProvider, HookPriority
+from locus.hooks.provider import HookPriority, HookProvider
class AuditHook(HookProvider):
name = "audit"
@@ -25,58 +52,145 @@ class AuditHook(HookProvider):
async def on_after_tool_call(self, event):
print(f"← {event.tool_name} = {event.result}")
-
-agent = Agent(..., hooks=[AuditHook()])
```
-## Priorities
-
-Hooks run in priority order (lower number first for `before_*`,
-reversed for `after_*` so teardown pairs with setup):
-
-| Range | Intended use |
-|---|---|
-| 0–99 | Security (guardrails, PII redaction) |
-| 100–199 | Observability (logging, telemetry) |
-| 200–299 | Business logic |
-| 300+ | Cosmetic |
+Override only the phases you care about. Unimplemented phases inherit
+no-op defaults from the base class.
-Use the constants in `HookPriority` instead of magic numbers.
+### 2. Pass to the agent
-## Write-protected events
+```python
+agent = Agent(
+ model="oci:openai.gpt-5",
+ tools=[search, book_flight],
+ hooks=[AuditHook()],
+)
+```
-Event objects are Pydantic models with frozen fields. You cannot
-accidentally mutate them from a hook. Methods that exist to let hooks
-steer the agent — cancelling a tool, retrying a model call — are
-explicit, so the intent is unambiguous.
+### 3. Run
-## Built-in hooks
+The hook fires automatically — no further wiring.
-Locus ships these out of the box:
+## What you get out of the box
-| Hook | What it does |
-|---|---|
-| `LoggingHook` / `StructuredLoggingHook` | Plain or JSON-structured logs at every phase |
-| `TelemetryHook` / `NoOpTelemetryHook` | OpenTelemetry spans + counters + histograms |
-| `ModelRetryHook` | Backoff retries on empty / rate-limited model responses |
-| `GuardrailsHook` / `ContentFilterHook` | PII / SQL / XSS / command-injection regex policies |
-| `SteeringHook` | LLM-as-judge tool approval (a second model votes before each tool call) |
+locus ships these hooks. Composed in this order, they cover most
+production needs without writing custom code.
```python
from locus.hooks.builtin import (
- GuardrailsHook,
- LoggingHook,
+ LoggingHook, StructuredLoggingHook,
+ TelemetryHook,
ModelRetryHook,
+ GuardrailsHook, ContentFilterHook,
SteeringHook,
- TelemetryHook,
)
agent = Agent(
- ...,
+ model="oci:openai.gpt-5",
+ tools=[...],
hooks=[
- LoggingHook(),
- ModelRetryHook(max_retries=3),
- GuardrailsHook(),
+ StructuredLoggingHook(), # JSON logs at every phase
+ TelemetryHook(), # OTel spans + counters + histograms
+ ModelRetryHook(max_retries=3), # backoff on empty / rate-limited responses
+ GuardrailsHook(), # PII / SQL / XSS / command-injection
+ SteeringHook(approver="oci:openai.gpt-5"), # LLM-as-judge tool approval
],
)
```
+
+### `LoggingHook` / `StructuredLoggingHook`
+
+Plain-text or JSON-structured logs at every lifecycle phase. Drop in
+when you want a paper trail without writing your own logger.
+
+### `TelemetryHook`
+
+OpenTelemetry spans for every model + tool call, counters for tool
+invocations, histograms for latency. Use `NoOpTelemetryHook` when
+you want the API surface but no actual export (useful for tests).
+
+### `ModelRetryHook`
+
+Backoff retries on empty model responses, rate-limit errors, and
+transient connection failures. Configurable `max_retries` and
+`backoff_seconds`. Doesn't intercept your tool calls — only the
+model layer.
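
A minimal sketch of the retry policy described here: exponential backoff with a cap on attempts, retrying only on an empty response or a retryable error. `call_with_backoff`, `RetryableError`, and the doubling schedule are assumptions; the exact schedule `ModelRetryHook` derives from `max_retries` and `backoff_seconds` may differ. `sleep` is injectable so the sketch can be tested without waiting.

```python
import time
from typing import Callable


class RetryableError(Exception):
    """Stand-in for rate-limit (HTTP 429) and transient connection errors."""


def call_with_backoff(
    call: Callable[[], str],
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `call()`, retrying up to `max_retries` times with doubling delays."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            response = call()
        except RetryableError as exc:
            error: Exception = exc
        else:
            if response:  # an empty response also counts as a failure
                return response
            error = RetryableError("empty model response")
        if attempt == max_retries:
            raise error
        sleep(backoff_seconds * 2**attempt)  # 1s, 2s, 4s with the default 1.0
    raise AssertionError("unreachable: the loop always returns or raises")
```

Any other exception propagates on the first attempt, which matches the "model layer only" scope: the sketch retries transient model failures, not arbitrary bugs.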
+
+### `GuardrailsHook` / `ContentFilterHook`
+
+Regex-based policies on tool inputs (`GuardrailsHook`) and model
+outputs (`ContentFilterHook`). Catches PII, SQL injection patterns,
+shell-command injection, and credit-card-shaped strings. Reject or
+redact at the boundary.
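
A minimal sketch of regex-based redaction and rejection at a boundary, in the spirit of `GuardrailsHook` / `ContentFilterHook`. The patterns are deliberately simple illustrations (a 13 to 16 digit card-shaped number, a US-style SSN, and two injection markers). They are not the patterns locus ships, and production PII detection needs more than regexes.

```python
import re

REDACT_PATTERNS = {
    # 13 to 16 digits, optionally separated by single spaces or dashes.
    "CARD": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

BLOCK_PATTERNS = [
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),  # SQL injection marker
    re.compile(r";\s*rm\s+-rf\b"),  # shell command chaining
]


def redact(text: str) -> str:
    """Replace PII-shaped substrings with a labelled placeholder."""
    for label, pattern in REDACT_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


def should_block(text: str) -> bool:
    """True if the text matches any rejection pattern."""
    return any(pattern.search(text) for pattern in BLOCK_PATTERNS)
```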
+
+### `SteeringHook` — LLM-as-judge tool approval
+
+A *second model* sees each tool call before it runs and votes
+"approve / reject / rewrite". Use this when the cost of a wrong tool
+call is higher than the cost of a second model round-trip.
+
+```python
+agent = Agent(
+ ...,
+ hooks=[SteeringHook(approver="oci:openai.gpt-5")],
+)
+```
+
+## Priorities — the ordering rules
+
+Hooks run in priority order. Lower numbers run first on `before_*`
+phases; the order reverses for `after_*` so teardown pairs with
+setup.
+
+| Range | Intended use |
+|---|---|
+| `0`–`99` | **Security** — guardrails, PII redaction (must run first to short-circuit unsafe calls) |
+| `100`–`199` | **Observability** — logging, telemetry |
+| `200`–`299` | **Business logic** — domain-specific hooks |
+| `300+` | **Cosmetic** — pretty-printing, console UI |
+
+Use the constants in `HookPriority` (e.g. `HookPriority.SECURITY_MAX`,
+`HookPriority.OBSERVABILITY_MIN`) instead of magic numbers — the
+intent is more obvious in code review.
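
A minimal sketch of the ordering rule: sort ascending by priority for `before_*` phases, and run the exact mirror image for `after_*`, so the first hook to set up is the last to tear down. The `Hook` dataclass and `ordered()` helper are hypothetical; they illustrate the rule, not locus's dispatcher.

```python
from dataclasses import dataclass


@dataclass
class Hook:
    name: str
    priority: int


def ordered(hooks: list[Hook], phase: str) -> list[str]:
    """Return hook names in the order they run for `phase`."""
    # Stable ascending sort: lower priority first, ties keep registration order.
    setup_order = [h.name for h in sorted(hooks, key=lambda h: h.priority)]
    # Reversing the list (rather than sorting with reverse=True) also mirrors ties.
    return setup_order[::-1] if phase.startswith("on_after_") else setup_order
```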
+
+## Write-protected events — by design
+
+Event objects are frozen Pydantic models. You **cannot** accidentally
+mutate them from a hook — try and you get a `ValidationError`. The
+methods that *do* let hooks steer the agent (`event.cancel()`,
+`event.retry()`, `event.replace_arguments(...)`) are explicit and
+named for what they do, so the intent is unambiguous in a review:
+
+```python
+async def on_before_tool_call(self, event):
+ if "DROP TABLE" in str(event.arguments):
+ event.cancel(reason="SQL injection blocked")
+```
+
+Compare to a callback-based system where any code can monkey-patch
+any field; this is intentionally tight.
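
locus uses frozen Pydantic models for this. To keep the sketch dependency-free, the block below shows the same design with a stdlib frozen dataclass. Plain assignment fails (here with `dataclasses.FrozenInstanceError` rather than Pydantic's `ValidationError`), while an explicit `cancel()` method records the steering decision in a separate mutable control object. `ToolCallEvent`, `_Control`, and the `cancel()` signature are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any


@dataclass
class _Control:
    """Mutable side channel that only the explicit steering methods touch."""

    cancelled: bool = False
    reason: str | None = None


@dataclass(frozen=True)
class ToolCallEvent:
    tool_name: str
    arguments: dict[str, Any]
    _control: _Control = field(default_factory=_Control, repr=False)

    def cancel(self, reason: str) -> None:
        # The field reference is frozen; the object it points to is not.
        self._control.cancelled = True
        self._control.reason = reason

    @property
    def cancelled(self) -> bool:
        return self._control.cancelled

    @property
    def cancel_reason(self) -> str | None:
        return self._control.reason
```

Note that freezing only blocks reassigning fields: the `arguments` dict itself is still mutable in this sketch. A stricter version could expose it through `types.MappingProxyType`.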
+
+## Common gotchas
+
+| Symptom | Likely cause |
+|---|---|
+| Hook never fires | Forgot to pass it on `Agent(hooks=[...])`. The `HookRegistry` only sees what you register. |
+| Hook fires in the wrong order | Set `priority` explicitly. The default priority is intentionally mid-range so security hooks always come before yours. |
+| `ValidationError` on assignment (frozen instance) | You tried to write `event.foo = bar`. Hooks observe, not mutate; use the explicit steering methods. |
+| `on_after_tool_call` doesn't see the result | The tool raised. Check `event.error` instead of `event.result`. |
+| Telemetry spans aren't exported | `TelemetryHook` needs an OTel exporter configured upstream — see [Observability](observability.md). |
+
+## Source and examples
+
+- [`HookProvider` and `HookOrchestrator`](https://github.com/oracle-samples/locus/blob/main/src/locus/hooks/provider.py)
+- [Built-in hooks](https://github.com/oracle-samples/locus/tree/main/src/locus/hooks/builtin)
+- [`tutorial_05_agent_hooks.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_05_agent_hooks.py) — write your first hook.
+- [`tutorial_27_hooks_advanced.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_27_hooks_advanced.py) — guardrails + steering, end to end.
+
+## See also
+
+- [Tools](tools.md) — the things hooks observe.
+- [Events](events.md) — the typed event objects hooks receive.
+- [Safety & guardrails](safety.md) — production policies built on `GuardrailsHook`.
+- [Observability](observability.md) — wiring `TelemetryHook` to your OTel collector.
+- [Retry strategies](retry.md) — how `ModelRetryHook` works under the hood.
diff --git a/docs/concepts/idempotency.md b/docs/concepts/idempotency.md
index 4e19c90..061c136 100644
--- a/docs/concepts/idempotency.md
+++ b/docs/concepts/idempotency.md
@@ -1,11 +1,36 @@
# Idempotency
-The single most important word in production agents is **once**. The
-model is allowed to retry; the side-effect isn't. locus makes that a
-one-keyword decision on the tool.
+> The single most important word in production agents is **once**.
+
+The model is *allowed* to retry. The side effect *isn't*. locus
+makes that distinction a one-keyword decision on the tool, enforced
+inside the ReAct loop. This is a locus-specific primitive — none of
+LangChain / LangGraph / CrewAI / Strands ship it.
+
+If you ever plan to run an agent that **books**, **charges**,
+**emails**, **pages**, or **writes**, this is the most important
+single page on the docs site.
+
+## When to use `idempotent=True`
+
+| Situation | `idempotent=True`? |
+|---|---|
+| Side-effecting tool with real-world cost (charge, email, page, book) | **yes — always** |
+| Database write you can't trivially roll back | **yes** |
+| External service that's already idempotent on its end | yes — locus dedupes the round-trip too |
+| Read-only catalogue lookup | no — re-reads are cheap, leave it to the model |
+| Tool that *intentionally* generates a new entity each call (e.g. `mint_uuid`) | no — dedup would hand back the same entity on every identical call |
+
+## How it works
+
+Inside a single agent run, locus hashes the tool's
+`(name, arguments)` tuple as the model emits each call. **The first
+call with a given key hits the function body** and the result is
+recorded. **Every subsequent call with the same key short-circuits
+to the cached response** without invoking the body.
```python
-from locus.tools.decorator import tool
+from locus import tool
@tool(idempotent=True)
def transfer(from_acct: str, to_acct: str, amount: float) -> dict:
@@ -13,44 +38,103 @@ def transfer(from_acct: str, to_acct: str, amount: float) -> dict:
return ledger.transfer(from_acct, to_acct, amount)
```
-Inside a single agent run, locus hashes the tool's `(name, kwargs)`
-tuple. The first call hits the body and the result is cached. Every
-subsequent call with identical arguments — whether the model retried,
-got confused, or asked again on a later turn — short-circuits to the
-cached response.
+The argument hash is the trust boundary:
+
+- **Same call**: the model re-emits `transfer("A", "B", 100)` after
+ seeing the receipt → cache hit, body skipped.
+- **Different call**: the model emits `transfer("A", "B", 200)` →
+ different key, body runs.
+
+Caching is keyed on the **canonical JSON form** of the arguments, so
+key order, default values, and whitespace don't matter.
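
The keying rule above can be sketched in a few lines, assuming the model emits a tool's arguments as a JSON string. The sketch binds the arguments to the function signature (so omitted defaults are filled in), serializes them with sorted keys and compact separators, and uses `(name, canonical_json)` as the cache key. `canonical_key` and `IdempotentCache` are hypothetical helpers, not locus internals; locus hashes its key, while this sketch keys on the canonical string directly.

```python
import inspect
import json
from typing import Any, Callable


def canonical_key(fn: Callable[..., Any], raw_args: str) -> tuple[str, str]:
    """(tool name, canonical JSON of the bound arguments with defaults applied)."""
    bound = inspect.signature(fn).bind(**json.loads(raw_args))
    bound.apply_defaults()  # omitted defaults produce the same key as explicit ones
    canonical = json.dumps(bound.arguments, sort_keys=True, separators=(",", ":"))
    return fn.__name__, canonical


class IdempotentCache:
    def __init__(self) -> None:
        self._results: dict[tuple[str, str], Any] = {}

    def call(self, fn: Callable[..., Any], raw_args: str) -> Any:
        key = canonical_key(fn, raw_args)
        if key not in self._results:
            # First call with this key: run the body and record the result.
            self._results[key] = fn(**json.loads(raw_args))
        # Every later call with the same key short-circuits here.
        return self._results[key]
```

Two limits of this sketch: it does not normalize numeric types (`100` and `100.0` produce different keys), and it does not cache exceptions raised by the body.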
## Why this matters
-- **Booking, billing, payments.** The model that calls `book_flight`
- twice is more common than you think. Without idempotency you have a
- duplicate charge and an angry customer.
-- **Outbound side-effects.** `email_cfo`, `page_oncall`, `submit_po` —
- one and done.
-- **Database writes you can't easily roll back.**
+### Booking, billing, payments
+
+The model that calls `book_flight` twice in one run is more common
+than you think. Sometimes it sees an ambiguous tool result and tries
+again "to be sure". Sometimes the network glitches and the model
+believes the call failed. Without idempotency, you charge the
+customer twice and they're on the phone with their bank.
+
+```python
+@tool(idempotent=True)
+def book_flight(flight_id: str, customer_id: str) -> dict:
+ return billing.charge_and_book(flight_id, customer_id)
+```
+
+Within a run, the customer gets billed once, however many times the model re-issues the call.
-The argument hash is the trust boundary: if the model re-issues the
-*same* call, you fire once. If it changes any argument, that's a new
-call and the body runs.
+### Outbound side-effects
-## When to use it
+`email_cfo`, `page_oncall`, `submit_po`, `slack_alert` — anything
+that touches a human or a downstream system. **One and done**.
-| Situation | `idempotent=True`? |
+### Database writes you can't roll back
+
+Inserting into a journal table, appending to a Kafka topic, signing a
+JWT: operations where retrying isn't free. Within a run, idempotent
+tools reduce the "exactly once" problem to a simpler guarantee: the
+first call fires, and every identical repeat returns its result.
+
+### Replays after checkpoint resume
+
+When a checkpointer resumes a stalled run, the model may decide to
+re-issue tool calls it's already seen. Idempotent tools see the
+cache pre-populated from the checkpoint and skip the side effect on
+replay. (This requires `tool_executions` to be restored from the
+checkpoint; locus's [native checkpointers](checkpointers.md) handle
+it.)
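
A minimal sketch of that restore path, assuming the checkpoint stores completed tool results as JSON under their call keys. `ExecutionCache`, `snapshot`, and `restore` are hypothetical names for illustration, not locus's `tool_executions` format, and the key string stands in for a real argument hash.

```python
from __future__ import annotations

import json
from typing import Any, Callable


class ExecutionCache:
    """Hypothetical per-run record of completed tool calls, keyed by call key."""

    def __init__(self, records: dict[str, Any] | None = None) -> None:
        self._records: dict[str, Any] = dict(records or {})

    def call(self, key: str, body: Callable[[], Any]) -> Any:
        if key in self._records:  # already executed: skip the side effect
            return self._records[key]
        result = body()
        self._records[key] = result
        return result

    def snapshot(self) -> str:
        # What a checkpointer might persist next to the conversation state.
        return json.dumps(self._records)

    @classmethod
    def restore(cls, blob: str) -> ExecutionCache:
        return cls(json.loads(blob))


charges: list[str] = []


def charge_and_book() -> dict:
    charges.append("F123")  # the real-world side effect we must not repeat
    return {"booking": "F123", "status": "confirmed"}


key = "book_flight:F123:C9"  # stand-in for a canonical argument hash

first_run = ExecutionCache()
first_run.call(key, charge_and_book)
blob = first_run.snapshot()  # checkpoint taken; then the run stalls

resumed = ExecutionCache.restore(blob)  # resume from the checkpoint
replayed = resumed.call(key, charge_and_book)  # model re-issues the call
print(len(charges), replayed["status"])  # 1 confirmed
```
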
+
+## What it is *not*
+
+| Concept | Idempotency is… | Idempotency is *not*… |
+|---|---|---|
+| Scope | within a single agent run | cross-run — restart and the cache is gone (use a [checkpointer](checkpointers.md)) |
+| Failure | one fire per identical call | retry — if the body raises, the exception propagates as the cached "result" |
+| Boundary | per-agent | a network-layer cache: two different agents that both call `transfer(a, b, 100)` each fire once, so the body runs twice |
+
+If you need cross-run idempotency, configure a checkpointer + an
+idempotent server-side endpoint. The combo gives you "the side
+effect runs at most once across all replays of all agents".
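
The Scope and Failure rows can be sketched with a cache that lives for exactly one run and stores failures alongside successes. This assumes failures are recorded as an error value the model can read; `Outcome` and `RunScopedCache` are hypothetical names, not locus classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record of one tool call: a value or an error message."""

    ok: bool
    value: Any = None
    error: str | None = None


class RunScopedCache:
    """One instance per agent run; nothing survives the run unless persisted."""

    def __init__(self) -> None:
        self._seen: dict[tuple, Outcome] = {}

    def invoke(self, fn: Callable[..., Any], *args: Any) -> Outcome:
        key = (fn.__name__, args)
        if key not in self._seen:
            try:
                self._seen[key] = Outcome(ok=True, value=fn(*args))
            except Exception as exc:  # the failure itself becomes the cached result
                self._seen[key] = Outcome(ok=False, error=str(exc))
        return self._seen[key]


attempts = 0


def submit_po(vendor_id: str) -> str:
    global attempts
    attempts += 1
    raise TimeoutError(f"procurement API timed out for {vendor_id}")


run = RunScopedCache()
first = run.invoke(submit_po, "V-1")
again = run.invoke(submit_po, "V-1")  # identical call: body is not re-run
other = run.invoke(submit_po, "V-2")  # changed argument: body runs
fresh = RunScopedCache().invoke(submit_po, "V-1")  # new run: cache is gone
print(attempts, first == again, again.error)  # 3 True procurement API timed out for V-1
```
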
+
+## Practical recipe — vendor PO approval
+
+A canonical multi-agent idempotency shape: an agent (or three of
+them, debating) loops over a vendor decision, then writes once.
+
+```python
+@tool(idempotent=True)
+def submit_po(vendor_id: str, line_items: list[dict]) -> dict:
+ return procurement.submit(vendor_id, line_items)
+
+@tool(idempotent=True)
+def email_cfo(po_id: str, summary: str) -> str:
+ return mail.send(to="cfo@org.com", subject=f"PO {po_id}", body=summary)
+```
+
+The agent can iterate ten times reasoning about whether to approve.
+The PO ships once. The CFO email lands once. If the model fails
+mid-run, a checkpointer-backed resume re-issues the same calls, and
+because the completed calls were checkpointed, the side effects do
+not fire again.
+
+## Common gotchas
+
+| Symptom | Likely cause |
|---|---|
-| Side-effecting tool with a real-world cost (charge, email, page) | **yes** |
-| Read-only catalogue lookup | no — caching the model's reads is its problem, not yours |
-| Tool that *intentionally* generates a new entity each call (e.g. `mint_uuid`) | no |
-| External service that's already idempotent | yes anyway — locus dedupes the round-trip too |
+| Tool re-fires despite `idempotent=True` | Argument changed between calls. Check that the model isn't mutating ids / amounts between turns. |
+| Idempotent cache survives across runs unexpectedly | It shouldn't — only the checkpointer persists state. If you're seeing this, you're loading state from a checkpoint and don't want to. |
+| Body raised first time, cache returns the exception | This is by design — the failure is part of the "result" of the first call. The model sees the failure and can react. To re-attempt, the model must change an argument. |
+| Read-only lookup tagged `idempotent=True` | Usually unnecessary, and it can hide fresh data: a repeated lookup with the same arguments returns the first result instead of re-reading. Leave it off. |
-## What it is not
+## Source and tutorial
-- It's not idempotency *across runs*. Restart the agent and the cache
- is gone — that's what your **checkpointer** is for.
-- It's not retry. If the body raises, the exception propagates.
-- It's not a network-layer cache. Two different agents calling
- `transfer(a, b, 100)` each fire once.
+- [`@tool` decorator with idempotency hook](https://github.com/oracle-samples/locus/blob/main/src/locus/tools/decorator.py)
+- [`_find_matching_execution`](https://github.com/oracle-samples/locus/blob/main/src/locus/loop/nodes.py#L114) — where the dedup actually happens, in the ReAct loop's Execute node.
+- [`tutorial_03_tools_and_state.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_03_tools_and_state.py) — walks through `@tool(idempotent=True)` end-to-end.
-## Source and tutorials
+## See also
-- `src/locus/tools/decorator.py` — the `@tool` decorator and idempotency hook.
-- Tutorial: [`tutorial_03_tools_and_state.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_03_tools_and_state.py)
- walks through `@tool(idempotent=True)` end-to-end.
+- [Tools](tools.md) — the full `@tool` decorator surface.
+- [Checkpointers](checkpointers.md) — durable runs where idempotency interacts with replay.
diff --git a/docs/concepts/mcp.md b/docs/concepts/mcp.md
index 3a69c9a..f0fec86 100644
--- a/docs/concepts/mcp.md
+++ b/docs/concepts/mcp.md
@@ -1,51 +1,166 @@
-# MCP (both ways)
+# MCP — Model Context Protocol
The [Model Context Protocol](https://modelcontextprotocol.io) is an
-Anthropic-spec interop standard for tools. locus speaks MCP in both
-directions.
+Anthropic-spec interop standard for tools. Define a tool once,
+expose it over MCP, and any MCP-compatible client (Claude Desktop,
+Cline, Strands, another locus agent) can call it. Or consume tools
+from existing MCP servers (filesystem, git, postgres, github,
+sequential-thinking) without writing any glue.
-## Consume MCP servers
+**locus speaks MCP both ways**. That's a deliberate differentiator —
+most agent frameworks consume MCP servers but don't expose their own
+tools as MCP. Round-trip means an agent built with locus can be
+either side of the conversation.
-`MCPClient` wraps an external MCP server's tools so the agent can call
-them as if they were native locus tools.
+## When to use MCP
+
+| You want… | Use MCP |
+|---|---|
+| Your locus agent to use Anthropic's published filesystem / git / postgres servers | ✓ — `MCPClient` |
+| Your `@tool` library to be callable by Claude Desktop / Cline / other agents | ✓ — `LocusMCPServer` |
+| Two locus agents to share tools across processes / machines | ✓ — works, but [A2A](multi-agent/a2a.md) is the better protocol |
+| In-process multi-agent — share tools by importing | use the [tools](tools.md) directly, not MCP |
+| Deterministic tests | use [Ollama](providers/ollama.md) + plain `@tool` — MCP adds I/O |
+
+## Getting started — consume an MCP server
+
+### 1. Install the MCP extras
+
+```bash
+pip install "locus[mcp]"
+```
+
+### 2. Spawn the server and wrap it with `MCPClient`
```python
from locus.integrations.fastmcp import MCPClient
-# spawn the MCP server as a subprocess (stdio transport)
-fs = MCPClient.stdio(command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "/data"])
+# Spawn Anthropic's filesystem server as a subprocess (stdio transport):
+fs = MCPClient.stdio(
+ command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "/data"],
+)
+```
+
+`MCPClient.stdio` runs the subprocess, opens an MCP session over its
+stdin/stdout, and discovers what tools the server exposes.
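
For orientation, this is roughly what a stdio MCP client sends once the subprocess starts, per the MCP specification: newline-delimited JSON-RPC 2.0 messages, beginning with the `initialize` handshake and then `tools/list`. The sketch only builds the lines and does not spawn a server; the `protocolVersion` and `clientInfo` values are illustrative assumptions, not necessarily what `MCPClient` sends.

```python
from __future__ import annotations

import json
from itertools import count

_next_id = count(1)


def request(method: str, params: dict | None = None) -> str:
    """Encode one JSON-RPC 2.0 request as a single stdio line."""
    msg: dict = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    # json.dumps escapes newlines inside strings and adds none of its own,
    # so each message stays on one line; "\n" is the frame delimiter.
    return json.dumps(msg) + "\n"


def notification(method: str) -> str:
    """Notifications carry no id and expect no response."""
    return json.dumps({"jsonrpc": "2.0", "method": method}) + "\n"


handshake = [
    request(
        "initialize",
        {
            "protocolVersion": "2024-11-05",  # illustrative version string
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    ),
    notification("notifications/initialized"),
    request("tools/list"),  # the response lists each tool's name and schema
]
for line in handshake:
    print(line, end="")  # a real client writes these to the server's stdin
```
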
-agent = Agent(model=..., tools=[*fs.tools()]) # MCP tools become locus tools
+### 3. Pass the tools straight into an Agent
+
+```python
+from locus import Agent
+
+agent = Agent(
+ model="oci:openai.gpt-5",
+ tools=[*fs.tools()], # MCP tools become locus tools
+ system_prompt="You can read files in /data.",
+)
+result = agent.run_sync("Summarise the README in /data.")
```
-The client registers every MCP tool with locus's tool registry, with
-schema, descriptions, and call-through plumbing intact.
+`fs.tools()` returns a list of locus `Tool` objects with full
+schemas, descriptions, and call-through plumbing. The agent doesn't
+know they're MCP — they look like any other `@tool`.
-## Expose locus tools as MCP
+## Getting started — expose your tools as MCP
-`LocusMCPServer` turns a set of locus tools into an MCP server other
-agents can consume.
+### 1. Wrap a tool list in `LocusMCPServer`
```python
from locus.integrations.fastmcp import LocusMCPServer
server = LocusMCPServer(tools=[search_vendors, submit_po])
-server.run_stdio() # or .run_http(port=7400)
```
-Anthropic Claude, Strands, or any MCP-spec client can now call your
-locus tools.
+### 2. Pick a transport
+
+```python
+server.run_stdio() # for desktop clients
+server.run_http(port=7400) # for HTTP MCP clients
+```
+
+`run_stdio()` is what Claude Desktop, Cline, and most MCP clients
+expect. `run_http()` runs an HTTP MCP server (transport + JSON-RPC)
+that any HTTP MCP client can reach.
+
+### 3. Point a client at it
+
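+For Claude Desktop, edit `claude_desktop_config.json` (on macOS, `~/Library/Application Support/Claude/claude_desktop_config.json`; on Windows, `%APPDATA%\Claude\claude_desktop_config.json`):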
+
+```json
+{
+ "mcpServers": {
+ "my-locus-tools": {
+ "command": "python",
+ "args": ["-m", "my_package.mcp_server"]
+ }
+ }
+}
+```
+
+Restart Claude Desktop. Your `search_vendors` and `submit_po` tools
+appear in the model's tool list.
+
+## What you get out of the box
+
+### Schema preservation
+
+`@tool`'s docstring + type hints become the MCP tool's name,
+description, and JSON schema — losslessly. The MCP client sees the
+same parameter types, defaults, and descriptions a locus agent
+would.
+
+### Both transports
+
+| Transport | Use case |
+|---|---|
+| **stdio** — process pipes | Desktop clients (Claude Desktop, Cline). The MCP server is spawned as a subprocess. |
+| **HTTP** — JSON-RPC over POST | Browser-side or networked clients. Good for shared tool servers. |
+
+### Idempotency carries through
+
+A tool tagged `@tool(idempotent=True)` keeps that semantic when
+exposed via MCP. The dedup happens locus-side; the MCP client
+doesn't need to know.
## Round-trip example
-A common shape: locus agent A consumes an MCP filesystem server, plus
-a locus agent B exposed as MCP that A can also call. Same client API,
-different transports.
+A common shape: a locus agent A consumes a filesystem MCP server,
+*and* exposes its own tools as MCP for another agent B to consume:
+
+```python
+from locus import Agent
+from locus.integrations.fastmcp import LocusMCPServer, MCPClient
+
+# Agent A: consumes filesystem, exposes its own analytics tools
+fs = MCPClient.stdio(command=[...]) # consumer side
+analytics = LocusMCPServer( # producer side
+ tools=[summarise_csv, plot_histogram],
+)
+analytics.run_http(port=7400, in_background=True)
+
+agent_a = Agent(
+ model="oci:openai.gpt-5",
+ tools=[*fs.tools(), summarise_csv, plot_histogram],
+)
+```
+
+Same `MCPClient` API on the consumer side, same `LocusMCPServer` on
+the producer side, same tool definitions. The transport is an
+implementation detail.
+
+## Common gotchas
+
+| Symptom | Likely cause |
+|---|---|
+| `MCP server failed to start` | The MCP server subprocess crashed before establishing the session. Run the command manually to see the error. |
+| `Tool 'X' not found in MCP discovery` | The server exposes a different name than you expected. Print `[t.name for t in fs.tools()]` to see the actual list. |
+| `Schema validation failed on call` | The MCP tool's arguments or result don't match the schema it declared. Common with hand-written MCP servers; the reference servers are fine. |
+| Claude Desktop doesn't show your locus tools | `claude_desktop_config.json` not picked up — check the file lives at the right path and Claude has been restarted. |
+| Hangs on `MCPClient.stdio` startup | The MCP subprocess is waiting for input on stdin (some servers expect a handshake). Pass `wait_for_init=True` and a timeout. |
-## Tutorial
+## Source and tutorial
-[`tutorial_12_mcp_integration.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_12_mcp_integration.py).
+- [`locus.integrations.fastmcp`](https://github.com/oracle-samples/locus/blob/main/src/locus/integrations/fastmcp.py) — built on FastMCP.
+- [`tutorial_12_mcp_integration.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_12_mcp_integration.py) — consumer + producer end-to-end.
-## Source
+## See also
-`src/locus/integrations/fastmcp.py` — built on FastMCP.
+- [Tools](tools.md) — the `@tool` decorator MCP wraps.
+- [A2A](multi-agent/a2a.md) — purpose-built protocol for cross-process locus-to-locus agent meshes.
diff --git a/docs/concepts/providers/openai.md b/docs/concepts/providers/openai.md
index d3c21c8..5810501 100644
--- a/docs/concepts/providers/openai.md
+++ b/docs/concepts/providers/openai.md
@@ -33,12 +33,12 @@ That's the only setup. locus reads the env var automatically.
```python
from locus import Agent
-agent = Agent(model="openai:gpt-5.5", system_prompt="You are helpful.")
+agent = Agent(model="openai:gpt-5", system_prompt="You are helpful.")
```
-The string `"openai:gpt-5.5"` does two things: tells locus to use the
+The string `"openai:gpt-5"` does two things: tells locus to use the
OpenAI provider (`openai:` prefix), and which model id to call
-(`gpt-5.5`). Any model id OpenAI accepts, locus accepts.
+(`gpt-5`). Any model id OpenAI accepts, locus accepts.
### 3. Run it
@@ -55,7 +55,7 @@ without further configuration.
### Chat completions across the GPT family
-Every chat-shaped OpenAI model: `gpt-4o`, `gpt-4.1`, `gpt-5`, `gpt-5.5`,
+Every chat-shaped OpenAI model: `gpt-4o`, `gpt-4.1`, `gpt-5`,
`gpt-image-1`. Vision input (image URLs / base64), audio input, and
function calling work the same way you'd use them on the OpenAI SDK
directly — locus just normalises the events the model emits.
@@ -107,7 +107,7 @@ class Answer(BaseModel):
confidence: float
agent = Agent(
- model="openai:gpt-5.5",
+ model="openai:gpt-5",
output_schema=Answer,
system_prompt="Reply as JSON matching the schema.",
)
diff --git a/docs/concepts/streaming.md b/docs/concepts/streaming.md
index babc0ce..7f367c3 100644
--- a/docs/concepts/streaming.md
+++ b/docs/concepts/streaming.md
@@ -1,13 +1,46 @@
# Streaming
-Every locus agent emits typed events as it runs. They are real
-classes, not strings — drop them into `match` statements and let the
-type checker verify your handler is exhaustive.
+Every locus agent emits a **typed event stream** as it runs. The
+events aren't strings or `dict[str, Any]` blobs — they're frozen
+Pydantic classes, designed to drop into a `match` statement and let
+your type checker verify the handler is exhaustive.
+
+This is the surface a UI consumes (live token rendering, tool-call
+indicators, reasoning bubbles), the surface telemetry hooks observe,
+and the surface `AgentServer` re-emits over Server-Sent Events for
+browsers.
+
+## When to consume the event stream
+
+| You want… | Use… |
+|---|---|
+| Live token-by-token rendering in a UI | `async for event in agent.run(...)` |
+| The final answer as a single value (tests, scripts, REPL) | `agent.run_sync(prompt).message` — no event handling |
+| Spans / metrics on every model + tool call | install [`TelemetryHook`](hooks.md#telemetryhook) |
+| To stream over HTTP to a browser | [`AgentServer`](server.md) re-emits as SSE |
+
+## Getting started
+
+### 1. Use `agent.run(prompt)` instead of `run_sync`
+
+```python
+async for event in agent.run("Plan a trip to Paris."):
+ print(event)
+```
+
+`agent.run(...)` returns an async iterator. Each iteration yields one
+event in the order it occurred.
+
+### 2. Pattern-match on the event types
```python
from locus.core.events import (
- ThinkEvent, ToolStartEvent, ToolCompleteEvent,
- ModelChunkEvent, ReflectEvent, TerminateEvent,
+ ThinkEvent,
+ ToolStartEvent,
+ ToolCompleteEvent,
+ ModelChunkEvent,
+ ReflectEvent,
+ TerminateEvent,
)
async for event in agent.run("Plan a trip to Paris."):
@@ -19,50 +52,126 @@ async for event in agent.run("Plan a trip to Paris."):
case ToolCompleteEvent(tool_name=n, result=r):
print(f" ↳ {r}")
case ModelChunkEvent(content=c) if c:
- print(c, end="", flush=True) # token-level streaming
+ print(c, end="", flush=True) # token-level streaming
case ReflectEvent(assessment=a, new_confidence=c):
print(f"🪞 {a} ({c:.2f})")
case TerminateEvent(final_message=m):
print(f"\n✅ {m}")
```
-## Event taxonomy
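+Each `case` matches on the event's class and destructures its fields.
+Add a final `case _: assert_never(event)` over the event union and a
+type checker such as mypy or pyright flags any event type you forgot
+to handle; a mistyped field name (e.g. `reasonng` instead of
+`reasoning`) is also reported statically.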
-| Event | When |
-|---|---|
-| `ThinkEvent` | Model emits reasoning (extended-thinking models). |
-| `ModelChunkEvent` | Each streamed text chunk. Pipe straight to a UI. |
-| `ToolStartEvent` | Agent decided to call a tool. |
-| `ToolCompleteEvent` | Tool returned (or raised). |
-| `ReflectEvent` | Reflexion loop emitted a self-evaluation. |
-| `GroundingEvent` | Grounding evaluation finished. |
-| `InterruptEvent` | A tool requested human-in-the-loop input. |
-| `TerminateEvent` | The run is done — terminal condition met. |
+## The event taxonomy
+
+| Event | When it fires | Useful for |
+|---|---|---|
+| `ThinkEvent` | The model emits reasoning (extended-thinking models like Claude 4 / o-series) | Render "thinking…" bubbles in a UI |
+| `ModelChunkEvent` | Each streamed text chunk from the model | Token-level live rendering |
+| `ToolStartEvent` | The agent decided to call a tool | Show a "calling X" indicator |
+| `ToolCompleteEvent` | A tool returned (or raised — check `error`) | Show the result inline |
+| `ReflectEvent` | Reflexion emitted a self-evaluation | Show "I'm checking my work" |
+| `GroundingEvent` | Grounding evaluation finished | Show "verifying claims" |
+| `InterruptEvent` | A tool requested human-in-the-loop input | Block on user approval |
+| `TerminateEvent` | The run finished — terminal condition met | Show the final answer |
+
+Every event carries an `event_type` discriminator and a UTC
+`timestamp`, so persisted streams replay deterministically.
+
+## Write-protected — by design
+
+Events are **frozen** Pydantic models. A hook can read every field;
+it **cannot** mutate one; an assignment raises a `ValidationError`. If a
+hook wants to steer the agent (cancel a tool, retry a model call),
+it uses an explicit method on the event (`event.cancel()`,
+`event.retry()`, `event.replace_arguments(...)`) — the intent is
+visible in code review.
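
The same contract can be illustrated with the standard library alone. The sketch below uses a frozen dataclass as a stand-in for a locus event (an assumption: Pydantic's `frozen=True` models raise `ValidationError` rather than `FrozenInstanceError`, and offer `model_copy(update=...)` in place of `dataclasses.replace`).

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolStartEvent:
    """Stand-in for a locus event; the field names are illustrative."""

    tool_name: str
    arguments: tuple[tuple[str, str], ...]


event = ToolStartEvent("transfer", (("to_acct", "B"),))

try:
    event.tool_name = "refund"  # a silent in-place change is rejected
except dataclasses.FrozenInstanceError as exc:
    print("rejected:", exc)

# Any change has to be an explicit new object, which shows up in review.
patched = dataclasses.replace(event, arguments=(("to_acct", "C"),))
print(event.arguments, patched.arguments)
```
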
+
+This matters because in callback-based event systems any handler can
+silently mutate a field, and you find out three hops downstream when
+the value is wrong. Frozen events rule that out.
-Every event carries `event_type` and a UTC `timestamp`.
+## Sync wrapper — when you don't need the stream
-## Write-protected
+```python
+result = agent.run_sync("What is 2+2?")
+print(result.message) # 'Four.'
+print(result.metrics.iterations)
+```
+
+`agent.run_sync(prompt)` consumes the event stream internally and
+returns the final `AgentResult`. The events still emit (hooks still
+fire), but you get a single value back. Use this in tests, REPLs,
+and scripts where the trace doesn't matter.
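
Conceptually, a sync wrapper like this drains the async stream and keeps only the terminal value. The sketch assumes a toy event stream (`run`, `Chunk`, and `Terminate` are hypothetical, not locus's classes). A wrapper built on `asyncio.run`, as this one is, cannot be called from inside an already-running event loop, which is why async code should iterate the stream directly.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    content: str


@dataclass(frozen=True)
class Terminate:
    final_message: str


async def run(prompt: str) -> AsyncIterator[Chunk | Terminate]:
    """Hypothetical stand-in for an agent's event stream."""
    for word in ("Two", "plus", "two", "is", "four."):
        yield Chunk(word)
        await asyncio.sleep(0)  # let other tasks run between chunks
    yield Terminate("Two plus two is four.")


def run_sync(prompt: str) -> str:
    """Consume every event, then return only the terminal value."""

    async def drain() -> str:
        final: str | None = None
        async for event in run(prompt):
            # Every event is still produced, so observers of the stream see it.
            if isinstance(event, Terminate):
                final = event.final_message
        if final is None:
            raise RuntimeError("stream ended without a Terminate event")
        return final

    # asyncio.run starts a fresh event loop, so this must not be called
    # from code that is already running inside one.
    return asyncio.run(drain())


print(run_sync("What is 2+2?"))  # Two plus two is four.
```
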
+
+## Practical recipe — render to a terminal UI
+
+```python
+async for event in agent.run("Find Q3 revenue and email it to me."):
+ match event:
+ case ToolStartEvent(tool_name=n):
+ print(f"\n🔧 {n}", end="", flush=True)
+ case ToolCompleteEvent(error=e) if e:
+ print(f" ✗ {e}")
+ case ToolCompleteEvent():
+ print(" ✓")
+ case ModelChunkEvent(content=c) if c:
+ print(c, end="", flush=True)
+ case TerminateEvent():
+ print()
+```
-Events are write-protected value objects. A hook *cannot* mutate one;
-the type system enforces it. If a hook needs to influence the run, it
-returns a control directive (e.g. `Cancel`, `Retry`).
+Every event class is a small Pydantic record — there's no hidden
+state. What you see is what gets serialised over SSE, what your
+checkpointer persists, what your structured logger records.
-## Sync wrapper
+## SSE over HTTP — for browser UIs
-If you don't want to consume events, `agent.run_sync(prompt)` returns
-the final `AgentResult` directly.
+The reference [`AgentServer`](server.md) maps the same event stream
+onto Server-Sent Events. Same `event_type`, same fields, just
+`Content-Type: text/event-stream` over HTTP.
-## SSE over HTTP
+```python
+from locus.server import AgentServer
+import uvicorn
+
+server = AgentServer(agent=agent)
+uvicorn.run(server.app, port=8000)
+```
+
+```javascript
+// Browser-side
+const es = new EventSource('/stream?prompt=...');
+es.addEventListener('ModelChunkEvent', (e) => {
+ const { content } = JSON.parse(e.data);
+ document.getElementById('out').innerText += content;
+});
+```
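
On the wire, each frame of a `text/event-stream` is a set of `field: value` lines ended by a blank line, and lines starting with `:` are comments that browsers ignore. The sketch below encodes one event and one heartbeat comment by hand; the JSON payload shape and the heartbeat text are assumptions, not necessarily what `AgentServer` emits.

```python
import json


def sse_frame(event_type: str, payload: dict) -> str:
    """Encode one event as a text/event-stream frame."""
    # The `event:` field is the name EventSource dispatches on, so
    # addEventListener('ModelChunkEvent', ...) receives this frame.
    # json.dumps emits no raw newlines, so one `data:` line suffices.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event_type}\ndata: {data}\n\n"  # blank line ends the frame


def heartbeat() -> str:
    """A comment frame: EventSource ignores it, but idle proxies see traffic."""
    return ": keepalive\n\n"


wire = sse_frame("ModelChunkEvent", {"content": "Bon"}) + heartbeat()
print(wire, end="")
```
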
-The reference [AgentServer](server.md) maps the same events onto
-Server-Sent Events for browser consumption — same shape, different
-transport.
+## Common gotchas
+
+| Symptom | Likely cause |
+|---|---|
+| `TypeError: 'async for' requires an object with __aiter__ method` | You're iterating over `agent.run_sync()`, which returns a finished `AgentResult`, instead of `agent.run()`, which returns an async iterator. |
+| `ModelChunkEvent`s but no `TerminateEvent` | Generator was cancelled mid-stream. Check for exceptions in the consumer. |
+| Same event fires twice | A hook re-yielded an event it received. Hooks observe, they don't re-emit. |
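+| Browser SSE drops after a fixed idle interval (e.g. every 30 s) | A reverse proxy's read timeout (for example nginx's `proxy_read_timeout`) closes the idle connection. Raise the timeout or have the server send periodic heartbeat comments. |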
## Tutorials
-- [`tutorial_04_agent_streaming.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_04_agent_streaming.py)
-- [`tutorial_21_sse_streaming.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_21_sse_streaming.py)
+- [`tutorial_04_agent_streaming.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_04_agent_streaming.py) — your first event consumer.
+- [`tutorial_21_sse_streaming.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_21_sse_streaming.py) — full SSE wiring against `AgentServer`.
## Source
-`src/locus/streaming/` and `src/locus/core/events.py`.
+- [`locus.core.events`](https://github.com/oracle-samples/locus/blob/main/src/locus/core/events.py) — every event class.
+- [`Agent.run`](https://github.com/oracle-samples/locus/blob/main/src/locus/agent/agent.py) — the iterator that emits them.
+- [`AgentServer`](https://github.com/oracle-samples/locus/tree/main/src/locus/server) — the SSE wrapper.
+
+## See also
+
+- [Events](events.md) — full taxonomy in reference form.
+- [Hooks](hooks.md) — observe the same stream from inside the loop.
+- [Agent Server](server.md) — re-emit over HTTP/SSE.
+- [Graph streaming](graph-streaming.md) — multi-agent state-graph event streams.
diff --git a/docs/concepts/tools.md b/docs/concepts/tools.md
index 9143e15..0d80735 100644
--- a/docs/concepts/tools.md
+++ b/docs/concepts/tools.md
@@ -1,27 +1,69 @@
# Tools
-Tools are the agent's way of affecting the world. You write a regular
-Python function, decorate it, and pass it to `Agent(tools=[...])`. The
-`@tool` decorator introspects the signature and docstring to build a
-JSON-schema description the model can call.
+Tools are how a locus agent affects the world. The model decides
+*"call `search` with query='hnsw'"*; locus runs your `search`
+function, captures the return value, and feeds it back. From your
+side, a tool is **a regular Python function with a `@tool`
+decorator** — locus introspects the signature and docstring to build
+the schema the model sees.
+
+This is the seam most production code touches. Get tools right and
+the rest of the framework gets out of your way.
+
+## When to write a tool
+
+| You want… | Write a tool |
+|---|---|
+| The model to call your API / database / file system | ✓ |
+| Side-effecting actions the model should be able to invoke | ✓ |
+| Read-only lookups (catalogue search, status checks) | ✓ |
+| To mutate the agent's *internal* state (system prompt, config) | use a [hook](hooks.md), not a tool |
+| To intercept *every* tool call (logging, retry) | use a [hook](hooks.md) |
+
+## Getting started
+
+### 1. Decorate a function
```python
from locus import tool
@tool
def search(query: str, limit: int = 10) -> list[str]:
- """Search the knowledge base for `query`, up to `limit` results."""
+ """Search the knowledge base for ``query``, up to ``limit`` results."""
return backend.search(query, limit)
```
-The docstring becomes the tool description. Parameters are taken
-from the signature — type hints drive the JSON schema. Defaults are
-optional parameters.
+The docstring becomes the tool description the model reads. Type
+hints (`str`, `int`, `list[str]`) build the JSON schema. Defaults
+mark optional parameters.
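
To make the signature-to-schema step concrete, here is a rough sketch of how type hints, defaults, and the docstring can be turned into a JSON-schema tool description. `tool_schema` and its small type table are hypothetical, not locus's decorator; a real implementation handles many more types.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Hypothetical mapping from a few Python annotations to JSON Schema types.
JSON_TYPES: dict[Any, dict[str, str]] = {
    str: {"type": "string"},
    int: {"type": "integer"},
    float: {"type": "number"},
    bool: {"type": "boolean"},
}


def tool_schema(fn: Callable[..., Any]) -> dict[str, Any]:
    hints = get_type_hints(fn)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        prop = dict(JSON_TYPES[hints[name]])  # type hint -> schema type
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default: the model must supply it
        else:
            prop["default"] = param.default  # default: parameter is optional
        properties[name] = prop
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",  # docstring -> description
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def search(query: str, limit: int = 10) -> list[str]:
    """Search the knowledge base for ``query``, up to ``limit`` results."""
    return []


print(tool_schema(search))
```
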
+
+### 2. Pass to the agent
+
+```python
+agent = Agent(model="oci:openai.gpt-5", tools=[search])
+```
+
+That's the wiring. The model now sees `search` in its tool list and
+can call it whenever it decides to.
+
+### 3. Run it
+
+```python
+result = agent.run_sync("Find documents about HNSW.")
+```
+
+If the model decides to call `search("hnsw")`, locus invokes your
+function with that argument, captures the return value, and feeds it
+into the next model turn. You write Python; locus handles the
+schema marshalling.
-## Idempotent tools
+## What you get out of the box
-Some tools have side effects you never want duplicated — bookings,
-transfers, writes. Mark them idempotent:
+### Idempotent tools — the model can retry; the side effect can't
+
+This is locus's flagship tool primitive. Some side-effecting tools
+must run *exactly once* per logical request — bookings, charges,
+emails, paging. Mark them `idempotent=True`:
```python
@tool(idempotent=True)
@@ -33,42 +75,133 @@ def book_flight(flight_id: str, customer_id: str) -> dict:
```
When the model re-issues a tool call with the same
-`(name, arguments)` that already ran in this agent run, the ReAct
-loop reuses the prior result instead of invoking the function again.
-Useful for defending against:
+`(name, arguments)` tuple that already ran in this agent run, the
+ReAct loop **reuses the prior result instead of invoking the
+function again**. Defends against:
-- Models that repeat calls after seeing the result.
-- Network glitches where a call looks failed but actually succeeded.
+- Models that re-emit the same call after seeing the result.
+- Network glitches where a call appears failed but actually succeeded.
- Users re-prompting "do X" when X has already been done.
+- Replays after a checkpoint resume.
+
+Read the [idempotency concept page](idempotency.md) for the full
+picture and the matching tutorial.
+
+### Sync and async bodies
+
+Both shapes are supported. Async bodies run on the agent's event
+loop directly; sync bodies run in a thread-pool executor so the loop
+is never blocked.
+
+```python
+import httpx
+
+from locus import tool
+
+@tool
+def add(a: int, b: int) -> int:
+ return a + b # sync — runs in thread pool
+
+@tool
+async def fetch(url: str) -> str:
+ async with httpx.AsyncClient() as c:
+ return (await c.get(url)).text # async — runs on the loop
+```
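
A dispatcher with this behaviour can be sketched with `inspect.iscoroutinefunction` and `asyncio.to_thread` (Python 3.9+). This illustrates the technique under that assumption and is not locus's executor; it records thread ids to show where each body ran.

```python
import asyncio
import inspect
import threading
import time


def blocking_lookup(x: int) -> int:
    time.sleep(0.05)  # blocking call: would stall the loop if run on it
    return threading.get_ident()


async def async_fetch(x: int) -> int:
    await asyncio.sleep(0.05)  # cooperative wait: fine on the loop
    return threading.get_ident()


async def dispatch(fn, *args):
    """Hypothetical dispatcher: await coroutines, offload plain functions."""
    if inspect.iscoroutinefunction(fn):
        return await fn(*args)  # runs on the event loop's own thread
    return await asyncio.to_thread(fn, *args)  # default thread-pool executor


async def main() -> tuple[int, int, int]:
    loop_thread = threading.get_ident()
    sync_tid, async_tid = await asyncio.gather(
        dispatch(blocking_lookup, 1), dispatch(async_fetch, 2)
    )
    return loop_thread, sync_tid, async_tid


loop_tid, sync_tid, async_tid = asyncio.run(main())
print("sync body ran off the loop thread:", sync_tid != loop_tid)
print("async body ran on the loop thread:", async_tid == loop_tid)
```
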
+
+### Parallel by default — fast when the model wants multiple things
-This is a Locus-specific primitive; LangChain, LangGraph, and Strands
-do not ship it.
+```python
+agent = Agent(
+ model=...,
+ tools=[search_a, search_b, search_c],
+ tool_execution="concurrent", # default
+)
+```
+
+When the model emits multiple tool calls in one turn, locus runs
+them concurrently via `asyncio.gather`. Three independent searches
+finish in roughly `max(t1, t2, t3)` rather than `t1 + t2 + t3`.
+
+If your tools have side effects that must be ordered, switch to
+`tool_execution="sequential"`.
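
The timing difference is easy to reproduce with `asyncio.sleep` standing in for I/O. This sketch assumes nothing about locus beyond the two modes described above: with three 0.1 s calls, the concurrent runner should take about 0.1 s and the sequential one about 0.3 s, and both return results in call order.

```python
import asyncio
import time


async def search(source: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for network I/O
    return f"{source}: done"


CALLS = [("a", 0.1), ("b", 0.1), ("c", 0.1)]


async def run_concurrent() -> list[str]:
    # All calls start at once; gather returns results in call order.
    return await asyncio.gather(*(search(s, d) for s, d in CALLS))


async def run_sequential() -> list[str]:
    # Each call waits for the previous one: use when side effects must be ordered.
    return [await search(s, d) for s, d in CALLS]


for runner in (run_concurrent, run_sequential):
    start = time.perf_counter()
    results = asyncio.run(runner())
    print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s {results}")
```
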
-## Custom names and descriptions
+### Error handling — tool failures don't crash the agent
-Override the defaults via keyword arguments:
+If a tool raises, the executor catches the exception, wraps it as a
+`ToolResult(success=False, error=...)`, and feeds it back into the
+next model turn. The model sees the failure and can react: retry,
+try a different tool, or report to the user.
```python
-@tool(name="find_customer", description="Look up a customer by email.")
-async def _find(email: str) -> Customer:
+@tool
+def lookup_by_id(id: str) -> dict:
+ record = db.get(id)
+ if record is None:
+ raise ValueError(f"no record with id={id}")
+ return record
+```
+
+The model sees `"no record with id=42"` and decides what to do.
+Behind the scenes, locus chains the original exception as the cause
+on a `ToolExecutionError` for your structured logs.
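
A minimal sketch of that boundary, assuming a result record shaped like `ToolResult(success=False, error=...)`. `execute`, the `exception` field, and this `ToolExecutionError` class are hypothetical stand-ins; the point is that the model receives only the message while `__cause__` keeps the original exception for logs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


class ToolExecutionError(Exception):
    """Hypothetical stand-in for locus's wrapper exception."""


@dataclass(frozen=True)
class ToolResult:
    success: bool
    output: Any = None
    error: str | None = None  # the text the model sees
    exception: BaseException | None = None  # kept for structured logs only


def execute(fn: Callable[..., Any], **kwargs: Any) -> ToolResult:
    try:
        return ToolResult(success=True, output=fn(**kwargs))
    except Exception as exc:
        # Chain the original so logs keep the real traceback via __cause__.
        try:
            raise ToolExecutionError(f"{fn.__name__} failed") from exc
        except ToolExecutionError as wrapped:
            return ToolResult(success=False, error=str(exc), exception=wrapped)


def lookup_by_id(id: str) -> dict:
    raise ValueError(f"no record with id={id}")


result = execute(lookup_by_id, id="42")
print(result.error)  # no record with id=42
print(type(result.exception.__cause__).__name__)  # ValueError
```
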
+
+### Custom names and descriptions
+
+Override the auto-derived defaults when the function name doesn't
+read well to the model:
+
+```python
+@tool(name="find_customer", description="Look up a customer by email address.")
+async def _find_customer_internal(email: str) -> Customer:
...
```
-Both sync and async bodies are supported. Sync bodies run in a
-thread-pool executor so the event loop is not blocked.
+The model sees `find_customer`; your code keeps the internal name.
+
+## Practical recipes
+
+### Read-only lookups
+
+```python
+@tool
+def get_order_status(order_id: str) -> dict:
+ """Return the current status and shipment info for an order."""
+ return orders.get(order_id)
+```
+
+No need for `idempotent=True` — read-only calls are safe to repeat.
+
+### Idempotent writes
+
+```python
+@tool(idempotent=True)
+def submit_po(vendor_id: str, line_items: list[dict]) -> dict:
+ """Submit a purchase order. Re-fires return the cached PO id."""
+ return procurement.submit(vendor_id, line_items)
+```
+
+### A tool that's also exposed via MCP
+
+If you've built a tool you want other agents to reach, expose it
+through `LocusMCPServer` — same `@tool`, no rewrite. See
+[MCP](mcp.md).
+
+## Common gotchas
-## Parallel vs sequential execution
+| Symptom | Likely cause |
+|---|---|
+| Model never calls the tool | Description / docstring isn't telling the model when to use it. Be explicit: *"Use this tool when the user asks about X."* |
+| Tool fires twice on the same input | You're seeing the model retry. Add `idempotent=True`. |
+| `TypeError: missing 1 required positional argument` at call time | Function signature has a parameter without a default that you didn't surface in the docstring; the model omitted it. Add a default or explain the parameter. |
+| Tool returns Python objects but the model echoes `<__main__.X object at 0x…>` | Tool return value isn't JSON-serialisable. Return a dict / Pydantic model / list of strings, not arbitrary objects. |
+| Async tool blocks the event loop | The "async" body is calling sync I/O. Wrap the blocking call in `asyncio.to_thread(...)` or use an async client. |
-The agent decides based on `config.tool_execution`:
+## Source
-- `"concurrent"` (default) — tool calls run in parallel via
- `asyncio.gather`.
-- `"sequential"` — tool calls run one at a time. Pick this when tool
- side effects must be ordered.
+- [`@tool` decorator and `Tool` class](https://github.com/oracle-samples/locus/blob/main/src/locus/tools/decorator.py)
+- [`ToolRegistry`](https://github.com/oracle-samples/locus/blob/main/src/locus/tools/registry.py)
+- [Built-in tools](https://github.com/oracle-samples/locus/tree/main/src/locus/tools/builtins) — `get_today_date`, `task_complete`, `ask_user`
-## Error handling
+## See also
-If a tool raises, the exception is caught at the executor boundary,
-wrapped as a `ToolResult(success=False, error=...)`, and passed to the
-model so it can react. The original exception is chained as the cause
-on a `ToolExecutionError` (see [Errors](errors.md)).
+- [Idempotency](idempotency.md) — the full story on `idempotent=True`.
+- [Hooks](hooks.md) — for cross-cutting concerns (logging, retry, guardrails).
+- [Executors](executors.md) — how concurrent vs sequential tool execution works.
+- [MCP](mcp.md) — expose your tools to other agents over the Model Context Protocol.
+- [Errors](errors.md) — how tool failures surface in the event stream.
diff --git a/docs/img/sequence-26ai.svg b/docs/img/sequence-26ai.svg
index 0173c8e..609a3ed 100644
--- a/docs/img/sequence-26ai.svg
+++ b/docs/img/sequence-26ai.svg
@@ -54,7 +54,7 @@