From 91de528a253fedc5245894a007c9133482153c07 Mon Sep 17 00:00:00 2001
From: Federico Kamelhar <federico.kamelhar@oracle.com>
Date: Fri, 1 May 2026 20:26:17 -0400
Subject: [PATCH] docs(providers): rewrite OpenAI / Anthropic / Ollama / OCI
 pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous provider pages were correct but terse — capability tree
dumps with no narrative, weak "when to pick" guidance, and no
worked examples of practical workflows. This rewrite restructures
every page around the same easy-to-read template:

  1. One-paragraph "what this provider is + why you'd pick it"
  2. "When to pick X" table with cross-links to alternatives
  3. Getting started — three numbered steps
  4. What you get out of the box — each capability as natural prose
     plus a runnable code example, not a single-line bullet
  5. Practical workflow / gotcha section
  6. Common gotchas table
  7. Source links + See also

Per-page highlights
-------------------
- OpenAI: separated chat / o-series reasoning / streaming /
  tool-calling / structured-output as their own subsections; gateway
  tour (Azure / Portkey / LiteLLM / vLLM / together) explains when
  each one is the right pick.
- Anthropic: the prompt-caching cost story spelled out (1/10× input
  on cached span; ~5-min ephemeral window; auto when system >
  ~1024 tokens). Extended-thinking ThinkEvent example. Cross-link to
  Claude-on-OCI as a no-API-key alternative.
- Ollama: positioned as "develop offline / test deterministically /
  iterate before paying"; per-model tool-calling support table; the
  laptop→hosted swap-one-line workflow demonstrated.
- OCI: now leads with the value prop ("90+ models, day-0 model
  coverage, one auth surface"). Two transports (V1 / SDK) plus DAC
  endpoints on the SDK path, with a single table showing routing by
  model-id pattern. DAC section integrates the new how-to. The
  "laptop dev → OKE production" workflow shows the value of one auth
  surface concretely.

Page sizes: 318 → 719 lines total.
- openai: 77 → 165
- anthropic: 71 → 186
- ollama: 73 → 167
- oci: 97 → 201

Validation
----------
- ``hatch run check`` clean.
- ``mkdocs build --strict`` not re-run here; docs.yml CI workflow
  catches link rot on the next push to main.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
---
 docs/concepts/providers/anthropic.md | 185 +++++++++++++++++++++-----
 docs/concepts/providers/oci.md       | 186 +++++++++++++++++++++------
 docs/concepts/providers/ollama.md    | 172 +++++++++++++++++++------
 docs/concepts/providers/openai.md    | 176 ++++++++++++++++++-------
 4 files changed, 560 insertions(+), 159 deletions(-)

diff --git a/docs/concepts/providers/anthropic.md b/docs/concepts/providers/anthropic.md
index 212a81c1..ab9a543b 100644
--- a/docs/concepts/providers/anthropic.md
+++ b/docs/concepts/providers/anthropic.md
@@ -1,71 +1,186 @@
 # Anthropic
 
-Direct calls to `api.anthropic.com` via `AnthropicModel`.
+The Anthropic provider connects locus directly to Anthropic's API
+(`api.anthropic.com`). Use it when you want **the Claude family** —
+Opus for the hardest problems, Sonnet as the everyday workhorse,
+Haiku for high-volume cheap calls — and want to talk to Anthropic
+without going through an intermediary.
 
-```python
-agent = Agent(model="anthropic:claude-sonnet", ...)
-```
+Two things make this provider distinct: **prompt caching** (long
+system prompts and tool blocks pay 1/10th the input cost on repeat
+turns) and **extended thinking** (Claude 4 surfaces its reasoning as
+a stream of typed events your UI can render).
+
+## When to pick Anthropic
+
+| You want… | This is the right provider |
+|---|---|
+| Claude Opus / Sonnet / Haiku from Anthropic directly | ✓ |
+| Long system prompts amortised across many turns | ✓ — automatic prompt caching |
+| Extended-thinking models with visible reasoning | ✓ — `ThinkEvent` stream |
+| Claude on Oracle infrastructure (no separate API key) | use [OCI](oci.md) — `oci:anthropic.claude-sonnet` |
+| GPT or Llama | use [OpenAI](openai.md) or [OCI](oci.md) instead |
+
+## Getting started
+
+### 1. Set your API key
 
 ```bash
 export ANTHROPIC_API_KEY=sk-ant-...
 ```
 
-## Capabilities
-
-```text
-anthropic:
-│
-├── Claude family       — opus · sonnet · haiku
-├── streaming           — real SSE, token-level
-├── tool calling        — Anthropic tool-use protocol
-├── structured output   — tool-as-schema pattern
-│
-├── prompt caching      — long system + tool blocks marked cacheable;
-│                         subsequent turns pay 1/10th input cost
-│
-└── extended thinking   — passes thinking blocks through as ThinkEvent
+### 2. Pick a Claude model
+
+```python
+from locus import Agent
+
+agent = Agent(
+    model="anthropic:claude-sonnet-4-20250514",
+    system_prompt="You are a helpful assistant.",
+)
+```
+
+The string `"anthropic:claude-sonnet-4-20250514"` tells locus the
+provider (`anthropic:`) and the exact model id. Any model id
+Anthropic accepts, locus accepts — including the dated revision
+suffixes (`-20250514`).
+
+### 3. Run it
+
+```python
+result = agent.run_sync("Summarise the design doc in three bullets.")
+print(result.message)
+```
+
+That's the full setup. Streaming, tool calling, prompt caching, and
+extended thinking work without extra configuration.
+
+## What you get out of the box
+
+### The whole Claude family
+
+Whatever Anthropic ships, you can address by name:
+
+| Model | When to pick it |
+|---|---|
+| `claude-opus-4-…` | Hardest problems — code archaeology, deep research, multi-step reasoning |
+| `claude-sonnet-4-…` | Everyday workhorse — fast enough, smart enough, cheap enough |
+| `claude-haiku-4-…` | High-volume cheap calls — classification, routing, simple summaries |
+
+### Real SSE streaming
+
+Token-level streaming. The model emits content deltas; locus
+converts them to `ModelChunkEvent`s; your `async for` loop reads
+them as they arrive.
+
+```python
+async for event in agent.run("Write a haiku about latency."):
+    if isinstance(event, ModelChunkEvent) and event.content:
+        print(event.content, end="", flush=True)
+```
+
+### Tool calling — the Anthropic tool-use protocol
+
+`@tool` functions are translated into Anthropic's `tools` schema; the
+model's structured `tool_use` blocks are parsed back into locus
+`ToolCall`s. Parallel tool calls are supported (the model can
+request multiple tools per turn; locus runs them concurrently via
+the `ConcurrentExecutor`).
+
+### Structured output — tool-as-schema
+
+Anthropic doesn't expose a `response_format` field, so locus uses
+the standard "single-tool" trick: define the schema as a tool, force
+the model to call it. From your side, the API is identical to the
+other providers:
+
+```python
+from pydantic import BaseModel
+
+class Triage(BaseModel):
+    severity: str
+    needs_human: bool
+
+agent = Agent(
+    model="anthropic:claude-sonnet-4-20250514",
+    output_schema=Triage,
+)
+result = agent.run_sync("This page is broken!")
+print(result.parsed)        # Triage(severity='high', needs_human=True)
 ```
 
-## Prompt caching
+### Prompt caching — automatic for long prompts
+
+This is the biggest cost saver if your system prompt or tool block is
+long (skills, playbooks, RAG context). Anthropic's prompt-caching
+mechanism marks a span of the request as cacheable; subsequent turns
+within the cache window pay **1/10th** the input cost on the cached
+span.
 
-locus marks long system prompts and tool blocks as cacheable
-automatically. Subsequent turns within the cache window pay 1/10th
-the input cost.
+locus reads the request shape and applies `cache_control` to anything
+beyond a small threshold automatically. You don't opt in.
 
-You don't have to opt in — locus reads the request shape and applies
-`cache_control` to anything beyond a small threshold. To force or
-suppress it, set `prompt_cache=True|False` on the model config.
+```python
+# Force or suppress caching explicitly:
+agent = Agent(
+    model="anthropic:claude-sonnet-4-20250514",
+    model_config={"prompt_cache": True},   # or False to opt out
+)
+```
 
-## Extended thinking
+When it kicks in:
 
-When the model returns thinking blocks (Claude 4 models with
-`thinking_enabled`), locus emits a `ThinkEvent` per block in the
-event stream. Pipe it straight to your UI:
+- A 5-minute "ephemeral" cache (rolling window) — the default.
+- Subsequent turns reusing the same prefix pay `0.1× input rate` on
+  the cached portion.
+- Takes effect when the system prompt exceeds ~1024 tokens, or you've
+  loaded a big skill / playbook / RAG block.
+
+### Extended thinking — visible reasoning
+
+Claude 4 models with `thinking_enabled` think before answering, the
+way the OpenAI o-series does. Anthropic surfaces those thinking
+blocks in the response; locus emits a `ThinkEvent` for each one so
+your UI can show what the model is working on:
 
 ```python
 async for event in agent.run("..."):
     match event:
         case ThinkEvent(reasoning=r) if r:
             print(f"💭 {r}")
+        case ModelChunkEvent(content=c) if c:
+            print(c, end="", flush=True)
 ```
 
-## Claude on OCI
+## Claude on OCI — same model, different provider
 
-For Claude without a separate Anthropic API key, use the OCI
-transport instead — same `Agent`, different prefix:
+Don't have an Anthropic API key? Want Claude billed through your
+Oracle account on Oracle infrastructure? Switch the prefix:
 
 ```python
 agent = Agent(model="oci:anthropic.claude-sonnet", ...)
 ```
 
-This routes through `OCIOpenAIModel` and inherits OCI auth, so no
-`ANTHROPIC_API_KEY` is needed.
+That routes through `OCIOpenAIModel` against the OCI Generative AI
+endpoint (uses `OCI_PROFILE` for auth, no `ANTHROPIC_API_KEY` needed).
+Same model behind it; different billing surface.
+
+## Common gotchas
+
+| Symptom | Likely cause |
+|---|---|
+| `401 authentication_error` | `ANTHROPIC_API_KEY` not set, or set to a key without console access |
+| `404 not_found_error` on the model id | Dated revision suffix is wrong; check `https://docs.anthropic.com/en/docs/about-claude/models/all-models` |
+| `529 overloaded_error` | Anthropic capacity; the `ModelRetryHook` retries with backoff if installed |
+| Prompt caching not visible in usage stats | Cache window expired (5 min ephemeral) or prompt below the threshold |
+| `ThinkEvent`s never fire | Model not in the extended-thinking subset, or `thinking_enabled` not set in `model_config` |
 
 ## Source
 
-[`AnthropicModel` in `models/native/anthropic.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/native/anthropic.py).
+[`AnthropicModel` in `src/locus/models/native/anthropic.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/native/anthropic.py)
 
 ## See also
 
 - [Models overview](../models.md) — the full provider tree.
 - [OCI Generative AI](oci.md) — Claude via OCI.
+- [OpenAI](openai.md) — GPT family direct.
diff --git a/docs/concepts/providers/oci.md b/docs/concepts/providers/oci.md
index 40cb6876..e1761391 100644
--- a/docs/concepts/providers/oci.md
+++ b/docs/concepts/providers/oci.md
@@ -1,14 +1,41 @@
 # OCI Generative AI
 
-OCI is the day-1 target. **90+ models, two transports under one
-class hierarchy, day-0 model support.** When OCI ships a new model
-id, locus already supports it.
+OCI Generative AI is locus's **day-1 target** and the most capable
+provider in the box. It exposes 90+ models — OpenAI commercial
+families, Meta Llama, Anthropic Claude, Google Gemini, xAI Grok,
+Mistral, and Cohere — through Oracle's hosted inference service.
+**When OCI ships a new model id, locus already supports it** — you
+just pass the new id.
+
+The headline value over the direct providers:
+
+- **One auth surface.** Same `OCI_PROFILE` mechanism on a laptop, in
+  CI, or running on OCI Compute / OKE / Functions.
+- **Day-0 model coverage.** As soon as OCI hosts a new OpenAI /
+  Anthropic / Llama model, locus can call it by id.
+- **No per-provider API keys.** GPT, Claude, Llama all bill through
+  your OCI tenancy.
+- **Dedicated AI Cluster (DAC) endpoints** for predictable latency
+  and isolation when on-demand isn't enough.
+
+## When to pick OCI
+
+| You want… | This is the right provider |
+|---|---|
+| GPT, Claude, Llama, Cohere, Gemini, Grok, Mistral all in one place | ✓ |
+| Production inference on Oracle infrastructure (OKE / Compute / Functions) | ✓ |
+| One auth surface across laptop, CI, OCI workloads | ✓ |
+| Provisioned-capacity inference via [DAC](../../how-to/oci-dac.md) | ✓ |
+| To avoid managing per-provider API keys | ✓ |
+| Bleeding-edge OpenAI features the day they ship | use [OpenAI](openai.md) direct — OCI sometimes lags by hours/days |
+| Local development without auth setup | use [Ollama](ollama.md) instead |
 
-OCI exposes its inference service in two ways. locus speaks both
-and picks the right one automatically from the model id. You do not
-have to know which transport a model uses to call it.
+## Two transports under one prefix
 
-## Model families
+OCI Generative AI exposes its inference service in two ways. locus
+speaks both and **picks the right one automatically from the model
+id** — you don't have to know which transport a model uses to call
+it.
 
 ```text
 oci:                                 (one prefix · two transports)
@@ -21,77 +48,154 @@ oci:                                 (one prefix · two transports)
 │   ├─ google.*       — Google Gemini family
 │   └─ anthropic.*    — Anthropic Claude on OCI (no separate API key)
 │
-└── SDK transport · OCIModel         OCI Generative AI Python SDK
-    └─ cohere.command-r*  — Cohere R-series only (native API only)
+├── SDK transport · OCIModel         OCI Generative AI Python SDK
+│   └─ cohere.command-r*  — Cohere R-series only (native API only)
+│
+└── DAC endpoints     · OCIModel     DedicatedServingMode
+    └─ ocid1.generativeaiendpoint....   — provisioned capacity
 ```
 
-## V1 transport — `/openai/v1` (OpenAI-compatible)
+### V1 transport — `/openai/v1` (OpenAI-compatible)
 
 `OCIOpenAIModel` calls
 `https://inference.generativeai.<region>.oci.oraclecloud.com/openai/v1/chat/completions`.
 
-Used for the majority of OCI models: OpenAI commercial, Meta Llama,
-xAI Grok, Mistral, Google Gemini, and Anthropic on OCI. Real SSE
-streaming, OpenAI-style function calling, structured output. The
-wire format is identical to OpenAI's — anything you know about
-prompting OpenAI directly carries over.
+This is the **default path for the majority of OCI models**:
+OpenAI commercial, Meta Llama, xAI Grok, Mistral, Google Gemini, and
+Claude on OCI. The wire format is identical to OpenAI's, so anything
+you know about prompting OpenAI carries over: real SSE streaming,
+OpenAI-style function calling, structured output, vision input.
+
+```python
+agent = Agent(model="oci:openai.gpt-5.5")           # OpenAI commercial
+agent = Agent(model="oci:meta.llama-3.3-70b-instruct")  # Meta Llama
+agent = Agent(model="oci:anthropic.claude-sonnet")  # Claude — no Anthropic key needed
+```
 
-## SDK transport — OCI native API
+### SDK transport — OCI native API
 
-`OCIModel` calls the OCI Generative AI Python SDK directly. Used
-**only for Cohere R-series** (`cohere.command-r-*`), which OCI
+`OCIModel` calls the OCI Generative AI Python SDK directly. It's
+used **only for Cohere R-series** (`cohere.command-r-*`), which OCI
 exposes through the native API rather than the OpenAI-compatible
-gateway.
+gateway. Cohere R has its own request shape (separate `message` +
+`chat_history` instead of a flat `messages` array).
 
-## Transport selection — automatic
+```python
+agent = Agent(model="oci:cohere.command-r-plus-08-2024")  # SDK transport
+```
+
+### DAC endpoints — dedicated capacity
+
+When you've provisioned a Dedicated AI Cluster (DAC), OCI gives you
+a **generative AI endpoint OCID**. Pass it as the model id and locus
+auto-routes through the SDK transport with `DedicatedServingMode`:
 
 ```python
-# Both work; the transport is picked from the model id:
-agent = Agent(model="oci:openai.gpt-5.5")           # → V1 (OCIOpenAIModel)
-agent = Agent(model="oci:cohere.command-r-plus")    # → SDK (OCIModel)
+agent = Agent(
+    model=get_model(
+        "oci:ocid1.generativeaiendpoint.oc1.<region>....",
+        compartment_id="ocid1.compartment.oc1...",
+        profile_name="DEFAULT",
+    ),
+)
 ```
 
-Override with `LOCUS_OCI_TRANSPORT=v1` or `=sdk` if you ever need to
-force one path.
+[Full DAC how-to →](../../how-to/oci-dac.md) — covers Qwen-on-DAC,
+streaming, tool-call quirks per model.
 
-## Auth — one surface for every environment
+## Transport selection — automatic
 
-Same `OCI_PROFILE` mechanism on the laptop, in CI, and on OCI
-workloads. `OCI_AUTH_TYPE` selects the signer:
+You don't pick the transport. locus looks at the model id and
+chooses:
 
-| Auth type | Where it works |
+| Model id pattern | Transport |
 |---|---|
-| `api_key` | Laptop with `~/.oci/config` profile |
-| `session_token` | Federated SSO laptop · `oci session authenticate` |
-| `instance_principal` | OCI Compute · OKE pods |
-| `resource_principal` | OCI Functions · serverless |
+| `ocid1.generativeaiendpoint....` | SDK + `DedicatedServingMode` (DAC) |
+| `cohere.command-r-*` | SDK + `OnDemandServingMode` |
+| `openai.*` / `meta.*` / `xai.*` / `mistral.*` / `google.*` / `anthropic.*` | V1 (OpenAI-compatible) |
+
+Need to override? Set `LOCUS_OCI_TRANSPORT=v1` or `LOCUS_OCI_TRANSPORT=sdk`.
+
+## One auth surface — laptop, CI, OCI workloads
+
+Same `OCI_PROFILE` env var everywhere. `OCI_AUTH_TYPE` selects the
+signer:
+
+| Auth type | Where it works | What you set |
+|---|---|---|
+| `api_key` | Laptop with `~/.oci/config` profile | `OCI_AUTH_TYPE=api_key`, `OCI_PROFILE=DEFAULT` |
+| `session_token` | Federated SSO laptop | `oci session authenticate` first; then `OCI_AUTH_TYPE=session_token` |
+| `instance_principal` | OCI Compute · OKE pods | `OCI_AUTH_TYPE=instance_principal` (no key file needed) |
+| `resource_principal` | OCI Functions · serverless | `OCI_AUTH_TYPE=resource_principal` (provider-injected) |
 
 ```bash
 export OCI_PROFILE=DEFAULT
-export OCI_AUTH_TYPE=api_key      # or session_token / instance_principal / resource_principal
+export OCI_AUTH_TYPE=api_key
 ```
 
-No code change between environments — only the env var differs.
+**No code change between environments — only the env var differs.**
+That's the value: prototype on your laptop, then deploy to OKE or
+Compute. Same `Agent` code, same model id; only the signer changes
+with the environment.
 
 ## Region
 
 OCI Generative AI is offered in `us-chicago-1`, `eu-frankfurt-1`,
-`uk-london-1`, `sa-saopaulo-1`, and a growing list. Pass `OCI_REGION`
-to override the region baked into your profile:
+`uk-london-1`, `sa-saopaulo-1`, and a growing list. The region baked
+into your profile is the default; override with `OCI_REGION`:
 
 ```bash
 export OCI_REGION=us-chicago-1
 ```
 
+## Practical wiring — laptop dev → OKE production
+
+```python
+# Same code on your laptop and on OKE:
+from locus import Agent
+
+agent = Agent(
+    model="oci:openai.gpt-5.5",
+    system_prompt="You are a helpful assistant.",
+)
+```
+
+```bash
+# Laptop:
+export OCI_PROFILE=DEFAULT
+export OCI_AUTH_TYPE=api_key
+
+# OKE pod:
+export OCI_AUTH_TYPE=instance_principal
+# (no profile / key file — OKE injects the principal at runtime)
+```
+
+The agent doesn't care. That's the OCI provider's whole pitch.
+
+## Common gotchas
+
+| Symptom | Likely cause |
+|---|---|
+| `404 Not Authorized` (yes, 404 not 403) | OCI's standard permission-denied disguise. Your principal lacks `inspect generative-ai-endpoints` policy in the compartment. |
+| `model_id not found` | Model id doesn't exist in your tenancy's region. Check `oci generative-ai model list --region <region>`. |
+| `compartment_id is required` | DAC endpoints enforce it even when on-demand wouldn't. Pass `compartment_id=` on the model. |
+| Streaming yields one big chunk | DAC endpoint rejected `is_stream`. The fall-back path swallows the failure and emits the full response as one chunk; check `OCI_LOG_REQUESTS=1`. |
+| Cohere R model fails on V1 | Force the SDK transport: `LOCUS_OCI_TRANSPORT=sdk`. |
+
 ## Source
 
 | | |
 |---|---|
-| `OCIOpenAIModel` (V1) | [`models/providers/oci/openai_compat.py:163`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/providers/oci/openai_compat.py#L163) |
-| `OCIModel` (SDK) | [`models/providers/oci/__init__.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/providers/oci/__init__.py) |
-| Submodel providers (Cohere, Generic) | [`models/providers/oci/models/`](https://github.com/oracle-samples/locus/tree/main/src/locus/models/providers/oci/models) |
+| `OCIOpenAIModel` (V1) | [`src/locus/models/providers/oci/openai_compat.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/providers/oci/openai_compat.py) |
+| `OCIModel` (SDK + DAC) | [`src/locus/models/providers/oci/__init__.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/providers/oci/__init__.py) |
+| Per-family request builders | [`src/locus/models/providers/oci/models/`](https://github.com/oracle-samples/locus/tree/main/src/locus/models/providers/oci/models) |
+| Routing | [`src/locus/models/registry.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/registry.py) — `_make_oci()` |
 
 ## See also
 
-- [OCI GenAI models how-to](../../how-to/oci-models.md) — auth setup, region selection, debugging.
 - [Models overview](../models.md) — the full provider tree.
+- [OCI GenAI models how-to](../../how-to/oci-models.md) — auth setup, region selection, debugging.
+- [OCI Dedicated AI Cluster (DAC)](../../how-to/oci-dac.md) — provisioned-capacity endpoints.
+- [OpenAI](openai.md) — direct OpenAI when OCI lags.
+- [Anthropic](anthropic.md) — Claude direct when OCI lags.
+- [Ollama](ollama.md) — local development before swapping to OCI.
diff --git a/docs/concepts/providers/ollama.md b/docs/concepts/providers/ollama.md
index f7114a38..48ca4069 100644
--- a/docs/concepts/providers/ollama.md
+++ b/docs/concepts/providers/ollama.md
@@ -1,73 +1,167 @@
 # Ollama
 
-Local model runtime. `OllamaModel` calls a local Ollama server.
+The Ollama provider is **locus pointed at a local model runtime**.
+Ollama runs open-weight models on your laptop or a shared GPU box;
+locus calls it over HTTP exactly the way it would call OpenAI or
+Anthropic. No API key, no network egress, no per-token billing.
+
+This is the right pick for **offline development**, **deterministic
+tests**, and **iterating on agent design before you spend a dollar
+on hosted inference**.
+
+## When to pick Ollama
+
+| You want… | This is the right provider |
+|---|---|
+| To develop offline — laptop, plane, isolated network | ✓ |
+| Deterministic tests — same prompt + seed → same output | ✓ |
+| Cost-free agent iteration before swapping to a paid API | ✓ |
+| Privacy-sensitive prototyping where data can't leave the machine | ✓ |
+| A frontier model (GPT-5, Claude Opus 4) | use [OpenAI](openai.md) or [Anthropic](anthropic.md) |
+| Production-scale concurrency | use [OCI](oci.md), [OpenAI](openai.md), [Anthropic](anthropic.md) |
+
+## Getting started
+
+### 1. Install Ollama and pull a model
+
+Ollama itself isn't a Python package — it's a small binary that runs
+a local HTTP server.
+
+```bash
+# macOS (Homebrew) — or download from ollama.com
+brew install ollama
+
+# Start the server in the background (the trailing & does that):
+ollama serve &
+
+# Pull a model with native tool-calling support:
+ollama pull llama3.3
+```
+
+`ollama list` will show what you've pulled. Anything in that list is
+addressable from locus immediately.
+
+### 2. Wire locus
 
 ```python
-agent = Agent(model="ollama:llama3.2", ...)
+from locus import Agent
+
+agent = Agent(model="ollama:llama3.3", system_prompt="You are helpful.")
 ```
 
-```bash
-export OLLAMA_HOST=http://localhost:11434   # default
+That's it. No env vars, no auth — Ollama is local-first by default.
+
+### 3. Run it
+
+```python
+result = agent.run_sync("Sum 7 plus 35 in one word.")
+print(result.message)
+# → '42.'
 ```
 
-No API key — Ollama is local-first.
+Done. Streaming and tool calling work the same as for any other
+provider — provided the model you pulled supports them.
 
-## Capabilities
+## What you get out of the box
 
-```text
-ollama:
-│
-├── any pulled local model     — run `ollama list` to see what is installed
-├── streaming                  — token-level via the local SSE stream
-├── tool calling               — works for any model that supports it
-│                                (llama3.1+, qwen2.5, mistral, deepseek-r1)
-└── auth                       — none · OLLAMA_HOST=http://localhost:11434 (default)
+### Any pulled local model — no locus change needed
+
+The `model_id` after `ollama:` is whatever appears in `ollama list`.
+locus doesn't maintain an allow-list; if Ollama can run it, locus
+can address it.
+
+```bash
+ollama list
+# llama3.3:latest
+# qwen2.5-coder:32b
+# deepseek-r1:14b
 ```
 
-Whatever you `ollama pull` is available immediately. No locus change
-needed.
+```python
+agent_a = Agent(model="ollama:llama3.3")
+agent_b = Agent(model="ollama:qwen2.5-coder:32b")
+agent_c = Agent(model="ollama:deepseek-r1:14b")
+```
 
-## When to use Ollama
+### Real local streaming
 
-- **Offline development** — laptops, planes, isolated networks.
-- **Deterministic tests** — no network egress; same model, same
-  prompt, same seed → same output.
-- **Cost-control sandboxing** — iterate on agent design with a
-  free local model before swapping to a paid API.
-- **Privacy-sensitive prototyping** — data never leaves the machine.
+Ollama emits SSE-shaped chunks; locus reads them as `ModelChunkEvent`s
+just like any other provider. Token-level streaming over localhost
+is fast — typically <5 ms per chunk.
 
-## Tool calling
+```python
+async for event in agent.run("Write a haiku about caching."):
+    if isinstance(event, ModelChunkEvent) and event.content:
+        print(event.content, end="", flush=True)
+```
 
-Tool calling support is per-model in Ollama. As of writing:
+### Tool calling — model-dependent
+
+Ollama supports tool calling for models that emit it natively. As of
+writing:
 
 | Model family | Tool calling |
 |---|---|
-| llama3.1+, llama3.2, llama3.3 | ✅ |
-| llama4 | ✅ |
-| qwen2.5, qwen2.5-coder | ✅ |
-| mistral, mixtral | ✅ |
-| deepseek-r1 | ✅ (with reasoning) |
-| phi3 | ❌ (no native tool calling) |
+| `llama3.1` / `llama3.2` / `llama3.3` | ✓ |
+| `llama4` | ✓ |
+| `qwen2.5` / `qwen2.5-coder` / `qwen3` | ✓ |
+| `mistral` / `mixtral` | ✓ |
+| `deepseek-r1` | ✓ (with reasoning) |
+| `phi3` | ✗ — no native tool calling |
 
-If a model doesn't support native tool calling, the agent will
-still run but the model can't invoke tools — the loop terminates
-on the first turn.
+If a model doesn't support tool calling, the agent will still **run** —
+it just won't be able to invoke any `@tool` you defined. The loop
+then terminates after the first turn (no tools called, no follow-up
+needed).
 
-## Custom Ollama server
+### No auth — by design
 
-Override the host for a remote Ollama (e.g., a shared GPU box):
+Ollama listens on `localhost:11434` with no authentication. That's
+intentional for the local-first use case. To run against a shared
+remote Ollama:
 
 ```bash
 export OLLAMA_HOST=http://gpu-box.internal:11434
 ```
 
-The same `OllamaModel` class works against any HTTP-reachable Ollama
-endpoint.
+The same `OllamaModel` class talks to any HTTP-reachable Ollama
+endpoint. (If you're exposing a remote Ollama, put it behind a VPN
+or auth proxy yourself — Ollama doesn't ship one.)
+
+## Practical workflow — develop local, ship hosted
+
+A common pattern: prototype an agent against Ollama for free, then
+swap one line to point at OCI / OpenAI / Anthropic for production.
+
+```python
+# Development:
+agent = Agent(model="ollama:llama3.3", tools=[...], system_prompt="...")
+
+# Production — same agent, swap the model id:
+agent = Agent(model="oci:openai.gpt-5.5", tools=[...], system_prompt="...")
+```
+
+Everything else — tools, hooks, checkpointers, termination, RAG —
+stays identical. You're not coupled to the local runtime; Ollama is
+just a model address.
+
+## Common gotchas
+
+| Symptom | Likely cause |
+|---|---|
+| `Connection refused` on `localhost:11434` | Ollama server isn't running. `ollama serve &` in another terminal. |
+| `model 'X' not found` | Haven't pulled it yet. `ollama pull X`. |
+| Slow first response after an idle period | Ollama unloads models from memory after inactivity (5 minutes by default; adjustable via `OLLAMA_KEEP_ALIVE`). The first call after a pause re-loads the model (a few seconds). |
+| Tool calls never fire | The model you pulled doesn't support tools (e.g. `phi3`). Switch to `llama3.3` or `qwen2.5`. |
+| `tool_calls` parsed as text instead of structured | Some Ollama versions emit XML-style `<tool_call>{...}</tool_call>` blocks. Update Ollama (`brew upgrade ollama`) or use a model with stable structured tool-call output. |
+| Different output every run despite the same prompt | Set `temperature=0` and pin `seed` in `model_config`. |
 
 ## Source
 
-[`OllamaModel` in `models/native/ollama.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/native/ollama.py).
+[`OllamaModel` in `src/locus/models/native/ollama.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/native/ollama.py)
 
 ## See also
 
 - [Models overview](../models.md) — the full provider tree.
+- [OpenAI](openai.md) — GPT family direct.
+- [OCI Generative AI](oci.md) — production-scale OCI inference.
diff --git a/docs/concepts/providers/openai.md b/docs/concepts/providers/openai.md
index 3109c8b5..d3c21c8e 100644
--- a/docs/concepts/providers/openai.md
+++ b/docs/concepts/providers/openai.md
@@ -1,77 +1,165 @@
 # OpenAI
 
-Direct calls to `api.openai.com` via `OpenAIModel`.
+The OpenAI provider connects locus directly to OpenAI's API
+(`api.openai.com`). It's what you reach for when you want **the latest
+OpenAI model the day it ships** without going through any gateway,
+translation layer, or middleware.
 
-```python
-agent = Agent(model="openai:gpt-5.5", ...)
-agent = Agent(model="openai:o3", ...)             # reasoning model
-```
+It's also the **fastest way to try locus** — one env var, one line of
+code, you're talking to GPT-5 or the o-series reasoning models.
+
+## When to pick OpenAI
+
+| You want… | This is the right provider |
+|---|---|
+| GPT-5, GPT-4o, or any latest OpenAI release | ✓ |
+| The o-series reasoning models (`o3`, `o4-mini`) | ✓ |
+| To go through Azure / Portkey / LiteLLM / vLLM | ✓ — same class, different `base_url` |
+| Claude or Llama | use [Anthropic](anthropic.md) or [OCI](oci.md) instead |
+| To run on Oracle infrastructure | use [OCI](oci.md) — you'll get the same OpenAI models without a separate key |
+
+## Getting started
+
+### 1. Set your API key
 
 ```bash
 export OPENAI_API_KEY=sk-...
 ```
 
-## Capabilities
-
-```text
-openai:
-│
-├── chat completions   — gpt-* family (vision, audio, structured output)
-├── reasoning models   — o-series (adds reasoning_effort: low | medium | high)
-├── streaming          — real SSE, token-level
-├── tool calling       — OpenAI tool-call protocol
-├── structured output  — response_model / JSON schema
-│
-└── base_url override  — any OpenAI-compatible gateway
-    ├─ Azure OpenAI
-    ├─ Portkey
-    ├─ LiteLLM proxy
-    ├─ vLLM (self-hosted)
-    └─ together.ai · fireworks · groq · any /v1-shaped endpoint
+That's the only setup. locus reads the env var automatically.
+
+### 2. Pick a model
+
+```python
+from locus import Agent
+
+agent = Agent(model="openai:gpt-5.5", system_prompt="You are helpful.")
 ```
 
-## Custom base URL — Azure, Portkey, LiteLLM, vLLM
+The string `"openai:gpt-5.5"` does two things: the `openai:` prefix
+tells locus to use the OpenAI provider, and `gpt-5.5` is the model id
+to call. Any model id OpenAI accepts, locus accepts.
 
-`base_url` overrides the API endpoint. Any OpenAI-compatible gateway
-works under the same `OpenAIModel` class:
+### 3. Run it
+
+```python
+result = agent.run_sync("What is two plus two?")
+print(result.message)
+# → 'Four.'
+```
+
+Done. Streaming, tool calls, structured output — all of it works
+without further configuration.
+
+## What you get out of the box
+
+### Chat completions across the GPT family
+
+Every chat-shaped OpenAI model works: `gpt-4o`, `gpt-4.1`, `gpt-5`,
+`gpt-5.5`. On models that support them, vision input (image URLs /
+base64), audio input, and function calling work the way they do on the
+OpenAI SDK directly; locus just normalises the events the model emits.
+
+### Reasoning models — the o-series
+
+`o1`, `o3`, and `o4-mini` go through the same call, for example
+`Agent(model="openai:o3")`. They're slower and more expensive, but they
+reason before answering. locus surfaces the model's thinking blocks as
+`ThinkEvent`s so your UI can show "thinking…" without parsing the response.
 
 ```python
 agent = Agent(
-    model="openai:gpt-4o",
-    model_config={"base_url": "https://api.portkey.ai/v1"},
+    model="openai:o3",
+    model_config={"reasoning_effort": "high"},   # low | medium | high
 )
 ```
 
-| Gateway | `base_url` |
-|---|---|
-| Azure OpenAI | `https://<resource>.openai.azure.com/openai/deployments/<deployment-id>` |
-| Portkey | `https://api.portkey.ai/v1` |
-| LiteLLM Proxy | `https://<your-litellm-host>/v1` |
-| vLLM (self-hosted) | `http://localhost:8000/v1` |
-| together.ai / fireworks / groq | their published `/v1` URL |
+`reasoning_effort` is OpenAI's knob for how long the model spends
+thinking. Default is `medium`.
+
+### Real SSE streaming
+
+Token-level streaming over Server-Sent Events. The model emits
+deltas, locus turns them into `ModelChunkEvent`s, your `async for`
+loop reads them as they arrive — no buffering, no fake chunking.
+
+```python
+async for event in agent.run("Write a haiku about latency."):
+    if isinstance(event, ModelChunkEvent) and event.content:
+        print(event.content, end="", flush=True)
+```
 
-For Azure, `api_key` carries the Azure key. For Portkey and LiteLLM,
-their virtual-key system applies.
+### Tool calling — the OpenAI protocol
 
-## Reasoning models
+`@tool` functions are converted to OpenAI's tool-call schema and
+the structured `tool_calls` field in the response is parsed back into
+locus `ToolCall` objects. Parallel tool calls are supported (the
+model can request multiple tools per turn; locus runs them
+concurrently via the `ConcurrentExecutor`).
 
-Reasoning models (`o1`, `o3`, `o4-mini`) route through the same
-class. locus adds `reasoning_effort` to the request when set:
+### Structured output — Pydantic models in, validated objects out
 
 ```python
+from pydantic import BaseModel
+
+class Answer(BaseModel):
+    summary: str
+    confidence: float
+
 agent = Agent(
-    model="openai:o3",
-    model_config={"reasoning_effort": "high"},
+    model="openai:gpt-5.5",
+    output_schema=Answer,
+    system_prompt="Reply as JSON matching the schema.",
 )
+result = agent.run_sync("Was the meeting productive?")
+print(result.parsed)        # Answer(summary='...', confidence=0.83)
 ```
 
-The model's thinking blocks come through as `ThinkEvent`s in the
-event stream so you can show "thinking…" in your UI.
+Under the hood, locus sends an OpenAI `response_format` with the
+schema and a strict-mode flag; if the model produces invalid JSON,
+locus retries with the validation errors in the prompt
+(`output_schema_retries=2` by default).
+
+## Going through a gateway
+
+A `base_url` override turns `OpenAIModel` into a client for any
+OpenAI-compatible endpoint:
+
+| Gateway | When to use it | `base_url` |
+|---|---|---|
+| **Azure OpenAI** | Enterprise / regulated workloads, Azure billing | `https://<resource>.openai.azure.com/openai/deployments/<deployment-id>` |
+| **Portkey** | Virtual keys, request routing across providers, retries | `https://api.portkey.ai/v1` |
+| **LiteLLM Proxy** | Self-hosted control plane in front of N providers | `https://<your-litellm-host>/v1` |
+| **vLLM** | Self-hosted inference for open models with the OpenAI shape | `http://localhost:8000/v1` |
+| **together.ai / fireworks / groq** | Hosted open-model inference at OpenAI-shape | their published `/v1` |
+
+```python
+agent = Agent(
+    model="openai:gpt-4o",
+    model_config={"base_url": "https://api.portkey.ai/v1"},
+)
+```
+
+The value of `OPENAI_API_KEY` is forwarded as the `api_key`: for Azure
+that's the Azure resource key, for Portkey it's the Portkey virtual
+key, and so on.
+
+## Common gotchas
+
+| Symptom | Likely cause |
+|---|---|
+| `401 Unauthorized` | `OPENAI_API_KEY` not set, or set to the wrong project's key |
+| `429 Rate limit exceeded` | OpenAI rate limit or quota reached; locus retries automatically when `ModelRetryHook` is installed |
+| `model_not_found` | Model id doesn't exist or isn't available on your tier; check `https://platform.openai.com/docs/models` |
+| Empty `tool_calls` | Model decided not to call a tool; check the system prompt |
+| `reasoning_effort` rejected | Only valid for reasoning models such as the o-series; non-reasoning models such as GPT-4o reject it |
 
 ## Source
 
-[`OpenAIModel` in `models/native/openai.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/native/openai.py).
+[`OpenAIModel` in `src/locus/models/native/openai.py`](https://github.com/oracle-samples/locus/blob/main/src/locus/models/native/openai.py)
 
 ## See also
 
 - [Models overview](../models.md) — the full provider tree.
+- [Anthropic](anthropic.md) — Claude family direct.
+- [OCI Generative AI](oci.md) — same OpenAI models without a separate key, on Oracle infrastructure.