From 230633ef8b98120d756d17f8222c37009c3ecdbc Mon Sep 17 00:00:00 2001
From: Federico Kamelhar <federico.kamelhar@oracle.com>
Date: Fri, 1 May 2026 11:32:18 -0400
Subject: [PATCH] docs(oci-dac): tutorial 40 + empirical Qwen confirmation +
 website
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the runnable end-to-end tutorial and threads the DAC story
through every reader-facing surface, with one piece of empirical
evidence: locus actually drove a real Qwen DAC in uk-london-1.

What ships
----------
- ``examples/tutorial_40_oci_dac.py`` — 5-part walkthrough:
    1. Auto-routing (OCID prefix → SDK transport).
    2. Configure an Agent against a DAC (pre-build via ``get_model``
       because ``Agent``'s strict ``AgentConfig`` rejects
       provider-specific kwargs on the keyword path).
    3. ``complete()`` against the DAC.
    4. ``stream()`` against the DAC — real SSE deltas.
    5. ``Agent`` + ``@tool`` against the DAC, with an honest note on
       Qwen's ``<tool_call>`` text-block format vs OpenAI's
       structured ``tool_calls`` and the two ways to fix it.

  Live-tested against Luigi's London DAC. Sample output captured in
  the how-to page.

- ``docs/how-to/oci-dac.md`` — adds the "Confirmed working — Qwen
  on a London DAC" section with the actual run output (model
  identification, token usage, streaming chunks). Documents the
  Qwen ``<tool_call>`` quirk and the two remediation paths.

- ``README.md`` capability grid: new ``🏗 OCI Dedicated AI Cluster``
  row linking to the how-to page.

- ``README.md`` tutorials section: tutorial counter 39 → 40, new
  ``OCI`` track row covering 29 + 40.

- ``docs/index.md`` capability grid: same DAC row added.

- ``docs/index.md`` tutorials section: counter 39 → 40 and tutorial
  40 added to the Track-5 production list.
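
For reviewers, the routing decision that part 1 of the tutorial
describes can be sketched roughly as below. This is an illustration of
the three rules as the tutorial prints them, not the code in
``locus.models.registry``; ``pick_transport`` is a hypothetical helper
name and the model ids are placeholders.

```python
def pick_transport(model_id: str) -> str:
    """Return the transport class name the three documented rules imply.

    Hypothetical helper for illustration only; the real registry may
    apply further checks that are not shown here.
    """
    bare = model_id.removeprefix("oci:")  # drop the provider prefix if present
    if bare.startswith("ocid1.generativeaiendpoint."):
        return "OCIModel"  # rule 1: DAC endpoint OCID, SDK transport
    if bare.startswith("cohere.command-r-"):
        return "OCIModel"  # rule 2: Cohere Command R family, SDK transport
    return "OCIOpenAIModel"  # rule 3: everything else, V1 transport


if __name__ == "__main__":
    for candidate in (
        "oci:ocid1.generativeaiendpoint.oc1.uk-london-1.placeholder",
        "oci:cohere.command-r-plus-08-2024",
        "oci:some.other-model",
    ):
        print(candidate, "->", pick_transport(candidate))
```

The rules are checked in order, so a DAC OCID never falls through to
the V1 transport, which cannot express ``DedicatedServingMode``.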

Run output (with the DAC env vars set)
--------------------------------------
- Part 3: complete() — "I am a large-scale language model developed
  by Alibaba Cloud, known as Qwen." (17 prompt / 18 completion tokens)
- Part 4: stream() — real SSE deltas: "1, 2, 3, 4, 5".
- Part 5: agent + tool — Qwen emits ``<tool_call>`` text block;
  documented as a model-side rather than locus-side limitation.
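
The caller-side remediation for the ``<tool_call>`` quirk could look
something like the sketch below. It assumes each block wraps a JSON
object with ``name`` and ``arguments`` keys, which is how the Qwen
output is commonly described but was not captured verbatim in this run;
``extract_tool_calls`` is a hypothetical helper, not a locus API.

```python
import json
import re

# Matches the text between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Parse <tool_call>{...}</tool_call> blocks out of plain content.

    Assumption: each block holds one JSON object shaped like
    {"name": ..., "arguments": {...}}. Blocks that are not valid JSON
    or lack a "name" key are skipped rather than raising.
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


if __name__ == "__main__":
    sample = (
        "Sure.\n<tool_call>\n"
        '{"name": "add_two_numbers", "arguments": {"a": 7, "b": 35}}\n'
        "</tool_call>"
    )
    print(extract_tool_calls(sample))
```

How the parsed calls are then dispatched (direct function call or a
second agent turn) is left to the caller.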

Validation
----------
- 3205 unit tests pass, no regressions.
- ``hatch run check`` clean.
- Tutorial runs cleanly in mock mode (no env vars) and live mode
  (with OCI_DAC_* env vars).

Privacy
-------
No tenancy / endpoint OCIDs are committed — the tutorial reads
them from env vars. The how-to references the test endpoint by
region only.
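
A rough sketch of the env-var contract, for anyone reproducing the
run: the variable names, the ``us-chicago-1`` and ``DEFAULT``
fallbacks, and the inference URL pattern are taken from the tutorial;
``dac_settings`` itself is a hypothetical helper and the OCIDs in the
usage lines are placeholders.

```python
import os
from typing import Mapping, Optional


def dac_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Collect DAC settings from the environment, or None if incomplete.

    Hypothetical helper: mirrors the tutorial's checks, where only the
    endpoint and compartment OCIDs are required and the rest default.
    """
    endpoint = env.get("OCI_DAC_ENDPOINT_OCID")
    compartment = env.get("OCI_DAC_COMPARTMENT_ID")
    if not (endpoint and compartment):
        return None  # mock mode: nothing to connect to
    region = env.get("OCI_DAC_REGION", "us-chicago-1")
    return {
        "model_id": f"oci:{endpoint}",
        "compartment_id": compartment,
        "profile_name": env.get("OCI_PROFILE", "DEFAULT"),
        "service_endpoint": (
            f"https://inference.generativeai.{region}.oci.oraclecloud.com"
        ),
    }


if __name__ == "__main__":
    fake_env = {
        "OCI_DAC_ENDPOINT_OCID": "ocid1.generativeaiendpoint.oc1.uk-london-1.placeholder",
        "OCI_DAC_COMPARTMENT_ID": "ocid1.compartment.oc1.placeholder",
        "OCI_DAC_REGION": "uk-london-1",
    }
    print(dac_settings(fake_env))
    print(dac_settings({}))
```

Keeping every identifier behind an env lookup is what lets the file
ship without any tenancy-specific values.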

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
---
 README.md                       |   4 +-
 docs/how-to/oci-dac.md          |  42 +++++
 docs/index.md                   |   8 +-
 examples/tutorial_40_oci_dac.py | 301 ++++++++++++++++++++++++++++++++
 4 files changed, 351 insertions(+), 4 deletions(-)
 create mode 100644 examples/tutorial_40_oci_dac.py
diff --git a/README.md b/README.md
index b630cbf9..49ba12ed 100644
--- a/README.md
+++ b/README.md
@@ -103,6 +103,7 @@ the [documentation](https://oracle-samples.github.io/locus/).
 | **[📊 Evaluation](https://oracle-samples.github.io/locus/concepts/evaluation/)** | `EvalCase` / `EvalRunner` / `EvalReport` — regression suites, custom evaluators, pass / score / duration reporting. |
 | **[🛂 Termination algebra](https://oracle-samples.github.io/locus/concepts/termination/)** | Eight composable stop conditions on `Agent(termination=…)`: `MaxIterations \| TextMention("DONE") & ConfidenceMet(0.9)` is real Python (`__or__` / `__and__` overloads). |
 | **[🧰 Models](https://oracle-samples.github.io/locus/concepts/models/)** | OCI GenAI native (V1 + SDK transport, 90+ models, day-0) · OpenAI · Anthropic · Ollama. One auth surface for OCI: profile, session token, instance / resource principal. |
+| **[🏗 OCI Dedicated AI Cluster](https://oracle-samples.github.io/locus/how-to/oci-dac/)** | Pass an `ocid1.generativeaiendpoint.oc1.<region>....` OCID and locus auto-routes to `DedicatedServingMode` with real SSE streaming. Live-tested against Qwen on a London DAC. |
 
 ## The agent loop
 
@@ -206,7 +207,7 @@ print(agent.run_sync(
 
 ## Tutorials
 
-[`examples/`](examples/) has 39 progressive tutorials, each a single
+[`examples/`](examples/) has 40 progressive tutorials, each a single
 runnable file. The full set runs end-to-end in CI on every commit;
 each tutorial is a working program against a real model.
 
@@ -218,6 +219,7 @@ each tutorial is a working program against a real model.
 | **Multi-agent** | [`11_swarm_multiagent`](examples/tutorial_11_swarm_multiagent.py) · [`16_agent_handoff`](examples/tutorial_16_agent_handoff.py) · [`17_orchestrator_pattern`](examples/tutorial_17_orchestrator_pattern.py) · [`34_a2a_protocol`](examples/tutorial_34_a2a_protocol.py) |
 | **RAG** | [`22_rag_basics`](examples/tutorial_22_rag_basics.py) · [`24_rag_agents`](examples/tutorial_24_rag_agents.py) |
 | **Production** | [`19_guardrails_security`](examples/tutorial_19_guardrails_security.py) · [`20_checkpoint_backends`](examples/tutorial_20_checkpoint_backends.py) · [`28_agent_server`](examples/tutorial_28_agent_server.py) · [`37_termination`](examples/tutorial_37_termination.py) |
+| **OCI** | [`29_model_providers`](examples/tutorial_29_model_providers.py) · [`40_oci_dac`](examples/tutorial_40_oci_dac.py) — Dedicated AI Cluster endpoints |
 
 End-to-end demos:
 
diff --git a/docs/how-to/oci-dac.md b/docs/how-to/oci-dac.md
index 60103570..ab7d510f 100644
--- a/docs/how-to/oci-dac.md
+++ b/docs/how-to/oci-dac.md
@@ -36,6 +36,48 @@ get_model("oci:ocid1.generativeaiendpoint....")
   → SDK chat() routes to your DAC.
 ```
 
+## Confirmed working — Qwen on a London DAC
+
+Live-tested against a `uk-london-1` DAC endpoint running Qwen
+(Alibaba Cloud) on 2026-05-01. End-to-end results from the live run
+(see [`examples/tutorial_40_oci_dac.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_40_oci_dac.py)):
+
+```text
+=== Part 3: complete() against the DAC ===
+Reply:        'I am a large-scale language model developed by Alibaba Cloud, known as Qwen.'
+usage:        {'prompt_tokens': 17, 'completion_tokens': 18}
+stop_reason:  stop
+
+=== Part 4: stream() against the DAC ===
+Streaming reply (chunks shown inline):
+  1, 2, 3, 4, 5
+```
+
+What's proven by the run:
+
+- **Routing** — `oci:ocid1.generativeaiendpoint....` resolved to
+  `OCIModel`, not `OCIOpenAIModel`.
+- **Serving mode** — `DedicatedServingMode(endpoint_id=...)` was
+  accepted by the live endpoint.
+- **Real SSE** — chunks arrived as character-by-character deltas, not
+  the fallback path.
+- **Token accounting** — `usage` populated correctly (17 / 18 tokens).
+
+What's still model-specific (Qwen on this DAC, with the deployment as
+provisioned by Luigi's tenancy):
+
+- Tool calls come back as `<tool_call>{...}</tool_call>` text blocks
+  inside `message.content`, not as structured `tool_calls` array
+  entries. Locus's `GenericProvider.parse_response()` doesn't extract
+  them as `ToolCall`s. Two ways to fix:
+  1. **Deploy-side**: configure the DAC with a Qwen3-family flag like
+     `--enable-auto-tool-choice` so the model emits OpenAI-style
+     `tool_calls`. Locus picks them up automatically.
+  2. **Caller-side**: post-process `result.message` for
+     `<tool_call>{...}</tool_call>` blocks and re-issue them via
+     `agent.run_sync(...)` with the parsed call. This needs only a
+     small regex extraction, which is not built into locus today.
+
 ## Streaming
 
 `OCIModel.stream()` flips `is_stream=True` on the underlying
diff --git a/docs/index.md b/docs/index.md
index 52a6d876..8506f112 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -280,6 +280,7 @@ them in one process; stream events from any of them in the same
 | **📊 Evaluation** | `EvalCase` / `EvalRunner` / `EvalReport` regression suites. |
 | **🛂 Termination algebra** | Eight composable stop conditions. `Or` and `And` compose them. |
 | **🧰 Models** | OCI GenAI native (V1 + SDK) · OpenAI · Anthropic · Ollama. |
+| **🏗 OCI Dedicated AI Cluster** | Pass an `ocid1.generativeaiendpoint....` OCID, get `DedicatedServingMode` with real SSE streaming. Live-tested on Qwen / London. |
 
 ## Hello, agent
 
@@ -350,7 +351,7 @@ Read the [concepts](concepts/agent.md) for the *why*; read the
 ## Learn locus in an afternoon
 
 The [`examples/`](https://github.com/oracle-samples/locus/tree/main/examples)
-tree is **39 tutorials** plus **3 end-to-end demos**. Every tutorial
+tree is **40 tutorials** plus **3 end-to-end demos**. Every tutorial
 is one runnable file and adds exactly one idea on top of the previous.
 
 ### Track 1 — basics (first hour)
@@ -394,7 +395,7 @@ The six in-process patterns plus A2A:
 [RAG agents](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_24_rag_agents.py) ·
 [Skills](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_32_skills.py).
 
-### Track 5 — production (12, 19–21, 26–30, 33, 35, 37–39)
+### Track 5 — production (12, 19–21, 26–30, 33, 35, 37–40)
 
 [MCP](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_12_mcp_integration.py) ·
 [Guardrails](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_19_guardrails_security.py) ·
@@ -410,7 +411,8 @@ The six in-process patterns plus A2A:
 [Graph advanced](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_35_graph_advanced.py) ·
 [Termination](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_37_termination.py) ·
 [Multi-modal providers](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_38_multimodal_providers.py) ·
-[GSAR typed grounding](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_39_gsar_typed_grounding.py).
+[GSAR typed grounding](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_39_gsar_typed_grounding.py) ·
+[OCI Dedicated AI Cluster (DAC)](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_40_oci_dac.py).
 
 ### End-to-end demos
 
diff --git a/examples/tutorial_40_oci_dac.py b/examples/tutorial_40_oci_dac.py
new file mode 100644
index 00000000..dda51b66
--- /dev/null
+++ b/examples/tutorial_40_oci_dac.py
@@ -0,0 +1,301 @@
+"""
+Tutorial 40: OCI Dedicated AI Cluster (DAC) endpoints
+
+This tutorial covers locus's DAC support. A DAC is OCI's
+provisioned-capacity serving mode for OCI GenAI: instead of pay-per-token
+inference against a shared model id, you address a dedicated endpoint
+by its OCID (``ocid1.generativeaiendpoint.oc1.<region>....``) and
+inference is routed to your cluster.
+
+Locus auto-detects DAC OCIDs and routes them through the SDK
+transport (``OCIModel``) — the V1 OpenAI-compatible transport can't
+speak ``DedicatedServingMode``. Both non-streaming ``complete()`` and
+real SSE ``stream()`` work end-to-end against a DAC.
+
+This tutorial covers:
+
+- Part 1: how DAC routing is decided (``ocid1.generativeaiendpoint.``
+  prefix → ``OCIModel``).
+- Part 2: configure an ``Agent`` against a DAC endpoint.
+- Part 3: drive ``complete()`` against the DAC with one prompt.
+- Part 4: drive ``stream()`` and watch SSE deltas come back.
+- Part 5: wire the DAC into a tool-using ``Agent`` so the model sitting
+  on dedicated capacity can call your @tool functions.
+
+Prerequisites:
+- ``oci`` SDK installed (``pip install -e ".[oci]"``).
+- An OCI profile with permission to invoke the DAC endpoint.
+- The DAC endpoint OCID, the compartment OCID, and the region.
+
+Set these env vars (kept out of the source so the tutorial works for
+any DAC):
+
+  export OCI_DAC_ENDPOINT_OCID=ocid1.generativeaiendpoint.oc1.uk-london-1....
+  export OCI_DAC_COMPARTMENT_ID=ocid1.compartment.oc1....
+  export OCI_DAC_REGION=uk-london-1
+  export OCI_PROFILE=MY_DAC_PROFILE
+
+Without those env vars Part 2 prints the wiring snippet and Parts 3-5 skip.
+
+Difficulty: Intermediate
+"""
+
+from __future__ import annotations
+
+import asyncio
+import os
+
+
+# =============================================================================
+# Part 1: Auto-routing
+# =============================================================================
+
+
+def example_routing() -> None:
+    """How locus decides to use the SDK transport for DAC OCIDs."""
+    print("=== Part 1: Auto-routing ===\n")
+
+    print("locus.models.registry inspects the model id with three rules:")
+    print()
+    print("  1. ocid1.generativeaiendpoint.oc1.<region>....   → OCIModel (SDK)")
+    print("  2. cohere.command-r-*                            → OCIModel (SDK)")
+    print("  3. everything else                               → OCIOpenAIModel (V1)")
+    print()
+    print("So both calls route to the SDK transport:")
+    print()
+    print('  Agent(model="oci:cohere.command-r-plus-08-2024")  # rule 2')
+    print('  Agent(model="oci:ocid1.generativeaiendpoint....")  # rule 1')
+    print()
+    print("DAC needs the SDK transport because DedicatedServingMode")
+    print("(endpoint_id=...) is part of the OCI proprietary chat shape,")
+    print("not the OpenAI-compatible /v1/chat/completions endpoint.")
+
+
+# =============================================================================
+# Part 2: Configure an Agent against a DAC
+# =============================================================================
+
+
+def _dac_env_ready() -> bool:
+    return bool(
+        os.environ.get("OCI_DAC_ENDPOINT_OCID") and os.environ.get("OCI_DAC_COMPARTMENT_ID")
+    )
+
+
+def example_configure_agent() -> None:
+    """Build an Agent pointed at a DAC. Just like any other model."""
+    print("\n=== Part 2: Configure Agent against a DAC ===\n")
+
+    if not _dac_env_ready():
+        print("OCI_DAC_ENDPOINT_OCID / OCI_DAC_COMPARTMENT_ID not set.")
+        print()
+        print("Wiring (with the env vars set):")
+        print("""
+  from locus import Agent
+  from locus.models import get_model
+
+  # Pre-build the model — DAC needs provider-specific kwargs that
+  # Agent's strict AgentConfig doesn't accept on the keyword path.
+  region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+  model = get_model(
+      f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+      compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+      profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+      service_endpoint=(
+          f"https://inference.generativeai.{region}.oci.oraclecloud.com"
+      ),
+  )
+  agent = Agent(
+      model=model,
+      system_prompt="You are a concise assistant.",
+  )
+""")
+        return
+
+    from locus import Agent
+    from locus.models import get_model
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+    )
+    agent = Agent(
+        model=model,
+        system_prompt="You are a concise assistant. Reply briefly.",
+        max_iterations=2,
+    )
+    print(f"Agent configured against DAC endpoint in {region}.")
+    print(f"  underlying model class:  {type(agent._model).__name__}")
+    print("  serving mode (DAC):       DedicatedServingMode (set by client)")
+
+
+# =============================================================================
+# Part 3: complete() — single round-trip
+# =============================================================================
+
+
+async def example_complete() -> None:
+    """Fire one chat at the DAC and print what comes back."""
+    print("\n=== Part 3: complete() against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus.core.messages import Message
+    from locus.models.providers.oci import OCIAuthType, OCIModel
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = OCIModel(
+        model_id=os.environ["OCI_DAC_ENDPOINT_OCID"],
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        auth_type=OCIAuthType.API_KEY,
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+        max_tokens=128,
+    )
+
+    response = await model.complete(
+        messages=[
+            Message.user("In one sentence, what model are you?"),
+        ],
+        tools=None,
+    )
+    content = (response.message.content or "").strip()
+    print(f"Reply:        {content!r}")
+    print(f"usage:        {response.usage}")
+    print(f"stop_reason:  {response.stop_reason}")
+
+
+# =============================================================================
+# Part 4: stream() — real SSE deltas
+# =============================================================================
+
+
+async def example_stream() -> None:
+    """Stream from the DAC and print each delta as it arrives."""
+    print("\n=== Part 4: stream() against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus.core.messages import Message
+    from locus.models.providers.oci import OCIAuthType, OCIModel
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = OCIModel(
+        model_id=os.environ["OCI_DAC_ENDPOINT_OCID"],
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        auth_type=OCIAuthType.API_KEY,
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+        max_tokens=64,
+    )
+
+    print("Streaming reply (chunks shown inline):")
+    print("  ", end="", flush=True)
+    async for event in model.stream(
+        messages=[Message.user("Count from 1 to 5, separated by commas.")],
+        tools=None,
+    ):
+        if event.content:
+            print(event.content, end="", flush=True)
+        if event.done:
+            break
+    print()
+
+
+# =============================================================================
+# Part 5: Agent + tool against the DAC
+# =============================================================================
+
+
+async def example_agent_with_tool() -> None:
+    """Wire a tool-using Agent on top of the DAC.
+
+    The DAC endpoint sees the same tool schema your on-demand models
+    do — locus passes the OpenAI-style tool definitions in the
+    ``GenericChatRequest.tools`` field. Whether the model on the
+    other end emits structured tool calls (OpenAI format,
+    ``message.tool_calls``) or text-format tool calls (Qwen's
+    ``<tool_call>{...}</tool_call>`` XML wrapper, etc.) depends on
+    the model and the deployment configuration:
+
+    - **OpenAI / Llama / Cohere on OCI** — emit structured
+      ``tool_calls``. Locus extracts them automatically.
+    - **Qwen on a DAC** — by default emits ``<tool_call>`` text
+      blocks. Locus's parser doesn't extract these, so
+      ``result.metrics.tool_calls`` will be 0 even though the model
+      "called" the tool in its content. Two options to fix:
+        (a) Configure the DAC to enable OpenAI-compatible tool-call
+            output (Qwen3 family supports this via
+            ``--enable-auto-tool-choice`` on the deployment).
+        (b) Wrap the agent with a parser that extracts the
+            ``<tool_call>`` blocks from ``result.message`` and
+            re-issues them as locus ToolCall objects.
+
+    This part of the tutorial just shows the wiring — what the model
+    does with it depends on the model.
+    """
+    print("\n=== Part 5: Agent + @tool against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus import Agent
+    from locus.models import get_model
+    from locus.tools.decorator import tool
+
+    @tool(name="add_two_numbers")
+    def add_two_numbers(a: int, b: int) -> int:
+        """Return the sum of two integers."""
+        return a + b
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+    )
+    agent = Agent(
+        model=model,
+        tools=[add_two_numbers],
+        system_prompt="You can call add_two_numbers when asked to add. Reply briefly.",
+        max_iterations=4,
+    )
+
+    result = await asyncio.to_thread(
+        agent.run_sync,
+        "Use the add_two_numbers tool to add 7 and 35, then state the result in one sentence.",
+    )
+    print(f"final message:        {(result.message or '').strip()[:200]}")
+    print(f"iterations:           {result.metrics.iterations}")
+    print(f"locus tool calls:     {result.metrics.tool_calls}")
+    if "<tool_call>" in (result.message or ""):
+        print()
+        print("Note: the model emitted a <tool_call> text block instead of a")
+        print("structured tool_call. See the docstring for how to handle this")
+        print("(deployment flag or post-processing parser).")
+
+
+# =============================================================================
+# Main
+# =============================================================================
+
+
+async def _async_main() -> None:
+    example_routing()
+    example_configure_agent()
+    await example_complete()
+    await example_stream()
+    await example_agent_with_tool()
+
+
+if __name__ == "__main__":
+    asyncio.run(_async_main())