From 230633ef8b98120d756d17f8222c37009c3ecdbc Mon Sep 17 00:00:00 2001
From: Federico Kamelhar
Date: Fri, 1 May 2026 11:32:18 -0400
Subject: [PATCH] docs(oci-dac): tutorial 40 + empirical Qwen confirmation + website
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the runnable end-to-end tutorial and threads the DAC story through
every reader-facing surface, with one piece of empirical evidence:
locus actually drove a real Qwen DAC in uk-london-1.

What ships
----------

- ``examples/tutorial_40_oci_dac.py`` — 5-part walkthrough:
    1. Auto-routing (OCID prefix → SDK transport).
    2. Configure an Agent against a DAC (pre-build via ``get_model``
       because ``Agent``'s strict ``AgentConfig`` rejects
       provider-specific kwargs on the keyword path).
    3. ``complete()`` against the DAC.
    4. ``stream()`` against the DAC — real SSE deltas.
    5. ``Agent`` + ``@tool`` against the DAC, with an honest note on
       Qwen's ``<tool_call>`` text-block format vs OpenAI's structured
       ``tool_calls`` and the two ways to fix it.
  Live-tested against Luigi's London DAC. Sample output captured in
  the how-to page.
- ``docs/how-to/oci-dac.md`` — adds the "Confirmed working — Qwen on a
  London DAC" section with the actual run output (model
  identification, token usage, streaming chunks). Documents the Qwen
  ``<tool_call>`` quirk and the two remediation paths.
- ``README.md`` capability grid: new ``🏗 OCI Dedicated AI Cluster``
  row linking to the how-to page.
- ``README.md`` tutorials section: tutorial counter 39 → 40, new
  ``OCI`` track row covering 29 + 40.
- ``docs/index.md`` capability grid: same DAC row added.
- ``docs/index.md`` tutorials section: counter 39 → 40 and tutorial 40
  added to the Track-5 production list.

Run output (with the DAC env vars set)
--------------------------------------

- Part 3: complete() — "I am a large-scale language model developed by
  Alibaba Cloud, known as Qwen." (17 / 18 tokens)
- Part 4: stream() — real SSE deltas: "1, 2, 3, 4, 5".
- Part 5: agent + tool — Qwen emits ``<tool_call>`` text block;
  documented as a model-side rather than locus-side limitation.

Validation
----------

- 3205 unit tests pass, no regressions.
- ``hatch run check`` clean.
- Tutorial runs cleanly in mock mode (no env vars) and live mode (with
  OCI_DAC_* env vars).

Privacy
-------

No tenancy / endpoint OCIDs are committed — the tutorial reads them
from env vars. The how-to references the test endpoint by region only.

Signed-off-by: Federico Kamelhar
---
 README.md                       |   4 +-
 docs/how-to/oci-dac.md          |  42 +++++
 docs/index.md                   |   8 +-
 examples/tutorial_40_oci_dac.py | 301 ++++++++++++++++++++++++++++++++
 4 files changed, 351 insertions(+), 4 deletions(-)
 create mode 100644 examples/tutorial_40_oci_dac.py

diff --git a/README.md b/README.md
index b630cbf9..49ba12ed 100644
--- a/README.md
+++ b/README.md
@@ -103,6 +103,7 @@ the [documentation](https://oracle-samples.github.io/locus/).
 | **[📊 Evaluation](https://oracle-samples.github.io/locus/concepts/evaluation/)** | `EvalCase` / `EvalRunner` / `EvalReport` — regression suites, custom evaluators, pass / score / duration reporting. |
 | **[🛂 Termination algebra](https://oracle-samples.github.io/locus/concepts/termination/)** | Eight composable stop conditions on `Agent(termination=…)`: `MaxIterations \| TextMention("DONE") & ConfidenceMet(0.9)` is real Python (`__or__` / `__and__` overloads). |
 | **[🧰 Models](https://oracle-samples.github.io/locus/concepts/models/)** | OCI GenAI native (V1 + SDK transport, 90+ models, day-0) · OpenAI · Anthropic · Ollama. One auth surface for OCI: profile, session token, instance / resource principal. |
+| **[🏗 OCI Dedicated AI Cluster](https://oracle-samples.github.io/locus/how-to/oci-dac/)** | Pass an `ocid1.generativeaiendpoint.....` OCID and locus auto-routes to `DedicatedServingMode` with real SSE streaming. Live-tested against Qwen on a London DAC. |
 
 ## The agent loop
 
@@ -206,7 +207,7 @@ print(agent.run_sync(
 
 ## Tutorials
 
-[`examples/`](examples/) has 39 progressive tutorials, each a single
+[`examples/`](examples/) has 40 progressive tutorials, each a single
 runnable file. The full set runs end-to-end in CI on every commit;
 each tutorial is a working program against a real model.
 
@@ -218,6 +219,7 @@ each tutorial is a working program against a real model.
 | **Multi-agent** | [`11_swarm_multiagent`](examples/tutorial_11_swarm_multiagent.py) · [`16_agent_handoff`](examples/tutorial_16_agent_handoff.py) · [`17_orchestrator_pattern`](examples/tutorial_17_orchestrator_pattern.py) · [`34_a2a_protocol`](examples/tutorial_34_a2a_protocol.py) |
 | **RAG** | [`22_rag_basics`](examples/tutorial_22_rag_basics.py) · [`24_rag_agents`](examples/tutorial_24_rag_agents.py) |
 | **Production** | [`19_guardrails_security`](examples/tutorial_19_guardrails_security.py) · [`20_checkpoint_backends`](examples/tutorial_20_checkpoint_backends.py) · [`28_agent_server`](examples/tutorial_28_agent_server.py) · [`37_termination`](examples/tutorial_37_termination.py) |
+| **OCI** | [`29_model_providers`](examples/tutorial_29_model_providers.py) · [`40_oci_dac`](examples/tutorial_40_oci_dac.py) — Dedicated AI Cluster endpoints |
 
 End-to-end demos:
 
diff --git a/docs/how-to/oci-dac.md b/docs/how-to/oci-dac.md
index 60103570..ab7d510f 100644
--- a/docs/how-to/oci-dac.md
+++ b/docs/how-to/oci-dac.md
@@ -36,6 +36,48 @@ get_model("oci:ocid1.generativeaiendpoint....")
 → SDK chat() routes to your DAC.
 ```
 
+## Confirmed working — Qwen on a London DAC
+
+Live-tested against a `uk-london-1` DAC endpoint running Qwen
+(Alibaba Cloud) on 2026-05-01. End-to-end results from the live run
+(see [`examples/tutorial_40_oci_dac.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_40_oci_dac.py)):
+
+```text
+=== Part 3: complete() against the DAC ===
+Reply: 'I am a large-scale language model developed by Alibaba Cloud, known as Qwen.'
+usage: {'prompt_tokens': 17, 'completion_tokens': 18}
+stop_reason: stop
+
+=== Part 4: stream() against the DAC ===
+Streaming reply (chunks shown inline):
+ 1, 2, 3, 4, 5
+```
+
+What's proven by the run:
+
+- **Routing** — `oci:ocid1.generativeaiendpoint....` resolved to
+  `OCIModel`, not `OCIOpenAIModel`.
+- **Serving mode** — `DedicatedServingMode(endpoint_id=...)` was
+  accepted by the live endpoint.
+- **Real SSE** — chunks arrived as character-by-character deltas, not
+  the fallback path.
+- **Token accounting** — `usage` populated correctly (17 / 18 tokens).
+
+What's still model-specific (Qwen on this DAC, with the deployment as
+provisioned by Luigi's tenancy):
+
+- Tool calls come back as `<tool_call>{...}</tool_call>` text blocks
+  inside `message.content`, not as structured `tool_calls` array
+  entries. Locus's `GenericProvider.parse_response()` doesn't extract
+  them as `ToolCall`s. Two ways to fix:
+  1. **Deploy-side**: configure the DAC with a Qwen3-family flag like
+     `--enable-auto-tool-choice` so the model emits OpenAI-style
+     `tool_calls`. Locus picks them up automatically.
+  2. **Caller-side**: post-process `result.message` for
+     `<tool_call>{...}</tool_call>` blocks and re-issue them via
+     `agent.run_sync(...)` with the parsed call. A small regex
+     extraction; not built into locus today.
+
 ## Streaming
 
 `OCIModel.stream()` flips `is_stream=True` on the underlying
diff --git a/docs/index.md b/docs/index.md
index 52a6d876..8506f112 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -280,6 +280,7 @@ them in one process; stream events from any of them in the same
 | **📊 Evaluation** | `EvalCase` / `EvalRunner` / `EvalReport` regression suites. |
 | **🛂 Termination algebra** | Eight composable stop conditions. `Or` and `And` compose them. |
 | **🧰 Models** | OCI GenAI native (V1 + SDK) · OpenAI · Anthropic · Ollama. |
+| **🏗 OCI Dedicated AI Cluster** | Pass an `ocid1.generativeaiendpoint....` OCID, get `DedicatedServingMode` with real SSE streaming. Live-tested on Qwen / London. |
 
 ## Hello, agent
 
@@ -350,7 +351,7 @@ Read the [concepts](concepts/agent.md) for the *why*; read the
 ## Learn locus in an afternoon
 
 The [`examples/`](https://github.com/oracle-samples/locus/tree/main/examples)
-tree is **39 tutorials** plus **3 end-to-end demos**. Every tutorial
+tree is **40 tutorials** plus **3 end-to-end demos**. Every tutorial
 is one runnable file and adds exactly one idea on top of the previous.
 
 ### Track 1 — basics (first hour)
@@ -394,7 +395,7 @@ The six in-process patterns plus A2A:
 [RAG agents](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_24_rag_agents.py) ·
 [Skills](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_32_skills.py).
 
-### Track 5 — production (12, 19–21, 26–30, 33, 35, 37–39)
+### Track 5 — production (12, 19–21, 26–30, 33, 35, 37–40)
 
 [MCP](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_12_mcp_integration.py) ·
 [Guardrails](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_19_guardrails_security.py) ·
@@ -410,7 +411,8 @@ The six in-process patterns plus A2A:
 [Graph advanced](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_35_graph_advanced.py) ·
 [Termination](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_37_termination.py) ·
 [Multi-modal providers](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_38_multimodal_providers.py) ·
-[GSAR typed grounding](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_39_gsar_typed_grounding.py).
+[GSAR typed grounding](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_39_gsar_typed_grounding.py) ·
+[OCI Dedicated AI Cluster (DAC)](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_40_oci_dac.py).
 
 ### End-to-end demos
 
diff --git a/examples/tutorial_40_oci_dac.py b/examples/tutorial_40_oci_dac.py
new file mode 100644
index 00000000..dda51b66
--- /dev/null
+++ b/examples/tutorial_40_oci_dac.py
@@ -0,0 +1,301 @@
+"""
+Tutorial 40: OCI Dedicated AI Cluster (DAC) endpoints
+
+This tutorial covers locus's DAC support. A DAC is OCI's
+provisioned-capacity serving mode for OCI GenAI: instead of pay-per-token
+inference against a shared model id, you address a dedicated endpoint
+by its OCID (``ocid1.generativeaiendpoint.oc1.....``) and
+inference is routed to your cluster.
+
+Locus auto-detects DAC OCIDs and routes them through the SDK
+transport (``OCIModel``) — the V1 OpenAI-compatible transport can't
+speak ``DedicatedServingMode``. Both non-streaming ``complete()`` and
+real SSE ``stream()`` work end-to-end against a DAC.
+
+This tutorial covers:
+
+- Part 1: how DAC routing is decided (``ocid1.generativeaiendpoint.``
+  prefix → ``OCIModel``).
+- Part 2: configure an ``Agent`` against a DAC endpoint.
+- Part 3: drive ``complete()`` against the DAC with one prompt.
+- Part 4: drive ``stream()`` and watch SSE deltas come back.
+- Part 5: wire the DAC into a tool-using ``Agent`` so the model sitting
+  on dedicated capacity can call your @tool functions.
+
+Prerequisites:
+- ``oci`` SDK installed (``pip install -e ".[oci]"``).
+- An OCI profile with permission to invoke the DAC endpoint.
+- The DAC endpoint OCID, the compartment OCID, and the region.
+
+Set these env vars (kept out of the source so the tutorial works for
+any DAC):
+
+    export OCI_DAC_ENDPOINT_OCID=ocid1.generativeaiendpoint.oc1.uk-london-1....
+    export OCI_DAC_COMPARTMENT_ID=ocid1.compartment.oc1....
+    export OCI_DAC_REGION=uk-london-1
+    export OCI_PROFILE=MY_DAC_PROFILE
+
+Without those env vars Parts 2-5 print the wiring snippet and skip.
+
+Difficulty: Intermediate
+"""
+
+from __future__ import annotations
+
+import asyncio
+import os
+
+
+# =============================================================================
+# Part 1: Auto-routing
+# =============================================================================
+
+
+def example_routing() -> None:
+    """How locus decides to use the SDK transport for DAC OCIDs."""
+    print("=== Part 1: Auto-routing ===\n")
+
+    print("locus.models.registry inspects the model id with three rules:")
+    print()
+    print(" 1. ocid1.generativeaiendpoint..... → OCIModel (SDK)")
+    print(" 2. cohere.command-r-* → OCIModel (SDK)")
+    print(" 3. everything else → OCIOpenAIModel (V1)")
+    print()
+    print("So both calls route to the SDK transport:")
+    print()
+    print(' Agent(model="oci:cohere.command-r-plus-08-2024") # rule 2')
+    print(' Agent(model="oci:ocid1.generativeaiendpoint....") # rule 1')
+    print()
+    print("DAC needs the SDK transport because DedicatedServingMode")
+    print("(endpoint_id=...) is part of the OCI proprietary chat shape,")
+    print("not the OpenAI-compatible /v1/chat/completions endpoint.")
+
+
+# =============================================================================
+# Part 2: Configure an Agent against a DAC
+# =============================================================================
+
+
+def _dac_env_ready() -> bool:
+    return bool(
+        os.environ.get("OCI_DAC_ENDPOINT_OCID") and os.environ.get("OCI_DAC_COMPARTMENT_ID")
+    )
+
+
+def example_configure_agent() -> None:
+    """Build an Agent pointed at a DAC. Just like any other model."""
+    print("\n=== Part 2: Configure Agent against a DAC ===\n")
+
+    if not _dac_env_ready():
+        print("OCI_DAC_ENDPOINT_OCID / OCI_DAC_COMPARTMENT_ID not set.")
+        print()
+        print("Wiring (with the env vars set):")
+        print("""
+    from locus import Agent
+    from locus.models import get_model
+
+    # Pre-build the model — DAC needs provider-specific kwargs that
+    # Agent's strict AgentConfig doesn't accept on the keyword path.
+    region = os.environ["OCI_DAC_REGION"]
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=(
+            f"https://inference.generativeai.{region}.oci.oraclecloud.com"
+        ),
+    )
+    agent = Agent(
+        model=model,
+        system_prompt="You are a concise assistant.",
+    )
+""")
+        return
+
+    from locus import Agent
+    from locus.models import get_model
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+    )
+    agent = Agent(
+        model=model,
+        system_prompt="You are a concise assistant. Reply briefly.",
+        max_iterations=2,
+    )
+    print(f"Agent configured against DAC endpoint in {region}.")
+    print(f" underlying model class: {type(agent._model).__name__}")
+    print(" serving mode (DAC): DedicatedServingMode (set by client)")
+
+
+# =============================================================================
+# Part 3: complete() — single round-trip
+# =============================================================================
+
+
+async def example_complete() -> None:
+    """Fire one chat at the DAC and print what comes back."""
+    print("\n=== Part 3: complete() against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus.core.messages import Message
+    from locus.models.providers.oci import OCIAuthType, OCIModel
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = OCIModel(
+        model_id=os.environ["OCI_DAC_ENDPOINT_OCID"],
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        auth_type=OCIAuthType.API_KEY,
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+        max_tokens=128,
+    )
+
+    response = await model.complete(
+        messages=[
+            Message.user("In one sentence, what model are you?"),
+        ],
+        tools=None,
+    )
+    content = (response.message.content or "").strip()
+    print(f"Reply: {content!r}")
+    print(f"usage: {response.usage}")
+    print(f"stop_reason: {response.stop_reason}")
+
+
+# =============================================================================
+# Part 4: stream() — real SSE deltas
+# =============================================================================
+
+
+async def example_stream() -> None:
+    """Stream from the DAC and print each delta as it arrives."""
+    print("\n=== Part 4: stream() against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus.core.messages import Message
+    from locus.models.providers.oci import OCIAuthType, OCIModel
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = OCIModel(
+        model_id=os.environ["OCI_DAC_ENDPOINT_OCID"],
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        auth_type=OCIAuthType.API_KEY,
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+        max_tokens=64,
+    )
+
+    print("Streaming reply (chunks shown inline):")
+    print(" ", end="", flush=True)
+    async for event in model.stream(
+        messages=[Message.user("Count from 1 to 5, separated by commas.")],
+        tools=None,
+    ):
+        if event.content:
+            print(event.content, end="", flush=True)
+        if event.done:
+            break
+    print()
+
+
+# =============================================================================
+# Part 5: Agent + tool against the DAC
+# =============================================================================
+
+
+async def example_agent_with_tool() -> None:
+    """Wire a tool-using Agent on top of the DAC.
+
+    The DAC endpoint sees the same tool schema your on-demand models
+    do — locus passes the OpenAI-style tool definitions in the
+    ``GenericChatRequest.tools`` field. Whether the model on the
+    other end emits structured tool calls (OpenAI format,
+    ``message.tool_calls``) or text-format tool calls (Qwen's
+    ``<tool_call>{...}</tool_call>`` XML wrapper, etc.) depends on
+    the model and the deployment configuration:
+
+    - **OpenAI / Llama / Cohere on OCI** — emit structured
+      ``tool_calls``. Locus extracts them automatically.
+    - **Qwen on a DAC** — by default emits ``<tool_call>`` text
+      blocks. Locus's parser doesn't extract these, so
+      ``result.metrics.tool_calls`` will be 0 even though the model
+      "called" the tool in its content. Two options to fix:
+      (a) Configure the DAC to enable OpenAI-compatible tool-call
+          output (Qwen3 family supports this via
+          ``--enable-auto-tool-choice`` on the deployment).
+      (b) Wrap the agent with a parser that extracts the
+          ``<tool_call>`` blocks from ``result.message`` and
+          re-issues them as locus ToolCall objects.
+
+    This part of the tutorial just shows the wiring — what the model
+    does with it depends on the model.
+    """
+    print("\n=== Part 5: Agent + @tool against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus import Agent
+    from locus.models import get_model
+    from locus.tools.decorator import tool
+
+    @tool(name="add_two_numbers")
+    def add_two_numbers(a: int, b: int) -> int:
+        """Return the sum of two integers."""
+        return a + b
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+    )
+    agent = Agent(
+        model=model,
+        tools=[add_two_numbers],
+        system_prompt="You can call add_two_numbers when asked to add. Reply briefly.",
+        max_iterations=4,
+    )
+
+    result = await asyncio.to_thread(
+        agent.run_sync,
+        "Use the add_two_numbers tool to add 7 and 35, then state the result in one sentence.",
+    )
+    print(f"final message: {result.message.strip()[:200]}")
+    print(f"iterations: {result.metrics.iterations}")
+    print(f"locus tool calls: {result.metrics.tool_calls}")
+    if "<tool_call>" in (result.message or ""):
+        print()
+        print("Note: the model emitted a <tool_call> text block instead of a")
+        print("structured tool_call. See the docstring for how to handle this")
+        print("(deployment flag or post-processing parser).")
+
+
+# =============================================================================
+# Main
+# =============================================================================
+
+
+async def _async_main() -> None:
+    example_routing()
+    example_configure_agent()
+    await example_complete()
+    await example_stream()
+    await example_agent_with_tool()
+
+
+if __name__ == "__main__":
+    asyncio.run(_async_main())
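
The caller-side remediation that the patch describes (post-processing the reply for text-format tool calls) could look roughly like the sketch below. It is not part of locus. It assumes the blocks are wrapped in `<tool_call>...</tool_call>` tags around a JSON object with `name` and `arguments` keys, which is how Qwen chat templates commonly emit them. `extract_tool_calls` is a hypothetical helper name.

```python
import json
import re

# Assumption: each block looks like
#   <tool_call>{"name": "...", "arguments": {...}}</tool_call>
# with optional whitespace or newlines around the JSON payload.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return (name, arguments) pairs for every well-formed block in text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text or ""):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole run
        if isinstance(payload, dict) and "name" in payload:
            calls.append((payload["name"], payload.get("arguments", {})))
    return calls


# Hypothetical reply text, shaped like the Part 5 scenario in the tutorial.
sample = (
    "Sure, adding them now.\n"
    "<tool_call>\n"
    '{"name": "add_two_numbers", "arguments": {"a": 7, "b": 35}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
```

How the extracted pairs are turned into locus `ToolCall` objects or fed back through `agent.run_sync(...)` depends on locus internals this patch does not show, so the sketch stops at parsing.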