diff --git a/README.md b/README.md
index b630cbf9..49ba12ed 100644
--- a/README.md
+++ b/README.md
@@ -103,6 +103,7 @@ the [documentation](https://oracle-samples.github.io/locus/).
 | **[📊 Evaluation](https://oracle-samples.github.io/locus/concepts/evaluation/)** | `EvalCase` / `EvalRunner` / `EvalReport` — regression suites, custom evaluators, pass / score / duration reporting. |
 | **[🛂 Termination algebra](https://oracle-samples.github.io/locus/concepts/termination/)** | Eight composable stop conditions on `Agent(termination=…)`: `MaxIterations \| TextMention("DONE") & ConfidenceMet(0.9)` is real Python (`__or__` / `__and__` overloads). |
 | **[🧰 Models](https://oracle-samples.github.io/locus/concepts/models/)** | OCI GenAI native (V1 + SDK transport, 90+ models, day-0) · OpenAI · Anthropic · Ollama. One auth surface for OCI: profile, session token, instance / resource principal. |
+| **[🏗 OCI Dedicated AI Cluster](https://oracle-samples.github.io/locus/how-to/oci-dac/)** | Pass an `ocid1.generativeaiendpoint.....` OCID and locus auto-routes to `DedicatedServingMode` with real SSE streaming. Live-tested against Qwen on a London DAC. |

 ## The agent loop

@@ -206,7 +207,7 @@ print(agent.run_sync(

 ## Tutorials

-[`examples/`](examples/) has 39 progressive tutorials, each a single
+[`examples/`](examples/) has 40 progressive tutorials, each a single
 runnable file. The full set runs end-to-end in CI on every commit;
 each tutorial is a working program against a real model.

@@ -218,6 +219,7 @@ each tutorial is a working program against a real model.
 | **Multi-agent** | [`11_swarm_multiagent`](examples/tutorial_11_swarm_multiagent.py) · [`16_agent_handoff`](examples/tutorial_16_agent_handoff.py) · [`17_orchestrator_pattern`](examples/tutorial_17_orchestrator_pattern.py) · [`34_a2a_protocol`](examples/tutorial_34_a2a_protocol.py) |
 | **RAG** | [`22_rag_basics`](examples/tutorial_22_rag_basics.py) · [`24_rag_agents`](examples/tutorial_24_rag_agents.py) |
 | **Production** | [`19_guardrails_security`](examples/tutorial_19_guardrails_security.py) · [`20_checkpoint_backends`](examples/tutorial_20_checkpoint_backends.py) · [`28_agent_server`](examples/tutorial_28_agent_server.py) · [`37_termination`](examples/tutorial_37_termination.py) |
+| **OCI** | [`29_model_providers`](examples/tutorial_29_model_providers.py) · [`40_oci_dac`](examples/tutorial_40_oci_dac.py) — Dedicated AI Cluster endpoints |

 End-to-end demos:

diff --git a/docs/how-to/oci-dac.md b/docs/how-to/oci-dac.md
index 60103570..ab7d510f 100644
--- a/docs/how-to/oci-dac.md
+++ b/docs/how-to/oci-dac.md
@@ -36,6 +36,48 @@ get_model("oci:ocid1.generativeaiendpoint....")
     → SDK chat() routes to your DAC.
 ```

+## Confirmed working — Qwen on a London DAC
+
+Live-tested against a `uk-london-1` DAC endpoint running Qwen
+(Alibaba Cloud) on 2026-05-01. End-to-end results from the live run
+(see [`examples/tutorial_40_oci_dac.py`](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_40_oci_dac.py)):
+
+```text
+=== Part 3: complete() against the DAC ===
+Reply: 'I am a large-scale language model developed by Alibaba Cloud, known as Qwen.'
+usage: {'prompt_tokens': 17, 'completion_tokens': 18}
+stop_reason: stop
+
+=== Part 4: stream() against the DAC ===
+Streaming reply (chunks shown inline):
+ 1, 2, 3, 4, 5
+```
+
+What's proven by the run:
+
+- **Routing** — `oci:ocid1.generativeaiendpoint....` resolved to
+  `OCIModel`, not `OCIOpenAIModel`.
+- **Serving mode** — `DedicatedServingMode(endpoint_id=...)` was
+  accepted by the live endpoint.
+- **Real SSE** — chunks arrived as character-by-character deltas, not
+  the fallback path.
+- **Token accounting** — `usage` populated correctly (17 / 18 tokens).
+
+What's still model-specific (Qwen on this DAC, with the deployment as
+provisioned by Luigi's tenancy):
+
+- Tool calls come back as `<tool_call>{...}</tool_call>` text blocks
+  inside `message.content`, not as structured `tool_calls` array
+  entries. Locus's `GenericProvider.parse_response()` doesn't extract
+  them as `ToolCall`s. Two ways to fix:
+  1. **Deploy-side**: configure the DAC with a Qwen3-family flag like
+     `--enable-auto-tool-choice` so the model emits OpenAI-style
+     `tool_calls`. Locus picks them up automatically.
+  2. **Caller-side**: post-process `result.message` for
+     `<tool_call>{...}</tool_call>` blocks and re-issue them via
+     `agent.run_sync(...)` with the parsed call. A small regex
+     extraction; not built into locus today.
+

 ## Streaming

 `OCIModel.stream()` flips `is_stream=True` on the underlying
diff --git a/docs/index.md b/docs/index.md
index 52a6d876..8506f112 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -280,6 +280,7 @@ them in one process; stream events from any of them in the same
 | **📊 Evaluation** | `EvalCase` / `EvalRunner` / `EvalReport` regression suites. |
 | **🛂 Termination algebra** | Eight composable stop conditions. `Or` and `And` compose them. |
 | **🧰 Models** | OCI GenAI native (V1 + SDK) · OpenAI · Anthropic · Ollama. |
+| **🏗 OCI Dedicated AI Cluster** | Pass an `ocid1.generativeaiendpoint....` OCID, get `DedicatedServingMode` with real SSE streaming. Live-tested on Qwen / London. |

 ## Hello, agent

@@ -350,7 +351,7 @@ Read the [concepts](concepts/agent.md) for the *why*; read the
 ## Learn locus in an afternoon

 The [`examples/`](https://github.com/oracle-samples/locus/tree/main/examples)
-tree is **39 tutorials** plus **3 end-to-end demos**. Every tutorial
+tree is **40 tutorials** plus **3 end-to-end demos**. Every tutorial
 is one runnable file and adds exactly one idea on top of the previous.

 ### Track 1 — basics (first hour)
@@ -394,7 +395,7 @@ The six in-process patterns plus A2A:
 [RAG agents](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_24_rag_agents.py) ·
 [Skills](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_32_skills.py).

-### Track 5 — production (12, 19–21, 26–30, 33, 35, 37–39)
+### Track 5 — production (12, 19–21, 26–30, 33, 35, 37–40)

 [MCP](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_12_mcp_integration.py) ·
 [Guardrails](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_19_guardrails_security.py) ·
@@ -410,7 +411,8 @@ The six in-process patterns plus A2A:
 [Graph advanced](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_35_graph_advanced.py) ·
 [Termination](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_37_termination.py) ·
 [Multi-modal providers](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_38_multimodal_providers.py) ·
-[GSAR typed grounding](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_39_gsar_typed_grounding.py).
+[GSAR typed grounding](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_39_gsar_typed_grounding.py) ·
+[OCI Dedicated AI Cluster (DAC)](https://github.com/oracle-samples/locus/blob/main/examples/tutorial_40_oci_dac.py).

 ### End-to-end demos

diff --git a/examples/tutorial_40_oci_dac.py b/examples/tutorial_40_oci_dac.py
new file mode 100644
index 00000000..dda51b66
--- /dev/null
+++ b/examples/tutorial_40_oci_dac.py
@@ -0,0 +1,301 @@
+"""
+Tutorial 40: OCI Dedicated AI Cluster (DAC) endpoints
+
+This tutorial covers locus's DAC support. A DAC is OCI's
+provisioned-capacity serving mode for OCI GenAI: instead of pay-per-token
+inference against a shared model id, you address a dedicated endpoint
+by its OCID (``ocid1.generativeaiendpoint.oc1.....``) and
+inference is routed to your cluster.
+
+Locus auto-detects DAC OCIDs and routes them through the SDK
+transport (``OCIModel``) — the V1 OpenAI-compatible transport can't
+speak ``DedicatedServingMode``. Both non-streaming ``complete()`` and
+real SSE ``stream()`` work end-to-end against a DAC.
+
+This tutorial covers:
+
+- Part 1: how DAC routing is decided (``ocid1.generativeaiendpoint.``
+  prefix → ``OCIModel``).
+- Part 2: configure an ``Agent`` against a DAC endpoint.
+- Part 3: drive ``complete()`` against the DAC with one prompt.
+- Part 4: drive ``stream()`` and watch SSE deltas come back.
+- Part 5: wire the DAC into a tool-using ``Agent`` so the model sitting
+  on dedicated capacity can call your @tool functions.
+
+Prerequisites:
+- ``oci`` SDK installed (``pip install -e ".[oci]"``).
+- An OCI profile with permission to invoke the DAC endpoint.
+- The DAC endpoint OCID, the compartment OCID, and the region.
+
+Set these env vars (kept out of the source so the tutorial works for
+any DAC):
+
+    export OCI_DAC_ENDPOINT_OCID=ocid1.generativeaiendpoint.oc1.uk-london-1....
+    export OCI_DAC_COMPARTMENT_ID=ocid1.compartment.oc1....
+    export OCI_DAC_REGION=uk-london-1
+    export OCI_PROFILE=MY_DAC_PROFILE
+
+Without those env vars Parts 2-5 print the wiring snippet and skip.
+
+Difficulty: Intermediate
+"""
+
+from __future__ import annotations
+
+import asyncio
+import os
+
+
+# =============================================================================
+# Part 1: Auto-routing
+# =============================================================================
+
+
+def example_routing() -> None:
+    """How locus decides to use the SDK transport for DAC OCIDs."""
+    print("=== Part 1: Auto-routing ===\n")
+
+    print("locus.models.registry inspects the model id with three rules:")
+    print()
+    print("  1. ocid1.generativeaiendpoint.....  → OCIModel (SDK)")
+    print("  2. cohere.command-r-*               → OCIModel (SDK)")
+    print("  3. everything else                  → OCIOpenAIModel (V1)")
+    print()
+    print("So both calls route to the SDK transport:")
+    print()
+    print('  Agent(model="oci:cohere.command-r-plus-08-2024")  # rule 2')
+    print('  Agent(model="oci:ocid1.generativeaiendpoint....")  # rule 1')
+    print()
+    print("DAC needs the SDK transport because DedicatedServingMode")
+    print("(endpoint_id=...) is part of the OCI proprietary chat shape,")
+    print("not the OpenAI-compatible /v1/chat/completions endpoint.")
+
+
+# =============================================================================
+# Part 2: Configure an Agent against a DAC
+# =============================================================================
+
+
+def _dac_env_ready() -> bool:
+    return bool(
+        os.environ.get("OCI_DAC_ENDPOINT_OCID") and os.environ.get("OCI_DAC_COMPARTMENT_ID")
+    )
+
+
+def example_configure_agent() -> None:
+    """Build an Agent pointed at a DAC. Just like any other model."""
+    print("\n=== Part 2: Configure Agent against a DAC ===\n")
+
+    if not _dac_env_ready():
+        print("OCI_DAC_ENDPOINT_OCID / OCI_DAC_COMPARTMENT_ID not set.")
+        print()
+        print("Wiring (with the env vars set):")
+        print("""
+    from locus import Agent
+    from locus.models import get_model
+
+    # Pre-build the model — DAC needs provider-specific kwargs that
+    # Agent's strict AgentConfig doesn't accept on the keyword path.
+    region = os.environ["OCI_DAC_REGION"]
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=(
+            f"https://inference.generativeai.{region}.oci.oraclecloud.com"
+        ),
+    )
+    agent = Agent(
+        model=model,
+        system_prompt="You are a concise assistant.",
+    )
+""")
+        return
+
+    from locus import Agent
+    from locus.models import get_model
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+    )
+    agent = Agent(
+        model=model,
+        system_prompt="You are a concise assistant. Reply briefly.",
+        max_iterations=2,
+    )
+    print(f"Agent configured against DAC endpoint in {region}.")
+    print(f"  underlying model class: {type(agent._model).__name__}")
+    print("  serving mode (DAC):     DedicatedServingMode (set by client)")
+
+
+# =============================================================================
+# Part 3: complete() — single round-trip
+# =============================================================================
+
+
+async def example_complete() -> None:
+    """Fire one chat at the DAC and print what comes back."""
+    print("\n=== Part 3: complete() against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus.core.messages import Message
+    from locus.models.providers.oci import OCIAuthType, OCIModel
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = OCIModel(
+        model_id=os.environ["OCI_DAC_ENDPOINT_OCID"],
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        auth_type=OCIAuthType.API_KEY,
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+        max_tokens=128,
+    )
+
+    response = await model.complete(
+        messages=[
+            Message.user("In one sentence, what model are you?"),
+        ],
+        tools=None,
+    )
+    content = (response.message.content or "").strip()
+    print(f"Reply: {content!r}")
+    print(f"usage: {response.usage}")
+    print(f"stop_reason: {response.stop_reason}")
+
+
+# =============================================================================
+# Part 4: stream() — real SSE deltas
+# =============================================================================
+
+
+async def example_stream() -> None:
+    """Stream from the DAC and print each delta as it arrives."""
+    print("\n=== Part 4: stream() against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus.core.messages import Message
+    from locus.models.providers.oci import OCIAuthType, OCIModel
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = OCIModel(
+        model_id=os.environ["OCI_DAC_ENDPOINT_OCID"],
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        auth_type=OCIAuthType.API_KEY,
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+        max_tokens=64,
+    )
+
+    print("Streaming reply (chunks shown inline):")
+    print(" ", end="", flush=True)
+    async for event in model.stream(
+        messages=[Message.user("Count from 1 to 5, separated by commas.")],
+        tools=None,
+    ):
+        if event.content:
+            print(event.content, end="", flush=True)
+        if event.done:
+            break
+    print()
+
+
+# =============================================================================
+# Part 5: Agent + tool against the DAC
+# =============================================================================
+
+
+async def example_agent_with_tool() -> None:
+    """Wire a tool-using Agent on top of the DAC.
+
+    The DAC endpoint sees the same tool schema your on-demand models
+    do — locus passes the OpenAI-style tool definitions in the
+    ``GenericChatRequest.tools`` field. Whether the model on the
+    other end emits structured tool calls (OpenAI format,
+    ``message.tool_calls``) or text-format tool calls (Qwen's
+    ``<tool_call>{...}</tool_call>`` XML wrapper, etc.) depends on
+    the model and the deployment configuration:
+
+    - **OpenAI / Llama / Cohere on OCI** — emit structured
+      ``tool_calls``. Locus extracts them automatically.
+    - **Qwen on a DAC** — by default emits ``<tool_call>`` text
+      blocks. Locus's parser doesn't extract these, so
+      ``result.metrics.tool_calls`` will be 0 even though the model
+      "called" the tool in its content. Two options to fix:
+      (a) Configure the DAC to enable OpenAI-compatible tool-call
+          output (Qwen3 family supports this via
+          ``--enable-auto-tool-choice`` on the deployment).
+      (b) Wrap the agent with a parser that extracts the
+          ``<tool_call>`` blocks from ``result.message`` and
+          re-issues them as locus ToolCall objects.
+
+    This part of the tutorial just shows the wiring — what the model
+    does with it depends on the model.
+    """
+    print("\n=== Part 5: Agent + @tool against the DAC ===\n")
+
+    if not _dac_env_ready():
+        print("Skipping — env vars not set.")
+        return
+
+    from locus import Agent
+    from locus.models import get_model
+    from locus.tools.decorator import tool
+
+    @tool(name="add_two_numbers")
+    def add_two_numbers(a: int, b: int) -> int:
+        """Return the sum of two integers."""
+        return a + b
+
+    region = os.environ.get("OCI_DAC_REGION", "us-chicago-1")
+    model = get_model(
+        f"oci:{os.environ['OCI_DAC_ENDPOINT_OCID']}",
+        compartment_id=os.environ["OCI_DAC_COMPARTMENT_ID"],
+        profile_name=os.environ.get("OCI_PROFILE", "DEFAULT"),
+        service_endpoint=f"https://inference.generativeai.{region}.oci.oraclecloud.com",
+    )
+    agent = Agent(
+        model=model,
+        tools=[add_two_numbers],
+        system_prompt="You can call add_two_numbers when asked to add. Reply briefly.",
+        max_iterations=4,
+    )
+
+    result = await asyncio.to_thread(
+        agent.run_sync,
+        "Use the add_two_numbers tool to add 7 and 35, then state the result in one sentence.",
+    )
+    print(f"final message: {(result.message or '').strip()[:200]}")
+    print(f"iterations: {result.metrics.iterations}")
+    print(f"locus tool calls: {result.metrics.tool_calls}")
+    if "<tool_call>" in (result.message or ""):
+        print()
+        print("Note: the model emitted a <tool_call> text block instead of a")
+        print("structured tool_call. See the docstring for how to handle this")
+        print("(deployment flag or post-processing parser).")
+
+
+# =============================================================================
+# Main
+# =============================================================================
+
+
+async def _async_main() -> None:
+    example_routing()
+    example_configure_agent()
+    await example_complete()
+    await example_stream()
+    await example_agent_with_tool()
+
+
+if __name__ == "__main__":
+    asyncio.run(_async_main())
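
Note on the caller-side option: the docs hunk describes post-processing the reply text with "a small regex extraction" that locus does not ship today. A minimal sketch of what that could look like follows. It assumes the model wraps each call in a `<tool_call>...</tool_call>` pair whose body is a JSON object with `name` and `arguments` keys; that wrapper shape and the helper name `extract_text_tool_calls` are assumptions for illustration, not part of locus or of this patch.

```python
import json
import re
from typing import Any

# Assumed wrapper format (not confirmed by the patch):
#   <tool_call>{"name": "...", "arguments": {...}}</tool_call>
_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_text_tool_calls(text: str) -> list[dict[str, Any]]:
    """Return the JSON payload of every well-formed <tool_call> block in text."""
    calls: list[dict[str, Any]] = []
    for match in _TOOL_CALL_RE.finditer(text or ""):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than guess at their content
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


# Hypothetical reply text in the shape described above.
reply = 'Sure. <tool_call>{"name": "add_two_numbers", "arguments": {"a": 7, "b": 35}}</tool_call>'
for call in extract_text_tool_calls(reply):
    print(call["name"], call.get("arguments", {}))  # prints: add_two_numbers {'a': 7, 'b': 35}
```

How the parsed payloads are dispatched (calling the tool directly, or re-issuing a prompt through `agent.run_sync(...)` as the docs suggest) is left to the caller. The sketch only covers the extraction step.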