feat: OCI Dedicated AI Cluster (DAC) endpoint support by fede-kamel · Pull Request #24 · oracle-samples/locus

fede-kamel · 2026-05-01T12:58:46Z

Summary

OCI GenAI exposes two serving modes: on-demand (pay-per-token, shared model id) and Dedicated AI Cluster (provisioned capacity, addressed by an `ocid1.generativeaiendpoint.oc1.....` OCID). Locus already used `DedicatedServingMode` when `OCIClient` saw an OCID-shaped model id, but the registry routed every non-Cohere-R model id through the V1 OpenAI-compatible transport, which can't speak DAC. As a result, passing a DAC OCID via `Agent(model="oci:...")` fell through to V1 and silently failed.

This PR closes the gap.

Routing

`locus.models.registry._make_oci` now matches DAC OCIDs first:

| Model id pattern | Transport |
| --- | --- |
| `ocid1.generativeaiendpoint.....` | `OCIModel` (SDK) |
| `cohere.command-r-*` | `OCIModel` (SDK) |
| everything else | `OCIOpenAIModel` (V1) |

`examples/config.py::_pick_oci_transport` mirrors the same rule.
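
The routing rule can be sketched as a small standalone function. This is an illustration, not the code in `_make_oci` or `_pick_oci_transport`: the function name `pick_transport`, the two constants, and the `"v1"` return label are assumptions (the PR only confirms that `"sdk"` is returned for DAC OCIDs), and the sketch expects the model id with the `oci:` prefix already stripped.

```python
from fnmatch import fnmatch

# Hypothetical constants, inferred from the routing table in this PR.
DAC_ENDPOINT_PREFIX = "ocid1.generativeaiendpoint."
COHERE_R_PATTERN = "cohere.command-r-*"


def pick_transport(model_id: str) -> str:
    """Return "sdk" (OCIModel) or "v1" (OCIOpenAIModel) for a bare model id."""
    # DAC endpoint OCIDs are checked first so they never reach the V1 path.
    if model_id.startswith(DAC_ENDPOINT_PREFIX):
        return "sdk"
    # Cohere Command-R models keep using the SDK transport.
    if fnmatch(model_id, COHERE_R_PATTERN):
        return "sdk"
    # Everything else goes through the OpenAI-compatible V1 transport.
    return "v1"
```

Checking the DAC prefix before any other pattern is what keeps an endpoint OCID from falling into the catch-all branch.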

Streaming

`OCIModel.stream` previously fell through to `complete()` and hand-chunked. Now it sets `is_stream=True` on the chat request, calls the SDK's `client.chat()`, and iterates the SSE event stream. Each event is parsed by the provider's existing `parse_stream_chunk` (Generic for Llama / OpenAI / xAI / Mistral / Gemini; Cohere for R-series) into `(content_delta, tool_calls_delta, is_done)`.
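
To make the `(content_delta, tool_calls_delta, is_done)` tuple concrete, here is a minimal sketch of parsing one SSE data payload. The event shape (`message.content[].text`, `message.toolCalls`, `finishReason`) is an assumption for illustration, and `parse_event` is a hypothetical name; the real `parse_stream_chunk` implementations may read different fields.

```python
import json
from typing import Any

# Assumed payload shape, for illustration only:
# {"message": {"content": [{"type": "TEXT", "text": "Hi"}]}, "finishReason": "stop"}


def parse_event(data: str) -> tuple[str, list[dict[str, Any]], bool]:
    """Parse one SSE data payload into (content_delta, tool_calls_delta, is_done)."""
    try:
        event = json.loads(data)
    except json.JSONDecodeError:
        # An unparseable payload is treated as an empty, non-final delta.
        return "", [], False
    if not isinstance(event, dict):
        return "", [], False
    message = event.get("message") or {}
    parts = message.get("content") or []
    # Concatenate only the text parts of the message content.
    text = "".join(part.get("text", "") for part in parts if part.get("type") == "TEXT")
    tool_calls = list(message.get("toolCalls") or [])
    # A present finish reason marks the last event of the stream.
    is_done = event.get("finishReason") is not None
    return text, tool_calls, is_done
```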

Defensive: any failure during the streaming chat (including DAC endpoints that reject `is_stream`) falls back to the non-streaming path and yields a single chunk with the full content, so the stream never hard-fails.
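
The fallback behaviour can be sketched as a generator wrapper. This is a simplified illustration under stated assumptions, not the body of `OCIModel.stream`: `stream_with_fallback`, `open_stream`, and `complete` are hypothetical names standing in for the streaming request and the non-streaming call.

```python
from collections.abc import Callable, Iterable, Iterator


def stream_with_fallback(
    open_stream: Callable[[], Iterable[str]],
    complete: Callable[[], str],
) -> Iterator[str]:
    """Yield streamed deltas; on any streaming error, yield the full completion once."""
    try:
        for delta in open_stream():
            yield delta
    except Exception:
        # Streaming was rejected or broke: fall back to one non-streaming chunk.
        yield complete()
```

The sketch does not deduplicate content if the stream fails after some deltas were already yielded; the PR does not say how the real implementation treats that case.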

Tests

`tests/unit/test_oci_dac.py` — 12 unit tests:

- `get_model("oci:ocid1.generativeaiendpoint....")` returns `OCIModel`.
- Cohere R-series still routes to `OCIModel` (regression).
- `oci:openai.gpt-5.5` continues to route to `OCIOpenAIModel` (regression).
- `OCIClient.get_serving_mode` returns `DedicatedServingMode` for endpoint OCIDs and `OnDemandServingMode` for plain model ids.
- `GenericProvider.parse_stream_chunk` handles text deltas, finish reasons, tool-call deltas, and malformed tool args.
- `CohereProvider.parse_stream_chunk` handles text deltas and final-event tool calls.
- `examples/config.py::_pick_oci_transport` returns `"sdk"` for DAC OCIDs.

All fixtures use synthetic placeholder OCIDs — no real tenancy / endpoint identifiers in the codebase.

Docs

- `docs/how-to/oci-dac.md`: when to use DAC, how to wire it, auth options, streaming behaviour, common failures.
- `mkdocs.yml` adds it under `How-to → OCI Dedicated AI Cluster (DAC)`.

Validation

- 3205 unit tests pass (12 new), no regressions.
- `hatch run check` clean: format-check + ruff + mypy across `src/tests/examples` (369 files).

Test plan

- CI green (`CI Success` aggregator).
- Live endpoint testing is left to whoever has access to a test DAC. The unit tests plus the working OCID-shaped fixture are the non-live guarantee that the wire-up is correct. Once a tester confirms inference works against a real DAC endpoint, a gated live integration test can follow.

Usage

```python
from locus import Agent

# The model id is a DAC endpoint OCID behind the "oci:" prefix.
agent = Agent(
    model="oci:ocid1.generativeaiendpoint.oc1.....",
    compartment_id="ocid1.compartment.oc1...",
    profile_name="DEFAULT",
)
```

That's it: the same one-line API as on-demand. Streaming works automatically.

OCI GenAI exposes two serving modes: on-demand (pay-per-token, shared model id) and Dedicated AI Cluster (provisioned capacity, addressed by an ``ocid1.generativeaiendpoint.oc1.<region>....`` OCID). Locus already used ``DedicatedServingMode`` when ``OCIClient`` saw an OCID-shaped model id, but the registry routed every non-Cohere-R model id through the V1 OpenAI-compatible transport, which can't speak DAC. So passing a DAC OCID via ``Agent(model="oci:...")`` fell through to V1 and silently failed.

Routing
-------

``locus.models.registry._make_oci`` now matches DAC OCIDs first:

    ocid1.generativeaiendpoint.<region>....  → OCIModel (SDK transport)
    cohere.command-r-*                       → OCIModel (SDK transport)
    everything else                          → OCIOpenAIModel (V1)

``examples/config.py::_pick_oci_transport`` mirrors the same rule so the env-var-driven tutorial workflow picks the right transport when ``LOCUS_MODEL_ID`` is a DAC endpoint OCID.

Streaming
---------

``OCIModel.stream`` previously fell through to a single ``complete()`` call and hand-chunked the result. Now it sets ``is_stream=True`` on the chat request, calls the SDK's ``client.chat()``, and iterates the SSE event stream that comes back. Each event is parsed by the provider's existing ``parse_stream_chunk`` (Generic for Llama / OpenAI / xAI / Mistral / Gemini, Cohere for Command-R-series) into ``(content_delta, tool_calls_delta, is_done)``. Both serving modes (on-demand and DAC) and both request shapes are covered.

Defensive: any failure during the streaming chat (including DAC endpoints that reject ``is_stream``) falls back to the non-streaming path and yields a single chunk with the full content, so a mis-configured endpoint never hard-fails the stream.

Tests
-----

``tests/unit/test_oci_dac.py``, 12 unit tests:

- ``get_model("oci:ocid1.generativeaiendpoint....")`` returns ``OCIModel``.
- Cohere R-series still routes to ``OCIModel`` (regression).
- ``oci:openai.gpt-5.5`` continues to route to ``OCIOpenAIModel`` (regression).
- ``OCIClient.get_serving_mode`` returns ``DedicatedServingMode`` for endpoint OCIDs and ``OnDemandServingMode`` for plain model ids.
- ``GenericProvider.parse_stream_chunk`` handles text deltas, finish reasons, tool-call deltas, and malformed tool args.
- ``CohereProvider.parse_stream_chunk`` handles text deltas and final-event tool calls.
- ``examples/config.py::_pick_oci_transport`` returns ``"sdk"`` for DAC OCIDs.

All test fixtures use synthetic placeholder OCIDs; no real tenancy / endpoint identifiers are committed (CLAUDE.md privacy rule).

Docs
----

- ``docs/how-to/oci-dac.md``: when to use DAC, how to wire it, auth options, streaming behaviour, common failures, and cross-references to the source files.
- ``mkdocs.yml`` adds the new how-to page under ``How-to → OCI Dedicated AI Cluster (DAC)``.

Validation
----------

- 3205 unit tests pass (12 new), no regressions.
- ``hatch run check`` clean: format-check + ruff + mypy across ``src/tests/examples`` (369 files).
- Live endpoint testing left to whoever has access to the test DAC. The unit tests + a working OCID-shaped fixture are the non-live guarantee that the routing + streaming wire-up is correct.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

Three live tests in tests/integration/test_oci_dac_live.py that fire real inference at a DAC endpoint when configured, and skip cleanly otherwise:

- test_dac_complete_returns_content: non-streaming chat returns non-empty content from the DAC.
- test_dac_stream_yields_chunks: streaming chat yields ≥1 content chunk + done event. Robust to endpoints that reject is_stream (the OCIModel.stream fallback path keeps the assertion meaningful).
- test_dac_via_get_model_routes_to_oci_model: verifies the registry routing actually returns an OCIModel for a DAC OCID end-to-end.

Activation:

    export OCI_DAC_ENDPOINT_OCID=ocid1.generativeaiendpoint.oc1.<region>....
    export OCI_DAC_COMPARTMENT_ID=ocid1.compartment.oc1....
    export OCI_DAC_REGION=uk-london-1
    export OCI_PROFILE=MY_DAC_PROFILE
    pytest tests/integration/test_oci_dac_live.py -v

OCIDs are read from env vars, never committed (CLAUDE.md privacy rule). The tests stay informative regardless of which model is behind the DAC (qwen, llama, command-a), since they probe layer behaviour, not model behaviour.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
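
The "skip cleanly otherwise" gating can be sketched with a small helper that reads the four environment variables named above. The helper name `dac_live_config` and the choice to treat all four variables as required are assumptions for illustration; the actual test module may gate on fewer of them.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names come from the activation instructions in this commit message.
DAC_ENV_VARS = (
    "OCI_DAC_ENDPOINT_OCID",
    "OCI_DAC_COMPARTMENT_ID",
    "OCI_DAC_REGION",
    "OCI_PROFILE",
)


def dac_live_config(env: Mapping[str, str] = os.environ) -> Optional[dict[str, str]]:
    """Return the DAC settings, or None when any variable is unset or blank."""
    values = {name: env.get(name, "").strip() for name in DAC_ENV_VARS}
    if not all(values.values()):
        # Missing configuration: live tests should be skipped, not failed.
        return None
    return values
```

A test module could pass `dac_live_config() is None` as the condition of `pytest.mark.skipif`, so a machine without DAC access reports the live tests as skipped rather than failed.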

oracle-contributor-agreement bot added the OCA Verified label (All contributors have signed the Oracle Contributor Agreement.) May 1, 2026

fede-kamel merged commit 3852358 into main May 1, 2026
10 checks passed

fede-kamel mentioned this pull request May 1, 2026

docs(oci-dac): tutorial 40 + empirical Qwen confirmation + website #25

Merged

fede-kamel deleted the feat/oci-dac-support branch May 13, 2026 04:29

fede-kamel merged 2 commits into main from feat/oci-dac-support
