
feat: OCI Dedicated AI Cluster (DAC) endpoint support #24

Merged: fede-kamel merged 2 commits into main from feat/oci-dac-support on May 1, 2026
@fede-kamel (Contributor)

Summary

OCI GenAI exposes two serving modes — on-demand (pay-per-token, shared model id) and Dedicated AI Cluster (provisioned capacity, addressed by `ocid1.generativeaiendpoint.oc1.....` OCID). Locus already used `DedicatedServingMode` when `OCIClient` saw an OCID-shaped model id, but the registry routed every non-Cohere-R model id through the V1 OpenAI-compatible transport — which can't speak DAC. So passing a DAC OCID via `Agent(model="oci:...")` fell through to V1 and silently failed.

This PR closes the gap.

Routing

`locus.models.registry._make_oci` now matches DAC OCIDs first:

| Model id pattern | Transport |
| --- | --- |
| `ocid1.generativeaiendpoint.....` | `OCIModel` (SDK) |
| `cohere.command-r-*` | `OCIModel` (SDK) |
| everything else | `OCIOpenAIModel` (V1) |

`examples/config.py::_pick_oci_transport` mirrors the same rule.
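A minimal sketch of the routing rule in the table above, not the actual `_make_oci` code. The function name `pick_transport` and the `"v1"` label are hypothetical, and the sketch assumes the `oci:` prefix has already been stripped from the model id. Only `"sdk"` is a label this PR mentions.

```python
import fnmatch

# Prefix taken from the PR text; everything else here is illustrative.
DAC_PREFIX = "ocid1.generativeaiendpoint."


def pick_transport(model_id: str) -> str:
    """Return "sdk" (OCIModel) or "v1" (OCIOpenAIModel) for a bare model id."""
    # DAC endpoint OCIDs are checked first, as the PR describes.
    if model_id.startswith(DAC_PREFIX):
        return "sdk"
    # Cohere R-series ids stay on the SDK transport.
    if fnmatch.fnmatchcase(model_id, "cohere.command-r-*"):
        return "sdk"
    # Every other model id goes through the OpenAI-compatible V1 transport.
    return "v1"
```

Checking the OCID prefix before the Cohere pattern mirrors the "DAC OCIDs first" ordering. In this sketch the two rules cannot overlap, so the order matters only for readability.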

Streaming

`OCIModel.stream` previously fell through to `complete()` and hand-chunked. Now it sets `is_stream=True` on the chat request, calls the SDK's `client.chat()`, and iterates the SSE event stream. Each event is parsed by the provider's existing `parse_stream_chunk` (Generic for Llama / OpenAI / xAI / Mistral / Gemini; Cohere for R-series) into `(content_delta, tool_calls_delta, is_done)`.

Defensive: any failure (including DAC endpoints that reject `is_stream`) falls back to non-streaming and yields a single chunk — never hard-fails the stream.
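The fallback described above could look roughly like the following sketch. It is not the `OCIModel.stream` implementation. `stream_chat` and `complete` are hypothetical callables standing in for the SDK streaming call and the non-streaming path, and tool-call deltas are left out for brevity. The PR text does not say what happens when a failure arrives after some deltas were already emitted; the sketch assumes the stream is simply closed in that case.

```python
from typing import Callable, Iterable, Iterator, Tuple

# (content_delta, is_done); the real tuple also carries tool-call deltas.
Chunk = Tuple[str, bool]


def stream_with_fallback(
    stream_chat: Callable[[], Iterable[str]],
    complete: Callable[[], str],
) -> Iterator[Chunk]:
    """Yield streamed deltas, or one full chunk if streaming fails up front."""
    emitted = False
    try:
        for delta in stream_chat():
            emitted = True
            yield (delta, False)
    except Exception:
        if not emitted:
            # Nothing was streamed yet: fall back to a single full chunk.
            yield (complete(), True)
            return
        # Assumption of this sketch: after partial output, close the stream
        # rather than replay the full text.
    yield ("", True)
```

A caller sees the same chunk shape in both cases, so an endpoint that rejects `is_stream` degrades to one large chunk instead of an exception.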

Tests

`tests/unit/test_oci_dac.py` — 12 unit tests:

  • `get_model("oci:ocid1.generativeaiendpoint....")` returns `OCIModel`.
  • Cohere R-series still routes to `OCIModel` (regression).
  • `oci:openai.gpt-5.5` continues to route to `OCIOpenAIModel` (regression).
  • `OCIClient.get_serving_mode` returns `DedicatedServingMode` for endpoint OCIDs and `OnDemandServingMode` for plain model ids.
  • `GenericProvider.parse_stream_chunk` handles text deltas, finish reasons, tool-call deltas, malformed tool args.
  • `CohereProvider.parse_stream_chunk` handles text deltas and final-event tool calls.
  • `examples/config.py::_pick_oci_transport` returns `"sdk"` for DAC OCIDs.

All fixtures use synthetic placeholder OCIDs — no real tenancy / endpoint identifiers in the codebase.
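As an illustration of the serving-mode behaviour the tests check, here is a small sketch using stand-in dataclasses. The class names match those in the PR, but the field names (`endpoint_id`, `model_id`) are assumptions, and the free function is hypothetical rather than the real `OCIClient.get_serving_mode` method.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class OnDemandServingMode:
    """Stand-in for the on-demand mode: addressed by a shared model id."""
    model_id: str


@dataclass(frozen=True)
class DedicatedServingMode:
    """Stand-in for the dedicated mode: addressed by an endpoint OCID."""
    endpoint_id: str


def get_serving_mode(model_id: str) -> Union[OnDemandServingMode, DedicatedServingMode]:
    # An endpoint-shaped OCID selects the dedicated (DAC) serving mode.
    if model_id.startswith("ocid1.generativeaiendpoint."):
        return DedicatedServingMode(endpoint_id=model_id)
    # Plain model ids are served on demand.
    return OnDemandServingMode(model_id=model_id)
```

The OCID used in any such test should be a synthetic placeholder, in line with the fixture rule stated above.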

Docs

  • `docs/how-to/oci-dac.md` — when to use DAC, how to wire it, auth options, streaming behaviour, common failures.
  • `mkdocs.yml` adds it under `How-to → OCI Dedicated AI Cluster (DAC)`.

Validation

  • 3205 unit tests pass (12 new), no regressions.
  • `hatch run check` clean — format-check + ruff + mypy across `src/tests/examples` (369 files).

Test plan

  • CI green (`CI Success` aggregator).
  • Live endpoint testing is left to whoever has access to a test DAC. Until then, the unit tests and the working OCID-shaped fixture are the non-live guarantee that the wire-up is correct. Once a tester confirms inference works against a real DAC endpoint, a gated live integration test can follow.
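One way to gate a live test so that it skips cleanly when no DAC is configured is sketched below with the standard-library `unittest` module. The environment variable names are the ones listed in the follow-up commit on this PR; treating exactly these two as required, and the helper and class names, are assumptions of the sketch.

```python
import os
import unittest

# Assumption: these two variables are the minimum needed to reach a DAC.
REQUIRED_VARS = ("OCI_DAC_ENDPOINT_OCID", "OCI_DAC_COMPARTMENT_ID")


def dac_configured(env=os.environ) -> bool:
    """True only when every required variable is set and non-empty."""
    return all(env.get(name) for name in REQUIRED_VARS)


@unittest.skipUnless(dac_configured(), "DAC env vars not set; skipping live test")
class DacLiveSmokeTest(unittest.TestCase):
    def test_endpoint_ocid_shape(self):
        # Reads the OCID from the environment so nothing is committed.
        ocid = os.environ["OCI_DAC_ENDPOINT_OCID"]
        self.assertTrue(ocid.startswith("ocid1.generativeaiendpoint."))
```

Because the skip condition is evaluated at import time, a CI run without the variables reports the class as skipped rather than failed.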

Usage

```python
from locus import Agent

agent = Agent(
    model="oci:ocid1.generativeaiendpoint.oc1.....",
    compartment_id="ocid1.compartment.oc1...",
    profile_name="DEFAULT",
)
```

That's it — same one-line API as on-demand. Streaming works automatically.

OCI GenAI exposes two serving modes — on-demand (pay-per-token,
shared model id) and Dedicated AI Cluster (provisioned capacity,
addressed by ``ocid1.generativeaiendpoint.oc1.<region>....`` OCID).
Locus already used ``DedicatedServingMode`` when ``OCIClient`` saw
an OCID-shaped model id, but the registry routed every non-Cohere-R
model id through the V1 OpenAI-compatible transport — which can't
speak DAC. So passing a DAC OCID via ``Agent(model="oci:...")``
fell through to V1 and silently failed.

Routing
-------
``locus.models.registry._make_oci`` now matches DAC OCIDs first:

  ocid1.generativeaiendpoint.<region>....    → OCIModel (SDK transport)
  cohere.command-r-*                          → OCIModel (SDK transport)
  everything else                             → OCIOpenAIModel (V1)

``examples/config.py::_pick_oci_transport`` mirrors the same rule
so the env-var-driven tutorial workflow picks the right transport
when ``LOCUS_MODEL_ID`` is a DAC endpoint OCID.

Streaming
---------
``OCIModel.stream`` previously fell through to a single ``complete()``
call and hand-chunked the result. Now it sets ``is_stream=True`` on
the chat request, calls the SDK's ``client.chat()``, and iterates
the SSE event stream that comes back. Each event is parsed by the
provider's existing ``parse_stream_chunk`` (Generic for Llama /
OpenAI / xAI / Mistral / Gemini, Cohere for Command-R-series) into
``(content_delta, tool_calls_delta, is_done)``. Both serving modes
(on-demand and DAC) and both request shapes are covered.

Defensive: any failure during the streaming chat (including DAC
endpoints that reject ``is_stream``) falls back to the non-streaming
path and yields a single chunk with the full content, so a
mis-configured endpoint never hard-fails the stream.

Tests
-----
``tests/unit/test_oci_dac.py`` — 12 unit tests:

- ``get_model("oci:ocid1.generativeaiendpoint....")`` returns
  ``OCIModel``.
- Cohere R-series still routes to ``OCIModel`` (regression).
- ``oci:openai.gpt-5.5`` continues to route to ``OCIOpenAIModel``
  (regression).
- ``OCIClient.get_serving_mode`` returns ``DedicatedServingMode`` for
  endpoint OCIDs and ``OnDemandServingMode`` for plain model ids.
- ``GenericProvider.parse_stream_chunk`` handles text deltas, finish
  reasons, tool-call deltas, and malformed tool args.
- ``CohereProvider.parse_stream_chunk`` handles text deltas and
  final-event tool calls.
- ``examples/config.py::_pick_oci_transport`` returns ``"sdk"`` for
  DAC OCIDs.

All test fixtures use synthetic placeholder OCIDs — no real tenancy /
endpoint identifiers are committed (CLAUDE.md privacy rule).

Docs
----
- ``docs/how-to/oci-dac.md`` — when to use DAC, how to wire it,
  auth options, streaming behaviour, common failures, and
  cross-references to the source files.
- ``mkdocs.yml`` adds the new how-to page under
  ``How-to → OCI Dedicated AI Cluster (DAC)``.

Validation
----------
- 3205 unit tests pass (12 new), no regressions.
- ``hatch run check`` clean: format-check + ruff + mypy across
  ``src/tests/examples`` (369 files).
- Live endpoint testing left to whoever has access to the test DAC —
  the unit tests + a working OCID-shaped fixture are the non-live
  guarantee that the routing + streaming wire-up is correct.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
The oracle-contributor-agreement bot added the "OCA Verified" label (all contributors have signed the Oracle Contributor Agreement) on May 1, 2026.
Three live tests in tests/integration/test_oci_dac_live.py that fire
real inference at a DAC endpoint when configured, and skip cleanly
otherwise:

- test_dac_complete_returns_content — non-streaming chat returns
  non-empty content from the DAC.
- test_dac_stream_yields_chunks — streaming chat yields ≥1 content
  chunk + done event. Robust to endpoints that reject is_stream
  (the OCIModel.stream fallback path keeps the assertion meaningful).
- test_dac_via_get_model_routes_to_oci_model — verifies the registry
  routing actually returns an OCIModel for a DAC OCID end-to-end.

Activation:

  export OCI_DAC_ENDPOINT_OCID=ocid1.generativeaiendpoint.oc1.<region>....
  export OCI_DAC_COMPARTMENT_ID=ocid1.compartment.oc1....
  export OCI_DAC_REGION=uk-london-1
  export OCI_PROFILE=MY_DAC_PROFILE
  pytest tests/integration/test_oci_dac_live.py -v

OCIDs are read from env vars, never committed (CLAUDE.md privacy
rule). The tests stay informative regardless of which model is
behind the DAC — qwen, llama, command-a — since they probe layer
behaviour, not model behaviour.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
fede-kamel merged commit 3852358 into main on May 1, 2026.
10 checks passed.
fede-kamel deleted the feat/oci-dac-support branch on May 13, 2026 at 04:29.