feat: OCI Dedicated AI Cluster (DAC) endpoint support #24
Merged
OCI GenAI exposes two serving modes — on-demand (pay-per-token,
shared model id) and Dedicated AI Cluster (provisioned capacity,
addressed by an ``ocid1.generativeaiendpoint.oc1.<region>....`` OCID).
Locus already used ``DedicatedServingMode`` when ``OCIClient`` saw
an OCID-shaped model id, but the registry routed every non-Cohere-R
model id through the V1 OpenAI-compatible transport — which can't
speak DAC. So passing a DAC OCID via ``Agent(model="oci:...")``
fell through to V1 and silently failed.
Routing
-------
``locus.models.registry._make_oci`` now matches DAC OCIDs first::

    ocid1.generativeaiendpoint.<region>....  → OCIModel (SDK transport)
    cohere.command-r-*                       → OCIModel (SDK transport)
    everything else                          → OCIOpenAIModel (V1)

``examples/config.py::_pick_oci_transport`` mirrors the same rule
so the env-var-driven tutorial workflow picks the right transport
when ``LOCUS_MODEL_ID`` is a DAC endpoint OCID.
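The routing rule can be sketched as a pure function. This is a minimal illustration, not the Locus source: it assumes simple prefix matching on the model id after stripping ``oci:``, and the name ``pick_oci_transport`` plus its ``"v1"`` return value are hypothetical (the PR only states that ``_pick_oci_transport`` returns ``"sdk"`` for DAC OCIDs). The OCIDs used are synthetic placeholders.

```python
# Hypothetical prefixes; the real matcher in _make_oci may be stricter.
DAC_PREFIX = "ocid1.generativeaiendpoint."
COHERE_R_PREFIX = "cohere.command-r-"


def pick_oci_transport(model: str) -> str:
    """Return "sdk" (OCIModel) or "v1" (OCIOpenAIModel) for an OCI model id.

    Hypothetical mirror of the routing rule, for illustration only.
    """
    # Accept both "oci:<id>" and a bare "<id>".
    model_id = model.removeprefix("oci:")
    # 1. DAC endpoint OCIDs are checked first: only the SDK transport speaks DAC.
    if model_id.startswith(DAC_PREFIX):
        return "sdk"
    # 2. The Cohere Command-R series also goes through the SDK transport.
    if model_id.startswith(COHERE_R_PREFIX):
        return "sdk"
    # 3. Everything else uses the V1 OpenAI-compatible transport.
    return "v1"
```

Checking the DAC prefix before any other rule is what stops an endpoint OCID from falling through to V1.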
Streaming
---------
``OCIModel.stream`` previously fell through to a single ``complete()``
call and hand-chunked the result. Now it sets ``is_stream=True`` on
the chat request, calls the SDK's ``client.chat()``, and iterates
the SSE event stream that comes back. Each event is parsed by the
provider's existing ``parse_stream_chunk`` (Generic for Llama /
OpenAI / xAI / Mistral / Gemini, Cohere for the Command-R series) into
``(content_delta, tool_calls_delta, is_done)``. Both serving modes
(on-demand and DAC) and both request shapes (Generic and Cohere) are
covered.
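A rough sketch of the per-event parsing loop, producing the ``(content_delta, tool_calls_delta, is_done)`` tuple named above. The JSON payload shape (``message.content[].text``, ``message.toolCalls``, ``finishReason``) is an assumption for illustration, and ``parse_generic_event`` / ``iter_chunks`` are hypothetical stand-ins, not ``GenericProvider.parse_stream_chunk``.

```python
import json
from typing import Any, Iterable, Iterator

# (content_delta, tool_calls_delta, is_done), the tuple shape named in the PR.
StreamChunk = tuple[str, list[dict[str, Any]], bool]


def parse_generic_event(raw: str) -> StreamChunk:
    """Parse one SSE data payload (assumed schema, illustration only)."""
    event = json.loads(raw)
    message = event.get("message") or {}
    # Concatenate all text parts carried by this event.
    text = "".join(
        part.get("text", "")
        for part in message.get("content") or []
        if part.get("type") == "TEXT"
    )
    tool_calls = list(message.get("toolCalls") or [])
    # A finish reason marks the final event of the stream.
    is_done = event.get("finishReason") is not None
    return text, tool_calls, is_done


def iter_chunks(events: Iterable[str]) -> Iterator[StreamChunk]:
    """Turn raw SSE payloads into chunks, stopping at the first done event."""
    for raw in events:
        chunk = parse_generic_event(raw)
        yield chunk
        if chunk[2]:
            break
```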
Defensive: any failure during the streaming chat (including DAC
endpoints that reject ``is_stream``) falls back to the non-streaming
path and yields a single chunk with the full content, so a
mis-configured endpoint never hard-fails the stream.
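The fallback can be sketched as a generator wrapper. The callables ``open_stream`` and ``complete`` are hypothetical placeholders for the streaming and non-streaming chat paths. One design choice here is an assumption of the sketch: it falls back only if the failure happens before any chunk was emitted, so text the caller already received is never replayed; the PR does not say whether ``OCIModel.stream`` handles mid-stream failures the same way.

```python
from typing import Callable, Iterable, Iterator

Chunk = tuple[str, bool]  # (content_delta, is_done)


def stream_with_fallback(
    open_stream: Callable[[], Iterable[Chunk]],
    complete: Callable[[], str],
) -> Iterator[Chunk]:
    """Stream if possible; otherwise yield one chunk with the full content."""
    emitted = False
    try:
        for delta, is_done in open_stream():
            emitted = True
            yield delta, is_done
            if is_done:
                return
    except Exception:
        if emitted:
            # Content already reached the caller; replaying the full
            # response here would duplicate it, so surface the error.
            raise
        # Streaming was rejected up front (e.g. is_stream unsupported):
        # make one non-streaming call and emit it as a single final chunk.
        yield complete(), True
```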
Tests
-----
``tests/unit/test_oci_dac.py`` — 12 unit tests:
- ``get_model("oci:ocid1.generativeaiendpoint....")`` returns
``OCIModel``.
- Cohere R-series still routes to ``OCIModel`` (regression).
- ``oci:openai.gpt-5.5`` continues to route to ``OCIOpenAIModel``
(regression).
- ``OCIClient.get_serving_mode`` returns ``DedicatedServingMode`` for
endpoint OCIDs and ``OnDemandServingMode`` for plain model ids.
- ``GenericProvider.parse_stream_chunk`` handles text deltas, finish
reasons, tool-call deltas, and malformed tool args.
- ``CohereProvider.parse_stream_chunk`` handles text deltas and
final-event tool calls.
- ``examples/config.py::_pick_oci_transport`` returns ``"sdk"`` for
DAC OCIDs.
All test fixtures use synthetic placeholder OCIDs — no real tenancy /
endpoint identifiers are committed (CLAUDE.md privacy rule).
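The serving-mode tests can be pictured along these lines. To keep the sketch runnable without the OCI SDK, ``DedicatedMode`` and ``OnDemandMode`` are stand-in dataclasses for the SDK's ``DedicatedServingMode`` and ``OnDemandServingMode``, and ``serving_mode_for`` is a hypothetical mirror of ``OCIClient.get_serving_mode`` that assumes a prefix match on the endpoint OCID. The OCID is a synthetic placeholder, in line with the privacy rule.

```python
from __future__ import annotations

from dataclasses import dataclass

# Synthetic placeholder OCID: never a real tenancy or endpoint identifier.
FAKE_DAC_OCID = "ocid1.generativeaiendpoint.oc1.xx-test-1.placeholder"


@dataclass(frozen=True)
class DedicatedMode:
    """Stand-in for the SDK's DedicatedServingMode."""
    endpoint_id: str


@dataclass(frozen=True)
class OnDemandMode:
    """Stand-in for the SDK's OnDemandServingMode."""
    model_id: str


def serving_mode_for(model_id: str) -> DedicatedMode | OnDemandMode:
    """Hypothetical rule: endpoint OCIDs are dedicated, all else on-demand."""
    if model_id.startswith("ocid1.generativeaiendpoint."):
        return DedicatedMode(endpoint_id=model_id)
    return OnDemandMode(model_id=model_id)


def test_endpoint_ocid_uses_dedicated_mode() -> None:
    assert serving_mode_for(FAKE_DAC_OCID) == DedicatedMode(FAKE_DAC_OCID)


def test_plain_model_id_uses_on_demand_mode() -> None:
    mode = serving_mode_for("meta.llama-3.3-70b-instruct")
    assert mode == OnDemandMode("meta.llama-3.3-70b-instruct")
```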
Docs
----
- ``docs/how-to/oci-dac.md`` — when to use DAC, how to wire it,
auth options, streaming behaviour, common failures, and
cross-references to the source files.
- ``mkdocs.yml`` adds the new how-to page under
``How-to → OCI Dedicated AI Cluster (DAC)``.
Validation
----------
- 3205 unit tests pass (12 new), no regressions.
- ``hatch run check`` clean: format-check + ruff + mypy across
``src/``, ``tests/`` and ``examples/`` (369 files).
- Live endpoint testing left to whoever has access to the test DAC —
the unit tests + a working OCID-shaped fixture are the non-live
guarantee that the routing + streaming wire-up is correct.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Three live tests in tests/integration/test_oci_dac_live.py fire real
inference at a DAC endpoint when one is configured, and skip cleanly
otherwise:

- test_dac_complete_returns_content: non-streaming chat returns
  non-empty content from the DAC.
- test_dac_stream_yields_chunks: streaming chat yields ≥1 content chunk
  plus a done event. Robust to endpoints that reject is_stream (the
  OCIModel.stream fallback path keeps the assertion meaningful).
- test_dac_via_get_model_routes_to_oci_model: verifies end-to-end that
  the registry routing returns an OCIModel for a DAC OCID.

Activation::

    export OCI_DAC_ENDPOINT_OCID=ocid1.generativeaiendpoint.oc1.<region>....
    export OCI_DAC_COMPARTMENT_ID=ocid1.compartment.oc1....
    export OCI_DAC_REGION=uk-london-1
    export OCI_PROFILE=MY_DAC_PROFILE
    pytest tests/integration/test_oci_dac_live.py -v

OCIDs are read from env vars, never committed (CLAUDE.md privacy rule).
The tests stay informative regardless of which model is behind the DAC
(qwen, llama, command-a), since they probe layer behaviour, not model
behaviour.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
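The skip-unless-configured behaviour can be sketched with a small config loader. The env var names come from the activation instructions; treating all four as required, and the names ``DacConfig`` / ``load_dac_config``, are assumptions of this sketch rather than what the live test module actually does.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# Env var names from the activation instructions; requiring all four is
# an assumption of this sketch.
REQUIRED_VARS = (
    "OCI_DAC_ENDPOINT_OCID",
    "OCI_DAC_COMPARTMENT_ID",
    "OCI_DAC_REGION",
    "OCI_PROFILE",
)


@dataclass(frozen=True)
class DacConfig:
    endpoint_ocid: str
    compartment_id: str
    region: str
    profile: str


def load_dac_config(env: Mapping[str, str] = os.environ) -> DacConfig | None:
    """Return the live-test config, or None if any variable is unset or empty."""
    if any(not env.get(name) for name in REQUIRED_VARS):
        return None
    # Field order matches REQUIRED_VARS, so positional construction is safe.
    return DacConfig(*(env[name] for name in REQUIRED_VARS))


# In a pytest module, the whole file could then be skipped when unconfigured:
#   pytestmark = pytest.mark.skipif(
#       load_dac_config() is None, reason="DAC endpoint not configured"
#   )
```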
Summary
OCI GenAI exposes two serving modes: on-demand (pay-per-token, shared model id) and Dedicated AI Cluster (provisioned capacity, addressed by an `ocid1.generativeaiendpoint.oc1.....` OCID). Locus already used `DedicatedServingMode` when `OCIClient` saw an OCID-shaped model id, but the registry routed every non-Cohere-R model id through the V1 OpenAI-compatible transport, which can't speak DAC. So passing a DAC OCID via `Agent(model="oci:...")` fell through to V1 and silently failed.
This PR closes the gap.
Routing
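`locus.models.registry._make_oci` now matches DAC OCIDs first:
- `ocid1.generativeaiendpoint.<region>....` → `OCIModel` (SDK transport)
- `cohere.command-r-*` → `OCIModel` (SDK transport)
- everything else → `OCIOpenAIModel` (V1)

`examples/config.py::_pick_oci_transport` mirrors the same rule.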
Streaming
`OCIModel.stream` previously fell through to `complete()` and hand-chunked. Now it sets `is_stream=True` on the chat request, calls the SDK's `client.chat()`, and iterates the SSE event stream. Each event is parsed by the provider's existing `parse_stream_chunk` (Generic for Llama / OpenAI / xAI / Mistral / Gemini; Cohere for R-series) into `(content_delta, tool_calls_delta, is_done)`.
Defensive: any failure (including DAC endpoints that reject `is_stream`) falls back to non-streaming and yields a single chunk — never hard-fails the stream.
Tests
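`tests/unit/test_oci_dac.py`: 12 unit tests covering registry routing (DAC OCIDs, plus Cohere R-series and V1 regressions), serving-mode selection, Generic and Cohere `parse_stream_chunk`, and `_pick_oci_transport`.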
All fixtures use synthetic placeholder OCIDs — no real tenancy / endpoint identifiers in the codebase.
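Docs
New how-to page `docs/how-to/oci-dac.md`, added to `mkdocs.yml` under How-to → OCI Dedicated AI Cluster (DAC).
Validation
3205 unit tests pass (12 new); `hatch run check` is clean.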
Test plan
Usage
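```python
from locus import Agent

# A DAC endpoint OCID routes to OCIModel (SDK transport, DedicatedServingMode).
agent = Agent(
    model="oci:ocid1.generativeaiendpoint.oc1.....",
    compartment_id="ocid1.compartment.oc1...",
    profile_name="DEFAULT",  # OCI config profile used for auth
)
```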
That's it — same one-line API as on-demand. Streaming works automatically.