fix(oci): rebuild signer on refresh + sign a real PreparedRequest (closes #285)#286

Merged: fede-kamel merged 2 commits into main from fix/oci-instance-principal-token-refresh on May 29, 2026

@fede-kamel
Contributor

Closes #285.

Bug

OCI instance/resource-principal auth on the V1 OpenAI-compat
(OCIChatCompletionsModel) and Responses (OCIResponsesModel)
transports returned 401 INVALID_AUTHENTICATION_INFO on every call
once the federation security-token TTL (~20 min) elapsed in a long-lived
process. A freshly-started process worked, and the native OCI SDK
transport was unaffected — so the failure only showed up after a worker
had been up long enough for its first token to expire, and only a
process restart recovered it.

Two divergences from the canonical
oracle-samples/oci-genai-auth-python
reference, both fixed here:

  1. In-place token refresh → rebuild. _refresh_callable_for
    returned the signer's own refresh_security_token bound method for
    principal signers, mutating a cached federation client. It now
    rebuilds a brand-new signer (dispatched on auth_type);
    OCIRequestSigner swaps it in on the periodic timer and on
    401-retry, re-reading credentials from the instance metadata service.
    The same wiring is applied to the Responses transport, which had the
    identical latent bug.
  2. Duck-typed proxy → real PreparedRequest. OCIRequestSigner._sign
    now builds a requests.Request(...).prepare() and signs that — the
    object do_request_sign is written and tested against — instead of a
    hand-rolled stand-in.
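
The rebuild-on-refresh change in item 1 can be pictured with a minimal sketch. It is illustrative only: `FakeSigner`, `build_signer`, `_BUILDERS`, and `RebuildingSigner` are hypothetical names, the builders return placeholder objects rather than real OCI signers, and the actual `_refresh_callable_for` / `OCIRequestSigner` code may be structured differently.

```python
import threading
from typing import Callable, Dict


class FakeSigner:
    """Hypothetical stand-in for a principal signer (not an OCI SDK class)."""

    _counter = 0

    def __init__(self, auth_type: str) -> None:
        FakeSigner._counter += 1
        self.auth_type = auth_type
        self.generation = FakeSigner._counter  # distinguishes rebuilt instances


# Hypothetical dispatch table keyed on auth_type.
_BUILDERS: Dict[str, Callable[[], FakeSigner]] = {
    "instance_principal": lambda: FakeSigner("instance_principal"),
    "resource_principal": lambda: FakeSigner("resource_principal"),
}


def build_signer(auth_type: str) -> FakeSigner:
    """Construct a brand-new signer for the given auth type."""
    try:
        return _BUILDERS[auth_type]()
    except KeyError:
        raise ValueError(f"unsupported auth_type: {auth_type!r}") from None


class RebuildingSigner:
    """Holds the current signer and swaps in a fresh one on refresh."""

    def __init__(self, auth_type: str) -> None:
        self._auth_type = auth_type
        self._lock = threading.Lock()
        self._signer = build_signer(auth_type)

    @property
    def current(self) -> FakeSigner:
        with self._lock:
            return self._signer

    def refresh(self) -> None:
        # Rebuild instead of mutating the cached signer in place.
        fresh = build_signer(self._auth_type)
        with self._lock:
            self._signer = fresh


holder = RebuildingSigner("instance_principal")
before = holder.current
holder.refresh()  # would run on the periodic timer or after a 401
after = holder.current
print(before is after)  # False: the signer object was replaced
```

The point of the pattern is that a refresh never touches the old object: callers that already hold it finish with it, and the next request picks up the replacement.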

Chore

  • Add requests.* to the mypy ignore_missing_imports override list,
    matching the existing oci.* / openai.* entries (_signing.py now
    imports requests directly; it is already a transitive oci dep).
  • Update unit tests to assert rebuild-on-refresh and PreparedRequest
    signing; CHANGELOG entry under [Unreleased].
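
A unit test for the PreparedRequest signing path might look roughly like the sketch below. It assumes only the `requests` library; `RecordingSigner` and `sign_headers` are hypothetical names, and the test double's `do_request_sign` signature is an assumption modeled on the method named in this PR, not a copy of the project's code.

```python
import requests


class RecordingSigner:
    """Hypothetical test double standing in for an OCI signer."""

    def __init__(self) -> None:
        self.seen = None

    def do_request_sign(self, request, enforce_content_headers=True):
        # Record what we were asked to sign and add a marker header.
        self.seen = request
        request.headers["authorization"] = "Signature test"
        return request


def sign_headers(signer, method: str, url: str, headers: dict, body: bytes) -> dict:
    """Build a real PreparedRequest, sign it, and return the resulting headers."""
    prepared = requests.Request(method, url, headers=headers, data=body).prepare()
    signer.do_request_sign(prepared)
    return dict(prepared.headers)


signer = RecordingSigner()
signed = sign_headers(
    signer,
    "POST",
    "https://example.invalid/v1/chat/completions",  # placeholder URL, never contacted
    {"content-type": "application/json"},
    b'{"model": "demo"}',
)

# The signer received the genuine requests type, not a duck-typed proxy.
assert isinstance(signer.seen, requests.PreparedRequest)
assert signed["authorization"] == "Signature test"
```

No network call is made: `prepare()` only assembles the method, URL, headers, and body.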

Verification

  • Unit: full OCI suite green.

  • All three transports under instance principal against a live OCI
    GenAI endpoint — native SDK, V1 OpenAI-compat, Responses — all return
    a valid completion with the patched code.

  • A/B certification across the token-expiry boundary. One long-lived
    process held two signers side by side (so their tokens aged together)
    and issued a request on each every ~4 min. Lane A = previous
    in-place refresh; Lane B = this PR's rebuild-on-refresh:

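    | elapsed | Lane A (old, in-place)     | Lane B (this PR, rebuild) |
    |---------|----------------------------|---------------------------|
    | 0 min   | ok                         | ok                        |
    | 4 min   | ok                         | ok                        |
    | 8 min   | ok                         | ok                        |
    | 12 min  | ok (post 10-min refresh)   | ok                        |
    | 16 min  | ok                         | ok                        |
    | 20 min  | 401 INVALID_AUTHENTICATION | ok                        |
    | 24 min  | 401 INVALID_AUTHENTICATION | ok                        |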

    Lane A begins failing at the ~20-min token TTL and stays failing
    (the reported production symptom); Lane B passes through the expiry
    boundary unaffected. This is the controlled before/after comparison
    the fix targets.
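
A side-by-side run like the one above can be driven by a small harness. The following is a minimal sketch, not the script used for the certification: `run_ab` and the stub lanes are hypothetical, and the sleep function is injected so the loop can be exercised without waiting.

```python
from typing import Callable, Dict, List, Tuple

Lane = Callable[[], None]  # a lane issues one request and raises on failure


def run_ab(
    lanes: Dict[str, Lane],
    ticks: int,
    interval_min: int,
    sleep: Callable[[float], None],
) -> List[Tuple[int, Dict[str, str]]]:
    """Call every lane once per tick and record "ok" or the error text."""
    rows: List[Tuple[int, Dict[str, str]]] = []
    for tick in range(ticks):
        elapsed = tick * interval_min
        outcome: Dict[str, str] = {}
        for name, call in lanes.items():
            try:
                call()
                outcome[name] = "ok"
            except Exception as exc:  # keep going so every lane is sampled
                outcome[name] = str(exc)
        rows.append((elapsed, outcome))
        if tick < ticks - 1:
            sleep(interval_min * 60)  # seconds between ticks
    return rows


def always_ok() -> None:
    """Stub lane that succeeds (stands in for a real signed request)."""


def always_fails() -> None:
    """Stub lane that fails (stands in for a rejected request)."""
    raise RuntimeError("401")


sleeps: List[float] = []
rows = run_ab(
    {"A": always_fails, "B": always_ok},
    ticks=3,
    interval_min=4,
    sleep=sleeps.append,  # record instead of sleeping
)
for elapsed, outcome in rows:
    print(f"{elapsed} min", outcome["A"], outcome["B"])
```

In a real run, each lane would wrap one signer and one request, and `time.sleep` would be passed as `sleep`.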

@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 29, 2026
@fede-kamel
Contributor Author

Certification complete — held across two token-expiry cycles

The A/B run continued past the first boundary through a second
~20-min token TTL. Lane B (this PR) stayed green the entire ~44 min;
Lane A (previous in-place refresh) failed continuously from the first
expiry and never recovered:

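| elapsed | Lane A (old) | Lane B (this PR) |
|---------|--------------|------------------|
| 16 min  | ok           | ok               |
| 20 min  | 401          | ok (1st expiry)  |
| 24 min  | 401          | ok               |
| 28 min  | 401          | ok               |
| 32 min  | 401          | ok               |
| 36 min  | 401          | ok               |
| 40 min  | 401          | ok (2nd expiry)  |
| 44 min  | 401          | ok               |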

Confirms the fix sustains correct signing across multiple federation-token
refresh cycles, not just one.

…oses #285)

Instance/resource-principal auth on the V1 OpenAI-compat
(OCIChatCompletionsModel) and Responses (OCIResponsesModel) transports
returned 401 INVALID_AUTHENTICATION_INFO on every call after the ~20-min
federation-token TTL in a long-lived process, while a fresh process
worked. Two causes, both aligned with oracle-samples/oci-genai-auth-python:

- _refresh_callable_for now rebuilds a brand-new signer on refresh
  (dispatched on auth_type) instead of returning the in-place
  refresh_security_token bound method; OCIRequestSigner swaps in the
  freshly-minted signer on the periodic timer and on 401-retry.
- OCIRequestSigner._sign builds a real requests.PreparedRequest instead
  of a hand-rolled duck-type. requests is already a transitive oci dep.

Same auth_type wiring applied to the Responses transport. Tests updated;
all three transports verified under instance principal against live OCI.

chore(oci): add requests.* to mypy ignore_missing_imports

Mirrors the existing oci.*/openai.* untyped-import overrides now that
_signing.py imports requests directly.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@fede-kamel fede-kamel force-pushed the fix/oci-instance-principal-token-refresh branch from 1729897 to 9cb011c Compare May 29, 2026 00:27
Unrelated to the OCI auth change in this PR but surfaced by the same CI
lint run: a newer redis-py stub types keys() as list[bytes | str], which
trips mypy on list_threads()'s declared list[str] return. The client sets
decode_responses=True, so keys are str at runtime — cast accordingly.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
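
The redis-py typing fix in the commit above amounts to a `typing.cast` on the result of `keys()`. A minimal sketch, assuming a client created with `decode_responses=True`: `FakeRedis`, the `thread:` prefix, and the body of `list_threads` are hypothetical and only illustrate where the cast goes.

```python
from typing import Any, List, cast


class FakeRedis:
    """Hypothetical stand-in for a client created with decode_responses=True."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = keys

    def keys(self, pattern: str) -> Any:
        # Simple prefix match; real Redis uses glob-style patterns.
        prefix = pattern.rstrip("*")
        return [k for k in self._keys if k.startswith(prefix)]


def list_threads(client: Any, prefix: str = "thread:") -> List[str]:
    # With decode_responses=True the keys are str at runtime, so the cast only
    # narrows the static type; it performs no conversion or check.
    keys = cast(List[str], client.keys(f"{prefix}*"))
    return [key[len(prefix):] for key in keys]


client = FakeRedis(["thread:a", "thread:b", "other:c"])
print(list_threads(client))  # ['a', 'b']
```

Because `cast` does nothing at runtime, it is only safe while the client really does decode responses; a client returning `bytes` would pass type checking and fail later.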
@fede-kamel fede-kamel merged commit da032c9 into main May 29, 2026
10 checks passed
@fede-kamel fede-kamel deleted the fix/oci-instance-principal-token-refresh branch May 29, 2026 01:14

Development

Successfully merging this pull request may close these issues.

OCI instance/resource-principal auth 401s in long-lived processes (in-place token refresh + duck-typed signing)

1 participant