fix(oci): rebuild signer on refresh + sign a real PreparedRequest (closes #285) #286
Merged
Certification complete: held across two token-expiry cycles
The A/B run continued past the first boundary through a second …
Confirms the fix sustains correct signing across multiple federation-token …
…oses #285)

Instance/resource-principal auth on the V1 OpenAI-compat (OCIChatCompletionsModel) and Responses (OCIResponsesModel) transports returned 401 INVALID_AUTHENTICATION_INFO on every call after the ~20-min federation-token TTL in a long-lived process, while a fresh process worked.

Two causes, both aligned with oracle-samples/oci-genai-auth-python:

- _refresh_callable_for now rebuilds a brand-new signer on refresh (dispatched on auth_type) instead of returning the in-place refresh_security_token bound method; OCIRequestSigner swaps in the freshly-minted signer on the periodic timer and on 401-retry.
- OCIRequestSigner._sign builds a real requests.PreparedRequest instead of a hand-rolled duck-type. requests is already a transitive oci dep.

Same auth_type wiring applied to the Responses transport. Tests updated; all three transports verified under instance principal against live OCI.

chore(oci): add requests.* to mypy ignore_missing_imports

Mirrors the existing oci.*/openai.* untyped-import overrides now that _signing.py imports requests directly.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Unrelated to the OCI auth change in this PR but surfaced by the same CI lint run: a newer redis-py stub types keys() as list[bytes | str], which trips mypy on list_threads()'s declared list[str] return. The client sets decode_responses=True, so keys are str at runtime; cast accordingly.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
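For reviewers unfamiliar with the pattern, the cast described above might look roughly like the sketch below. FakeRedis and the "thread:*" key pattern are hypothetical stand-ins for illustration only; the real list_threads() and its client live in the repository and may differ.

```python
from typing import Dict, List, Union, cast


class FakeRedis:
    """Hypothetical stand-in whose keys() is typed like the newer stub."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)

    def keys(self, pattern: str = "*") -> List[Union[bytes, str]]:
        # A client created with decode_responses=True yields str keys at
        # runtime, even though the annotation also allows bytes.
        prefix = pattern.rstrip("*")
        return [key for key in self._data if key.startswith(prefix)]


def list_threads(client: FakeRedis) -> List[str]:
    # cast() has no runtime effect; it only tells the type checker that the
    # keys are str, which holds under the decode_responses=True assumption.
    return cast(List[str], client.keys("thread:*"))
```

Note that cast() performs no conversion or validation, so the approach is only sound while the client keeps decode_responses=True.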
Closes #285.
Bug
OCI instance/resource-principal auth on the V1 OpenAI-compat
(OCIChatCompletionsModel) and Responses (OCIResponsesModel) transports
returned 401 INVALID_AUTHENTICATION_INFO on every call once the
federation security-token TTL (~20 min) elapsed in a long-lived
process. A freshly-started process worked, and the native OCI SDK
transport was unaffected, so the failure only showed up after a worker
had been up long enough for its first token to expire, and only a
process restart recovered it.
Two divergences from the canonical oracle-samples/oci-genai-auth-python
reference, both fixed here:

- _refresh_callable_for returned the signer's own refresh_security_token
  bound method for principal signers, mutating a cached federation
  client. It now rebuilds a brand-new signer (dispatched on auth_type);
  OCIRequestSigner swaps it in on the periodic timer and on 401-retry,
  re-reading credentials from the instance metadata service. The same
  wiring is applied to the Responses transport, which had the identical
  latent bug.
- PreparedRequest: OCIRequestSigner._sign now builds a
  requests.Request(...).prepare() and signs that (the object
  do_request_sign is written and tested against) instead of a
  hand-rolled stand-in.
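As a rough illustration of the rebuild-on-refresh idea described above, here is a minimal stdlib-only sketch. It is not the code in this PR: FakeSigner, SIGNER_FACTORIES, SwappingSigner, send_with_retry and the 900-second default interval are all hypothetical names and values, and real factories would construct the matching OCI SDK principal signer instead of a dataclass.

```python
import time
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict

_serial = count(1)


@dataclass
class FakeSigner:
    """Hypothetical stand-in for an OCI principal signer (not the SDK class)."""
    auth_type: str
    serial: int = field(default_factory=lambda: next(_serial))


# Hypothetical factories keyed by auth_type; each call builds a NEW signer.
SIGNER_FACTORIES: Dict[str, Callable[[], FakeSigner]] = {
    "instance_principal": lambda: FakeSigner("instance_principal"),
    "resource_principal": lambda: FakeSigner("resource_principal"),
}


def refresh_callable_for(auth_type: str) -> Callable[[], FakeSigner]:
    """Return a callable that rebuilds a signer rather than mutating one."""
    try:
        return SIGNER_FACTORIES[auth_type]
    except KeyError:
        raise ValueError(f"unsupported auth_type: {auth_type!r}") from None


class SwappingSigner:
    """Holds the current signer and swaps in a freshly built one on refresh."""

    def __init__(self, auth_type: str, refresh_interval_s: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rebuild = refresh_callable_for(auth_type)
        self._interval = refresh_interval_s  # assumed value, not from the PR
        self._clock = clock
        self.signer = self._rebuild()
        self._built_at = clock()

    def refresh(self) -> None:
        # Replace the signer object wholesale; nothing cached is reused.
        self.signer = self._rebuild()
        self._built_at = self._clock()

    def current(self) -> FakeSigner:
        # Periodic path: rebuild once the interval has elapsed.
        if self._clock() - self._built_at >= self._interval:
            self.refresh()
        return self.signer


def send_with_retry(holder: SwappingSigner,
                    send: Callable[[FakeSigner], int]) -> int:
    """Send once; on a 401, rebuild the signer and retry exactly once."""
    status = send(holder.current())
    if status == 401:
        holder.refresh()
        status = send(holder.current())
    return status
```

The point of the design is that both the timer path and the 401-retry path go through the same rebuild callable, so neither depends on state held by the previous signer.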
Chore
- Add requests.* to the mypy ignore_missing_imports override list,
  matching the existing oci.*/openai.* entries (_signing.py now imports
  requests directly; it is already a transitive oci dep).
- … PreparedRequest signing; CHANGELOG entry under [Unreleased].

Verification
Unit: full OCI suite green.
All three transports under instance principal against a live OCI
GenAI endpoint — native SDK, V1 OpenAI-compat, Responses — all return
a valid completion with the patched code.
A/B certification across the token-expiry boundary. One long-lived
process held two signers side by side (so their tokens aged together)
and issued a request on each every ~4 min. Lane A = previous
in-place refresh; Lane B = this PR's rebuild-on-refresh:
Lane A begins failing at the ~20-min token TTL and stays failing (the
reported production symptom); Lane B passes through the expiry boundary
without errors. This is the controlled before/after comparison that the
fix targets.
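The shape of the A/B run can be sketched with a toy model on a simulated clock. This is not the harness used for certification and it produces no real measurements: Token, InPlaceLane, RebuildLane, probe and run_ab are hypothetical, the 20-minute TTL and 4-minute cadence are taken from the description above, and Lane A's behaviour (a refresh that keeps serving the cached token) is an assumption made only to mimic the reported symptom.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TOKEN_TTL_MIN = 20    # approximate federation-token TTL from the description
PROBE_EVERY_MIN = 4   # probe cadence from the description


@dataclass
class Token:
    issued_at: int

    def valid_at(self, now: int) -> bool:
        return now - self.issued_at < TOKEN_TTL_MIN


class InPlaceLane:
    """Lane A (toy model, ASSUMPTION): refresh never yields a newer token."""

    def __init__(self, now: int) -> None:
        self.token = Token(now)

    def refresh(self, now: int) -> None:
        pass  # keeps serving the cached token


class RebuildLane:
    """Lane B (toy model): refresh mints a brand-new token."""

    def __init__(self, now: int) -> None:
        self.token = Token(now)

    def refresh(self, now: int) -> None:
        self.token = Token(now)


def probe(lane, now: int) -> int:
    """Simulated request: one attempt, then one refresh-and-retry."""
    if lane.token.valid_at(now):
        return 200
    lane.refresh(now)
    return 200 if lane.token.valid_at(now) else 401


def run_ab(total_min: int) -> Dict[str, List[Tuple[int, int]]]:
    """Start both lanes together so their tokens age together."""
    lanes = {"A": InPlaceLane(0), "B": RebuildLane(0)}
    results: Dict[str, List[Tuple[int, int]]] = {name: [] for name in lanes}
    for minute in range(0, total_min + 1, PROBE_EVERY_MIN):
        for name, lane in lanes.items():
            results[name].append((minute, probe(lane, minute)))
    return results
```

Calling run_ab(44) covers two simulated expiry boundaries (minutes 20 and 40); in this model Lane A returns 401 from minute 20 onward while Lane B stays at 200, which mirrors the outcome described above but does not substantiate it.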