Add signed batch machine run context / ADR#1772
MikeNeilson
left a comment
I'm being a bit pushy here, and I may in fact be wrong about what we can accomplish with Keycloak, in which case we may end up having to go this route. There's nothing fundamentally wrong with the design, other than too much reliance on properties, though that's an initial design problem, not a new one. But I think if we can get Keycloak to handle the load, the transition is a lot simpler.
| final String email = claims.get(EMAIL_CLAIM, String.class); | ||
| return dao.createUser(preferredUserName, oidcPrincipal, givenName, email); | ||
| DataApiPrincipal dataApiPrincipal = dao.createUser(preferredUserName, oidcPrincipal, givenName, email); | ||
| BatchJobContext.prepareContext(ctx, dataApiPrincipal); |
There was a problem hiding this comment.
If the "batch" user is getting created randomly here, we have an issue. In the context of a batch process this should be a failure.
| public static final String REQUESTED_BY_ATTR = "BatchRequestedBy"; | ||
| public static final String DISPATCH_SOURCE_ATTR = "BatchDispatchSource"; | ||
|
|
||
| public static final String SECRET_PROPERTY = "cwms.dataapi.batch.jobContext.secret"; |
There was a problem hiding this comment.
Why is Batch becoming an issuer of secrets?
My understanding is that Keycloak would provide the JWT and CDA would just consume it.
| if (username == null) { | ||
| return false; | ||
| } | ||
| String machineUsers = readSetting(MACHINE_USERS_PROPERTY, DEFAULT_MACHINE_USERS); |
There was a problem hiding this comment.
CDA should not need to know anything about this. The JWT provided by Keycloak can embed a claim of "machine-auth" or something, and decisions can be made from that.
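A minimal sketch of that idea, assuming the claims of an already validated token are available as a plain map. The class name `MachineAuthClaims` and the claim name `machine_auth` are illustrative assumptions here, not the CDA implementation.

```java
import java.util.Map;

// Hypothetical helper: decides "is this a machine token?" from a claim
// instead of from a configured list of machine usernames.
final class MachineAuthClaims {
    // Assumed claim name; the real contract is whatever Keycloak is configured to emit.
    static final String MACHINE_AUTH_CLAIM = "machine_auth";

    private MachineAuthClaims() {
    }

    // Returns true only when the token explicitly marks itself as machine auth.
    // Accepts a JSON boolean or the string "true"; anything else is treated as false.
    static boolean isMachineAuth(Map<String, ?> claims) {
        Object value = claims.get(MACHINE_AUTH_CLAIM);
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof String) {
            return Boolean.parseBoolean((String) value);
        }
        return false;
    }
}
```

The point of the sketch is that a missing claim defaults to "not a machine", so CDA needs no property listing machine users.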
There was a problem hiding this comment.
Initially, when I did this, I was trying to figure out how to make it work with only one service account in Keycloak. I now have it where we can have a service account per office.
Now Batch is not the issuer of that context in the preferred path. Keycloak mints the normal access token for a per-office service account.
| throw new CwmsAuthException("Batch job context missing run_as_office", | ||
| HttpServletResponse.SC_UNAUTHORIZED); | ||
| } | ||
| ctx.attribute(RUN_AS_OFFICE_ATTR, office.toUpperCase(Locale.ROOT)); |
There was a problem hiding this comment.
While appropriate to put into a future logging context, these shouldn't need to be part of the Request/Response attributes. Downstream components needing to know that is a definite code smell.
There was a problem hiding this comment.
CDA now consumes this from the validated Keycloak/OIDC access token via claims like machine_auth=true and run_as_office=<office>, rather than relying on Batch to issue a separate signed context.
The remaining CDA-side knowledge is just the claim contract and normal user/principal lookup. It still rejects unregistered machine principals instead of auto-creating them. Locally I verified this with per-office Keycloak service accounts that mint the claims directly into the access token.
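The claim contract described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `MachineRunContext`, the use of `sub` as the principal identifier, the in-memory set of registered principals, and `SecurityException` (standing in for `CwmsAuthException`) are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical value object for a validated machine run.
final class MachineRunContext {
    final String principal;
    final String office;

    private MachineRunContext(String principal, String office) {
        this.principal = principal;
        this.office = office;
    }

    // Returns empty for ordinary user tokens. For machine tokens, requires
    // run_as_office and a pre-registered principal; never creates a principal.
    static Optional<MachineRunContext> resolve(Map<String, ?> claims,
                                               Set<String> registeredPrincipals) {
        if (!Boolean.TRUE.equals(claims.get("machine_auth"))) {
            return Optional.empty();
        }
        Object office = claims.get("run_as_office");
        if (!(office instanceof String) || ((String) office).isBlank()) {
            throw new SecurityException("Machine token missing run_as_office");
        }
        Object subject = claims.get("sub"); // assumed principal claim
        if (!(subject instanceof String) || !registeredPrincipals.contains(subject)) {
            throw new SecurityException("Unregistered machine principal");
        }
        // Normalize the office the same way the diff above does.
        String normalized = ((String) office).toUpperCase(Locale.ROOT);
        return Optional.of(new MachineRunContext((String) subject, normalized));
    }
}
```

In this sketch an unknown service account fails the request outright, which matches the "reject rather than auto-create" behavior discussed in this thread.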
|
|
||
| Provide CWMS Data API with a trusted batch run context for jobs that execute through a shared machine identity. | ||
|
|
||
| Batch runtimes will authenticate to CDA with a service account (via Keycloak). Each job will also provide a signed context token that identifies the authorized job launch context, including the office for which the scheduler or API approved the run. |
There was a problem hiding this comment.
Setting up additional signing is going to be difficult to get right, and will involve yet-more CDK changes. Granted they won't be difficult.
I would like us to first determine if we can, in some way, set up Keycloak to receive the required information (such as the office and specific job identification) and return it in the already signed JWT access token.
Some of the other information can, and should, still be provided, but it doesn't really require signing; it's just informational. The office would be used to set the session context (or by the future authorization system) to appropriately limit operations.
e.g. office + job identification can be readily tied to a policy to appropriately limit operations.
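As a rough illustration of tying office + job identification to a policy, here is a deny-by-default lookup. Everything in it is an assumption made for the example: the `BatchRunPolicy` class, the `Operation` values, and the `OFFICE/jobId` key format do not come from CDA or from any existing authorization system.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical policy table: which operations a given job may perform for a given office.
final class BatchRunPolicy {
    enum Operation { READ_TIMESERIES, WRITE_TIMESERIES, DELETE_TIMESERIES }

    private final Map<String, Set<Operation>> allowed;

    BatchRunPolicy(Map<String, Set<Operation>> allowed) {
        this.allowed = Map.copyOf(allowed);
    }

    // Builds the lookup key; office is normalized so "swt" and "SWT" match.
    static String key(String office, String jobId) {
        return office.toUpperCase(Locale.ROOT) + "/" + jobId;
    }

    // Deny by default: an unknown office/job pair permits nothing.
    boolean permits(String office, String jobId, Operation operation) {
        return allowed.getOrDefault(key(office, jobId), Set.of()).contains(operation);
    }
}
```

With a table like this, the office and job values taken from the signed access token are enough to decide whether an operation is in scope, without any additional signing.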
There was a problem hiding this comment.
Addressed this some here
…ime/cda-job-context # Conflicts: # docs/source/decisions/index.rst
CDA now takes validated OIDC claims.
Stopped auto-creating machine principals.
After checking the Keycloak paths, I do not think we can get dynamic per-job values like …
Keycloak has a way to do this via an SPI/provider, such as a custom protocol mapper or script mapper deployed into Keycloak.
Keycloak SPI docs: https://www.keycloak.org/docs/latest/server_development/index.html#_providers
I do not think CWBI is likely to accept that operationally. If they did, we would need an owner and process for packaging, deploying, versioning, and maintaining that custom provider inside their Keycloak infrastructure.
Using a signed …
I'm working right now to verify all of this end to end on localhost and make sure it behaves as expected.
Summary
This is the parent PR for the CWMS Batch Events M2M auth and dynamic runtime work.
CDA now supports the preferred production shape when Keycloak can mint the batch machine context directly into the normal access token. CDA consumes validated OIDC claims such as machine_auth and run_as_office, while still rejecting unregistered machine principals instead of auto-creating them.
The signed X-CWMS-Job-Context path remains in the design as a fallback for cases where Keycloak cannot provide the needed machine-run context without a custom extension. The current rollout avoids a Keycloak SPI by using office-scoped scheduler and runner service accounts.
Related Draft PRs
CDA_BEARER_TOKEN and BATCH_JOB_CONTEXT_TOKEN.
Blocked by repository permissions / fork policy:
cwms-batch batch-dynamic-runtime/cdk-runtime-jobdefs cwbi-dev-infrastructure/cwms-batch.
airflow batch-dynamic-runtime/airflow-batch-events-auth cwbi-dev-infrastructure/airflow.
Diagrams
System Overview
Editable source: batch-m2m-overview.drawio
End User UI Flow
Editable source: batch-ui-job-flow.drawio
Airflow Scheduled Flow
Editable source: batch-airflow-scheduler-flow.drawio
Validation
CDA:
JAVA_HOME=C:\Program Files\Java\jdk-21 ./gradlew.bat :cwms-data-api:test --tests cwms.cda.security.BatchJobContextTest --no-daemon --stacktrace
JAVA_HOME=C:\Program Files\Java\jdk-21 ./gradlew.bat :cwms-data-api:integrationTests --tests cwms.cda.api.auth.OpenIdConnectTestIT --no-daemon --stacktrace
client_credentials tokens can carry machine_auth/run_as_office, CDA rejects unregistered machine principals, and CDA accepts registered office-scoped machine principals.
Cross-repo/local E2E evidence:
timeoutMinutes=1 and a 90-second sleep failed locally with Local executor timeout after 60 seconds.
Checklist