
feat(airflow): OIDC auth + package hardening (p08) #47

Merged

abir-oumghar merged 5 commits into main from feat/airflow-core on May 11, 2026
Conversation

@abir-oumghar (Contributor) commented Apr 29, 2026

The content of #38 was already pushed directly to main via commit 162f490. This PR adds what was still missing from the integration-airflow branch.

Keycloak OIDC login — conditional OIDC auth for the Airflow webserver, activates when context.airflow.auth.enabled: true and method: oidc. Credentials come from a K8s secret.
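As a rough sketch, the conditional gate described above might look like this in the context values (key names beyond `context.airflow.auth.enabled` and `method` are hypothetical placeholders, including the secret name):

```yaml
# Hypothetical context snippet: OIDC login for the Airflow webserver is
# activated only when both flags below are set; the client credentials
# are read from a Kubernetes secret rather than inlined here.
context:
  airflow:
    auth:
      enabled: true                  # feature gate for webserver auth
      method: oidc                   # anything else keeps the default login
      existingSecret: airflow-oidc   # hypothetical secret with client-id / client-secret
```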

Hardcoded DB password removed — replaced pass: airflow123 in the Helm values with data.metadataSecretName pointing to a secret provisioned by local-secrets-provider. Password no longer appears in the package.

specPatchByModule removed — the release was overriding the package ingress with a hardcoded host and silently dropping the second entry. The package now handles ingress correctly, so the proxy-body-size: "0" annotation was moved directly into the package.
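Assuming the standard ingress-nginx annotation key, the annotation now carried by the package would look roughly like this (the `ingress.web` path follows the community Airflow chart layout and is an assumption here):

```yaml
# Disable nginx's request body size limit so large uploads through the
# Airflow webserver are not rejected with 413 Request Entity Too Large.
ingress:
  web:
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "0"
```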

Redundant addons release removed — addons/airflow.yaml deleted; airflow now deploys on-demand from the catalog only (same pattern as jupyterhub, superset, trinodb).

Spark scheduler RBAC fix — DAG tasks running under LocalExecutor execute in the scheduler pod under the airflow-{release}-main-scheduler ServiceAccount, which had no permissions on sparkapplications.k8s.io (was bound only to the spark SA). The spark-rbac chart now also binds spark-role to the namespace ServiceAccount group, restoring end-to-end Spark DAG execution.

- Add OIDC/Keycloak authentication support in Airflow webserver
  (conditional on context.airflow.auth.enabled + method=oidc)
- Inject DB credentials via secretKeyRef env vars instead of hardcoded
  user/pass in data.metadataConnection
- Set AIRFLOW__DATABASE__SQL_ALCHEMY_CONN from secret at runtime so the
  password never appears in the package definition
- Add nginx.ingress.kubernetes.io/proxy-body-size: "0" to ingress
- Change dagsSource default from 'local' (unimplemented) to 'git'
- Remove specPatchByModule from the release (package handles ingress correctly)
- Bump package tag to 2.9.3-p08
- Add .gitignore rules to exclude OIDC secret files from commits
The previous approach (extraEnv + placeholder user/pass) caused the
migration job to crash: the chart generates airflow-metadata secret from
data.metadataConnection, migration job uses that secret, and K8s resolves
duplicate env vars with first-wins — so placeholder credentials were used.

Fix: use data.metadataSecretName pointing to creds-airflow-metadata, a
secret provisioned by local-secrets-provider with the full SQLAlchemy
connection string. The chart skips generating its own secret entirely,
so all pods (including migration) get correct credentials with no
plaintext password in the package values.
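A sketch of both sides of the fix, assuming the community Airflow chart's `data.metadataSecretName` convention (the chart expects the referenced secret to expose the full SQLAlchemy URI under a `connection` key; the host, port, and database names below are illustrative):

```yaml
# Package values: reference an external secret instead of inlining
# credentials in data.metadataConnection, so the chart skips generating
# its own airflow-metadata secret.
data:
  metadataSecretName: creds-airflow-metadata
---
# Secret provisioned by local-secrets-provider (shape is an assumption):
apiVersion: v1
kind: Secret
metadata:
  name: creds-airflow-metadata
type: Opaque
stringData:
  connection: postgresql://airflow:<generated-password>@airflow-postgresql:5432/airflow
```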

- airflow: p08 -> p09, data.metadataSecretName: creds-airflow-metadata
- local-secrets-provider: p03 -> p04, add creds-airflow-metadata secret
URL: https://airflow-{{ .Release.spec.targetNamespace }}.{{ .Context.ingress.suffix }}
Alternate URL: https://airflow.{{ .Context.ingress.suffix }}
Default credentials: admin / admin
Sandbox Keycloak credentials: adm / adm

I don't think it's a good idea to hardcode credentials and make them visible.

Contributor Author


Good catch, removed in 4154fcf. The line now just points users to authenticate via Keycloak with the credentials provisioned in their sandbox.

Replace the explicit 'adm / adm' line in the post-deploy usage text with
a generic instruction to authenticate via Keycloak using sandbox-provisioned
credentials. Bumps package to 2.9.3-p10.
Airflow is already exposed as an on-demand catalog package
(Workflows -> airflow in default-context.yaml), like jupyterhub,
superset and trinodb. The addons release was deploying it twice:
once at sandbox boot and once on user request via the okdp-server UI.
Keeping only the catalog entry aligns Airflow with the rest of the
user-facing services.

Verified locally: after removing the file, kubectl apply -f addons/
no longer recreates the airflow Release while the catalog entry and
other Releases remain intact.
Airflow DAGs under LocalExecutor run in the scheduler pod
(airflow-{release}-main-scheduler SA), not the spark SA —
SparkApplication submissions returned 403. Add
system:serviceaccounts:<namespace> as a second subject of the
RoleBinding so the scheduler SA inherits spark-role.

- spark-rbac chart: 1.0.0 -> 1.0.1
- spark-rbac package: p02 -> p03
- addon: pin to p03
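A minimal sketch of the RoleBinding change in spark-rbac 1.0.1, assuming a Helm-templated namespace (resource names here are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding            # illustrative name
  namespace: "{{ .Release.Namespace }}"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-role
subjects:
  - kind: ServiceAccount              # original subject: the spark SA
    name: spark
    namespace: "{{ .Release.Namespace }}"
  - kind: Group                       # added in 1.0.1: every SA in the
    apiGroup: rbac.authorization.k8s.io
    name: "system:serviceaccounts:{{ .Release.Namespace }}"  # incl. the scheduler SA
```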
@abir-oumghar abir-oumghar merged commit 7b98d2f into main May 11, 2026
6 checks passed