feat: DuckDB S3 credential-chain support for IRSA-backed Iceberg reads #2096
Closed
velo wants to merge 1 commit into
Signed-off-by: Marvin Froeder <marvin@datasqrl.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2096      +/-   ##
============================================
+ Coverage     13.55%   13.62%   +0.07%
- Complexity      835      840       +5
============================================
  Files           605      605
  Lines         17259    17264       +5
  Branches       2084     2085       +1
============================================
+ Hits           2339     2352      +13
+ Misses        14700    14692       -8
  Partials        220      220
Reopened as #2098 (this PR's branch was deleted, so GitHub won't reopen it in place). #2098 carries the corrected fix: use the default credential chain — |
Problem
When the Vert.x server reads an S3-backed Iceberg table through the DuckDB query engine, the request fails with `HTTP 403 Forbidden`.

Root cause: in-cluster pods (EKS) authenticate to AWS via IRSA. The pod gets only a projected web-identity token (`AWS_ROLE_ARN` + `AWS_WEB_IDENTITY_TOKEN_FILE`) and no static `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`. DuckDB's `httpfs` extension does not perform the STS web-identity exchange on its own; it reads only env-var, config-file, or instance credentials. With none of those present, it sends an anonymous S3 request and S3 returns 403. (Flink writes to the same bucket succeed because Flink's S3 layer does honor the web-identity token.)

This is DuckDB-specific. Production Iceberg reads go through Snowflake (which authenticates to S3 via its own storage integration), so this is not a customer-facing regression, but it blocks using the lightweight DuckDB engine for Iceberg-on-S3 in-cluster.
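To make the root cause easier to check on a running pod, here is a minimal sketch that classifies which credential source an environment exposes. `CredentialSourceProbe` and its `Source` enum are hypothetical names introduced for illustration and are not part of this PR; only the four environment variable names are real AWS conventions.

```java
import java.util.Map;

/** Hypothetical diagnostic helper (not part of the PR): reports which AWS credential source is visible. */
final class CredentialSourceProbe {

  enum Source { STATIC_KEYS, WEB_IDENTITY, NONE }

  /** Static keys win when both are set; otherwise look for the IRSA web-identity pair. */
  static Source classify(Map<String, String> env) {
    if (isSet(env, "AWS_ACCESS_KEY_ID") && isSet(env, "AWS_SECRET_ACCESS_KEY")) {
      return Source.STATIC_KEYS;
    }
    if (isSet(env, "AWS_ROLE_ARN") && isSet(env, "AWS_WEB_IDENTITY_TOKEN_FILE")) {
      return Source.WEB_IDENTITY;
    }
    return Source.NONE;
  }

  private static boolean isSet(Map<String, String> env, String key) {
    String value = env.get(key);
    return value != null && !value.isBlank();
  }

  public static void main(String[] args) {
    // Prints the credential source seen by the current process environment.
    System.out.println(classify(System.getenv()));
  }
}
```

On an IRSA-backed pod this would be expected to report `WEB_IDENTITY`, which is the case the description above says DuckDB cannot use without extra configuration.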
Change
Add an opt-in `use-credential-chain` flag to the DuckDB engine config. When enabled, the connection init SQL additionally runs `LOAD aws` and a credential-chain `CREATE SECRET`. `PROVIDER credential_chain` delegates to the AWS SDK default provider chain, and `sts` is the web-identity provider that backs IRSA, so DuckDB obtains temporary credentials from the projected service-account token, exactly like Flink. The flag defaults to `false`, preserving today's behavior.

Files
- `DuckDbExtensions.java`: emit `LOAD aws` + credential-chain `CREATE SECRET` when the flag is set; extracted a `@VisibleForTesting String buildInitSql(extensionDir)` overload so the init SQL is unit-testable without env-var juggling.
- `JdbcConfig.DuckDbConfig`: new `use-credential-chain` boolean.
- `packageSchema.json`: register the new key.
- `Dockerfile.duckdb-extensions`: `INSTALL aws` so the extension is bundled in the image (the `credential_chain` provider requires it).
- `documentation/docs/configuration-engine/duckdb.md`: document the flag + the IRSA rationale.
- `DuckDbExtensionsTest.java`: new unit test covering default vs. credential-chain init SQL, statement ordering, and all-flags-on.

Validation
- `DuckDbExtensionsTest`: 4 tests, green (built under JDK 17).
- `IcebergDeploymentIT` in `cloud-compilation` (`feature/iceberg-data-management`), which currently fails its GraphQL assertion on exactly this 403; this PR is the upstream fix that unblocks it.

Draft until a maintainer confirms the secret name / chain ordering convention and the image-bundle approach for the `aws` extension.
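The init SQL change described under Change might look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's actual `DuckDbExtensions` code: the class name `InitSqlSketch`, the two-argument signature, the secret name `s3_credential_chain`, and the baseline `LOAD httpfs` / `LOAD iceberg` statements are all hypothetical. The `CREATE SECRET ... (TYPE s3, PROVIDER credential_chain, CHAIN 'sts')` form follows DuckDB's documented secret syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the real DuckDbExtensions.buildInitSql may differ. */
final class InitSqlSketch {

  /**
   * Builds the connection init SQL.
   * extensionDir may be null; useCredentialChain mirrors the use-credential-chain flag.
   */
  static String buildInitSql(String extensionDir, boolean useCredentialChain) {
    List<String> statements = new ArrayList<>();
    if (extensionDir != null) {
      // Escape single quotes so the path is a valid SQL string literal.
      statements.add("SET extension_directory = '" + extensionDir.replace("'", "''") + "'");
    }
    // Assumed baseline extensions for Iceberg-on-S3 reads.
    statements.add("LOAD httpfs");
    statements.add("LOAD iceberg");
    if (useCredentialChain) {
      // The credential_chain provider lives in the aws extension, so load it first.
      statements.add("LOAD aws");
      // Hypothetical secret name; CHAIN 'sts' selects the web-identity provider.
      statements.add(
          "CREATE OR REPLACE SECRET s3_credential_chain "
              + "(TYPE s3, PROVIDER credential_chain, CHAIN 'sts')");
    }
    return String.join(";\n", statements) + ";";
  }

  public static void main(String[] args) {
    System.out.println(buildInitSql(null, true));
  }
}
```

Keeping the flag as a plain boolean parameter is what makes the default and credential-chain variants easy to compare in a unit test, including the requirement that `LOAD aws` precedes `CREATE SECRET`. The conversation notes that the follow-up #2098 uses the default credential chain instead, so the `CHAIN 'sts'` clause here reflects only this PR's description.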