feat: DuckDB S3 credential-chain support for IRSA-backed Iceberg reads #2098
Draft
velo wants to merge 1 commit into
Signed-off-by: Marvin Froeder <marvin@datasqrl.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

    @@             Coverage Diff              @@
    ##               main    #2098     +/-    ##
    ============================================
    + Coverage     13.55%   13.62%   +0.07%
    - Complexity      835      840       +5
    ============================================
      Files           605      605
      Lines         17259    17264       +5
      Branches       2084     2085       +1
    ============================================
    + Hits           2339     2352      +13
    + Misses        14700    14692       -8
      Partials        220      220
Contributor
@velo This change makes sense to me. Note that it is currently blocked on a DuckDB fix.
Reopens/supersedes #2096 (that PR's branch was deleted so it can't be reopened in place).
Problem
When the Vert.x server reads an S3-backed Iceberg table through the DuckDB engine from an in-cluster pod (EKS IRSA), the query fails with `HTTP 403 Forbidden / AccessDenied: No credentials are provided`.

DuckDB's `httpfs` only auto-detects env-var credentials (`AWS_ACCESS_KEY_ID/SECRET/SESSION_TOKEN`) or an explicit credential secret. It does not use the projected web-identity token that backs IRSA, so on an IRSA-only pod it sends an anonymous request and gets a 403. (Flink writes succeed because Flink's S3 layer does the web-identity exchange.)

This was proven directly inside a running server pod: an anonymous `iceberg_scan('s3://…')` returned `403 No credentials`; after adding the credential-chain secret below, the same read returned 1,810,576 rows.

Change
Opt-in `use-credential-chain` flag on the DuckDB engine. When enabled, the connection init SQL adds:

    LOAD aws;
    CREATE OR REPLACE SECRET sqrl_s3_credential_chain (TYPE S3, PROVIDER credential_chain);

`PROVIDER credential_chain` (with no explicit `CHAIN`) delegates to the AWS SDK default provider chain, which includes the web-identity provider used by IRSA, so DuckDB obtains temporary creds from the projected service-account token, exactly like Flink.

The flag defaults to `false`, preserving today's behavior.

Files
- `DuckDbExtensions.java`: emit `LOAD aws` + default-chain `CREATE SECRET` when the flag is set; `@VisibleForTesting buildInitSql(extensionDir)` overload for unit testing.
- `JdbcConfig.DuckDbConfig`: new `use-credential-chain` boolean.
- `packageSchema.json`: register the new key.
- `Dockerfile.duckdb-extensions`: `INSTALL aws` so the extension is bundled (the `credential_chain` provider requires it; confirmed it is not bundled today).
- `documentation/docs/configuration-engine/duckdb.md`: document the flag + IRSA rationale.
- `DuckDbExtensionsTest.java`: covers default vs credential-chain init SQL, ordering, all-flags, and that no explicit `CHAIN` is emitted.

Validation
- `DuckDbExtensionsTest`: 4 tests green (JDK 17).
- `credential_chain` secret was validated live in a server pod (IRSA-only): private-bucket `iceberg_scan` returned 1.8M rows with the secret, 403 without it.
- `IcebergDeploymentIT` in cloud-compilation has the full datagen→Iceberg(Glue/S3FileIO/`s3://`)→Flink-write path green; this PR closes the read-side gap. Setting `use-credential-chain: true` on the duckdb engine turns that IT's GraphQL read green.

Draft until a maintainer confirms the secret name + bundling the `aws` extension.
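
For reviewers who want a concrete picture of the opt-in behavior described under Change, here is a minimal sketch of how a flag could gate the two extra init statements. It is not the code in `DuckDbExtensions.java`: the class `InitSqlSketch` and the method `credentialChainSql` are hypothetical names used only for illustration. The two SQL strings are copied from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the PR's DuckDbExtensions class. */
final class InitSqlSketch {

    // Both statements are taken verbatim from the PR description.
    static final String LOAD_AWS = "LOAD aws;";
    static final String CREATE_SECRET =
        "CREATE OR REPLACE SECRET sqrl_s3_credential_chain "
            + "(TYPE S3, PROVIDER credential_chain);";

    /**
     * Returns the extra init statements contributed by the flag.
     * With the flag off the list is empty, so existing behavior is unchanged.
     */
    static List<String> credentialChainSql(boolean useCredentialChain) {
        List<String> statements = new ArrayList<>();
        if (useCredentialChain) {
            // Load the aws extension before creating the secret that relies on it.
            statements.add(LOAD_AWS);
            // No explicit CHAIN clause, so the default provider chain is used.
            statements.add(CREATE_SECRET);
        }
        return statements;
    }

    public static void main(String[] args) {
        // Print the statements that would be appended when the flag is enabled.
        credentialChainSql(true).forEach(System.out::println);
    }
}
```

Returning an empty list when the flag is off keeps the default path identical to the current init SQL, which is the property the description calls out as "preserving today's behavior".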