Skip to content

feat: DuckDB S3 credential-chain support for IRSA-backed Iceberg reads#2098

Draft
velo wants to merge 1 commit into
mainfrom
feat/duckdb-s3-credential-chain
Draft

feat: DuckDB S3 credential-chain support for IRSA-backed Iceberg reads#2098
velo wants to merge 1 commit into
mainfrom
feat/duckdb-s3-credential-chain

Conversation

@velo
Copy link
Copy Markdown
Collaborator

@velo velo commented May 26, 2026

Reopens/supersedes #2096 (that PR's branch was deleted so it can't be reopened in place).

Problem

When the Vert.x server reads an S3-backed Iceberg table through the DuckDB engine from an in-cluster pod (EKS IRSA), the query fails with HTTP 403 Forbidden / AccessDenied — No credentials are provided.

DuckDB's httpfs only auto-detects env-var credentials (AWS_ACCESS_KEY_ID/SECRET/SESSION_TOKEN) or an explicit credential secret. It does not use the projected web-identity token that backs IRSA, so on an IRSA-only pod it sends an anonymous request → 403. (Flink writes succeed because Flink's S3 layer does the web-identity exchange.)

This was proven directly inside a running server pod: anonymous iceberg_scan('s3://…')403 No credentials; after adding the credential-chain secret below, the same read returned 1,810,576 rows.

Change

Opt-in use-credential-chain flag on the DuckDB engine. When enabled, the connection init SQL adds:

LOAD aws;
CREATE OR REPLACE SECRET sqrl_s3_credential_chain (TYPE S3, PROVIDER credential_chain);

PROVIDER credential_chain (with no explicit CHAIN) delegates to the AWS SDK default provider chain, which includes the web-identity provider used by IRSA — so DuckDB obtains temporary creds from the projected service-account token, exactly like Flink.

Note: an explicit CHAIN 'env;config;sts;…' does not work — DuckDB maps sts to AssumeRole and rejects it without an ASSUME_ROLE_ARN. The default chain is required. (This is the correction vs the original #2096.)

The flag defaults to false, preserving today's behavior.

Files

  • DuckDbExtensions.java — emit LOAD aws + default-chain CREATE SECRET when the flag is set; @VisibleForTesting buildInitSql(extensionDir) overload for unit testing.
  • JdbcConfig.DuckDbConfig — new use-credential-chain boolean.
  • packageSchema.json — register the new key.
  • Dockerfile.duckdb-extensionsINSTALL aws so the extension is bundled (the credential_chain provider requires it; confirmed it is not bundled today).
  • documentation/docs/configuration-engine/duckdb.md — document the flag + IRSA rationale.
  • DuckDbExtensionsTest.java — covers default vs credential-chain init SQL, ordering, all-flags, and that no explicit CHAIN is emitted.

Validation

  • DuckDbExtensionsTest — 4 tests green (JDK 17).
  • The credential_chain secret was validated live in a server pod (IRSA-only): private-bucket iceberg_scan returned 1.8M rows with the secret, 403 without it.
  • Downstream: IcebergDeploymentIT in cloud-compilation has the full datagen→Iceberg(Glue/S3FileIO/s3://)→Flink-write path green; this PR closes the read-side gap. Setting use-credential-chain: true on the duckdb engine turns that IT's GraphQL read green.

Draft until a maintainer confirms the secret name + bundling the aws extension.

Signed-off-by: Marvin Froeder <marvin@datasqrl.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 13.62%. Comparing base (33ba2e3) to head (df10424).
⚠️ Report is 8 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
.../main/java/com/datasqrl/util/DuckDbExtensions.java 83.33% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2098      +/-   ##
============================================
+ Coverage     13.55%   13.62%   +0.07%     
- Complexity      835      840       +5     
============================================
  Files           605      605              
  Lines         17259    17264       +5     
  Branches       2084     2085       +1     
============================================
+ Hits           2339     2352      +13     
+ Misses        14700    14692       -8     
  Partials        220      220              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mbroecheler
Copy link
Copy Markdown
Contributor

@velo This change makes sense to me.
@ferenc-csaky One thing we want to look into is how we support the different clouds with this one to load things dynamically.

Note, that this is currently blocked on a DuckDB fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants