feat: DuckDB S3 credential-chain support for IRSA-backed Iceberg reads #2096

Closed
velo wants to merge 1 commit into main from feat/duckdb-s3-credential-chain

velo (Collaborator) commented May 26, 2026

Problem

When the Vert.x server reads an S3-backed Iceberg table through the DuckDB query engine, the request fails with HTTP 403 Forbidden.

Root cause: in-cluster pods (EKS) authenticate to AWS via IRSA. The pod gets only a projected web-identity token (AWS_ROLE_ARN + AWS_WEB_IDENTITY_TOKEN_FILE) and no static AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY. DuckDB's httpfs extension does not perform the STS web-identity exchange on its own; it reads only env-var, config-file, or instance credentials. With none of those present, it sends an anonymous S3 request and S3 returns 403. (Flink writes to the same bucket succeed because Flink's S3 layer honors the web-identity token.)
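To make the failure mode concrete, here is a minimal diagnostic sketch that classifies which credential source a pod's environment exposes. The class `AwsCredentialSourceProbe` and its `Source` enum are hypothetical and not part of this PR; only the environment variable names are standard AWS ones.

```java
import java.util.Map;

/** Hypothetical diagnostic (not part of this PR): which AWS credential source does the env expose? */
final class AwsCredentialSourceProbe {

  enum Source { STATIC_KEYS, WEB_IDENTITY, NONE }

  static Source detect(Map<String, String> env) {
    // Static keys: env-var credentials that can be read directly.
    if (isSet(env, "AWS_ACCESS_KEY_ID") && isSet(env, "AWS_SECRET_ACCESS_KEY")) {
      return Source.STATIC_KEYS;
    }
    // IRSA: only a role ARN and a projected token file; usable only after an STS exchange.
    if (isSet(env, "AWS_ROLE_ARN") && isSet(env, "AWS_WEB_IDENTITY_TOKEN_FILE")) {
      return Source.WEB_IDENTITY;
    }
    return Source.NONE;
  }

  private static boolean isSet(Map<String, String> env, String key) {
    String value = env.get(key);
    return value != null && !value.isBlank();
  }

  public static void main(String[] args) {
    // Prints the source detected in the current process environment.
    System.out.println(detect(System.getenv()));
  }
}
```

In an IRSA-only pod this would report `WEB_IDENTITY`, which is the case described above where no directly readable credentials are present.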

This is DuckDB-specific. Production Iceberg reads go through Snowflake (which authenticates to S3 via its own storage integration), so this is not a customer-facing regression — but it blocks using the lightweight DuckDB engine for Iceberg-on-S3 in-cluster.

Change

Add an opt-in use-credential-chain flag to the DuckDB engine config. When enabled, the connection init SQL additionally runs:

LOAD aws;
CREATE OR REPLACE SECRET sqrl_s3_credential_chain
  (TYPE S3, PROVIDER credential_chain, CHAIN 'env;config;sts;sso;instance;process');

PROVIDER credential_chain delegates to the AWS SDK default provider chain, and sts is the web-identity provider that backs IRSA — so DuckDB obtains temporary credentials from the projected service-account token, exactly like Flink. The flag defaults to false, preserving today's behavior.
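The opt-in behavior can be sketched roughly as follows. This is an illustration only: the class name `InitSqlSketch`, the two-argument `buildInitSql` signature, and the `baseStatements` parameter are assumptions (the PR's real overload takes an `extensionDir`), while the two SQL statements are copied from this description. Note that the follow-up comment on this PR reports that DuckDB rejected this explicit CHAIN value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of flag-gated init SQL; not the actual DuckDbExtensions code. */
final class InitSqlSketch {

  // Statement text as quoted in the PR description.
  static final String LOAD_AWS = "LOAD aws;";
  static final String CREATE_SECRET =
      "CREATE OR REPLACE SECRET sqrl_s3_credential_chain\n"
          + "  (TYPE S3, PROVIDER credential_chain, CHAIN 'env;config;sts;sso;instance;process');";

  static String buildInitSql(List<String> baseStatements, boolean useCredentialChain) {
    List<String> statements = new ArrayList<>(baseStatements);
    if (useCredentialChain) {
      // The aws extension is loaded first because the credential_chain provider requires it.
      statements.add(LOAD_AWS);
      statements.add(CREATE_SECRET);
    }
    return String.join("\n", statements);
  }

  public static void main(String[] args) {
    // "LOAD httpfs;" stands in for whatever the engine already runs at connection init.
    System.out.println(buildInitSql(List.of("LOAD httpfs;"), true));
  }
}
```

With the flag off, the result is exactly the base statements, which matches the stated goal of preserving today's behavior by default.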

Files

  • DuckDbExtensions.java — emit LOAD aws + credential-chain CREATE SECRET when the flag is set; extracted a @VisibleForTesting String buildInitSql(extensionDir) overload so the init SQL is unit-testable without env-var juggling.
  • JdbcConfig.DuckDbConfig — new use-credential-chain boolean.
  • packageSchema.json — register the new key.
  • Dockerfile.duckdb-extensions: INSTALL aws so the extension is bundled in the image (the credential_chain provider requires it).
  • documentation/docs/configuration-engine/duckdb.md — document the flag + the IRSA rationale.
  • DuckDbExtensionsTest.java — new unit test covering default vs. credential-chain init SQL, statement ordering, and all-flags-on.
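As a rough illustration of the opt-in default described for the new config key, the sketch below reads `use-credential-chain` from a generic config map and falls back to `false` when the key is absent. The class `DuckDbConfigSketch` and the map-based lookup are assumptions made for this example; the actual `JdbcConfig.DuckDbConfig` binding may work differently.

```java
import java.util.Map;

/** Hypothetical sketch of an opt-in boolean config key; not the actual JdbcConfig code. */
final class DuckDbConfigSketch {

  static final String USE_CREDENTIAL_CHAIN_KEY = "use-credential-chain";

  static boolean useCredentialChain(Map<String, ?> engineConfig) {
    Object raw = engineConfig.get(USE_CREDENTIAL_CHAIN_KEY);
    if (raw == null) {
      return false; // absent key keeps the existing behavior
    }
    if (raw instanceof Boolean) {
      return (Boolean) raw;
    }
    // Reject non-boolean values instead of guessing their meaning.
    throw new IllegalArgumentException(
        USE_CREDENTIAL_CHAIN_KEY + " must be a boolean, got: " + raw);
  }

  public static void main(String[] args) {
    System.out.println(useCredentialChain(Map.of()));
    System.out.println(useCredentialChain(Map.of(USE_CREDENTIAL_CHAIN_KEY, true)));
  }
}
```

Failing fast on a non-boolean value is a design choice of this sketch, not something the PR states.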

Validation

  • DuckDbExtensionsTest — 4 tests, green (built under JDK 17).
  • google-java-format + license strict-check pass.
  • End-to-end S3-read validation is exercised downstream by IcebergDeploymentIT in cloud-compilation (feature/iceberg-data-management), which currently fails its GraphQL assertion on exactly this 403; this PR is the upstream fix that unblocks it.

Draft until a maintainer confirms the secret name / chain ordering convention and the image-bundle approach for the aws extension.

Signed-off-by: Marvin Froeder <marvin@datasqrl.com>

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 13.62%. Comparing base (33ba2e3) to head (9298024).
✅ All tests successful. No failed tests found.

Files with missing lines                                 Patch %   Lines
.../main/java/com/datasqrl/util/DuckDbExtensions.java    83.33%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2096      +/-   ##
============================================
+ Coverage     13.55%   13.62%   +0.07%     
- Complexity      835      840       +5     
============================================
  Files           605      605              
  Lines         17259    17264       +5     
  Branches       2084     2085       +1     
============================================
+ Hits           2339     2352      +13     
+ Misses        14700    14692       -8     
  Partials        220      220              


@velo velo closed this May 26, 2026
@velo velo deleted the feat/duckdb-s3-credential-chain branch May 26, 2026 11:54
velo (Collaborator, Author) commented May 26, 2026

Reopened as #2098 (this PR's branch was deleted, so GitHub won't reopen it in place). #2098 carries the corrected fix: use the default credential chain — CREATE SECRET (TYPE S3, PROVIDER credential_chain) with no explicit CHAIN — since DuckDB rejects CHAIN '...sts...' without an ASSUME_ROLE_ARN. Validated live in a server pod: private-bucket iceberg_scan returned 1.8M rows with the secret, 403 without.
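The difference between the two variants can be sketched as a small statement builder that omits the CHAIN clause unless one is supplied. This is an assumption-laden illustration: the class `S3SecretSqlSketch` is hypothetical, and whether #2098 keeps the secret name or the OR REPLACE form is not stated here.

```java
import java.util.Optional;

/** Hypothetical sketch: builds the S3 secret statement with or without an explicit CHAIN. */
final class S3SecretSqlSketch {

  // secretName and chain are inserted verbatim, so callers must pass trusted values only.
  static String createSecretSql(String secretName, Optional<String> chain) {
    StringBuilder sql =
        new StringBuilder("CREATE OR REPLACE SECRET ")
            .append(secretName)
            .append(" (TYPE S3, PROVIDER credential_chain");
    // No CHAIN clause means DuckDB falls back to the default credential chain.
    chain.ifPresent(c -> sql.append(", CHAIN '").append(c).append("'"));
    return sql.append(");").toString();
  }

  public static void main(String[] args) {
    // Default-chain form, as described in the comment above.
    System.out.println(createSecretSql("sqrl_s3_credential_chain", Optional.empty()));
  }
}
```

Passing `Optional.empty()` yields the default-chain form; passing a chain string reproduces the shape of the statement from the original description.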
