feat: Add hudi-azure-bundle #18472
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Style & Readability Review — One code reuse issue: URI parsing and container validation logic is duplicated between readObject() and writeObject() methods.
} else {
  logger.error("Error reading JSON config file: {}", filePath, e);
}
return Option.empty();
🤖 nit: this URI parsing and container validation (lines 288–297) is duplicated from readObject(). Could you extract into a private helper method?
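A minimal sketch of the extraction suggested above, assuming the duplicated logic parses a URI and checks its container. The class LockPathParser, the BlobLocation holder, the method names, and the abfss-style URI layout (container in the user-info part of the authority) are all hypothetical and are not taken from the PR.

```java
import java.net.URI;

final class LockPathParser {
    /** Hypothetical holder for the parsed pieces of a lock file location. */
    static final class BlobLocation {
        final String container;
        final String blobPath;

        BlobLocation(String container, String blobPath) {
            this.container = container;
            this.blobPath = blobPath;
        }
    }

    /** Shared helper: parse the URI once and validate the container once. */
    static BlobLocation parseAndValidate(String filePath, String expectedContainer) {
        URI uri = URI.create(filePath);
        // Assumption: the container is the user-info part, as in abfss://container@account/...
        String container = uri.getUserInfo();
        if (container == null || !container.equals(expectedContainer)) {
            throw new IllegalArgumentException(
                "Unexpected container in " + filePath + ": " + container);
        }
        String path = uri.getPath();
        String blobPath = path.startsWith("/") ? path.substring(1) : path;
        return new BlobLocation(container, blobPath);
    }

    // Both callers delegate to the helper instead of repeating the parsing.
    static String readObject(String filePath, String expectedContainer) {
        BlobLocation loc = parseAndValidate(filePath, expectedContainer);
        return "read " + loc.container + "/" + loc.blobPath;
    }

    static String writeObject(String filePath, String expectedContainer) {
        BlobLocation loc = parseAndValidate(filePath, expectedContainer);
        return "write " + loc.container + "/" + loc.blobPath;
    }

    public static void main(String[] args) {
        String path = "abfss://locks@acct.dfs.core.windows.net/table/.hoodie/lock.json";
        System.out.println(readObject(path, "locks"));
        System.out.println(writeObject(path, "locks"));
    }
}
```

With a single helper, a later change to the validation rule only has to be made in one place, which is the point of the nit.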
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for contributing! The overall structure of AzureStorageLockClient is clean and follows the S3/GCS pattern well. There's one functional bug worth addressing in the ETag handling before merging: the quote-stripping in readCurrentLockFile creates an inconsistency with the ETag format expected by BlobRequestConditions.setIfMatch(), which will break the "take over expired lock" scenario. Details in the inline comment.
String eTag = response.getHeaders().getValue("ETag");
if (eTag != null) {
  // Azure returns ETags wrapped in quotes, remove them
  eTag = eTag.replaceAll("^\"|\"$", "");
🤖 This quote-stripping creates an ETag format inconsistency that breaks conditional writes for the expired-lock takeover path. BlockBlobItem.getETag() (used in createOrUpdateLockFileInternal) returns the ETag WITH surrounding double-quotes (e.g. "0x8D4A"), which is exactly what BlobRequestConditions.setIfMatch() expects — it passes the value directly to the If-Match header, so the quotes must be present for a valid HTTP conditional request. By stripping them here, any setIfMatch call using an ETag sourced from readCurrentLockFile sends If-Match: 0x8D4A (unquoted), while a call using an ETag from a write sends If-Match: "0x8D4A" (correctly quoted). Removing the replaceAll and keeping the raw header value would align both paths and avoid this failure mode.
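The mismatch described above can be illustrated without the Azure SDK. The sketch below reuses the replaceAll expression from the diff and compares the result with the quoted form. The class ETagFormatDemo and the ifMatchSucceeds method are hypothetical: the method is a simplified stand-in for a server-side If-Match check (exact comparison against the stored, quoted entity-tag), not Azure's actual behavior, which should be verified separately.

```java
final class ETagFormatDemo {
    /** Same quote-stripping expression as in the diff under review. */
    static String stripQuotes(String eTag) {
        return eTag.replaceAll("^\"|\"$", "");
    }

    /**
     * Hypothetical stand-in for a conditional-request check: the condition
     * holds only for "*" or an exact match with the stored quoted entity-tag.
     */
    static boolean ifMatchSucceeds(String ifMatchHeader, String currentETag) {
        return "*".equals(ifMatchHeader) || ifMatchHeader.equals(currentETag);
    }

    public static void main(String[] args) {
        String headerValue = "\"0x8D4A\"";          // quoted, as in the example above
        String stripped = stripQuotes(headerValue);  // quotes removed

        System.out.println(stripped);                                    // 0x8D4A
        System.out.println(ifMatchSucceeds(stripped, headerValue));      // false
        System.out.println(ifMatchSucceeds(headerValue, headerValue));   // true
    }
}
```

Keeping the raw header value means both the read path and the write path carry the same string, so a condition built from either source compares equal.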
260619a to e2d8a0a
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR turns hudi-azure-bundle into a proper shaded fat jar with Netty/Reactor relocation, mirroring the AWS and GCP bundles. There is one question on a build-config divergence from the sibling bundles that is worth a second look; please see the inline comment. There are also a couple of spots where a short explanatory comment would save future readers some head-scratching. After that, this should be ready for a Hudi committer or PMC member to take it from here.
<resource>
  <directory>src/main/resources</directory>
</resource>
<resource>
🤖 Is there a reason to add src/test/resources to the main <resources> block? Maven's standard layout already exposes it for the test phase, and adding it here will package any test resources into the production shaded jar. The sibling hudi-aws-bundle and hudi-gcp-bundle only declare src/main/resources. Could you drop this entry unless something specifically requires it?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/services/javax.*</exclude>
<exclude>**/*.proto</exclude>
<exclude>hbase-webapps/**</exclude>
🤖 nit: it might be worth adding a brief inline comment here noting which transitive dependency pulls in hbase-webapps/ — without it, a future reader will wonder why HBase UI resources are being excluded from an Azure bundle.
  <directory>src/main/resources</directory>
</resource>
<resource>
  <directory>src/test/resources</directory>
🤖 nit: including src/test/resources in the main build resources is unusual — could you add a comment explaining why it's needed here (e.g. which runtime config files live there that the bundle requires)?
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #18472 +/- ##
============================================
- Coverage 68.14% 68.13% -0.01%
+ Complexity 29105 29098 -7
============================================
Files 2518 2518
Lines 141221 141221
Branches 17534 17534
============================================
- Hits 96235 96226 -9
- Misses 37070 37075 +5
- Partials 7916 7920 +4
Describe the issue this Pull Request addresses
#18471
When running Hudi Spark jobs on Azure (ADLS Gen2), the Azure Storage SDK's Netty and Reactor dependencies conflict with Spark's bundled Netty, causing runtime NoSuchMethodError and StacklessClosedChannelException during lock acquisition. Specifically, reactor-netty-http calls HttpClientCodec.<init>(HttpDecoderConfig, boolean, boolean), a constructor that only exists in Netty 4.1.94+, but Spark's older Netty HttpClientCodec is loaded instead. This makes the Azure-based StorageBasedLockProvider (added in #17951) unusable in Spark environments.
Additionally, there is no pre-built bundle for Azure dependencies analogous to hudi-aws-bundle and hudi-gcp-bundle, forcing users to manually manage Azure SDK, Reactor, and Netty jars on the classpath.
Summary and Changelog
- Turns the hudi-azure-bundle module (packaging/hudi-azure-bundle) into a shaded fat jar that packages all Azure-specific dependencies (Azure SDK, Azure identity deps, Reactor + reactor-netty, Netty, Reactive Streams) into a single self-contained artifact, following the same pattern as hudi-aws-bundle and hudi-gcp-bundle.
- io.netty.*, io.projectreactor.*, reactor.*, and org.reactivestreams.* are relocated under org.apache.hudi.* to eliminate classpath conflicts with Spark's bundled Netty.
- Includes the identity-related dependencies (com.nimbusds:*, net.minidev:*) so DefaultAzureCredential and related auth providers work out-of-the-box.
- Adds an hbase-webapps/** filter exclude, a src/test/resources resource directory, and an Avro compile dependency.
Note: This PR was rebased onto current master. The original AzureStorageLockClient commits were dropped because master already has an implementation of that class via #17951; this PR now contains only the bundle module additions on top of master's existing hudi-azure-bundle skeleton.
Impact
- Enables StorageBasedLockProvider to work reliably on Azure/ADLS Gen2 in Spark environments.
- Users no longer need to manually place reactor-netty, reactor-core, reactive-streams, and netty-resolver-dns jars on the Spark classpath; everything is self-contained in the bundle.
Risk Level
Low
- Modifies packaging/hudi-azure-bundle/pom.xml (the module skeleton master already added) to include and relocate the extra dependencies.
Documentation Update
none
Contributor's checklist