
feat: Add Azure-based storage lock#17951

Merged
yihua merged 14 commits into
apache:master from
chrevanthreddy:master
Apr 18, 2026

@chrevanthreddy
Contributor

@chrevanthreddy chrevanthreddy commented Jan 19, 2026

Describe the issue this Pull Request addresses

Adds Azure Blob Storage-based distributed lock provider for Hudi tables on ADLS Gen2 and Azure Blob Storage, extending the existing S3 and GCS storage-based lock implementations (RFC-91).

Summary and Changelog

Adds an Azure storage lock provider for ADLS, extending the existing S3/GCS object storage lock providers.

New module: hudi-azure

  • AzureStorageLockClient: StorageLockClient implementation using Azure Blob conditional requests (ETag-based optimistic concurrency control)
    • If-None-Match: * for lock creation (fail if blob already exists)
    • If-Match: <etag> for lock renewal/expiry (fail if modified by another writer)
  • AzureCredentialFactory: credential resolution in the order connection string → SAS token → user-assigned managed identity → service principal → DefaultAzureCredential (lazy singleton via holder pattern)
  • AzureStorageLockConfig: ConfigProperty-based configuration extending HoodieConfig, with sinceVersion("1.2.0") on all new keys under hoodie.write.lock.azure.*
  • Supported URI schemes: abfs://, abfss://, wasb://, wasbs://, https://, http://
  • DFS-to-Blob endpoint conversion (dfs.core.windows.net → blob.core.windows.net), since lock operations use the Blob Storage REST API
  • BlobServiceClient caching via ConcurrentHashMap for secondary endpoints (audit file operations)
  • SDK retries disabled — Hudi manages retries at the StorageBasedLockProvider level
  • ETag canonicalization (canonicalizeEtag) ensures consistent double-quoted format across read/write paths, with fail-fast on null/empty/malformed ETags
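The canonicalizeEtag behavior described above (one double-quoted form on every path, fail fast on null, empty, or malformed values) can be sketched as follows. This is an illustration, not the code in the PR: the class name is hypothetical, IllegalArgumentException stands in for the HoodieLockException the real client throws, and "malformed" is assumed to mean an empty quoted value or a stray quote character.

```java
/**
 * Hypothetical sketch of ETag canonicalization, not the actual Hudi implementation.
 * Assumptions: "malformed" means an empty quoted value or a stray quote character,
 * and IllegalArgumentException stands in for HoodieLockException.
 */
final class EtagCanonicalizer {
  private EtagCanonicalizer() {}

  static String canonicalizeEtag(String etag) {
    if (etag == null) {
      throw new IllegalArgumentException("ETag is null");
    }
    String trimmed = etag.trim();
    if (trimmed.isEmpty()) {
      throw new IllegalArgumentException("ETag is empty");
    }
    // Strip one pair of surrounding quotes if present, so quoted and unquoted inputs converge.
    String inner = trimmed;
    if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
      inner = trimmed.substring(1, trimmed.length() - 1);
    }
    // Fail fast on an empty quoted value or a dangling/embedded quote.
    if (inner.isEmpty() || inner.indexOf('"') >= 0) {
      throw new IllegalArgumentException("Malformed ETag: " + etag);
    }
    // Always hand back the double-quoted form, whichever path produced the ETag.
    return "\"" + inner + "\"";
  }
}
```

Normalizing once at the client boundary means the ETag stored from a write and the ETag read back later compare equal as plain strings when passed to If-Match.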

New module: packaging/hudi-azure-bundle

  • Shaded fat JAR including hudi-azure, Azure SDK, Reactor, and Netty with relocations to avoid classpath conflicts

New dependencies

| Dependency | Version | Scope |
|---|---|---|
| azure-storage-blob | 12.26.0 | compile |
| azure-identity | 1.12.2 | compile |
| Jackson overrides | 2.13.5 | test only |
| Testcontainers (Azurite) | inherited | test only |

Modified: StorageSchemes

  • Registers AzureStorageLockClient for wasb, wasbs, abfs, abfss schemes

New tests are added:

  • Unit tests (TestAzureStorageLockClient): lock create/update with If-None-Match/If-Match precondition verification via ArgumentCaptor, ETag fallback from BlockBlobItem, ETag canonicalization (null/empty/malformed), HTTP error code mapping (412→ACQUIRED_BY_OTHERS, 409/429/5xx→UNKNOWN_ERROR, 400→rethrown), BlobServiceClient caching for secondary endpoints, readObject/writeObject success and failure paths, constructor validation for multiple URI schemes
  • URI parsing tests (TestAzureStorageLockClientUriParsing): abfs://, abfss://, wasb://, wasbs://, https://, http:// schemes with DFS-to-Blob host conversion, plus negative tests for missing scheme/authority/path, invalid formats, unsupported schemes, empty containers, and edge cases (single-segment paths, deep paths, special characters, hyphenated containers)
  • Integration tests (ITAzureStorageLockClientAzurite): end-to-end create → read → wrong-ETag-update flow against Azurite (Azure Storage emulator) via Testcontainers. Run with -Pazure-integration-tests
  • Lock provider base tests (TestAzureStorageBasedLockProvider): extends StorageBasedLockProviderTestBase (disabled by default, requires Azurite)
  • TestStorageSchemes: updated assertions for abfs/wasb lock provider registration
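The DFS-to-Blob host conversion exercised by the URI parsing tests can be illustrated with a small parser for the Hadoop-style schemes. A minimal sketch under stated assumptions: the class and field names are hypothetical, only abfs, abfss, wasb and wasbs are handled (the PR parses https/http URIs, including Azurite ones, with BlobUrlParts, which this does not reproduce), and the secure schemes are assumed to map to https and the others to http.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical sketch of Hadoop-style Azure URI parsing with DFS-to-Blob host conversion.
 * Only abfs/abfss/wasb/wasbs are handled. Assumption: TLS schemes map to https, others to http.
 */
final class AzureUriSketch {
  private static final String DFS_SUFFIX = ".dfs.core.windows.net";
  private static final String BLOB_SUFFIX = ".blob.core.windows.net";

  final String blobEndpoint;
  final String container;
  final String blobPath;

  private AzureUriSketch(String blobEndpoint, String container, String blobPath) {
    this.blobEndpoint = blobEndpoint;
    this.container = container;
    this.blobPath = blobPath;
  }

  static AzureUriSketch parse(String uriString) {
    URI uri = URI.create(uriString);
    if (uri.getScheme() == null) {
      throw new IllegalArgumentException("Missing scheme: " + uriString);
    }
    String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
    String protocol;
    switch (scheme) {
      case "abfss":
      case "wasbs":
        protocol = "https";
        break;
      case "abfs":
      case "wasb":
        protocol = "http";
        break;
      default:
        throw new IllegalArgumentException("Unsupported scheme: " + scheme);
    }
    // Authority has the form <container>@<account>.<dfs|blob>.core.windows.net
    String container = uri.getUserInfo();
    String host = uri.getHost();
    if (container == null || container.isEmpty() || host == null) {
      throw new IllegalArgumentException("Expected <container>@<host> authority: " + uriString);
    }
    // Lock operations use the Blob REST API, so rewrite a DFS host to the Blob host.
    if (host.endsWith(DFS_SUFFIX)) {
      host = host.substring(0, host.length() - DFS_SUFFIX.length()) + BLOB_SUFFIX;
    }
    String path = uri.getPath() == null ? "" : uri.getPath();
    String blobPath = path.startsWith("/") ? path.substring(1) : path;
    if (blobPath.isEmpty()) {
      throw new IllegalArgumentException("Missing blob path: " + uriString);
    }
    return new AzureUriSketch(protocol + "://" + host, container, blobPath);
  }
}
```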

Impact

Adds Azure support to the existing storage-based lock providers.

  • Enables storage-based locking (OCC) for Hudi tables on Azure, matching the existing S3 and GCS lock provider capabilities
  • No changes to existing lock providers or core lock logic
  • New module only — no modifications to existing modules beyond StorageSchemes registration and root pom.xml module additions

Risk Level

Low: additive feature in a new module. The core lock protocol (StorageBasedLockProvider) is unchanged.

Documentation Update

New config keys documented via ConfigProperty.withDocumentation():

  • hoodie.write.lock.azure.connection.string
  • hoodie.write.lock.azure.sas.token
  • hoodie.write.lock.azure.managed.identity.client.id
  • hoodie.write.lock.azure.client.tenant.id
  • hoodie.write.lock.azure.client.id
  • hoodie.write.lock.azure.client.secret
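The credential precedence from the changelog (connection string → SAS token → user-assigned managed identity → service principal → DefaultAzureCredential) can be illustrated against these keys. A minimal sketch, assuming a service principal needs all three of tenant id, client id and client secret and that a partially configured principal falls through to DefaultAzureCredential. The class and AuthMode enum are hypothetical; the sketch only selects a mode and does not build Azure SDK credentials.

```java
import java.util.Properties;

/** Hypothetical auth-mode selector illustrating the documented precedence; not a Hudi API. */
final class AzureAuthModeResolver {
  enum AuthMode { CONNECTION_STRING, SAS_TOKEN, MANAGED_IDENTITY, SERVICE_PRINCIPAL, DEFAULT_CREDENTIAL }

  static final String PREFIX = "hoodie.write.lock.azure.";
  static final String CONNECTION_STRING = PREFIX + "connection.string";
  static final String SAS_TOKEN = PREFIX + "sas.token";
  static final String MANAGED_IDENTITY_CLIENT_ID = PREFIX + "managed.identity.client.id";
  static final String TENANT_ID = PREFIX + "client.tenant.id";
  static final String CLIENT_ID = PREFIX + "client.id";
  static final String CLIENT_SECRET = PREFIX + "client.secret";

  private static boolean isSet(Properties props, String key) {
    String value = props.getProperty(key);
    return value != null && !value.trim().isEmpty();
  }

  /** First configured mode wins; nothing configured falls back to DefaultAzureCredential. */
  static AuthMode resolve(Properties props) {
    if (isSet(props, CONNECTION_STRING)) {
      return AuthMode.CONNECTION_STRING;
    }
    if (isSet(props, SAS_TOKEN)) {
      return AuthMode.SAS_TOKEN;
    }
    if (isSet(props, MANAGED_IDENTITY_CLIENT_ID)) {
      return AuthMode.MANAGED_IDENTITY;
    }
    if (isSet(props, TENANT_ID) && isSet(props, CLIENT_ID) && isSet(props, CLIENT_SECRET)) {
      return AuthMode.SERVICE_PRINCIPAL;
    }
    return AuthMode.DEFAULT_CREDENTIAL;
  }

  /** Mirrors the reviewed builder snippet: a leading '?' is dropped before use. */
  static String normalizeSasToken(String sasToken) {
    return sasToken.startsWith("?") ? sasToken.substring(1) : sasToken;
  }
}
```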

Contributor's checklist

  • [x] Read through contributor's guide
  • [x] Enough context is provided in the sections above
  • [x] Adequate tests were added if applicable

@github-actions github-actions Bot added the size:XL PR with lines of changes > 1000 label Jan 19, 2026
@chrevanthreddy
Contributor Author

chrevanthreddy commented Jan 20, 2026

26/01/20 03:44:01 INFO StorageBasedLockProvider: Owner 8d5f9d59-b590-46bf-b0f1-a6f492e179f6: Lock file path abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/.locks/table_lock.json, Thread Thread[#1966,SparkConnectExecuteThread_opId=a36945a5-b174-400c-be28-17c37fac9988,5,main], Storage based lock state FAILED_TO_ACQUIRE, Lock already held by a47d30ef-f27f-4554-ab54-2ee872f5a806
26/01/20 03:44:02 INFO ActiveTimelineV2: Created new file for toInstant: abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/timeline/20260120034324550_20260120034401813.deltacommit
26/01/20 03:44:02 INFO ActiveTimelineV2: Completed [==>20260120034324550__deltacommit__INFLIGHT]
26/01/20 03:44:02 INFO HoodieTableConfig: Loading table properties from abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/hoodie.properties
26/01/20 03:44:02 INFO HoodieSparkIndexClient: Registering or updating index: column_stats of type: column_stats
26/01/20 03:44:02 INFO HoodieIndexUtils: Registering index column_stats of using column_stats
26/01/20 03:44:02 INFO SessionHolder: Session with userId: uid and sessionId: eb178038-e69c-46fb-8bb8-3e4920be4098 accessed,time 1768880642806 ms.
26/01/20 03:44:02 INFO SessionHolder: Session with userId: uid and sessionId: eb178038-e69c-46fb-8bb8-3e4920be4098 accessed,time 1768880642806 ms.
26/01/20 03:44:02 INFO SessionHolder: Session with userId: uid and sessionId: eb178038-e69c-46fb-8bb8-3e4920be4098 accessed,time 1768880642806 ms.
26/01/20 03:44:03 INFO StorageBasedLockProvider: Owner 8d5f9d59-b590-46bf-b0f1-a6f492e179f6: Lock file path abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/.locks/table_lock.json, Thread Thread[#1966,SparkConnectExecuteThread_opId=a36945a5-b174-400c-be28-17c37fac9988,5,main], Storage based lock state FAILED_TO_ACQUIRE, Lock already held by a47d30ef-f27f-4554-ab54-2ee872f5a806
26/01/20 03:44:03 INFO FSUtils: Removed directory at abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/.temp/20260120034324550
26/01/20 03:44:03 INFO HoodieHeartbeatClient: Stopping heartbeat for instant 20260120034324550
26/01/20 03:44:03 INFO HoodieHeartbeatClient: Stopped heartbeat for instant 20260120034324550
26/01/20 03:44:03 INFO HeartbeatUtils: Deleted the heartbeat for instant 20260120034324550
26/01/20 03:44:03 INFO HoodieHeartbeatClient: Deleted heartbeat file for instant 20260120034324550
26/01/20 03:44:03 INFO BaseHoodieWriteClient: Committed 20260120034324550
26/01/20 03:44:03 INFO TransactionManager: State change ending for action instant Option{val=[==>20260120034324550__deltacommit__INFLIGHT]}
26/01/20 03:44:03 INFO StorageBasedLockProvider: Owner a47d30ef-f27f-4554-ab54-2ee872f5a806: Lock file path abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/.locks/table_lock.json, Thread Thread[#1965,SparkConnectExecuteThread_opId=2f499ea1-de10-4b91-a08f-fdd22ba1a72e,5,main], Storage based lock state RELEASED
26/01/20 03:44:03 INFO LockManager: Released connection created for acquiring lock
26/01/20 03:44:03 INFO TransactionManager: State change ended for action instant Option{val=[==>20260120034324550__deltacommit__INFLIGHT]}
26/01/20 03:44:03 INFO MapPartitionsRDD: Removing RDD 572 from persistence list
26/01/20 03:44:03 INFO BlockManager: Removing RDD 572
26/01/20 03:44:03 INFO MapPartitionsRDD: Removing RDD 591 from persistence list
26/01/20 03:44:03 INFO BlockManager: Removing RDD 591
26/01/20 03:44:03 INFO BaseHoodieWriteClient: Start to archive synchronously.
26/01/20 03:44:03 INFO TransactionManager: State change starting for Optional.empty with latest completed action instant Optional.empty
26/01/20 03:44:03 INFO LockManager: LockProvider org.apache.hudi.client.transaction.lock.StorageBasedLockProvider
26/01/20 03:44:03 INFO StorageBasedLockProvider: Instantiated new storage-based lock provider, owner: 21830948-aa12-497a-ad44-96490bd668df, lockfilePath: abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/.locks/table_lock.json
26/01/20 03:44:04 INFO StorageBasedLockProvider: Owner 21830948-aa12-497a-ad44-96490bd668df: Lock file path abfss://<container>.dfs.core.windows.net/lakehousetables/hudi_nbcc_test_v4/.hoodie/.locks/table_lock.json, Thread Thread[#1965,SparkConnectExecuteThread_opId=2f499ea1-de10-4b91-a08f-fdd22ba1a72e,5,main], Storage based lock state ACQUIRED

@vinothchandar
Member

@chrevanthreddy can we fix the PR description? It'll fail validation.

I approved the workflows to run now

@chrevanthreddy chrevanthreddy changed the title Add hudi azure based storage lock feat: Add hudi azure based storage lock Jan 26, 2026
@chrevanthreddy
Contributor Author

Changed title, added description here too.

@apache apache deleted a comment from hudi-bot Feb 10, 2026
@chrevanthreddy
Contributor Author

@vinothchandar @bhasudha I merged this PR and Alex's PR together. Testing looks good to me. Let me know the next steps.

@vinothchandar
Member

@chrevanthreddy I will queue this up for review.

Contributor

@yihua yihua left a comment


Thanks for contributing! I left a few inline comments around correctness and naming.

Comment on lines +304 to +308
public Option<String> readObject(String filePath, boolean checkExistsFirst) {
try {
AzureLocation location = parseAzureLocation(filePath);
AzureLocation lockLocation = parseAzureLocation(lockFileUri);
BlobServiceClient svc = location.blobEndpoint.equals(lockLocation.blobEndpoint)
Contributor


Both readObject and writeObject call parseAzureLocation and potentially createDefaultBlobServiceClient() on every invocation. If these are called frequently (e.g., audit logging), this creates a new BlobServiceClient each time for non-matching endpoints. Have you considered caching the secondary service client?
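One way to address this is to resolve the lock file endpoint once up front (which also avoids reparsing lockFileUri) and cache any other client per endpoint, which is what the final changelog describes with a ConcurrentHashMap. A hypothetical sketch, with a generic C standing in for BlobServiceClient so it runs without the Azure SDK:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch of per-endpoint client caching; C stands in for BlobServiceClient.
 * The primary endpoint is resolved once in the constructor, so each call only compares strings.
 */
final class EndpointClientCache<C> {
  private final String primaryEndpoint;
  private final C primaryClient;
  private final Function<String, C> clientFactory;
  private final ConcurrentHashMap<String, C> secondaryClients = new ConcurrentHashMap<>();

  EndpointClientCache(String primaryEndpoint, C primaryClient, Function<String, C> clientFactory) {
    this.primaryEndpoint = primaryEndpoint;
    this.primaryClient = primaryClient;
    this.clientFactory = clientFactory;
  }

  C clientFor(String endpoint) {
    if (primaryEndpoint.equals(endpoint)) {
      return primaryClient; // lock file endpoint: reuse the client built up front
    }
    // ConcurrentHashMap.computeIfAbsent runs the factory at most once per endpoint,
    // so repeated audit reads/writes share one client instead of building one per call.
    return secondaryClients.computeIfAbsent(endpoint, clientFactory);
  }
}
```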

public Option<String> readObject(String filePath, boolean checkExistsFirst) {
try {
AzureLocation location = parseAzureLocation(filePath);
AzureLocation lockLocation = parseAzureLocation(lockFileUri);
Contributor


The lock file URI has already been parsed in the constructor. Could we reuse that instead of reparsing?

Comment on lines +219 to +220
logger.error("OwnerId: {}, Unexpected error while writing lock file: {}", ownerId, lockFileUri, e);
return Pair.of(LockUpsertResult.UNKNOWN_ERROR, Option.empty());
Contributor


For other exceptions, as in the S3- and GCS-based implementations, the exception should be thrown rather than returning UNKNOWN_ERROR.

public boolean writeObject(String filePath, String content) {
try {
AzureLocation location = parseAzureLocation(filePath);
AzureLocation lockLocation = parseAzureLocation(lockFileUri);
Contributor


Similarly, could we avoid this?

Comment thread hudi-azure/pom.xml
Comment thread packaging/hudi-azure-bundle/pom.xml
@codecov-commenter

codecov-commenter commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 53.01205% with 117 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.84%. Comparing base (ec04479) to head (d2d7cb2).
⚠️ Report is 1138 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...azure/transaction/lock/AzureStorageLockClient.java | 67.72% | 49 Missing and 12 partials ⚠️ |
| ...org/apache/hudi/config/AzureStorageLockConfig.java | 0.00% | 37 Missing ⚠️ |
| ...hudi/azure/credentials/AzureCredentialFactory.java | 0.00% | 19 Missing ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##             master   #17951      +/-   ##
============================================
+ Coverage     61.43%   68.84%   +7.41%
- Complexity    23082    28333    +5251
============================================
  Files          2108     2467     +359
  Lines        127636   135839    +8203
  Branches      14534    16480    +1946
============================================
+ Hits          78409    93521   +15112
+ Misses        42873    34914    -7959
- Partials       6354     7404    +1050
```
| Flag | Coverage Δ |
|---|---|
| common-and-other-modules | 44.66% <53.01%> (?) |
| hadoop-mr-java-client | 44.77% <100.00%> (?) |
| spark-client-hadoop-common | 48.41% <100.00%> (?) |
| spark-java-tests | 48.92% <100.00%> (?) |
| spark-scala-tests | 45.44% <100.00%> (?) |
| utilities | 38.18% <100.00%> (?) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...n/java/org/apache/hudi/storage/StorageSchemes.java | 85.71% <100.00%> (ø) |
| ...hudi/azure/credentials/AzureCredentialFactory.java | 0.00% <0.00%> (ø) |
| ...org/apache/hudi/config/AzureStorageLockConfig.java | 0.00% <0.00%> (ø) |
| ...azure/transaction/lock/AzureStorageLockClient.java | 67.72% <67.72%> (ø) |

... and 4528 files with indirect coverage changes


Member

@vinothchandar vinothchandar left a comment


A few questions on the code and security.

I am relying on @yihua to review the logic. Ethan, please let me know if you'd like me to take over.

Comment on lines +154 to +172
private static Functions.Function1<AzureLocation, BlobServiceClient> createDefaultBlobServiceClient() {
return (location) -> {
Properties props = location.props;
BlobServiceClientBuilder builder = new BlobServiceClientBuilder();
configureAzureClientOptions(builder, props);

String connectionString = props == null ? null : props.getProperty(AZURE_CONNECTION_STRING);
if (connectionString != null && !connectionString.trim().isEmpty()) {
return builder.connectionString(connectionString).buildClient();
}

builder.endpoint(location.blobEndpoint);
String sasToken = props == null ? null : props.getProperty(AZURE_SAS_TOKEN);
if (sasToken != null && !sasToken.trim().isEmpty()) {
String cleaned = sasToken.startsWith("?") ? sasToken.substring(1) : sasToken;
return builder.credential(new AzureSasCredential(cleaned)).buildClient();
}

return builder.credential(new DefaultAzureCredentialBuilder().build()).buildClient();
Member


Should we use a different credential builder here for typical Azure production setups? (I'm not very familiar with them myself.)

Asking since, IIRC, HoodieAWSConfig supports role ARN, access key, secret key, session token, etc.

Comment thread hudi-azure/pom.xml
Comment thread packaging/hudi-azure-bundle/pom.xml
Comment thread pom.xml Outdated
.key(AZURE_BASED_LOCK_PROPERTY_PREFIX + "sas.token")
.noDefaultValue()
.markAdvanced()
.withDocumentation("For Azure based lock provider, optional SAS token used for "
Member


nit: document the minimum permissions required on the token. Read, Write, Create?

Comment thread pom.xml Outdated
@yihua
Contributor

yihua commented Mar 26, 2026

Hi @chrevanthreddy any update on addressing the comments in this PR?

@chrevanthreddy chrevanthreddy requested a review from yihua April 15, 2026 14:19
Contributor

@yihua yihua left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for addressing the feedback! All three prior issues from my review are resolved: null ETag in readCurrentLockFile is now validated via the new canonicalizeEtag() helper, the TOCTOU-prone ADLSStorageLockClient is deleted, and DefaultAzureCredential uses lazy initialization. The ETag canonicalization logic correctly handles null, empty, quoted, unquoted, and malformed values — nice touch adding the quote normalization. The switch to BlobUrlParts.parse() for HTTP(S) URIs and getStringWithAltKeys for config lookups are clean improvements. One note: @vinothchandar's question about the parquet version downgrade in pom.xml is still open — please address that in the thread.

Copy link
Copy Markdown
Contributor

@yihua yihua left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

CodeRabbit Walkthrough: This pull request introduces Azure Blob Storage and ADLS support for Hudi's distributed locking mechanism. It adds a new hudi-azure module with credential management, lock file operations via conditional writes, configuration properties, comprehensive tests, and a shaded bundle for packaging. Azure storage schemes are registered with the new lock implementation.

Greptile Summary: This PR adds Azure Blob Storage / ADLS Gen2 based distributed locking for Hudi tables, mirroring the approach taken for S3 and GCS. It introduces AzureStorageLockClient (using Azure Blob conditional writes with ETag-based If-Match / If-None-Match), AzureStorageLockConfig for auth/connection configuration, AzureCredentialFactory for credential resolution, and wires the abfs, abfss, wasb, wasbs URI schemes into StorageSchemes. The implementation is well-structured with comprehensive unit tests and an optional Azurite integration test.

Key changes:

  • New AzureStorageLockClient implementing StorageLockClient with conditional-write semantics (412 → ACQUIRED_BY_OTHERS, 409/429/5xx → UNKNOWN_ERROR)
  • Supports five auth modes: connection string, SAS token, managed identity, service principal, DefaultAzureCredential
  • URI parsing for abfs[s]://, wasb[s]://, and http[s]:// (Azurite) schemes
  • StorageSchemes enum updated to register Azure lock client class for wasb, wasbs, abfs, abfss
  • One logic concern: readCurrentLockFile can throw HoodieLockException for null/malformed ETags, but StorageBasedLockProvider doesn't catch that exception around its call — callers should be aware
  • One unused public constant: AZURE_SAS_TOKEN in AzureStorageLockClient duplicates AzureStorageLockConfig.AZURE_SAS_TOKEN.key() and is not used internally
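The status-code mapping listed above (and in the PR's unit test description) fits in a small function. This is a hypothetical sketch: LockUpsertResult here is a local stand-in enum, not Hudi's type, the caller is assumed to have already extracted the HTTP status from the SDK exception, and the rethrow branch reflects the review request that unexpected errors propagate as in the S3 and GCS clients.

```java
/**
 * Hypothetical sketch of the HTTP status mapping for conditional lock writes.
 * LockUpsertResult is a local stand-in enum, not Hudi's type.
 */
final class LockStatusMapper {
  enum LockUpsertResult { SUCCESS, ACQUIRED_BY_OTHERS, UNKNOWN_ERROR }

  static LockUpsertResult mapWriteFailure(int statusCode, RuntimeException cause) {
    if (statusCode == 412) {
      // Precondition failed: another writer created or modified the lock blob first.
      return LockUpsertResult.ACQUIRED_BY_OTHERS;
    }
    if (statusCode == 409 || statusCode == 429 || (statusCode >= 500 && statusCode <= 599)) {
      // Conflict, throttling, or server error: report it and let StorageBasedLockProvider retry.
      return LockUpsertResult.UNKNOWN_ERROR;
    }
    // Anything else (for example 400) points to a bug or misconfiguration, so surface it.
    throw cause;
  }
}
```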

Greptile Confidence Score: 4/5
Safe to merge with one targeted fix recommended: readCurrentLockFile should handle HoodieLockException consistently

The implementation is solid and well-tested. The main concern is the exception-handling inconsistency in readCurrentLockFile — throwing HoodieLockException for null/malformed ETags instead of returning UNKNOWN_ERROR (as tryUpsertLockFile does), while the StorageBasedLockProvider caller has no catch around that call. In normal Azure operation this path is never hit, so it does not affect the happy path. The fix is straightforward and low-risk. The rest of the code (auth factory, URI parsing, conditional writes, caching) is correct and comprehensively tested.
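One possible shape of the fix suggested here, sketched with hypothetical stand-in types (LockGetResult, ReadOutcome) rather than Hudi's classes: the ETag validation error is caught on the read path and reported as UNKNOWN_ERROR, mirroring how tryUpsertLockFile reports unexpected failures, so StorageBasedLockProvider never sees an uncaught exception from this call.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch: a null or malformed ETag on the read path becomes UNKNOWN_ERROR
 * instead of an exception. LockGetResult and ReadOutcome are local stand-ins.
 */
final class LockReadSketch {
  enum LockGetResult { SUCCESS, NOT_EXISTS, UNKNOWN_ERROR }

  static final class ReadOutcome {
    final LockGetResult result;
    final Optional<String> etag;

    ReadOutcome(LockGetResult result, Optional<String> etag) {
      this.result = result;
      this.etag = etag;
    }
  }

  /** The canonicalizer is expected to throw IllegalArgumentException for null or malformed ETags. */
  static ReadOutcome fromResponseEtag(String rawEtag, UnaryOperator<String> canonicalizer) {
    try {
      return new ReadOutcome(LockGetResult.SUCCESS, Optional.of(canonicalizer.apply(rawEtag)));
    } catch (IllegalArgumentException e) {
      // Report like other unexpected read failures so the provider's existing handling applies.
      return new ReadOutcome(LockGetResult.UNKNOWN_ERROR, Optional.empty());
    }
  }
}
```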

hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java — specifically the readCurrentLockFile exception handling and the redundant AZURE_SAS_TOKEN constant

Sequence Diagram (CodeRabbit):

sequenceDiagram
    participant Client
    participant AzureStorageLockClient
    participant CredentialFactory
    participant BlobServiceClient
    participant AzureStorage["Azure Blob Storage"]

    Client->>AzureStorageLockClient: tryUpsertLockFile(newLockData, previousLockFile)
    AzureStorageLockClient->>CredentialFactory: getAzureCredential(props)
    CredentialFactory->>CredentialFactory: resolve identity or service principal
    CredentialFactory-->>AzureStorageLockClient: TokenCredential
    AzureStorageLockClient->>BlobServiceClient: uploadBlockBlob(content, condition)
    alt No prior lock
        BlobServiceClient->>AzureStorage: PUT with If-None-Match: *
    else Prior lock exists
        BlobServiceClient->>AzureStorage: PUT with If-Match: <ETag>
    end
    alt Success
        AzureStorage-->>BlobServiceClient: 201/200 + ETag
        BlobServiceClient-->>AzureStorageLockClient: BlockBlobItem
        AzureStorageLockClient-->>Client: (ACQUIRED, StorageLockFile)
    else Precondition failed
        AzureStorage-->>BlobServiceClient: 412
        BlobServiceClient-->>AzureStorageLockClient: BlobStorageException
        AzureStorageLockClient-->>Client: (ACQUIRED_BY_OTHERS, None)
    end

Sequence Diagram (CodeRabbit):

sequenceDiagram
    participant Client
    participant AzureStorageLockClient
    participant BlobServiceClient
    participant AzureStorage["Azure Blob Storage"]

    Client->>AzureStorageLockClient: readCurrentLockFile()
    AzureStorageLockClient->>BlobServiceClient: downloadStream(blobPath)
    BlobServiceClient->>AzureStorage: GET blob
    alt Blob exists
        AzureStorage-->>BlobServiceClient: 200 + stream + ETag header
        BlobServiceClient-->>AzureStorageLockClient: stream + headers
        AzureStorageLockClient->>AzureStorageLockClient: normalize ETag
        AzureStorageLockClient-->>Client: (SUCCESS, StorageLockFile)
    else Blob not found
        AzureStorage-->>BlobServiceClient: 404
        BlobServiceClient-->>AzureStorageLockClient: BlobStorageException
        AzureStorageLockClient-->>Client: (NOT_EXISTS, None)
    end

Sequence Diagram (Greptile):

sequenceDiagram
    participant P as StorageBasedLockProvider
    participant C as AzureStorageLockClient
    participant B as BlobClient (Azure SDK)
    participant Z as Azure Blob Storage

    Note over P,Z: Lock Acquisition (tryUpsertLockFile)
    P->>C: tryUpsertLockFile(newLockData, previousLockFile)
    C->>B: uploadWithResponse(BlobParallelUploadOptions, ...)
    Note over C,B: Conditional headers:<br/>If-None-Match:* (create)<br/>If-Match: etag (update)
    B->>Z: PUT blob with conditional header
    alt 200 OK
        Z-->>B: ETag in response
        B-->>C: Response BlockBlobItem
        C->>C: canonicalizeEtag(etag)
        C-->>P: (SUCCESS, StorageLockFile)
    else 412 Precondition Failed
        Z-->>B: 412
        B-->>C: BlobStorageException(412)
        C-->>P: (ACQUIRED_BY_OTHERS, empty)
    else 409/429/5xx
        Z-->>B: error
        B-->>C: BlobStorageException
        C-->>P: (UNKNOWN_ERROR, empty)
    end

    Note over P,Z: Lock Read (readCurrentLockFile)
    P->>C: readCurrentLockFile()
    C->>B: downloadContentWithResponse(...)
    B->>Z: GET blob
    alt 200 OK
        Z-->>B: blob content + ETag header
        B-->>C: Response BinaryData
        C->>C: canonicalizeEtag(etag)
        Note over C: Throws HoodieLockException<br/>if ETag null/malformed
        C-->>P: (SUCCESS, StorageLockFile)
    else 404 Not Found
        Z-->>B: 404
        B-->>C: BlobStorageException(404)
        C-->>P: (NOT_EXISTS, empty)
    else 429/5xx
        Z-->>B: error
        B-->>C: BlobStorageException
        C-->>P: (UNKNOWN_ERROR, empty)
    end

CodeRabbit: yihua#45 (review)
Greptile: yihua#45 (review)
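The If-None-Match / If-Match protocol in the sequence diagrams can be exercised without Azure using an in-memory model. This is a hypothetical illustration of the semantics, not an SDK type or part of the PR: an empty result marks a write rejected by its precondition (the 412 path), and treating If-Match against a missing blob as rejected is a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory model of conditional blob PUTs; not an Azure SDK type. */
final class ConditionalBlobStore {
  private static final class Blob {
    final String content;
    final String etag;

    Blob(String content, String etag) {
      this.content = content;
      this.etag = etag;
    }
  }

  private final Map<String, Blob> blobs = new HashMap<>();
  private long version;

  /** Lock creation (If-None-Match: *): succeeds only if no blob exists at the path. */
  synchronized Optional<String> putIfNoneMatchAny(String path, String content) {
    if (blobs.containsKey(path)) {
      return Optional.empty();
    }
    return Optional.of(store(path, content));
  }

  /** Renewal or expiry (If-Match: etag): succeeds only if nobody wrote since that ETag. */
  synchronized Optional<String> putIfMatch(String path, String content, String expectedEtag) {
    Blob current = blobs.get(path);
    if (current == null || !current.etag.equals(expectedEtag)) {
      return Optional.empty();
    }
    return Optional.of(store(path, content));
  }

  synchronized Optional<String> read(String path) {
    Blob blob = blobs.get(path);
    return blob == null ? Optional.empty() : Optional.of(blob.content);
  }

  // Each successful write gets a fresh ETag in the double-quoted form Azure uses.
  private String store(String path, String content) {
    version++;
    String etag = "\"0x" + Long.toHexString(version) + "\"";
    blobs.put(path, new Blob(content, etag));
    return etag;
  }
}
```

Because every write changes the ETag, a writer holding a stale ETag can never overwrite a lock another writer has since renewed, which is the property the lock provider relies on.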

Comment thread hudi-azure/pom.xml
Contributor

@yihua yihua left a comment


LGTM! Minor nits are non-blocking.

Contributor

@yihua yihua left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Nice cleanup — the only changes since last review are import reorganization and removal of the unused AZURE_SAS_TOKEN public constant in AzureStorageLockClient. No functional changes. All prior findings from my earlier review (null ETag handling, removal of ADLSStorageLockClient, lazy DefaultAzureCredential init) remain resolved. Open items from other reviewers (Vinoth's questions about parquet version bump, DefaultAzureCredential vs. production credential setup, throwing on UNKNOWN case at line 322, documenting minimum SAS permissions) are still pending responses in the thread — worth addressing before merge, but no new issues introduced by this patch.

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua yihua changed the title feat: Add hudi azure based storage lock feat: Add Azure-based storage lock Apr 18, 2026
@yihua yihua merged commit eaaae8a into apache:master Apr 18, 2026
58 of 60 checks passed
@linliu-code linliu-code mentioned this pull request May 16, 2026


5 participants