Skip to content

[OSS PR #17951] feat: Add hudi azure based storage lock#45

Open
yihua wants to merge 11 commits into
masterfrom
oss-17951
Open

[OSS PR #17951] feat: Add hudi azure based storage lock#45
yihua wants to merge 11 commits into
masterfrom
oss-17951

Conversation

@yihua
Copy link
Copy Markdown
Owner

@yihua yihua commented Apr 15, 2026

Mirror of apache#17951 for automated bot review.

Original author: @chrevanthreddy
Base branch: master

Summary by CodeRabbit

  • New Features

    • Added support for Azure Storage Blob (ABFS/WASB) for distributed writer locking
    • Azure credential management supporting managed identity and service principal authentication
    • New configuration properties for Azure storage connection and authentication
    • Azure storage schemes now report storage-lock support so Azure URIs are recognized by lock APIs
  • Packaging

    • Added an Azure-focused bundle artifact for simplified Azure deployments
  • Tests

    • Added unit and integration tests validating Azure lock behavior and URI handling

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c0bd404-57b0-4cd7-bcc2-4ab732814a6a

📥 Commits

Reviewing files that changed from the base of the PR and between 2ef504f and 2992185.

📒 Files selected for processing (1)
  • hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java

📝 Walkthrough

Walkthrough

Adds a new hudi-azure module that implements Azure-backed storage locks, credential resolution, config properties, tests, and packaging; registers Azure schemes as supporting storage locks in the core storage schemes enum.

Changes

Cohort / File(s) Summary
Module & Packaging
hudi-azure/pom.xml, packaging/hudi-azure-bundle/pom.xml, pom.xml
New Maven module and bundle descriptors for Azure support and bundling; integration-test profile and shading/packaging configuration.
Credentials
hudi-azure/src/main/java/org/apache/hudi/azure/credentials/AzureCredentialFactory.java
Factory resolving Azure TokenCredential from managed identity, service principal, or DefaultAzureCredential (lazy holder).
Azure Lock Client
hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java
New StorageLockClient implementation using conditional blob uploads/downloads, ETag normalization, error-to-result mapping, URI parsing, BlobServiceClient creation/caching, and read/write helpers.
Config
hudi-azure/src/main/java/org/apache/hudi/config/AzureStorageLockConfig.java
New ConfigProperty entries for Azure auth: connection string, SAS token, managed identity client id, and service-principal tenant/id/secret.
Unit Tests
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClient.java, .../TestAzureStorageLockClientUriParsing.java
Unit tests for conditional write semantics, ETag handling, error mapping, URI parsing, and client behaviors.
Integration & Provider Tests
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/ITAzureStorageLockClientAzurite.java, .../TestAzureStorageBasedLockProvider.java
Azurite/Testcontainers integration tests and a disabled provider-level test scaffolding.
Storage Schemes
hudi-io/src/main/java/org/apache/hudi/storage/StorageSchemes.java
Azure scheme enum entries (WASB/WASBS/ABFS/ABFSS) now reference AzureStorageLockClient as the lock implementation.
Test Expectation Update
hudi-hadoop-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
Adjusted tests to expect Azure schemes to report lock implementation presence.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant AzureStorageLockClient
    participant AzureCredentialFactory
    participant BlobServiceClient
    participant AzureBlob as "Azure Blob Storage"

    Client->>AzureStorageLockClient: tryUpsertLockFile(newLockData, previousLockFile)
    AzureStorageLockClient->>AzureCredentialFactory: getAzureCredential(props)
    AzureCredentialFactory-->>AzureStorageLockClient: TokenCredential
    AzureStorageLockClient->>BlobServiceClient: uploadBlockBlob(content, condition)
    alt No prior lock
        BlobServiceClient->>AzureBlob: PUT (If-None-Match: *)
    else Prior lock
        BlobServiceClient->>AzureBlob: PUT (If-Match: <ETag>)
    end
    alt Success
        AzureBlob-->>BlobServiceClient: 200/201 + ETag
        BlobServiceClient-->>AzureStorageLockClient: response (ETag)
        AzureStorageLockClient-->>Client: (ACQUIRED, StorageLockFile)
    else Precondition failed (412)
        AzureBlob-->>BlobServiceClient: 412
        BlobServiceClient-->>AzureStorageLockClient: BlobStorageException
        AzureStorageLockClient-->>Client: (ACQUIRED_BY_OTHERS, None)
    end
Loading
sequenceDiagram
    participant Client
    participant AzureStorageLockClient
    participant BlobServiceClient
    participant AzureBlob as "Azure Blob Storage"

    Client->>AzureStorageLockClient: readCurrentLockFile()
    AzureStorageLockClient->>BlobServiceClient: downloadStream(blobPath)
    BlobServiceClient->>AzureBlob: GET blob
    alt Blob exists
        AzureBlob-->>BlobServiceClient: 200 + stream + ETag
        BlobServiceClient-->>AzureStorageLockClient: stream + headers
        AzureStorageLockClient->>AzureStorageLockClient: normalize ETag + parse content
        AzureStorageLockClient-->>Client: (SUCCESS, StorageLockFile)
    else Blob not found (404)
        AzureBlob-->>BlobServiceClient: 404
        BlobServiceClient-->>AzureStorageLockClient: BlobStorageException
        AzureStorageLockClient-->>Client: (NOT_EXISTS, None)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Poem

🐰 Azure blobs hum soft and bright,
I hop and stitch each lock by night,
ETags snug in quoted strings,
Credentials bloom on gentle wings,
Hudi sleeps while races cease.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main objective of the changeset: adding Azure-based storage lock support to Hudi.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch oss-17951

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

This PR adds Azure Blob Storage / ADLS Gen2 based distributed locking for Hudi tables, mirroring the approach taken for S3 and GCS. It introduces AzureStorageLockClient (using Azure Blob conditional writes with ETag-based If-Match / If-None-Match), AzureStorageLockConfig for auth/connection configuration, AzureCredentialFactory for credential resolution, and wires the abfs, abfss, wasb, wasbs URI schemes into StorageSchemes. The implementation is well-structured with comprehensive unit tests and an optional Azurite integration test.

Key changes:

  • New AzureStorageLockClient implementing StorageLockClient with conditional-write semantics (412 → ACQUIRED_BY_OTHERS, 409/429/5xx → UNKNOWN_ERROR)
  • Supports five auth modes: connection string, SAS token, managed identity, service principal, DefaultAzureCredential
  • URI parsing for abfs[s]://, wasb[s]://, and https://http:// (Azurite) schemes
  • StorageSchemes enum updated to register Azure lock client class for wasb, wasbs, abfs, abfss
  • One logic concern: readCurrentLockFile can throw HoodieLockException for null/malformed ETags, but StorageBasedLockProvider doesn't catch that exception around its call — callers should be aware
  • One unused public constant: AZURE_SAS_TOKEN in AzureStorageLockClient duplicates AzureStorageLockConfig.AZURE_SAS_TOKEN.key() and is not used internally

Confidence Score: 4/5

Safe to merge with one targeted fix recommended: readCurrentLockFile should handle HoodieLockException consistently

The implementation is solid and well-tested. The main concern is the exception-handling inconsistency in readCurrentLockFile — throwing HoodieLockException for null/malformed ETags instead of returning UNKNOWN_ERROR (as tryUpsertLockFile does), while the StorageBasedLockProvider caller has no catch around that call. In normal Azure operation this path is never hit, so it does not affect the happy path. The fix is straightforward and low-risk. The rest of the code (auth factory, URI parsing, conditional writes, caching) is correct and comprehensively tested.

hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java — specifically the readCurrentLockFile exception handling and the redundant AZURE_SAS_TOKEN constant

Important Files Changed

Filename Overview
hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java Core lock client implementation — contains a logic concern where readCurrentLockFile() throws HoodieLockException for missing/malformed ETags rather than returning UNKNOWN_ERROR, inconsistent with tryUpsertLockFile; also an unused public AZURE_SAS_TOKEN constant
hudi-azure/src/main/java/org/apache/hudi/config/AzureStorageLockConfig.java Config class for Azure lock settings — clean, well-documented ConfigProperty definitions for all five auth modes
hudi-azure/src/main/java/org/apache/hudi/azure/credentials/AzureCredentialFactory.java Credential factory with lazy DefaultAzureCredential holder pattern — correctly resolves managed identity, service principal, and default credential chain
hudi-io/src/main/java/org/apache/hudi/storage/StorageSchemes.java Registers AzureStorageLockClient for wasb, wasbs, abfs, abfss URI schemes — straightforward and correct
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClient.java Comprehensive unit tests covering upsert success/failure paths, ETag canonicalization, and secondary client caching — good coverage
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/ITAzureStorageLockClientAzurite.java Integration test using Testcontainers + Azurite — correctly disabled by default and uses well-known public Azurite dev key
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClientUriParsing.java URI parsing tests covering all supported schemes including Azurite HTTP endpoint — well-structured and thorough
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageBasedLockProvider.java Disabled integration test stub extending StorageBasedLockProviderTestBase — correctly marked @disabled for environments without Azurite
hudi-hadoop-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java Updated StorageSchemes test to verify Azure schemes now have lock implementations registered

Sequence Diagram

sequenceDiagram
    participant P as StorageBasedLockProvider
    participant C as AzureStorageLockClient
    participant B as BlobClient (Azure SDK)
    participant Z as Azure Blob Storage

    Note over P,Z: Lock Acquisition (tryUpsertLockFile)
    P->>C: tryUpsertLockFile(newLockData, previousLockFile)
    C->>B: uploadWithResponse(BlobParallelUploadOptions, ...)
    Note over C,B: Conditional headers:<br/>If-None-Match:* (create)<br/>If-Match: etag (update)
    B->>Z: PUT blob with conditional header
    alt 200 OK
        Z-->>B: ETag in response
        B-->>C: Response BlockBlobItem
        C->>C: canonicalizeEtag(etag)
        C-->>P: (SUCCESS, StorageLockFile)
    else 412 Precondition Failed
        Z-->>B: 412
        B-->>C: BlobStorageException(412)
        C-->>P: (ACQUIRED_BY_OTHERS, empty)
    else 409/429/5xx
        Z-->>B: error
        B-->>C: BlobStorageException
        C-->>P: (UNKNOWN_ERROR, empty)
    end

    Note over P,Z: Lock Read (readCurrentLockFile)
    P->>C: readCurrentLockFile()
    C->>B: downloadContentWithResponse(...)
    B->>Z: GET blob
    alt 200 OK
        Z-->>B: blob content + ETag header
        B-->>C: Response BinaryData
        C->>C: canonicalizeEtag(etag)
        Note over C: Throws HoodieLockException<br/>if ETag null/malformed
        C-->>P: (SUCCESS, StorageLockFile)
    else 404 Not Found
        Z-->>B: 404
        B-->>C: BlobStorageException(404)
        C-->>P: (NOT_EXISTS, empty)
    else 429/5xx
        Z-->>B: error
        B-->>C: BlobStorageException
        C-->>P: (UNKNOWN_ERROR, empty)
    end
Loading

Reviews (1): Last reviewed commit: "Revert parquet version" | Re-trigger Greptile

Comment on lines +239 to +252
@Override
public Pair<LockGetResult, Option<StorageLockFile>> readCurrentLockFile() {
try {
Response<BinaryData> response = lockBlobClient.downloadContentWithResponse(null, null, null, Context.NONE);
//Check for null or empty ETag and inconsistent quotes
String eTag = canonicalizeEtag(
response.getHeaders() != null ? response.getHeaders().getValue("ETag") : null,
"download");
StorageLockFile lockFile = StorageLockFile.createFromStream(response.getValue().toStream(), eTag);
return Pair.of(LockGetResult.SUCCESS, Option.of(lockFile));
} catch (BlobStorageException e) {
return Pair.of(handleGetStorageException(e), Option.empty());
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 readCurrentLockFile can throw HoodieLockException past an unguarded caller

canonicalizeEtag (called at line 244) throws HoodieLockException when the ETag is null, empty, or malformed. Only BlobStorageException is caught here, so that unchecked exception escapes the method. StorageBasedLockProvider.readCurrentLockFile (line 326 in StorageBasedLockProvider.java) calls this with no surrounding try-catch, meaning a corrupt Azure HTTP response will propagate as an unhandled exception and surface as a lock-acquisition failure rather than being gracefully retried.

By contrast, tryUpsertLockFile catches every Exception and maps it to LockUpsertResult.UNKNOWN_ERROR. The same pattern applied here would keep behaviour consistent with the interface's return-value contract:

Suggested change
@Override
public Pair<LockGetResult, Option<StorageLockFile>> readCurrentLockFile() {
try {
Response<BinaryData> response = lockBlobClient.downloadContentWithResponse(null, null, null, Context.NONE);
//Check for null or empty ETag and inconsistent quotes
String eTag = canonicalizeEtag(
response.getHeaders() != null ? response.getHeaders().getValue("ETag") : null,
"download");
StorageLockFile lockFile = StorageLockFile.createFromStream(response.getValue().toStream(), eTag);
return Pair.of(LockGetResult.SUCCESS, Option.of(lockFile));
} catch (BlobStorageException e) {
return Pair.of(handleGetStorageException(e), Option.empty());
}
}
@Override
public Pair<LockGetResult, Option<StorageLockFile>> readCurrentLockFile() {
try {
Response<BinaryData> response = lockBlobClient.downloadContentWithResponse(null, null, null, Context.NONE);
//Check for null or empty ETag and inconsistent quotes
String eTag = canonicalizeEtag(
response.getHeaders() != null ? response.getHeaders().getValue("ETag") : null,
"download");
StorageLockFile lockFile = StorageLockFile.createFromStream(response.getValue().toStream(), eTag);
return Pair.of(LockGetResult.SUCCESS, Option.of(lockFile));
} catch (BlobStorageException e) {
return Pair.of(handleGetStorageException(e), Option.empty());
} catch (HoodieLockException e) {
logger.error("OwnerId: {}, Malformed or missing ETag in Azure response for lock file: {}", ownerId, lockFileUri, e);
throw e;
} catch (Exception e) {
logger.error("OwnerId: {}, Unexpected error while reading lock file: {}", ownerId, lockFileUri, e);
return Pair.of(LockGetResult.UNKNOWN_ERROR, Option.empty());
}
}

At minimum, the explicit re-throw of HoodieLockException (with logging) makes the failure mode obvious at the call site and clearly documents the intentional "fail fast on protocol violation" behaviour.

private static final int RATE_LIMIT_ERROR_CODE = 429;
private static final int INTERNAL_SERVER_ERROR_CODE_MIN = 500;

public static final String AZURE_SAS_TOKEN = AzureStorageLockConfig.AZURE_SAS_TOKEN.key();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant public constant pollutes API

AZURE_SAS_TOKEN duplicates AzureStorageLockConfig.AZURE_SAS_TOKEN.key() and is never used internally — the class references AzureStorageLockConfig.AZURE_SAS_TOKEN directly at line 177. Exposing it as a public static final string in this class creates a secondary API surface that callers might depend on, making future refactoring harder without a clear benefit.

Remove the constant entirely; anyone needing the key should use AzureStorageLockConfig.AZURE_SAS_TOKEN.key() directly.

Comment on lines +371 to +378
private BlobServiceClient getBlobServiceClient(AzureLocation location) {
if (location.blobEndpoint.equals(lockBlobEndpoint)) {
return blobServiceClient;
}
return secondaryBlobServiceClients.computeIfAbsent(
location.blobEndpoint,
endpoint -> blobServiceClientSupplier.apply(location.withProperties(clientProperties)));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Lambda parameter endpoint is shadowed and unused

In the computeIfAbsent lambda, the parameter endpoint (the cache key, equal to location.blobEndpoint) is never referenced — the outer location variable is captured instead. The captured location happens to carry the same endpoint, so correctness is preserved, but the unused parameter is misleading.

Suggested change
private BlobServiceClient getBlobServiceClient(AzureLocation location) {
if (location.blobEndpoint.equals(lockBlobEndpoint)) {
return blobServiceClient;
}
return secondaryBlobServiceClients.computeIfAbsent(
location.blobEndpoint,
endpoint -> blobServiceClientSupplier.apply(location.withProperties(clientProperties)));
}
private BlobServiceClient getBlobServiceClient(AzureLocation location) {
if (location.blobEndpoint.equals(lockBlobEndpoint)) {
return blobServiceClient;
}
AzureLocation locationWithProps = location.withProperties(clientProperties);
return secondaryBlobServiceClients.computeIfAbsent(
location.blobEndpoint,
endpoint -> blobServiceClientSupplier.apply(locationWithProps));
}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClient.java (2)

264-276: Remove unused variables in test.

data and json are declared but never used in this test since the exception is thrown during ETag validation before the response body is processed.

♻️ Proposed cleanup
   `@Test`
   void testReadCurrentLockFile_missingEtag_throwsHoodieLockException() {
-    StorageLockData data = new StorageLockData(false, 1700000000000L, "testOwner");
-    byte[] json = StorageLockFile.toByteArray(data);
     BlobDownloadContentResponse response = mock(BlobDownloadContentResponse.class);
     when(response.getHeaders()).thenReturn(null);
     when(mockBlobClient.downloadContentWithResponse(isNull(), isNull(), isNull(), eq(Context.NONE))).thenReturn(response);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClient.java`
around lines 264 - 276, In
testReadCurrentLockFile_missingEtag_throwsHoodieLockException (class
TestAzureStorageLockClient) remove the unused local variables StorageLockData
data and byte[] json since the test triggers the exception before the response
body is used; keep the mocked BlobDownloadContentResponse and existing when(...)
stubbing and assertions unchanged so the test still validates the missing ETag
behavior.

278-290: Remove unused variables in test.

Same as above - data and json are unused.

♻️ Proposed cleanup
   `@Test`
   void testReadCurrentLockFile_emptyEtag_throwsHoodieLockException() {
-    StorageLockData data = new StorageLockData(false, 1700000000000L, "testOwner");
-    byte[] json = StorageLockFile.toByteArray(data);
     BlobDownloadContentResponse response = mock(BlobDownloadContentResponse.class);
     when(response.getHeaders()).thenReturn(new HttpHeaders().set("ETag", ""));
     when(mockBlobClient.downloadContentWithResponse(isNull(), isNull(), isNull(), eq(Context.NONE))).thenReturn(response);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClient.java`
around lines 278 - 290, In the test method
testReadCurrentLockFile_emptyEtag_throwsHoodieLockException remove the unused
local variables StorageLockData data and byte[] json (they are created but never
used) so the test only sets up the mocked BlobDownloadContentResponse and
asserts the HoodieLockException; delete the declarations of data and json to
clean up the test and keep the rest of the setup (mocking response.getHeaders
and mockBlobClient.downloadContentWithResponse) and assertions unchanged.
hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClientUriParsing.java (1)

25-91: Good coverage of URI parsing happy paths.

The tests thoroughly cover all supported schemes. Consider adding negative test cases for error handling (e.g., missing container, missing path, unsupported scheme) to ensure parseAzureLocation throws appropriate IllegalArgumentExceptions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClientUriParsing.java`
around lines 25 - 91, Add negative unit tests in
TestAzureStorageLockClientUriParsing that call
AzureStorageLockClient.parseAzureLocation and assert it throws
IllegalArgumentException for invalid URIs: (1) missing container (e.g.,
"https://account.blob.core.windows.net//path"), (2) missing blob path (e.g.,
"abfs://container@account.dfs.core.windows.net/"), and (3) unsupported scheme
(e.g., "ftp://account/container/path"); use assertThrows to verify the exception
and give each test a descriptive name like testParseMissingContainerThrows,
testParseMissingPathThrows, testParseUnsupportedSchemeThrows so failures point
to the specific invalid-case handling in parseAzureLocation.
hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java (1)

213-237: Consider documenting the asymmetric exception handling for create vs update.

On HttpResponseException (non-BlobStorage), creates fail fast by re-throwing while updates return UNKNOWN_ERROR. This is intentional per the comment, but adding a brief inline explanation of why this distinction matters would help future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java`
around lines 213 - 237, Add a short inline comment inside tryUpsertLockFile
explaining the asymmetric handling of HttpResponseException: when
previousLockFile.isPresent() is false (create) we re-throw the
HttpResponseException to fail fast because a create failure likely indicates a
systemic or permission issue that should not be masked, whereas on updates we
return LockUpsertResult.UNKNOWN_ERROR to allow higher-level retry/coordination
logic to handle potential transient conflicts; place this comment immediately
above the if (!previousLockFile.isPresent()) branch so maintainers can quickly
see the rationale for the different behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hudi-azure/pom.xml`:
- Line 37: The pom's azure.jackson.version property is pinned to vulnerable
2.13.5; update the property azure.jackson.version to a secure release
(preferably 2.18.6 or later) so all Jackson artifacts that inherit this property
(the Jackson dependencies declared under the pom) are upgraded; if you cannot
move off 2.13.x for Azure SDK compatibility, set azure.jackson.version to the
NES patched coordinate 2.13.5-jackson-core-2.13.7 instead; ensure you run a full
build to confirm compatibility after changing the azure.jackson.version
property.

---

Nitpick comments:
In
`@hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java`:
- Around line 213-237: Add a short inline comment inside tryUpsertLockFile
explaining the asymmetric handling of HttpResponseException: when
previousLockFile.isPresent() is false (create) we re-throw the
HttpResponseException to fail fast because a create failure likely indicates a
systemic or permission issue that should not be masked, whereas on updates we
return LockUpsertResult.UNKNOWN_ERROR to allow higher-level retry/coordination
logic to handle potential transient conflicts; place this comment immediately
above the if (!previousLockFile.isPresent()) branch so maintainers can quickly
see the rationale for the different behavior.

In
`@hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClient.java`:
- Around line 264-276: In
testReadCurrentLockFile_missingEtag_throwsHoodieLockException (class
TestAzureStorageLockClient) remove the unused local variables StorageLockData
data and byte[] json since the test triggers the exception before the response
body is used; keep the mocked BlobDownloadContentResponse and existing when(...)
stubbing and assertions unchanged so the test still validates the missing ETag
behavior.
- Around line 278-290: In the test method
testReadCurrentLockFile_emptyEtag_throwsHoodieLockException remove the unused
local variables StorageLockData data and byte[] json (they are created but never
used) so the test only sets up the mocked BlobDownloadContentResponse and
asserts the HoodieLockException; delete the declarations of data and json to
clean up the test and keep the rest of the setup (mocking response.getHeaders
and mockBlobClient.downloadContentWithResponse) and assertions unchanged.

In
`@hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClientUriParsing.java`:
- Around line 25-91: Add negative unit tests in
TestAzureStorageLockClientUriParsing that call
AzureStorageLockClient.parseAzureLocation and assert it throws
IllegalArgumentException for invalid URIs: (1) missing container (e.g.,
"https://account.blob.core.windows.net//path"), (2) missing blob path (e.g.,
"abfs://container@account.dfs.core.windows.net/"), and (3) unsupported scheme
(e.g., "ftp://account/container/path"); use assertThrows to verify the exception
and give each test a descriptive name like testParseMissingContainerThrows,
testParseMissingPathThrows, testParseUnsupportedSchemeThrows so failures point
to the specific invalid-case handling in parseAzureLocation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b61e873-30be-4385-9d15-fb672d921c73

📥 Commits

Reviewing files that changed from the base of the PR and between 35e2bbf and 2ef504f.

📒 Files selected for processing (12)
  • hudi-azure/pom.xml
  • hudi-azure/src/main/java/org/apache/hudi/azure/credentials/AzureCredentialFactory.java
  • hudi-azure/src/main/java/org/apache/hudi/azure/transaction/lock/AzureStorageLockClient.java
  • hudi-azure/src/main/java/org/apache/hudi/config/AzureStorageLockConfig.java
  • hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/ITAzureStorageLockClientAzurite.java
  • hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageBasedLockProvider.java
  • hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClient.java
  • hudi-azure/src/test/java/org/apache/hudi/azure/transaction/lock/TestAzureStorageLockClientUriParsing.java
  • hudi-hadoop-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
  • hudi-io/src/main/java/org/apache/hudi/storage/StorageSchemes.java
  • packaging/hudi-azure-bundle/pom.xml
  • pom.xml

Comment thread hudi-azure/pom.xml
<azure.storage.blob.version>12.26.0</azure.storage.blob.version>
<azure.identity.version>1.12.2</azure.identity.version>
<!-- Keep Jackson consistent for Azure SDK integration tests (Hudi core pins older Jackson versions). -->
<azure.jackson.version>2.13.5</azure.jackson.version>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify vulnerabilities for jackson-core:2.13.5 from OSV
curl -s https://api.osv.dev/v1/query \
  -H "Content-Type: application/json" \
  -d '{"package":{"ecosystem":"Maven","name":"com.fasterxml.jackson.core:jackson-core"},"version":"2.13.5"}' \
  | jq '{vuln_ids: [.vulns[].id], summaries: [.vulns[].summary]}'

Repository: yihua/hudi

Length of output: 339


🏁 Script executed:

cat -n hudi-azure/pom.xml | head -150

Repository: yihua/hudi

Length of output: 6482


🌐 Web query:

Jackson 2.13 latest patched version CVE GHSA-72hv-8253-57qq

💡 Result:

The official FasterXML Jackson 2.13 branch latest version is 2.13.5 (released around 2023), with no further patches. The GHSA-72hv-8253-57qq vulnerability (Number Length Constraint Bypass in Async Parser, published Feb 26, 2026, affects jackson-core 2.0.0-2.18.5 and others, patched officially in 2.18.6+) is not patched in official 2.13.x releases, as the branch is closed. Third-party patches exist: - HeroDevs NES: jackson-core 2.13.5-jackson-core-2.13.7 (Mar 3, 2026), explicitly patches GHSA-72hv-8253-57qq, available via their Maven registry (https://registry.nes.herodevs.com/maven). - sassoftware/jackson: Custom branch 2.13.5-CVE-2025-52999-sonatype-2022-6438 remediates related issues but not confirmed for this specific GHSA. For official sources, upgrade to Jackson 2.18.6+ is recommended. HeroDevs provides the latest patched 2.13 variant if stuck on 2.13.

Citations:


Upgrade Jackson from 2.13.5 to address known vulnerabilities (GHSA-72hv-8253-57qq, GHSA-h46c-h94j-95f3).

azure.jackson.version is pinned to 2.13.5, which contains two high-severity DoS vulnerabilities. The Jackson 2.13.x branch is no longer maintained; upgrade to 2.18.6 or later if Azure SDK compatibility permits. If locked on 2.13.x, use HeroDevs NES patched version 2.13.5-jackson-core-2.13.7 (published Mar 3, 2026, which explicitly patches GHSA-72hv-8253-57qq).

Applies to all Jackson artifacts in lines 118–147 that inherit this version.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-azure/pom.xml` at line 37, The pom's azure.jackson.version property is
pinned to vulnerable 2.13.5; update the property azure.jackson.version to a
secure release (preferably 2.18.6 or later) so all Jackson artifacts that
inherit this property (the Jackson dependencies declared under the pom) are
upgraded; if you cannot move off 2.13.x for Azure SDK compatibility, set
azure.jackson.version to the NES patched coordinate 2.13.5-jackson-core-2.13.7
instead; ensure you run a full build to confirm compatibility after changing the
azure.jackson.version property.

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yihua has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants