@aryangupta1998
Contributor

What changes were proposed in this pull request?

DirectoryDeletingService should use RocksDB deleteRange instead of creating individual tombstones, which can cause seek-time issues. In the presence of snapshots, however, the deleteRange API should stitch together contiguous ranges of reclaimable keys rather than issue a blind deleteRange, which could reclaim entries incorrectly and leave unreferenced orphan blocks when the snapshots are deleted.

DeleteRange APIs on FileTable, DirectoryTable, and KeyTable can be used by background garbage collection services, but they should never be used by user-facing APIs such as keyDelete, as that can break snapshot correctness.

E.g.

Dir1/Key1(Reclaimable) Dir1/Key2(Reclaimable) Dir1/Key3(Not Reclaimable) Dir1/Key4(Reclaimable) Dir1/Key5(Reclaimable)

Then DirectoryDeletingService should issue two deleteRange calls:
[Dir1/Key1..Dir1/Key2] (both inclusive)
[Dir1/Key4..Dir1/Key5] (both inclusive)
In terms of RocksDB deleteRange, where the end key is exclusive, this is equivalent to
[Dir1/Key1..Dir1/Key3) and [Dir1/Key4..lexicographicalHigherString(Dir1/))
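
The stitching described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this patch: `RangeStitchSketch`, `KeyRange`, `stitch`, and `higherString` are hypothetical names, the sorted `TreeMap` stands in for a table iterator, and `higherString` is a simplified stand-in for the lexicographicalHigherString helper mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: groups contiguous reclaimable keys into half-open ranges.
final class RangeStitchSketch {

  /** Half-open key range [start, endExclusive), matching deleteRange semantics. */
  static final class KeyRange {
    final String start;
    final String endExclusive;

    KeyRange(String start, String endExclusive) {
      this.start = start;
      this.endExclusive = endExclusive;
    }

    @Override
    public String toString() {
      return "[" + start + ".." + endExclusive + ")";
    }
  }

  // Simplified stand-in (assumption): bumps the last character of the prefix so
  // the result sorts after every key that starts with that prefix.
  static String higherString(String prefix) {
    int last = prefix.length() - 1;
    return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
  }

  // keys maps each key (in sorted order) to true if it is reclaimable.
  static List<KeyRange> stitch(TreeMap<String, Boolean> keys, String prefix) {
    List<KeyRange> ranges = new ArrayList<>();
    String start = null;
    for (Map.Entry<String, Boolean> e : keys.entrySet()) {
      if (e.getValue()) {
        if (start == null) {
          start = e.getKey();          // open a new run of reclaimable keys
        }
      } else if (start != null) {
        // A non-reclaimable key closes the run; it becomes the exclusive end.
        ranges.add(new KeyRange(start, e.getKey()));
        start = null;
      }
    }
    if (start != null) {
      // The run reaches the end of the directory; close it past the prefix.
      ranges.add(new KeyRange(start, higherString(prefix)));
    }
    return ranges;
  }
}
```

For the five keys in the example, with only Dir1/Key3 marked as not reclaimable, `stitch(keys, "Dir1/")` yields two ranges: one ending (exclusively) at Dir1/Key3 and one ending at the string that sorts just past the Dir1/ prefix.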

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13311

How was this patch tested?

Tested via unit tests.

@swamirishi swamirishi self-requested a review December 3, 2025 17:58
Contributor

@swamirishi swamirishi left a comment

Thanks for the patch, @aryangupta1998. Please change this one thing; I am reviewing the other parts in the meantime.

@adoroszlai adoroszlai marked this pull request as draft December 3, 2025 19:19
@adoroszlai
Contributor

Please wait for clean CI run in fork before opening PR.

@aryangupta1998 aryangupta1998 marked this pull request as ready for review December 3, 2025 23:14
Contributor

@swamirishi swamirishi left a comment

@aryangupta1998 please add unit tests for KeyManagerImpl and DirectoryDeletingService changes

@swamirishi swamirishi marked this pull request as draft December 4, 2025 01:15
@swamirishi
Contributor

swamirishi commented Dec 4, 2025

@aryangupta1998 Don't make it ready for review until you get a +1 from reviewers

@smengcl smengcl self-requested a review December 4, 2025 16:29
@jojochuang jojochuang requested a review from smengcl December 4, 2025 16:29
@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Dec 4, 2025
@adoroszlai adoroszlai changed the title from "HDDS-13311. Directory Deleting Service can use deleteRange to delete subDirectories and subFiles" to "HDDS-13311. Directory Deleting Service can use deleteRange for subDirectories and subFiles" Dec 7, 2025
@jojochuang
Contributor

@rnblough @ptlrs

Contributor

@sumitagrawl sumitagrawl left a comment

@aryangupta1998 It does not seem feasible to use deleteRange: a new item can be added concurrently in the middle of a range, and the range delete would then remove that unexpected entry.
cc: @swamirishi
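
The concern raised here can be illustrated with an in-memory sorted map standing in for the table. This is a hedged sketch, not RocksDB or Ozone code: `BlindRangeDeleteSketch`, `deleteRange`, and `simulateRace` are hypothetical names, and the sketch does not model any locking that might prevent such a concurrent insert in practice.

```java
import java.util.TreeMap;

// Hypothetical sketch: a range delete removes keys inserted after the range was computed.
final class BlindRangeDeleteSketch {

  // Removes every key in [start, endExclusive), as a range tombstone would.
  static void deleteRange(TreeMap<String, String> table,
      String start, String endExclusive) {
    table.subMap(start, true, endExclusive, false).clear();
  }

  // Computes a range from a scan, inserts a new key inside it, then deletes the range.
  static TreeMap<String, String> simulateRace() {
    TreeMap<String, String> table = new TreeMap<>();
    table.put("Dir1/Key1", "reclaimable");
    table.put("Dir1/Key2", "reclaimable");
    table.put("Dir1/Key3", "live");

    // Range derived from the scan above: [Dir1/Key1, Dir1/Key3).
    String start = "Dir1/Key1";
    String endExclusive = "Dir1/Key3";

    // A writer adds a key that sorts inside the already computed range.
    table.put("Dir1/Key1a", "new live entry");

    // The range delete was never told about Dir1/Key1a, yet it removes it.
    deleteRange(table, start, endExclusive);
    return table;
  }
}
```

After `simulateRace()`, only Dir1/Key3 remains, whereas per-key deletes of Dir1/Key1 and Dir1/Key2 would have left Dir1/Key1a in place.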

@aryangupta1998 aryangupta1998 marked this pull request as ready for review December 9, 2025 15:37
Contributor

@swamirishi swamirishi left a comment

Please add unit tests with mocking layer

Contributor

@ashishkumar50 ashishkumar50 left a comment

@aryangupta1998 Thanks for working on this. Could you also add a metric that tracks deleteRange requests?

validateNonOverlappingRanges();
}

private void validateNonOverlappingRanges() {
Contributor

nit: Pass keyRanges as an argument; this might throw an NPE if someone later moves this call within the constructor.

Contributor

Do we actually need this validation? Keys need not always be sorted in the order we expect; the order depends on RocksDB's bytewise comparator, and RocksDB already guarantees that order for us.

Contributor Author

@ashishkumar50,
In Ozone, RocksDB uses the default bytewise comparator, which compares keys by their raw byte sequence, so string keys are stored and returned in lexicographically sorted order. That means the ranges we construct from the iterator are already ordered. Also, the deleteRangeWithBatch() API treats the end key as exclusive, so even in the corner case where endKey = getLexicographicallyHigherString(path) for one range and the next iterator entry yields startKey = getLexicographicallyHigherString(path), the ranges do not overlap ([start, end) then [end, …)), which is safe. Based on this, I removed validateNonOverlappingRanges(), as RocksDB’s ordering plus our range construction logic already guarantees non‑overlapping, sorted ranges.
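
The half-open argument above can be checked with a small sketch. The names `HalfOpenRangeSketch`, `overlaps`, and `contains` are hypothetical, and `String.compareTo` is used only as a stand-in for the bytewise comparator, which it matches for ASCII keys such as the ones in this example.

```java
// Hypothetical sketch: adjacent half-open ranges [a, b) and [b, c) do not overlap.
final class HalfOpenRangeSketch {

  // Two half-open ranges [s1, e1) and [s2, e2) overlap only if each one
  // starts strictly before the other one ends.
  static boolean overlaps(String s1, String e1, String s2, String e2) {
    return s1.compareTo(e2) < 0 && s2.compareTo(e1) < 0;
  }

  // A key belongs to [start, endExclusive) if start <= key < endExclusive.
  static boolean contains(String start, String endExclusive, String key) {
    return start.compareTo(key) <= 0 && key.compareTo(endExclusive) < 0;
  }
}
```

With this definition, a range that ends at some key and a following range that starts at the same key share no keys, because the boundary key belongs only to the second range.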

@adoroszlai adoroszlai marked this pull request as draft December 17, 2025 12:43
@swamirishi
Contributor

We need to wait for #9553, since the deleteRange patch has been reverted.
