Skip to content

Conversation

@sumitagrawl
Copy link
Contributor

@sumitagrawl sumitagrawl commented Mar 27, 2025

What changes were proposed in this pull request?

Granular locking framework for OBS for existing flow. Its just framework binding code flow but pending to call the lock in respective flow.

locking is added for external request at entry point. This provides execution of request at leader and existing flow simultaneouly without impacting for cache.

  • refer obs-locking.md for locking added for obs request (HDDS-11898. design doc leader side execution)
  • refer leader-execution.md for Step-by-step integration of existing request (interoperability)

Next PR will include:

Parent Jira:
https://issues.apache.org/jira/browse/HDDS-11900

Its Parent for Epic;
https://issues.apache.org/jira/browse/HDDS-11897

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12356

How was this patch tested?

  • UT cases added for lock

@sumitagrawl sumitagrawl added the om-pre-ratis-execution PRs related to https://issues.apache.org/jira/browse/HDDS-11897 label Mar 27, 2025
@sumitagrawl sumitagrawl marked this pull request as ready for review March 27, 2025 13:25
Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumitagrawl , thanks for the update! The code looks better. Please see the comments/questions inlined.

* Manage locking of volume, bucket, keys and others.
*/
public class OmLockOperation {
private static final long LOCK_TIMEOUT_DEFAULT = 10 * 60 * 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it needs a 10min timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea here is,

  • Hadoop RPC thread pool ozone.om.grpc.read.thread.num is configured default 10, so it may be blocked if do not timeout.
  • number of request waiting keeps increasing and hence memory

So IMO, a server should timeout request at appropriate time. May be lesser value as configurable or context of usages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OM currently does not have lock timeout. We do not see any problems. Let's don't fix this non-occurring problem here for simplicity reason. If we see the problem, we can add lock timeout. @kerneltime , what do you think?

Initially, I thought lock time would complicate the code a lot. Now, I find out that It is actually not very complicated. Thanks @errose28 for pointing it out.

Comment on lines +95 to +96
Lock lockObj = fileStripedLock.get(key).writeLock();
boolean b = lockObj.tryLock(lockTimeout, TimeUnit.MILLISECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not mixing getLock and acquireLock together. It makes the code complicated and error prone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it first get() lock object based on name belonging to the bucket, which internally it generate hash and return.
Next is, tryLock() to lock
I am not getting the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, how would you release the lock? get it again and then release? Why getting the same lock twice?

long startTime = Time.monotonicNowNanos();
try {
OmLockInfo.LockLevel level = lockInfo.getLevel();
switch (level) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, above are better to avoid switch. will take up with finalized solution.

@sumitagrawl
Copy link
Contributor Author

Code is restructured and new PR is raised, closing this PR.
#8217

@sumitagrawl sumitagrawl closed this Apr 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

om-pre-ratis-execution PRs related to https://issues.apache.org/jira/browse/HDDS-11897

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants