-
Notifications
You must be signed in to change notification settings - Fork 114
[controller] Add metrics and logs to detect number of threads working on the operations of the same store at the same period #2385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… on the operations of the same store at the same period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds observability features to detect when multiple threads in the admin consumption executor are working on operations from the same store concurrently. This is an undesirable situation that reduces throughput since operations for the same store are processed sequentially. The PR introduces metrics and logging to monitor this condition.
Changes:
- Added violation tracking metric to count stores with multiple concurrent processing threads
- Added thread-safe counter tracking for inflight threads per store
- Added comprehensive unit tests for the new tracking functionality
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| AdminExecutionTask.java | Adds taskOnStart/taskOnFinish methods to track inflight threads per store and log violations when multiple threads process the same store |
| AdminConsumptionStats.java | Introduces new violation_stores_count gauge metric and recordViolationStoresCount method |
| AdminConsumptionTask.java | Initializes inflightThreadsByStore map and passes it to AdminExecutionTask |
| AdminExecutionTaskTest.java | New comprehensive test suite covering no-violation, violation, and concurrent execution scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...ice-controller/src/main/java/com/linkedin/venice/controller/stats/AdminConsumptionStats.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Show resolved
Hide resolved
...ice-controller/src/main/java/com/linkedin/venice/controller/stats/AdminConsumptionStats.java
Outdated
Show resolved
Hide resolved
...ice-controller/src/main/java/com/linkedin/venice/controller/stats/AdminConsumptionStats.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Show resolved
Hide resolved
...ller/src/test/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTaskTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Show resolved
Hide resolved
...ller/src/test/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTaskTest.java
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Outdated
Show resolved
Hide resolved
...ice-controller/src/main/java/com/linkedin/venice/controller/stats/AdminConsumptionStats.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminExecutionTask.java
Outdated
Show resolved
Hide resolved
...ice-controller/src/main/java/com/linkedin/venice/controller/stats/AdminConsumptionStats.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Problem Statement
There were issues before that multiple threads in the admin consumption executor were working on the operations from the same store. This can reduce the throughput because for the same store, the operations will be done sequentially. Ideally, each thread should work on different store in the same period. This PR adds metircs and logs to detect if there are multiple threads work on the same store at the same time.
Solution
Code changes
testNoViolationInflightCounter
testNoViolationInflightCounterWithCancellationException
testConcurrentExecutionWithCancellationException
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
Does this PR introduce any user-facing or breaking changes?