Doc update for slow operation detector [CTT-1152] #2173

Open
celenmelike wants to merge 3 commits into hazelcast:main from celenmelike:ctt-1152

Conversation

@celenmelike
Contributor

Clarify that operations around or below 1 second cannot be reliably detected by SlowOperationDetector.
Add recommendation to use a JVM profiler for shorter operations.

@celenmelike celenmelike self-assigned this Apr 30, 2026
@celenmelike celenmelike requested a review from a team as a code owner April 30, 2026 13:38
@celenmelike celenmelike added the documentation Improvements or additions to documentation label Apr 30, 2026
@netlify

netlify Bot commented Apr 30, 2026

Deploy Preview for hardcore-allen-f5257d ready!

Name Link
🔨 Latest commit 5c851dd
🔍 Latest deploy log https://app.netlify.com/projects/hardcore-allen-f5257d/deploys/6a0329e50c307a0008665dfe
😎 Deploy Preview https://deploy-preview-2173--hardcore-allen-f5257d.netlify.app

@celenmelike celenmelike changed the title from "Doc update in slow operation detector [CTT-1152]" to "Doc update for slow operation detector [CTT-1152]" Apr 30, 2026
operation details, start time and duration of each slow invocation. All collected data is available in
the xref:{page-latest-supported-mc}@management-center:monitor-imdg:monitor-members.adoc[Management Center].

The `SlowOperationDetector` is designed to detect relatively long-running operations. Due to its internal scanning mechanism, it cannot reliably detect operations around or below ~1 second (1000 ms). Even if you configure a threshold lower than 1000 ms, such operation cannot be detected. If you need to detect operations that take less than 1 second, you should use a JVM profiler. However, use of third-party profiling tools is at your own risk. We do not provide support or warranties for any external profiler tools.
Contributor

Suggested change
- The `SlowOperationDetector` is designed to detect relatively long-running operations. Due to its internal scanning mechanism, it cannot reliably detect operations around or below ~1 second (1000 ms). Even if you configure a threshold lower than 1000 ms, such operation cannot be detected. If you need to detect operations that take less than 1 second, you should use a JVM profiler. However, use of third-party profiling tools is at your own risk. We do not provide support or warranties for any external profiler tools.
+ The `SlowOperationDetector` is designed to detect relatively long-running operations. Operations are checked every 1,000 ms and, because of this scanning mechanism, the detector cannot reliably detect operations that take less than 1 second (1,000 ms). If the slow operation threshold is configured lower than 1,000 ms, the detector may miss operations that are not running at the time of a scan. If you need to detect operations that take less than 1 second, you should use a JVM profiler. However, use of third-party profiling tools is at your own risk, and they should not be used in production environments.

Contributor Author

fixed at 78fe7f0
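
The scan-interval behaviour discussed in this thread can be illustrated with a small model. This is a simplified sketch, not Hazelcast's implementation: it assumes scans happen at exact multiples of 1,000 ms and that an operation is flagged only if some scan finds it still running after it has already exceeded the threshold. The class and method names (`ScanModel`, `detected`, `detectionRate`) are hypothetical.

```java
// Simplified model of a detector that samples running operations once per scan interval.
// Assumption: this is an illustration only and does not mirror SlowOperationDetector internals.
class ScanModel {
    static final long SCAN_INTERVAL_MS = 1000;

    /**
     * Returns true if some scan instant (a multiple of SCAN_INTERVAL_MS) falls
     * after the operation has exceeded the threshold and before it has finished.
     */
    static boolean detected(long startMs, long durationMs, long thresholdMs) {
        long earliest = startMs + thresholdMs; // first instant at which the operation counts as slow
        long end = startMs + durationMs;       // instant at which the operation finishes (exclusive)
        // First scan instant at or after 'earliest' (round up to the next multiple of the interval).
        long firstScan = ((earliest + SCAN_INTERVAL_MS - 1) / SCAN_INTERVAL_MS) * SCAN_INTERVAL_MS;
        return firstScan < end;
    }

    /** Fraction of start offsets (0..999 ms within one scan interval) for which the operation is flagged. */
    static double detectionRate(long durationMs, long thresholdMs) {
        int hits = 0;
        for (long start = 0; start < SCAN_INTERVAL_MS; start++) {
            if (detected(start, durationMs, thresholdMs)) {
                hits++;
            }
        }
        return (double) hits / SCAN_INTERVAL_MS;
    }

    public static void main(String[] args) {
        long[][] cases = {{10, 1}, {300, 100}, {900, 100}, {1500, 100}}; // {durationMs, thresholdMs}
        for (long[] c : cases) {
            System.out.printf("duration=%d ms, threshold=%d ms, detection rate=%.3f%n",
                    c[0], c[1], detectionRate(c[0], c[1]));
        }
    }
}
```

Under these assumptions, a 10 ms operation with a 1 ms threshold is flagged for only 9 of the 1,000 possible start offsets, while an operation that outlasts the threshold by a full scan interval is always flagged. That matches the reviewers' point that sub-second detection is a matter of timing rather than a guarantee.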

include::clusters:partial$ucn-migrate-tip.adoc[]

- The defaults catch extremely slow operations but you should set this much lower, say to 1ms, at development time to catch entry processors that could be problematic in production. These are good candidates for our optimizations.
+ The defaults catch extremely slow operations, but you can set this lower at development time to catch entry processors that could be problematic in production. However, operations with very short execution times (lower than ~1 second) may not always be detected. Entry processors that are detected as slow are good candidates for our optimizations.
Contributor

I wonder if we should keep this in the docs at all or just remove it? The slow operation detector is not useful for this case, IMO.

Contributor

@k-jamroz k-jamroz May 7, 2026

Interesting that we had this recommendation before ("you should set this much lower"). It might still make sense in dev, but we should strongly emphasise the randomness of detection and the lack of any guarantee that all offending operations are reported. As a safety net in dev it is fine; for production or metrics, definitely not.

Contributor Author

Updated here: 5c851dd. Is it OK with you to keep it like that?

|int
|Defines a threshold above which a running operation in `OperationService` is considered to be slow.
These operations log a warning and are shown in the Management Center with detailed information, e.g., stacktrace.
The value should be set at least 1000 ms for reliable results. The detector cannot detect operations under 1000 ms.
Contributor

This is not quite accurate: it can detect operations under this threshold, but it is a gamble whether they are running when a capture is made, which makes the results unreliable.

Contributor

Suggested change
- The value should be set at least 1000 ms for reliable results. The detector cannot detect operations under 1000 ms.
+ The value should be set to at least 1000 ms. The detector cannot reliably detect all operations taking under 1000 ms.

?

Contributor Author

Fixed at 78fe7f0
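
For readers applying the threshold guidance from this thread, here is a minimal sketch of setting the value as a JVM system property before a member starts. The property name `hazelcast.slow.operation.detector.threshold.millis` does not appear in this conversation and is taken as an assumption from Hazelcast's system properties reference; the class and method names are hypothetical.

```java
// Minimal sketch: set the slow operation threshold to 1000 ms through a JVM system property.
// Assumption: the property name below matches the table row under review; verify it against
// the Hazelcast system properties reference for your version.
class SlowOperationThresholdConfig {
    static final String THRESHOLD_PROPERTY = "hazelcast.slow.operation.detector.threshold.millis";

    /** Sets the threshold to 1000 ms, the lowest value the thread considers reliable. */
    static void applyRecommendedThreshold() {
        System.setProperty(THRESHOLD_PROPERTY, "1000");
    }

    public static void main(String[] args) {
        applyRecommendedThreshold();
        // A Hazelcast member would be started after this point so that it reads the property.
        System.out.println(THRESHOLD_PROPERTY + "=" + System.getProperty(THRESHOLD_PROPERTY));
    }
}
```

The same value could equally be passed on the command line with `-D`; the point is only that values below 1000 ms do not make detection of shorter operations reliable.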

Comment thread docs/modules/cluster-performance/pages/threading.adoc Outdated
@@ -150,7 +150,7 @@ We have a slow user code detector which can be used to log a warning controlled

include::clusters:partial$ucn-migrate-tip.adoc[]
Contributor

Not really related, but the UCD deprecation warning feels misplaced in the middle of the slow operation detector description.

Contributor

@k-jamroz k-jamroz left a comment

LGTM. The discussion about profilers is more general.
