timely-util: add instrumentation for slow operator polls #34702
Closed
bosconi wants to merge 1 commit into MaterializeInc:main from
Async operators running in the timely context can block the worker thread if they do significant synchronous work before hitting an await point. This can prevent heartbeat tasks from running and cause persist reader lease expirations.

This change adds instrumentation to detect slow polls:

- Track the duration of each future poll
- Log a warning when a poll exceeds a 10ms threshold
- Include the operator address and global_id in the warning

This provides visibility into problematic operators while the more invasive architectural fix (channel-based communication with tokio tasks) is planned for future work.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
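A minimal sketch of the slow-poll detection described above: a wrapper future that times each call to `poll` on an inner future and reports any poll that exceeds a threshold. This is not the PR's actual code. `InstrumentedFuture`, `poll_once`, and `SLOW_POLL_THRESHOLD` are hypothetical names; the operator address is modeled as a plain `Vec<usize>`, the global ID as a `String`, and the warning goes to stderr rather than to whatever logger the real change uses.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Threshold taken from the description: polls longer than this are reported.
pub const SLOW_POLL_THRESHOLD: Duration = Duration::from_millis(10);

/// Hypothetical wrapper that measures the synchronous time spent inside each
/// poll of `inner` and reports polls slower than `threshold`.
pub struct InstrumentedFuture<F> {
    // Boxing the inner future keeps the wrapper `Unpin`, so no pin projection is needed.
    inner: Pin<Box<F>>,
    // Stand-in for a timely operator address.
    operator_address: Vec<usize>,
    // Stand-in for a Materialize global ID; a plain string in this sketch.
    global_id: String,
    threshold: Duration,
    slow_polls: u64,
}

impl<F: Future> InstrumentedFuture<F> {
    pub fn new(inner: F, operator_address: Vec<usize>, global_id: &str, threshold: Duration) -> Self {
        InstrumentedFuture {
            inner: Box::pin(inner),
            operator_address,
            global_id: global_id.to_string(),
            threshold,
            slow_polls: 0,
        }
    }

    /// Number of polls so far whose duration exceeded the threshold.
    pub fn slow_polls(&self) -> u64 {
        self.slow_polls
    }
}

impl<F: Future> Future for InstrumentedFuture<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // Allowed because every field is `Unpin` (the inner future is boxed).
        let this = self.get_mut();
        // Time only the synchronous work the inner future does before it
        // returns, i.e. before it yields at an await point or completes.
        let start = Instant::now();
        let result = this.inner.as_mut().poll(cx);
        let elapsed = start.elapsed();
        if elapsed > this.threshold {
            this.slow_polls += 1;
            eprintln!(
                "slow operator poll: took {:?} (threshold {:?}), address={:?}, global_id={}",
                elapsed, this.threshold, this.operator_address, this.global_id
            );
        }
        result
    }
}

/// Waker that does nothing; used only to drive a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls an `Unpin` future exactly once with a no-op waker.
pub fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

For example, wrapping `async { std::thread::sleep(Duration::from_millis(20)); 42 }` and driving it with `poll_once` yields `Poll::Ready(42)` and leaves `slow_polls()` at 1, while wrapping `std::future::ready(7)` leaves it at 0. Because the wrapper only measures time between entering and leaving `poll`, it flags exactly the blocking pattern in the motivation, but it reports only that a poll was slow, not where inside the poll the time went.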
antiguru (Member) requested changes, Jan 14, 2026:
I don't think this is the right approach. We have introspection data that contains the information we need to diagnose which operators run for how long, and we should use this instead of inventing a new mechanism. (With the caveat that the introspection doesn't apply to sources/sinks, but that's a separate problem we should fix.)
Summary
Motivation
Async operators running in the timely context can block the worker thread if they do significant synchronous work before hitting an await point. This can prevent heartbeat tasks from running and cause persist reader lease expirations.
The proper fix (switching to channel-based communication with tokio tasks) is acknowledged as complex in an existing TODO in the code. This PR adds visibility into the problem while that architectural work is planned.
Test plan
Generated with Claude Code