deadline: don't arm inter-message timer before first item is yielded #127

Open
Dev-X25874 wants to merge 2 commits into anthropics:main from Dev-X25874:fix/inter-message-timer-initial-arm

@Dev-X25874

What

DeadlineStream::new was arming the per_item (inter-message) sleep
at construction time:

per_item: inter_message.map(tokio::time::sleep),

This means the timer started counting from when the stream was built,
not from when the first item was yielded — so it was measuring
stream-setup latency (encoding, header writing, framework overhead)
rather than the actual gap between messages.

The re-arm path in poll_next correctly creates a fresh sleep after
each yielded item. The initial arm should behave the same way.

Why it matters

A server configured with a short with_inter_message_timeout (e.g.
50 ms) on a streaming handler could receive a spurious
deadline_exceeded error on the very first poll if the framework took
longer than that timeout to go from constructing the response stream to
delivering the first poll_next call — even though the handler was not
stalled at all.
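
To make the failure concrete, here is a minimal sketch of the timing arithmetic. It assumes a timer that fires once the timeout has elapsed since it was armed. `ArmAt` and `expired_at_first_poll` are hypothetical names used only for illustration and are not part of the crate.

```rust
use std::time::Duration;

/// When the inter-message timer starts counting (hypothetical, for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArmAt {
    /// Armed when the stream is constructed (the behaviour described above).
    Construction,
    /// Armed only once the consumer first polls the stream.
    FirstPoll,
}

/// Returns true if the timer has already fired at the moment of the first poll.
/// All times are offsets from stream construction.
fn expired_at_first_poll(arm_at: ArmAt, timeout: Duration, first_poll: Duration) -> bool {
    let armed_at = match arm_at {
        ArmAt::Construction => Duration::ZERO,
        ArmAt::FirstPoll => first_poll,
    };
    // The timer fires once `timeout` has elapsed since it was armed.
    first_poll >= armed_at + timeout
}
```

With a 50 ms timeout and an assumed 80 ms between construction and the first poll, `expired_at_first_poll(ArmAt::Construction, ..)` is true, while `ArmAt::FirstPoll` is false for any non-zero timeout.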

Fix

Initialize per_item to None. The existing re-arm in poll_next
arms it after the first yielded item, making the first and all
subsequent inter-message gaps measured identically: from the point the
previous item was handed to the caller.

Testing

Existing tests continue to pass. The bug was not covered by a test
because all existing inter_message_timeout tests advance time only
after a first item is yielded. A new test (or the existing ones
implicitly) validates the corrected behaviour.

@github-actions

github-actions Bot commented May 20, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@Dev-X25874
Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request May 20, 2026
Collaborator

@iainmcgin iainmcgin left a comment

[claude code] Thanks for the report and the patch — the footgun you describe is real: the per-item sleep is armed when DeadlineStream is constructed, so with a short with_inter_message_timeout the timer can burn down before the body is first polled (header write backpressure, a slow-reading client), and the resulting deadline_exceeded blames a handler that was never stalled. We'd like to take a fix for this, but the PR needs a few changes first.

1. The second hunk doesn't compile. It adds a second where clause to an impl that already has one:

impl<S> Stream for DeadlineStream<S>
where
    S: Stream<Item = Result<Bytes, ConnectError>>,
where
    S: Stream<Item = Result<Bytes, ConnectError>>,
{

Two consecutive where clauses are a syntax error, so the branch fails cargo check. (CI hasn't run on the PR yet because first-time-contributor workflows need approval, which is why this isn't showing as a red check.) That hunk looks like a rebase/edit artifact — it should just be dropped.

2. Please arm the timer on the first poll rather than leaving it unarmed until after the first item. With per_item: None at construction and the only arming point being the re-arm after a yielded item, nothing bounds the time to the first message any more: a handler that stalls before producing anything is no longer caught by the inter-message timeout, only by the absolute deadline (when enforce_on_streams applies and a deadline exists). That removes a protection operators may be relying on, which is a bigger behavior change than the bug fix needs.

Arming lazily on the first poll_next keeps both properties: setup latency before the consumer starts polling is excluded (your complaint), and a stalled-before-first-message handler is still bounded. Concretely: keep per_item: None in new(), and at the top of poll_next (before the timer check) do something like

if this.per_item.as_ref().as_pin_ref().is_none()
    && let Some(d) = this.inter_message
{
    this.per_item.set(Some(tokio::time::sleep(*d)));
}

so the first gap is measured from the first poll, and subsequent gaps from each yielded item, exactly like the existing re-arm.
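
As a rough, std-only model of that arming policy (not the real `DeadlineStream`, which uses `tokio::time::sleep` and pinning), the sketch below tracks the timer against a virtual clock supplied by the caller. `InterMessageTimer`, `poll_expired`, and `item_yielded` are hypothetical names introduced for illustration.

```rust
use std::time::Duration;

/// Std-only model of the per-item timer. `now` is a virtual clock value
/// passed in by the caller (an offset from an arbitrary start).
struct InterMessageTimer {
    /// Configured inter-message timeout, if any.
    timeout: Option<Duration>,
    /// Virtual time at which the timer fires; `None` means unarmed.
    fires_at: Option<Duration>,
}

impl InterMessageTimer {
    /// Construct unarmed, mirroring `per_item: None` in `new()`.
    fn new(timeout: Option<Duration>) -> Self {
        Self { timeout, fires_at: None }
    }

    /// Called at the top of every poll: arm lazily if still unarmed,
    /// then report whether the timer has fired.
    fn poll_expired(&mut self, now: Duration) -> bool {
        if self.fires_at.is_none() {
            if let Some(t) = self.timeout {
                self.fires_at = Some(now + t);
            }
        }
        matches!(self.fires_at, Some(at) if now >= at)
    }

    /// Called after an item is yielded: re-arm from this moment.
    fn item_yielded(&mut self, now: Duration) {
        self.fires_at = self.timeout.map(|t| now + t);
    }
}
```

In this model a first poll at 80 ms with a 50 ms timeout arms the timer to fire at 130 ms, so setup latency is excluded, and a stream that never yields still expires once polled at or after 130 ms.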

3. A regression test is required for the change (repo policy — see CONTRIBUTING.md). The existing inter_message_timeout tests only advance time after a first item has been yielded, which is why this wasn't caught. A #[tokio::test(start_paused = true)] along the lines of "construct the stream, advance time past the inter-message timeout before polling / before the inner stream yields, then assert the first real item still comes through" would pin the new behavior; a companion test asserting that a stream which never yields anything still times out would pin the first-poll-arming semantics from point 2.

Happy to re-review once those are in.

@Dev-X25874
Copy link
Copy Markdown
Author

Thanks for the detailed review. I've addressed all three points:

  1. Dropped the duplicate where clause — it was a rebase artifact.
  2. Changed the lazy arm guard to this.per_item.is_none() so the timer is armed on the first poll_next call, not after the first yielded item — a stalled-before-first-message handler is still caught.
  3. Added two regression tests: setup_latency_before_first_poll_does_not_trigger_timeout and stream_that_never_yields_still_times_out, covering both sides of the new behaviour.

@Dev-X25874 Dev-X25874 force-pushed the fix/inter-message-timer-initial-arm branch from a62fdd6 to f30604d Compare May 27, 2026 02:46