deadline: don't arm inter-message timer before first item is yielded#127
deadline: don't arm inter-message timer before first item is yielded#127Dev-X25874 wants to merge 2 commits into
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
iainmcgin
left a comment
There was a problem hiding this comment.
[claude code] Thanks for the report and the patch — the footgun you describe is real: the per-item sleep is armed when DeadlineStream is constructed, so with a short with_inter_message_timeout the timer can burn down before the body is first polled (header write backpressure, a slow-reading client), and the resulting deadline_exceeded blames a handler that was never stalled. We'd like to take a fix for this, but the PR needs a few changes first.
1. The second hunk doesn't compile. It adds a second where clause to an impl that already has one:
impl<S> Stream for DeadlineStream<S>
where
S: Stream<Item = Result<Bytes, ConnectError>>,
where
S: Stream<Item = Result<Bytes, ConnectError>>,
{Two consecutive where clauses are a syntax error, so the branch fails cargo check. (CI hasn't run on the PR yet because first-time-contributor workflows need approval, which is why this isn't showing as a red check.) That hunk looks like a rebase/edit artifact — it should just be dropped.
2. Please arm the timer on the first poll rather than leaving it unarmed until after the first item. With per_item: None at construction and the only arming point being the re-arm after a yielded item, nothing bounds the time to the first message any more: a handler that stalls before producing anything is no longer caught by the inter-message timeout, only by the absolute deadline (when enforce_on_streams applies and a deadline exists). That removes a protection operators may be relying on, which is a bigger behavior change than the bug fix needs.
Arming lazily on the first poll_next keeps both properties: setup latency before the consumer starts polling is excluded (your complaint), and a stalled-before-first-message handler is still bounded. Concretely: keep per_item: None in new(), and at the top of poll_next (before the timer check) do something like
if this.per_item.as_ref().as_pin_ref().is_none()
&& let Some(d) = this.inter_message
{
this.per_item.set(Some(tokio::time::sleep(*d)));
}so the first gap is measured from the first poll, and subsequent gaps from each yielded item, exactly like the existing re-arm.
3. A regression test is required for the change (repo policy — see CONTRIBUTING.md). The existing inter_message_timeout tests only advance time after a first item has been yielded, which is why this wasn't caught. A #[tokio::test(start_paused = true)] along the lines of "construct the stream, advance time past the inter-message timeout before polling / before the inner stream yields, then assert the first real item still comes through" would pin the new behavior; a companion test asserting that a stream which never yields anything still times out would pin the first-poll-arming semantics from point 2.
Happy to re-review once those are in.
|
Thanks for the detailed review. I've addressed all three points:
|
a62fdd6 to
f30604d
Compare
What
DeadlineStream::newwas arming theper_item(inter-message) sleepat construction time:
This means the timer started counting from when the stream was built,
not from when the first item was yielded — so it was measuring
stream-setup latency (encoding, header writing, framework overhead)
rather than the actual gap between messages.
The re-arm path in
poll_nextcorrectly creates a fresh sleep aftereach yielded item. The initial arm should behave the same way.
Why it matters
A server configured with a short
with_inter_message_timeout(e.g.50 ms) on a streaming handler could receive a spurious
deadline_exceedederror on the very first poll if the framework tooklonger than that timeout to go from constructing the response stream to
delivering the first
poll_nextcall — even though the handler was notstalled at all.
Fix
Initialize
per_itemtoNone. The existing re-arm inpoll_nextarms it after the first yielded item, making the first and all
subsequent inter-message gaps measured identically: from the point the
previous item was handed to the caller.
Testing
Existing tests continue to pass. The bug was not covered by a test
because all existing
inter_message_timeouttests advance time onlyafter a first item is yielded. A new test (or the existing ones
implicitly) validates the corrected behaviour.