feat: Add request_duration metric for each endpoint#634
Conversation
| } | ||
| requestDurationView *view.View = &view.View{ //nolint:gochecknoglobals | ||
| Measure: requestDurationMeasure, | ||
| Aggregation: view.Distribution(10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000, 500000, 1000000), |
There was a problem hiding this comment.
What happens if different versions of relay have different distributions?
There was a problem hiding this comment.
Different distributions will cause an individual metric to potentially be classified differently.
As is, a 23 microsecond response time gets bucketed into the second bucket. If we defined this differently, perhaps it would go into bucket 2 or 3.
This could lead to your P50/P90/etc showing a little differently. But as long as the thresholds aren't insanely different, it shouldn't be a big deal.
In v9 with OTEL, we move to dynamic, exponential distributions which is the new recommendation.
| // Don't record duration for streaming responses — their lifetime is unbounded | ||
| if w.Header().Get("X-Accel-Buffering") != "no" { | ||
| metrics.RecordRequestDuration(ctx.Env.GetMetricsContext(), userAgent, sdkWrapper, route, req.Method, time.Since(start), measure) | ||
| } |
There was a problem hiding this comment.
Is this a long term stable way to make this distinction? Does the application have the logical context to provide a parameter that says this is long lived, or perhaps something on the request reference?
A quick google search on that header seems to indicate this relates to NGINX. Seems a little off to make this decision based on a header respected by a certain popular proxy.
There was a problem hiding this comment.
The relay always adds that header for streaming requests, which is why it's okay for us to check against it.
There was a problem hiding this comment.
I am fine with this, but I do think "Content-Type: text/event-stream" is better. SSE will always have it, and it directly says that it is a stream. So even if you used it for non-sse we would still want to exclude it. (This is the header that has been used to fix the "wait for complete" logic in the various network debug tools.)
🤖 I have created a release *beep* *boop* --- ## [8.18.0](v8.17.8...v8.18.0) (2026-04-09) ### Features * Add request_duration metric for each endpoint ([#634](#634)) ([fb309fc](fb309fc)) ### Bug Fixes * **deps:** bump supported Go versions to 1.26.2 and 1.25.9 ([#633](#633)) ([08bbc9e](08bbc9e)) * remove racy syncTimeCh length assertions in big segment sync tests ([#624](#624)) ([2c43bda](2c43bda)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Note
Low Risk
Low risk: adds a new OpenCensus duration metric and small middleware timing logic; behavior change is limited to additional metrics emission and explicitly skips unbounded streaming responses.
Overview
Adds a new
request_durationmetric and view, emitting a distribution of per-request latency tagged by env/platform/user agent/sdk wrapper plus route and method.Updates
RequestCountmiddleware to time requests and record duration for non-streaming responses, while continuing to count requests and explicitly skippingtext/event-stream(SSE) endpoints. Tests and shared test exporter utilities are extended to validate distribution metrics and to assert no duration is recorded for streaming routes.Reviewed by Cursor Bugbot for commit b6842d3. Bugbot is set up for automated code reviews on this repo. Configure here.