Skip to content

feat: Add request_duration metric for each endpoint#634

Merged
keelerm84 merged 4 commits intov8from
mk/sdk-2144/request-duration
Apr 9, 2026
Merged

feat: Add request_duration metric for each endpoint#634
keelerm84 merged 4 commits intov8from
mk/sdk-2144/request-duration

Conversation

@keelerm84
Copy link
Copy Markdown
Member

@keelerm84 keelerm84 commented Apr 8, 2026

Note

Low Risk
Low risk: adds a new OpenCensus duration metric and small middleware timing logic; behavior change is limited to additional metrics emission and explicitly skips unbounded streaming responses.

Overview
Adds a new request_duration metric and view, emitting a distribution of per-request latency tagged by env/platform/user agent/sdk wrapper plus route and method.

Updates RequestCount middleware to time requests and record duration for non-streaming responses, while continuing to count requests and explicitly skipping text/event-stream (SSE) endpoints. Tests and shared test exporter utilities are extended to validate distribution metrics and to assert no duration is recorded for streaming routes.

Reviewed by Cursor Bugbot for commit b6842d3. Bugbot is set up for automated code reviews on this repo. Configure here.

@keelerm84 keelerm84 requested a review from a team as a code owner April 8, 2026 19:20
}
requestDurationView *view.View = &view.View{ //nolint:gochecknoglobals
Measure: requestDurationMeasure,
Aggregation: view.Distribution(10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000, 500000, 1000000),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if different versions of relay have different distributions?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different distributions will cause an individual metric to potentially be classified differently.

As is, a 23 microsecond response time gets bucketed into the second bucket. If we defined this differently, perhaps it would go into bucket 2 or 3.

This could lead to your P50/P90/etc showing a little differently. But as long as the thresholds aren't insanely different, it shouldn't be a big deal.

In v9 with OTEL, we move to dynamic, exponential distributions which is the new recommendation.

// Don't record duration for streaming responses — their lifetime is unbounded
if w.Header().Get("X-Accel-Buffering") != "no" {
metrics.RecordRequestDuration(ctx.Env.GetMetricsContext(), userAgent, sdkWrapper, route, req.Method, time.Since(start), measure)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a long term stable way to make this distinction? Does the application have the logical context to provide a parameter that says this is long lived, or perhaps something on the request reference?

A quick google search on that header seems to indicate this relates to NGINX. Seems a little off to make this decision based on a header respected by a certain popular proxy.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relay always adds that header for streaming requests, which is why it's okay for us to check against it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this, but I do think "Content-Type: text/event-stream" is better. SSE will always have it, and it directly says that it is a stream. So even if you used it for non-sse we would still want to exclude it. (This is the header that has been used to fix the "wait for complete" logic in the various network debug tools.)

@keelerm84 keelerm84 requested a review from tanderson-ld April 9, 2026 14:35
@keelerm84 keelerm84 merged commit fb309fc into v8 Apr 9, 2026
16 of 17 checks passed
@keelerm84 keelerm84 deleted the mk/sdk-2144/request-duration branch April 9, 2026 17:03
keelerm84 pushed a commit that referenced this pull request Apr 9, 2026
🤖 I have created a release *beep* *boop*
---


##
[8.18.0](v8.17.8...v8.18.0)
(2026-04-09)


### Features

* Add request_duration metric for each endpoint
([#634](#634))
([fb309fc](fb309fc))


### Bug Fixes

* **deps:** bump supported Go versions to 1.26.2 and 1.25.9
([#633](#633))
([08bbc9e](08bbc9e))
* remove racy syncTimeCh length assertions in big segment sync tests
([#624](#624))
([2c43bda](2c43bda))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants