feat(bigtable): metrics refactor#14693
Conversation
…on, location scoping, and attempt latency calculations
There was a problem hiding this comment.
Code Review
This pull request refactors the Bigtable client's metrics tracing by passing the metrics tracer via context instead of explicit function arguments, and introduces gRPC unary and stream interceptors to handle metrics collection. Feedback highlights two key issues: first, streaming RPCs that exit early may fail to record attempt metrics because RecvMsg is not called, which can be resolved by monitoring context cancellation; second, the client blocking latency is incorrectly recorded as zero because recordAttemptClientBlockingLatency is called immediately after the start time is set.
| wrapped := &metricsWrappedClientStream{ | ||
| ClientStream: clientStream, | ||
| mt: mt, | ||
| headerMD: &headerMD, | ||
| trailerMD: &trailerMD, | ||
| } | ||
| return wrapped, nil |
There was a problem hiding this comment.
For streaming RPCs (such as ReadRows or ExecuteQuery), the caller may exit the read loop early (e.g., when a row limit is reached or the query is aborted). In these scenarios, RecvMsg is never called again to receive io.EOF or another error. Consequently, recordAttemptCompletionWithMetadata is never invoked, and the attempt metrics for these successful early-exit streams are lost.
To ensure metrics are always recorded, we should also monitor the stream's context cancellation to trigger completion recording.
wrapped := &metricsWrappedClientStream{
ClientStream: clientStream,
mt: mt,
headerMD: &headerMD,
trailerMD: &trailerMD,
}
go func() {
<-clientStream.Context().Done()
if atomic.CompareAndSwapInt32(&wrapped.completed, 0, 1) {
wrapped.mt.recordAttemptCompletionWithMetadata(*wrapped.headerMD, *wrapped.trailerMD, clientStream.Context().Err())
}
}()
return wrapped, nil| func (mt *builtinMetricsTracer) recordAttemptClientBlockingLatency() { | ||
| if !mt.builtInEnabled { | ||
| return | ||
| } | ||
| startTime := mt.currOp.currAttempt.startTime | ||
| if !startTime.IsZero() { | ||
| mt.currOp.currAttempt.clientBlockingLatency = convertToMs(time.Since(startTime)) | ||
| } | ||
| } |
There was a problem hiding this comment.
The recordAttemptClientBlockingLatency method calculates time.Since(startTime). However, in both metricsUnaryClientInterceptor and metricsStreamClientInterceptor, recordAttemptClientBlockingLatency() is called immediately after recordAttemptStart(). Since recordAttemptStart() sets startTime to time.Now(), the elapsed time will always be essentially zero.
This makes the recorded client blocking latency always zero unless overwritten by blockingLatencyTracker in recordAttemptCompletionWithMetadata. Consider revising this logic or removing the redundant immediate calls in the interceptors.
No description provided.