feat(storage): App Centric Observability#14685
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a concurrency-safe Least Recently Used (LRU) cache (bucketMetadataCache) to store bucket resource names and locations, allowing GCS operations to dynamically populate trace spans with destination ID and location attributes. However, a critical performance bottleneck and memory leak were identified in the slice-based lruCache implementation, where linear scans on the hot path cause O(N) overhead and slice manipulation retains references to evicted keys. Replacing this with a doubly linked list and map implementation is recommended to achieve O(1) complexity and prevent memory leaks.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a concurrency-safe LRU cache to store bucket metadata (resource ID and location) and integrates it with OpenTelemetry tracing to automatically inject these details as span attributes across various storage operations. The feedback suggests several improvements: using safe comma-ok type assertions when retrieving values from context.Context to prevent potential panics, optimizing the gRPC GetBucketRequest by restricting the ReadMask to only fetch required fields instead of using a wildcard, and avoiding caching transient failures indefinitely as placeholders in the metadata cache.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a concurrency-safe LRU cache (bucketMetadataCache) to store bucket metadata (resource ID and location) and enrich OpenTelemetry trace spans with destination resource attributes. It updates various storage operations to use startSpanWithBucket and implements background fetching with singleflight deduplication, alongside opportunistic cache filling during synchronous attribute retrieval. The review feedback highlights two key improvements: evicting buckets from the cache on transient background fetch errors to prevent permanent cache poisoning, and adding a guard to avoid cache pollution and redundant fetches when the bucket name is empty.
| var entry bucketMetadata | ||
| if err != nil { | ||
| if isForbiddenOrPermissionError(err) { | ||
| entry = bucketMetadata{ | ||
| resource: fmt.Sprintf("projects/_/buckets/%s", bucket), | ||
| location: "global", | ||
| } | ||
| c.put(bucket, entry) | ||
| } | ||
| } else { | ||
| entry = resVal.(bucketMetadata) | ||
| c.put(bucket, entry) | ||
| } |
There was a problem hiding this comment.
If the background metadata fetch fails due to a transient error (such as a temporary network timeout or rate limit), the placeholder entry (projects/_/buckets/<b> and global) will remain in the cache indefinitely. This prevents subsequent requests from ever retrying the fetch, permanently poisoning the cache for that bucket.
To resolve this, evict the bucket from the cache when a non-permission error occurs so that future requests can attempt to fetch the metadata again.
var entry bucketMetadata
if err != nil {
if isForbiddenOrPermissionError(err) {
entry = bucketMetadata{
resource: fmt.Sprintf("projects/_/buckets/%s", bucket),
location: "global",
}
c.put(bucket, entry)
} else {
c.evict(bucket)
}
} else {
entry = resVal.(bucketMetadata)
c.put(bucket, entry)
}
No description provided.