feat(storage): App Centric Observability #14685

Open
krishnamd-jkp wants to merge 5 commits into googleapis:main from krishnamd-jkp:aco

Conversation

@krishnamd-jkp

No description provided.

krishnamd-jkp requested review from a team as code owners May 29, 2026 07:20
product-auto-label Bot added the "api: storage" label (issues related to the Cloud Storage API) May 29, 2026

gemini-code-assist Bot left a comment

Code Review

This pull request introduces a concurrency-safe Least Recently Used (LRU) cache (bucketMetadataCache) to store bucket resource names and locations, allowing GCS operations to dynamically populate trace spans with destination ID and location attributes. However, a critical performance bottleneck and memory leak were identified in the slice-based lruCache implementation, where linear scans on the hot path cause O(N) overhead and slice manipulation retains references to evicted keys. Replacing this with a doubly linked list and map implementation is recommended to achieve O(1) complexity and prevent memory leaks.
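For reference, the recommended structure could look roughly like the sketch below. It is not the code in storage/lru.go: the type and method names (lruCache, lruEntry, get, put) and the string value type are assumptions chosen for illustration. A map gives O(1) lookup, and a container/list doubly linked list gives O(1) reordering and eviction, so no slice scan is needed on the hot path and evicted keys are not retained.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// lruEntry is the payload stored in each list element (hypothetical type).
type lruEntry struct {
	key   string
	value string
}

// lruCache is a hypothetical fixed-capacity LRU cache guarded by a mutex.
type lruCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// get returns the value for key and marks it most recently used.
func (c *lruCache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*lruEntry).value, true
}

// put inserts or updates key, evicting the least recently used entry
// when the cache grows past its capacity.
func (c *lruCache) put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest) // drops the list's reference to the entry
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}

func main() {
	c := newLRUCache(2)
	c.put("bucket-a", "us-east1")
	c.put("bucket-b", "europe-west1")
	c.get("bucket-a")                // bucket-a becomes most recently used
	c.put("bucket-c", "asia-south1") // evicts bucket-b
	_, ok := c.get("bucket-b")
	fmt.Println("bucket-b still cached:", ok)
}
```

Both get and put take the same mutex because get also mutates the recency order; a read-write lock would not help here.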

Comment thread storage/lru.go Outdated
@krishnamd-jkp

/gemini review

gemini-code-assist Bot left a comment

Code Review

This pull request introduces a concurrency-safe LRU cache to store bucket metadata (resource ID and location) and integrates it with OpenTelemetry tracing to automatically inject these details as span attributes across various storage operations. The feedback suggests several improvements: using safe comma-ok type assertions when retrieving values from context.Context to prevent potential panics, optimizing the gRPC GetBucketRequest by restricting the ReadMask to only fetch required fields instead of using a wildcard, and avoiding caching transient failures indefinitely as placeholders in the metadata cache.

Comment thread storage/trace.go Outdated
Comment thread storage/grpc_client.go
Comment thread storage/bucket_metadata_cache.go Outdated
@krishnamd-jkp

/gemini review

gemini-code-assist Bot left a comment

Code Review

This pull request introduces a concurrency-safe LRU cache (bucketMetadataCache) to store bucket metadata (resource ID and location) and enrich OpenTelemetry trace spans with destination resource attributes. It updates various storage operations to use startSpanWithBucket and implements background fetching with singleflight deduplication, alongside opportunistic cache filling during synchronous attribute retrieval. The review feedback highlights two key improvements: evicting buckets from the cache on transient background fetch errors to prevent permanent cache poisoning, and adding a guard to avoid cache pollution and redundant fetches when the bucket name is empty.
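A rough sketch of fetch deduplication combined with the empty-name guard follows. The summary says the PR uses singleflight; to keep this example free of external dependencies, it swaps in a small hand-written in-flight map that follows the same idea. All names here (group, call, fetchLocation, errEmptyBucket) are assumptions, and unlike a production implementation this version does not handle a panicking fetch function.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errEmptyBucket is a hypothetical sentinel for the empty-name guard.
var errEmptyBucket = errors.New("empty bucket name")

// call tracks one in-flight fetch and its result.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// group deduplicates concurrent fetches for the same key.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// do runs fn once per key at a time; callers that arrive while a fetch
// is in flight wait for it and share its result.
func (g *group) do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err
}

// fetchLocation rejects empty bucket names before touching the group,
// so they never trigger a fetch or occupy a cache slot.
func fetchLocation(g *group, bucket string, fetch func(string) (string, error)) (string, error) {
	if bucket == "" {
		return "", errEmptyBucket
	}
	return g.do(bucket, func() (string, error) { return fetch(bucket) })
}

func main() {
	var g group
	var backendCalls int32
	fetch := func(bucket string) (string, error) {
		atomic.AddInt32(&backendCalls, 1)
		return "us-central1", nil
	}

	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fetchLocation(&g, "my-bucket", fetch)
		}()
	}
	wg.Wait()
	// The count depends on how much the goroutines overlap.
	fmt.Println("backend calls:", atomic.LoadInt32(&backendCalls))

	_, err := fetchLocation(&g, "", fetch)
	fmt.Println("empty name:", err)
}
```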

Comment on lines +109 to +121
		var entry bucketMetadata
		if err != nil {
			if isForbiddenOrPermissionError(err) {
				entry = bucketMetadata{
					resource: fmt.Sprintf("projects/_/buckets/%s", bucket),
					location: "global",
				}
				c.put(bucket, entry)
			}
		} else {
			entry = resVal.(bucketMetadata)
			c.put(bucket, entry)
		}

high

If the background metadata fetch fails due to a transient error (such as a temporary network timeout or rate limit), the placeholder entry (projects/_/buckets/<b> and global) will remain in the cache indefinitely. This prevents subsequent requests from ever retrying the fetch, permanently poisoning the cache for that bucket.

To resolve this, evict the bucket from the cache when a non-permission error occurs so that future requests can attempt to fetch the metadata again.

		var entry bucketMetadata
		if err != nil {
			if isForbiddenOrPermissionError(err) {
				entry = bucketMetadata{
					resource: fmt.Sprintf("projects/_/buckets/%s", bucket),
					location: "global",
				}
				c.put(bucket, entry)
			} else {
				c.evict(bucket)
			}
		} else {
			entry = resVal.(bucketMetadata)
			c.put(bucket, entry)
		}
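The effect of the suggested change can be checked in isolation with a simplified stand-in. Everything below other than the bucketMetadata field names and the placeholder values is an assumption: metadataCache, recordFetch, and the two sentinel errors are hypothetical, the map is not concurrency-safe, and errors.Is against a sentinel replaces the PR's isForbiddenOrPermissionError helper.

```go
package main

import (
	"errors"
	"fmt"
)

type bucketMetadata struct {
	resource string
	location string
}

// Hypothetical sentinel errors standing in for real API errors.
var (
	errPermissionDenied = errors.New("permission denied")
	errTransient        = errors.New("temporary network timeout")
)

// metadataCache is a simplified, non-concurrent stand-in for the cache.
type metadataCache struct {
	entries map[string]bucketMetadata
}

// placeholder builds the fallback entry used when real metadata is unavailable.
func placeholder(bucket string) bucketMetadata {
	return bucketMetadata{
		resource: fmt.Sprintf("projects/_/buckets/%s", bucket),
		location: "global",
	}
}

// recordFetch applies the outcome of a background fetch:
//   - success: store the fetched metadata
//   - permission error: keep the placeholder, since retrying will not help
//   - any other error: evict, so a later request can retry the fetch
func (c *metadataCache) recordFetch(bucket string, md bucketMetadata, err error) {
	switch {
	case err == nil:
		c.entries[bucket] = md
	case errors.Is(err, errPermissionDenied):
		c.entries[bucket] = placeholder(bucket)
	default:
		delete(c.entries, bucket)
	}
}

func main() {
	c := &metadataCache{entries: map[string]bucketMetadata{}}

	// A placeholder is cached while the background fetch is in flight.
	c.entries["my-bucket"] = placeholder("my-bucket")

	c.recordFetch("my-bucket", bucketMetadata{}, errTransient)
	_, cached := c.entries["my-bucket"]
	fmt.Println("cached after transient error:", cached)
}
```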

Comment thread storage/trace.go Outdated