feat(metrics): add 20 SRE-focused Prometheus metrics across all modules #12
Merged
Instrument all modules with comprehensive observability metrics so on-call engineers can quickly diagnose production issues without guessing. New metrics cover: request duration histograms, response bytes, rate limit rejections, singleflight pressure, negative/stale cache hits, fanout size, upstream connection errors, blob probe outcomes, discovery health, circuit breaker transitions, Redis operation visibility, cache entry counts, and build info. Also refactors metrics test module with a shared constructor.
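As a rough illustration of what a request duration histogram records, the sketch below keeps per-bucket counters and exposes them cumulatively, which is how Prometheus histograms report `le` buckets. It uses only the standard library; `DurationHistogram`, its methods, and the bucket bounds are hypothetical names chosen for this example and are not taken from the PR's code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical stand-in for a Prometheus-style histogram (not the PR's type).
pub struct DurationHistogram {
    /// Bucket upper bounds in seconds, sorted ascending (the `le` values).
    bounds: Vec<f64>,
    /// One counter per bound, plus a final slot acting as the +Inf bucket.
    counts: Vec<AtomicU64>,
}

impl DurationHistogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let counts = (0..=bounds.len()).map(|_| AtomicU64::new(0)).collect();
        Self { bounds, counts }
    }

    /// Record one observation in the first bucket whose bound is >= the value.
    pub fn observe(&self, elapsed: Duration) {
        let secs = elapsed.as_secs_f64();
        let idx = self
            .bounds
            .iter()
            .position(|bound| secs <= *bound)
            .unwrap_or(self.bounds.len());
        self.counts[idx].fetch_add(1, Ordering::Relaxed);
    }

    /// Cumulative counts: entry i counts observations <= bounds[i];
    /// the last entry is the +Inf bucket, i.e. the total count.
    pub fn cumulative(&self) -> Vec<u64> {
        let mut running = 0u64;
        self.counts
            .iter()
            .map(|count| {
                running += count.load(Ordering::Relaxed);
                running
            })
            .collect()
    }
}
```

With bounds of 0.1, 0.5, and 1.0 seconds, observations of 50 ms, 300 ms, and 2 s yield cumulative counts of 1, 2, 2, and 3. A real metrics library would also track the sum of observed values and handle labels.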
Summary
- test_metrics() helper to eliminate duplication

New Metrics
- request_duration_seconds
- response_bytes_total
- rate_limit_rejected_total
- tags_resolve_duration_seconds
- negative_cache_hits_total
- cache_stale_serves_total
- singleflight_inflight
- singleflight_wait_duration_seconds
- upstream_connection_errors_total
- blob_probe_total
- fanout_size
- discovery_errors_total
- discovery_last_success_timestamp_seconds
- discovery_duration_seconds
- circuit_breaker_transitions_total
- redis_operations_total
- redis_fallback_total
- redis_reconnections_total
- cache_entries
- build_info

Changes by file
- src/metrics.rs: All 20 metric definitions in struct + registrations. Test refactor with test_metrics() helper.
- src/cache.rs: entry_count() on CacheBackend trait. Redis operation/fallback/reconnection counters.
- src/discovery.rs: Timing, error counting, last-success timestamp in refresh().
- src/resolver.rs: Singleflight inflight/wait, negative cache hits, stale serves, fanout size, upstream connection errors, cache entry gauge.
- src/proxy.rs: Request duration, response bytes, tags resolve duration, blob probe outcomes.
- src/circuit_breaker.rs: state_name() helper + transition counter on all CAS-successful state changes.
- src/main.rs: Rate limit rejection counter.

Verification
- cargo check --all-features ✅
- cargo clippy --all-features -- -D warnings ✅ zero warnings
- cargo test --all-features ✅ 108/108 pass
- cargo fmt --check ✅
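The circuit breaker change (a state_name() helper plus a transition counter bumped only on CAS-successful state changes) could look roughly like the sketch below. This is a standard-library approximation under stated assumptions: the `Breaker` struct, the state constants, the string labels, and the plain `AtomicU64` standing in for circuit_breaker_transitions_total are all hypothetical and may differ from what src/circuit_breaker.rs actually does.

```rust
use std::sync::atomic::{AtomicU64, AtomicU8, Ordering};

// Hypothetical state encoding; the real module may use different values.
pub const CLOSED: u8 = 0;
pub const OPEN: u8 = 1;
pub const HALF_OPEN: u8 = 2;

/// Map a raw state value to a label-friendly name.
pub fn state_name(state: u8) -> &'static str {
    match state {
        CLOSED => "closed",
        OPEN => "open",
        HALF_OPEN => "half_open",
        _ => "unknown",
    }
}

pub struct Breaker {
    state: AtomicU8,
    /// Stand-in for circuit_breaker_transitions_total (labels omitted).
    transitions: AtomicU64,
}

impl Breaker {
    pub fn new() -> Self {
        Self {
            state: AtomicU8::new(CLOSED),
            transitions: AtomicU64::new(0),
        }
    }

    /// Try to move from `from` to `to`. The counter is incremented only when
    /// this caller's compare-and-swap wins, so concurrent attempts at the
    /// same transition are counted once.
    pub fn transition(&self, from: u8, to: u8) -> bool {
        let won = self
            .state
            .compare_exchange(from, to, Ordering::AcqRel, Ordering::Acquire)
            .is_ok();
        if won {
            self.transitions.fetch_add(1, Ordering::Relaxed);
        }
        won
    }

    pub fn current_state_name(&self) -> &'static str {
        state_name(self.state.load(Ordering::Acquire))
    }

    pub fn transitions(&self) -> u64 {
        self.transitions.load(Ordering::Relaxed)
    }
}
```

Counting inside the success branch of the CAS, rather than before the attempt, keeps the metric from over-reporting when several threads race to open or close the breaker at once.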