
feat(metrics): add 20 SRE-focused Prometheus metrics across all modules #12

Merged
Macbet merged 1 commit into Macbet:main from p2pdkivenko:feat/sre-metrics
Mar 9, 2026

@p2pdkivenko (Contributor)

Summary

  • Add 20 new Prometheus metrics across all modules for comprehensive SRE observability
  • Instrument every layer: HTTP proxy, resolver, singleflight, cache, discovery, circuit breaker, Redis backend
  • Refactor metrics test module with shared test_metrics() helper to eliminate duplication

New Metrics

| Metric | Type | Module | Purpose |
| --- | --- | --- | --- |
| request_duration_seconds | HistogramVec | proxy | Overall HTTP latency by method/type/status |
| response_bytes_total | CounterVec | proxy | Response size tracking from Content-Length |
| rate_limit_rejected_total | Counter | main | Rate limiter rejections |
| tags_resolve_duration_seconds | HistogramVec | proxy | Tag list resolution latency |
| negative_cache_hits_total | Counter | resolver | Image-not-found served from cache |
| cache_stale_serves_total | Counter | resolver | Stale entries served during revalidation |
| singleflight_inflight | Gauge | resolver | Active singleflight groups |
| singleflight_wait_duration_seconds | Histogram | resolver | Follower wait time |
| upstream_connection_errors_total | CounterVec | resolver | Connection failures by reason (timeout/connect/other) |
| blob_probe_total | CounterVec | proxy | HEAD probe outcomes (found/not_found) |
| fanout_size | Histogram | resolver | Projects per parallel fanout |
| discovery_errors_total | Counter | discovery | Refresh failures |
| discovery_last_success_timestamp_seconds | Gauge | discovery | Epoch of last successful refresh |
| discovery_duration_seconds | Histogram | discovery | Refresh cycle duration |
| circuit_breaker_transitions_total | CounterVec | circuit_breaker | State transitions (from→to per project) |
| redis_operations_total | CounterVec | cache | Redis ops by type and result |
| redis_fallback_total | Counter | cache | Local Moka fallback activations |
| redis_reconnections_total | Counter | cache | Sentinel reconnection attempts |
| cache_entries | GaugeVec | resolver | Current local cache size |
| build_info | GaugeVec | metrics | Version + commit metadata |
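For reviewers wondering how a bounded label such as the `reason` on upstream_connection_errors_total might be derived, here is a minimal sketch. It is not the code in this PR: the function name `connection_error_reason` is hypothetical, and the mapping from `std::io::ErrorKind` is an assumption (the real resolver may classify HTTP client errors instead). The point is only that every error collapses into one of the three values listed in the table, which keeps label cardinality fixed.

```rust
use std::io::{Error, ErrorKind};

/// Map an I/O error to one of the three `reason` label values listed for
/// `upstream_connection_errors_total`. Hypothetical helper: the exact
/// mapping is an assumption, not taken from the PR.
fn connection_error_reason(err: &Error) -> &'static str {
    match err.kind() {
        // The operation did not complete within its deadline.
        ErrorKind::TimedOut => "timeout",
        // The connection could not be established or was torn down.
        ErrorKind::ConnectionRefused
        | ErrorKind::ConnectionReset
        | ErrorKind::ConnectionAborted
        | ErrorKind::NotConnected => "connect",
        // Everything else shares one bucket so the label set stays bounded.
        _ => "other",
    }
}
```

The returned string would then be passed as the label value when incrementing the counter.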

Changes by file

  • src/metrics.rs: All 20 metric definitions in the metrics struct, plus their registrations. Test refactor with test_metrics() helper.
  • src/cache.rs: entry_count() on CacheBackend trait. Redis operation/fallback/reconnection counters.
  • src/discovery.rs: Timing, error counting, last-success timestamp in refresh().
  • src/resolver.rs: Singleflight inflight/wait, negative cache hits, stale serves, fanout size, upstream connection errors, cache entry gauge.
  • src/proxy.rs: Request duration, response bytes, tags resolve duration, blob probe outcomes.
  • src/circuit_breaker.rs: state_name() helper + transition counter on all CAS-successful state changes.
  • src/main.rs: Rate limit rejection counter.
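The circuit breaker item above counts a transition only when the compare-and-swap succeeds. A rough, std-only sketch of that pattern follows. Apart from `state_name()`, every name here is hypothetical: `TransitionCounter` is a stand-in for a Prometheus CounterVec, and the numeric state encoding and the label strings ("closed", "open", "half_open") are assumptions rather than what src/circuit_breaker.rs actually uses.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Mutex;

// Assumed state encoding; the real module may differ.
const CLOSED: u8 = 0;
const OPEN: u8 = 1;
const HALF_OPEN: u8 = 2;

/// Turn a raw state value into a stable label string.
fn state_name(state: u8) -> &'static str {
    match state {
        CLOSED => "closed",
        OPEN => "open",
        HALF_OPEN => "half_open",
        _ => "unknown",
    }
}

/// Stand-in for a CounterVec keyed by (project, from, to).
#[derive(Default)]
struct TransitionCounter {
    counts: Mutex<HashMap<(String, &'static str, &'static str), u64>>,
}

impl TransitionCounter {
    fn inc(&self, project: &str, from: &'static str, to: &'static str) {
        let mut counts = self.counts.lock().unwrap();
        *counts.entry((project.to_string(), from, to)).or_insert(0) += 1;
    }

    fn get(&self, project: &str, from: &'static str, to: &'static str) -> u64 {
        let counts = self.counts.lock().unwrap();
        counts.get(&(project.to_string(), from, to)).copied().unwrap_or(0)
    }
}

struct Breaker {
    project: String,
    state: AtomicU8,
}

impl Breaker {
    fn new(project: &str) -> Self {
        Breaker { project: project.to_string(), state: AtomicU8::new(CLOSED) }
    }

    /// Attempt `from -> to`. The counter is bumped only by the caller that
    /// wins the CAS, so concurrent attempts are not double-counted.
    fn transition(&self, from: u8, to: u8, metrics: &TransitionCounter) -> bool {
        let won = self
            .state
            .compare_exchange(from, to, Ordering::AcqRel, Ordering::Acquire)
            .is_ok();
        if won {
            metrics.inc(&self.project, state_name(from), state_name(to));
        }
        won
    }
}
```

Counting after the CAS rather than before it means a losing thread records nothing, so the counter should match the number of state changes that actually happened.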

Verification

  • cargo check --all-features
  • cargo clippy --all-features -- -D warnings ✅ zero warnings
  • cargo test --all-features ✅ 108/108 pass
  • cargo fmt --check

Instrument all modules with comprehensive observability metrics so on-call
engineers can quickly diagnose production issues without guessing.

New metrics cover: request duration histograms, response bytes, rate limit
rejections, singleflight pressure, negative/stale cache hits, fanout size,
upstream connection errors, blob probe outcomes, discovery health, circuit
breaker transitions, Redis operation visibility, cache entry counts, and
build info. Also refactors metrics test module with a shared constructor.
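
As an illustration of the discovery health instrumentation described above (duration, error count, last-success timestamp), here is a small std-only sketch. It is an assumption about the shape of the code, not a copy of refresh() in src/discovery.rs: `DiscoveryMetrics` and `instrumented_refresh` are hypothetical names, plain atomics stand in for the Prometheus Counter, Gauge, and Histogram types, and the duration is stored as a single value rather than observed into buckets.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Atomics standing in for the three discovery metrics (hypothetical struct).
#[derive(Default)]
struct DiscoveryMetrics {
    errors_total: AtomicU64,
    last_success_timestamp_seconds: AtomicU64,
    // A real histogram would bucket this; here only the last value is kept.
    last_duration_micros: AtomicU64,
}

/// Run one refresh cycle and record its outcome.
fn instrumented_refresh<F>(metrics: &DiscoveryMetrics, refresh: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    let started = Instant::now();
    let result = refresh();
    let elapsed: Duration = started.elapsed();

    // Duration is recorded for successes and failures alike.
    metrics
        .last_duration_micros
        .store(elapsed.as_micros() as u64, Ordering::Relaxed);

    match &result {
        Ok(()) => {
            // Wall-clock epoch seconds, so dashboards can compute staleness.
            let now = SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .map(|d| d.as_secs())
                .unwrap_or(0);
            metrics
                .last_success_timestamp_seconds
                .store(now, Ordering::Relaxed);
        }
        Err(_) => {
            metrics.errors_total.fetch_add(1, Ordering::Relaxed);
        }
    }
    result
}
```

Leaving the timestamp untouched on failure is what makes "seconds since last success" usable as a staleness signal: it keeps growing until a refresh succeeds again.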
@Macbet Macbet merged commit b2e171b into Macbet:main Mar 9, 2026
5 checks passed