-
Notifications
You must be signed in to change notification settings - Fork 6
chore: Improve metrics coverage #95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds metrics (Gauges and Histograms) and timing instrumentation across the runtime: worker registration/removal, per-request and per-chain processing (including per-worker chain durations), and latest block height/slot; RuntimeBuilder now constructs and uses the dedicated Metrics instance. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Runtime
participant Worker
participant Metrics
Client->>Runtime: send request
Runtime->>Metrics: record request start (optional)
Runtime->>Worker: dispatch request
Worker-->>Runtime: respond
Runtime->>Metrics: record handle_request_duration_ms(worker_id, method, duration)
sequenceDiagram
autonumber
participant Trigger
participant Runtime
participant WorkerA as Worker (per-worker)
participant Commit
participant Metrics
Trigger->>Runtime: start handle_chain
Runtime->>Metrics: note chain start time
Runtime->>WorkerA: run worker chain step
WorkerA-->>Runtime: step complete (duration)
Runtime->>Metrics: record handle_worker_chain_duration_ms(worker_id, duration)
Runtime->>Commit: commit chain
Commit-->>Runtime: commit result, latest_height/slot
Runtime->>Metrics: record handle_chain_duration_ms(total_duration)
Runtime->>Metrics: set latest_block_height / latest_block_slot
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
balius-runtime/src/metrics.rs (1)
106-131: Consider extracting histogram boundaries to a constant.The same boundary values are duplicated across all three histograms. Extracting to a constant improves maintainability and ensures consistency if boundaries need adjustment.
+const DURATION_HISTOGRAM_BOUNDARIES_MS: [f64; 11] = [ + 1.0, 5.0, 10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, +]; +Then use
.with_boundaries(DURATION_HISTOGRAM_BOUNDARIES_MS.to_vec())for each histogram.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
balius-runtime/src/lib.rs(9 hunks)balius-runtime/src/metrics.rs(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
balius-runtime/src/lib.rs (1)
balius-runtime/src/metrics.rs (1)
default(277-279)
balius-runtime/src/metrics.rs (5)
balius-runtime/src/lib.rs (1)
new(716-739)balius-runtime/src/kv/mod.rs (1)
new(37-43)balius-runtime/src/ledgers/mod.rs (1)
new(54-60)balius-runtime/src/sign/mod.rs (1)
new(27-33)balius-runtime/src/logging/mod.rs (1)
new(37-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: lint
🔇 Additional comments (10)
balius-runtime/src/metrics.rs (4)
1-5: LGTM!The imports are correctly updated to include
GaugeandHistogramtypes needed for the new metrics.
24-29: LGTM!Appropriate metric types:
Gaugefor current state values (workers, block height/slot) andHistogramfor duration distributions.
101-104: LGTM!Gauge metrics are well-defined with clear descriptions for tracking current state.
Also applies to: 133-141
241-273: LGTM!The new recording methods follow existing patterns. Label consistency between
handle_request_duration_msand the existingrequestmethod is good for correlation in dashboards.balius-runtime/src/lib.rs (6)
6-6: LGTM!Standard import for timing instrumentation.
572-585: LGTM!Good pattern - recording the worker count while still holding the lock ensures accuracy.
615-631: Verify: Intentional idempotent removal?
remove_workerreturnsOk(())regardless of whether the worker existed. This provides idempotent semantics but differs from howhandle_requesthandles missing workers (returnsError::WorkerNotFound). Confirm this is the intended design.
638-665: LGTM!Good instrumentation pattern. Per-worker timing captures individual
apply_chainlatency, while overall timing captures the full operation including store operations. Block height/slot are recorded after successful commit, ensuring they reflect committed state.
674-694: LGTM!Duration is correctly recorded after dispatch, covering both success and error cases. This provides complete latency visibility.
829-833: LGTM!Clean initialization pattern. Setting
workers_loaded(0)ensures the gauge is initialized to a known state rather than being undefined until the first worker registration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/asteria-tracker/src/lib.rs (1)
90-95: Refactor is fine; consider hardening datum parsing to avoid panics on unexpected shapesThe
match→if letchange keeps behavior the same and is easier to read. The only concern here is the continued use ofunwrap()on data coming from the chain: if theConstrhas fewer fields than expected or different types, the worker will panic rather than just skipping that UTxO.If you want this example to be more robust (especially if used beyond demos), you could keep the same structure but replace the
unwrap()s with a guarded pattern + early return, e.g.:- if let plutus_data::PlutusData::Constr(x) = p { - let mut f = x.fields.iter(); - - pos_x = integer_plutus_field(f.next()).unwrap(); - pos_y = integer_plutus_field(f.next()).unwrap(); - asset_name = hex::encode(string_plutus_field(f.next()).unwrap()); - } + if let plutus_data::PlutusData::Constr(x) = p { + let mut f = x.fields.iter(); + + let p1 = f.next(); + let p2 = f.next(); + let p3 = f.next(); + + if let (Some(px), Some(py), Some(name_bytes)) = ( + integer_plutus_field(p1), + integer_plutus_field(p2), + string_plutus_field(p3), + ) { + pos_x = px; + pos_y = py; + asset_name = hex::encode(name_bytes); + } else { + worker::logging::log( + worker::logging::Level::Debug, + "Skipping UTxO with unexpected datum shape", + &out_ref, + ); + return Ok(()); + } + }This keeps the new structure but turns malformed/unknown datums into a logged, non-fatal skip instead of a panic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
balius-runtime/src/lib.rs(9 hunks)examples/asteria-tracker/src/lib.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- balius-runtime/src/lib.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: lint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
balius-runtime/src/metrics.rs (1)
250-258: Consider addingsuccessattribute for consistency withrequest()method.The existing
request()method (line 166-175) includes asuccessattribute, buthandle_request_duration_msonly capturesworkerandmethod. If analyzing duration by success/failure is useful, consider adding it:- pub fn handle_request_duration_ms(&self, worker_id: &str, method: &str, duration_ms: f64) { + pub fn handle_request_duration_ms(&self, worker_id: &str, method: &str, success: bool, duration_ms: f64) { self.handle_request_duration_ms.record( duration_ms, &[ KeyValue::new("worker", worker_id.to_owned()), KeyValue::new("method", method.to_owned()), + KeyValue::new("success", success.to_string()), ], ); }Otherwise, if separating duration by outcome is not needed, the current implementation is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
balius-runtime/src/metrics.rs(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
balius-runtime/src/metrics.rs (4)
balius-runtime/src/kv/mod.rs (1)
new(37-43)balius-runtime/src/logging/mod.rs (1)
new(37-43)balius-runtime/src/ledgers/mod.rs (1)
new(54-60)balius-runtime/src/sign/mod.rs (1)
new(27-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
balius-runtime/src/metrics.rs (4)
1-5: LGTM!The imports are correctly updated to include
GaugeandHistogramtypes from OpenTelemetry, and all imported types are used in the implementation.
24-29: LGTM!Good use of
Gauge<u64>for point-in-time values (worker count, block height, slot) andHistogram<f64>for duration distributions.
157-162: LGTM!Struct initialization correctly includes all new metric fields.
260-273: LGTM!The gauge methods for block height/slot and the per-worker chain duration histogram follow established patterns and are correctly implemented.
| .with_boundaries(vec![ | ||
| 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 1200000.0, | ||
| ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suspicious histogram boundary value: 1200000.0 ms (20 minutes).
The last boundary 1200000.0 ms equals 20 minutes, which breaks the progression pattern (100ms → 60s). Was this intended to be 120000.0 (2 minutes) instead?
Also, consider extracting the duplicated boundaries to a constant to avoid inconsistencies:
+const DURATION_BOUNDARIES_MS: [f64; 10] = [
+ 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 120000.0,
+];
+
// Then in new():
let handle_chain_duration_ms = meter
.f64_histogram("handle_chain_duration_ms")
.with_description("Duration to process handle_chain in milliseconds.")
.with_unit("ms")
- .with_boundaries(vec![
- 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 1200000.0,
- ])
+ .with_boundaries(DURATION_BOUNDARIES_MS.to_vec())
.build();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| .with_boundaries(vec![ | |
| 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 1200000.0, | |
| ]) | |
| .with_boundaries(DURATION_BOUNDARIES_MS.to_vec()) |
🤖 Prompt for AI Agents
In balius-runtime/src/metrics.rs around lines 110 to 112, the last histogram
boundary is likely wrong: 1200000.0 ms (20 minutes) breaks the progression and
should be 120000.0 ms (2 minutes); update that value accordingly and factor the
full vec of boundaries into a shared constant (e.g., HISTOGRAM_BOUNDARIES) used
wherever these boundaries are defined to avoid future inconsistencies.
Summary by CodeRabbit
New Features
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.