chore: Improve metrics coverage #95

gonzalezzfelipe · 2025-12-10T17:39:07Z

Summary by CodeRabbit

New Features
- Added runtime observability: worker count gauge, per-request and per-chain duration histograms, per-worker chain timings, and latest block height/slot gauges.
Refactor
- Simplified example data parsing to streamline variant handling and reduce control-flow noise.
Chores
- Initialized and wired a dedicated metrics instance for the runtime to enable accurate reporting and lifecycle visibility.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-10T17:39:19Z

Walkthrough

Adds metrics (Gauges and Histograms) and timing instrumentation across the runtime: worker registration/removal, per-request and per-chain processing (including per-worker chain durations), and latest block height/slot; RuntimeBuilder now constructs and uses the dedicated Metrics instance.

Changes

Cohort / File(s)	Summary
Metrics foundation `balius-runtime/src/metrics.rs`	Adds Gauge and Histogram types and six new public metric fields on `Metrics` (`workers_loaded`, `handle_chain_duration_ms`, `handle_request_duration_ms`, `handle_worker_chain_duration_ms`, `latest_block_height`, `latest_block_slot`); initializes instruments with metadata and boundaries in `Metrics::new()` and adds public recording methods for each metric.
Runtime integration `balius-runtime/src/lib.rs`	Wires metrics into runtime lifecycle and handlers: updates `workers_loaded` on register/remove; records start times and reports durations for `handle_chain` (total and per-worker) and `handle_request`; updates latest block height/slot after commit; `RuntimeBuilder` uses the dedicated `Metrics` instance.
Example minor change `examples/asteria-tracker/src/lib.rs`	Replaces a `match` on `PlutusData` with an `if let Constr(...)` to simplify variant handling while preserving extraction and error behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Runtime
    participant Worker
    participant Metrics

    Client->>Runtime: send request
    Runtime->>Metrics: record request start (optional)
    Runtime->>Worker: dispatch request
    Worker-->>Runtime: respond
    Runtime->>Metrics: record handle_request_duration_ms(worker_id, method, duration)

sequenceDiagram
    autonumber
    participant Trigger
    participant Runtime
    participant WorkerA as Worker (per-worker)
    participant Commit
    participant Metrics

    Trigger->>Runtime: start handle_chain
    Runtime->>Metrics: note chain start time
    Runtime->>WorkerA: run worker chain step
    WorkerA-->>Runtime: step complete (duration)
    Runtime->>Metrics: record handle_worker_chain_duration_ms(worker_id, duration)
    Runtime->>Commit: commit chain
    Commit-->>Runtime: commit result, latest_height/slot
    Runtime->>Metrics: record handle_chain_duration_ms(total_duration)
    Runtime->>Metrics: set latest_block_height / latest_block_slot

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Inspect balius-runtime/src/metrics.rs for correct meter metadata and histogram boundaries.
Verify metric label usage (worker_id, method) and unit conversions to milliseconds.
Confirm workers_loaded updates on all register/remove code paths and initial value from builder.
Ensure timing instrumentation doesn't change control flow or swallow errors.

Poem

A rabbit times each hop and chore 🐇⏱️
I count the workers, track each score
Chains and requests, durations I keep
Heights and slots — recorded deep
Hooray, observability — hop, tally, repeat 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'chore: Improve metrics coverage' clearly summarizes the main change: expanding metrics instrumentation and observability across the runtime. It directly reflects the primary objective of adding timing metrics and new metric fields.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

balius-runtime/src/metrics.rs (1)
106-131: Consider extracting histogram boundaries to a constant.

The same boundary values are duplicated across all three histograms. Extracting to a constant improves maintainability and ensures consistency if boundaries need adjustment.
+const DURATION_HISTOGRAM_BOUNDARIES_MS: [f64; 11] = [
+    1.0, 5.0, 10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0,
+];
+
Then use .with_boundaries(DURATION_HISTOGRAM_BOUNDARIES_MS.to_vec()) for each histogram.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2837757 and a631633.

📒 Files selected for processing (2)

balius-runtime/src/lib.rs (9 hunks)
balius-runtime/src/metrics.rs (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

balius-runtime/src/lib.rs (1)

balius-runtime/src/metrics.rs (1)

default (277-279)

balius-runtime/src/metrics.rs (5)

balius-runtime/src/lib.rs (1)

new (716-739)

balius-runtime/src/kv/mod.rs (1)

new (37-43)

balius-runtime/src/ledgers/mod.rs (1)

new (54-60)

balius-runtime/src/sign/mod.rs (1)

new (27-33)

balius-runtime/src/logging/mod.rs (1)

new (37-43)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: lint

🔇 Additional comments (10)

balius-runtime/src/metrics.rs (4)

1-5: LGTM!

The imports are correctly updated to include Gauge and Histogram types needed for the new metrics.

24-29: LGTM!

Appropriate metric types: Gauge for current state values (workers, block height/slot) and Histogram for duration distributions.

101-104: LGTM!

Gauge metrics are well-defined with clear descriptions for tracking current state.

Also applies to: 133-141

241-273: LGTM!

The new recording methods follow existing patterns. Label consistency between handle_request_duration_ms and the existing request method is good for correlation in dashboards.

balius-runtime/src/lib.rs (6)

6-6: LGTM!

Standard import for timing instrumentation.

572-585: LGTM!

Good pattern - recording the worker count while still holding the lock ensures accuracy.

615-631: Verify: Intentional idempotent removal?

remove_worker returns Ok(()) regardless of whether the worker existed. This provides idempotent semantics but differs from how handle_request handles missing workers (returns Error::WorkerNotFound). Confirm this is the intended design.

638-665: LGTM!

Good instrumentation pattern. Per-worker timing captures individual apply_chain latency, while overall timing captures the full operation including store operations. Block height/slot are recorded after successful commit, ensuring they reflect committed state.

674-694: LGTM!

Duration is correctly recorded after dispatch, covering both success and error cases. This provides complete latency visibility.

829-833: LGTM!

Clean initialization pattern. Setting workers_loaded(0) ensures the gauge is initialized to a known state rather than being undefined until the first worker registration.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

examples/asteria-tracker/src/lib.rs (1)

90-95: Refactor is fine; consider hardening datum parsing to avoid panics on unexpected shapes

The match → if let change keeps behavior the same and is easier to read. The only concern here is the continued use of unwrap() on data coming from the chain: if the Constr has fewer fields than expected or different types, the worker will panic rather than just skipping that UTxO.

If you want this example to be more robust (especially if used beyond demos), you could keep the same structure but replace the unwrap()s with a guarded pattern + early return, e.g.:

-            if let plutus_data::PlutusData::Constr(x) = p {
-                let mut f = x.fields.iter();
-
-                pos_x = integer_plutus_field(f.next()).unwrap();
-                pos_y = integer_plutus_field(f.next()).unwrap();
-                asset_name = hex::encode(string_plutus_field(f.next()).unwrap());
-            }
+            if let plutus_data::PlutusData::Constr(x) = p {
+                let mut f = x.fields.iter();
+
+                let p1 = f.next();
+                let p2 = f.next();
+                let p3 = f.next();
+
+                if let (Some(px), Some(py), Some(name_bytes)) = (
+                    integer_plutus_field(p1),
+                    integer_plutus_field(p2),
+                    string_plutus_field(p3),
+                ) {
+                    pos_x = px;
+                    pos_y = py;
+                    asset_name = hex::encode(name_bytes);
+                } else {
+                    worker::logging::log(
+                        worker::logging::Level::Debug,
+                        "Skipping UTxO with unexpected datum shape",
+                        &out_ref,
+                    );
+                    return Ok(());
+                }
+            }

This keeps the new structure but turns malformed/unknown datums into a logged, non-fatal skip instead of a panic.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a631633 and a860a8e.

📒 Files selected for processing (2)

balius-runtime/src/lib.rs (9 hunks)
examples/asteria-tracker/src/lib.rs (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

balius-runtime/src/lib.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build
GitHub Check: lint

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

balius-runtime/src/metrics.rs (1)
250-258: Consider adding success attribute for consistency with request() method.

The existing request() method (line 166-175) includes a success attribute, but handle_request_duration_ms only captures worker and method. If analyzing duration by success/failure is useful, consider adding it:
-    pub fn handle_request_duration_ms(&self, worker_id: &str, method: &str, duration_ms: f64) {
+    pub fn handle_request_duration_ms(&self, worker_id: &str, method: &str, success: bool, duration_ms: f64) {
         self.handle_request_duration_ms.record(
             duration_ms,
             &[
                 KeyValue::new("worker", worker_id.to_owned()),
                 KeyValue::new("method", method.to_owned()),
+                KeyValue::new("success", success.to_string()),
             ],
         );
     }
Otherwise, if separating duration by outcome is not needed, the current implementation is fine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a860a8e and a5382f4.

📒 Files selected for processing (1)

balius-runtime/src/metrics.rs (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

balius-runtime/src/metrics.rs (4)

balius-runtime/src/kv/mod.rs (1)

new (37-43)

balius-runtime/src/logging/mod.rs (1)

new (37-43)

balius-runtime/src/ledgers/mod.rs (1)

new (54-60)

balius-runtime/src/sign/mod.rs (1)

new (27-33)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (4)

balius-runtime/src/metrics.rs (4)

1-5: LGTM!

The imports are correctly updated to include Gauge and Histogram types from OpenTelemetry, and all imported types are used in the implementation.

24-29: LGTM!

Good use of Gauge<u64> for point-in-time values (worker count, block height, slot) and Histogram<f64> for duration distributions.

157-162: LGTM!

Struct initialization correctly includes all new metric fields.

260-273: LGTM!

The gauge methods for block height/slot and the per-worker chain duration histogram follow established patterns and are correctly implemented.

coderabbitai · 2025-12-10T19:58:36Z

balius-runtime/src/metrics.rs

+            .with_boundaries(vec![
+                100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 1200000.0,
+            ])


⚠️ Potential issue | 🟡 Minor

Suspicious histogram boundary value: 1200000.0 ms (20 minutes).

The last boundary 1200000.0 ms equals 20 minutes, which breaks the progression pattern (100ms → 60s). Was this intended to be 120000.0 (2 minutes) instead?

Also, consider extracting the duplicated boundaries to a constant to avoid inconsistencies:

+const DURATION_BOUNDARIES_MS: [f64; 10] = [ + 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 120000.0, +]; + // Then in new(): let handle_chain_duration_ms = meter .f64_histogram("handle_chain_duration_ms") .with_description("Duration to process handle_chain in milliseconds.") .with_unit("ms") - .with_boundaries(vec![ - 100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 1200000.0, - ]) + .with_boundaries(DURATION_BOUNDARIES_MS.to_vec()) .build();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

.with_boundaries(vec![

100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0, 10000.0, 25000.0, 60000.0, 1200000.0,

])

.with_boundaries(DURATION_BOUNDARIES_MS.to_vec())

🤖 Prompt for AI Agents

In balius-runtime/src/metrics.rs around lines 110 to 112, the last histogram boundary is likely wrong: 1200000.0 ms (20 minutes) breaks the progression and should be 120000.0 ms (2 minutes); update that value accordingly and factor the full vec of boundaries into a shared constant (e.g., HISTOGRAM_BOUNDARIES) used wherever these boundaries are defined to avoid future inconsistencies.

chore: Improve metrics coverage

a631633

coderabbitai bot reviewed Dec 10, 2025

View reviewed changes

gonzalezzfelipe added 2 commits December 10, 2025 16:51

Merge conflicts and clippy

a860a8e

Better bucket sizes

a5382f4

coderabbitai bot reviewed Dec 10, 2025

View reviewed changes

scarmuega approved these changes Dec 10, 2025

View reviewed changes

scarmuega merged commit 02fd30c into txpipe:main Dec 10, 2025
7 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: Improve metrics coverage #95

chore: Improve metrics coverage #95

Uh oh!

gonzalezzfelipe commented Dec 10, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 10, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chore: Improve metrics coverage #95

chore: Improve metrics coverage #95

Uh oh!

Conversation

gonzalezzfelipe commented Dec 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gonzalezzfelipe commented Dec 10, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 10, 2025 •

edited

Loading