feat(observability): add escrow indexer metrics and health #254

Open
Amas-01 wants to merge 1 commit into Liquifact:main from Amas-01:feature/escrow-indexer-observability

Amas-01 commented May 29, 2026

Closes #207

Summary

Implementation Complete: Escrow Indexer Observability (#207)

I added Prometheus metrics and a cursor-staleness health check to the escrow indexer job. Here is a breakdown of the changes:

Files Modified/Created

  • metrics.js - Registered 4 new metrics:

    • escrow_indexer_events_processed_total (Counter)
    • escrow_indexer_events_skipped_total (Counter)
    • escrow_indexer_cycle_failures_total (Counter)
    • escrow_indexer_last_cursor_advance_timestamp_seconds (Gauge)
  • escrowIndexer.js - Integrated metric emission:

    • Imports all 4 metrics.
    • Emits counters per cycle with input validation.
    • Tracks the last cursor advance timestamp.
    • Increments the failure counter on exceptions.
  • health.js - Added staleness detection:

    • New checkIndexerStaleness() function.
    • Gated on ESCROW_INDEXER_ENABLED='true'.
    • Computes elapsed seconds since the last advance.
    • Returns "healthy", "stale", "disabled", or "error".
    • Treats an unset gauge (startup state) as healthy.
  • index.js - Extended configuration schema:

    • Added ESCROW_INDEXER_ENABLED (enum: 'true' or 'false').
    • Added ESCROW_INDEXER_STALE_THRESHOLD_SECONDS (default: 300).
  • .env.example - Added environment variable documentation:

    • Included ESCROW_INDEXER_STALE_THRESHOLD_SECONDS=300 with an explanatory comment.
  • escrowIndexer.metrics.test.js - Comprehensive 13-test suite:

    • 6 metric increment tests (processed, skipped, failures, cursor advance).
    • 7 health check tests (staleness detection, threshold configurability, feature flags).
    • 3 security validation tests (invalid input handling, string feature flags).
    • 3 integration tests (/metrics, /ready endpoints).
    • 3 vacuousness checks (confirm the assertions would fail if the behavior under test were broken).
  • escrow-indexing-strategy.md - Expanded documentation:

    • Metric descriptions and semantics.
    • Staleness check behavior and configuration.
    • Prometheus query examples.
    • Configuration reference table.
    • Security considerations.
    • Monitoring and alerting recommendations.
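
The per-cycle metric emission described for escrowIndexer.js can be sketched roughly as follows. This is a minimal sketch, not the PR's code: recordIndexerCycle, the metrics object keys, and the result shape ({ processed, skipped, cursorAdvanced }) are hypothetical. The metrics are passed in as plain objects exposing inc() and set(), the two prom-client Counter/Gauge methods the sketch relies on, so it runs without the library.

```javascript
// Minimal sketch (hypothetical, not the PR's code) of per-cycle metric emission.
// In the real module, `metrics` would hold prom-client instances, e.g.
//   const eventsProcessed = new client.Counter({
//     name: "escrow_indexer_events_processed_total",
//     help: "Escrow events processed by the indexer",
//   });
// Here any objects with inc()/set() work, which keeps the sketch testable.

function isNonNegativeInteger(value) {
  return Number.isInteger(value) && value >= 0;
}

function recordIndexerCycle(metrics, result, nowMs = Date.now()) {
  const { processed, skipped, cursorAdvanced } = result;

  // Validate before touching any counter: a bad count is recorded as a
  // cycle failure instead of corrupting the processed/skipped totals.
  if (!isNonNegativeInteger(processed) || !isNonNegativeInteger(skipped)) {
    metrics.cycleFailures.inc();
    return false;
  }

  // Skip zero increments; the totals are unchanged either way.
  if (processed > 0) metrics.eventsProcessed.inc(processed);
  if (skipped > 0) metrics.eventsSkipped.inc(skipped);

  // Store the Unix timestamp (seconds) of the advance, not an elapsed time,
  // so the health check can compute staleness whenever it runs.
  if (cursorAdvanced === true) {
    metrics.lastCursorAdvance.set(Math.floor(nowMs / 1000));
  }
  return true;
}
```

In the indexer loop, the catch block around each cycle would also call metrics.cycleFailures.inc(), covering the "increments the failure counter on exceptions" behavior.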

Key Design Decisions

  • Library: Used prom-client v15.1.3 (already installed).
  • Gauge Strategy: Stores the Unix timestamp of the last cursor advance (not elapsed seconds) so the health check computes staleness dynamically.
  • Startup Behavior: The gauge is left unset (undefined) and treated as healthy to prevent false positives on a fresh start.
  • Feature Flag: Uses strict equality, config.ESCROW_INDEXER_ENABLED === "true", so non-empty strings such as "false" are not treated as enabled.
  • Health Check Pattern: Follows the existing KYC provider pattern (register unconditionally, return status).
  • Input Validation: Counters validate processed and skipped as non-negative integers before incrementing.
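
The staleness decision made in health.js can be reduced to a pure function, sketched here under stated assumptions. evaluateIndexerStaleness is a hypothetical name: the PR's checkIndexerStaleness() reads the config and the gauge itself, while this version takes them as arguments so it can be tested without prom-client. Two behaviors are assumptions rather than details from the PR: a gauge reading of 0 is treated like an unset gauge (an unlabelled prom-client gauge may report 0 before its first set()), and non-numeric inputs produce "error".

```javascript
// Hypothetical, dependency-free version of the staleness decision.
// Inputs:
//   enabled            raw ESCROW_INDEXER_ENABLED value
//   lastAdvanceSeconds gauge value (Unix seconds), possibly unset
//   thresholdSeconds   ESCROW_INDEXER_STALE_THRESHOLD_SECONDS
//   nowSeconds         current Unix time in seconds
function evaluateIndexerStaleness({ enabled, lastAdvanceSeconds, thresholdSeconds, nowSeconds }) {
  // Strict string comparison: "false", "1", or boolean true do not enable it.
  if (enabled !== "true") {
    return { status: "disabled" };
  }

  // Assumption: undefined, null, or 0 all mean "no advance yet". Reporting
  // healthy avoids a false alarm right after startup.
  if (lastAdvanceSeconds === undefined || lastAdvanceSeconds === null || lastAdvanceSeconds === 0) {
    return { status: "healthy" };
  }

  // Assumption: malformed inputs map to "error" instead of throwing.
  if (
    !Number.isFinite(lastAdvanceSeconds) ||
    !Number.isFinite(nowSeconds) ||
    !Number.isFinite(thresholdSeconds) ||
    thresholdSeconds <= 0
  ) {
    return { status: "error" };
  }

  // Clamp at 0 in case the clock moved backwards.
  const elapsedSeconds = Math.max(0, nowSeconds - lastAdvanceSeconds);
  return elapsedSeconds > thresholdSeconds
    ? { status: "stale", elapsedSeconds }
    : { status: "healthy", elapsedSeconds };
}
```

The comparison uses >, so an indexer that advanced exactly ESCROW_INDEXER_STALE_THRESHOLD_SECONDS ago is still reported healthy; the PR does not state which boundary it uses.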

Security & Compliance

  • No PII in metric labels: dimensions only (job name, status).
  • Input validation before metric increments: invalid counts increment the failure counter instead.
  • Strict feature flag checking: only the exact string "true" enables the check.
  • Config module centralization: no direct process.env access for secrets.
  • No secrets committed: only .env.example is updated.
  • Testing: the suite covers 13 scenarios, including vacuousness checks.
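
Centralizing the two new settings could look like the following plain-JavaScript sketch. The PR extends an existing configuration schema in index.js whose validation library is not named, so loadIndexerConfig is hypothetical, and defaulting ESCROW_INDEXER_ENABLED to "false" when unset is an assumption; the 300-second threshold default comes from the PR.

```javascript
// Hypothetical sketch of centralized parsing for the two new settings.
const DEFAULT_STALE_THRESHOLD_SECONDS = 300;

function loadIndexerConfig(env) {
  // Assumption: an unset flag means disabled.
  const enabled = env.ESCROW_INDEXER_ENABLED === undefined ? "false" : env.ESCROW_INDEXER_ENABLED;
  if (enabled !== "true" && enabled !== "false") {
    throw new Error(`ESCROW_INDEXER_ENABLED must be 'true' or 'false', got '${enabled}'`);
  }

  const rawThreshold = env.ESCROW_INDEXER_STALE_THRESHOLD_SECONDS;
  const threshold =
    rawThreshold === undefined || rawThreshold === ""
      ? DEFAULT_STALE_THRESHOLD_SECONDS
      : Number(rawThreshold);
  if (!Number.isInteger(threshold) || threshold <= 0) {
    throw new Error(
      `ESCROW_INDEXER_STALE_THRESHOLD_SECONDS must be a positive integer, got '${rawThreshold}'`
    );
  }

  // Frozen so other modules read this object rather than process.env.
  return Object.freeze({
    ESCROW_INDEXER_ENABLED: enabled, // kept as a string; compare with === "true"
    ESCROW_INDEXER_STALE_THRESHOLD_SECONDS: threshold,
  });
}
```

The rest of the code would call loadIndexerConfig(process.env) once at startup and read the frozen result, keeping process.env access in a single place and failing fast on malformed values.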

drips-wave Bot commented May 29, 2026

@Amas-01 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Development

Successfully merging this pull request may close these issues.

Add escrow indexer observability: lag, processed/skipped counters, and cursor-staleness health check
