Expose process memory, CPU, and event-loop lag metrics via prom-client #238

@Jagadeeshftw

Description

GET /metrics serves the prom-client default metrics, but Node.js-specific runtime health indicators (heap used/total, external memory, CPU usage percentage, and event-loop lag) are not consistently collected or exposed. During load spikes, operators have no signal to distinguish GC pressure from event-loop starvation. A dedicated runtime metrics collector must sample these values on a configurable interval.

Requirements and context

  • Using prom-client, register fluxora_nodejs_heap_used_bytes, fluxora_nodejs_heap_total_bytes, and fluxora_nodejs_external_bytes as gauges, and fluxora_nodejs_event_loop_lag_seconds as a histogram
  • Measure event-loop lag with a setTimeout-based probe that runs at a configurable interval, METRICS_SAMPLE_INTERVAL_MS
  • Ensure the collector is initialised in src/app.ts and gracefully stopped in src/shutdown.ts
  • Must be secure, tested, and documented
  • Should be efficient and easy to review
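
The requirements above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: GaugeLike, HistogramLike, RuntimeSinks, lagSeconds, sampleMemory, and startRuntimeMetrics are all hypothetical names. The sink interfaces are assumed to be satisfied structurally by prom-client's Gauge (set) and Histogram (observe), so the sketch itself depends only on Node.js built-ins.

```typescript
import { performance } from "node:perf_hooks";
import { setTimeout, clearTimeout } from "node:timers";

// Hypothetical sink shapes. prom-client's Gauge#set and Histogram#observe
// are assumed to fit these structurally.
export interface GaugeLike {
  set(value: number): void;
}
export interface HistogramLike {
  observe(value: number): void;
}
export interface RuntimeSinks {
  heapUsed: GaugeLike; // fluxora_nodejs_heap_used_bytes
  heapTotal: GaugeLike; // fluxora_nodejs_heap_total_bytes
  external: GaugeLike; // fluxora_nodejs_external_bytes
  eventLoopLag: HistogramLike; // fluxora_nodejs_event_loop_lag_seconds
}

/** Lag is how much later than requested the timer fired, clamped at zero. */
export function lagSeconds(
  scheduledAtMs: number,
  firedAtMs: number,
  intervalMs: number,
): number {
  return Math.max(0, firedAtMs - scheduledAtMs - intervalMs) / 1000;
}

/** Copies the current process.memoryUsage() readings into the gauges. */
export function sampleMemory(sinks: RuntimeSinks): void {
  const mem = process.memoryUsage();
  sinks.heapUsed.set(mem.heapUsed);
  sinks.heapTotal.set(mem.heapTotal);
  sinks.external.set(mem.external);
}

/** Starts the probe and returns a stop function for graceful shutdown. */
export function startRuntimeMetrics(
  sinks: RuntimeSinks,
  intervalMs: number,
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let stopped = false;

  const scheduleNext = (): void => {
    const scheduledAtMs = performance.now();
    timer = setTimeout(() => {
      if (stopped) return;
      const firedAtMs = performance.now();
      sinks.eventLoopLag.observe(lagSeconds(scheduledAtMs, firedAtMs, intervalMs));
      sampleMemory(sinks);
      scheduleNext();
    }, intervalMs);
    // Assumption: the probe alone should not keep the process alive.
    timer.unref();
  };

  scheduleNext();
  return (): void => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

In this sketch src/app.ts would keep the returned stop function and src/shutdown.ts would call it. Re-arming a setTimeout after each sample, rather than using setInterval, means each lag reading is measured against its own scheduling timestamp.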

Suggested execution

Fork the repo and create a branch

git checkout -b feature/runtime-metrics-collector

Implement changes

  • Update/Write: src/metrics/runtimeMetrics.ts — runtime metrics collector with event-loop probe
  • Update/Write: src/app.ts — initialise collector on startup
  • Update/Write: src/shutdown.ts — stop collector interval on graceful shutdown
  • Write comprehensive tests: tests/metrics/runtimeMetrics.test.ts
  • Add documentation: docs/observability.md — document runtime metrics and alert thresholds
  • Include clear code comments and types
  • Validate security assumptions
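
One input worth validating is METRICS_SAMPLE_INTERVAL_MS itself, since a zero, negative, or non-numeric value would turn the probe into a tight loop or break it. The sketch below shows one way to parse and bound it. The function name, the default, and the bounds are assumptions made for illustration; the issue does not specify them.

```typescript
// Hypothetical default and bounds; the issue does not define these values.
export const DEFAULT_SAMPLE_INTERVAL_MS = 5000;
export const MIN_SAMPLE_INTERVAL_MS = 100;
export const MAX_SAMPLE_INTERVAL_MS = 60000;

/**
 * Parses the raw environment value into a bounded integer interval.
 * Missing, empty, non-numeric, or fractional input falls back to the default;
 * out-of-range integers are clamped to the nearest bound.
 */
export function parseSampleIntervalMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_SAMPLE_INTERVAL_MS;
  }
  const parsed = Number(raw);
  // Number.isInteger rejects NaN, Infinity, and fractional values.
  if (!Number.isInteger(parsed)) {
    return DEFAULT_SAMPLE_INTERVAL_MS;
  }
  return Math.min(MAX_SAMPLE_INTERVAL_MS, Math.max(MIN_SAMPLE_INTERVAL_MS, parsed));
}
```

A caller would typically pass process.env.METRICS_SAMPLE_INTERVAL_MS to this function once at startup and hand the result to the collector.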

Test and commit

  • Run tests: pnpm test (or pnpm test:coverage)
  • Cover edge cases: collector start/stop, metric values positive, event-loop lag under artificial load, clean shutdown with no dangling intervals
  • Include test output and security notes
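
For the "event-loop lag under artificial load" case, one possible approach is to schedule a timer and then block the main thread so the timer cannot fire on time. The helpers below are a hedged sketch with hypothetical names (blockEventLoop, measureLagUnderLoad); they use only Node.js built-ins and are independent of whichever test runner the repo uses.

```typescript
import { performance } from "node:perf_hooks";
import { setTimeout } from "node:timers";

/** Busy-waits on the main thread so that pending timers cannot fire. */
export function blockEventLoop(ms: number): void {
  const start = performance.now();
  while (performance.now() - start < ms) {
    // Intentionally spin: this simulates a CPU-bound handler.
  }
}

/**
 * Schedules a timer for timerMs, blocks the loop for blockMs, and resolves
 * with how late the timer fired, in seconds (clamped at zero).
 */
export function measureLagUnderLoad(timerMs: number, blockMs: number): Promise<number> {
  return new Promise<number>((resolve) => {
    const scheduledAtMs = performance.now();
    setTimeout(() => {
      const lateByMs = performance.now() - scheduledAtMs - timerMs;
      resolve(Math.max(0, lateByMs) / 1000);
    }, timerMs);
    // The promise executor runs synchronously, so the block starts
    // right after the timer has been scheduled.
    blockEventLoop(blockMs);
  });
}
```

A test could await measureLagUnderLoad(10, 50) and assert that the result is at least about 0.04 seconds, since the timer cannot fire until the 50 ms block has finished.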

Example commit message

feat: expose Node.js memory, CPU, and event-loop lag via prom-client

Guidelines

  • Minimum 95 percent test coverage
  • Clear documentation
  • Timeframe: 96 hours

Metadata

Assignees

No one assigned

    Labels

    Stellar Wave (Issues in the Stellar wave program), backend (Backend service work), observability (Logging / metrics / tracing), performance (Performance / caching)
