Description
GET /metrics serves the prom-client default metrics, but Node.js-specific runtime health indicators (heap used/total, external memory, CPU usage percentage, and event-loop lag) are not consistently collected or exposed. During load spikes, operators have no signal to distinguish GC pressure from event-loop starvation. A dedicated runtime metrics collector must sample these values on a configurable interval.
Requirements and context
- Register fluxora_nodejs_heap_used_bytes, fluxora_nodejs_heap_total_bytes, and fluxora_nodejs_external_bytes as gauges and fluxora_nodejs_event_loop_lag_seconds as a histogram, using prom-client
- Measure event-loop lag using a setTimeout-based probe on a configurable METRICS_SAMPLE_INTERVAL_MS
- Ensure the collector is initialised in src/app.ts and gracefully stopped in src/shutdown.ts
- Must be secure, tested, and documented
- Should be efficient and easy to review
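As a rough illustration of the requirements above, the following sketch shows one way a setTimeout-based probe and memory sampler could fit together. It is not the project's implementation. To keep it dependency-free, the GaugeLike and HistogramLike interfaces are hypothetical stand-ins that mirror the set and observe methods of prom-client's Gauge and Histogram; the names RuntimeSinks, lagSeconds, sampleMemory, and startRuntimeMetrics are likewise assumptions.

```typescript
// Hypothetical stand-ins for prom-client's Gauge#set and Histogram#observe,
// so this sketch runs without the dependency.
export interface GaugeLike { set(value: number): void; }
export interface HistogramLike { observe(value: number): void; }

export interface RuntimeSinks {
  heapUsed: GaugeLike;         // fluxora_nodejs_heap_used_bytes
  heapTotal: GaugeLike;        // fluxora_nodejs_heap_total_bytes
  external: GaugeLike;         // fluxora_nodejs_external_bytes
  eventLoopLag: HistogramLike; // fluxora_nodejs_event_loop_lag_seconds
}

// Lag is how much later than requested the timer fired, clamped at zero.
export function lagSeconds(scheduledAtMs: number, firedAtMs: number, intervalMs: number): number {
  return Math.max(0, firedAtMs - scheduledAtMs - intervalMs) / 1000;
}

// Copy the current process memory figures into the gauges.
export function sampleMemory(sinks: RuntimeSinks): void {
  const { heapUsed, heapTotal, external } = process.memoryUsage();
  sinks.heapUsed.set(heapUsed);
  sinks.heapTotal.set(heapTotal);
  sinks.external.set(external);
}

export function startRuntimeMetrics(
  sinks: RuntimeSinks,
  intervalMs: number,
): { stop(): void; isRunning(): boolean } {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let running = true;

  // Re-arm a one-shot timer each tick so every sample measures its own delay.
  const tick = (scheduledAtMs: number): void => {
    timer = setTimeout(() => {
      if (!running) return;
      sinks.eventLoopLag.observe(lagSeconds(scheduledAtMs, performance.now(), intervalMs));
      sampleMemory(sinks);
      tick(performance.now());
    }, intervalMs);
  };
  tick(performance.now());

  return {
    stop(): void {
      running = false;
      if (timer !== undefined) clearTimeout(timer);
      timer = undefined;
    },
    isRunning: () => running,
  };
}
```

In a real collector, the sinks would presumably be prom-client Gauge and Histogram instances registered under the metric names listed above. Re-arming a one-shot setTimeout, rather than using setInterval, means a delayed tick does not distort the baseline for the next measurement.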
Suggested execution
Fork the repo and create a branch
git checkout -b feature/runtime-metrics-collector
Implement changes
- Update/Write: src/metrics/runtimeMetrics.ts (runtime metrics collector with event-loop probe)
- Update/Write: src/app.ts (initialise collector on startup)
- Update/Write: src/shutdown.ts (stop collector interval on graceful shutdown)
- Write comprehensive tests: tests/metrics/runtimeMetrics.test.ts
- Add documentation: docs/observability.md (document runtime metrics and alert thresholds)
- Include clear code comments and types
- Validate security assumptions
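The startup and shutdown wiring listed above might look roughly like the sketch below. The issue does not describe how src/app.ts or src/shutdown.ts are structured, so everything here is hypothetical: the 5000 ms default, the parseSampleIntervalMs helper, and the registerForShutdown / runShutdownHooks registry are assumptions chosen for illustration. Validating METRICS_SAMPLE_INTERVAL_MS before use is one small way to address the security and robustness point, since a zero or non-numeric interval would otherwise produce a tight timer loop.

```typescript
// Hypothetical default; the issue does not specify one.
const DEFAULT_SAMPLE_INTERVAL_MS = 5000;

// Parse METRICS_SAMPLE_INTERVAL_MS, falling back when the value is unset,
// non-numeric, fractional, zero, or negative.
export function parseSampleIntervalMs(
  raw: string | undefined,
  fallback: number = DEFAULT_SAMPLE_INTERVAL_MS,
): number {
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Anything with a stop() method, such as a runtime metrics collector handle.
type Stoppable = { stop(): void };

const shutdownHooks: Stoppable[] = [];

// Called at startup (for example from src/app.ts) after the collector starts.
export function registerForShutdown(resource: Stoppable): void {
  shutdownHooks.push(resource);
}

// Called during graceful shutdown (for example from src/shutdown.ts).
// Returns how many resources were stopped without throwing.
export function runShutdownHooks(): number {
  let stopped = 0;
  // Stop in reverse registration order; the registry is emptied as it goes.
  while (shutdownHooks.length > 0) {
    const resource = shutdownHooks.pop();
    try {
      resource?.stop();
      stopped++;
    } catch {
      // Keep going so one failing hook does not block the others.
    }
  }
  return stopped;
}
```

A startup path could then call something like registerForShutdown(collector) with the handle returned by the collector's start function, passing parseSampleIntervalMs(process.env.METRICS_SAMPLE_INTERVAL_MS) as the interval.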
Test and commit
- Run tests: pnpm test (or pnpm test:coverage)
- Cover edge cases: collector start/stop, metric values positive, event-loop lag under artificial load, clean shutdown with no dangling intervals
- Include test output and security notes
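For the "event-loop lag under artificial load" edge case, one framework-agnostic approach is to arm a timer and then block the loop synchronously, so the timer is forced to fire late by a known minimum. The sketch below illustrates that idea only; busyWait and measureLagUnderLoad are hypothetical helper names, and the repo's actual test runner and assertions are not specified in the issue.

```typescript
// Block the event loop synchronously for at least `ms` milliseconds.
export function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Spin: nothing else on the event loop can run during this loop.
  }
}

// Arm a timer for `timerMs`, then block the loop for `blockMs`.
// Resolves with the observed lag in seconds (how late the timer fired).
export function measureLagUnderLoad(timerMs: number, blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => {
      resolve(Math.max(0, Date.now() - scheduledAt - timerMs) / 1000);
    }, timerMs);
    // The timer callback cannot run until this synchronous block returns.
    busyWait(blockMs);
  });
}
```

With a 10 ms timer and a 60 ms block, the resolved lag should be at least roughly 0.05 seconds, which a test can assert as a lower bound. Asserting a lower bound rather than an exact value keeps the test stable on slow or busy CI machines.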
Example commit message
feat: expose Node.js memory, CPU, and event-loop lag via prom-client
Guidelines
- Minimum 95 percent test coverage
- Clear documentation
- Timeframe: 96 hours