@stackbilt/worker-observability

Edge-native observability for Cloudflare Workers. A complete monitoring stack extracted from a production orchestration platform: health checks, structured logging, metrics collection, distributed tracing, SLI/SLO monitoring, alerting, and HTML dashboard generation.

Zero production dependencies. Designed for the Workers runtime.

Install

npm install @stackbilt/worker-observability

Quick Start

import { createMonitoring } from '@stackbilt/worker-observability';

// One-line setup with all modules
const monitoring = createMonitoring({
  service: 'my-worker',
  version: '1.0.0',
  environment: 'production',
  analyticsEngine: env.ANALYTICS,  // optional: Cloudflare Analytics Engine binding
  enableTracing: true,
  enableSLOs: true,
  enableAlerting: true,
});

// Use individual components
monitoring.logger.info('Worker started');
monitoring.metrics.increment('requests.total');
monitoring.errorTracker.track(new Error('something broke'));

Modules

Health Checks

Register custom checks and dependency probes. Built-in checks for D1 databases, Durable Objects, and service bindings.

import { HealthChecker, commonChecks, createHealthEndpoint } from '@stackbilt/worker-observability';

const checker = new HealthChecker({
  service: 'my-worker',
  version: '1.0.0',
  startTime: Date.now(),
});

// Built-in D1 health check
checker.register('database', commonChecks.database(env.DB));

// Built-in Durable Object health check
checker.register('counter-do', commonChecks.durableObject(env.COUNTER, 'health'));

// Built-in service binding health check
checker.register('auth-service', commonChecks.serviceBinding(env.AUTH, 'auth'));

// Custom check
checker.register('cache', async () => ({
  name: 'cache',
  status: 'healthy',
  timestamp: Date.now(),
}));

// Register external dependencies
checker.registerDependency({
  name: 'stripe-api',
  type: 'api',
  url: 'https://api.stripe.com/v1',
  timeout: 3000,
  critical: true,
});

// Create a /health endpoint handler
const healthHandler = createHealthEndpoint(checker);

Health Aggregator

Aggregate health across multiple services with weighted scoring.

import { HealthAggregator } from '@stackbilt/worker-observability';

const aggregator = new HealthAggregator();
aggregator.registerService('api', apiChecker, 1.0);
aggregator.registerService('worker', workerChecker, 0.5);

const health = await aggregator.getAggregatedHealth();
// { status: 'healthy', score: 0.92, services: [...], summary: { healthy: 2, ... } }

const readiness = await aggregator.getReadinessScore();
// { ready: true, score: 0.92, healthyServices: ['api', 'worker'], ... }
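A score like 0.92 falls out of a weight-normalized average of per-service health. A minimal sketch of that idea (illustrative only; the library's exact formula may differ):

```typescript
interface ServiceHealth { weight: number; score: number } // per-service score in [0, 1]

// Weighted average: services with higher weight pull the aggregate harder.
function aggregateScore(services: ServiceHealth[]): number {
  const totalWeight = services.reduce((sum, s) => sum + s.weight, 0);
  const weighted = services.reduce((sum, s) => sum + s.weight * s.score, 0);
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

aggregateScore([
  { weight: 1.0, score: 1.0 },  // 'api' fully healthy
  { weight: 0.5, score: 0.76 }, // 'worker' degraded
]); // ≈ 0.92
```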

Structured Logging

JSON-structured logging with log levels, child loggers, and multiple output targets.

import { createLogger, JSONOutput, ConsoleOutput, CloudflareAnalyticsOutput } from '@stackbilt/worker-observability';

const logger = createLogger({
  service: 'my-worker',
  minLevel: 'info',
  output: new JSONOutput(),  // or ConsoleOutput, CloudflareAnalyticsOutput
});

logger.info('Request received', { path: '/api/users' });
logger.error('Query failed', new Error('timeout'), { table: 'users' });

// Child loggers inherit context
const reqLogger = logger.child({ requestId: 'abc-123', userId: 'user-1' });
reqLogger.info('Processing request');

Metrics Collection

Counters, gauges, histograms, and timers with pluggable exporters.

import { MetricsCollector, ConsoleExporter, CloudflareAnalyticsExporter, DatadogMetricsExporter } from '@stackbilt/worker-observability';

const metrics = new MetricsCollector({
  service: 'my-worker',
  flushInterval: 30000,
  export: new ConsoleExporter(),
  // or: new CloudflareAnalyticsExporter(env.ANALYTICS, 'metrics')
  // or: new DatadogMetricsExporter('dd-api-key')
});

metrics.increment('requests.total', 1, { method: 'GET' });
metrics.gauge('connections.active', 42);
metrics.histogram('request.duration', 150, 'milliseconds', { path: '/api' });

// Timer helper
const end = metrics.startTimer('db.query', { table: 'users' });
await db.query('SELECT ...');
end();

// Measure async operations
const result = await metrics.measure('api.call', () => fetch('https://api.example.com'));

// Get summary
const summary = metrics.getSummary({ start: Date.now() - 60000, end: Date.now() });

Business Metrics

Pre-built metric patterns for workflow tracking, agent usage, cost monitoring, and SLA compliance.

import { BusinessMetrics } from '@stackbilt/worker-observability';

const biz = new BusinessMetrics(metrics);
biz.workflowCreated('tenant-1', 'data-pipeline', 'wf-123');
biz.agentTokensUsed('tenant-1', 'gpt-4', 'openai', 1500);
biz.computeCost('tenant-1', 'inference', 0.003);

Distributed Tracing

W3C Trace Context propagation with span-based tracing.

import { Tracer, ConsoleTraceExporter, trace } from '@stackbilt/worker-observability';

const tracer = new Tracer({
  service: 'my-worker',
  sampling: 1.0,
  export: new ConsoleTraceExporter(),
});

// Manual spans
const span = tracer.startTrace('handle-request', { 'http.method': 'GET' });
span.addEvent('cache-miss');
span.setAttributes({ 'db.rows': 42 });
span.end();

// Trace helper for async operations
const result = await trace(tracer, 'db-query', async (span) => {
  span.setAttributes({ 'db.statement': 'SELECT ...' });
  return await db.query('SELECT ...');
});

// Context propagation
const parentCtx = tracer.extract(request.headers);
const childSpan = tracer.startSpan('downstream-call', { parent: parentCtx });
tracer.inject(childSpan.getContext(), outgoingHeaders);
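The header that `extract`/`inject` read and write is the W3C `traceparent`: `version-traceId-spanId-flags`, all lowercase hex. A minimal parser sketch of that format (per the W3C Trace Context spec; not the library's own code):

```typescript
interface TraceContext {
  version: string;
  traceId: string;  // 16 bytes = 32 hex chars
  spanId: string;   // 8 bytes = 16 hex chars
  sampled: boolean; // lowest bit of the flags byte
}

// Parse a traceparent header, e.g. "00-<32 hex>-<16 hex>-01".
function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return { version: m[1], traceId: m[2], spanId: m[3], sampled: (parseInt(m[4], 16) & 1) === 1 };
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
// ctx?.sampled === true
```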

Error Tracking

In-memory error aggregation, circuit breaker, and exponential backoff retry.

import { InMemoryErrorTracker, CircuitBreaker, ExponentialBackoffStrategy, withErrorTracking } from '@stackbilt/worker-observability';

const tracker = new InMemoryErrorTracker();
tracker.track(new Error('connection reset'), { requestId: 'abc-123' });

const stats = tracker.getErrorStats();
// { total: 1, byType: { Error: 1 }, recentRate: 1 }

const recent = tracker.getRecentErrors(10);

// Circuit breaker
const breaker = new CircuitBreaker(5, 60000, 2);
const result = await breaker.execute(() => fetch('https://api.example.com'));
console.log(breaker.getState()); // 'closed' | 'open' | 'half-open'

// Retry with backoff
const strategy = new ExponentialBackoffStrategy(3, 1000, 30000);
try {
  await riskyOperation();
} catch (error) {
  if (strategy.shouldRecover(error)) {
    await strategy.recover(error);
  }
}

// Wrap operations with automatic tracking
const data = await withErrorTracking(tracker, () => riskyOperation());
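Assuming the three `ExponentialBackoffStrategy` arguments are max retries, base delay (ms), and max delay (ms) — a plausible but unconfirmed reading — the delay schedule presumably follows the usual pattern: base delay doubled per attempt, capped at the maximum. A sketch of that arithmetic:

```typescript
// Assumed arg meanings: attempt index (0-based), base delay in ms, cap in ms.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

backoffDelay(0, 1000, 30000); // 1000
backoffDelay(4, 1000, 30000); // 16000
backoffDelay(6, 1000, 30000); // 30000 (capped; 64000 uncapped)
```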

SLI/SLO Monitoring

Service Level Indicators and Objectives with error budget tracking.

import { SLIMonitor, AvailabilitySLI, LatencySLI, ErrorRateSLI, createStandardSLOs } from '@stackbilt/worker-observability';

const monitor = new SLIMonitor(metrics);

// Register SLI calculators
monitor.registerSLI(new AvailabilitySLI(metrics, 'my-worker'));
monitor.registerSLI(new LatencySLI(metrics, 'my-worker', 500));
monitor.registerSLI(new ErrorRateSLI(metrics, 'my-worker', 1));

// Register standard SLOs (99.9% availability, 95% latency, etc.)
createStandardSLOs('my-worker').forEach(slo => monitor.registerSLO(slo));

// Check SLO status
const status = await monitor.getSLOStatus();
for (const [id, sloStatus] of status) {
  console.log(`${id}: ${sloStatus.status} (${sloStatus.current}% / ${sloStatus.target}%)`);
  console.log(`  Error budget remaining: ${sloStatus.errorBudget.remaining}%`);
}
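Error-budget arithmetic like the `errorBudget.remaining` figure above can be sketched as follows (an illustrative model, not necessarily the library's exact formula):

```typescript
// Remaining error budget as a percentage of the allowed error for the window.
// target and current are percentages, e.g. 99.9 and 99.95.
function errorBudgetRemaining(targetPct: number, currentPct: number): number {
  const allowedError = 100 - targetPct; // e.g. 0.1 for a 99.9% SLO
  const actualError = 100 - currentPct; // e.g. 0.05
  return Math.max(0, (1 - actualError / allowedError) * 100);
}

errorBudgetRemaining(99.9, 99.95); // ≈ 50 — half the budget left
errorBudgetRemaining(99.9, 99.8);  // 0 — budget exhausted
```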

Alert Manager

Threshold-based alerting with webhook, Slack, email, and PagerDuty channels.

import { AlertManager, createStandardAlerts } from '@stackbilt/worker-observability';

const alertManager = new AlertManager(metrics, {
  service: 'my-worker',
  environment: 'production',
});

// Register standard alert rules
createStandardAlerts('my-worker').forEach(rule => alertManager.registerRule(rule));

// Register notification channels
alertManager.registerChannel({
  id: 'slack',
  name: 'Ops Slack',
  type: 'slack',
  config: { webhook_url: 'https://hooks.slack.com/...' },
  enabled: true,
});

// Get current incidents
const firing = alertManager.getFiringIncidents();

Redaction

Defense-in-depth redaction of sensitive fields in logs, traces, and metric payloads. Enabled by default with a deny-list of common credential and PII field name patterns so consumers can't accidentally exfiltrate secrets by forgetting to configure it.

The default deny-list covers:

  • HTTP credential headers: authorization, cookie, set-cookie, proxy-authorization, x-api-key, x-auth-token, x-csrf-token
  • Credential substrings: any field name containing token, secret, password, passwd, api_key/apiKey/api-key, bearer, credential, private_key, signing_key, pwd
  • Session identifiers: exact session, session_id/sessionId, session_token/sessionToken
  • Auth exact match: auth (does NOT match author, authentication_method, etc.)
  • PII email fields (broad substring match): any field name containing email — catches user_email, recipient_email, from_email, customer_email, contact_email, emailAddress, etc. Paired with a keep-list that lets known-safe variants pass through: email_verified, email_template_id, email_template, email_domain, email_provider, email_enabled, email_count, emails_sent, email_preferences, email_settings (and the emailVerified camelCase)
  • Client IP addresses (GDPR personal data under Article 4(1)): client_ip/clientIp/client-ip, remote_addr/remote_ip/remoteAddr/remoteIp, x-forwarded-for, x-real-ip, cf-connecting-ip, true-client-ip (Cloudflare Enterprise/Akamai), user_ip/userIp

Matching is case-insensitive. Only field values are replaced — field names remain visible so payload structure is preserved. The keep-list uses an override semantic: if a field name matches a keep pattern, it's never redacted, even if it also matches a deny pattern. This is how the broad /email/i can coexist with non-PII metadata like email_verified.
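The deny/keep interplay can be sketched as follows (an illustrative model of the documented semantics, not the library's implementation; both pattern lists are abbreviated here):

```typescript
// Abbreviated deny-list and keep-list — the real defaults are much longer.
const denyPatterns: RegExp[] = [/^auth$/i, /token/i, /secret/i, /password/i, /email/i];
const keepPatterns: RegExp[] = [/^email_verified$/i, /^email_template(_id)?$/i];

// Keep-list overrides deny-list: a field matching a keep pattern is never redacted.
function shouldRedact(fieldName: string): boolean {
  if (keepPatterns.some((p) => p.test(fieldName))) return false;
  return denyPatterns.some((p) => p.test(fieldName));
}

shouldRedact('user_email');     // true  — broad /email/i substring match
shouldRedact('email_verified'); // false — keep-list override wins
shouldRedact('auth');           // true  — exact match
shouldRedact('author');         // false — /^auth$/ does not match substrings
```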

Known limitation: value-level matching is not in scope

The redactor operates on field names, not values. If your application interpolates secrets into free-form log messages, those secrets will not be caught. For example:

// ❌ Leaks — the message string is not inspected
logger.info(`Validated token ${token}`);

// ✅ Safe — `token` is a separate field and matches the deny-list
logger.info('Validated token', { token });

Use structured logging (separate message + metadata fields) so the redactor can actually see and mask sensitive values. Value-based redaction (regex-matching the contents of log messages for credit card numbers, UUIDs that look like tokens, etc.) is a significantly larger architectural question and is deliberately out of scope for this release.

import { Redactor, createRedactor } from '@stackbilt/worker-observability';

// Default: all common credential/PII fields redacted
const r = new Redactor();
r.redact({ headers: { authorization: 'Bearer abc' } });
// → { headers: { authorization: '[REDACTED]' } }

// Broad email match catches common variants
r.redact({ user_email: 'a@b.co', recipient_email: 'c@d.co' });
// → { user_email: '[REDACTED]', recipient_email: '[REDACTED]' }

// Keep-list lets known-safe non-PII variants through
r.redact({ email_verified: true, email_template_id: 'welcome' });
// → { email_verified: true, email_template_id: 'welcome' }

// Client IPs redacted by default (GDPR)
r.redact({ client_ip: '1.2.3.4', 'cf-connecting-ip': '5.6.7.8' });
// → { client_ip: '[REDACTED]', 'cf-connecting-ip': '[REDACTED]' }

// Nested traversal + array walking
r.redact({ users: [{ email: 'a@b.co', name: 'Alice' }] });
// → { users: [{ email: '[REDACTED]', name: 'Alice' }] }

// Extend the deny-list with service-specific sensitive fields
const custom = new Redactor({
  patterns: [/^x-stripe-signature$/i, /customer[-_]?id/i],
  fieldNames: ['X-Internal-Audit-Token'],
});

// Extend the keep-list to let additional non-PII variants through
const withKeep = new Redactor({
  keepPatterns: [/^email_campaign$/i, /^email_events$/i],
});

// Custom replacement marker
const masked = new Redactor({ replacement: '***' });

// Opt out entirely (NOT recommended in production)
const off = new Redactor({ enabled: false });

// Use only custom patterns (no defaults, no keep-list)
const onlyMine = new Redactor({
  useDefaults: false,
  patterns: [/^mysecret$/],
});

Integration with createMonitoring

When you use the createMonitoring factory, a single Redactor instance is built once and propagated to the Logger, Tracer, and StackbiltCloudExporter — so every signal pathway (log entry, span attribute, span event, metric tag, HTTPS egress payload) gets the same redaction policy.

const monitoring = createMonitoring({
  service: 'my-worker',
  version: '1.0.0',
  stackbilt: { token: env.STACKBILT_OBSERVE_TOKEN },
  redaction: {
    // Extend the defaults with worker-specific sensitive fields
    patterns: [/^x-stripe-signature$/i],
    fieldNames: ['oauth_state', 'csrf_nonce'],
  },
});

Defense-in-depth: StackbiltCloudExporter applies a final pass

Even if you forget to pass a Redactor to Logger/Tracer, the StackbiltCloudExporter runs a final redaction pass on the full outbound payload just before the HTTPS POST. Layered safety:

Logger.log() ─┐
Tracer.recordSpan() ─┼─ Redactor (per-signal)
Metrics tags ─┘
                    ↓
          StackbiltCloudExporter.flush()
                    ↓
          Redactor (final egress pass) ← belt-and-suspenders
                    ↓
          POST stackbilder.com/api/observe/ingest

Per the OSS policy's "validate at boundaries" rule, redaction is enforced at both the signal-generation boundary and the egress boundary.

HTML Dashboard

Generate a self-contained HTML monitoring dashboard.

import { createHTMLDashboard, DashboardAggregator, createDashboardEndpoint } from '@stackbilt/worker-observability';

// Static HTML page
const html = createHTMLDashboard();

// Or use the aggregator for live data
const aggregator = new DashboardAggregator([
  { name: 'api', healthEndpoint: 'https://api.example.com/health' },
  { name: 'worker', healthEndpoint: 'https://worker.example.com/health' },
], errorTracker);

const dashboardHandler = createDashboardEndpoint(aggregator, {
  cacheDuration: 30000,
  requireAuth: true,
});

Middleware

The package includes middleware compatible with Hono and similar frameworks.

import { Hono } from 'hono';
import { metricsMiddleware, tracingMiddleware, requestLogger } from '@stackbilt/worker-observability';

// Apply to a Hono app
const app = new Hono();
app.use('*', metricsMiddleware(metrics));
app.use('*', tracingMiddleware(tracer));

// Or use the convenience function
import { createMonitoringMiddleware } from '@stackbilt/worker-observability';
const middlewares = createMonitoringMiddleware({ logger, metrics, tracer });
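A Hono-style middleware is a function of a context and a `next` continuation. A timing middleware in that shape might look like this (an illustrative sketch of the pattern, not the package's actual metricsMiddleware):

```typescript
// Minimal Hono-compatible middleware shape: (c, next) => Promise<void>.
type Next = () => Promise<void>;

function timingMiddleware(record: (ms: number) => void) {
  return async (_c: unknown, next: Next): Promise<void> => {
    const start = Date.now();
    try {
      await next(); // run downstream middleware and the route handler
    } finally {
      record(Date.now() - start); // always record, even when the handler throws
    }
  };
}
```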

Cloudflare Analytics Engine

Several exporters support Cloudflare Analytics Engine for zero-cost metric storage:

  • CloudflareAnalyticsOutput for logs
  • CloudflareAnalyticsExporter for metrics
  • CloudflareTraceExporter for traces

Bind an Analytics Engine dataset in your wrangler.toml:

[[analytics_engine_datasets]]
binding = "ANALYTICS"

Then pass it to the monitoring setup:

const monitoring = createMonitoring({
  service: 'my-worker',
  version: '1.0.0',
  analyticsEngine: env.ANALYTICS,
});

License

Apache-2.0

Built by Stackbilt.
