Summary
StackbiltCloudExporter buffers spans/metrics/logs internally and only POSTs to the ingest endpoint when the buffer reaches 100 items or 50KB. A typical low-volume Worker rarely reaches that threshold before the isolate is evicted, so the buffered signals are silently dropped when the isolate dies. Callers that invoke tracer.flush() / metrics.flush() in ctx.waitUntil() believe they're forcing a real flush, but those methods only drain their own buffers into the exporter — they never trigger the POST.
Discovered while dogfooding on Stackbilt-dev/tarotscript (issue Stackbilt-dev/tarotscript#163).
Repro / evidence
tarotscript-worker was instrumented per the README pattern: root span per request in a middleware, obs.tracer.flush() + obs.metrics.flush() in waitUntil.
- Over ~14 hours of live traffic, the dashboard showed only 5 traces total, all from a single early burst. The dashboard and the underlying D1 agreed perfectly — the data just wasn't arriving.
- After patching the worker to call the underlying exporter.flush() directly, traces immediately began flowing on every request (32 traces, last-seen 3m ago, real p50/p95/p99 populating within minutes).
Root cause
Two layers of buffering, and tracer.flush() only drains the first:
Layer 1 — Tracer.buffer in src/tracing.ts. Tracer.flush() snapshots the buffer and calls this.options.export.export(spans):
```ts
// src/tracing.ts:233
async flush(): Promise<void> {
  if (this.buffer.length === 0) return;
  const spans = [...this.buffer];
  this.buffer = [];
  if (this.options.export) {
    await this.options.export.export(spans); // <-- hands off to Layer 2
  }
}
```
Layer 2 — StackbiltCloudExporter.spans in src/stackbilt-exporter.ts. export() pushes into its own buffer and calls maybeFlush(), which gates on a batch threshold:
```ts
// src/stackbilt-exporter.ts:100
async export(items: MetricPoint[] | TraceSpan[]): Promise<void> {
  if (items.length === 0) return;
  if ('traceId' in items[0] && 'spanId' in items[0]) {
    this.spans.push(...(items as TraceSpan[])); // <-- buffered
  } else {
    this.metrics.push(...(items as MetricPoint[]));
  }
  await this.maybeFlush(); // <-- gated, not forced
}

// src/stackbilt-exporter.ts:138
private async maybeFlush(): Promise<void> {
  if (Date.now() < this.backoffUntil) return;
  const totalItems = this.metrics.length + this.spans.length + this.logs.length + this.alerts.length;
  if (totalItems === 0) return;
  if (totalItems < this.maxBatchSize) { // default 100
    const bytes = this.estimateBytes();
    if (bytes < this.maxBatchBytes) return; // default 50KB
  }
  await this.flush();
}
```
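The gating behavior can be reproduced in isolation. The sketch below uses a stub class (StubExporter is illustrative, not the real package code) that mirrors the maybeFlush() thresholds, showing that a single request's worth of telemetry never crosses them and nothing reaches the wire:

```typescript
// Minimal stand-in for the exporter's gating logic (stub, not the real class).
class StubExporter {
  items: unknown[] = [];
  posts = 0;
  constructor(private maxBatchSize = 100, private maxBatchBytes = 50 * 1024) {}

  async export(batch: unknown[]): Promise<void> {
    this.items.push(...batch);
    await this.maybeFlush(); // gated, mirrors the real maybeFlush()
  }

  private async maybeFlush(): Promise<void> {
    if (this.items.length === 0) return;
    if (this.items.length < this.maxBatchSize) {
      const bytes = JSON.stringify(this.items).length;
      if (bytes < this.maxBatchBytes) return; // silently keeps buffering
    }
    await this.flush();
  }

  async flush(): Promise<void> {
    this.posts++; // stands in for the real POST
    this.items = [];
  }
}

(async () => {
  // One "request" worth of telemetry: ~5 spans + ~3 metrics = 8 small items.
  const exporter = new StubExporter();
  await exporter.export(Array.from({ length: 8 }, (_, i) => ({ name: `span-${i}` })));
  console.log(exporter.posts);        // 0: nothing hit the wire
  console.log(exporter.items.length); // 8: still buffered when the isolate dies
})();
```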
A typical tarotscript scaffold-cast request produces ~5 spans + ~3 metric points = ~8 items per request. At that rate, reaching the 100-item threshold takes roughly 13 requests (100 / 8) landing in the same isolate. Between bursts, Cloudflare evicts the idle isolate and the exporter buffer dies with it.
The public exporter.flush() at stackbilt-exporter.ts:129 is the method that actually POSTs, but it's not reachable from the return value of createMonitoring() — the exporter is only referenced internally by the tracer and metrics collector.
Why batching is the wrong default for Workers
Traditional exporters batch because network round-trips are expensive and long-running processes have time to fill a buffer between flushes. Workers invert both of those assumptions:
- Isolates are ephemeral. There is no "next request" guarantee — you get one shot to flush before eviction.
- setInterval-based auto-flush doesn't work reliably (timers don't fire while the isolate is idle).
- Workers already amortize HTTP round-trips via subrequest budgets; one POST-per-request is fine for the volume this package targets.
- Cost is already bounded by the dashboard's per-worker cap (the 403 backoff path) — batching isn't needed for cost protection.
Proposed fixes
Option A (preferred) — remove exporter-level buffering for the Workers case. Have StackbiltCloudExporter.export() POST immediately. The Tracer and MetricsCollector already have their own buffers that batch within a single request, which is the right granularity for Workers. This makes the exporter stateless across requests, which also fixes the "buffer dies with the isolate" failure mode.
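A sketch of what Option A could look like. The endpoint handling and names here (ImmediateExporter, postBatch) are illustrative assumptions, not the package's real API; the point is that export() ships each request-scoped batch immediately:

```typescript
// Sketch of Option A: export() POSTs immediately, so the exporter holds no
// cross-request state. Names (ImmediateExporter, postBatch) are illustrative.
interface ExportedItem { [key: string]: unknown }

class ImmediateExporter {
  constructor(private endpoint: string, private fetchImpl: typeof fetch = fetch) {}

  async export(items: ExportedItem[]): Promise<void> {
    if (items.length === 0) return;
    // The Tracer/MetricsCollector buffers already coalesced everything
    // produced during this request, so one POST per batch is the right grain.
    await this.postBatch(items);
  }

  private async postBatch(items: ExportedItem[]): Promise<void> {
    await this.fetchImpl(this.endpoint, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ items }),
    });
  }
}
```

Because the exporter keeps no buffer, isolate eviction can no longer lose data that a flush() already handed over.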
Option B — propagate flush() through the tracer. Have Tracer.flush() check if the exporter has a flush() method and call it after export(spans) returns. Same for MetricsCollector.flush(). This preserves the batching semantics for anyone who relies on them but makes "I called flush, my data is on the wire" actually true.
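Option B could be sketched as below. The interfaces are simplified stand-ins for the package's real types, assuming only that the exporter may expose an optional flush():

```typescript
// Sketch of Option B: Tracer.flush() drains its own buffer AND forces the
// exporter's flush when one exists. Simplified stand-ins for the real types.
interface Exporter {
  export(items: unknown[]): Promise<void>;
  flush?(): Promise<void>; // optional: not every exporter batches
}

class Tracer {
  private buffer: unknown[] = [];
  constructor(private exporter?: Exporter) {}

  record(span: unknown): void {
    this.buffer.push(span);
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const spans = [...this.buffer];
    this.buffer = [];
    if (this.exporter) {
      await this.exporter.export(spans);
      // New step: drain the exporter's batch buffer too, so "flush" means
      // "on the wire", not "moved one buffer down".
      await this.exporter.flush?.();
    }
  }
}
```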
Option C — expose the exporter on the monitoring bundle. Add exporter: StackbiltCloudExporter | null to the createMonitoring() return value so callers can do await obs.exporter?.flush() in waitUntil. Lowest-risk change but pushes the workaround onto every consumer.
My vote is A — the Workers-native assumption is that isolates are short-lived and you flush per-request. B is the next-best if you want to keep batching as an opt-in for long-lived use cases.
Worker-side workaround (shipped in tarotscript today)
Until this is fixed upstream, consumers can reach into the tracer's private options field:
```ts
// worker/src/observability.ts
const base = createMonitoring({ ... });
const exporter =
  ((base.tracer as unknown as { options?: { export?: unknown } } | null)
    ?.options?.export as StackbiltCloudExporter | undefined) ?? null;
return { ...base, exporter };

// worker/src/index.ts (middleware)
c.executionCtx.waitUntil((async () => {
  await Promise.allSettled([obs.tracer.flush(), obs.metrics.flush()]);
  await obs.exporter?.flush();
})());
```
This is the pattern currently running in tarotscript-worker (deployed version 9d53b2c3-08a2-415a-8d95-b252e6e6f610). It works but relies on a private field and should be considered a stopgap.
Impact on other dogfooders
Per Stackbilt-dev/tarotscript#163, stackbilt-web and edge-auth were instrumented before tarotscript. Their dashboards are also worth auditing — if they're medium-to-high traffic they may have been masking the problem by naturally hitting the 100-item threshold, but any low-volume Worker adopting this package will silently lose telemetry.
Acceptance
- tracer.flush() in waitUntil results in data landing at the ingest endpoint within the request lifetime.