Reduce tracing volume for read-heavy paths #164

Open
sjmiller609 wants to merge 2 commits into main from codex/trace-volume-control

sjmiller609 (Collaborator) commented Mar 24, 2026

Summary

  • sample successful root GET traces at 10% by default via OTEL config
  • keep sampled-out GET errors visible with a compact fallback HTTP span
  • suppress noisy vm info tracing on read-heavy status paths while keeping outer GetVMInfo error visibility
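The first two bullets can be sketched as a pair of decisions: whether a root request is sampled at all, and whether a sampled-out request still deserves a compact span. The sketch below is a standard-library-only illustration, not the PR's code. The names `shouldSampleRoot` and `needsFallbackSpan` are hypothetical, the trace-ID threshold comparison is one common way to make a deterministic ratio decision, and treating only 5xx responses as errors is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/http"
)

type decision int

const (
	drop decision = iota
	recordAndSample
)

// shouldSampleRoot is a hypothetical helper: non-GET root requests are always
// sampled, while GET requests are kept with probability `ratio`, decided
// deterministically from the low 8 bytes of the trace ID.
func shouldSampleRoot(method string, traceID [16]byte, ratio float64) decision {
	if method != http.MethodGet || ratio >= 1 {
		return recordAndSample
	}
	if ratio <= 0 {
		return drop
	}
	upperBound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	if x < upperBound {
		return recordAndSample
	}
	return drop
}

// needsFallbackSpan reports whether a sampled-out request should still emit a
// compact span. Assumption: only 5xx responses count as errors here.
func needsFallbackSpan(d decision, statusCode int) bool {
	return d == drop && statusCode >= http.StatusInternalServerError
}

func main() {
	var traceID [16]byte
	traceID[8] = 0x40 // places the trace at 25% of the ID space, above a 10% cutoff

	d := shouldSampleRoot(http.MethodGet, traceID, 0.1)
	fmt.Println("sampled:", d == recordAndSample)
	fmt.Println("fallback span on 503:", needsFallbackSpan(d, http.StatusServiceUnavailable))
}
```

Deciding from the trace ID rather than a random draw keeps the outcome stable for every span in the same trace, which is why ratio-based samplers commonly take this shape.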

Testing

  • go test ./lib/otel ./lib/hypervisor ./lib/vmm ./cmd/api/config
  • go test ./lib/hypervisor/firecracker -run TestNonexistent -count=0
  • GOOS=darwin GOARCH=arm64 go test ./lib/hypervisor/vz -run TestNonexistent -count=0
  • go test ./cmd/api -run TestNonexistent -count=0

Note

Medium Risk
Medium risk because it changes OpenTelemetry sampling/span-creation behavior; misconfiguration or overly aggressive suppression could hide useful traces during debugging.

Overview
Reduces tracing volume by introducing configurable sampling for successful server-side HTTP GET requests via otel.successful_get_sample_ratio (default 0.1) and wiring it through config/env, validation, and otel.Init.
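As a rough illustration of the config/env wiring and validation mentioned above, the following sketch parses a raw setting, falls back to the documented default of 0.1 when it is empty, and rejects values outside [0, 1]. The helper name `parseSampleRatio` and the idea of passing the raw string in directly are assumptions; the PR's actual config code is not shown in this thread.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// defaultSuccessfulGETSampleRatio mirrors the default stated in the PR overview.
const defaultSuccessfulGETSampleRatio = 0.1

// parseSampleRatio is a hypothetical helper. An empty string yields the
// default; anything else must parse as a float in the closed range [0, 1].
func parseSampleRatio(raw string) (float64, error) {
	if raw == "" {
		return defaultSuccessfulGETSampleRatio, nil
	}
	ratio, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("successful_get_sample_ratio: %w", err)
	}
	// ParseFloat accepts "NaN", so it has to be rejected explicitly.
	if math.IsNaN(ratio) || ratio < 0 || ratio > 1 {
		return 0, fmt.Errorf("successful_get_sample_ratio must be in [0, 1], got %v", ratio)
	}
	return ratio, nil
}

func main() {
	for _, raw := range []string{"", "0.25", "1.5", "abc"} {
		ratio, err := parseSampleRatio(raw)
		fmt.Printf("%q -> ratio=%v err=%v\n", raw, ratio, err)
	}
}
```

Failing fast at startup on an out-of-range value addresses the misconfiguration risk called out in the note above, since a silently clamped ratio could hide traces without anyone noticing.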

Suppresses high-frequency hypervisor HTTP trace spans for read-heavy GET / and GET /api/v1/vm.info calls across Firecracker, VZ, and Cloud Hypervisor clients, while adding an error-only detached hypervisor.get_vm_info span to keep failures visible even when normal GetVMInfo tracing is skipped.

Written by Cursor Bugbot for commit b7f515a. This will update automatically on new commits.

sjmiller609 requested a review from hiroTamada on March 24, 2026 20:54
sjmiller609 marked this pull request as ready for review on March 24, 2026 20:54
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

func (s *successfulGETSampler) Description() string {
	return fmt.Sprintf("ParentBased{successful_get_ratio=%.4f}", s.ratio)
}

Sampler Description produces misleading nested ParentBased output

Low Severity

successfulGETSampler.Description() returns "ParentBased{successful_get_ratio=...}", but the struct is not itself ParentBased — it's used as the root sampler inside sdktrace.ParentBased() in newSuccessfulGETSampler. The SDK's ParentBased builds its own description by embedding the root sampler's Description(), producing a confusing "ParentBased{root:ParentBased{successful_get_ratio=…}, …}" string in diagnostics. The inner description text is inaccurate and could mislead operators into thinking there is double wrapping.
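One possible fix, sketched with the standard library only: have the inner sampler describe itself without the ParentBased prefix, so the wrapper's output shows a single level of wrapping. The `parentBased` type below is a simplified stand-in for the SDK's wrapper (the real description lists more fields than `root`), and the name `SuccessfulGET{ratio=...}` is a hypothetical suggestion, not what the PR ultimately uses.

```go
package main

import "fmt"

// sampler captures only the method relevant to this issue.
type sampler interface{ Description() string }

type successfulGETSampler struct{ ratio float64 }

// Description names the sampler itself and makes no claim to be ParentBased.
func (s successfulGETSampler) Description() string {
	return fmt.Sprintf("SuccessfulGET{ratio=%.4f}", s.ratio)
}

// parentBased is a simplified stand-in for the SDK wrapper, which embeds the
// root sampler's description inside its own.
type parentBased struct{ root sampler }

func (p parentBased) Description() string {
	return fmt.Sprintf("ParentBased{root:%s}", p.root.Description())
}

func main() {
	wrapped := parentBased{root: successfulGETSampler{ratio: 0.1}}
	fmt.Println(wrapped.Description()) // prints ParentBased{root:SuccessfulGET{ratio=0.1000}}
}
```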
