Add lifecycle and hypervisor tracing spans #163
hiroTamada left a comment
clean tracing implementation — decorator pattern for hypervisor spans is well-designed, lifecycle instrumentation is consistent, and context-based attribute propagation avoids signature changes. one minor nit.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
    }
    baseAttrs = append(baseAttrs, attrs...)
    return startTraceSpan(ctx, otel.Tracer(traceSubsystemForType(hvType)), name, baseAttrs...)
}
        span.SetStatus(codes.Ok, "")
    }
    span.End()
}
Identical span-finishing logic duplicated across two packages
Low Severity
finishInstancesSpan is an exact copy of finishTraceSpan in lib/hypervisor/tracing.go, which is already publicly exported as hypervisor.FinishTraceSpan. Since lib/instances already imports lib/hypervisor, this helper could delegate to the existing exported function instead of duplicating the logic.
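A minimal sketch of the suggested delegation follows. It is not the PR's code: the `span` interface and `statusCode` type are stand-ins for the OpenTelemetry `trace.Span` and `codes.Code` types so the example runs with the standard library alone, the `(span, err)` signature of `FinishTraceSpan` is an assumption, and the error branch is guessed from the visible `span.SetStatus(codes.Ok, "")` / `span.End()` tail. `recordingSpan` and `errBoot` are hypothetical test helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// statusCode is a stand-in for OpenTelemetry's codes.Code (assumption).
type statusCode int

const (
	statusUnset statusCode = iota
	statusError
	statusOk
)

// span is a stand-in for the subset of trace.Span used when finishing a span.
type span interface {
	RecordError(err error)
	SetStatus(code statusCode, description string)
	End()
}

// FinishTraceSpan plays the role of the exported hypervisor.FinishTraceSpan.
// Its signature and error branch are assumed, not taken from the PR.
func FinishTraceSpan(s span, err error) {
	if err != nil {
		s.RecordError(err)
		s.SetStatus(statusError, err.Error())
	} else {
		s.SetStatus(statusOk, "")
	}
	s.End()
}

// finishInstancesSpan keeps its local name for callers in lib/instances but
// delegates instead of duplicating the finishing logic.
func finishInstancesSpan(s span, err error) {
	FinishTraceSpan(s, err)
}

// recordingSpan is a hypothetical in-memory span used to observe the calls.
type recordingSpan struct {
	status statusCode
	desc   string
	errs   []error
	ended  bool
}

func (r *recordingSpan) RecordError(err error)            { r.errs = append(r.errs, err) }
func (r *recordingSpan) SetStatus(c statusCode, d string) { r.status, r.desc = c, d }
func (r *recordingSpan) End()                             { r.ended = true }

// errBoot is a hypothetical failure used to exercise the error path.
var errBoot = errors.New("boot failed")

func main() {
	ok := &recordingSpan{}
	finishInstancesSpan(ok, nil)

	failed := &recordingSpan{}
	finishInstancesSpan(failed, errBoot)

	fmt.Println(ok.status == statusOk, ok.ended)
	fmt.Println(failed.status == statusError, failed.desc, failed.ended)
}
```

With the helper reduced to a one-line call, any later change to status handling only needs to be made in the hypervisor package.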


Summary
Testing
go test ./lib/hypervisor ./lib/hypervisor/cloudhypervisor ./lib/hypervisor/firecracker ./lib/hypervisor/qemu ./lib/vmm
go test ./lib/instances -run TestNonexistent -count=0
go test ./lib/providers -run TestNonexistent -count=0
Notes
lib/instances and lib/providers required temporary local placeholder embedded binaries because this worktree does not include the built guest-agent, init, and vz-shim artifacts.
Note
Medium Risk
Mostly observability-only, but it touches core instance lifecycle and hypervisor startup/shutdown paths where subtle context/span handling bugs could affect control flow or error propagation. Adds new wrappers around hypervisor clients/starters and extra retries/timeouts in tests, so regressions would show up mainly as altered behavior or flaky integration tests.
Overview
Adds end-to-end OpenTelemetry tracing across instance lifecycle operations (create/start/stop/standby/restore/snapshot/fork), replacing ad-hoc tracer usage with new instances helpers that create a top-level span plus step-level spans and propagate selected attributes (e.g. instance_id, hypervisor, snapshot_id).
Introduces a hypervisor tracing layer (WrapHypervisor, WrapVMStarter, and trace-attribute propagation) and applies it broadly: hypervisor client creation now returns a wrapped client; VM starters are wrapped at manager init; and Cloud Hypervisor/Firecracker/QEMU process starts, plus Firecracker/VZ/VMM HTTP calls, now emit spans with HTTP/process metadata.
Links async snapshot compression jobs back to request traces via span links, and adds focused tracing unit tests; also adjusts a few integration tests for stability (retrying create during auto-pull and loosening some timeouts).
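The decorator-plus-context approach described above can be sketched roughly as follows. This is a standard-library-only illustration, not the PR's implementation: the one-method `Hypervisor` interface, the `spanRecord` type, the callback-based `WrapHypervisor` signature, and the helpers `withTraceAttrs`, `traceAttrs`, and `newInstanceContext` are all hypothetical, and a real version would start OpenTelemetry spans rather than append records.

```go
package main

import (
	"context"
	"fmt"
)

// attrsKey is an unexported context key for propagated trace attributes.
type attrsKey struct{}

// withTraceAttrs returns a child context whose attributes are the parent's
// attributes overlaid with attrs. The parent's map is never mutated.
func withTraceAttrs(ctx context.Context, attrs map[string]string) context.Context {
	merged := map[string]string{}
	for k, v := range traceAttrs(ctx) {
		merged[k] = v
	}
	for k, v := range attrs {
		merged[k] = v
	}
	return context.WithValue(ctx, attrsKey{}, merged)
}

// traceAttrs reads the propagated attributes, or nil if none were set.
func traceAttrs(ctx context.Context) map[string]string {
	attrs, _ := ctx.Value(attrsKey{}).(map[string]string)
	return attrs
}

// Hypervisor is a hypothetical one-method slice of a hypervisor client.
type Hypervisor interface {
	Pause(ctx context.Context) error
}

// spanRecord is a stand-in for a finished span.
type spanRecord struct {
	Name  string
	Attrs map[string]string
	Err   error
}

// tracedHypervisor decorates another Hypervisor and reports one record per call.
type tracedHypervisor struct {
	inner  Hypervisor
	record func(spanRecord)
}

// WrapHypervisor returns a decorator around inner (signature is an assumption).
func WrapHypervisor(inner Hypervisor, record func(spanRecord)) Hypervisor {
	return &tracedHypervisor{inner: inner, record: record}
}

func (t *tracedHypervisor) Pause(ctx context.Context) error {
	err := t.inner.Pause(ctx)
	// Attributes come from the context, so Pause's signature is unchanged.
	t.record(spanRecord{Name: "hypervisor.pause", Attrs: traceAttrs(ctx), Err: err})
	return err
}

// fakeHypervisor is a hypothetical client that always succeeds.
type fakeHypervisor struct{}

func (fakeHypervisor) Pause(context.Context) error { return nil }

// newInstanceContext builds a context the way a lifecycle helper might.
func newInstanceContext(instanceID, hvType string) context.Context {
	ctx := withTraceAttrs(context.Background(), map[string]string{"instance_id": instanceID})
	return withTraceAttrs(ctx, map[string]string{"hypervisor": hvType})
}

func main() {
	var spans []spanRecord
	hv := WrapHypervisor(fakeHypervisor{}, func(s spanRecord) { spans = append(spans, s) })

	_ = hv.Pause(newInstanceContext("i-123", "qemu"))

	fmt.Println(spans[0].Name, spans[0].Attrs["instance_id"], spans[0].Attrs["hypervisor"])
}
```

Because the wrapper satisfies the same interface as the client it wraps, callers do not need to change, and attributes set once by a lifecycle helper reach every nested hypervisor span without extra parameters.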
Written by Cursor Bugbot for commit 5091e45. This will update automatically on new commits.