Skip to content

Add lifecycle and hypervisor tracing spans#163

Merged
sjmiller609 merged 6 commits intomainfrom
codex/distributed-tracing-review
Mar 24, 2026
Merged

Add lifecycle and hypervisor tracing spans#163
sjmiller609 merged 6 commits intomainfrom
codex/distributed-tracing-review

Conversation

@sjmiller609
Copy link
Collaborator

@sjmiller609 sjmiller609 commented Mar 24, 2026

Summary

  • add lifecycle trace spans for create, start, stop, standby, restore, snapshot, and fork flows
  • add traced hypervisor and VM starter wrappers plus HTTP and process-level spans at hypervisor boundaries
  • link async snapshot compression work back to the originating request span and add focused tracing tests

Testing

  • go test ./lib/hypervisor ./lib/hypervisor/cloudhypervisor ./lib/hypervisor/firecracker ./lib/hypervisor/qemu ./lib/vmm
  • go test ./lib/instances -run TestNonexistent -count=0
  • go test ./lib/providers -run TestNonexistent -count=0

Notes

  • This is intended to be an observability-only change.
  • The compile-only checks for lib/instances and lib/providers required temporary local placeholder embedded binaries because this worktree does not include the built guest-agent, init, and vz-shim artifacts.

Note

Medium Risk
Mostly observability-only, but it touches core instance lifecycle and hypervisor startup/shutdown paths where subtle context/span handling bugs could affect control flow or error propagation. Adds new wrappers around hypervisor clients/starters and extra retries/timeouts in tests, so regressions would show up mainly as altered behavior or flaky integration tests.

Overview
Adds end-to-end OpenTelemetry tracing across instance lifecycle operations (create/start/stop/standby/restore/snapshot/fork), replacing ad-hoc tracer usage with new instances helpers that create a top-level span plus step-level spans and propagate selected attributes (e.g. instance_id, hypervisor, snapshot_id).

Introduces a hypervisor tracing layer (WrapHypervisor, WrapVMStarter, and trace-attribute propagation) and applies it broadly: hypervisor client creation now returns a wrapped client; VM starters are wrapped at manager init; and Cloud Hypervisor/Firecracker/QEMU process starts, plus Firecracker/VZ/VMM HTTP calls, now emit spans with HTTP/process metadata.

Links async snapshot compression jobs back to request traces via span links, and adds focused tracing unit tests; also adjusts a few integration tests for stability (retrying create during auto-pull and loosening some timeouts).

Written by Cursor Bugbot for commit 5091e45. This will update automatically on new commits. Configure here.

@sjmiller609 sjmiller609 marked this pull request as ready for review March 24, 2026 15:50
@sjmiller609 sjmiller609 requested a review from hiroTamada March 24, 2026 16:56
Copy link
Contributor

@hiroTamada hiroTamada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean tracing implementation — decorator pattern for hypervisor spans is well-designed, lifecycle instrumentation is consistent, and context-based attribute propagation avoids signature changes. one minor nit.

@sjmiller609 sjmiller609 merged commit 4a24278 into main Mar 24, 2026
5 of 6 checks passed
@sjmiller609 sjmiller609 deleted the codex/distributed-tracing-review branch March 24, 2026 20:04
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

}
baseAttrs = append(baseAttrs, attrs...)
return startTraceSpan(ctx, otel.Tracer(traceSubsystemForType(hvType)), name, baseAttrs...)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exported function only used within its own package

Low Severity

StartImplementationSpan is exported but its only caller is StartProcessSpan within the same package. No other package in the codebase references it. It can be unexported to reduce the public API surface.

Fix in Cursor Fix in Web

span.SetStatus(codes.Ok, "")
}
span.End()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical span-finishing logic duplicated across two packages

Low Severity

finishInstancesSpan is an exact copy of finishTraceSpan in lib/hypervisor/tracing.go, which is already publicly exported as hypervisor.FinishTraceSpan. Since lib/instances already imports lib/hypervisor, this helper could delegate to the existing exported function instead of duplicating the logic.

Additional Locations (1)
Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants