
Add optional snapshot compression defaults and standby integration#149

Merged
sjmiller609 merged 34 commits into main from codex/snapshot-compression-defaults on Mar 23, 2026

Conversation

@sjmiller609
Collaborator

@sjmiller609 sjmiller609 commented Mar 18, 2026

Summary

  • add optional snapshot compression config (zstd/lz4) to snapshot and instance APIs
  • add server-level and per-instance snapshot compression defaults with precedence: request override > instance policy > server default
  • add async snapshot memory compression jobs for standby and centrally stored standby snapshots
  • add restore-time snapshot memory preparation (cancel/wait active compression + on-demand decompression)
  • include compression metadata/state on snapshot resources
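
The precedence rule above (request override > instance policy > server default) can be sketched as a small resolver. This is a minimal illustration, not the PR's code: the `CompressionConfig` type, its field names, and `resolveCompression` are hypothetical, and the sketch assumes an override replaces the lower layer wholesale rather than merging field by field.

```go
package main

import "fmt"

// CompressionConfig is a hypothetical shape for the compression settings.
// The fields mirror (enabled, algorithm, level) from the PR description,
// but the Go names are assumptions.
type CompressionConfig struct {
	Enabled   bool
	Algorithm string
	Level     *int
}

// resolveCompression returns the first config that is set, in precedence
// order: request override, then instance policy, then server default.
func resolveCompression(request, instance *CompressionConfig, serverDefault CompressionConfig) CompressionConfig {
	if request != nil {
		return *request
	}
	if instance != nil {
		return *instance
	}
	return serverDefault
}

func main() {
	server := CompressionConfig{Enabled: true, Algorithm: "zstd"}
	instance := &CompressionConfig{Enabled: true, Algorithm: "lz4"}

	// No request override: the instance policy wins over the server default.
	fmt.Println(resolveCompression(nil, instance, server).Algorithm)

	// A request override wins over both, even when it disables compression.
	fmt.Println(resolveCompression(&CompressionConfig{Enabled: false}, instance, server).Enabled)
}
```

Using pointers for the two optional layers keeps "not specified" distinct from "explicitly disabled", which matters when a request wants to turn compression off for an instance whose policy enables it.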

API / Config

  • POST /instances/{id}/standby accepts optional compression
  • POST /instances/{id}/snapshots accepts optional compression
  • POST /instances accepts optional snapshot_policy
  • config adds snapshot.compression_default (enabled, algorithm, level)
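
As an illustration of what a standby request body might look like, the sketch below marshals a hypothetical request type to JSON. Only the top-level `compression` key and the (enabled, algorithm, level) triple come from this description; the assumption that the per-request object reuses those three field names, and the Go type names, are mine. The OpenAPI spec in the PR is the authority on the real schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Compression is a hypothetical request payload; the JSON keys are assumed
// to match the (enabled, algorithm, level) fields named for the server config.
type Compression struct {
	Enabled   bool   `json:"enabled"`
	Algorithm string `json:"algorithm,omitempty"`
	Level     *int   `json:"level,omitempty"`
}

// StandbyRequest models an assumed body for POST /instances/{id}/standby.
type StandbyRequest struct {
	Compression *Compression `json:"compression,omitempty"`
}

// encodeStandbyRequest renders the request body as a JSON string.
func encodeStandbyRequest(c *Compression) (string, error) {
	b, err := json.Marshal(StandbyRequest{Compression: c})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	level := 3
	body, _ := encodeStandbyRequest(&Compression{Enabled: true, Algorithm: "zstd", Level: &level})
	fmt.Println(body) // {"compression":{"enabled":true,"algorithm":"zstd","level":3}}

	// Omitting compression leaves the decision to the instance policy or server default.
	empty, _ := encodeStandbyRequest(nil)
	fmt.Println(empty) // {}
}
```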

Validation and behavior

  • compression is standby-only (rejected for stopped snapshots)
  • zstd level validated (1-19)
  • lz4 rejects level input
  • standby validates compression policy before state transitions
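
The level and algorithm rules listed above could be enforced with a function along these lines. It is a sketch under assumptions: `normalizeCompression` and the `CompressionConfig` type are hypothetical names, the lowercase normalization follows a later commit in this PR, and returning a disabled config unchanged is my choice rather than documented behavior. The standby-only rule is not covered here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CompressionConfig is a hypothetical shape for the compression settings.
type CompressionConfig struct {
	Enabled   bool
	Algorithm string
	Level     *int
}

// normalizeCompression lowercases the algorithm and applies the rules from
// the list above: zstd levels must be 1-19, and lz4 rejects any level.
func normalizeCompression(cfg CompressionConfig) (CompressionConfig, error) {
	if !cfg.Enabled {
		// Assumption: a disabled config is passed through without checks.
		return cfg, nil
	}
	cfg.Algorithm = strings.ToLower(strings.TrimSpace(cfg.Algorithm))
	switch cfg.Algorithm {
	case "zstd":
		if cfg.Level != nil && (*cfg.Level < 1 || *cfg.Level > 19) {
			return CompressionConfig{}, fmt.Errorf("zstd level %d is outside 1-19", *cfg.Level)
		}
	case "lz4":
		if cfg.Level != nil {
			return CompressionConfig{}, errors.New("lz4 does not accept a level")
		}
	default:
		return CompressionConfig{}, fmt.Errorf("unsupported algorithm %q", cfg.Algorithm)
	}
	return cfg, nil
}

func main() {
	level := 25
	_, err := normalizeCompression(CompressionConfig{Enabled: true, Algorithm: "zstd", Level: &level})
	fmt.Println(err)

	cfg, err := normalizeCompression(CompressionConfig{Enabled: true, Algorithm: "LZ4"})
	fmt.Println(cfg.Algorithm, err)
}
```

Running validation before any state transition, as the last bullet describes, means a bad policy is rejected while the instance is still running rather than after it has been paused.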

Tests

  • go test -run '^$' ./cmd/api/api ./lib/instances ./lib/providers ./integration
  • go test ./lib/instances -run 'TestNormalizeCompressionConfig|TestResolveSnapshotCompressionPolicyPrecedence|TestValidateCreateRequestSnapshotPolicy|TestValidateCreateSnapshotRequestRejectsStoppedCompression'

Note

High Risk
High risk because it changes core instance lifecycle paths (standby/restore/fork/snapshot/delete) to manage asynchronous compression jobs and on-demand decompression, which can affect VM correctness, performance, and snapshot data integrity.

Overview
Adds optional snapshot compression (zstd or lz4) across the API and instance domain model: POST /instances can set snapshot_policy, POST /instances/{id}/standby and POST /instances/{id}/snapshots accept per-request compression overrides, and instance/snapshot responses now include compression policy + compression state/size metadata.

Implements asynchronous snapshot-memory compression for standby instances and centrally stored standby snapshots with configurable defaults, resolved in precedence order: request override, then per-instance policy, then server snapshot.compression_default. Foreground operations (restore, fork, create snapshot, delete) now cancel or wait for in-flight compression and ensure raw memory is available (decompressing on demand if needed) before proceeding.
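
The cancel-then-wait step that foreground operations perform can be pictured with a small job tracker. This is a minimal sketch, not the manager's implementation: `jobTracker`, `compressionJob`, `start`, and `cancelAndWait` are hypothetical names, and a plain stop channel stands in for whatever cancellation mechanism the real code uses.

```go
package main

import (
	"fmt"
	"sync"
)

// compressionJob tracks one hypothetical background compression.
type compressionJob struct {
	stop chan struct{} // closed to request cancellation
	done chan struct{} // closed when the job goroutine has exited
}

// jobTracker maps an instance or snapshot ID to its in-flight job.
type jobTracker struct {
	mu   sync.Mutex
	jobs map[string]*compressionJob
}

func newJobTracker() *jobTracker {
	return &jobTracker{jobs: make(map[string]*compressionJob)}
}

// start runs work in a goroutine and records it under id.
// The sketch assumes at most one job per id at a time.
func (jt *jobTracker) start(id string, work func(stop <-chan struct{})) {
	job := &compressionJob{stop: make(chan struct{}), done: make(chan struct{})}
	jt.mu.Lock()
	jt.jobs[id] = job
	jt.mu.Unlock()
	go func() {
		defer close(job.done)
		work(job.stop)
	}()
}

// cancelAndWait signals the job for id, if any, and blocks until it exits.
// It reports whether a job was registered.
func (jt *jobTracker) cancelAndWait(id string) bool {
	jt.mu.Lock()
	job, ok := jt.jobs[id]
	delete(jt.jobs, id)
	jt.mu.Unlock()
	if !ok {
		return false
	}
	close(job.stop)
	<-job.done
	return true
}

func main() {
	tracker := newJobTracker()
	tracker.start("instance-a", func(stop <-chan struct{}) {
		<-stop // a real job would compress in chunks and check stop between them
	})
	fmt.Println(tracker.cancelAndWait("instance-a")) // true: a job was cancelled
	fmt.Println(tracker.cancelAndWait("instance-a")) // false: nothing left to cancel
}
```

Waiting on `done` before touching the memory file is the important part: a restore that proceeds while a compressor is still writing could read a half-written artifact.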

Updates plumbing and tests accordingly: instances.NewManager takes snapshot defaults, StandbyInstance takes a request object, adds OpenTelemetry metrics for compression jobs/fallbacks/preemptions, skips copying .zst.tmp/.lz4.tmp artifacts during forks, and introduces middleware to normalize empty standby POST bodies to {} for strict request decoding.
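
The empty-body normalization mentioned above can be sketched as a standard `net/http` middleware. The names `normalizeBody` and `emptyBodyAsObject` are hypothetical, and unlike the PR's middleware, which targets standby POSTs, this sketch applies to any POST it wraps.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// normalizeBody returns "{}" for an empty or whitespace-only body and
// the body unchanged otherwise.
func normalizeBody(body []byte) []byte {
	if len(bytes.TrimSpace(body)) == 0 {
		return []byte("{}")
	}
	return body
}

// emptyBodyAsObject rewrites empty POST bodies to "{}" so that a strict
// JSON decoder downstream sees a valid, empty object.
func emptyBodyAsObject(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodPost && r.Body != nil {
			raw, err := io.ReadAll(r.Body)
			if err != nil {
				http.Error(w, "cannot read request body", http.StatusBadRequest)
				return
			}
			_ = r.Body.Close()
			raw = normalizeBody(raw)
			r.Body = io.NopCloser(bytes.NewReader(raw))
			r.ContentLength = int64(len(raw))
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// echo writes back whatever body the wrapped handler received.
	echo := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		raw, _ := io.ReadAll(r.Body)
		fmt.Fprint(w, string(raw))
	})
	req := httptest.NewRequest(http.MethodPost, "/instances/example/standby", nil)
	rec := httptest.NewRecorder()
	emptyBodyAsObject(echo).ServeHTTP(rec, req)
	fmt.Println(rec.Body.String()) // {}
}
```

This keeps older clients that send a bodyless standby POST working after the endpoint gained an optional request object. A production version would likely cap the body size before reading it fully.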

Written by Cursor Bugbot for commit 8638cce.

@github-actions

github-actions bot commented Mar 19, 2026

✱ Stainless preview builds

This PR will update the hypeman SDKs with the following commit message.

feat: Add optional snapshot compression defaults and standby integration
hypeman-openapi

Your SDK build had at least one "note" diagnostic.
generate ✅

⚠️ hypeman-typescript

Your SDK build had at least one "error" diagnostic.
generate ❗ build ❗ lint ✅ test ✅

hypeman-go

Your SDK build had at least one "note" diagnostic.
generate ✅ build ⏭️ lint ✅ test ✅

go get github.com/stainless-sdks/hypeman-go@b6d9ab3d0b4ffdd7d56153464387ed9e603d8d0a

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-03-23 17:17:55 UTC

Co-authored-by: Steven Miller <sjmiller609@gmail.com>
@sjmiller609 sjmiller609 marked this pull request as ready for review March 20, 2026 13:53
@sjmiller609 sjmiller609 removed the request for review from masnwilliams March 20, 2026 18:23
@sjmiller609 sjmiller609 requested review from masnwilliams and removed request for masnwilliams March 20, 2026 18:25
sjmiller609 and others added 3 commits March 21, 2026 00:05
The compression integration tests were using zstd level 19 and lz4 level 9,
which are very slow for compressing ~1GB memory files. After merging main
(which added more integration tests to lib/instances), the total package
test time exceeded the 20-minute CI timeout.

Reduce to level 3 for both zstd and lz4 high-level cases. The tests still
exercise the full compression/decompression pipeline across both algorithms
and multiple levels (1 and 3 for zstd, 0 and 3 for lz4).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Five compression cycles (each involving VM standby + compress + restore +
boot + exec readiness) consistently exceed the 20-minute CI timeout after
merging main. Reduce to three cycles: one in-flight zstd, one completed
zstd, and one completed lz4. This still exercises both algorithms and
both the in-flight/completed code paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… error handling

- Add descriptions to snapshot_policy and compression fields in openapi.yaml
  per Steven's review comments
- Check dst.Close() errors in runGoCompression and runGoDecompression to
  prevent silently corrupt snapshot files on delayed write failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
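
The `dst.Close()` fix described in the commit above addresses a common Go pitfall: buffered or delayed writes can fail at close time, so ignoring the error can leave a truncated file that looks successful. The sketch below shows the general pattern only. `copyAndClose`, `fakeFile`, and `errDelayedWrite` are hypothetical names for illustration and are not taken from `runGoCompression` or `runGoDecompression`.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// errDelayedWrite simulates a write failure that only surfaces at Close.
var errDelayedWrite = errors.New("delayed write failure")

// fakeFile is an in-memory stand-in for *os.File whose Close can fail.
type fakeFile struct {
	bytes.Buffer
	closeErr error
}

func (f *fakeFile) Close() error { return f.closeErr }

// copyAndClose copies src into dst and always closes dst. A Close error is
// joined with any copy error instead of being dropped.
func copyAndClose(dst io.WriteCloser, src io.Reader) (err error) {
	defer func() {
		if cerr := dst.Close(); cerr != nil {
			err = errors.Join(err, fmt.Errorf("close destination: %w", cerr))
		}
	}()
	if _, err = io.Copy(dst, src); err != nil {
		return fmt.Errorf("copy: %w", err)
	}
	return nil
}

func main() {
	ok := &fakeFile{}
	fmt.Println(copyAndClose(ok, bytes.NewReader([]byte("memory pages"))), ok.String())

	bad := &fakeFile{closeErr: errDelayedWrite}
	err := copyAndClose(bad, bytes.NewReader([]byte("memory pages")))
	fmt.Println(errors.Is(err, errDelayedWrite))
}
```

The named return value lets the deferred function report the close failure even when the copy itself succeeded, which is the case the commit message is concerned with.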

@masnwilliams masnwilliams left a comment

solid feature — async compression design is clean, cancellation at every lifecycle point is thorough, and the standby body middleware is a nice backward-compat solution. main concern is the missing server-side validation for compression config (algorithm + level). see inline comments.


@masnwilliams masnwilliams left a comment

approving but see comments above

@sjmiller609 sjmiller609 marked this pull request as draft March 23, 2026 15:05
…metadata errors

- Validate algorithm (zstd/lz4) and per-algorithm level ranges in
  toDomainSnapshotCompressionConfig instead of passing through unchecked
- Log metadata update errors in compression jobs instead of silently
  discarding them
- Normalize algorithm to lowercase in config struct after validation
- Fix misleading test name (OmitsLevel -> PreservesLevel)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sjmiller609 sjmiller609 marked this pull request as ready for review March 23, 2026 15:10
@sjmiller609 sjmiller609 marked this pull request as draft March 23, 2026 15:30
@sjmiller609 sjmiller609 marked this pull request as ready for review March 23, 2026 15:42
@sjmiller609 sjmiller609 marked this pull request as draft March 23, 2026 15:48
@sjmiller609 sjmiller609 marked this pull request as ready for review March 23, 2026 15:50
@sjmiller609 sjmiller609 merged commit e43d9a2 into main Mar 23, 2026
5 of 6 checks passed
@sjmiller609 sjmiller609 deleted the codex/snapshot-compression-defaults branch March 23, 2026 17:10

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


2 participants