54 changes: 41 additions & 13 deletions docs/design/kubevirt-datamover.md
@@ -59,19 +59,26 @@ Taking a VolumeSnapshot and then using kopia to process the entire volume and co
- Note that unlike the built-in datamover, this DataUpload is tied to a VM (which could include multiple PVCs) rather than to a single PVC.
- An annotation will be added to the DataUpload identifying the VirtualMachine we're backing up.
- Add `velerov1api.DataUploadNameAnnotation` to VirtualMachine
- Add `velerov1api.PVCNamespaceNameLabel` annotation to VirtualMachine (doesn't need to be a label, since we're just using it to figure out what label selector to use for the ConfigMap on restore).
- OperationID will be created and returned similar to what's done with the CSI PVC plugin, and the async operation Progress method will report on progress based on the DU status (similar to CSI PVC plugin)
Comment on lines +62 to 63
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Description: Search for PVCNamespaceNameLabel definition in Velero API references

# Search for PVCNamespaceNameLabel in go files
rg -n "PVCNamespaceNameLabel" --type=go

# Also check if it's imported from velero
rg -n "velerov1api\." --type=go -A2 -B2 | rg -i "pvc.*label"
```

Repository: openshift/oadp-operator
Length of output: 49

🏁 Script executed:

```shell
# Search for similar label/annotation constants in the codebase
rg "Label\s*=" --type=go | head -20

# Search for velerov1api imports to understand what's available
rg "velerov1api\." --type=go -B5 | head -40

# Check pkg/common/common.go for existing constants
fd "common.go" --type=f | head -5
```

Repository: openshift/oadp-operator
Length of output: 1220

🏁 Script executed:

```shell
# Read pkg/common/common.go to see existing constants
cat -n pkg/common/common.go
```

Repository: openshift/oadp-operator
Length of output: 15794

🏁 Script executed:

```shell
# Search for how other Velero annotations are referenced and used
rg "velerov1\." --type=go | grep -i "annotation\|label" | head -20
```

Repository: openshift/oadp-operator
Length of output: 586

Clarify and verify the PVCNamespaceNameLabel annotation.

The constant velerov1api.PVCNamespaceNameLabel is referenced in the design document but does not exist anywhere in the codebase or Velero v1 API references found here. Additionally, the annotation name includes "Label" while the comment states it "doesn't need to be a label"—this contradiction is confusing.

Before implementation, clarify:

  1. Whether PVCNamespaceNameLabel needs to be added to Velero upstream or defined locally in pkg/common/common.go
  2. Why the name includes "Label" if it's an annotation, not a label
  3. The precise purpose and structure of this annotation for ConfigMap label selector resolution
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/kubevirt-datamover.md` around lines 62 - 63, The document
references a non-existent constant velerov1api.PVCNamespaceNameLabel and mixes
“label” vs “annotation”; update the design to either (a) import/add a constant
in Velero upstream only if agreed with maintainers, or (b) define it locally in
pkg/common/common.go (e.g., PVCNamespaceNameAnnotation) and use that symbol in
the doc; rename the identifier to remove “Label” if it will be an annotation,
update the VirtualMachine annotation description to state its exact key/name,
expected value format (e.g., "<namespace>/<name>" or just "<namespace>"), and
how it is used to compute the ConfigMap label selector during restore so the
implementation (OperationID/async progress/DU status) can rely on a concrete
annotation key and value shape.

- PVC BIA plugin
- Add `kubevirt-datamover-vm` annotation to PVC with the `VirtualMachine` name to signal to RIA that we need to remove `VolumeName` and set `Selector.MatchLabels` on PVC.
- VirtualMachine RIA plugin
- (further investigation still needed on exactly what's needed here)
- Similar in functionality to the CSI PVC restore action
- Create temporary PVC (needed, or is this handled by controller?)
- Create DD based on DU annotation and DU ConfigMap (this may be handled already by existing DU RIA)
- Create DD based on DU annotation and DU ConfigMap
- Need to confirm that VM resource has the PVC name annotation added by the BIA plugin
- PVC RIA plugin
- If PVC has `kubevirt-datamover-vm` annotation, need to do the following:
- set spec.VolumeName to ""
- set selector with MatchLabels to match PV that will be created by restore controller
- VirtualMachineBackup/VirtualMachineBackupTracker RIA plugin
- Simple RIA that discards VMB/VMBT resources on restore
- We don't want to restore these because they would kick off another VMBackup action.
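A minimal sketch of the PVC mutation described for the PVC RIA plugin above, written as a jq transform over the PVC JSON. The real plugin would edit the unstructured item in Go; the `velero.io/dynamic-pv-restore` label key and the selector value below are illustrative assumptions, not confirmed constants.

```shell
#!/bin/bash
# Hypothetical sketch of the PVC RIA mutation: clear spec.volumeName and set a
# selector whose labels will later be copied onto the restored PV.
cat > /tmp/pvc.json <<'EOF'
{"apiVersion": "v1", "kind": "PersistentVolumeClaim",
 "metadata": {"name": "my-vm-pvc-0",
              "annotations": {"kubevirt-datamover-vm": "my-vm"}},
 "spec": {"volumeName": "pv-from-backup", "accessModes": ["ReadWriteOnce"]}}
EOF
mutated=$(jq 'if .metadata.annotations["kubevirt-datamover-vm"] != null then
    .spec.volumeName = ""
    | .spec.selector = {matchLabels: {"velero.io/dynamic-pv-restore": "my-vm-pvc-0-suffix"}}
  else . end' /tmp/pvc.json)
echo "$mutated"
```

PVCs without the `kubevirt-datamover-vm` annotation pass through unchanged, matching the conditional described above.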

### Kubevirt Datamover Controller
- Responsible for reconciling DataUploads/DataDownloads where `Spec.DataMover` is "kubevirt"
- Configurable concurrency limits: concurrent-vm-backups and concurrent-vm-datauploads
- We need the `qemu-img` binary built into the controller image.
- DataUpload reconciler (backup):
- create the (temporary) PVC.
- identify the VirtualMachine from the PVC metadata.
@@ -87,18 +94,36 @@ Taking a VolumeSnapshot and then using kopia to process the entire volume and co
- We need to properly handle cases where we attempt an incremental backup but a full backup is taken instead (checkpoint lost, CSI snapshot restore since last checkpoint, VM restart, etc.)
- Aborted backups also need to be handled (resulting in a failed PVC backup on the Velero side)
- DataDownload reconciler (restore)
- (this area is less well defined so far, since the kubevirt enhancement doesn't go into as much detail on restore)
- We will need a temporary PVC for pulling qcow2 images from object store (if we're restoring the empty temp PVC from backup, that might work here)
- We also need PVCs created for each VM disk we're restoring from qcow2 images.
- Identify the VM from the DD.
- Pull BSL metadata for the VM and backup
- Create the temporary PVC to download the qcow2 files onto.
- PV here is also temporary
- PVC size based on the combined size of the qcow2 files in the BSL needed for restore, plus the PVC sizes
- For each PVC, sum the sizes of all of its qcow2 files plus the PVC size, then add 10% as a buffer. If there are multiple PVCs, take the maximum of these values: since we process one PVC at a time, we don't need to hold files for all PVCs on the temp disk simultaneously.
Comment on lines +101 to +102
Contributor

⚠️ Potential issue | 🟠 Major

Validate the 10% buffer sizing calculation.

The sizing calculation adds a 10% buffer to the sum of qcow2 files and PVC sizes. This buffer percentage appears arbitrary and may be insufficient in edge cases, such as:

  • Highly compressed qcow2 files that expand significantly during conversion
  • Temporary intermediate files created during the qemu-img operations
  • Filesystem overhead on the temporary PVC

Consider whether:

  1. The 10% buffer has been empirically validated with real-world VM backup scenarios
  2. A larger or dynamic buffer (e.g., 15-20%) would be safer
  3. The controller should detect and handle "insufficient space" errors by automatically retrying with a larger PVC

Would you like me to research typical qcow2 expansion ratios or generate logic to handle PVC resize scenarios?

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 101-101: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 1

(MD005, list-indent)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/kubevirt-datamover.md` around lines 101 - 102, The doc's PVC
sizing logic currently hardcodes a 10% buffer when computing temp PVC size from
qcow2 sizes and PVC sizes; update the implementation (the PVC sizing calculation
/ any function named like calculate_buffered_size or tempPVCSize computation in
the datamover/controller) to make the buffer percentage configurable and
increase the default to a safer value (e.g., 15–20%), add validation tests or
notes documenting empirical bounds, and implement runtime handling for
insufficient-space errors by detecting qemu-img or filesystem ENOSPC failures
and retrying the operation after increasing the PVC size (or failing with a
clear actionable error). Ensure the change is reflected in docs by replacing the
hardcoded "10% buffer" text with the new configurable/default value and a note
about dynamic retry behavior.
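The temp-PVC sizing rule discussed above (sum of qcow2 files plus PVC size, plus a 10% buffer, max over PVCs) can be sketched as follows. The input format is invented for illustration, and sizes are assumed to already be resolved to plain byte counts:

```shell
#!/bin/bash
# Sketch of the temp-PVC sizing rule from the design text.
# Input lines (format is an assumption): <pvc-name> <pvc_size_bytes> <qcow2_size_bytes>...
max=0
while read -r name pvc_size qcow2_sizes; do
  sum=$pvc_size
  for s in $qcow2_sizes; do
    sum=$((sum + s))
  done
  need=$((sum + sum / 10))            # add the 10% buffer
  if (( need > max )); then max=$need; fi
done <<'EOF'
my-vm-pvc-0 10737418240 2147483648 1073741824
my-vm-pvc-1 21474836480 4294967296
EOF
echo "$max"                           # max over PVCs: we process one PVC at a time
```

If the buffer percentage ends up configurable (as the review comment above suggests), only the `sum / 10` term changes.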

- Create temporary PVCs for each PVC in the VM (identified from BSL metadata).
- These need to be mounted as block mode volumes.
- PV will be bound to workload PVCs after restore, similar to velero datamover.
- To facilitate PV reattachment, we need a similar approach to the upstream velero exposer logic:
- The `DynamicPVRestoreLabel` needs to be set on the restore PV
- Generic restore exposer reads the selector back from the target PVC: In <https://github.com/vmware-tanzu/velero/blob/main/pkg/exposer/generic_restore.go#L443-L449>, `RebindVolume()` extracts `targetPVC.Spec.Selector.MatchLabels` and passes it to `ResetPVBinding()`.
- `ResetPVBinding()` copies the labels onto the PV: In <https://github.com/vmware-tanzu/velero/blob/main/pkg/util/kube/pvc_pv.go#L226-L270>, the labels (including `DynamicPVRestoreLabel`) are copied from the PVC selector to the PV's labels, and ClaimRef is reset so Kubernetes can bind them.
- Size based on the `pvcSizes` metadata in the BSL.
- We'll need to create another datamover pod here which will do the following:
- pod will have temp PVC mounted, as well as PVCs mounted for each vm disk we're creating.
- pod running command/image will first get the list of qcow2 files to pull from object storage
- once we have the qcow2 full backup and incremental files from object store, repeatedly call `qemu-img rebase -b fullbackup.qcow2 -f qcow2 -u incrementalN.qcow2` for each incremental backup, in order
- Then, convert qcow2 image to raw disk image: `qemu-img convert -f qcow2 -O raw fullbackup.qcow2 restored-raw.img`
- Finally, write this image to the PVC which contains the VM disk
- (repeat process for each disk if VM has multiple disks to restore)
- The pod permissions will need to be the same as we have for velero datamover (run as root, selinux config etc.)
- The pod will have temp PVC mounted, as well as PVCs mounted for each vm disk we're creating.
- The pod running command/image will first get the list of qcow2 files to pull from object storage
- Process one PVC at a time:
- Download all required qcow2 files for this PVC.
- Call `qemu-img rebase -b fullbackup.qcow2 -f qcow2 -u incrementalN.qcow2` for each incremental backup, in order
- FIXME: Should we use the unsafe `-u` flag? kubevirt docs suggested this, but we should confirm that this is still appropriate.
Contributor

⚠️ Potential issue | 🟠 Major

Resolve the FIXME about the unsafe -u flag before implementation.

The FIXME comment indicates uncertainty about whether to use the unsafe -u flag with qemu-img rebase. This is a critical decision point that affects data integrity during restore operations.

The unsafe flag bypasses backing file format consistency checks. Using it incorrectly could result in:

  • Corrupted restored VM disks
  • Silent data loss if the backing chain is inconsistent
  • Difficult-to-debug restore failures

This design decision must be resolved before implementation proceeds. Consult with KubeVirt subject matter experts or review the latest KubeVirt incremental backup documentation to confirm the recommended approach.

Would you like me to search the KubeVirt enhancement proposal and related issues to find guidance on this flag usage?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/kubevirt-datamover.md` at line 118, The FIXME in
docs/design/kubevirt-datamover.md about using the unsafe `-u` flag for qemu-img
rebase must be resolved: decide whether to use `-u` (and under what conditions)
by reviewing current KubeVirt incremental backup docs and consulting KubeVirt
SME/PRs, then replace the FIXME with a definitive guidance sentence that states
the chosen approach, the exact qemu-img invocation pattern, and the safety
rationale (when `-u` is allowed vs forbidden) so implementers of the datamover
know whether to include `-u` in the rebase step and why.

- Delete the incremental qcow2 files, leaving the rebased full backup.
- Convert qcow2 image to raw disk image: `qemu-img convert -f qcow2 -O raw fullbackup.qcow2 restored-raw.img`
- Delete the qcow2 file
- Finally, write this image to the block device of the target PVC (the one bound to the appropriate PV):
- `dd if=/path/to/image.raw of=/dev/target_pvc_block_device bs=4M status=progress`
- Delete the raw file
- Note that the various `qemu-img` actions might eventually be combined into a single kubevirt API call, but for the moment this would need to be done manually.
- Once datamover pod has restored the VM disks, it will exit and the VirtualMachine can launch with these disks (following the velero datamover model where the temporary PVC is deleted, retaining the PV, which then binds to the VM's PVCs may work here). The temporary PVC (containing the qcow2 images, not the restored VM disk image) should be completely removed at this point, including PV content.
- Once datamover pod has restored the VM disks, it will exit, the temporary PVCs will be deleted, leaving the restored PVs but deleting the qcow2 staging PV.
Contributor

⚠️ Potential issue | 🟡 Minor

Clarify the cleanup logic for temporary resources.

The statement "temporary PVCs will be deleted, leaving the restored PVs but deleting the qcow2 staging PV" is unclear about the distinction between:

  1. Temporary PVCs used for qcow2 download (created at line 99)
  2. Temporary PVCs for each VM PVC (created at line 103)
  3. The "qcow2 staging PV" mentioned here

Please clarify:

  • Which specific PVCs/PVs are deleted vs. retained
  • Whether "qcow2 staging PV" refers to the PV backing the temp PVC from line 99
  • The lifecycle of all temporary resources created during restore

This ambiguity could lead to resource leaks or incorrect cleanup during implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/kubevirt-datamover.md` at line 126, Clarify cleanup by explicitly
naming the temporary resources and their lifecycle: state that the "qcow2
download PVC" (the temp PVC created for the qcow2 download/staging) and its
backing PV (referred to as the "qcow2 staging PV") are deleted after the
datamover pod finishes and the qcow2 image is converted; state that each "per-VM
temp PVC" (the temp PVCs created for each VM PVC) are also deleted after their
corresponding VM disk has been copied to the target PVC and the PV bind is
completed; specify which PVs are retained (the final restored PVs bound to the
target PVCs) versus which PVs are removed (the qcow2 staging PV and any PVs
exclusively backing the temp PVCs), and add a concise ordered lifecycle sequence
(create qcow2 download PVC -> create per-VM temp PVCs -> datamover
copies/creates final PVCs -> datamover exits -> delete per-VM temp PVCs and
qcow2 download PVC -> leave restored PVs bound to final PVCs).
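Taken together, the per-PVC processing steps above amount to the following command sequence. This is a dry-run sketch that only echoes the commands; file names are illustrative (the real list comes from the per-VM index in the BSL), and it mirrors the design text as written, including the still-open question about the unsafe `-u` flag:

```shell
#!/bin/bash
# Dry-run sketch of the per-PVC restore pipeline described above.
cmds=()
run() { cmds+=("$*"); echo "$*"; }   # swap echo for real execution in the pod

incrementals="incremental1.qcow2 incremental2.qcow2"
# 1. rebase each incremental onto the full backup, in order
for inc in $incrementals; do
  run qemu-img rebase -b fullbackup.qcow2 -f qcow2 -u "$inc"
done
# 2. delete the incrementals, then convert the rebased full backup to raw
run rm -f $incrementals
run qemu-img convert -f qcow2 -O raw fullbackup.qcow2 restored-raw.img
run rm -f fullbackup.qcow2
# 3. write the raw image to the target PVC's block device, then clean up
run dd if=restored-raw.img of=/dev/target_pvc_block_device bs=4M status=progress
run rm -f restored-raw.img
```

The loop then repeats for the next PVC, which is why only one PVC's staging files need to fit on the temp disk at a time.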


### Kubevirt datamover backup data/metadata

@@ -133,6 +158,7 @@ Per-VM Index (checkpoints/<ns>/<vm-name>/index.json):
"vmBackup": "vmb-001",
"files": ["vmb-001-disk0.qcow2", "vmb-001-disk1.qcow2"],
"pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"],
"pvcSizes": ["10Gi", "20Gi"],
"referencedBy": ["backup-2025-01-10", "backup-2025-01-11", "backup-2025-01-12"]
Contributor

I know it's changing quite a bit, but since we are still in the design phase, should we add a parent directory for the backup chain (e.g., the ID of the base full backup) between the VM name and the individual checkpoints?

Proposed Structure:

  • Current: checkpoints/<namespace>/<vm-name>/<checkpoint-id>/
  • Proposed: checkpoints/<namespace>/<vm-name>/<base-full-backup-id>/<checkpoint-id>/

Why this helps (Short Explanation):

  1. O(1) S3 API Queries: Object storage uses flat prefixes, not real directories. By grouping a chain under one prefix, we can calculate the total size of a full backup + all its incrementals using a single ListObjectsV2 API call, instead of making separate calls for every single checkpoint.
  2. Simplified Garbage Collection: When a backup chain expires or is broken, we can delete all associated qcow2 files safely and easily by deleting the single <base-full-backup-id> prefix, rather than parsing JSON to delete N individual checkpoint paths.

Contributor Author

@mpryc What exactly is the "base full backup ID"? Velero backup ID or something else? What happens when this backup expires but the subsequent incremental backups have not? Where is the reference to this ID in the metadata files for these grouped operations?

Contributor

@sseago thinking aloud here:

  1. The "base full backup ID" could be the checkpoint ID of the full backup that started the chain.
  2. When subsequent inc backups are still not expired we do not delete the s3 files from the chain. Only when the last inc backup expires, and the entire chain's "referencedBy" lists are completely empty, we can delete the entire folder.
"checkpoints":[
 {
   "id": "vmb-demo-1",
   "type": "Full",
   "chainId": "chain-vmb-demo-1"
 },
 {
   "id": "vmb-demo-2",
   "type": "Incremental",
   "parent": "vmb-demo-1",
   "chainId": "chain-vmb-demo-1"  <-- Maps to the same S3 parent prefix
 }
]

I am not writing out the entire definition here, just asking whether the idea would simplify calculating the required storage for temporary PVCs, reduce the number of S3 calls, and make cleanup easier; but maybe not. S3 API calls aren't that problematic, as we won't have to make many of them.

},
{
@@ -143,6 +169,7 @@ Per-VM Index (checkpoints/<ns>/<vm-name>/index.json):
"vmBackup": "vmb-002",
"files": ["vmb-002-disk0.qcow2", "vmb-002-disk1.qcow2"],
"pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"],
"pvcSizes": ["10Gi", "20Gi"],
"referencedBy": ["backup-2025-01-11", "backup-2025-01-12"]
},
{
@@ -153,6 +180,7 @@ Per-VM Index (checkpoints/<ns>/<vm-name>/index.json):
"vmBackup": "vmb-003",
"files": ["vmb-003-disk0.qcow2", "vmb-003-disk1.qcow2"],
"pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"],
"pvcSizes": ["10Gi", "20Gi"],
"referencedBy": ["backup-2025-01-12"]
}
]
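As an illustration of how a restore could resolve which qcow2 files to pull from the index, the following jq sketch walks parent links from a target checkpoint back to the full backup. The `parent` field and the exact index shape are assumptions here: the entries shown above only include `vmBackup`, `files`, `pvcs`, `pvcSizes`, and `referencedBy`.

```shell
#!/bin/bash
# Sketch: walk parent links from a target checkpoint back to the full backup
# and list the qcow2 files a restore would need, in apply order.
cat > /tmp/index.json <<'EOF'
{"checkpoints": [
  {"id": "vmb-001", "type": "Full",        "files": ["vmb-001-disk0.qcow2"]},
  {"id": "vmb-002", "type": "Incremental", "parent": "vmb-001", "files": ["vmb-002-disk0.qcow2"]},
  {"id": "vmb-003", "type": "Incremental", "parent": "vmb-002", "files": ["vmb-003-disk0.qcow2"]}
]}
EOF
files=$(jq -r --arg id "vmb-003" '
  .checkpoints as $cps
  | [ $id | while(. != null; . as $c | ($cps[] | select(.id == $c) | .parent)) ]
  | reverse
  | map(. as $c | ($cps[] | select(.id == $c) | .files[]))
  | .[]' /tmp/index.json)
echo "$files"
```

Under the chain-prefix proposal above, the same lookup could instead become a single ListObjectsV2 call on the chain's prefix.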