More details on restore design #2151

base: oadp-dev
@@ -59,19 +59,26 @@ Taking a VolumeSnapshot and then using kopia to process the entire volume and co
- Note that unlike the built-in datamover, this DataUpload is tied to a VM (which could include multiple PVCs) rather than to a single PVC.
- An annotation will be added to the DataUpload identifying the VirtualMachine we're backing up.
  - Add `velerov1api.DataUploadNameAnnotation` to the VirtualMachine.
  - Add the `velerov1api.PVCNamespaceNameLabel` annotation to the VirtualMachine (it doesn't need to be a label, since we're only using it to figure out which label selector to use for the ConfigMap on restore).
- An OperationID will be created and returned, similar to what's done with the CSI PVC plugin, and the async operation Progress method will report progress based on the DataUpload status (similar to the CSI PVC plugin).
- PVC BIA plugin
  - Add the `kubevirt-datamover-vm` annotation to the PVC with the `VirtualMachine` name, to signal to the RIA that we need to remove `VolumeName` and set `Selector.MatchLabels` on the PVC.
- VirtualMachine RIA plugin
  - (further investigation is still needed on exactly what's required here)
  - Similar in functionality to the CSI PVC restore action.
  - Create a temporary PVC (is this needed, or is it handled by the controller?).
  - Create a DataDownload based on the DataUpload annotation and DataUpload ConfigMap.
  - Need to confirm that the VM resource has the PVC name annotation added by the BIA plugin.
- PVC RIA plugin
  - If the PVC has the `kubevirt-datamover-vm` annotation, we need to do the following:
    - set `spec.VolumeName` to `""`
    - set a selector with `MatchLabels` to match the PV that will be created by the restore controller
- VirtualMachineBackup/VirtualMachineBackupTracker RIA plugin
  - A simple RIA that discards VMB/VMBT resources on restore.
  - We don't want to restore these because they would kick off another VMBackup action.
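The PVC RIA mutation described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual plugin code: the annotation key comes from the design text, while the selector label key and function name are placeholders.

```python
import copy

# Assumed annotation key from the design text; not a confirmed constant.
DATAMOVER_VM_ANNOTATION = "kubevirt-datamover-vm"

def mutate_pvc_for_restore(pvc: dict, restore_name: str) -> dict:
    """Return a PVC manifest adjusted for kubevirt-datamover restore."""
    annotations = pvc.get("metadata", {}).get("annotations", {})
    if DATAMOVER_VM_ANNOTATION not in annotations:
        return pvc  # not a kubevirt-datamover PVC; leave untouched
    out = copy.deepcopy(pvc)
    spec = out.setdefault("spec", {})
    spec["volumeName"] = ""  # drop the old PV binding
    # Match a label the restore controller will place on the restored PV
    # (velero uses a similar DynamicPVRestoreLabel mechanism); the label
    # key here is a placeholder, not a confirmed name.
    spec["selector"] = {"matchLabels": {"velero.io/dynamic-pv-restore": restore_name}}
    return out
```

With `volumeName` cleared and a selector set, Kubernetes can bind the restored PVC to whichever PV the restore controller labels to match.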
### Kubevirt Datamover Controller
- Responsible for reconciling DataUploads/DataDownloads where `Spec.DataMover` is "kubevirt".
- Configurable concurrency limits: `concurrent-vm-backups` and `concurrent-vm-datauploads`.
- We need the `qemu-img` binary built into the controller image.
- DataUpload reconciler (backup):
  - Create the (temporary) PVC.
  - Identify the VirtualMachine from the PVC metadata.
@@ -87,18 +94,36 @@ Taking a VolumeSnapshot and then using kopia to process the entire volume and co
- We need to properly handle cases where we attempt an incremental backup but a full backup is taken instead (checkpoint lost, CSI snapshot restored since the last checkpoint, VM restart, etc.).
- Aborted backups also need to be handled (resulting in a failed PVC backup on the Velero side).
- DataDownload reconciler (restore)
  - (this area is less well defined so far, since the kubevirt enhancement doesn't go into as much detail on restore)
  - We will need a temporary PVC for pulling qcow2 images from the object store (if we're restoring the empty temp PVC from backup, that might work here).
  - We also need PVCs created for each VM disk we're restoring from qcow2 images.
  - Identify the VM from the DataDownload.
  - Pull BSL metadata for the VM and backup.
  - Create the temporary PVC to download the qcow2 files onto.
    - The PV here is also temporary.
    - PVC size is based on the size of the qcow2 files in the BSL needed for restore, as well as the PVC sizes: for each PVC, calculate the sum of all qcow2 file sizes added to the PVC size, and then add 10% as a buffer. If there are multiple PVCs, take the max value; since we process one PVC at a time, we don't need to hold files for all PVCs on the temp disk at the same time.
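The sizing rule above can be expressed as a small helper. This is a minimal sketch, assuming all sizes are already normalized to bytes; the input shape is illustrative, not a defined API:

```python
def temp_pvc_size_bytes(pvc_plans: list[dict], buffer_ratio: float = 0.10) -> int:
    """Compute the temp staging PVC size per the design's rule.

    pvc_plans: one entry per restored PVC, e.g.
        {"pvc_size": <bytes>, "qcow2_sizes": [<bytes>, ...]}
    Since PVCs are processed one at a time, only the largest per-PVC
    requirement (qcow2 files + PVC size, plus the buffer) must fit.
    """
    per_pvc = [
        (plan["pvc_size"] + sum(plan["qcow2_sizes"])) * (1 + buffer_ratio)
        for plan in pvc_plans
    ]
    return int(max(per_pvc))
```

A configurable `buffer_ratio` would make it easy to revisit the 10% figure later, as the review comment below suggests.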
> **Review comment on lines +101 to +102 (Contributor):** Validate the 10% buffer sizing calculation. The sizing calculation adds a 10% buffer to the sum of the qcow2 file sizes and PVC sizes. This buffer percentage appears arbitrary and may be insufficient in edge cases.
- Create temporary PVCs for each PVC in the VM (identified from BSL metadata).
  - These need to be mounted as block mode volumes.
  - The PV will be bound to the workload PVCs after restore, similar to the velero datamover.
  - To facilitate PV reattachment, we need an approach similar to the upstream velero exposer logic:
    - The `DynamicPVRestoreLabel` needs to be set on the restored PV.
    - The generic restore exposer reads the selector back from the target PVC: in <https://github.com/vmware-tanzu/velero/blob/main/pkg/exposer/generic_restore.go#L443-L449>, `RebindVolume()` extracts `targetPVC.Spec.Selector.MatchLabels` and passes it to `ResetPVBinding()`.
    - `ResetPVBinding()` copies the labels onto the PV: in <https://github.com/vmware-tanzu/velero/blob/main/pkg/util/kube/pvc_pv.go#L226-L270>, the labels (including `DynamicPVRestoreLabel`) are copied from the PVC selector to the PV's labels, and the ClaimRef is reset so Kubernetes can bind them.
  - Size is based on the `pvcSizes` metadata in the BSL.
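As a rough sketch of that rebind step (not velero's actual Go code, which lives at the links above), the `ResetPVBinding()`-style logic amounts to copying the PVC's selector labels onto the PV and clearing its claimRef so the PV becomes bindable again:

```python
import copy

def reset_pv_binding(pv: dict, pvc_selector_labels: dict) -> dict:
    """Sketch of the rebind: label the PV so the target PVC's selector
    matches it, and clear claimRef so Kubernetes can re-bind it."""
    out = copy.deepcopy(pv)
    labels = out.setdefault("metadata", {}).setdefault("labels", {})
    labels.update(pvc_selector_labels)  # includes the DynamicPVRestoreLabel
    out["spec"]["claimRef"] = None      # free the PV for re-binding
    return out
```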
- We'll need to create another datamover pod here, which will do the following:
  - The pod permissions will need to be the same as we have for the velero datamover (run as root, SELinux config, etc.).
  - The pod will have the temp PVC mounted, as well as PVCs mounted for each VM disk we're creating.
  - The pod's command/image will first get the list of qcow2 files to pull from object storage.
  - Process one PVC at a time:
    - Download all required qcow2 files for this PVC.
    - Call `qemu-img rebase -b fullbackup.qcow2 -f qcow2 -u incrementalN.qcow2` for each incremental backup, in order.
      - FIXME: Should we use the unsafe `-u` flag? The kubevirt docs suggested this, but we should confirm that it is still appropriate.
> **Review comment (Contributor):** Resolve the FIXME about the unsafe `-u` flag. The FIXME comment indicates uncertainty about whether to use the unsafe `-u` flag with `qemu-img rebase`. The unsafe flag bypasses backing file format consistency checks, so using it incorrectly is risky. This design decision must be resolved before implementation proceeds. Consult KubeVirt subject matter experts or review the latest KubeVirt incremental backup documentation to confirm the recommended approach.
    - Delete the incremental qcow2 files, leaving the rebased full backup.
    - Convert the qcow2 image to a raw disk image: `qemu-img convert -f qcow2 -O raw fullbackup.qcow2 restored-raw.img`
    - Delete the qcow2 file.
    - Finally, write this image to the PVC which contains the appropriate PV: `dd if=/path/to/image.raw of=/dev/target_pvc_block_device bs=4M status=progress`
    - Delete the raw file.
  - Note that the various `qemu-img` actions might eventually be combined into a single kubevirt API call, but for the moment this needs to be done manually.
- Once the datamover pod has restored the VM disks, it will exit and the temporary PVCs will be deleted, leaving the restored PVs in place but deleting the qcow2 staging PV.
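The per-PVC pipeline above could be driven by a small command builder like this dry-run sketch. The file names and device path are placeholders, and it mirrors the design's commands as written, including the still-debated `-u` flag:

```python
def restore_commands(full: str, incrementals: list[str], device: str) -> list[list[str]]:
    """Build the ordered command list for restoring one PVC's disk:
    rebase each incremental onto the full backup (in backup order),
    convert the result to raw, then dd it onto the PVC's block device."""
    cmds = []
    for inc in incrementals:  # must be applied in backup order
        # -u is the unsafe flag; see the FIXME about whether it's appropriate
        cmds.append(["qemu-img", "rebase", "-b", full, "-f", "qcow2", "-u", inc])
    cmds.append(["qemu-img", "convert", "-f", "qcow2", "-O", "raw", full, "restored-raw.img"])
    cmds.append(["dd", "if=restored-raw.img", f"of={device}", "bs=4M", "status=progress"])
    return cmds
```

In the real pod these would run via `subprocess`, with the per-step file deletions interleaved to keep temp-disk usage within the sizing budget.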
> **Review comment (Contributor):** Clarify the cleanup logic for temporary resources. The statement "temporary PVCs will be deleted, leaving the restored PVs but deleting the qcow2 staging PV" is unclear about the distinction between the qcow2 staging PVC/PV and the PVs holding the restored disk data. Please clarify which resources survive cleanup and which are deleted; this ambiguity could lead to resource leaks or incorrect cleanup during implementation.
### Kubevirt datamover backup data/metadata
@@ -133,6 +158,7 @@ Per-VM Index (checkpoints/<ns>/<vm-name>/index.json):
    "vmBackup": "vmb-001",
    "files": ["vmb-001-disk0.qcow2", "vmb-001-disk1.qcow2"],
    "pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"],
    "pvcSizes": ["10Gi", "20Gi"],
    "referencedBy": ["backup-2025-01-10", "backup-2025-01-11", "backup-2025-01-12"]
> **Review comment (Contributor):** I know it's changing quite a bit, but since we are still in the design phase, should we add a parent directory for the backup chain (e.g., the ID of the base full backup) between the VM name and the individual checkpoints?
>
> Proposed Structure:
>
> Why this helps (Short Explanation):

> **Author reply (Contributor):** @mpryc What exactly is the "base full backup ID"? The Velero backup ID or something else? What happens when this backup expires but the subsequent incremental backups have not? Where is the reference to this ID in the metadata files for these grouped operations?

> **Reply (Contributor):** @sseago, thinking out loud here:
>
> ```json
> "checkpoints": [
>   {
>     "id": "vmb-demo-1",
>     "type": "Full",
>     "chainId": "chain-vmb-demo-1"
>   },
>   {
>     "id": "vmb-demo-2",
>     "type": "Incremental",
>     "parent": "vmb-demo-1",
>     "chainId": "chain-vmb-demo-1"  <-- maps to the same S3 parent prefix
>   }
> ]
> ```
>
> I am not writing out the entire definition here, just asking whether the idea would allow us to simplify the calculation of required storage for temporary PVCs, reduce the number of S3 calls, and make cleanup easier; but maybe not. S3 API calls aren't that problematic, as we won't have to make many of them.
  },
  {
@@ -143,6 +169,7 @@ Per-VM Index (checkpoints/<ns>/<vm-name>/index.json):
    "vmBackup": "vmb-002",
    "files": ["vmb-002-disk0.qcow2", "vmb-002-disk1.qcow2"],
    "pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"],
    "pvcSizes": ["10Gi", "20Gi"],
    "referencedBy": ["backup-2025-01-11", "backup-2025-01-12"]
  },
  {
@@ -153,6 +180,7 @@ Per-VM Index (checkpoints/<ns>/<vm-name>/index.json):
    "vmBackup": "vmb-003",
    "files": ["vmb-003-disk0.qcow2", "vmb-003-disk1.qcow2"],
    "pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"],
    "pvcSizes": ["10Gi", "20Gi"],
    "referencedBy": ["backup-2025-01-12"]
  }
]
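To illustrate how the `referencedBy` lists in the index above could drive a restore, here is a hypothetical helper that collects the qcow2 files needed for a chosen Velero backup (the full checkpoint plus every later incremental that the backup references). This is a sketch of one possible approach, not settled design:

```python
def files_for_restore(checkpoints: list[dict], backup_name: str) -> list[str]:
    """Collect qcow2 files, oldest first, for every checkpoint that the
    chosen backup references. Index entries are assumed oldest-first,
    matching the example index.json above."""
    files = []
    for cp in checkpoints:
        if backup_name in cp["referencedBy"]:
            files.extend(cp["files"])
    return files
```

Run against the example index, `backup-2025-01-11` pulls the `vmb-001` full-backup files plus the `vmb-002` incrementals, while `backup-2025-01-10` needs only the `vmb-001` files.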
> **Review comment (Contributor):** Clarify and verify the `PVCNamespaceNameLabel` annotation. The constant `velerov1api.PVCNamespaceNameLabel` is referenced in the design document but does not exist anywhere in the codebase or in the Velero v1 API references found here. Additionally, the annotation name includes "Label" while the comment states it "doesn't need to be a label"; this contradiction is confusing. Before implementation, clarify whether `PVCNamespaceNameLabel` needs to be added to Velero upstream or defined locally in `pkg/common/common.go`.