Conversation

Contributor

@androndo androndo commented Jan 18, 2026

What this PR does

Release note

Closes #1880

Added BackupClass to BackupJob and BackupPlan to simplify UX.

The following functions related to StorageRef were removed, as they are no longer used after migrating to the BackupClass API:
  • resolveBucketStorageRef: previously resolved S3 credentials from the Bucket storageRef
  • createS3CredsForVelero: previously created Velero S3 credentials secrets
  • createBackupStorageLocation: previously created Velero BackupStorageLocation resources
  • createVolumeSnapshotLocation: previously created Velero VolumeSnapshotLocation resources

Summary by CodeRabbit

  • New Features

    • Added BackupClass resource and backupClassName field so Plans and BackupJobs reference reusable strategy+parameters; webhook validates and enforces immutability of backupClassName.
  • Refactor

    • Removed inline storage/strategy references; controllers resolve strategy and parameters from BackupClass. Application apiGroup now defaults to apps.cozystack.io.
  • Tests

    • Added unit tests for BackupClass resolution, job/plan creation, webhook validation, and Velero template rendering.
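
The webhook rule summarized above (a non-empty, immutable backupClassName) can be sketched in plain Go. This is a minimal illustration, not the PR's webhook code: `BackupJobSpec` here is a hypothetical, trimmed-down stand-in for the real type, and the function names are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// BackupJobSpec is a hypothetical stand-in that keeps only the field under discussion.
type BackupJobSpec struct {
	BackupClassName string
}

// validateCreate rejects a spec whose backupClassName is empty.
func validateCreate(spec BackupJobSpec) error {
	if spec.BackupClassName == "" {
		return errors.New("spec.backupClassName must not be empty")
	}
	return nil
}

// validateUpdate applies the create-time rule and additionally rejects
// any change to backupClassName between the old and new spec.
func validateUpdate(oldSpec, newSpec BackupJobSpec) error {
	if err := validateCreate(newSpec); err != nil {
		return err
	}
	if oldSpec.BackupClassName != newSpec.BackupClassName {
		return fmt.Errorf("spec.backupClassName is immutable: %q -> %q",
			oldSpec.BackupClassName, newSpec.BackupClassName)
	}
	return nil
}

func main() {
	old := BackupJobSpec{BackupClassName: "velero-default"}
	fmt.Println(validateUpdate(old, BackupJobSpec{BackupClassName: "velero-default"}))
	fmt.Println(validateUpdate(old, BackupJobSpec{BackupClassName: "other"}))
}
```

In a real admission webhook the same two checks would sit behind the validator's create and update hooks; the sketch only shows the comparison logic.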

Contributor

coderabbitai bot commented Jan 18, 2026

Walkthrough

Adds a cluster-scoped BackupClass API; removes inline StorageRef/StrategyRef from Plan/Backup/BackupJob and replaces them with backupClassName; adds normalization/resolution of ApplicationRef to select a strategy and parameters and threads a ResolvedBackupConfig through reconciliation and Velero template rendering.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| API types: api/backups/v1alpha1/backup_types.go, api/backups/v1alpha1/backupjob_types.go, api/backups/v1alpha1/plan_types.go, api/backups/v1alpha1/backupclass_types.go | Removed StorageRef/StrategyRef from Backup/Plan/BackupJob specs; added backupClassName to PlanSpec and BackupJobSpec; added new BackupClass types (BackupClass, BackupClassList, BackupClassSpec, BackupClassStrategy, ApplicationSelector, BackupClassStatus). |
| CRD manifests: packages/system/backup-controller/definitions/..._backupclasses.yaml, ..._backupjobs.yaml, ..._backups.yaml, ..._plans.yaml | New cluster-scoped BackupClass CRD; removed storageRef/strategyRef from Backup/Plan/Job CRDs; added backupClassName to schemas and required lists; updated applicationRef.apiGroup default documentation. |
| DeepCopy helpers / generated: api/backups/v1alpha1/zz_generated.deepcopy.go | Added deepcopy code for new BackupClass types and ApplicationSelector; removed deepcopy handling for deleted fields. |
| Resolver & tests: internal/backupcontroller/backupclass_resolver.go, internal/backupcontroller/backupclass_resolver_test.go | New resolver (DefaultApplicationAPIGroup, NormalizeApplicationRef, ResolvedBackupConfig, ResolveBackupClass) that resolves the strategy by Application apiGroup+Kind and returns StrategyRef+Parameters; comprehensive unit tests. |
| Factory & tests: internal/backupcontroller/factory/backupjob.go, internal/backupcontroller/factory/backupjob_test.go | Normalize ApplicationRef.apiGroup (default apps.cozystack.io) when creating BackupJob; populate BackupClassName; tests for normalization, name generation, and field propagation. |
| Controllers (BackupJob & job strategy): internal/backupcontroller/backupjob_controller.go, internal/backupcontroller/jobstrategy_controller.go | Resolve BackupClass early in reconciliation; use resolved StrategyRef/Parameters; updated reconcileJob signatures to accept *ResolvedBackupConfig; added indexer on spec.backupClassName to requeue BackupJobs when BackupClass changes. |
| Controllers (Velero strategy & tests): internal/backupcontroller/velerostrategy_controller.go, internal/backupcontroller/velerostrategy_controller_test.go | Thread ResolvedBackupConfig through Velero reconciler and helpers; template rendering uses resolved.Parameters; removed legacy S3/bucket helper code; tests validate template-driven Velero Backup creation with parameter substitution. |
| Webhooks & CLI wiring: api/backups/v1alpha1/backupjob_webhook.go, api/backups/v1alpha1/backupjob_webhook_test.go, cmd/backup-controller/main.go | New mutating/validating webhook for BackupJob enforcing non-empty and immutable backupClassName; tests for create/update/delete/default; manager cache configured to track BackupClass and webhook registered at startup. |
| Plan controller & scheduling: internal/backupcontroller/plan_controller.go | Minor renaming and requeue timing adjustments (startingDeadline, requeue calculations). |
| Design doc: api/backups/v1alpha1/DESIGN.md | Documents migration to BackupClass model, API changes, ApplicationRef.apiGroup defaulting, and template/parameter processing flow changes. |

Sequence Diagram

sequenceDiagram
    autonumber
    participant JobCtrl as BackupJob Controller
    participant Resolver as BackupClass Resolver
    participant BackupClass as BackupClass Resource
    participant StrategyCtrl as Strategy Controller

    JobCtrl->>Resolver: ResolveBackupClass(backupClassName, applicationRef)
    Resolver->>Resolver: NormalizeApplicationRef(applicationRef)
    Resolver->>BackupClass: Get(backupClassName)
    BackupClass-->>Resolver: spec.strategies
    Resolver->>Resolver: Match strategy by APIGroup + Kind
    Resolver-->>JobCtrl: ResolvedBackupConfig{ StrategyRef, Parameters }
    JobCtrl->>StrategyCtrl: reconcile(job, resolved)
    StrategyCtrl->>StrategyCtrl: Render templates with Application + Parameters
    StrategyCtrl-->>JobCtrl: Result / Error
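
The resolution steps in the diagram (normalize the application's API group, then match a strategy by API group and kind) can be sketched as follows. This is a simplified illustration under stated assumptions: the types are hypothetical stand-ins for the real API types, the strategy and parameter names are invented, and first-match-wins is an assumption rather than confirmed resolver behaviour. Only the default group `apps.cozystack.io` comes from the PR.

```go
package main

import "fmt"

// defaultAPIGroup mirrors the default API group named in this PR.
const defaultAPIGroup = "apps.cozystack.io"

// ApplicationRef, ClassStrategy and Resolved are hypothetical, trimmed-down types.
type ApplicationRef struct {
	APIGroup *string
	Kind     string
	Name     string
}

type ClassStrategy struct {
	AppAPIGroup  *string // nil or empty is treated as defaultAPIGroup
	AppKind      string
	StrategyKind string
	StrategyName string
	Parameters   map[string]string
}

type Resolved struct {
	StrategyKind string
	StrategyName string
	Parameters   map[string]string
}

func strPtr(s string) *string { return &s }

// groupOrDefault returns the API group, falling back to the default when unset.
func groupOrDefault(g *string) string {
	if g == nil || *g == "" {
		return defaultAPIGroup
	}
	return *g
}

// resolve returns the first strategy whose selector matches the application's
// normalized API group and kind, or an error when nothing matches.
func resolve(strategies []ClassStrategy, app ApplicationRef) (*Resolved, error) {
	appGroup := groupOrDefault(app.APIGroup)
	for _, s := range strategies {
		if groupOrDefault(s.AppAPIGroup) == appGroup && s.AppKind == app.Kind {
			return &Resolved{
				StrategyKind: s.StrategyKind,
				StrategyName: s.StrategyName,
				Parameters:   s.Parameters,
			}, nil
		}
	}
	return nil, fmt.Errorf("no strategy matches %s/%s", appGroup, app.Kind)
}

func main() {
	strategies := []ClassStrategy{{
		AppKind:      "VirtualMachine",
		StrategyKind: "Velero",
		StrategyName: "vm-strategy",
		Parameters:   map[string]string{"backupStorageLocationName": "default"},
	}}
	r, err := resolve(strategies, ApplicationRef{Kind: "VirtualMachine", Name: "vm-1"})
	fmt.Println(r, err)
	_, err = resolve(strategies, ApplicationRef{APIGroup: strPtr("example.io"), Kind: "VirtualMachine"})
	fmt.Println(err)
}
```

Normalizing both sides of the comparison means an omitted API group on the application and an omitted API group on the selector are treated as the same default group.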

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hop through classes, neat and keen,
Matching apps to strategies unseen.
Parameters tucked, templates hum fine,
Jobs resolve tidy — one class at a time.
Hooray for order, organized and green!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(backups): backup api rework' accurately reflects the main change (a comprehensive rework of the backups API), though it could be more specific about BackupClass introduction. |
| Linked Issues check | ✅ Passed | The pull request fully implements all four objectives from issue #1880: BackupClass API created, strategyRef/storageRef removed from BackupJob/BackupPlan, backupClassName added, and applicationRef.apiGroup defaulting implemented. |
| Out of Scope Changes check | ✅ Passed | All changes align with the stated objectives. The removal of S3-related storage functions and cache configuration updates are necessary consequences of migrating to BackupClass-based architecture. |

@gemini-code-assist
Contributor

Summary of Changes

Hello @androndo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the backup API by introducing a BackupClass abstraction. This change centralizes the definition of backup strategies and storage locations, decoupling them from individual backup requests. By referencing a BackupClass name, users can define generic backup policies that are resolved dynamically based on the application type, leading to a more streamlined, flexible, and maintainable backup system.

Highlights

  • New BackupClass API: Introduced a new BackupClass Custom Resource Definition (CRD) that centralizes backup strategy and storage configurations based on application types. This allows for more flexible and reusable backup definitions.
  • API Rework for Backup, BackupJob, and Plan: The Backup, BackupJob, and Plan resources have been updated to remove direct references to StorageRef and StrategyRef. Instead, they now reference a BackupClassName, delegating the resolution of concrete backup strategies and storage locations to the new BackupClass.
  • BackupClass Resolution Logic: A new backupclass_resolver.go component has been added to dynamically resolve the appropriate backup strategy and storage parameters from a BackupClass based on the application being backed up. This includes normalizing application API groups.
  • Controller Updates: The BackupJobReconciler has been refactored to utilize the new BackupClass resolution logic, determining the correct strategy (e.g., Velero) and passing the resolved configuration to the reconciliation functions. The PlanReconciler also now uses the BackupClassName when creating BackupJob instances.
  • Removal of Direct Storage/Strategy Management: Several helper functions previously responsible for resolving S3 credentials and creating Velero-specific resources (like BackupStorageLocation and VolumeSnapshotLocation) have been removed from velerostrategy_controller.go. This functionality is now implicitly handled through the parameters defined within the BackupClass.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant and well-executed refactoring of the backup API. The introduction of the BackupClass CRD is a great design choice that decouples backup strategies and storage configuration from the BackupJob and Plan resources, making the system more flexible and maintainable. The changes are consistent across the API types and controllers, and new tests cover the new logic. I've identified a few areas for improvement, mainly concerning code duplication and clarity, which should be straightforward to address.

@androndo androndo force-pushed the feat/backups-rework branch from 754ddfb to 6c11dff Compare January 19, 2026 11:05
@androndo androndo marked this pull request as ready for review January 19, 2026 11:05
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request labels Jan 19, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
api/backups/v1alpha1/plan_types.go (1)

31-34: Typo in comment: "Condtions" should be "Conditions".

Proposed fix
-// Condtions
+// Conditions
 const (
 	PlanConditionError = "Error"
 )
internal/backupcontroller/plan_controller.go (1)

53-62: Condition Reason should be CamelCase without spaces.

Per Kubernetes API conventions, the Reason field should be a single CamelCase word. "Failed to parse cron spec" contains spaces and should be reformatted.

Proposed fix
 		meta.SetStatusCondition(&p.Status.Conditions, metav1.Condition{
 			Type:    backupsv1alpha1.PlanConditionError,
 			Status:  metav1.ConditionTrue,
-			Reason:  "Failed to parse cron spec",
+			Reason:  "InvalidCronSpec",
 			Message: errWrapped.Error(),
 		})
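
To make the Reason convention above concrete, here is a small check in Go. It is a sketch, not code from the PR: the regular expression is the validation pattern that, to my knowledge, the `metav1.Condition` type documents for `Reason`, so treat it as an assumption and verify it against the apimachinery version in use. `isValidReason` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern is assumed to match the pattern documented on metav1.Condition.Reason:
// starts with a letter, no spaces, ends with a letter, digit or underscore.
var reasonPattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`)

// isValidReason reports whether a condition reason satisfies the pattern.
func isValidReason(reason string) bool {
	return reasonPattern.MatchString(reason)
}

func main() {
	for _, r := range []string{"Failed to parse cron spec", "InvalidCronSpec"} {
		fmt.Printf("%q valid=%v\n", r, isValidReason(r))
	}
}
```

A check like this could live in a unit test that iterates over every Reason string the controller sets.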
internal/backupcontroller/velerostrategy_controller.go (1)

299-327: Artifact URI should use the Velero backup's namespace, not the BackupJob's namespace.

Velero backups are created in the cozy-velero namespace (see veleroNamespace constant at line 47 and the createVeleroBackup function), but the artifact URI at line 303 uses backupJob.Namespace. This causes the URI to point to a namespace that will never contain the Velero Backup. The driverMetadata already correctly captures veleroBackup.Namespace, so the URI should do the same.

Proposed fix
-	artifact := &backupsv1alpha1.BackupArtifact{
-		URI: fmt.Sprintf("velero://%s/%s", backupJob.Namespace, veleroBackup.Name),
-	}
+	artifact := &backupsv1alpha1.BackupArtifact{
+		URI: fmt.Sprintf("velero://%s/%s", veleroBackup.Namespace, veleroBackup.Name),
+	}
♻️ Duplicate comments (2)
internal/backupcontroller/factory/backupjob.go (1)

12-26: Consolidate ApplicationRef normalization to avoid drift.

This duplicates the default API group and normalization logic from the backupcontroller package. Consider moving the shared constant/function into a small common package (or API package) so both callers use a single source of truth without introducing cycles.

internal/backupcontroller/backupclass_resolver_test.go (1)

334-337: Strengthen StrategyRef assertions (Kind/Name/APIGroup).

Right now the test only checks Kind. Adding expected Name and APIGroup assertions would make the resolution tests more robust and prevent false positives.

🧹 Nitpick comments (4)
api/backups/v1alpha1/backupclass_types.go (1)

69-78: Consider adding validation for the Kind field.

The Kind field is required but has no validation to prevent empty strings. Consider adding a minimum length validation.

Proposed enhancement
 // ApplicationSelector specifies which application types a strategy applies to.
 type ApplicationSelector struct {
 	// APIGroup is the API group of the application.
 	// If not specified, defaults to "apps.cozystack.io".
 	// +optional
 	APIGroup *string `json:"apiGroup,omitempty"`

 	// Kind is the kind of the application (e.g., VirtualMachine, MySQL).
+	// +kubebuilder:validation:MinLength=1
 	Kind string `json:"kind"`
 }
internal/backupcontroller/backupclass_resolver.go (1)

54-58: Redundant APIGroup extraction after normalization.

Since applicationRef is already normalized at line 46 (which guarantees APIGroup is non-nil and non-empty), this block will always take the branch at line 57. The initial assignment and nil check are unnecessary.

Proposed simplification
-	// Determine application API group (already normalized, but extract for matching)
-	appAPIGroup := DefaultApplicationAPIGroup
-	if applicationRef.APIGroup != nil {
-		appAPIGroup = *applicationRef.APIGroup
-	}
+	// Extract normalized API group for matching
+	appAPIGroup := *applicationRef.APIGroup

 	// Find matching strategy
internal/backupcontroller/plan_controller.go (1)

66-76: Good practice: clearing error condition on success.

However, the Reason field here should also follow CamelCase convention.

Proposed fix
 		meta.SetStatusCondition(&p.Status.Conditions, metav1.Condition{
 			Type:    backupsv1alpha1.PlanConditionError,
 			Status:  metav1.ConditionFalse,
-			Reason:  "Cron spec is valid",
+			Reason:  "CronSpecValid",
 			Message: "The cron schedule has been successfully parsed",
 		})
internal/backupcontroller/factory/backupjob_test.go (1)

123-150: Strengthen the job-name assertion to prevent false positives.

Right now the test only checks length. It would be more robust to assert the exact expected name derived from the plan and scheduled timestamp.

Proposed test hardening
 import (
+	"fmt"
 	"testing"
 	"time"
@@
 			validate: func(t *testing.T, job *backupsv1alpha1.BackupJob) {
 				if job.Name == "" {
 					t.Error("BackupJob name should be generated")
 				}
-				// Name should start with plan name
-				if len(job.Name) < len("test-plan") {
-					t.Errorf("BackupJob name = %v, should start with test-plan", job.Name)
-				}
+				expected := fmt.Sprintf("test-plan-%d", tt.scheduled.Unix()/60)
+				if job.Name != expected {
+					t.Errorf("BackupJob name = %v, want %v", job.Name, expected)
+				}
 			},

Member

@lllamnyp lllamnyp left a comment

Add parameters to templating
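
One way to expose parameters to templates is to pass them alongside the application object in the template data. The sketch below uses Go's `text/template` and assumes a data context with `Application` and `Parameters` keys; the `backupStorageLocationName` parameter and the `render` helper are hypothetical names chosen for illustration, not taken from the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes tmplText with the application object and the resolved
// parameters available as .Application and .Parameters.
// missingkey=error makes a reference to an undefined map key fail
// instead of silently rendering "<no value>".
func render(tmplText string, application, parameters map[string]any) (string, error) {
	tmpl, err := template.New("strategy").Option("missingkey=error").Parse(tmplText)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"Application": application,
		"Parameters":  parameters,
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	app := map[string]any{"metadata": map[string]any{"name": "db-1"}}
	params := map[string]any{"backupStorageLocationName": "default"}
	out, err := render(
		"app={{ .Application.metadata.name }} location={{ .Parameters.backupStorageLocationName }}",
		app, params)
	fmt.Println(out, err)
}
```

Failing on a missing key is a design choice for the sketch: a typo in a parameter name then surfaces as a reconcile error rather than as a malformed rendered resource.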

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)

61-95: Guard against empty resolved.StrategyRef.Name before lookup.

The reconciliation flow validates resolved.StrategyRef.APIGroup and Kind before calling reconcileVelero, but does not validate that Name is non-empty. If a BackupClass is created with an empty StrategyRef.Name, the code will attempt a lookup with an empty name, resulting in a "not found" error that masks the actual issue.

Add an explicit check to catch this early:

🛡️ Suggested guard
 	logger.Debug("fetching Velero strategy", "strategyName", resolved.StrategyRef.Name)
+	if resolved.StrategyRef.Name == "" {
+		return r.markBackupJobFailed(ctx, j, "resolved strategy ref has empty name")
+	}
 	veleroStrategy := &strategyv1alpha1.Velero{}
🤖 Fix all issues with AI agents
In `@internal/backupcontroller/velerostrategy_controller_test.go`:
- Around line 191-205: The test currently skips assertions if
veleroBackup.Spec.LabelSelector is nil, allowing a missing template expansion to
pass silently; update the test in velerostrategy_controller_test.go to check
veleroBackup.Spec.LabelSelector for nil and call t.Fatal(...) (not t.Error)
immediately if it is nil so the test fails fast when the LabelSelector is
missing, then proceed to assert the "app" label and storage location as before
(refer to veleroBackup.Spec.LabelSelector, MatchLabels and
veleroBackup.Spec.StorageLocation to locate the checks).

In `@internal/backupcontroller/velerostrategy_controller.go`:
- Around line 304-307: The artifact URI is using backupJob.Namespace instead of
the Velero backup's namespace; update the BackupArtifact URI construction (the
URI field on the BackupArtifact created in this block) to use
veleroBackup.Namespace (not backupJob.Namespace) so the URI becomes
velero://<velero-namespace>/<veleroBackup.Name>, matching the metadata which
already records veleroBackup.Namespace and ensuring consumers can locate the
Velero backup correctly.

@androndo androndo requested a review from lllamnyp January 20, 2026 06:33
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/backups/v1alpha1/DESIGN.md (1)

469-475: Summary seems outdated after BackupClass refactor.

Line 472 still says “Plan + Storage – core-owned,” but storage selection now lives in BackupClass and is resolved per BackupJob/Plan. Please update the summary wording to reflect BackupClass as the central configuration unit.

🤖 Fix all issues with AI agents
In `@api/backups/v1alpha1/DESIGN.md`:
- Line 324: Update the typo in the DESIGN documentation sentence that reads
"Parameters are no stored directly in `Backup`" to "Parameters are not stored
directly in `Backup`"; edit the sentence referencing `Backup` and `BackupClass`
so it reads correctly (e.g., "Parameters are not stored directly in `Backup`.
Instead, they are resolved from `BackupClass` parameters when the backup was
created.") to preserve the existing meaning.
🧹 Nitpick comments (1)
api/backups/v1alpha1/DESIGN.md (1)

170-216: Convert emphasized labels to headings to satisfy MD036.

Static analysis flags emphasis used as headings. Please convert these emphasized labels (e.g., “Key fields (spec)”, “BackupClass resolution”, “Parameters”) into proper Markdown headings for lint compliance.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@api/backups/v1alpha1/DESIGN.md`:
- Line 170: Replace bold-emphasized section titles like "**Key fields (spec)**"
(and the other similar instances noted) with proper Markdown heading syntax
(e.g., use one or more leading '#' characters appropriate to the document
structure such as "## Key fields (spec)"); update all occurrences flagged (the
lines containing the bolded headings) to headings to satisfy MD036 and keep
heading levels consistent across the document.

**Purpose**
Define a class of backup configurations that encapsulate strategy and parameters per application type. `BackupClass` is a cluster-scoped resource that allows admins to configure backup strategies and parameters in a reusable way.

**Key fields (spec)**
Contributor

⚠️ Potential issue | 🟡 Minor

Use headings instead of bold text for section titles (MD036).

markdownlint flags these as emphasis used for headings. Consider switching to proper heading levels.

📝 Suggested fix
-**Key fields (spec)**
+#### Key fields (spec)
@@
-**BackupClass resolution**
+#### BackupClass resolution
@@
-**Parameters**
+#### Parameters

Also applies to: 204-204, 216-216

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

170-170: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

Member

@lllamnyp lllamnyp left a comment

LGTM.
Update the CRDs (use the ./hack/update-codegen.sh script) and update cmd/backup-controller/main.go and cmd/backupstrategy-controller/main.go so that there is a watch for BackupClass.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 20, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/backupcontroller/backupjob_controller.go`:
- Around line 119-140: In the handler created by
handler.TypedEnqueueRequestsFromMapFunc (the block that calls r.List with a
backupsv1alpha1.BackupJobList filtered by "spec.backupClassName": bc.Name),
don't silently return nil on error; log the failure before returning. Update the
anonymous func so that when r.List(ctx, &jobs,
client.MatchingFields{"spec.backupClassName": bc.Name}) returns an error you
call the controller logger (e.g., r.Log or the controller's logger field) to
emit a descriptive message including bc.Name and the error, then return nil as
before.
🧹 Nitpick comments (1)
packages/system/backup-controller/definitions/backups.cozystack.io_backupclasses.yaml (1)

167-171: Consider adding printer columns for better UX.

The CRD lacks additionalPrinterColumns. Consider adding columns to show useful info when running kubectl get backupclasses, such as the number of strategies or age.

♻️ Optional: Add printer columns
     served: true
     storage: true
+    additionalPrinterColumns:
+    - jsonPath: .metadata.creationTimestamp
+      name: Age
+      type: date
     subresources:
       status: {}

Comment on lines 119 to 140
// Requeue BackupJobs when their referenced BackupClass changes
WatchesRawSource(source.Kind(
	mgr.GetCache(),
	&backupsv1alpha1.BackupClass{},
	handler.TypedEnqueueRequestsFromMapFunc(func(ctx context.Context, bc *backupsv1alpha1.BackupClass) []reconcile.Request {
		var jobs backupsv1alpha1.BackupJobList
		if err := r.List(ctx, &jobs, client.MatchingFields{"spec.backupClassName": bc.Name}); err != nil {
			return nil
		}

		reqs := make([]reconcile.Request, 0, len(jobs.Items))
		for _, job := range jobs.Items {
			reqs = append(reqs, reconcile.Request{
				NamespacedName: types.NamespacedName{
					Namespace: job.Namespace,
					Name:      job.Name,
				},
			})
		}
		return reqs
	}),
)).
Contributor

⚠️ Potential issue | 🟡 Minor

Log errors from List operation instead of silently returning nil.

When r.List fails (line 125), the handler returns nil which silently drops the event. This could hide transient errors or misconfigurations. Consider logging the error before returning.

🐛 Proposed fix
 		WatchesRawSource(source.Kind(
 			mgr.GetCache(),
 			&backupsv1alpha1.BackupClass{},
 			handler.TypedEnqueueRequestsFromMapFunc(func(ctx context.Context, bc *backupsv1alpha1.BackupClass) []reconcile.Request {
+				logger := log.FromContext(ctx)
 				var jobs backupsv1alpha1.BackupJobList
 				if err := r.List(ctx, &jobs, client.MatchingFields{"spec.backupClassName": bc.Name}); err != nil {
+					logger.Error(err, "failed to list BackupJobs for BackupClass", "backupClassName", bc.Name)
 					return nil
 				}
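
The log-then-return pattern in the proposed fix can be exercised without a cluster. The sketch below is a dependency-free approximation: `jobKey`, `jobLister`, `staticLister`, `failingLister` and `requestsForClass` are hypothetical stand-ins for the reconcile request type, the indexed `List` call, and the map function, so it shows the control flow only, not the controller-runtime API.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// jobKey stands in for a reconcile request (namespace + name).
type jobKey struct{ Namespace, Name string }

// jobLister stands in for the indexed List call on spec.backupClassName.
type jobLister func(backupClassName string) ([]jobKey, error)

// staticLister returns a lister that always yields the given jobs.
func staticLister(jobs ...jobKey) jobLister {
	return func(string) ([]jobKey, error) { return jobs, nil }
}

// failingLister returns a lister that always fails with msg.
func failingLister(msg string) jobLister {
	return func(string) ([]jobKey, error) { return nil, errors.New(msg) }
}

// requestsForClass maps a BackupClass name to the jobs that reference it.
// On a list error it logs the failure before returning nil, so the dropped
// event is at least visible in the logs.
func requestsForClass(list jobLister, className string, logf func(string, ...any)) []jobKey {
	jobs, err := list(className)
	if err != nil {
		logf("failed to list BackupJobs for BackupClass %q: %v", className, err)
		return nil
	}
	reqs := make([]jobKey, 0, len(jobs))
	reqs = append(reqs, jobs...)
	return reqs
}

func main() {
	ok := staticLister(jobKey{Namespace: "tenant-a", Name: "job-1"})
	fmt.Println(requestsForClass(ok, "velero-default", log.Printf))
	fmt.Println(requestsForClass(failingLister("cache not synced"), "velero-default", log.Printf))
}
```

Injecting the log function keeps the mapping logic testable: a test can count log calls instead of inspecting log output.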
Comment on lines 119 to 140
// Requeue BackupJobs when their referenced BackupClass changes
WatchesRawSource(source.Kind(
	mgr.GetCache(),
	&backupsv1alpha1.BackupClass{},
	handler.TypedEnqueueRequestsFromMapFunc(func(ctx context.Context, bc *backupsv1alpha1.BackupClass) []reconcile.Request {
		var jobs backupsv1alpha1.BackupJobList
		if err := r.List(ctx, &jobs, client.MatchingFields{"spec.backupClassName": bc.Name}); err != nil {
			return nil
		}

		reqs := make([]reconcile.Request, 0, len(jobs.Items))
		for _, job := range jobs.Items {
			reqs = append(reqs, reconcile.Request{
				NamespacedName: types.NamespacedName{
					Namespace: job.Namespace,
					Name:      job.Name,
				},
			})
		}
		return reqs
	}),
)).
Member

  1. The backupClassName field should be immutable.
  2. You shouldn't really declare a watch for BackupClasses here. Instead do this in main.go:
mgr, err := ctrl.NewManager(cfg, ctrl.Options{
  Cache: cache.Options{
    ByObject: map[client.Object]cache.ByObject{
      &backupsv1alpha1.BackupClass{}: {},
    },
  },
})

This way you have quick read access to BackupClasses and a much simpler watch setup.

Also repeat this setup for the backup strategy controller as well.

@androndo androndo force-pushed the feat/backups-rework branch from 39893c4 to b9d8deb Compare January 21, 2026 15:10
…oduction

Signed-off-by: Andrey Kolkov <androndo@gmail.com>
@androndo androndo force-pushed the feat/backups-rework branch from b9d8deb to f6641c1 Compare January 22, 2026 06:40
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@api/backups/v1alpha1/backupclass_types.go`:
- Around line 56-61: The StrategyRef field on BackupClass currently uses
corev1.TypedLocalObjectReference which is namespaced; because BackupClass is
cluster-scoped you must replace StrategyRef with a cluster-scoped reference
struct (e.g., a new BackupStrategyRef or ClusterTypedObjectReference) that only
contains apiGroup, kind and name (no namespace) and update all usages and JSON
tags to the new type; ensure you add the new struct definition in the same
package (with deepcopy+openapi markers as needed) and update any import
references (remove corev1 usage for this field) so CRD generation and any code
referencing BackupClass.StrategyRef compile against the cluster-scoped reference
type.

In `@api/backups/v1alpha1/DESIGN.md`:
- Around line 362-363: The document is unclear about how restore-time storage is
deterministically resolved since RestoreJob only references Backup and Backup
doesn’t include BackupClass; update the DESIGN text (including the other
mentioned section around lines 411-418) to require drivers to persist the
necessary storage identifier (for example the BackupClass name, storage location
name, or equivalent id) into Backup.driverMetadata when creating the Backup so
that the driver can deterministically resolve the storage location at restore
time; explicitly state that restore logic must read Backup.driverMetadata to
locate the original storage and fall back to driver-internal metadata only if
that field is absent.
♻️ Duplicate comments (1)
internal/backupcontroller/backupjob_controller.go (1)

119-140: Log errors from List operation instead of silently returning nil.

When r.List fails (line 125), the handler returns nil which silently drops the event. This could hide transient errors or misconfigurations that would be difficult to debug in production.

🐛 Proposed fix
 		WatchesRawSource(source.Kind(
 			mgr.GetCache(),
 			&backupsv1alpha1.BackupClass{},
 			handler.TypedEnqueueRequestsFromMapFunc(func(ctx context.Context, bc *backupsv1alpha1.BackupClass) []reconcile.Request {
+				logger := log.FromContext(ctx)
 				var jobs backupsv1alpha1.BackupJobList
 				if err := r.List(ctx, &jobs, client.MatchingFields{"spec.backupClassName": bc.Name}); err != nil {
+					logger.Error(err, "failed to list BackupJobs for BackupClass", "backupClassName", bc.Name)
 					return nil
 				}
🧹 Nitpick comments (1)
internal/backupcontroller/backupclass_resolver.go (1)

68-72: Avoid leaking mutable spec maps to callers.

Parameters is returned directly from the BackupClass spec; downstream mutation would affect the cached object. Consider copying the map before returning.

♻️ Suggested defensive copy
-		if strategyAPIGroup == appAPIGroup && strategy.Application.Kind == applicationRef.Kind {
+		if strategyAPIGroup == appAPIGroup && strategy.Application.Kind == applicationRef.Kind {
+			var params map[string]string
+			if strategy.Parameters != nil {
+				params = make(map[string]string, len(strategy.Parameters))
+				for k, v := range strategy.Parameters {
+					params[k] = v
+				}
+			}
 			return &ResolvedBackupConfig{
 				StrategyRef: strategy.StrategyRef,
-				Parameters:  strategy.Parameters,
+				Parameters:  params,
 			}, nil
 		}
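
As an alternative to the manual loop in the suggestion above, the copy can likely be expressed with `maps.Clone` from the standard library (Go 1.21+). This is a minimal sketch, not the PR's code: `strategySpec` and `copyParameters` are hypothetical stand-ins for the resolver's types.

```go
package main

import (
	"fmt"
	"maps"
)

// strategySpec is a hypothetical stand-in for a BackupClass strategy entry.
type strategySpec struct {
	Parameters map[string]string
}

// copyParameters returns an independent copy of the strategy's parameters.
// maps.Clone returns nil for a nil map, so no explicit nil check is needed.
func copyParameters(s strategySpec) map[string]string {
	return maps.Clone(s.Parameters)
}

func main() {
	cached := strategySpec{Parameters: map[string]string{"bucket": "backups"}}

	params := copyParameters(cached)
	params["bucket"] = "mutated" // only the copy changes

	fmt.Println(cached.Parameters["bucket"]) // prints "backups"
}
```

The clone is shallow, which is sufficient here because both keys and values are strings.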

Comment on lines 56 to 61
// StrategyRef references the driver-specific BackupStrategy (e.g., Velero).
StrategyRef corev1.TypedLocalObjectReference `json:"strategyRef"`

// Application specifies which application types this strategy applies to.
Application ApplicationSelector `json:"application"`

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for strategy CRD definitions and their scope
# First, try the suggested path
echo "=== Searching in api/backups/strategy ==="
find api/backups -type d -name strategy

echo ""
echo "=== Looking for strategy-related types.go files ==="
find api -name "*.go" -path "*strateg*" | head -20

echo ""
echo "=== Searching for kubebuilder resource markers in api/backups ==="
rg -n '\+kubebuilder:resource' api/backups/ -A 1 -B 1

Repository: cozystack/cozystack

Length of output: 248


🌐 Web query:

Cozystack BackupClass strategy CRD scope kubebuilder

💡 Result:

Short answer: make the BackupClass/strategy CRD cluster-scoped with kubebuilder by either creating it with --namespaced=false or adding a resource marker (+kubebuilder:resource:scope=Cluster) to the API type and regenerating manifests.

How to (two options)

  • New API: kubebuilder create api --group <group> --version <version> --kind BackupClass --resource=true --controller=true --namespaced=false. [1]
  • Existing API: add the marker above the type in types.go:
    //+kubebuilder:resource:scope=Cluster
    then run make manifests (controller-gen) to regenerate the CRD YAML. [1]

Why cluster scope (recommended): strategy/backup-class objects are usually global configuration referenced across namespaces (Velero and Cozystack backup components live in a management namespace like cozy-velero), so cluster scope avoids duplicating definitions per namespace. See Cozystack backup docs referencing Velero in the cozy-velero management namespace. [2]

References

  • Kubebuilder CRD scope and markers. [1]
  • Cozystack Backup & Recovery (Velero integration, cozy-velero). [2]

🌐 Web query:

Cozystack api/backups strategy TypedLocalObjectReference namespaced

💡 Result:

Short answer:

  • Cozystack’s backup/restore integration uses Velero (configured via the Cozystack docs) and expects Kubernetes-style references for in‑namespace objects. [1][2]
  • Use a TypedLocalObjectReference to point to a namespaced resource by providing: kind, name, and (when the resource is not in core) apiGroup. That reference resolves only inside the same namespace. Example fields: kind, name, apiGroup. [2]
  • For Cozystack-specific backup setup and where these references are used (e.g., BackupStorageLocation, Velero config, secrets), see the Cozystack Backup & Recovery and Cozystack API docs. [1][3]

Sources:
[1] Cozystack — Backup and Recovery.
[2] Kubernetes — TypedLocalObjectReference definition.
[3] Cozystack — Cozystack API.


Use a cluster-scoped reference type for StrategyRef instead of TypedLocalObjectReference.

BackupClass is cluster-scoped (global configuration), but TypedLocalObjectReference is designed for namespaced resource references only and resolves in-namespace. Strategy CRDs should be cluster-scoped to match Cozystack's Velero integration pattern (management namespace provides cluster-wide configuration). Use a custom reference type with only kind, name, and apiGroup fields (without namespace), or define a custom reference struct for cluster-scoped resources.
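
One possible shape for such a reference type is sketched below. The name `ClusterTypedObjectReference` comes from the prompt text in this review; the JSON tags, the `marshalRef` helper, and the API group value are assumptions made for illustration, and the kubebuilder/deepcopy markers a real API type would need are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterTypedObjectReference is a hypothetical reference to a cluster-scoped
// object: it carries apiGroup, kind and name, and deliberately no namespace.
type ClusterTypedObjectReference struct {
	// APIGroup is optional; nil means the field is omitted from the JSON.
	APIGroup *string `json:"apiGroup,omitempty"`
	Kind     string  `json:"kind"`
	Name     string  `json:"name"`
}

// marshalRef serializes the reference the way it would appear in a spec.
func marshalRef(ref ClusterTypedObjectReference) (string, error) {
	b, err := json.Marshal(ref)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	group := "example.cozystack.io" // hypothetical API group, for illustration only
	out, err := marshalRef(ClusterTypedObjectReference{
		APIGroup: &group,
		Kind:     "Velero",
		Name:     "default",
	})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```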

🤖 Prompt for AI Agents
In `@api/backups/v1alpha1/backupclass_types.go` around lines 56 - 61, The
StrategyRef field on BackupClass currently uses corev1.TypedLocalObjectReference
which is namespaced; because BackupClass is cluster-scoped you must replace
StrategyRef with a cluster-scoped reference struct (e.g., a new
BackupStrategyRef or ClusterTypedObjectReference) that only contains apiGroup,
kind and name (no namespace) and update all usages and JSON tags to the new
type; ensure you add the new struct definition in the same package (with
deepcopy+openapi markers as needed) and update any import references (remove
corev1 usage for this field) so CRD generation and any code referencing
BackupClass.StrategyRef compile against the cluster-scoped reference type.

Comment on lines +362 to +363
**Note:** Parameters are resolved from `BackupClass` when the `BackupJob` is created. The driver uses these parameters to determine where to store backups. The storage location itself is managed by the driver (e.g., Velero's `BackupStorageLocation` CRD) and is not directly referenced in the `Backup` resource. When restoring, the driver resolves the storage location from the original `BackupClass` parameters or from the driver's own metadata.

⚠️ Potential issue | 🟡 Minor

Clarify how restore-time storage is deterministically resolved.

RestoreJob only references Backup, and Backup doesn’t include backupClassName. If the intent is to resolve storage from the original BackupClass, consider explicitly stating that the driver must persist the needed identifier (e.g., storage location name or BackupClass name) in Backup.driverMetadata so restore can be deterministic.

Also applies to: 411-418
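
A rough sketch of what this could look like on the driver side follows. It assumes `Backup.driverMetadata` is a string-to-string map, which the thread does not confirm; the `storageLocationKey` name and both helper functions are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// storageLocationKey is a hypothetical metadata key chosen for this sketch.
const storageLocationKey = "storageLocation"

// Backup is a stand-in for the real resource; only driverMetadata is modeled,
// and its map[string]string type is an assumption.
type Backup struct {
	DriverMetadata map[string]string
}

var errNoStorageLocation = errors.New("backup has no recorded storage location")

// recordStorageLocation persists the storage identifier at backup time.
func recordStorageLocation(b *Backup, location string) {
	if b.DriverMetadata == nil {
		b.DriverMetadata = map[string]string{}
	}
	b.DriverMetadata[storageLocationKey] = location
}

// storageLocationForRestore reads the identifier back at restore time.
// On error the caller would fall back to driver-internal metadata.
func storageLocationForRestore(b Backup) (string, error) {
	if loc := b.DriverMetadata[storageLocationKey]; loc != "" {
		return loc, nil
	}
	return "", errNoStorageLocation
}

func main() {
	var b Backup
	recordStorageLocation(&b, "default-bsl")

	loc, err := storageLocationForRestore(b)
	fmt.Println(loc, err)
}
```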

🤖 Prompt for AI Agents
In `@api/backups/v1alpha1/DESIGN.md` around lines 362 - 363, The document is
unclear about how restore-time storage is deterministically resolved since
RestoreJob only references Backup and Backup doesn’t include BackupClass; update the DESIGN
text (including the other mentioned section around lines 411-418) to require
drivers to persist the necessary storage identifier (for example the BackupClass
name, storage location name, or equivalent id) into Backup.driverMetadata when
creating the Backup so that the driver can deterministically resolve the storage
location at restore time; explicitly state that restore logic must read
Backup.driverMetadata to locate the original storage and fall back to
driver-internal metadata only if that field is absent.

Signed-off-by: Andrey Kolkov <androndo@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@api/backups/v1alpha1/backupjob_webhook.go`:
- Around line 27-30: The BackupJob defaulter is currently a no-op; implement
defaulting so that when a BackupJob has a non-nil applicationRef with an empty
apiGroup you set it to "apps.cozystack.io". In BackupJob.Default(), check the
ApplicationRef field (e.g. j.Spec.ApplicationRef or j.ApplicationRef depending
on struct) for nil and for an empty ApiGroup string, and assign
"apps.cozystack.io" when missing; keep nil-safety and only modify the field when
it is unset so existing explicit values remain unchanged.
🧹 Nitpick comments (1)
api/backups/v1alpha1/backupjob_types.go (1)

47-57: Add schema-level non-empty validation for backupClassName.

Webhook validation covers empty values, but a CRD MinLength avoids round-trips and gives earlier feedback.

♻️ Suggested schema tweak
 // BackupClassName references a BackupClass that contains strategy and storage configuration.
 // The BackupClass will be resolved to determine the appropriate strategy and storage
 // based on the ApplicationRef.
 // This field is immutable once the BackupJob is created.
 // +kubebuilder:validation:XValidation:rule="self == oldSelf",message="backupClassName is immutable"
+// +kubebuilder:validation:MinLength=1
 BackupClassName string `json:"backupClassName"`

Signed-off-by: Andrey Kolkov <androndo@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@api/backups/v1alpha1/backupjob_webhook_test.go`:
- Around line 55-70: The test currently expects a whitespace-only
BackupJob.Spec.BackupClassName to be accepted, but we should reject it; update
the test case in backupjob_webhook_test.go to set wantErr: true for the case
with BackupClassName: "   " and then update the BackupJob webhook validation
(the BackupJob's admission/validation logic—e.g., the
ValidateCreate/ValidateUpdate handler that checks BackupJobSpec.BackupClassName)
to trim whitespace (strings.TrimSpace) and treat an empty result as invalid,
returning an appropriate validation error when BackupClassName is empty after
trimming.
- Around line 173-201: The test case "update when old backupClassName is empty
should allow setting it" contradicts the BackupJobSpec immutability rule for
BackupClassName: since BackupJobSpec declares BackupClassName as immutable (self
== oldSelf), updating from empty to non-empty should be rejected; update the
test to expect an error (set wantErr: true) or alternatively relax the
immutability in the BackupJobSpec schema/validation where BackupClassName is
enforced. Locate the case in backupjob_webhook_test.go and either change the
test's wantErr for that table entry or modify the validation/field immutability
logic that enforces BackupClassName in BackupJobSpec.

In `@api/backups/v1alpha1/backupjob_webhook.go`:
- Around line 54-56: The current immutability check in backupjob_webhook.go only
rejects changes when oldJob.Spec.BackupClassName is non-empty; update the check
to reject any change to BackupClassName regardless of emptiness by comparing
oldJob.Spec.BackupClassName directly to j.Spec.BackupClassName (i.e., remove the
oldJob.Spec.BackupClassName != "" guard) and keep returning the same formatted
error when they differ so empty→non-empty transitions are also rejected.
- Around line 36-38: The validation currently only checks j.Spec.BackupClassName
== "" and accepts whitespace-only values; change the check in the webhook
validation (where j.Spec.BackupClassName is validated) to treat whitespace-only
strings as empty by using strings.TrimSpace(j.Spec.BackupClassName) and return
the same fmt.Errorf when the trimmed value is empty. Also update the unit test
that sets BackupClassName to a whitespace-only string (the test case in
backupjob_webhook_test that currently uses BackupClassName: "   ") to expect an
error instead of no error.
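
Taken together, the two webhook changes requested above could look roughly like the sketch below. It is detached from the real webhook: `BackupJobSpec` is reduced to the one field under discussion, and `validateCreate` and `validateUpdate` are hypothetical helpers rather than the controller-runtime validator methods.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// BackupJobSpec is reduced to the single field under discussion.
type BackupJobSpec struct {
	BackupClassName string
}

// validateCreate treats whitespace-only names the same as empty ones.
func validateCreate(spec BackupJobSpec) error {
	if strings.TrimSpace(spec.BackupClassName) == "" {
		return errors.New("backupClassName is required and cannot be empty")
	}
	return nil
}

// validateUpdate rejects any change to backupClassName, including
// empty -> non-empty transitions (no guard on the old value).
func validateUpdate(oldSpec, newSpec BackupJobSpec) error {
	if err := validateCreate(newSpec); err != nil {
		return err
	}
	if oldSpec.BackupClassName != newSpec.BackupClassName {
		return fmt.Errorf("backupClassName is immutable: %q -> %q",
			oldSpec.BackupClassName, newSpec.BackupClassName)
	}
	return nil
}

func main() {
	fmt.Println(validateCreate(BackupJobSpec{BackupClassName: "   "}))
	fmt.Println(validateUpdate(
		BackupJobSpec{BackupClassName: ""},
		BackupJobSpec{BackupClassName: "velero"},
	))
}
```

With this shape, the table-driven test cases for the whitespace-only name and the empty-to-set update would both expect an error.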
♻️ Duplicate comments (1)
api/backups/v1alpha1/backupjob_webhook.go (1)

24-27: Default() should set applicationRef.apiGroup when empty.

Spec explicitly documents the default (apps.cozystack.io), but the defaulter is still a no-op. That leaves user-created objects without the promised default and can break downstream assumptions.

✅ Proposed fix
 func (j *BackupJob) Default() {
-	// No defaults needed for BackupJob currently
+	if j.Spec.ApplicationRef.APIGroup == nil || *j.Spec.ApplicationRef.APIGroup == "" {
+		defaultGroup := "apps.cozystack.io"
+		j.Spec.ApplicationRef.APIGroup = &defaultGroup
+	}
 }
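
For reference, the same defaulting rule can be exercised in isolation. This is a sketch under stated assumptions: `applicationRef` is a local stand-in for `corev1.TypedLocalObjectReference` (whose `APIGroup` is a `*string`), and `defaultAPIGroupIfUnset` is a hypothetical helper, not part of the PR.

```go
package main

import "fmt"

// defaultAPIGroup is the default documented in the spec under review.
const defaultAPIGroup = "apps.cozystack.io"

// applicationRef is a local stand-in for corev1.TypedLocalObjectReference.
type applicationRef struct {
	APIGroup *string
	Kind     string
	Name     string
}

// defaultAPIGroupIfUnset fills in the API group only when it is nil or empty,
// so explicitly set values are left unchanged.
func defaultAPIGroupIfUnset(ref *applicationRef) {
	if ref == nil {
		return
	}
	if ref.APIGroup == nil || *ref.APIGroup == "" {
		group := defaultAPIGroup
		ref.APIGroup = &group
	}
}

func main() {
	ref := applicationRef{Kind: "VirtualMachine", Name: "vm1"}
	defaultAPIGroupIfUnset(&ref)
	fmt.Println(*ref.APIGroup) // prints "apps.cozystack.io"
}
```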

Comment on lines 55 to 70
name: "BackupJob with whitespace-only backupClassName should be rejected",
job: &BackupJob{
ObjectMeta: metav1.ObjectMeta{
Name: "test-job",
Namespace: "default",
},
Spec: BackupJobSpec{
ApplicationRef: corev1.TypedLocalObjectReference{
Kind: "VirtualMachine",
Name: "vm1",
},
BackupClassName: " ",
},
},
wantErr: false, // Whitespace is technically not empty, but this is acceptable
},

⚠️ Potential issue | 🟠 Major

Reject whitespace-only backupClassName.

A whitespace-only value can’t resolve to any real BackupClass, so it should be treated as empty and rejected to fail fast. Please align the test expectation with that behavior (and update the webhook validation accordingly).

💡 Proposed test adjustment
 		{
 			name: "BackupJob with whitespace-only backupClassName should be rejected",
 			job: &BackupJob{
@@
 					BackupClassName: "   ",
 				},
 			},
-			wantErr: false, // Whitespace is technically not empty, but this is acceptable
+			wantErr: true,
+			errMsg:  "backupClassName is required and cannot be empty",
 		},
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: "BackupJob with whitespace-only backupClassName should be rejected",
job: &BackupJob{
ObjectMeta: metav1.ObjectMeta{
Name: "test-job",
Namespace: "default",
},
Spec: BackupJobSpec{
ApplicationRef: corev1.TypedLocalObjectReference{
Kind: "VirtualMachine",
Name: "vm1",
},
BackupClassName: " ",
},
},
wantErr: true,
errMsg: "backupClassName is required and cannot be empty",
},
🤖 Prompt for AI Agents
In `@api/backups/v1alpha1/backupjob_webhook_test.go` around lines 55 - 70, The
test currently expects a whitespace-only BackupJob.Spec.BackupClassName to be
accepted, but we should reject it; update the test case in
backupjob_webhook_test.go to set wantErr: true for the case with
BackupClassName: "   " and then update the BackupJob webhook validation (the
BackupJob's admission/validation logic—e.g., the ValidateCreate/ValidateUpdate
handler that checks BackupJobSpec.BackupClassName) to trim whitespace
(strings.TrimSpace) and treat an empty result as invalid, returning an
appropriate validation error when BackupClassName is empty after trimming.

Signed-off-by: Andrey Kolkov <androndo@gmail.com>

Labels

enhancement (New feature or request)
lgtm (This PR has been approved by a maintainer)
size:XXL (This PR changes 1000+ lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[backups] refactor for BackupClasses

3 participants