Optimize prepare_duplicates_for_delete #14564

Open
valentijnscholten wants to merge 6 commits into DefectDojo:bugfix from valentijnscholten:optimize-prepare-duplicates-for-delete-v2

@valentijnscholten (Member) commented Mar 21, 2026

Summary

While working on optimizing the (hard) delete of findings, I found this optimization that can/should be its own PR.

  • Replace per-original O(n×m) loop in prepare_duplicates_for_delete() with a single bulk UPDATE for inside-scope duplicate reset, significantly reducing query count for large deletion operations
  • Add 11 unit tests covering all duplicate deletion scenarios (inside-scope reset, outside-scope reconfiguration, mixed clusters, cascade delete setting, status/found_by inheritance)
  • Add WARN-level logging to fix_loop_duplicates() for production visibility into duplicate loop frequency
  • Add explanatory comment on removeLoop() documenting optimization opportunity
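
The idea behind the first bullet can be illustrated outside Django. The following is a minimal sketch, not the code from this PR: it uses the standard-library `sqlite3` module and a hypothetical `finding` table (the column names `test_id`, `duplicate`, and `duplicate_finding_id`, and the helper names, are assumptions for illustration). It shows how one UPDATE statement can unlink every duplicate whose original sits in the same scope that is about to be deleted, rather than looping over each original and each of its duplicates.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with one duplicate cluster (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE finding (
            id INTEGER PRIMARY KEY,
            test_id INTEGER NOT NULL,
            duplicate INTEGER NOT NULL DEFAULT 0,
            duplicate_finding_id INTEGER REFERENCES finding(id)
        )
        """
    )
    rows = [
        (1, 10, 0, None),  # original, inside the scope (test 10)
        (2, 10, 1, 1),     # duplicate of 1, inside the scope
        (3, 10, 1, 1),     # duplicate of 1, inside the scope
        (4, 20, 1, 1),     # duplicate of 1, outside the scope
        (5, 20, 0, None),  # unrelated finding
    ]
    conn.executemany("INSERT INTO finding VALUES (?, ?, ?, ?)", rows)
    return conn


def reset_inside_scope_duplicates(conn, scope_test_id):
    """Unlink, in a single UPDATE, duplicates that live in the scope and
    whose original also lives in the scope. Returns the number of rows changed."""
    cur = conn.execute(
        """
        UPDATE finding
        SET duplicate = 0, duplicate_finding_id = NULL
        WHERE test_id = ?
          AND duplicate_finding_id IN (
              SELECT id FROM finding WHERE test_id = ?
          )
        """,
        (scope_test_id, scope_test_id),
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo_db()
    changed = reset_inside_scope_duplicates(conn, 10)
    print("rows reset:", changed)
```

In this sketch finding 4 is left untouched because it lives outside the scope; clusters like that still need the per-original reconfiguration that the description mentions.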

Replace per-original O(n×m) loop with a single bulk UPDATE for
inside-scope duplicate reset. Outside-scope reconfiguration still
runs per-original but now uses .iterator() and .exists() to avoid
loading full querysets into memory.

Also adds WARN-level logging to fix_loop_duplicates for visibility
into how often duplicate loops occur in production, and a comment on
removeLoop explaining the optimization opportunity.
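
As an illustration of the logging point, here is a small sketch of what WARN-level reporting of duplicate loops could look like. It is not the `fix_loop_duplicates` implementation: the `duplicate_of` mapping (finding id to the id it is marked a duplicate of), the function names, and the logger name are all hypothetical, and the cycle detection is a generic pointer walk chosen only to make the example self-contained.

```python
import logging

logger = logging.getLogger("dedupe_sketch")  # hypothetical logger name


def find_loop_members(duplicate_of):
    """Return the ids that sit on a cycle of duplicate_of pointers.

    duplicate_of maps a finding id to the id of its original, or None.
    """
    on_cycle = set()
    seen = set()
    for start in duplicate_of:
        if start in seen:
            continue
        path = []      # nodes visited in this walk, in order
        position = {}  # node -> index in path
        node = start
        while node is not None and node not in seen:
            seen.add(node)
            position[node] = len(path)
            path.append(node)
            node = duplicate_of.get(node)
        # Stopping on a node from the current walk means we closed a cycle.
        if node is not None and node in position:
            on_cycle.update(path[position[node]:])
    return on_cycle


def report_loops(duplicate_of):
    """Log a warning when loops exist so their frequency shows up in logs."""
    members = find_loop_members(duplicate_of)
    if members:
        logger.warning(
            "found %d findings in duplicate loops: %s",
            len(members),
            sorted(members),
        )
    return members


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    report_loops({1: None, 2: 1, 3: 4, 4: 5, 5: 3, 6: 3})
```

Logging at warning level rather than debug means the message appears under a typical production log configuration, which is the visibility the commit message is after.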

@valentijnscholten changed the title from "Optimize prepare_duplicates_for_delete and add test coverage" to "Optimize prepare_duplicates_for_delete" Mar 21, 2026

Remove redundant .exclude() and .exists() calls by leveraging the
bulk UPDATE that already unlinks inside-scope duplicates. Add
prefetch_related to fetch all reverse relations in a single query.
…plicate_cluster

Use QuerySet.update() instead of mass_model_updater to re-point
duplicates to the new original. Single SQL query instead of loading
all findings into Python and calling bulk_update.
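
The re-pointing step described in this commit can be sketched in plain SQL as well. This is an assumption-laden illustration rather than the PR's `reconfigure_duplicate_cluster`: the `finding` table, its columns, the `repoint_cluster` helper, and the rule of promoting the lowest-id duplicate to new original are all hypothetical. The point it shows is that the remaining duplicates move to the new original with one UPDATE, with no rows loaded into Python.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory finding table (hypothetical schema) and load rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE finding (
            id INTEGER PRIMARY KEY,
            duplicate INTEGER NOT NULL DEFAULT 0,
            duplicate_finding_id INTEGER REFERENCES finding(id)
        )
        """
    )
    conn.executemany("INSERT INTO finding VALUES (?, ?, ?)", rows)
    return conn


def repoint_cluster(conn, old_original_id):
    """Promote one duplicate to original and re-point the rest in one UPDATE.

    Returns the id of the new original, or None if there were no duplicates.
    """
    # Assumption for this sketch: the lowest id becomes the new original.
    new_original_id = conn.execute(
        "SELECT MIN(id) FROM finding WHERE duplicate_finding_id = ?",
        (old_original_id,),
    ).fetchone()[0]
    if new_original_id is None:
        return None

    # Promote the chosen finding: it is no longer a duplicate of anything.
    conn.execute(
        "UPDATE finding SET duplicate = 0, duplicate_finding_id = NULL WHERE id = ?",
        (new_original_id,),
    )
    # Single statement that moves every remaining duplicate to the new original.
    conn.execute(
        "UPDATE finding SET duplicate_finding_id = ? WHERE duplicate_finding_id = ?",
        (new_original_id, old_original_id),
    )
    return new_original_id


if __name__ == "__main__":
    conn = make_db([(1, 0, None), (4, 1, 1), (6, 1, 1), (7, 1, 1)])
    print("new original:", repoint_cluster(conn, 1))
```

The promotion runs before the re-pointing so that the new original no longer matches the second UPDATE's WHERE clause and cannot end up pointing at itself.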
Remove reset_duplicate_before_delete, reset_duplicates_before_delete,
and set_new_original — all replaced by bulk UPDATE in
prepare_duplicates_for_delete and .update() in
reconfigure_duplicate_cluster. Remove unused mass_model_updater import.

@valentijnscholten (Member, Author)

This is not needed if we decide to merge #14566

@Maffooch (Contributor)

Feels like #14566 would be a better approach. Will move this PR to the next milestone in case others have different opinions.

@Maffooch changed the milestone from 2.56.3 to 2.56.4 on Mar 23, 2026