PERF: AlignSections Filters OoC Optimization #1560
Draft
joeykleingers wants to merge 3 commits into BlueQuartzSoftware:develop
Add slice-buffered OOC paths for the AlignSections filter family:

- AlignSectionsMisorientation: OOC findShiftsOoc() with 2-slice quats/phases/mask buffering (1.6x OOC speedup)
- AlignSectionsMutualInformation: OOC formFeaturesSectionsOoc() with per-slice buffering
- AlignSectionsFeatureCentroid: Transfer phase optimization only
- AlignSectionsListFilter: Transfer phase optimization only

Base class AlignSections::execute() now dispatches to AlignSectionsTransferDataOocImpl when any cell array is OOC, using sequential read-into-buffer then write-back-shifted pattern that eliminates per-tuple chunk thrashing.

All correctness tests now exercise both in-core and OOC algorithm paths via GENERATE(false, true) + ForceOocAlgorithmGuard.

Signed-off-by: BlueQuartz Software <info@bluequartz.net>
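The read-into-buffer then write-back-shifted pattern named in this commit message could look roughly like the sketch below. It is not the code from AlignSectionsTransferDataOocImpl: the function name `shiftSlicesBuffered`, the use of a plain `std::vector` as a stand-in for the data store, the sign convention of the shifts, and the zero fill for out-of-bounds cells are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: shift every Z-slice of a flat X-fastest volume in place.
// "store" stands in for an out-of-core array; here it is just a vector.
template <typename T>
void shiftSlicesBuffered(std::vector<T>& store, std::size_t dimX, std::size_t dimY, std::size_t dimZ,
                         const std::vector<std::int64_t>& xShifts, const std::vector<std::int64_t>& yShifts)
{
  assert(store.size() == dimX * dimY * dimZ);
  assert(xShifts.size() == dimZ && yShifts.size() == dimZ);

  const std::size_t sliceSize = dimX * dimY;
  std::vector<T> buffer(sliceSize); // one slice held in memory at a time

  for(std::size_t z = 0; z < dimZ; ++z)
  {
    T* slice = store.data() + z * sliceSize;

    // Step 1: one sequential read of the whole slice into the buffer.
    std::copy(slice, slice + sliceSize, buffer.begin());

    // Step 2: write the slice back, sourcing each cell from the shifted
    // position in the buffer (assumed convention: dest(x, y) = src(x + dx, y + dy)).
    for(std::size_t y = 0; y < dimY; ++y)
    {
      const std::int64_t srcY = static_cast<std::int64_t>(y) + yShifts[z];
      for(std::size_t x = 0; x < dimX; ++x)
      {
        const std::int64_t srcX = static_cast<std::int64_t>(x) + xShifts[z];
        const bool inBounds = srcX >= 0 && srcX < static_cast<std::int64_t>(dimX) && srcY >= 0 && srcY < static_cast<std::int64_t>(dimY);
        // Cells shifted in from outside the slice are zero-filled (an assumption).
        slice[y * dimX + x] = inBounds ? buffer[static_cast<std::size_t>(srcY) * dimX + static_cast<std::size_t>(srcX)] : T{};
      }
    }
  }
}
```

The point of the buffer is that all random access lands on in-memory data, while the store itself only sees one linear read and one linear write per slice.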
…ndShifts

Pre-load the first reference slice before the convergence loop and swap cur→ref buffers at each iteration instead of re-reading the reference from DataStore. Halves per-iteration DataStore reads, improving OOC Misorientation from 21s to 16s (2.0x vs baseline).

Add Doxygen comments for private OOC methods in Misorientation and MutualInformation headers.
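A minimal sketch of the reference-slice swap described in this commit is shown below, under stated assumptions. The names `processAdjacentPairs`, `loadSlice`, and `compare` are hypothetical, the slice traversal order is assumed to be ascending, and the returned read count exists only to make the saving visible; none of this is taken from the actual findShiftsOoc() code.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Slice = std::vector<float>;

// Hypothetical sketch: visit each pair of adjacent slices (ref, cur) while
// reading every slice from the store exactly once.
// Returns the number of slice reads that were issued.
std::size_t processAdjacentPairs(std::size_t numSlices, const std::function<Slice(std::size_t)>& loadSlice,
                                 const std::function<void(const Slice&, const Slice&)>& compare)
{
  if(numSlices < 2)
  {
    return 0; // no adjacent pair to compare
  }

  std::size_t reads = 0;

  // Pre-load the first reference slice before entering the loop.
  Slice ref = loadSlice(0);
  ++reads;

  Slice cur;
  for(std::size_t z = 1; z < numSlices; ++z)
  {
    cur = loadSlice(z); // only the new slice is read from the store
    ++reads;

    compare(ref, cur); // stands in for the per-pair shift search

    // The slice just processed becomes the next reference; swapping the
    // buffers avoids reading it from the store a second time.
    std::swap(ref, cur);
  }
  return reads;
}
```

With N slices this issues N reads, whereas re-reading both slices of every pair would issue 2(N - 1), which matches the "halves per-iteration DataStore reads" claim for large N.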
- Remove unused #include <iostream> from AlignSectionsMisorientation.cpp
- Remove duplicate cancel check (m_ShouldCancel before getCancel())
- Fix local variable naming: m_CellPhases/m_CrystalStructures → cellPhases/crystalStructures in formFeaturesSections
- Use hidden Catch2 tag [.Benchmark] so benchmark tests don't run in default CI
- Run clang-format on all PR files
Summary
Adds slice-buffered OOC algorithm paths for the 4 AlignSections filters, using dual-dispatch (Strategy C) to preserve the original in-core code untouched while adding OOC-optimized variants.
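The dual-dispatch idea can be illustrated with the small sketch below: the in-core path is left alone, and the OOC variant is selected only when at least one cell array is out-of-core. The `ArrayInfo` struct, the `AlgorithmPath` enum, and `selectPath` are hypothetical names invented for this example; the real dispatch in AlignSections::execute() may be structured differently.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of a cell array; only the storage flag matters here.
struct ArrayInfo
{
  std::string name;
  bool isOutOfCore;
};

enum class AlgorithmPath
{
  InCore,
  OutOfCore
};

// Choose the OOC variant if any cell array is stored out-of-core;
// otherwise keep using the original in-core algorithm.
AlgorithmPath selectPath(const std::vector<ArrayInfo>& cellArrays)
{
  const bool anyOoc = std::any_of(cellArrays.begin(), cellArrays.end(), [](const ArrayInfo& array) { return array.isOutOfCore; });
  return anyOoc ? AlgorithmPath::OutOfCore : AlgorithmPath::InCore;
}
```

Keeping the decision in one small function means the two algorithm bodies never need to know about each other, which is what lets the original in-core code stay untouched.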
Changes
Base class (AlignSections.cpp)
- AlignSectionsTransferDataOocImpl<T> dispatched when any cell array is OOC

AlignSectionsMisorientation
- findShiftsOoc() buffers 2 adjacent Z-slices of quats, cellPhases, and mask before the convergence loop
- std::swap cur→ref each iteration to avoid re-reading the reference from DataStore

AlignSectionsMutualInformation
- formFeaturesSectionsOoc() buffers one Z-slice of quats, cellPhases, and mask for the per-slice 2D flood-fill segmentation

AlignSectionsFeatureCentroid & AlignSectionsListFilter

Tests
- GENERATE(false, true) + ForceOocAlgorithmGuard

Benchmark Results (200×200×200)
Optimization Ceiling Analysis
The OOC speedups are more modest than Groups B/C/D because the transfer phase dominates OOC runtime and is bottlenecked by ZarrStore's per-element overhead (~55–75ns per operator[]: mutex lock/unlock + chunk lookup vs ~1ns for in-core DataStore). The Misorientation filter shows the most benefit (2.0x) because its findShifts convergence loop re-reads the same 2 slices many times; slice buffering plus reference-slice swap (reusing the previous iteration's current-slice buffer as the next iteration's reference) eliminates that repeated I/O.

Further improvement requires deeper OOC infrastructure changes:
- AbstractDataStore: eliminates ~47.8M mutex lock/unlock cycles per filter (~1–2s savings)

Test Plan
- simplnx-Rel)
- simplnx-ooc-Rel)
- GENERATE(false, true) exercises both algorithm paths in both builds
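For readers unfamiliar with the guard-based approach to exercising both paths, the sketch below shows one way a scoped override could work. It is an illustration only: `forceOocFlag`, `ScopedForceOoc`, and `runBothPaths` are hypothetical names, the plain loop over `{false, true}` merely stands in for Catch2's `GENERATE(false, true)`, and the actual ForceOocAlgorithmGuard in the PR may behave differently.

```cpp
#include <initializer_list>
#include <vector>

// Hypothetical process-wide override that a dispatcher could consult.
bool& forceOocFlag()
{
  static bool flag = false;
  return flag;
}

// RAII guard: sets the override for the current scope and restores the
// previous value on destruction, even if the test body exits early.
class ScopedForceOoc
{
public:
  explicit ScopedForceOoc(bool force)
  : m_Previous(forceOocFlag())
  {
    forceOocFlag() = force;
  }
  ~ScopedForceOoc()
  {
    forceOocFlag() = m_Previous;
  }
  ScopedForceOoc(const ScopedForceOoc&) = delete;
  ScopedForceOoc& operator=(const ScopedForceOoc&) = delete;

private:
  bool m_Previous;
};

// Runs a (placeholder) test body once per path and records which override
// value was active each time.
std::vector<bool> runBothPaths()
{
  std::vector<bool> pathsSeen;
  for(bool forceOoc : {false, true})
  {
    ScopedForceOoc guard(forceOoc);
    pathsSeen.push_back(forceOocFlag()); // a real test would execute the filter here
  }
  return pathsSeen;
}
```

Restoring the flag in the destructor keeps one test case from leaking its override into the next, which matters when the same binary runs many filter tests in sequence.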