WIP: PERF: OOC-optimized algorithm variants for 30+ filters #1575
Draft
joeykleingers wants to merge 16 commits into BlueQuartzSoftware:develop
Replace the chunk-based DataStore API with a plugin-driven hook
architecture that cleanly separates OOC policy (in the SimplnxOoc
plugin) from mechanism (in the core library). The old API required
every caller to understand chunk geometry; the new design hides OOC
details behind bulk I/O primitives and plugin-registered callbacks.
--- AbstractDataStore / IDataStore API ---
Remove the entire chunk API from AbstractDataStore and IDataStore:
loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds,
getChunkShape, getChunkSize, getChunkTupleShape, getChunkExtents, and
convertChunkToDataStore. Replace with two bulk I/O primitives:
copyIntoBuffer(startIndex, span<T>) and copyFromBuffer(startIndex,
span<const T>), implemented in DataStore (std::copy on raw memory) and
EmptyDataStore (throws). This shifts the abstraction from "load a
chunk, then index into it" to "copy a contiguous range into a caller-
owned buffer," which works identically for in-core and OOC stores.
Simplify StoreType to three values (InMemory, OutOfCore, Empty) by
removing EmptyOutOfCore. IsOutOfCore() now checks StoreType instead
of testing getChunkShape().has_value(). Add getRecoveryMetadata()
virtual to IDataStore for crash-recovery attribute persistence.
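A minimal sketch of the "copy a contiguous range into a caller-owned buffer" idea described above. The names BulkStore, InMemoryStore, and SumAll are hypothetical, and the pointer-plus-count signatures are a simplification of the span-based methods named in this commit:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for the store interface. The real
// methods take spans; this sketch uses pointer + count instead.
template <typename T>
class BulkStore
{
public:
  virtual ~BulkStore() = default;
  virtual std::size_t size() const = 0;
  // Copy `count` values starting at `startIndex` into caller-owned memory.
  virtual void copyIntoBuffer(std::size_t startIndex, T* dst, std::size_t count) const = 0;
  // Copy `count` values from caller-owned memory into the store.
  virtual void copyFromBuffer(std::size_t startIndex, const T* src, std::size_t count) = 0;
};

// In-core implementation: a plain copy on contiguous memory.
template <typename T>
class InMemoryStore : public BulkStore<T>
{
public:
  explicit InMemoryStore(std::size_t n)
  : m_Data(n)
  {
  }
  std::size_t size() const override
  {
    return m_Data.size();
  }
  void copyIntoBuffer(std::size_t startIndex, T* dst, std::size_t count) const override
  {
    checkRange(startIndex, count);
    std::copy_n(m_Data.begin() + static_cast<std::ptrdiff_t>(startIndex), count, dst);
  }
  void copyFromBuffer(std::size_t startIndex, const T* src, std::size_t count) override
  {
    checkRange(startIndex, count);
    std::copy_n(src, count, m_Data.begin() + static_cast<std::ptrdiff_t>(startIndex));
  }

private:
  void checkRange(std::size_t startIndex, std::size_t count) const
  {
    if(startIndex > m_Data.size() || count > m_Data.size() - startIndex)
    {
      throw std::out_of_range("range exceeds store size");
    }
  }
  std::vector<T> m_Data;
};

// Caller code never asks about chunk geometry: it walks the store in
// fixed-size pieces through one reusable buffer.
template <typename T>
T SumAll(const BulkStore<T>& store, std::size_t bufferSize)
{
  assert(bufferSize > 0);
  std::vector<T> buffer(bufferSize);
  T total{};
  for(std::size_t start = 0; start < store.size(); start += bufferSize)
  {
    const std::size_t count = std::min(bufferSize, store.size() - start);
    store.copyIntoBuffer(start, buffer.data(), count);
    for(std::size_t i = 0; i < count; i++)
    {
      total += buffer[i];
    }
  }
  return total;
}
```

An out-of-core store would implement the same two methods by reading from or writing to its backing file, so SumAll would not need to change.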
--- Plugin Hook System (DataIOCollection / IDataIOManager) ---
Add three plugin-registered callback hooks to DataIOCollection:
FormatResolverFnc: Decides storage format for a given array based on
type, shape, and size. Called from DataStoreUtilities::CreateDataStore
and CreateListStore. Replaces the removed checkStoreDataFormat() and
TryForceLargeDataFormatFromPrefs — format decisions now live entirely
in the plugin, with core only calling resolveFormat() when no format
is already set.
BackfillHandlerFnc: Post-import callback that lets the plugin finalize
placeholder stores after all HDF5 objects are read. Called from
ImportH5ObjectPathsAction after importing all paths. Replaces the
removed backfillReadOnlyOocStores core implementation.
WriteArrayOverrideFnc: Intercepts HDF5 writes during recovery file
creation, allowing the plugin to write lightweight placeholder
datasets instead of full array data. Activated via RAII
WriteArrayOverrideGuard, wired into DataStructureWriter.
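The policy/mechanism split behind these hooks can be sketched as follows. HookRegistry, its method signatures, and the size threshold are hypothetical simplifications; only the general shape (core calls a plugin-registered callback when no format is already set) comes from the description above:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical miniature of a hook registry: core owns the mechanism
// (storing and invoking the callback), the plugin owns the policy.
class HookRegistry
{
public:
  // Simplified resolver signature: array size in, format name out.
  using FormatResolverFnc = std::function<std::string(std::uint64_t dataSizeBytes)>;

  void setFormatResolver(FormatResolverFnc fnc)
  {
    m_Resolver = std::move(fnc);
  }
  bool hasFormatResolver() const
  {
    return static_cast<bool>(m_Resolver);
  }

  // Core consults the plugin only when the caller has not chosen a format.
  std::string resolveFormat(const std::string& requested, std::uint64_t dataSizeBytes) const
  {
    if(!requested.empty())
    {
      return requested;
    }
    if(!m_Resolver)
    {
      return ""; // no plugin registered: leave the format unset
    }
    return m_Resolver(dataSizeBytes);
  }

private:
  FormatResolverFnc m_Resolver;
};

// Example policy a plugin might register (threshold chosen arbitrarily).
inline std::string LargeArraysGoOoc(std::uint64_t dataSizeBytes)
{
  constexpr std::uint64_t k_Threshold = 1024ull * 1024ull;
  return dataSizeBytes >= k_Threshold ? "HDF5-OOC" : "";
}
```

Without a registered resolver the registry returns an empty format, which mirrors a build where the OOC plugin is not loaded.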
Add factory registration on IDataIOManager for ListStoreRefCreateFnc,
StringStoreCreateFnc, and FinalizeStoresFnc, with delegating creation
methods on DataIOCollection. Guard against reserved format name
"Simplnx-Default-In-Memory" during IO manager registration.
--- EmptyStringStore Placeholder ---
Add EmptyStringStore, a placeholder class for OOC string array import
that stores only tuple shape metadata. All data access
methods throw std::runtime_error. isPlaceholder() returns true (vs
false for StringStore). StringArrayIO creates EmptyStringStore in OOC mode instead of
allocating numValues empty strings.
--- HDF5 I/O ---
DataStoreIO::ReadDataStore gains two interception paths before the
normal in-core load: (1) recovery file detection via OocBackingFilePath
HDF5 attributes, creating a read-only reference store pointing at the
backing file; (2) OOC format resolution via resolveFormat(), creating a
read-only reference store directly from the source .dream3d file with
no temp copy.
DataArrayIO::writeData always calls WriteDataStore
directly — OOC stores materialize their data through the plugin's
writeHdf5() method; recovery writes use WriteArrayOverrideFnc.
NeighborListIO gains OOC interception: computes total neighbor count,
calls resolveFormat(), and creates a read-only ref list store when an
OOC format is available. Legacy NeighborList reading passes a preflight
flag through the entire call chain (readLegacyNeighborList ->
createLegacyNeighborList -> ReadHdf5Data) so legacy .dream3d imports
create EmptyListStore placeholders instead of eagerly loading per-
element via setList().
DataStructureWriter checks WriteArrayOverrideFnc before normal writes,
giving the registered plugin callback first chance to handle each
data object.
Add explicit template instantiations for DatasetIO::createEmptyDataset
and DatasetIO::writeSpanHyperslab for all numeric types plus bool.
These are needed by the SimplnxOoc plugin's AbstractOocStore::writeHdf5(),
which cannot use writeSpan() because the full array is not in memory.
Instead it creates an empty dataset, then fills it region-by-region
via hyperslab writes as it streams data from the backing file.
--- Preferences ---
Add unified oocMemoryBudgetBytes preference (default 8 GB) that
the ChunkCache, visualization, and stride cache all use. Add k_InMemoryFormat
sentinel constant for explicit in-core format choice. Add migration
logic to erase legacy empty-string and "In-Memory" preference values.
checkUseOoc() now tests against k_InMemoryFormat.
setLargeDataFormat("") removes the key so plugin defaults take effect.
--- Algorithm Infrastructure ---
AlgorithmDispatch: Add ForceInCoreAlgorithm/ForceOocAlgorithm global
flags with RAII guards. Add DispatchAlgorithm template that selects
Direct (in-core) vs Scanline (OOC) algorithm variant based on store
types and force flags. Add SIMPLNX_TEST_ALGORITHM_PATH CMake option
(0=both, 1=OOC-only, 2=InCore-only) for dual-dispatch test control.
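As an illustration of the dispatch idea, here is a small sketch with a single force flag and an RAII guard. ForceOocFlag, ForceOocGuard, Dispatch, and the two variant structs are hypothetical names; the actual DispatchAlgorithm template and its ForceInCoreAlgorithm counterpart are not reproduced here:

```cpp
#include <cassert>
#include <string>

enum class StoreType
{
  InMemory,
  OutOfCore,
  Empty
};

// Global force flag, a simplified stand-in for a ForceOocAlgorithm setting.
inline bool& ForceOocFlag()
{
  static bool s_Flag = false;
  return s_Flag;
}

// RAII guard: sets the flag for the current scope and restores the
// previous value on destruction, so tests cannot leak the setting.
class ForceOocGuard
{
public:
  explicit ForceOocGuard(bool value)
  : m_Previous(ForceOocFlag())
  {
    ForceOocFlag() = value;
  }
  ~ForceOocGuard()
  {
    ForceOocFlag() = m_Previous;
  }
  ForceOocGuard(const ForceOocGuard&) = delete;
  ForceOocGuard& operator=(const ForceOocGuard&) = delete;

private:
  bool m_Previous;
};

// Two toy algorithm variants that only report which path ran.
struct DirectVariant
{
  std::string operator()() const
  {
    return "direct";
  }
};
struct ScanlineVariant
{
  std::string operator()() const
  {
    return "scanline";
  }
};

// Selects the OOC variant when the store is out-of-core or the flag is set.
template <typename DirectT, typename ScanlineT>
std::string Dispatch(StoreType storeType)
{
  const bool useOoc = ForceOocFlag() || storeType == StoreType::OutOfCore;
  if(useOoc)
  {
    return ScanlineT{}();
  }
  return DirectT{}();
}
```

Wrapping a test body in a ForceOocGuard scope lets the same test exercise both variants against in-memory data.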
IParallelAlgorithm: Remove blanket TBB disabling for OOC data — OOC
stores are now thread-safe via ChunkCache + HDF5 global mutex.
CheckStoresInMemory/CheckArraysInMemory use StoreType instead of
getDataFormat().
VtkUtilities: Rewrite binary write path to read into 4096-element
buffers via copyIntoBuffer, byte-swap in the buffer, and fwrite —
replacing direct DataStore data() pointer access.
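The buffered write path described for VtkUtilities might look roughly like the sketch below. It is an assumption-laden illustration: the source vector stands in for a data store, std::copy_n stands in for copyIntoBuffer, std::ostream::write stands in for fwrite, and ByteSwap32 and WriteSwapped are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Reverses the byte order of a 32-bit value.
inline std::uint32_t ByteSwap32(std::uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Writes every element of `values` to `out` in swapped byte order, staging
// the data through a fixed-size buffer instead of touching a raw pointer
// to the whole array.
inline void WriteSwapped(const std::vector<std::uint32_t>& values, std::ostream& out, std::size_t bufferSize = 4096)
{
  assert(bufferSize > 0);
  std::vector<std::uint32_t> buffer(bufferSize);
  for(std::size_t start = 0; start < values.size(); start += bufferSize)
  {
    const std::size_t count = std::min(bufferSize, values.size() - start);
    // Stand-in for store.copyIntoBuffer(start, buffer).
    std::copy_n(values.begin() + static_cast<std::ptrdiff_t>(start), count, buffer.begin());
    // Swap in the buffer so the source data is never modified.
    for(std::size_t i = 0; i < count; i++)
    {
      buffer[i] = ByteSwap32(buffer[i]);
    }
    out.write(reinterpret_cast<const char*>(buffer.data()), static_cast<std::streamsize>(count * sizeof(std::uint32_t)));
  }
}
```

Memory use is bounded by the buffer size regardless of array length, which is the property that matters for out-of-core arrays.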
--- Filter Algorithm Updates ---
FillBadData: Rewrite phaseOneCCL and phaseThreeRelabeling to use
Z-slab buffered I/O via copyIntoBuffer/copyFromBuffer instead of
the removed chunk API (loadChunk, getChunkLowerBounds, etc.).
operator()() scans feature counts in 64K-element chunks via
copyIntoBuffer.
QuickSurfaceMesh: Remove getChunkShape() call in generateTripleLines()
that set ParallelData3DAlgorithm chunk size, as the chunk API no
longer exists on AbstractDataStore.
--- File Import ---
ImportH5ObjectPathsAction: Add deferred-load pattern. When a backfill
handler is registered, pass preflight=true to create placeholder stores
during import, then call runBackfillHandler() after all paths are
imported to let the plugin finalize.
Dream3dIO: Add WriteRecoveryFile() that wraps WriteFile with WriteArrayOverrideGuard.
--- Utility Changes ---
DataStoreUtilities: Remove TryForceLargeDataFormatFromPrefs entirely.
CreateDataStore and CreateListStore call resolveFormat() on the IO
collection. ArrayCreationUtilities: check k_InMemoryFormat sentinel
before skipping memory checks.
ITKArrayHelper/ITKTestBase: OOC checks use getStoreType() instead of
getDataFormat().empty(). IsArrayInMemory simplified from a 40-line
DataType switch to a single StoreType check.
ArraySelectionParameter: Remove EmptyOutOfCore handling; simplify to
just StoreType::Empty.
--- Tests ---
Add EmptyStringStore tests (6 cases: metadata, zero tuples, throwing
access, deep copy placeholder preservation, resize, isPlaceholder).
Add DataIOCollection hooks tests (format resolver, backfill handler).
Add IOFormat tests (7 cases: InMemory sentinel, empty format,
resolveFormat with/without plugin). Add IParallelAlgorithm OOC tests
(8 cases with MockOocDataStore: TBB enablement for in-memory, OOC,
and mixed arrays/stores).
Remove the "Target DataStructure Size" test from IOFormat.cpp — it
was a tautology that re-implemented the same arithmetic as
updateMemoryDefaults() without testing any edge case or behavior.
Fix RodriguesConvertorTest exemplar data: add missing expected values
for the 4th tuple (indices 12-15). The old CompareDataArrays broke
on the first floating-point mismatch regardless of magnitude, masking
this incomplete exemplar. The new chunked comparison correctly
continues past epsilon-close differences, exposing the missing data.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Add comprehensive documentation to all new methods, type aliases, classes, and algorithms introduced in the OOC architecture rewrite. Every new public API now has Doxygen explaining what it does, how it works, and why it is needed. Algorithm implementations have step-by-step inline comments explaining the logic.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…ation layer
Move the format resolver call site from the low-level DataStoreUtilities::CreateDataStore/CreateListStore functions up to the array creation layer (ArrayCreationUtilities::CreateArray and ImportH5ObjectPathsAction). This is a prerequisite for the upcoming data store import handler refactor.
Key architectural changes:
1. FormatResolverFnc signature expanded to (DataStructure, DataPath, DataType, dataSizeBytes). The resolver can now walk parent objects to determine geometry type, enabling it to force in-core for unstructured/poly geometry arrays without caller-side checks.
2. Format resolution removed from DataStoreUtilities::CreateDataStore and CreateListStore. These are now simple factories that take an already-resolved format string. Callers are responsible for calling the resolver.
3. CreateArrayAction no longer carries a dataFormat member or constructor parameter. The k_DefaultDataFormat constant is removed. Format is resolved at execute time inside ArrayCreationUtilities::CreateArray.
4. ImportH5ObjectPathsAction gains a format-resolver loop that iterates Empty-store DataArrays after preflight import, consulting the resolver to decide which arrays to eager-load (in-core) vs leave for the backfill handler (OOC).
5. DataStoreIO::ReadDataStore and NeighborListIO::finishImportingData lose their inline format-resolution and OOC reference-store creation code. Format decisions for imported data are now made at the action level, not during raw HDF5 I/O.
6. Geometry actions (CreateGeometry1D/2D/3DAction, CreateVertexGeometry, CreateRectGridGeometry) lose their createdDataFormat parameter. They now materialize OOC topology arrays into in-core stores when the source arrays have StoreType::OutOfCore, since unstructured/poly geometry topology must be in-core for the visualization layer.
7. CheckMemoryRequirement simplified to a pure RAM check. OOC fallback logic removed since the resolver handles format decisions upstream.
All filter callers updated to drop the dataFormat argument from CreateArrayAction constructors. Python binding updated (data_format parameter renamed to fill_value). Test files updated for new resolveFormat signature.
…arden .dream3d import
Rename the "backfill handler" to "data store import handler" and expand its role to handle ALL data store loading from .dream3d files: in-core eager loading, OOC reference stores, and recovery reattachment. This replaces the split decision-making where ImportH5ObjectPathsAction ran a format-resolver loop and a separate backfill handler.
Key changes:
1. DataIOCollection: Rename BackfillHandlerFnc to DataStoreImportHandlerFnc with expanded signature that includes importStructure. Rename set/has/runBackfillHandler to set/has/runDataStoreImportHandler. Add format display name registry (registerFormatDisplayName/getFormatDisplayNames) for human-readable format names in the UI dropdown.
2. DataStoreIO: Rename ReadDataStore to ReadDataStoreIntoMemory. Remove recovery reattachment code (OOC-specific HDF5 attribute checks moved to SimplnxOoc plugin). Add placeholder detection: compares physical HDF5 element count against shape attributes and returns Result<> with a warning when a mismatch is detected (guards against loading placeholder datasets without the OOC plugin). Change return type to Result<shared_ptr<AbstractDataStore<T>>> so callers can accumulate warnings across arrays.
3. ImportH5ObjectPathsAction: Remove the format-resolver loop (79 lines). The action now delegates entirely to the registered handler when present, or falls back to FinishImportingObject for non-OOC builds.
4. CreateArrayAction: Restore dataFormat parameter for per-filter format override. When non-empty, bypasses the format resolver. Dropdown shows "Automatic" (resolver decides), "In Memory", and plugin-registered formats with display names. Fix 12 filter callers where fillValue was being passed as dataFormat after parameter reordering.
5. Dream3dIO: Route DREAM3D::ReadFile through ImportH5ObjectPathsAction so recovery and OOC hooks fire. Remove unused ImportDataObjectFromFile and ImportSelectDataObjectsFromFile.
6. Application: Add getDataStoreFormatDisplayNames() to expose display name registry to DataStoreFormatParameter.
Updated callers: DataArrayIO (2 sites), NeighborListIO (2 sites), Dream3dIO (2 legacy helpers), DataStructureWriter (comment), 12 filter files, simplnxpy Python binding, DataIOCollectionHooksTest.
Replace the old Dream3dIO public API (ReadFile, ImportDataStructureFromFile, FinishImportingObject) with four new purpose-specific functions:
- LoadDataStructure(path): full load with OOC handler support
- LoadDataStructureArrays(path, dataPaths): selective array load with pruning
- LoadDataStructureMetadata(path): metadata-only skeleton (preflight)
- LoadDataStructureArraysMetadata(path, dataPaths): pruned metadata skeleton
The new API eliminates the bool preflight parameter in favor of distinct functions, decouples pipeline loading from DataStructure loading, and centralizes the OOC handler integration in a single internal LoadDataStructureWithHandler function.
Key changes:
DataIOCollection: Add EagerLoadFnc typedef and pass it through the DataStoreImportHandlerFnc signature, replacing the importStructure parameter. The handler can now eager-load individual arrays via callback without knowing Dream3dIO internals.
ImportH5ObjectPathsAction: Rewrite to use the new API: preflight calls LoadDataStructureMetadata, execute calls LoadDataStructure. The action no longer manages HDF5 file handles or deferred loading directly; it merges source objects into the pipeline DataStructure via shallow copy.
ReadDREAM3DFilter: Switch preflight from ImportDataStructureFromFile(reader, true) to LoadDataStructureMetadata(path), removing manual HDF5 file open.
Dream3dIO internals: Move LoadDataObjectFromHDF5, EagerLoadDataFromHDF5, PruneDataStructure, and LoadDataStructureWithHandler into an anonymous namespace. LoadDataStructureWithHandler implements the shared logic: build metadata skeleton, optionally delegate to the OOC import handler, fall back to eager in-core loading.
Test callers: Switch ComputeIPFColorsTest, RotateSampleRefFrameTest, DREAM3DFileTest, and H5Test to UnitTest::LoadDataStructure. Add Dream3dLoadingApiTest with coverage for all four new functions.
UnitTestCommon: Simplify LoadDataStructure/LoadDataStructureMetadata helpers to delegate directly to the new DREAM3D:: functions.
Add the namespace fs = std::filesystem alias to .cpp files that spell out std::filesystem, consistent with the existing convention used throughout the codebase (e.g., AtomicFile.cpp, FileUtilities.cpp, all ITK test files, UnitTestCommon.hpp). Files updated: Dream3dIO.cpp, ImportH5ObjectPathsAction.cpp, DataIOCollection.cpp, H5Test.cpp, UnitTestCommon.cpp, DREAM3DFileTest.cpp, ComputeIPFColorsTest.cpp.
Previously IDataStore provided a default implementation of getRecoveryMetadata() that returned an empty map, which silently disabled recovery metadata for any store subclass that forgot to override it. Make it pure virtual so every concrete store must explicitly state what (if any) recovery metadata it produces.
DataStore overrides it to return an empty map (in-memory stores have no backing file or external state, so the recovery file's HDF5 dataset contains all the data needed to reconstruct the store).
EmptyDataStore overrides it to throw std::runtime_error, matching the fail-fast behavior of every other data-access method on this metadata-only placeholder class. Querying recovery metadata on a placeholder is a programming error: the real store that replaces the placeholder during execution is the one responsible for providing recovery info.
MockOocDataStore in IParallelAlgorithmTest.cpp gains a no-op override returning an empty map so it remains constructible.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…y format sentinel
Addresses code review feedback on DataIOCollection ownership and factory error messages.
Ownership clarification:
* DataStoreUtilities::GetIOCollection() and Application::getIOCollection() now return DataIOCollection& instead of std::shared_ptr. The collection is owned by the Application singleton which outlives every caller, so a reference expresses non-ownership more clearly than a shared_ptr and prevents accidental lifetime extension.
* WriteArrayOverrideGuard stores a DataIOCollection& member. Since the guard is already non-copyable and non-movable, a reference member is natural and the "may be null no-op" path was dropped (no caller used it).
In-memory format sentinel hygiene:
* CoreDataIOManager::formatName() now returns Preferences::k_InMemoryFormat instead of the empty string. Empty means "unset/auto" and k_InMemoryFormat means "explicit in-memory"; previously "" was doing double duty.
* DataIOCollection constructor registers the core manager directly into the manager map, bypassing the addIOManager() guard. The guard still rejects plugin registrations under the reserved name.
* createDataStore/createListStore fallbacks now look up the core manager from m_ManagerMap under k_InMemoryFormat instead of constructing a fresh local CoreDataIOManager.
* ArrayCreationUtilities no longer translates k_InMemoryFormat to ""; the RAM-check path recognizes both sentinels as in-core.
Actionable factory errors:
* Added DataIOCollection::generateManagerListString() that produces a padded multi-line capability matrix of every registered IO manager and the store types it supports (DataStore, ListStore, StringStore, ReadOnlyRef(DataStore), ReadOnlyRef(ListStore)). Uses display names where registered, falling back to the raw format identifier.
* Wired the helper into the existing CreateArray nullptr-check error message so users can immediately see which formats are available when a requested format is unknown.
Tests updated to reflect the new reference API.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Convert DataIOCollection::addIOManager and the AbstractDataStore
copyIntoBuffer/copyFromBuffer family from throwing std::runtime_error /
std::out_of_range to returning Result<>. Updates all call sites to
propagate errors: Application::loadPlugin, Create{Vertex,1D,2D,3D}Geometry
actions, WriteVtkRectilinearGrid, and FillBadData phases (early-return
on error). IOFormat test now uses SIMPLNX_RESULT_REQUIRE_INVALID.
…tore
Adds nx::core::Extent (a strided N-dimensional range struct, moved from simplnx-ooc) and pure-virtual readExtent/writeExtent on AbstractDataStore<T>. This is the primary bulk-access API for visualization code: the concrete store (DataStore for in-memory, HDF5ChunkedStore for OOC) picks the optimal I/O primitive per call, which lets the viz layer make one virtual call instead of hand-rolling per-row copyIntoBuffer loops.
DataStore<T> implements the 3D fast path via std::memcpy on contiguous X rows (the hot path for image geometries in ZYX tuple order). The memcpy branch is guarded behind !std::is_same_v<T, bool> because std::vector<bool> has no .data() accessor; bool falls through to a per-tuple copy. 1D extents use a flat strided walk. EmptyDataStore returns empty / no-ops as a placeholder.
Also updates MockOocDataStore in IParallelAlgorithmTest to override the new virtuals (otherwise the mock is abstract and can't be instantiated) and adds ExtentTest with 82 assertions covering construction, contains/overlaps/intersect, equality, and strided semantics.
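A rough sketch of the 3D fast path: copying a sub-volume row by row with std::memcpy from a ZYX-ordered array. Extent3D and ReadExtent are hypothetical names, and the sketch assumes unit stride, so it does not model the strided semantics of the real Extent struct:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical, simplified extent: half-open [start, stop) per axis in
// Z, Y, X order. Unit stride only.
struct Extent3D
{
  std::array<std::size_t, 3> start;
  std::array<std::size_t, 3> stop;

  std::size_t count(std::size_t axis) const
  {
    return stop[axis] - start[axis];
  }
  std::size_t numElements() const
  {
    return count(0) * count(1) * count(2);
  }
};

// Copies the sub-volume described by `extent` out of a ZYX-ordered array.
// Each X row is contiguous in memory, so one memcpy per (z, y) pair suffices.
template <typename T>
std::vector<T> ReadExtent(const std::vector<T>& volume, const std::array<std::size_t, 3>& dims, const Extent3D& extent)
{
  static_assert(std::is_trivially_copyable<T>::value, "memcpy path requires a trivially copyable type");
  std::vector<T> result(extent.numElements());
  const std::size_t rowLength = extent.count(2);
  if(rowLength == 0)
  {
    return result;
  }
  std::size_t outOffset = 0;
  for(std::size_t z = extent.start[0]; z < extent.stop[0]; z++)
  {
    for(std::size_t y = extent.start[1]; y < extent.stop[1]; y++)
    {
      const std::size_t srcOffset = (z * dims[1] + y) * dims[2] + extent.start[2];
      std::memcpy(result.data() + outOffset, volume.data() + srcOffset, rowLength * sizeof(T));
      outOffset += rowLength;
    }
  }
  return result;
}
```

This template does not compile for bool, because std::vector<bool> has no data() accessor; that is the reason the commit gives for routing bool through a per-tuple copy.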
…etter
The force_ooc_data preference is intended as an independent user toggle that overrides the "in-memory format" choice: "force every eligible array to OOC regardless of the selected large-data format." But both the getter and setter were gated on m_UseOoc, which is only true when the selected format is something other than the in-memory sentinel.
Effect of the bug: when the user's large-data format was 'Simplnx-Default-In-Memory' (the default), toggling Force OOC in the Preferences dialog had zero effect. The setter silently dropped the new value (because the gate returned early), and even if the key had been persisted in a prior session, the getter would always return false.
The OocDataIOManager format resolver already handles the (forceOoc=true, userChoseInMemory=true) case correctly: it returns 'HDF5-OOC' for every eligible array. The upstream gate in the preference class prevented that design from ever being reached.
Removing the gate in both the getter and setter lets the toggle work as designed: when Force OOC is on, the format resolver routes every eligible array to OOC storage even if the user's format preference still says in-memory.
Discovered while debugging a 6-second drag-drop-to-outline latency on a 25 GB dataset: the user had Force OOC on and the 2 GB large-data threshold set, but arrays were still being eagerly loaded into RAM. The root cause was this preference gate silently defeating the user's choice.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
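The gating bug can be reproduced in miniature. The Prefs class below is a hypothetical stand-in for the preferences class, with the gated and ungated accessors side by side for comparison:

```cpp
#include <cassert>
#include <map>
#include <string>

const std::string k_InMemoryFormat = "Simplnx-Default-In-Memory";

// Hypothetical miniature of the preferences class.
class Prefs
{
public:
  void setLargeDataFormat(const std::string& format)
  {
    m_Format = format;
  }
  bool useOoc() const
  {
    return m_Format != k_InMemoryFormat;
  }

  // Old behavior: both accessors are gated on useOoc().
  void setForceOocGated(bool value)
  {
    if(!useOoc())
    {
      return; // silently drops the user's choice
    }
    m_Values["force_ooc_data"] = value;
  }
  bool forceOocGated() const
  {
    return useOoc() && lookup();
  }

  // New behavior: the toggle is independent of the selected format.
  void setForceOoc(bool value)
  {
    m_Values["force_ooc_data"] = value;
  }
  bool forceOoc() const
  {
    return lookup();
  }

private:
  bool lookup() const
  {
    const auto iter = m_Values.find("force_ooc_data");
    return iter != m_Values.end() && iter->second;
  }

  std::string m_Format = k_InMemoryFormat;
  std::map<std::string, bool> m_Values;
};
```

With the default in-memory format, the gated setter never stores the value and the gated getter always reports false, which matches the symptom described above.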
* Undef min and max at the top of Extent.hpp so MSVC's transitively included Windows.h macros do not expand against the struct's member names in the constructor initializer list
* Fixes cascading fmt/color.h namespace errors on v143 Windows builds caused by MSVC losing its parse state after the Extent syntax error
* Re-sort includes in AbstractDataStore.hpp to satisfy clang-format
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
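To illustrate the macro collision, the sketch below defines min and max the way Windows.h does when NOMINMAX is not set, then undefines them before a struct whose members use those names. The Range struct is hypothetical and only mimics the shape of the problem:

```cpp
#include <cassert>

// Simulate the function-like macros that Windows.h provides by default.
#define min(a, b) (((a) < (b)) ? (a) : (b))
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Without these undefs, `min(lo)` in the initializer list below would be
// treated as a macro call with the wrong number of arguments.
#ifdef min
#undef min
#endif
#ifdef max
#undef max
#endif

struct Range
{
  int min;
  int max;

  Range(int lo, int hi)
  : min(lo)
  , max(hi)
  {
  }
};
```

Defining NOMINMAX before including Windows.h is another common way to avoid the same collision, though it only helps when the project controls that include.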
Rename 13 algorithm files to their in-core variant names in preparation for adding OOC (out-of-core) dispatch alternatives. This enables git rename tracking so that subsequent optimization commits show proper diffs against the original algorithm code.
Renames (SimplnxCore):
- FillBadData -> FillBadDataBFS
- IdentifySample -> IdentifySampleBFS
- ComputeBoundaryCells -> ComputeBoundaryCellsDirect
- ComputeFeatureNeighbors -> ComputeFeatureNeighborsDirect
- ComputeSurfaceAreaToVolume -> ComputeSurfaceAreaToVolumeDirect
- ComputeSurfaceFeatures -> ComputeSurfaceFeaturesDirect
- SurfaceNets -> SurfaceNetsDirect
- QuickSurfaceMesh -> QuickSurfaceMeshDirect
- DBSCAN -> DBSCANDirect
- ComputeKMedoids -> ComputeKMedoidsDirect
- MultiThresholdObjects -> MultiThresholdObjectsDirect
Renames (OrientationAnalysis):
- BadDataNeighborOrientationCheck -> BadDataNeighborOrientationCheckWorklist
No logic changes. InputValues structs and filter classes unchanged.
…ntationAnalysis
Replace per-element DataStore access with chunked bulk I/O
(copyIntoBuffer/copyFromBuffer) across 60+ algorithm files to eliminate
virtual dispatch overhead and HDF5 chunk thrashing when arrays are backed
by out-of-core storage.
--- Architecture ---
DispatchAlgorithm pattern (Direct/Scanline):
11 algorithms gain a base dispatcher class that selects between an
in-core Direct implementation and an OOC Scanline variant at runtime
based on IsOutOfCore()/ForceOocAlgorithm():
SimplnxCore: ComputeBoundaryCells, ComputeFeatureNeighbors,
ComputeKMedoids, ComputeSurfaceAreaToVolume, ComputeSurfaceFeatures,
DBSCAN, MultiThresholdObjects, QuickSurfaceMesh, SurfaceNets
OrientationAnalysis: BadDataNeighborOrientationCheck, ComputeIPFColors
ComputeGBCDPoleFigure dispatches directly from its filter executeImpl().
Connected Component Labeling (CCL) pattern:
4 algorithms gain a two-pass CCL variant as an OOC alternative to
random-access BFS/DFS flood-fill:
SimplnxCore: FillBadData (BFS/CCL), IdentifySample (BFS/CCL)
OrientationAnalysis: EBSDSegmentFeatures, CAxisSegmentFeatures
The CCL engine in SegmentFeatures::executeCCL() scans voxels in Z-Y-X
order with a 2-slice rolling buffer and UnionFind equivalence tracking,
giving sequential I/O access patterns. Supports Face and FaceEdgeVertex
connectivity with optional periodic boundaries.
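A compact sketch of two-pass connected component labeling with a 2-slice rolling buffer, assuming Face connectivity only and no periodic boundaries. LabelComponents is a hypothetical function, the mask and label vectors stand in for data stores, and the equivalence table uses a plain root walk rather than the UnionFind utility:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Labels face-connected voxels where mask != 0, scanning in Z-Y-X order.
// Returns labels numbered 1..numFeatures, with 0 as background.
inline std::vector<std::size_t> LabelComponents(const std::vector<std::uint8_t>& mask, std::size_t nz, std::size_t ny, std::size_t nx, std::size_t& numFeatures)
{
  assert(mask.size() == nz * ny * nx);
  const std::size_t sliceSize = ny * nx;
  std::vector<std::size_t> labels(mask.size(), 0);
  std::vector<std::size_t> parent = {0}; // entry 0 is the background label

  auto findRoot = [&parent](std::size_t x) {
    while(parent[x] != x)
    {
      x = parent[x];
    }
    return x;
  };
  auto unite = [&parent, &findRoot](std::size_t a, std::size_t b) {
    a = findRoot(a);
    b = findRoot(b);
    if(a != b)
    {
      parent[std::max(a, b)] = std::min(a, b); // smaller label becomes root
    }
  };

  // Pass 1: provisional labels. Only the previous and current slices of
  // labels are consulted, so neighbor lookups never touch the full array.
  std::vector<std::size_t> prevSlice(sliceSize, 0);
  std::vector<std::size_t> curSlice(sliceSize, 0);
  for(std::size_t z = 0; z < nz; z++)
  {
    std::fill(curSlice.begin(), curSlice.end(), std::size_t{0});
    for(std::size_t y = 0; y < ny; y++)
    {
      for(std::size_t x = 0; x < nx; x++)
      {
        const std::size_t inSlice = y * nx + x;
        if(mask[z * sliceSize + inSlice] == 0)
        {
          continue;
        }
        // Already-visited face neighbors: -X, -Y (current slice), -Z (previous slice).
        const std::size_t neighbors[3] = {x > 0 ? curSlice[inSlice - 1] : 0, y > 0 ? curSlice[inSlice - nx] : 0, z > 0 ? prevSlice[inSlice] : 0};
        std::size_t label = 0;
        for(const std::size_t neighbor : neighbors)
        {
          if(neighbor == 0)
          {
            continue;
          }
          if(label == 0)
          {
            label = neighbor;
          }
          else
          {
            unite(label, neighbor);
          }
        }
        if(label == 0)
        {
          label = parent.size();
          parent.push_back(label);
        }
        curSlice[inSlice] = label;
      }
    }
    // Stand-in for writing one whole slice with copyFromBuffer.
    std::copy(curSlice.begin(), curSlice.end(), labels.begin() + static_cast<std::ptrdiff_t>(z * sliceSize));
    std::swap(prevSlice, curSlice);
  }

  // Pass 2: replace each provisional label with a compact final id.
  std::vector<std::size_t> compact(parent.size(), 0);
  numFeatures = 0;
  for(std::size_t& label : labels)
  {
    if(label == 0)
    {
      continue;
    }
    const std::size_t root = findRoot(label);
    if(compact[root] == 0)
    {
      compact[root] = ++numFeatures;
    }
    label = compact[root];
  }
  return labels;
}
```

Both passes visit voxels strictly in storage order, which is what makes the access pattern sequential, in contrast to the random access of a flood fill.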
--- New utility infrastructure ---
- UnionFind (src/simplnx/Utilities/UnionFind.hpp):
Vector-based disjoint set with union-by-rank and path-halving.
- SliceBufferedTransfer (src/simplnx/Utilities/SliceBufferedTransfer.hpp):
Z-slice buffered tuple transfer for propagating neighbor voxel data
used by ErodeDilate, FillBadData, MinNeighbors, and ReplaceElements.
- TupleTransfer batch API (Filters/Algorithms/TupleTransfer.hpp):
Batch bulk I/O methods for QuickSurfaceMesh and SurfaceNets mesh
generation attribute transfer.
- SegmentFeaturesTestUtils.hpp:
Shared test builder functions for segmentation filter test suites.
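For readers unfamiliar with the disjoint-set structure named in the list above, here is a generic sketch of a vector-based union-find with union-by-rank and path-halving. UnionFindSketch is a hypothetical class written for illustration; it is not the contents of UnionFind.hpp:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

class UnionFindSketch
{
public:
  // Starts with n singleton sets: every element is its own root.
  explicit UnionFindSketch(std::size_t n)
  : m_Parent(n)
  , m_Rank(n, 0)
  {
    std::iota(m_Parent.begin(), m_Parent.end(), std::size_t{0});
  }

  // Path halving: while walking to the root, point each visited node at
  // its grandparent, roughly halving the path length for later calls.
  std::size_t find(std::size_t x)
  {
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]];
      x = m_Parent[x];
    }
    return x;
  }

  // Union by rank: attach the shallower tree under the deeper one.
  // Returns false when both elements were already in the same set.
  bool unite(std::size_t a, std::size_t b)
  {
    a = find(a);
    b = find(b);
    if(a == b)
    {
      return false;
    }
    if(m_Rank[a] < m_Rank[b])
    {
      std::swap(a, b);
    }
    m_Parent[b] = a;
    if(m_Rank[a] == m_Rank[b])
    {
      m_Rank[a]++;
    }
    return true;
  }

  bool connected(std::size_t a, std::size_t b)
  {
    return find(a) == find(b);
  }

private:
  std::vector<std::size_t> m_Parent;
  std::vector<std::uint8_t> m_Rank;
};
```

Storing parents and ranks in flat vectors keeps the structure cache-friendly and easy to grow as new provisional labels appear.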
--- Bulk I/O conversions (existing algorithms) ---
Core utilities:
DataArrayUtilities (ImportFromBinaryFile, AppendData, CopyData,
mirror ops), DataGroupUtilities (RemoveInactiveObjects),
ClusteringUtilities (RandomizeFeatureIds), GeometryHelpers
(FindElementsContainingVert, FindElementNeighbors),
AlignSections (Z-slice OOC transfer path),
ImageRotationUtilities (source slab caching for nearest-neighbor),
TriangleUtilities (bulk-load triangles/labels for winding repair),
H5DataStore (streaming row-batch FillOocDataStore replacing full-
dataset allocation)
SimplnxCore algorithms:
AlignSectionsFeatureCentroid, ComputeEuclideanDistMap,
ComputeFeatureCentroids, ComputeFeatureClustering, ComputeFeatureSizes,
CropImageGeometry, ErodeDilateBadData, ErodeDilateCoordinationNumber,
ErodeDilateMask, RegularGridSampleSurfaceMesh, RequireMinimumSizeFeatures,
ReplaceElementAttributesWithNeighborValues, ScalarSegmentFeatures,
WriteAvizoRectilinearCoordinate, WriteAvizoUniformCoordinate
OrientationAnalysis algorithms:
AlignSectionsMisorientation, AlignSectionsMutualInformation,
ComputeAvgCAxes, ComputeAvgOrientations, ComputeCAxisLocations,
ComputeFeatureNeighborCAxisMisalignments,
ComputeFeatureReferenceCAxisMisorientations,
ComputeFeatureReferenceMisorientations, ComputeGBCD,
ComputeGBCDMetricBased, ComputeKernelAvgMisorientations,
ComputeTwinBoundaries, ConvertOrientations, MergeTwins,
NeighborOrientationCorrelation, RotateEulerRefFrame, WriteGBCDGMTFile,
WriteGBCDTriangleData, WritePoleFigure
EBSD readers:
ReadAngData, ReadCtfData, ReadH5Ebsd, ReadH5EspritData
--- Test infrastructure ---
- UnitTestCommon: ExpectedStoreType()/RequireExpectedStoreType() helpers,
TestFileSentinel reference-counted decompression, CompareDataArrays
rewritten with chunked bulk I/O for OOC-safe comparison.
- 29 test files updated with OOC dual-path testing:
ForceOocAlgorithmGuard + GENERATE(from_range(k_ForceOocTestValues))
runs every test case in both in-core and forced-OOC modes.
… bugs
Add CreateResolvedDataStore utility that runs the IOCollection format resolver before creating a DataStore, matching the path filter actions use. Update test builder functions to call it so that test-constructed arrays become OOC stores when the OOC plugin is active.
Fix three bugs in the OOC ComputeAvgOrientations Rodrigues average:
- Allow featureId 0 in accumulation (matching architecture branch)
- Start normalization loop from featureId 0
- Add missing continue for zero-count features to avoid divide-by-zero
Fix stale GetIOCollection API call in UnitTestCommon (shared_ptr to ref).
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
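The three fixes share a common accumulate-then-normalize shape, sketched below as a generic per-feature mean of 3-vectors. AveragePerFeature and Vec3 are hypothetical, and the sketch does not implement the Rodrigues-specific math of ComputeAvgOrientations:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec3 = std::array<double, 3>;

// Computes the mean vector for every feature id in [0, numFeatures).
inline std::vector<Vec3> AveragePerFeature(const std::vector<std::int32_t>& featureIds, const std::vector<Vec3>& values, std::size_t numFeatures)
{
  assert(featureIds.size() == values.size());
  std::vector<Vec3> sums(numFeatures, Vec3{0.0, 0.0, 0.0});
  std::vector<std::size_t> counts(numFeatures, 0);

  // Accumulation: feature id 0 is a valid bucket and is not skipped.
  for(std::size_t i = 0; i < featureIds.size(); i++)
  {
    const std::int32_t id = featureIds[i];
    if(id < 0 || static_cast<std::size_t>(id) >= numFeatures)
    {
      continue;
    }
    const std::size_t feature = static_cast<std::size_t>(id);
    for(std::size_t c = 0; c < 3; c++)
    {
      sums[feature][c] += values[i][c];
    }
    counts[feature]++;
  }

  // Normalization: starts at feature 0 and skips empty features so that
  // no division by zero occurs.
  for(std::size_t feature = 0; feature < numFeatures; feature++)
  {
    if(counts[feature] == 0)
    {
      continue;
    }
    for(std::size_t c = 0; c < 3; c++)
    {
      sums[feature][c] /= static_cast<double>(counts[feature]);
    }
  }
  return sums;
}
```

Features with no contributing elements keep their zero-initialized value instead of becoming NaN.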
…mized algorithms
Adds extensive documentation across all out-of-core optimized filter algorithms explaining what each algorithm does and why the OOC variant works the way it does. Targets readers with no prior OOC knowledge.
- Headers: Doxygen @class, @brief, @param on all classes, methods, InputValues structs, and member variables
- Source files: file-level overviews, Doxygen on operator()(), and inline comments explaining rolling windows, buffer strategies, dispatch logic, and OOC rationale
- Filter docs: Algorithm sections with In-Core/Out-of-Core/Performance subsections added to ~45 filter markdown files
- Key utilities: SliceBufferedTransfer.hpp and TupleTransfer.hpp documented as core OOC infrastructure
Summary
Adds out-of-core (OOC) optimized algorithm variants for 30+ filters, using DispatchAlgorithm to select between in-core (Direct/BFS) and OOC (Scanline/CCL) code paths at runtime based on data store type. A preparatory rename commit gives git rename tracking so that GitHub shows meaningful diffs against the original algorithm code.
This PR contains only the filter optimization layer. The core OOC infrastructure (copyIntoBuffer/copyFromBuffer API, HDF5ChunkedStore, OocDataIOManager, etc.) is in a separate ooc-architecture-rewrite branch that this PR stacks on top of.
Branch Structure
Commit 0 — Rename for Git Tracking
Renames 13 algorithm files to their in-core variant names before any logic changes, so that when dispatch variants are introduced, GitHub shows proper diffs against the original code instead of "new file" with no context.
Bug Fixes
OOC import of legacy SIMPL files with multi-dimensional component arrays
Legacy SIMPL .dream3d files store multi-dimensional component arrays (e.g., GBCD with componentShape [10,10,10,20,20,2]) with HDF5 physical dimensions in reversed order relative to the ComponentDimensions attribute.
Two fixes address this at different layers:
- AbstractOocStore::readHdf5 (SimplnxOoc): Detects shape mismatch between logical and physical dimensions before the streaming import path. Falls back to flat bulk read (H5S_ALL) when shapes differ, preserving correct byte order.
- ImportH5ObjectPathsAction::backfillReadOnlyOocStores (simplnx): The read-only reference store optimization creates stores pointing directly at the source file. For mismatched arrays, the N-D hyperslabs would be out-of-bounds. Detects the mismatch and creates a writable OOC store populated via readHdf5 (which triggers the flat-read fallback) instead of a read-only reference.
Filter Optimizations
Group B: Face-Neighbor Filters (5 filters)
Split into Direct (in-core) and Scanline (OOC) algorithm classes using DispatchAlgorithm. Scanline variants use Z-slice rolling windows (prev/cur/next) for cross-slice neighbor access with zero per-element OOC overhead.
Filters: ComputeBoundaryCells, ComputeSurfaceFeatures, ComputeFeatureNeighbors, ComputeSurfaceAreaToVolume, BadDataNeighborOrientationCheck
Group C: Morphological / Neighbor Replacement (5 filters)
Z-slice rolling buffers for all 6 face-neighbor reads from RAM.
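The rolling-buffer idea can be sketched as follows, assuming a simple neighbor-counting task rather than any specific filter in this PR. ReadRange is a hypothetical stand-in for a bulk read such as copyIntoBuffer, and a std::vector plays the role of the OOC store. Each Z-slice is read once, and all six face-neighbor lookups hit the three resident slice buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical bulk read: copies one contiguous range from the backing store
// into a caller-owned buffer.
void ReadRange(const std::vector<std::int32_t>& store, std::size_t start, std::size_t count, std::vector<std::int32_t>& out)
{
  assert(start + count <= store.size());
  out.assign(store.begin() + start, store.begin() + start + count);
}

// For every voxel, counts the face neighbors (up to 6) whose id differs.
// Only the prev/cur/next Z-slices of the input are resident at any time.
std::vector<std::int8_t> CountDifferentFaceNeighbors(const std::vector<std::int32_t>& ids, std::size_t nx, std::size_t ny, std::size_t nz)
{
  assert(ids.size() == nx * ny * nz);
  // A real OOC variant would buffer the output per slice as well; a plain
  // vector keeps the sketch short.
  std::vector<std::int8_t> result(ids.size(), 0);
  if(ids.empty())
  {
    return result;
  }
  const std::size_t sliceSize = nx * ny;
  std::vector<std::int32_t> prev;
  std::vector<std::int32_t> cur;
  std::vector<std::int32_t> next;
  ReadRange(ids, 0, sliceSize, cur);
  for(std::size_t z = 0; z < nz; z++)
  {
    const bool hasPrev = z > 0;
    const bool hasNext = z + 1 < nz;
    if(hasNext)
    {
      ReadRange(ids, (z + 1) * sliceSize, sliceSize, next);
    }
    for(std::size_t y = 0; y < ny; y++)
    {
      for(std::size_t x = 0; x < nx; x++)
      {
        const std::size_t i = y * nx + x;
        const std::int32_t id = cur[i];
        int count = 0;
        count += (x > 0 && cur[i - 1] != id) ? 1 : 0;
        count += (x + 1 < nx && cur[i + 1] != id) ? 1 : 0;
        count += (y > 0 && cur[i - nx] != id) ? 1 : 0;
        count += (y + 1 < ny && cur[i + nx] != id) ? 1 : 0;
        count += (hasPrev && prev[i] != id) ? 1 : 0;
        count += (hasNext && next[i] != id) ? 1 : 0;
        result[z * sliceSize + i] = static_cast<std::int8_t>(count);
      }
    }
    // Rotate the window: cur becomes prev, next becomes cur. The old prev
    // buffer is recycled as the next read target.
    std::swap(prev, cur);
    std::swap(cur, next);
  }
  return result;
}
```

Swapping the vectors instead of copying them keeps the rotation O(1) per slice, so the only per-slice cost beyond the computation is one bulk read.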
SliceBufferedTransfer for type-dispatched bulk tuple copy.
Filters: ErodeDilateBadData, ErodeDilateCoordinationNumber, ErodeDilateMask, ReplaceElementAttributesWithNeighborValues, NeighborOrientationCorrelation
Group D: CCL Segmentation (5 filters)
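As background for this group, connected component labeling with a union-find can be sketched in a few lines. This is a generic two-pass CCL under stated assumptions, not the PR's implementation: UnionFindSketch and LabelComponents are hypothetical names, voxels are treated as connected when they share a face and have equal values, and the full label array is kept in memory for brevity. The point it illustrates is that the first pass visits voxels in storage order and only looks backward, which is what makes the approach friendlier to sequential chunked reads than a flood fill that jumps around the volume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal union-find with path halving (hypothetical, for illustration only).
class UnionFindSketch
{
public:
  std::int32_t makeSet()
  {
    m_Parent.push_back(static_cast<std::int32_t>(m_Parent.size()));
    return m_Parent.back();
  }

  std::int32_t find(std::int32_t x)
  {
    assert(x >= 0 && static_cast<std::size_t>(x) < m_Parent.size());
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]]; // path halving
      x = m_Parent[x];
    }
    return x;
  }

  void unite(std::int32_t a, std::int32_t b)
  {
    a = find(a);
    b = find(b);
    if(a != b)
    {
      m_Parent[std::max(a, b)] = std::min(a, b); // smaller label wins
    }
  }

  std::size_t size() const
  {
    return m_Parent.size();
  }

private:
  std::vector<std::int32_t> m_Parent;
};

// Returns compact component ids starting at 1, numbered in first-seen order.
std::vector<std::int32_t> LabelComponents(const std::vector<std::int32_t>& values, std::size_t nx, std::size_t ny, std::size_t nz)
{
  assert(values.size() == nx * ny * nz);
  std::vector<std::int32_t> labels(values.size(), -1);
  UnionFindSketch uf;
  const std::size_t strides[3] = {1, nx, nx * ny};
  // Pass 1: provisional labels; record equivalences with backward neighbors.
  for(std::size_t i = 0; i < values.size(); i++)
  {
    const std::size_t coords[3] = {i % nx, (i / nx) % ny, i / (nx * ny)};
    for(int axis = 0; axis < 3; axis++)
    {
      if(coords[axis] == 0)
      {
        continue; // no backward neighbor along this axis
      }
      const std::size_t n = i - strides[axis];
      if(values[n] != values[i])
      {
        continue;
      }
      if(labels[i] < 0)
      {
        labels[i] = labels[n];
      }
      else
      {
        uf.unite(labels[i], labels[n]);
      }
    }
    if(labels[i] < 0)
    {
      labels[i] = uf.makeSet();
    }
  }
  // Pass 2: resolve every provisional label to its root and compact the ids.
  std::vector<std::int32_t> finalIds(uf.size(), 0);
  std::int32_t nextId = 1;
  for(std::int32_t& label : labels)
  {
    const std::int32_t root = uf.find(label);
    if(finalIds[root] == 0)
    {
      finalIds[root] = nextId++;
    }
    label = finalIds[root];
  }
  return labels;
}
```

For a 3x2 grid with values 1 0 1 / 1 1 1, the two top corners receive different provisional labels in the first row and are merged when the bottom row connects them, so the second pass assigns both the same final id.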
Chunk-sequential Connected Component Labeling using UnionFind equivalence tracking, replacing BFS/DFS flood fill for OOC data.
Filters: ScalarSegmentFeatures, EBSDSegmentFeatures, CAxisSegmentFeatures, FillBadData, IdentifySample
Group E: AlignSections Family (4 filters)
Bulk slice read/write via AlignSectionsTransferDataOocImpl. Per-filter OOC findShifts with 2-slice buffers and bulk mask reads.
Filters: AlignSectionsMisorientation, AlignSectionsMutualInformation, AlignSectionsFeatureCentroid, AlignSectionsListFilter
QuickSurfaceMesh
DispatchAlgorithm<QuickSurfaceMeshDirect, QuickSurfaceMeshScanline>. Scanline replaces the O(volume) nodeIds array (7.5 GB for 1000³) with rolling 2-plane node buffers (16 MB). Two-pass architecture: counting pass + mesh creation pass. All output arrays (triangle connectivity, faceLabels, vertex coordinates, nodeTypes) are buffered per z-slice and flushed with copyFromBuffer. Batch quickSurfaceTransferBatch API added to TupleTransfer for bulk source-read/dest-write of cell and feature data.
SurfaceNets
DispatchAlgorithm<SurfaceNetsDirect, SurfaceNetsScanline>. Scanline is a complete reimplementation (881 lines) that eliminates the O(n) Cell[] array, using an O(surface) hash map + vertex vectors with slice-by-slice FeatureIds reading. All output arrays (vertices, nodeTypes, triangle connectivity, faceLabels) are buffered and flushed with copyFromBuffer. Batch surfaceNetsTransferBatch API added to TupleTransfer for bulk I/O.
Mesh Infrastructure (RepairTriangleWinding + GeometryHelpers)
RepairTriangleWinding: Bulk-reads triangle face list and faceLabels into local buffers; all BFS work operates on local memory; modified triangles written back via copyFromBuffer.
FindElementsContainingVert/FindElementNeighbors (GeometryHelpers.hpp): Chunked bulk I/O with 65K-element chunks for sequential passes. Random neighbor lookups check whether the candidate is in the current chunk (cache hit) before falling back to per-element copyIntoBuffer. Together with RepairTriangleWinding buffering, this reduced SurfaceNets Winding from 515s to 2.9s.
Clustering Filters (3 filters)
DBSCAN: DispatchAlgorithm<DBSCANDirect, DBSCANScanline> with chunked grid construction and on-demand per-grid-cell coordinate reads in canMerge. 653s → 12s (54x)
ComputeKMedoids: DispatchAlgorithm<Direct, Scanline> with chunked findClusters and per-cluster optimizeClusters with O(max_cluster_size) peak memory. 74s → 13s (5.7x)
ComputeFeatureClustering: Single implementation with feature-level array caching. 203s → 77s (2.6x)
Pipeline Prerequisite Filters (2 filters)
MultiThresholdObjects: DispatchAlgorithm<Direct, Scanline>; eliminates the O(n) tempResultVector in the OOC path
ConvertOrientations: Single implementation with chunked bulk I/O in macro-generated Convertor classes (4096-tuple chunks)
Together these reduced the AlignSectionsMisorientation pipeline test from 635s to 5.9s (107x).
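The chunked bulk I/O pattern behind both entries can be sketched with a simple threshold pass. This is an illustration under assumptions, not code from the PR: StoreSketch is a hypothetical in-memory stand-in for a data store, its copyIntoBuffer/copyFromBuffer take a pointer and a count so the example does not depend on a span type, and ThresholdChunked is an invented function name. Peak extra memory is two chunk-sized buffers, with no O(n) temporary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data store that only offers bulk range copies.
template <class T>
class StoreSketch
{
public:
  explicit StoreSketch(std::vector<T> data)
  : m_Data(std::move(data))
  {
  }

  std::size_t size() const
  {
    return m_Data.size();
  }

  // Copies `count` values starting at `start` into a caller-owned buffer.
  void copyIntoBuffer(std::size_t start, T* buffer, std::size_t count) const
  {
    assert(start + count <= m_Data.size());
    std::copy_n(m_Data.begin() + start, count, buffer);
  }

  // Copies `count` values from a caller-owned buffer into the store.
  void copyFromBuffer(std::size_t start, const T* buffer, std::size_t count)
  {
    assert(start + count <= m_Data.size());
    std::copy_n(buffer, count, m_Data.begin() + start);
  }

  const std::vector<T>& data() const
  {
    return m_Data;
  }

private:
  std::vector<T> m_Data;
};

// Writes mask[i] = 1 where input[i] > threshold, one chunk at a time.
void ThresholdChunked(const StoreSketch<float>& input, StoreSketch<std::uint8_t>& mask, float threshold, std::size_t chunkSize)
{
  assert(input.size() == mask.size());
  assert(chunkSize > 0);
  std::vector<float> inBuffer(chunkSize);
  std::vector<std::uint8_t> outBuffer(chunkSize);
  for(std::size_t start = 0; start < input.size(); start += chunkSize)
  {
    // The final chunk may be shorter than chunkSize.
    const std::size_t count = std::min(chunkSize, input.size() - start);
    input.copyIntoBuffer(start, inBuffer.data(), count);
    for(std::size_t i = 0; i < count; i++)
    {
      outBuffer[i] = inBuffer[i] > threshold ? 1 : 0;
    }
    mask.copyFromBuffer(start, outBuffer.data(), count);
  }
}
```

The inner loop touches only local buffers, so the store is accessed twice per chunk rather than twice per element; the chunk size is a tuning parameter that trades buffer memory for fewer bulk calls.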
OrientationAnalysis Misc (10 filters)
ComputeTwinBoundaries: Bulk-read all face/feature/ensemble arrays into local vectors. 179s → 44s (4x)
ComputeKernelAvgMisorientations: Slab-based bulk I/O with cached CrystalStructures
ComputeAvgCAxes: Already OOC-optimized (chunked reads, cached feature output). Compute-bound.
ReadH5Ebsd: copyFromBuffer in CopyData template, phase copy, Euler interleaving. 463s → 241s (1.9x)
ComputeGBCDPoleFigure: DispatchAlgorithm<Direct, Scanline>; Direct caches the full GBCD, Scanline caches only the phase-of-interest slice (bounded by bin resolution, not cell count). 853s → 0.9s (948x)
ComputeFeatureReferenceCAxisMisorientations: Z-slice buffered I/O for all cell-level arrays (featureIds, cellPhases, quats, output). Cached ensemble/feature-level arrays (crystalStructures, avgCAxes). 196s → 5.4s (36x)
ComputeFeatureNeighborCAxisMisalignments: Bulk-read all feature-level arrays (featurePhases, featureAvgQuat, crystalStructures) and buffered avgCAxisMisalignment output.
MergeTwins: Chunked bulk I/O for voxel-level parent ID fill and assignment loop. Feature-level featureParentIds cached locally for lookup. 67s → 1.8s (37x)
ReadCtfData: Bulk copyFromBuffer for all cell arrays (phases, euler angles, bands, error, MAD, BC, BS, X, Y). Euler angle interleave uses chunked 64K buffer. Crystal structures cached locally for hex correction. 231s → 0.25s
ReadAngData: Same bulk copyFromBuffer pattern. Phase validation done in-place on EbsdLib buffer before single bulk write. Euler interleave chunked.
Pipeline-Critical Filters (6 filters)
Optimizations targeting the filters responsible for OOC pipeline timeouts (4 of 5 timed-out pipelines blocked by ComputeIPFColors):
ComputeIPFColors: DispatchAlgorithm<ComputeIPFColorsDirect, ComputeIPFColorsScanline>. Direct keeps the parallel ParallelDataAlgorithm for in-core; Scanline uses chunked sequential bulk I/O (65K-tuple chunks) with locally cached crystal structures. ForceOocAlgorithmGuard added to test. 1,937ms → 90ms (21.5x)
ComputeFeatureSizes: Chunked copyIntoBuffer for featureIds (ImageGeom path) and featureIds + elemSizes (RectGridGeom path with Kahan summation preserved). 813ms → 28ms (29x)
ComputeAvgOrientations: Chunked featureIds/phases/quats reads, locally cached crystal structures and avgQuats (feature-level). Bulk copyFromBuffer for output arrays.
ComputeFeatureReferenceMisorientations: Chunked all cell-level arrays (featureIds, phases, quats, GB distances, output misorientations). Locally cached crystal structures, avgQuats, and center quaternions (all feature/ensemble-level). 106ms → 1ms (106x)
ComputeFeatureCentroids: Replaced AbstractDataStore intermediate arrays (sum, center, count, rangeX/Y/Z) with plain std::vector, which eliminates ~119M virtual dispatch calls per run. Chunked featureIds reads. Inline coordinate computation from spacing/origin. 39,724ms → 25ms (1,589x)
RequireMinimumSizeFeatures: Three-part optimization:
  removeSmallFeatures: Chunked featureIds read/write (65K-tuple batches)
  assignBadVoxels: 3-slice rolling slab buffer for neighbor voting scan (O(slice) memory), sparse changed-voxel tracking to skip full-volume transfer when few/no voxels changed. 14,592ms → 142ms (103x)
  RemoveInactiveObjects (shared utility in DataGroupUtilities.cpp): Chunked featureIds renumbering with copyIntoBuffer/copyFromBuffer. 5,573ms → 50ms (111x)
Additional Filters
ComputeEuclideanDistMap: Bulk-read featureIds and distance stores into local vectors; flood-fill operates on local memory; bulk-write output. 116s → 1.1s (105x)
AppendImageGeometry: Bulk I/O for mirror operations (scanline-based reversal instead of per-tuple swaps). 469s → 113s (4.2x)
GBCD Filter Group (5 filters)
All five GBCD filters optimized for OOC with zero cell-level O(n) allocations, cancel checking, and progress messaging:
ComputeGBCDPoleFigure: DispatchAlgorithm<Direct, Scanline> with ForceOocAlgorithmGuard in test. Scanline caches only the phase-of-interest GBCD slice via copyIntoBuffer.
WriteGBCDGMTFile: Phase-of-interest GBCD slice cached via copyIntoBuffer; crystal structures cached locally.
WriteGBCDTriangleData: Chunked triangle I/O (8K chunks), feature-level euler cache, buffered file output via fmt::format_to + fmt::memory_buffer.
ComputeGBCD: Feature-level caching (eulers, phases, crystalStructures), chunked triangle array reads per 50K-triangle iteration, GBCD output accumulated in local buffer (bounded by phases × bins) then written back via copyFromBuffer.
ComputeGBCDMetricBased: Eliminated O(n) triIncluded allocation (replaced with per-chunk sequential area accumulation). Feature-level caching (phases, eulers, crystalStructures, featureFaceLabels). Chunked triangle I/O in totalFaceArea scan. Raw pointer access in parallel TrianglesSelector worker.
HDF5 Import + Pole Figure Filters (3 filters)
FillOocDataStore (shared infrastructure): Streaming chunked HDF5 hyperslab reads + copyFromBuffer, with zero O(n) temp allocations; batched reads even for partial hyperslabs. Benefits all HDF5 import paths.
ReadH5EspritData: copyFromBuffer bulk writes from raw HDF5 reader buffers, replacing 9+ per-element operator[] writes per point.
WritePoleFigure: Chunked iteration over eulerAngles/phases/mask per-phase using bounded buffers (no O(n) pre-caching); bulk-write intensity and image outputs via copyFromBuffer.
ReadHDF5Dataset: Cancel checking + per-dataset progress messages.
WritePoleFigureTest and ReadHDF5DatasetTest optimized with copyIntoBuffer.
Core Utilities + Geometry Filters
ImportFromBinaryFile: copyFromBuffer instead of per-element writes. ReadRawBinary Case1: 1076s → 29s (37x)
CropImageGeometry: Row-based bulk I/O. 27s → 2.6s (10x)
RandomizeFeatureIds (ClusteringUtilities): Chunked bulk I/O for both overloads; benefits all callers (segmentation filters, SharedFeatureFace, MergeTwins).
AppendData/CopyData/mirror swaps: Runtime OOC check; chunked bulk I/O for OOC, original code for in-core (verified zero in-core regression)
TupleTransfer: Added quickSurfaceTransferBatch and surfaceNetsTransferBatch batch APIs with bulk copyIntoBuffer/copyFromBuffer for source reads and destination writes. Used by QuickSurfaceMeshScanline and SurfaceNetsScanline.
Cancel + Progress Messaging
All in-core and OOC algorithm variants now have:
m_ShouldCancel checks at the top of major outer loops
ThrottledMessenger-based progress reporting with descriptive phase messages and percentage completion
OOC Performance Results
All benchmarks on arm64 Release build with forceOocData = true.
Mesh Generation Filters (full ctest wall-clock, OOC build)
Groups B–E (200³ dataset, filter.execute() only)
Pipeline-Critical Filters (filter.execute() only, OOC build)
OrientationAnalysis Filters (full ctest wall-clock, OOC build)
GBCD Filter Group (full ctest wall-clock)
HDF5 Import + Pole Figure Filters (full ctest wall-clock)
Additional Optimizations (full ctest wall-clock)
Test Infrastructure
Rotation Filter Bulk I/O
RotateSampleRefFrame: Slab-based bulk I/O in RotateImageGeometryWithNearestNeighbor; reads source Z-slabs via copyIntoBuffer, processes output slices into local buffers, writes via copyFromBuffer. No O(n) allocation.
RotateEulerRefFrame: Chunked copyIntoBuffer/copyFromBuffer (65K tuples per chunk). 19.5s → 4.8s (4x)
Comparison Function Bulk I/O
CompareFloatArraysWithNans, CompareArrays, and CompareDataArraysByComponent in UnitTestCommon.hpp were doing per-element operator[] access, causing extreme slowdowns when comparing OOC-backed arrays. Replaced with chunked copyIntoBuffer reads (40K elements per chunk), matching the existing CompareDataArrays pattern. This alone reduced the ComputeGBCD test from 1500s (timeout) to ~10s; the filter itself runs in ~3s.
ForceOocAlgorithmGuard coverage in all optimized filter tests for both algorithm paths
SIMPLNX_TEST_ALGORITHM_PATH CMake option (0=Both, 1=OOC-only, 2=InCore-only) for build-specific test path control
Test Plan