Memory management for target images

If we need to deal with memory issues, here is an idea.

# Background
Our current test case uses 300 brain images and is able to load all the images into memory at once in [nilearn_prototype.py](https://github.com/brain-microstructure-exploration-tools/abcd-data-exploration/blob/main/prototype/nilearn_prototype.py).

Currently in [workflow.py](https://github.com/brain-microstructure-exploration-tools/abcdstats/blob/main/src/abcdstats/workflow.py), we instead create lazily loading `nib.filebasedimages.FileBasedImage` objects for each of the 300 images and pass the list of them to `nilearn.maskers.NiftiMasker.fit_transform()` to mask them.  I don't know the `nilearn` internals, but it is possible that with this approach only the output of `fit_transform()` -- the voxels that survive masking -- is in memory for all images at once.

# Idea
Regardless, a larger set of target images might face memory issues.  The idea for dealing with that is to make multiple calls to `nilearn.mass_univariate.permuted_ols`, each focusing on a different subset of the voxels.  This can be done because each voxel is analyzed separately -- well, see below -- though likely in a pipeline that handles several of them in parallel.  So we could slice the brain into sections of `[:, :, z]` (or `[:, :, min_z:max_z]`) for each image before masking, mask with a similarly sliced mask, and give just one slice at a time to `permuted_ols`.

## Implementation details
1. We are going to want to use the likes of
```python
import nibabel as nib

img = nib.load(filename)  # lazy: reads the header, not the voxel data
z_slice = img.dataobj[:, :, z]  # Yes: only this slice is loaded into memory
z_slice = img.get_fdata()[:, :, z]  # No! Loads the whole volume first
```
because `get_fdata()` loads in _and caches_ all voxels, not just the desired ones.

2. We are likely going to want to use `[:, :, z]` rather than `[x, :, :]` because NIfTI data is stored in Fortran order (first index varying fastest), not C order, so a fixed-`z` slice is a contiguous block of the file.  It seems that we will find that `z_slice.flags['F_CONTIGUOUS']` is `True`.  I suppose our code could check the flag each run and behave accordingly.

1. We may want to use `[:, :, min_z:max_z]` instead of a single `z` value at a time.  We should probably pick a RAM budget and then compute the value of `max_z - min_z` that uses about that much RAM when summed across all our images.

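1. `nilearn.mass_univariate.permuted_ols` returns negative log10 p-values that are family-wise corrected with a max-statistic procedure: each voxel is compared against the permutation distribution of the maximum statistic over the voxels that were passed in.  A per-slice call therefore will not know about the tests done in the other slices.  Hopefully this can be corrected with some simple mathematics on the output of the `permuted_ols` calls.  For example, if every call uses the same permutations (say, via the same `random_state`, which we would need to verify), then taking the per-permutation maximum across the slices' null distributions should recover the whole-brain null distribution, and the original scores can be re-ranked against that.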

1. If we ask for threshold-free cluster enhancement (TFCE) from `permuted_ols` -- which we currently are _not_ doing because we can't get it to work reasonably -- then beware that the clusters are based upon voxel neighborhoods.  So, we'd get clusters within each slice, but that's unlikely to be directly useful, and I don't have any big ideas on how to combine the slice clusters across slices.  I suppose it is possible that we could call `permuted_ols` by the slice, without `tfce`, and collate the answers.  We might then find a routine that applies TFCE (once) to that now recombined data set.  (The recombined output will be much smaller than the input, so memory issues wouldn't be a factor at this point.)
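
To make the RAM-targeting point above concrete, here is a minimal sketch of how the slab thickness and the `(min_z, max_z)` ranges might be computed.  The function name `slab_ranges`, the 8 bytes per voxel (i.e. `float64`), and the 140 x 140 x 140 image shape in the usage example are all assumptions for illustration, not values taken from our data.  It budgets for the slab before masking, so it is an upper bound on what survives masking.

```python
def slab_ranges(shape, n_images, budget_bytes, bytes_per_voxel=8):
    """Yield half-open (min_z, max_z) ranges covering the z axis.

    Hypothetical helper: each slab, summed across all images, should need
    roughly budget_bytes or less before masking.  At least one plane is
    always used, even if that exceeds the budget.
    """
    nx, ny, nz = shape
    # RAM for a single z plane, summed across every image.
    bytes_per_plane = nx * ny * bytes_per_voxel * n_images
    # Thickest slab that fits the budget, clamped to the range [1, nz].
    thickness = max(1, min(nz, budget_bytes // bytes_per_plane))
    for min_z in range(0, nz, thickness):
        yield min_z, min(min_z + thickness, nz)


# Usage with made-up numbers: 300 images of 140 x 140 x 140 voxels, 1 GiB budget.
ranges = list(slab_ranges((140, 140, 140), n_images=300, budget_bytes=2**30))
print(ranges[0], ranges[-1], len(ranges))
```

With these made-up numbers, a 1 GiB budget gives slabs of 22 planes, so the volume is covered in 7 calls.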

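For the multiple-testing point above, the "simple mathematics" might look like the following sketch.  It assumes that each per-slab call can give us the observed statistic for every voxel plus, for each permutation, the maximum statistic within that slab, and that every call used the same permutations in the same order; we would need to verify both.  The function name `combine_fwe` and the `(1 + count) / (1 + n_perm)` p-value convention are my assumptions for illustration, not something taken from `nilearn`.

```python
def combine_fwe(slab_scores, slab_null_max):
    """Recompute family-wise corrected p-values across all slabs.

    slab_scores:   one list per slab of observed statistics, one per voxel.
    slab_null_max: one list per slab of length n_perm; entry i is the
                   maximum statistic within that slab under permutation i.
    Returns one list per slab of corrected p-values, one per voxel.
    """
    n_perm = len(slab_null_max[0])
    # Whole-brain null: for each permutation, the maximum over all slabs.
    global_null = [max(slab[i] for slab in slab_null_max) for i in range(n_perm)]
    corrected = []
    for scores in slab_scores:
        corrected.append([
            # Fraction of permutations whose whole-brain maximum reaches
            # the observed score, counting the unpermuted data once.
            (1 + sum(1 for h in global_null if h >= s)) / (1 + n_perm)
            for s in scores
        ])
    return corrected


# Toy usage: two slabs, three permutations.
scores = [[3.0, 1.0], [5.0]]
null_max = [[2.0, 1.5, 4.0], [1.0, 3.5, 2.5]]
p_values = combine_fwe(scores, null_max)
print(p_values)
```

In the toy data the whole-brain null is `[2.0, 3.5, 4.0]`, so the score of 3.0 in the first slab is beaten by two of the three permutations even though only one permutation beats it within its own slab.  That is the effect the per-slab p-values would miss.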