Memory management for target images #10

@Leengit

Description

If we need to deal with memory issues, here is an idea.

Background

Our current test case uses 300 brain images and is able to load all the images into memory at once in nilearn_prototype.py.

Currently in workflow.py, we instead create a lazily loading nib.filebasedimages.FileBasedImage object for each of the 300 images, and give the list of them to nilearn.maskers.NiftiMasker.fit_transform() to mask them. I don't know the nilearn internals, but it is possible that with this approach only the output of fit_transform() -- the voxels that survive masking -- is in memory for all images at once.

Idea

Regardless, a larger set of target images might face memory issues. The idea for dealing with that is to make multiple calls to nilearn.mass_univariate.permuted_ols, each focusing on a different subset of the voxels. This can be done because each voxel is analyzed separately (with caveats; see below), though likely in a pipeline that handles several of them in parallel. So we could slice each image into sections of [:, :, z] (or [:, :, min_z:max_z]) before masking, mask with a similarly sliced mask, and give just one section at a time to permuted_ols.
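A rough sketch of how the [:, :, min_z:max_z] sections might be chosen from a memory budget. The function names, the image shape, the float64 assumption, and the 1 GiB budget are all hypothetical and only illustrate the arithmetic; the estimate covers the unmasked data, so real usage after masking should be lower.

```python
from typing import Iterator, Tuple


def slab_thickness(shape: Tuple[int, int, int], n_images: int,
                   bytes_per_voxel: int, ram_budget_bytes: int) -> int:
    """Largest number of z-planes per slab that fits the budget (at least 1)."""
    n_x, n_y, n_z = shape
    # Memory for one z-plane, summed across all images.
    bytes_per_plane = n_x * n_y * bytes_per_voxel * n_images
    return max(1, min(n_z, ram_budget_bytes // bytes_per_plane))


def z_slabs(n_z: int, thickness: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (min_z, max_z) ranges covering 0..n_z without overlap."""
    for min_z in range(0, n_z, thickness):
        yield min_z, min(min_z + thickness, n_z)


# Hypothetical numbers: 300 images of 91 x 109 x 91 voxels, held as float64,
# with roughly 1 GiB set aside for the unmasked slab data.
thickness = slab_thickness((91, 109, 91), n_images=300,
                           bytes_per_voxel=8, ram_budget_bytes=2**30)
slabs = list(z_slabs(91, thickness))
```

Each (min_z, max_z) pair would then be used to slice every image and the mask in the same way before one call to permuted_ols.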

Implementation details

  1. We are going to want to use the likes of
import nibabel as nib

img = nib.load(filename)
slab = img.dataobj[:, :, z]  # Yes: loads only the requested voxels into memory
slab = img.get_fdata()[:, :, z]  # No! Loads and caches the whole image first

because get_fdata() loads in and caches all voxels, not just the desired ones.

  2. We are likely going to want to use [:, :, z] rather than [x, :, :] because it seems that we will find that slab.flags['F_CONTIGUOUS'] == True; that is, loaded NIfTI data is in Fortran order, not C order. I suppose our code could check the flag each run and behave accordingly.

  3. We may want to use [:, :, min_z:max_z] instead of a single z value at a time. We should probably pick a RAM budget and then compute the value of max_z - min_z that uses about that much RAM when summed across all our images.

  4. nilearn.mass_univariate.permuted_ols returns p-values. There might be some sort of multiple-testing correction that is done and, if so, it will not know about the tests done in the other slices. If multiple-testing correction is being done within permuted_ols, hopefully it can be adjusted with some simple mathematics on the outputs of the permuted_ols calls.

  5. If we ask for threshold-free cluster enhancement (tfce) from permuted_ols -- which we currently are not doing because we can't get it to work reasonably -- then beware that the clusters are based upon voxel neighborhoods. So, we'd get clusters within each slice, but that's unlikely to be directly useful, and I don't have any big ideas on how to combine the slice clusters across slices. I suppose it is possible that we could call permuted_ols by the slice, without tfce, and collate the answers. We might then find a routine that applies tfce (once) to that now recombined data set. (The recombined output will be much smaller than the input, so memory issues wouldn't be a factor at this point.)
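On the multiple-testing question: if the correction turns out to be a max-statistic (family-wise) one, which is how the nilearn documentation describes permuted_ols, then the per-slice null distributions could in principle be merged by taking, for each permutation, the largest value seen in any slice. The sketch below assumes that every call used exactly the same permutations in the same order (for example the same random_state and n_perm, which would need to be verified), that the scores are already on the scale the null was built on (for example absolute values for a two-sided test), and it uses a common permutation p-value formula that is not necessarily identical to nilearn's internals. The function names are hypothetical.

```python
from typing import List, Sequence


def combine_null_max(per_slab_h0_max: Sequence[Sequence[float]]) -> List[float]:
    """Whole-brain max statistic per permutation: the max over the slabs' maxima."""
    return [max(values) for values in zip(*per_slab_h0_max)]


def fwe_p_values(scores: Sequence[float], h0_max: Sequence[float]) -> List[float]:
    """Corrected p = (1 + #{null max >= observed score}) / (1 + n_perm)."""
    n_perm = len(h0_max)
    return [(1 + sum(h >= s for h in h0_max)) / (1 + n_perm) for s in scores]


# Made-up numbers: two slabs, four permutations each.
slab_a_h0 = [2.0, 3.5, 1.0, 2.5]
slab_b_h0 = [3.0, 1.5, 2.0, 4.0]
combined_h0 = combine_null_max([slab_a_h0, slab_b_h0])

slab_a_scores = [3.2, 1.0]
p_slab_only = fwe_p_values(slab_a_scores, slab_a_h0)    # ignores slab b
p_whole_brain = fwe_p_values(slab_a_scores, combined_h0)  # accounts for both
```

With these numbers the whole-brain p-value for the first voxel is larger than the slab-only one, which is the direction expected when more tests are accounted for.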
