
t2smap OOM #3125
Open
bpinsard opened this issue Nov 1, 2023 · 4 comments

bpinsard (Collaborator) commented Nov 1, 2023

What happened?

While processing a session with 6 multi-echo runs, the jobs get killed by SLURM, despite requesting more memory from SLURM (with a buffer) than the memory limit given to fMRIPrep.
It is tedana's t2smap that crashes, so the nipype-set requirements do not seem to properly estimate the memory needs of those nodes.

I know this problem has been reported before, but it seems to still be present in 23.1.4.
Each echo .nii.gz file is approximately 0.4 GB.
The current heuristic is mem_gb = 2.5 * mem_gb * len(echo_times).
So it estimates the memory requirement at ~3 GB, but the core dumps produced when the OOM kill occurs are 8 GB, which is also what a basic top shows.

I will try to run memory profiling of t2smap alone on our data to figure out a better heuristic.
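
For reference, a back-of-the-envelope check of that heuristic (an illustrative sketch only: it assumes 3 echoes and plugs in the ~0.4 GB per-echo size quoted above, whereas the real estimate is derived from the uncompressed image size, so the exact numbers may differ):

per_echo_gb = 0.4   # approximate size of one echo file, as reported above
n_echoes = 3        # assumed echo count for this illustration

estimated_gb = 2.5 * per_echo_gb * n_echoes
print(f"heuristic estimate: {estimated_gb:.1f} GB")   # ~3 GB
# Observed peak from core dumps / top: ~8 GB, i.e. roughly 2.5x the estimate.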

What command did you use?

containers-run -m 'fMRIPrep_sub-01/ses-001' -n bids-fmriprep --input sourcedata/templateflow/tpl-MNI152NLin2009cAsym/ --input sourcedata/templateflow/tpl-OASIS30ANTs/ --input sourcedata/templateflow/tpl-fsLR/ --input sourcedata/templateflow/tpl-fsaverage/ --input sourcedata/templateflow/tpl-MNI152NLin6Asym/ --output . --input 'sourcedata/cneuromod.emotion-videos/sub-01/ses-001/fmap/' --input 'sourcedata/cneuromod.emotion-videos/sub-01/ses-001/func/' --input 'sourcedata/cneuromod.anat.smriprep.longitudinal/sub-01/anat/' --input sourcedata/cneuromod.anat.smriprep.longitudinal/sourcedata/cneuromod.anat.freesurfer_longitudinal/sub-01/ -- -w ./workdir --participant-label 01 --anat-derivatives sourcedata/cneuromod.anat.smriprep.longitudinal --fs-subjects-dir sourcedata/cneuromod.anat.smriprep.longitudinal/sourcedata/cneuromod.anat.freesurfer_longitudinal --bids-filter-file code/fmriprep_study-cneuromod.emotion-videos_sub-01_ses-001_bids_filters.json --output-layout bids --ignore slicetiming --use-syn-sdc --output-spaces MNI152NLin2009cAsym T1w:res-iso2mm --cifti-output 91k --notrack --write-graph --skip_bids_validation --omp-nthreads 8 --nprocs 8 --mem_mb 45000 --fs-license-file code/freesurfer.license \--me-output-echos --resource-monitor sourcedata/cneuromod.emotion-videos ./ participant

What version of fMRIPrep are you running?

23.1.4

How are you running fMRIPrep?

Singularity

Is your data BIDS valid?

Yes

Are you reusing any previously computed results?

Anatomical derivatives

Please copy and paste any relevant log output.

No response

Additional information / screenshots

No response

bpinsard added the bug label Nov 1, 2023
bpinsard self-assigned this Nov 1, 2023
effigies (Member) commented Nov 1, 2023

Just linking relevant issues/PRs:

bpinsard (Collaborator, Author) commented Nov 3, 2023

I realized that I get this warning in the logs:

231103-02:09:47,698 nipype.workflow WARNING:
         Some nodes exceed the total amount of memory available (45.00GB).

I cannot imagine which operation would require that amount of memory for 5-minute runs of 2 mm isotropic fMRI.
I am looking for a way to get the node mem_gb values; apparently they are not included in the graph exported with --write-graph.

Looking at the code, the likely cause is nodes set with mem_gb = mem_gb * 3 * omp_nthreads, possibly on top of the resampled mem_gb estimate; those could reach that much requested memory because I used omp_nthreads=8 (see the rough numbers below).

However, t2smap is likely not the node with the largest mem_gb requirement set in that workflow.
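
To illustrate the scale of that expression with omp_nthreads=8 (the mem_gb input here is an assumed, hypothetical value, not one taken from the actual logs):

mem_gb_input = 1.5    # hypothetical per-node mem_gb (filesize or resampled estimate)
omp_nthreads = 8

node_req_gb = mem_gb_input * 3 * omp_nthreads   # the 3 * omp_nthreads factor alone is x24
print(f"node memory request: {node_req_gb:.1f} GB")   # 36 GB with these assumed numbers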

effigies (Member) commented Nov 3, 2023

Here's what we calculate:

config.loggers.workflow.debug(
    "Creating bold processing workflow for <%s> (%.2f GB / %d TRs). "
    "Memory resampled/largemem=%.2f/%.2f GB.",
    ref_file,
    mem_gb["filesize"],
    bold_tlen,
    mem_gb["resampled"],
    mem_gb["largemem"],
)

def _create_mem_gb(bold_fname):
    img = nb.load(bold_fname)
    nvox = int(np.prod(img.shape, dtype='u8'))
    # Assume tools will coerce to 8-byte floats to be safe
    bold_size_gb = 8 * nvox / (1024**3)
    bold_tlen = img.shape[-1]
    mem_gb = {
        "filesize": bold_size_gb,
        "resampled": bold_size_gb * 4,
        "largemem": bold_size_gb * (max(bold_tlen / 100, 1.0) + 4),
    }
    return bold_tlen, mem_gb

I don't really remember the logic for the largemem one. But you should be able to see the estimates in your logs.
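
For anyone wanting to check those estimates directly, a hypothetical usage sketch (the file path is a placeholder; it assumes the _create_mem_gb function quoted above has been pasted into a session where nibabel and numpy are available):

import nibabel as nb
import numpy as np

# _create_mem_gb as quoted above, then:
bold_tlen, mem_gb = _create_mem_gb("sub-01_ses-001_task-rest_echo-1_bold.nii.gz")  # placeholder path
print(bold_tlen, mem_gb)
# -> number of TRs and {'filesize': ..., 'resampled': ..., 'largemem': ...} in GB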

effigies (Member) commented Nov 7, 2023

Let's go ahead and link ME-ICA/tedana#856.
