
Is there GPU/CUDA support for eddy? #94

Open
finalelement opened this issue Jun 1, 2021 · 9 comments

@finalelement

The Dockerfile does not appear to include any CUDA installation. Are there plans to extend this app with GPU support in the future?

mrtrix3_connectome.py: [WARNING] CUDA version of FSL "eddy" present on system, but does not execute successfully; OpenMP version will instead be used

@Lestropie
Collaborator

Hi Vishwesh,

I have not yet invested time into attempting to get CUDA working within Docker for this tool. Singularity is much more amenable to running on high-performance computing services, and I was able to get the CUDA version of FSL's eddy working from within the Singularity container with fairly minimal guidance from my local HPC sysadmin. This script can also be used in a stand-alone fashion outside of a container, as long as the requisite software is installed on your system; in that scenario, if CUDA is properly set up locally, it will be utilised as well.
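
(For concreteness: a stand-alone invocation looks much the same as the containerised examples later in this thread, just without the singularity wrapper; the paths and subject label here are placeholders only.)

mrtrix3_connectome.py /data/bids /data/derivatives participant \
    -participant_label 01 \
    -scratch /data/work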

If there's demand for getting CUDA working from within Docker, I can keep it listed as an issue; but I'm unlikely to be able to implement a solution myself in the near future, as I currently don't have easy access to a Linux system with functioning CUDA.

Cheers
Rob

@bwinsto2 commented Jul 22, 2021

Hi Rob, thanks for making this great app! Any tips on getting eddy_cuda working in Singularity? Is there a specific version of CUDA I need to match the eddy that's in the container?

In the past (but not on this particular computer), I've had success running singularity run with the --nv flag to get eddy_cuda working for a different BIDS app. On this computer I've gotten eddy_cuda working for a different BIDS app through Docker. I was also able to get the nvidia/cuda Singularity image to output nvidia-smi, meaning my Singularity/CUDA compatibility should be working.

This is my command:
[screenshot of the singularity run command]

And here's the error I get (assuming the system in this case means the container):

[screenshot of the error message]

Would be grateful for any advice!

Brian

@Lestropie
Collaborator

Hi Brian,

For me personally, getting CUDA working through Singularity only required a fairly brief exchange with the sysadmins of the relevant HPC. It involved setting the environment variable SINGULARITYENV_LD_LIBRARY_PATH in order to control the contents of LD_LIBRARY_PATH within the container's runtime environment, and adding a dedicated bind path for the CUDA libraries corresponding to the loaded CUDA module.
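
In rough terms that amounts to something like the following (the module name, library paths and container name are placeholders for whatever your own HPC provides; treat it as a sketch rather than a recipe):

module load cuda/9.1
# Propagate the host CUDA libraries onto LD_LIBRARY_PATH inside the container
export SINGULARITYENV_LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/lib
# Bind the host CUDA installation into the container and enable the NVIDIA runtime
singularity run --nv --bind /usr/local/cuda-9.1:/usr/local/cuda-9.1 container.sif <usual arguments>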

If you want to see the particular complaint generated by eddy_cuda, you could run the container with -output_verbosity 4 (#49), which will result in the scratch directory being created in your output path; you can then dig through to the dwifslpreproc scratch directory within that, which should contain the file eddy_cuda_failure_output.txt. Most likely it'll be a "library not found" error, but you could confirm nonetheless.
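
(If it helps, once such a run has completed, that file can be tracked down with something like the following; the exact scratch directory layout may differ between versions:)

find /path/to/output -name eddy_cuda_failure_output.txt -exec cat {} \;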

If there are other tools that are able to provide CUDA support in Singularity in a way that requires jumping through fewer hoops from the user's perspective, I'd be interested in adopting such solutions; but it's not something I can afford to spend a large amount of time investigating.

Hope that's enough to set you on a path to a working solution
Rob

@bwinsto2

Thanks for the reply, Rob. A colleague and I have been trying to get this working for a few days without success. Do you still have access to the exact command you used? We are running it on a machine with only CUDA 9.1 installed, so versioning shouldn't be an issue, yet we are getting the same error message as earlier in this thread.

We've tried using --env LD_LIBRARY_PATH=/usr/local/cuda-9.1, and also binding in /usr/local/cuda-9.1 with the -B flag, but we get the same error message; we're also not sure whether the --nv flag is correct to use in this case. I can confirm with a check of env inside the container that LD_LIBRARY_PATH includes /usr/local/cuda-9.1, and running nvidia-smi inside the container also works, so this is perplexing.
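
For reference, the sort of sanity checks described above look roughly like this (the image name is just ours, shown only as an illustration):

# Check that LD_LIBRARY_PATH inside the container includes the CUDA path
singularity exec --nv -B /usr/local/cuda-9.1 \
    --env LD_LIBRARY_PATH=/usr/local/cuda-9.1 \
    mrtrix3_connectome.sif bash -c 'echo $LD_LIBRARY_PATH'
# Confirm the GPU driver is visible from inside the container
singularity exec --nv mrtrix3_connectome.sif nvidia-smi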

If you still have the exact command you used that would be a helpful piece of information.

Cheers,
Brian

@Lestropie
Collaborator

For my own use I'm actually executing a different container, but mrtrix3_connectome.py is invoked from within it. Here's what my SLURM script looks like (with some private paths redacted via environment variables):

module load singularity/3.5.2
module load cuda/9.1

BIDSDIR=$SCRATCH/BIDS/
OUTDIR=$SCRATCH/derivatives/
SCRATCHDIR=$SCRATCH/work/
SINGULARITY_IMAGE=$SCRATCH/container.sif
HOST_LD_LIBRARY_PATH=/usr/local/cuda/9.1/extras/CUPTI/lib64:/usr/local/cuda/9.1/lib64:/usr/local/cuda/9.1/lib:/usr/local/cuda/9.1/lib64/stubs:/opt/munge-0.5.11/lib:/opt/slurm-19.05.4/lib:/opt/slurm-19.05.4/lib/slurm

nvidia-smi

cmd="SINGULARITYENV_LD_LIBRARY_PATH=$HOST_LD_LIBRARY_PATH \
    singularity run --nv \
    --bind $PROJECT:$PROJECT,$SCRATCH:$SCRATCH,/usr/local/cuda/9.1:/usr/local/cuda/9.1 \
    $SINGULARITY_IMAGE \
    $BIDSDIR $OUTDIR participant \
    -participant_label $SUBJECT \
    -scratch $SCRATCHDIR "
echo $cmd; eval $cmd

@bwinsto2

Good lookin' out, @Lestropie! We got it to work. Here is the command we used, in case it's useful to anyone who sees this. We are running on GCP, on RHEL 7, with only CUDA 9.1 installed:

PROJECT=[redacted]
BIDSDIR=$PROJECT/raw_data
OUTDIR=$PROJECT/derivatives
ANAT=$OUTDIR/fmriprep/sub-105923/anat/sub-105923_desc-preproc_T1w.nii.gz
SINGULARITY_IMAGE=[redacted]/mrtrix3_connectome.sif
HOST_LD_LIBRARY_PATH=/usr/local/cuda-9.1/extras/CUPTI/lib64:/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/lib:/usr/local/cuda-9.1/lib64/stubs
SCRATCHDIR=$PROJECT/work
SUBJECT=105923
cmd="SINGULARITYENV_LD_LIBRARY_PATH=$HOST_LD_LIBRARY_PATH \
    singularity run --nv \
    --bind $PROJECT:$PROJECT,/usr/local/cuda-9.1:/usr/local/cuda-9.1 \
    $SINGULARITY_IMAGE \
    $BIDSDIR $OUTDIR preproc \
    --participant_label $SUBJECT \
    --scratch $SCRATCHDIR \
    --t1w_preproc $ANAT
 "	
echo $cmd; eval $cmd

@mschira commented Sep 8, 2022

Hey folks, I am on a mission to do the same on my end :-). We have an HPC cluster with GPUs, and talking to the sysadmin, he raised the concern that the newer cluster with A100s won't support CUDA 9.1 but would require CUDA 11. Has anyone run into that issue? It seems a bit disastrous, as it would suggest that one would need an RTX 10xx generation GPU.

@Lestropie
Collaborator

This would depend entirely on FSL compiling the eddy source for newer CUDA versions. I don't know whether their latest patch updates have anything in that regard.

@mschira commented Sep 22, 2022

Hi
We got eddy_cuda9.1 working without a container on an RTX 2080 - which wasn't a given; indeed, the FSL site seems to suggest it would not work. I haven't got it working on the A100 of the HPC yet. I've heard suggestions that only CUDA 11 works on the A100, but given that similar channels suggest 9.1 only works on the RTX 10 series, I am skeptical that is true; I will report once I know more.

Also, there is now an eddy_cuda10.2, so that looks promising.
We are also trying to get the Neurodesk containers working on an HPC system. The containers work, seemingly including CUDA, but eddy_cuda9.1 (and every other eddy_cuda) does not.
But I know Steffen Bollmann is working on fixing that.
I am sure Steffen will write some documentation once successful; I will probably add a second layer of documentation too, once it is all working. Hopefully that'll make it easier for others to follow.
