
Is there GPU/CUDA support for eddy? #94

Open
finalelement opened this issue Jun 1, 2021 · 9 comments

@finalelement

The Dockerfile does not appear to include any CUDA installation. Are there plans to extend this app with GPU support in the future?

mrtrix3_connectome.py: [WARNING] CUDA version of FSL "eddy" present on system, but does not execute successfully; OpenMP version will instead be used

@Lestropie
Collaborator

Hi Vishwesh,

I have not yet invested time into attempting to get CUDA working within Docker for this tool. Singularity is much more amenable to running on high-performance computing services, and I was able to get the CUDA version of FSL's eddy working from within the Singularity container with fairly minimal guidance from my local HPC sysadmin. This script can also be used in a stand-alone fashion outside of a container, as long as the requisite software is installed on your system; in that scenario, if CUDA is properly set up locally, it will be utilised as well.
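
(For concreteness: a stand-alone invocation looks much the same as the containerised examples later in this thread, just without the singularity wrapper; the paths and subject label here are placeholders only.)

mrtrix3_connectome.py /data/bids /data/derivatives participant \
    -participant_label 01 \
    -scratch /data/work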

If there's demand for getting CUDA working from within Docker, I can keep it listed as an issue; but I'm unlikely to be able to implement a solution myself in the near future, as I currently don't have easy access to a Linux system with functioning CUDA.

Cheers
Rob

@bwinsto2 commented Jul 22, 2021

Hi Rob, thanks for making this great app! Any tips on getting eddy_cuda working in Singularity? Is there a specific version of CUDA I need to match the eddy that's in the container?

In the past (but not on this particular computer), I've had success running singularity run with the --nv flag to get eddy_cuda working for a different BIDS app. On this computer I've gotten eddy_cuda working for a different BIDS app through Docker. I was also able to get the nvidia/cuda Singularity image to output nvidia-smi, meaning my Singularity/CUDA compatibility should be working.

This is my command:
[screenshot of the singularity run command]

And here's the error I get (assuming the system in this case means the container):

[screenshot of the error message]

Would be grateful for any advice!

Brian

@Lestropie
Collaborator

Hi Brian,

For me personally, getting CUDA working through Singularity only required a fairly brief exchange with the sysadmins of the relevant HPC. It involved setting the environment variable SINGULARITYENV_LD_LIBRARY_PATH in order to control the contents of LD_LIBRARY_PATH within the container's runtime environment, and adding a dedicated bind path for the CUDA libraries corresponding to the loaded CUDA module.
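
In rough terms that amounts to something like the following (the module name, library paths and container name are placeholders for whatever your own HPC provides; treat it as a sketch rather than a recipe):

module load cuda/9.1
# Propagate the host CUDA libraries onto LD_LIBRARY_PATH inside the container
export SINGULARITYENV_LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/lib
# Bind the host CUDA installation into the container and enable the NVIDIA runtime
singularity run --nv --bind /usr/local/cuda-9.1:/usr/local/cuda-9.1 container.sif <usual arguments>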

If you want to see the particular complaint generated by eddy_cuda, you could run the container with -output_verbosity 4 (#49), which will result in the scratch directory being created in your output path; you can then dig through to the dwifslpreproc scratch directory within that, which should contain the file eddy_cuda_failure_output.txt. Most likely it'll be a "library not found" error, but you could confirm nonetheless.
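
(If it helps, once such a run has completed, that file can be tracked down with something like the following; the exact scratch directory layout may differ between versions:)

find /path/to/output -name eddy_cuda_failure_output.txt -exec cat {} \;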

If there are other tools that are able to provide CUDA support in Singularity in a way that requires jumping through fewer hoops from the user's perspective, I'd be interested in adopting such solutions; but it's not something I can afford to spend a large amount of time investigating.

Hope that's enough to set you on a path to a working solution
Rob

@bwinsto2

Thanks for the reply, Rob. A colleague and I have been trying to get this working for a few days without success. Do you still have access to the exact command you used? We are running it on a machine with only CUDA 9.1 installed, so versioning shouldn't be an issue, yet we are getting the same error message as earlier in this thread.

We've tried using --env LD_LIBRARY_PATH=/usr/local/cuda-9.1, and also binding in /usr/local/cuda-9.1 with the -B flag, but we get the same error message; we're also not sure whether the --nv flag is correct to use in this case. I can confirm with a check of env inside the container that LD_LIBRARY_PATH includes /usr/local/cuda-9.1, and running nvidia-smi inside the container also works, so this is perplexing.
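
For reference, the sort of sanity checks described above look roughly like this (the image name is just ours, shown only as an illustration):

# Check that LD_LIBRARY_PATH inside the container includes the CUDA path
singularity exec --nv -B /usr/local/cuda-9.1 \
    --env LD_LIBRARY_PATH=/usr/local/cuda-9.1 \
    mrtrix3_connectome.sif bash -c 'echo $LD_LIBRARY_PATH'
# Confirm the GPU driver is visible from inside the container
singularity exec --nv mrtrix3_connectome.sif nvidia-smi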

If you still have the exact command you used that would be a helpful piece of information.

Cheers,
Brian

@Lestropie
Collaborator

For my own use I'm actually executing a different container, but mrtrix3_connectome.py is invoked from within it. Here's what my SLURM script looks like (with some private paths redacted via environment variables):

module load singularity/3.5.2
module load cuda/9.1

BIDSDIR=$SCRATCH/BIDS/
OUTDIR=$SCRATCH/derivatives/
SCRATCHDIR=$SCRATCH/work/
SINGULARITY_IMAGE=$SCRATCH/container.sif
HOST_LD_LIBRARY_PATH=/usr/local/cuda/9.1/extras/CUPTI/lib64:/usr/local/cuda/9.1/lib64:/usr/local/cuda/9.1/lib:/usr/local/cuda/9.1/lib64/stubs:/opt/munge-0.5.11/lib:/opt/slurm-19.05.4/lib:/opt/slurm-19.05.4/lib/slurm

nvidia-smi

cmd="SINGULARITYENV_LD_LIBRARY_PATH=$HOST_LD_LIBRARY_PATH \
    singularity run --nv \
    --bind $PROJECT:$PROJECT,$SCRATCH:$SCRATCH,/usr/local/cuda/9.1:/usr/local/cuda/9.1 \
    $SINGULARITY_IMAGE \
    $BIDSDIR $OUTDIR participant \
    -participant_label $SUBJECT \
    -scratch $SCRATCHDIR "
echo $cmd; eval $cmd

@bwinsto2

Good lookin' out, @Lestropie! We got it to work. Here is the command we used, in case it's useful to anyone who sees this. We are running on GCP, on RHEL 7, with only CUDA 9.1 installed:

PROJECT=[redacted]
BIDSDIR=$PROJECT/raw_data
OUTDIR=$PROJECT/derivatives
ANAT=$OUTDIR/fmriprep/sub-105923/anat/sub-105923_desc-preproc_T1w.nii.gz
SINGULARITY_IMAGE=[redacted]/mrtrix3_connectome.sif
HOST_LD_LIBRARY_PATH=/usr/local/cuda-9.1/extras/CUPTI/lib64:/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/lib:/usr/local/cuda-9.1/lib64/stubs
SCRATCHDIR=$PROJECT/work
SUBJECT=105923
cmd="SINGULARITYENV_LD_LIBRARY_PATH=$HOST_LD_LIBRARY_PATH \
    singularity run --nv \
    --bind $PROJECT:$PROJECT,/usr/local/cuda-9.1:/usr/local/cuda-9.1 \
    $SINGULARITY_IMAGE \
    $BIDSDIR $OUTDIR preproc \
    --participant_label $SUBJECT \
    --scratch $SCRATCHDIR \
    --t1w_preproc $ANAT
 "	
echo $cmd; eval $cmd

@mschira commented Sep 8, 2022

Hey folks, I am on a mission to do the same on my end :-). We have an HPC cluster with GPUs, and talking to the sysadmin, he raised the concern that the newer cluster with A100s won't support CUDA 9.1 but would require CUDA 11. Has anyone run into that issue? It seems a bit disastrous, as it would suggest that one would need an RTX 10xx generation GPU.

@Lestropie
Collaborator

This would depend entirely on FSL compiling the eddy source for newer CUDA versions. I don't know whether their latest patch updates have anything in that regard.

@mschira commented Sep 22, 2022

Hi
We got eddy_cuda9.1 working without a container on an RTX 2080 - which wasn't a given; indeed, the FSL site seems to suggest it would not work. I haven't got it working on the A100 of the HPC yet. I've heard suggestions that only CUDA 11 works on the A100, but given that similar channels suggest 9.1 only works on the RTX 10 series, I am skeptical that is true; I will report once I know more.

Also, there is now an eddy_cuda10.2, so that looks promising.
We are also trying to get the Neurodesk containers working on an HPC system. The containers work, seemingly including CUDA, but eddy_cuda9.1 (and every other eddy_cuda) does not.
But I know Steffen Bollmann is working on fixing that.
I am sure Steffen will write some documentation once successful; I will probably add a second layer of documentation too, once it is all working. Hopefully that'll make it easier for others to follow.
