
Consider providing interim instructions for Linux "happy path" using Docker #802

Open
quietlychris opened this issue Jul 1, 2023 · 2 comments

Comments

@quietlychris
Contributor

Hello,

I spent part of this afternoon banging my head against a wall trying to get dfdx with the cuda feature enabled up and running on my computer. It turns out a big part of this appeared to be that my CUDA version (11.2) doesn't really work well with the build.rs script, with errors appearing in multiple steps. As I think I may have mentioned in previous issues, my set-up isn't particularly exotic (just the recent Pop!_OS release with the default NVIDIA drivers), so I suspect that other folks may run into the same issue.

According to System76's docs, the recommended way of dealing with a CUDA version mismatch is just to use Docker. While this isn't ideal (I don't love having to rely on Docker), I can confirm that this solved most of my build issues, by first following the GPU-enabled container instructions in the link above, then building a dfdx-specific container using the Dockerfile below (which takes a hot minute to build).

FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

# Get Ubuntu packages (update and install in the same layer so the package index is never stale)
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    git

# Get Rust
RUN curl https://sh.rustup.rs -sSf | bash -s -- -y
RUN echo 'source $HOME/.cargo/env' >> $HOME/.bashrc

I was just thinking that it might be worth adding this kind of process to the crate's documentation to help other people who may run into the same issue, at least until it becomes clear that the base NVIDIA-enabled system configurations shipped with distros like Pop!_OS/Ubuntu are able to support dfdx's build script.

@quietlychris
Contributor Author

quietlychris commented Jul 2, 2023

Edited because I'm occasionally Very Dumb(TM) and forgot to actually run this with the GPU passthrough. The following comment is now accurate.

Edit 2: Except this doesn't work with the PyTorch example in image-classification either, so I guess maybe it's something about the NVIDIA Docker image itself 🙃 I'll update whenever I happen to regain the willpower to continue exploring this.


I basically need to remove the nvidia-smi section of the build script in favor of nvcc. This allows dfdx to compile with cuda enabled, but it can't actually run the test suite, which exits with an error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE, "forward compatibility was attempted on non supported HW")', /usr/local/cargo/git/checkouts/cudarc-2602ad613d9c0487/cc9a8d3/src/driver/safe/core.rs:50:24

In addition, it's recommended to add pkg-config and libssl-dev to the apt-get install list.
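
For anyone who wants to try the same nvcc swap, here is a rough sketch of the idea using only the standard library. This is not dfdx's actual build.rs; `parse_nvcc_release` and `detect_cuda_release` are hypothetical names, and the sketch assumes `nvcc --version` prints a line of the form `Cuda compilation tools, release 12.1, V12.1.105`.

```rust
use std::process::Command;

/// Pull the (major, minor) toolkit version out of `nvcc --version` output.
/// Assumes a line such as "Cuda compilation tools, release 12.1, V12.1.105".
fn parse_nvcc_release(output: &str) -> Option<(u32, u32)> {
    // Everything after the first "release " marker, e.g. "12.1, V12.1.105".
    let rest = output.split("release ").nth(1)?;
    // Keep only the part before the next comma, e.g. "12.1".
    let version = rest.split(',').next()?.trim();
    let (major, minor) = version.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Run `nvcc --version` and parse the result.
/// Returns None if nvcc is not on PATH, exits with an error,
/// or prints something this sketch does not recognise.
fn detect_cuda_release() -> Option<(u32, u32)> {
    let out = Command::new("nvcc").arg("--version").output().ok()?;
    if !out.status.success() {
        return None;
    }
    parse_nvcc_release(&String::from_utf8_lossy(&out.stdout))
}

fn main() {
    match detect_cuda_release() {
        Some((major, minor)) => println!("found CUDA toolkit {major}.{minor}"),
        None => println!("nvcc not found on PATH, or its output was not recognised"),
    }
}
```

Unlike nvidia-smi, nvcc ships with the CUDA toolkit itself, so this check also works inside a devel container that was started without GPU passthrough. It only reports the toolkit version, though, and says nothing about whether the host driver can actually run it.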

@swfsql
Contributor

swfsql commented Aug 17, 2023

I'm not sure if you got it working, but I'm trying to learn ML while using this lib, and this is the Dockerfile for my dev environment:

FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04

# basic tools
RUN apt update \
  && apt install -y --no-install-recommends \
  git vim openssh-client gnupg curl wget ca-certificates unzip zip less zlib1g sudo coreutils sed grep
#

# cargo/rust
ENV RUSTUP_HOME=/usr/local/rustup
ENV CARGO_HOME=/usr/local/cargo
ENV PATH=/usr/local/cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# https://blog.rust-lang.org/2022/06/22/sparse-registry-testing.html
ENV CARGO_UNSTABLE_SPARSE_REGISTRY=true
RUN set -eux; \
  apt update \
  && apt install -y --no-install-recommends \
    ca-certificates gcc build-essential; \
  url="https://static.rust-lang.org/rustup/dist/x86_64-unknown-linux-gnu/rustup-init"; \
  wget "$url"; \
  chmod +x rustup-init; \
  ./rustup-init -y --no-modify-path --default-toolchain nightly; \
  rm rustup-init; \
  chmod -R a+w $RUSTUP_HOME $CARGO_HOME; \
  rustup --version; \
  cargo --version; \
  rustc --version;
#

# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup
RUN echo "export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}" >> ~/.bashrc

That's for:

[dependencies.dfdx]
version = "0.13.0"
default-features = false
features = [
    "std",
    "fast-alloc",
    "cpu",
    "cuda",
    "cudnn",
    "safetensors",
    "numpy",
    "nightly",
]
