
[Feature]: Is there a reason CUDA 6.1 is the minimum? Would CUDA 6.0 on the P100 not work? #413

Open
Nero10578 opened this issue Apr 16, 2024 · 8 comments · Fixed by #444

Comments

@Nero10578

Nero10578 commented Apr 16, 2024

🚀 The feature, motivation and pitch

In setup.py, the build checks for compute capability 6.1 as a minimum, and that requirement is also stated in the README. Is there a technical reason compute capability 6.0 is not supported? Is it for INT8 support?

I ask because there is nothing inherently stopping vLLM, which Aphrodite is forked from, from working with compute capability 6.0 on Tesla P100 cards, as can be seen in this discussion: vllm-project/vllm#963 (comment)

if _is_cuda() and not compute_capabilities:
    # If TORCH_CUDA_ARCH_LIST is not defined or empty, target all available
    # GPUs on the current machine.
    device_count = torch.cuda.device_count()
    for i in range(device_count):
        major, minor = torch.cuda.get_device_capability(i)
        if major < 6 or (major == 6 and minor < 1):
            raise RuntimeError(
                "GPUs with compute capability below 6.1 are not supported.")
        compute_capabilities.add(f"{major}.{minor}")
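
For what it's worth, here is a minimal sketch of how that check could be relaxed to also accept compute capability 6.0. This is a hypothetical edit to the snippet above, not something already in the repo, and it only helps if the rest of the build actually emits sm_60 kernels:

# Hypothetical relaxation of the check above: allow 6.0 (P100) and up,
# assuming the CUDA kernels elsewhere in the build also compile for sm_60.
if _is_cuda() and not compute_capabilities:
    device_count = torch.cuda.device_count()
    for i in range(device_count):
        major, minor = torch.cuda.get_device_capability(i)
        if major < 6:
            raise RuntimeError(
                "GPUs with compute capability below 6.0 are not supported.")
        compute_capabilities.add(f"{major}.{minor}")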

Alternatives

No response

Additional context

No response

@AlpinDale
Member

It's mostly due to the QuIP# kernels. I'll look into extending support to P100s (we used to support them before) tomorrow.

@Nero10578
Author

Nero10578 commented Apr 17, 2024

It's mostly due to the QuIP# kernels. I'll look into extending support to P100s (we used to support them before) tomorrow.

Ah, I see. So for now it only fails when using the QuIP# kernels? I was thinking that if it were as easy as changing setup.py, and the other quantization methods still worked, then it would be a non-issue. I just wanted to make sure whether it will work at all, or whether there is a bigger change in Aphrodite as a whole that makes it incompatible with P100s.

I'm going to put together either a 4xP100 or 4xP40 system to test out the larger models and higher-context models that just came out, so I'm just trying to make sure the stuff I want to run on them works first lol. The Tesla P100s are a great deal because they're 16GB cards with over 2x the memory bandwidth of the P40. Although if speed is no concern, I guess the P40s are a better deal at 24GB.

Currently Aphrodite is working great on my 2x3090, so thanks for your work on this project!

@dirkson

dirkson commented Apr 21, 2024

I did try it myself on the dev branch, but I'm waaaay out of my depth. I got it to build using the runtime and exporting TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0+PTX", but actually trying to load a model results in "RuntimeError: CUDA error: no kernel image is available for execution on the device". As far as I understand, PyTorch does still ship with kernels for the P100, though, so I'm unsure what's going wrong here.
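
As a sanity check (a minimal sketch, not something from the thread itself), the snippet below prints which CUDA architectures the installed PyTorch wheel ships kernels for and what each visible GPU reports. If sm_60 shows up in PyTorch's list, the missing kernels are most likely in Aphrodite's own compiled extensions rather than in PyTorch:

import torch

# CUDA architectures the installed PyTorch build ships kernels for,
# e.g. ['sm_60', 'sm_70', 'sm_75', ...].
print(torch.cuda.get_arch_list())

# Compute capability reported by each visible GPU; a Tesla P100 reports (6, 0).
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))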

@AlpinDale
Member

Please check #444. It builds for sm_60, but I haven't tested if it actually runs.

@Nero10578
Author

Please check #444. It builds for sm_60, but I haven't tested if it actually runs.

I'm waiting on cards from eBay but will try it when I get them. Thanks!

@online2311

I still can't run it; it says 'RuntimeError: CUDA error: no kernel image is available for execution on the device' when using the latest alpindale/aphrodite-engine image.

@AlpinDale
Member

@online2311 we forgot to bump the build architectures in the Dockerfile; this will be fixed in the next release. If you want to build it yourself, edit the Dockerfile like this:

diff --git a/docker/Dockerfile b/docker/Dockerfile
index adcdeb1..330f89c 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -32,7 +32,7 @@ ENV CUDA_HOME=/usr/local/cuda

 ENV HF_HOME=/tmp
 ENV NUMBA_CACHE_DIR=$HF_HOME/numba_cache
-ENV TORCH_CUDA_ARCH_LIST="6.1 7.0 7.5 8.0 8.6 8.9 9.0+PTX"
+ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0+PTX"
 RUN python3 -m pip install --no-cache-dir -e .

 # Workaround to properly install flash-attn. For reference
@@ -44,7 +44,7 @@ ENTRYPOINT ["/app/aphrodite-engine/docker/entrypoint.sh"]

 EXPOSE 7860

-# Service UID needs write access to $HOME to create temporary folders, see #458
+# Service UID needs write access to $HOME to create temporary folders, see #458
 RUN chown 1000:1000 ${HOME}

 USER 1000:0

@online2311

Thank you very much. I rebuilt the image according to your patch and it now runs model inference.
Docker image: nodecloud/aphrodite-engine
