Incorrect inline ptx device assembly code usage #766

Open

zhiweij1 opened this issue Oct 13, 2023 · 0 comments
Labels
bug Something isn't working

Comments

Branch/Tag/Commit: main
Docker Image Version: N/A
GPU name: N/A
CUDA Driver: N/A

Reproduced Steps

https://github.com/NVIDIA/FasterTransformer/blob/afdf9a9eb86f15363c0249117d166d6b45dbb371/src/fastertransformer/kernels/decoder_masked_multihead_attention/decoder_masked_multihead_attention_template.hpp#L643

A comma should be inserted between `"r"(a.x)` and `"r"(a.y)`.

Although nvcc compiles this, clang reports an error: https://godbolt.org/z/jxd483a8j
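
For reference, here is a minimal sketch of the correct operand syntax (this is not the code from the linked file, and `add_halves` is a made-up helper for illustration): GNU extended asm requires the operands in the input list to be comma-separated, which is all the fix at that line amounts to.

```cuda
#include <cstdint>

// Hypothetical helper, not the function from the linked file: adds the two
// 32-bit halves of a uint2 using an inline PTX add.u32 instruction.
__device__ uint32_t add_halves(uint2 a)
{
    uint32_t d;
    // Correct form: the input operands "r"(a.x) and "r"(a.y) are separated
    // by a comma. Omitting the comma (the bug reported here) happens to be
    // accepted by nvcc but is rejected by clang as malformed extended asm.
    asm("add.u32 %0, %1, %2;" : "=r"(d) : "r"(a.x), "r"(a.y));
    return d;
}
```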

I think it can be fixed in the FasterTransformer source code.
zhiweij1 added the bug label on Oct 13, 2023