
ROCm: fatal error: aotriton/flash.h: No such file or directory when building with USE_ROCM=1 #125230

Open · fxmarty opened this issue on Apr 30, 2024 · 1 comment
Labels
bug · module: rocm (AMD GPU support for Pytorch) · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


fxmarty commented Apr 30, 2024

🐛 Describe the bug

I am building inside the rocm/dev-ubuntu-22.04:6.1 Docker image.

Building PyTorch 2.3 (https://github.com/pytorch/pytorch/tree/release/2.3) with

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
pip install -r requirements.txt --no-cache-dir
pip install numpy ninja --no-cache-dir
pip uninstall -y triton && \
    git clone --depth 1 --single-branch https://github.com/ROCm/triton.git && \
    cd triton/python && \
    pip install .
_GLIBCXX_USE_CXX11_ABI="1" PYTORCH_ROCM_ARCH="gfx90a;gfx942" BUILD_CAFFE2=0 BUILD_CAFFE2_OPS=0 USE_CUDA=0 USE_ROCM=1 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 USE_FLASH_ATTENTION=0 USE_MEM_EFF_ATTENTION=0 python setup.py install

and getting

#33 891.8 cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
#33 891.8 /pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:26:10: fatal error: aotriton/flash.h: No such file or directory
#33 891.8    26 | #include <aotriton/flash.h>
#33 891.8       |          ^~~~~~~~~~~~~~~~~~
#33 891.8 compilation terminated.

near the end of the build.

It appears that aotriton/flash.h is not part of the PyTorch codebase. It does exist in https://github.com/ROCm/aotriton, whose README says that consumption through PyTorch should be smooth (https://github.com/ROCm/aotriton?tab=readme-ov-file#pytorch-consumption), so I am confused.

Is something wrong in https://github.com/pytorch/pytorch/blob/main/cmake/External/aotriton.cmake? Perhaps an include_directories(build/aotriton/src/include) is missing? See the sketch below.

Apparently both ./build/aotriton/src/include/aotriton/flash.h and ./torch/include/aotriton/flash.h are generated during the build, but neither location ends up on the compiler's include path.
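
For illustration, here is a minimal sketch of the kind of change I mean, assuming the headers really do land under build/aotriton/src/include as observed above (the exact path and its placement inside aotriton.cmake are assumptions, not a verified fix):

# Hypothetical addition to cmake/External/aotriton.cmake. The path below is
# assumed from the build tree observed above; it is not the verified upstream
# fix. The intent is to put the headers generated by the aotriton external
# project on the include path of consumers such as
# aten/src/ATen/native/transformers/hip/sdp_utils.cpp.
include_directories("${CMAKE_BINARY_DIR}/aotriton/src/include")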

Thank you!

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0] (64-bit runtime)
Python platform: /
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 57 bits virtual
Byte Order:                         Little Endian
CPU(s):                             96
On-line CPU(s) list:                0-95
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Xeon(R) Platinum 8480C
CPU family:                         6
Model:                              143
Thread(s) per core:                 1
Core(s) per socket:                 48
Socket(s):                          2
Stepping:                           8
BogoMIPS:                           4000.00
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 avx512vbmi umip waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid cldemote movdiri movdir64b fsrm serialize amx_bf16 avx512_fp16 amx_tile amx_int8 arch_capabilities
Hypervisor vendor:                  Microsoft
Virtualization type:                full
L1d cache:                          4.5 MiB (96 instances)
L1i cache:                          3 MiB (96 instances)
L2 cache:                           192 MiB (96 instances)
L3 cache:                           210 MiB (2 instances)
NUMA node(s):                       2
NUMA node0 CPU(s):                  0-47
NUMA node1 CPU(s):                  48-95
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Unknown: No mitigations
Vulnerability Retbleed:             Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] optree==0.11.0
[pip3] triton==2.1.0
[conda] mkl-include               2024.1.0              intel_691    intel
[conda] mkl-static                2024.1.0              intel_691    intel
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] optree                    0.11.0                   pypi_0    pypi
[conda] triton                    2.1.0                    pypi_0    pypi

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

pytorch-bot added the module: rocm label on Apr 30, 2024
fxmarty added a commit to fxmarty/pytorch that referenced this issue on Apr 30, 2024
cpuhrsch added the triaged label on Apr 30, 2024
mht-sharma added a commit to mht-sharma/pytorch that referenced this issue on May 3, 2024

fxmarty commented May 8, 2024

FYI: it looks to me like this error occurs only when building with USE_FLASH_ATTENTION=0.
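
If that is the case, the #include in sdp_utils.cpp is presumably unconditional while the aotriton setup (and its include paths) only happens when flash attention is enabled. As a sketch of one possible direction, in CMake (the target name, option names, and macro below are assumptions based on the build flags above, not the actual upstream change), the build could pass a definition that lets the source compile out its aotriton usage:

# Hypothetical sketch -- target/option/macro names are assumptions, not the
# actual upstream change. Only advertise aotriton to the sources when an
# attention backend that needs it is enabled; sdp_utils.cpp would then wrap
# the #include <aotriton/flash.h> (and its uses) in #if USE_AOTRITON.
if(USE_ROCM AND (USE_FLASH_ATTENTION OR USE_MEM_EFF_ATTENTION))
  target_compile_definitions(torch_hip PRIVATE USE_AOTRITON=1)
endif()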
