GCC 14.1 internal compiler errors? #126465

Closed
Geremia opened this issue May 16, 2024 · 4 comments
Labels
module: build (Build system issues)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@Geremia

Geremia commented May 16, 2024

🐛 Describe the bug

/tmp/SBo/pytorch-v2.3.0/aten/src/ATen/native/Histogram.cpp:74:6: internal compiler error: Segmentation fault
   74 | void histogramdd_check_inputs(const Tensor& input, const TensorList& bins, const c10::optional<Tensor>& weight) {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~
0x1bf15a6 internal_error(char const*, ...)
        ???:0
0xdaf679 statistics_fini_pass()
        ???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Perhaps this is a GCC bug?
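
When a build produces more than one internal compiler error, it can help to list where each one occurred before deciding whether a single compiler bug is to blame. The following is a minimal sketch, assuming the build output was saved as text; the helper name `find_ices` and the regular expression are illustrative and not part of GCC or PyTorch tooling.

```python
import re

# Matches GCC diagnostics of the form
#   <path>:<line>:<col>: internal compiler error: <message>
ICE_PATTERN = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"internal compiler error: (?P<message>.+)$"
)


def find_ices(log_text):
    """Return (path, line, col, message) for each ICE line in a build log."""
    hits = []
    for raw in log_text.splitlines():
        m = ICE_PATTERN.match(raw.strip())
        if m:
            hits.append((m["path"], int(m["line"]), int(m["col"]), m["message"]))
    return hits


# Two lines copied from the log above, used here as sample input.
sample_log = """\
/tmp/SBo/pytorch-v2.3.0/aten/src/ATen/native/Histogram.cpp:74:6: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source (by using -freport-bug).
"""

for path, line, col, message in find_ices(sample_log):
    print(f"{path} line {line}, column {col}: {message}")
```

If the reported locations change from one build to the next, that is worth noting in any bug report.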

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Slackware Linux  (x86_64)
GCC version: (GCC) 13.2.0
Clang version: 18.1.5
CMake version: version 3.29.3
Libc version: glibc-2.39

Python version: 3.11.9 (main, Apr  2 2024, 13:43:44) [GCC 13.2.0] (64-bit runtime)
Python platform: Linux-6.6.30-x86_64-AMD_Ryzen_Threadripper_2990WX_32-Core_Processor-with-glibc2.39
Is CUDA available: N/A
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: Quadro RTX 4000
Nvidia driver version: 550.54.14
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        43 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               64
On-line CPU(s) list:                  0-63
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen Threadripper 2990WX 32-Core Processor
CPU family:                           23
Model:                                8
Thread(s) per core:                   2
Core(s) per socket:                   32
Socket(s):                            1
Stepping:                             2
Frequency boost:                      enabled
CPU(s) scaling MHz:                   73%
CPU max MHz:                          3000.0000
CPU min MHz:                          2200.0000
BogoMIPS:                             5999.26
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization:                       AMD-V
L1d cache:                            1 MiB (32 instances)
L1i cache:                            2 MiB (32 instances)
L2 cache:                             16 MiB (32 instances)
L3 cache:                             64 MiB (8 instances)
NUMA node(s):                         4
NUMA node0 CPU(s):                    0-7,32-39
NUMA node1 CPU(s):                    16-23,48-55
NUMA node2 CPU(s):                    8-15,40-47
NUMA node3 CPU(s):                    24-31,56-63
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] numpy==1.26.3
[conda] Could not collect

cc @malfet @seemethere

@shink
Contributor

shink commented May 17, 2024

Thanks for your report! Could you please share your minimal reproducible example?

@Geremia
Author

Geremia commented May 17, 2024

@shink I wish I could isolate the issue better.
I get a similar Caffe2 / aten issue when compiling with GCC 14.1.0, too:

/usr/bin/cmake: symbol lookup error: /usr/lib64/libstdc++.so.6: undefined symbol: _ZNKSt7__cxx110messagesIcE7do_openERKNS_12basic_stringIcSt11char_traitsIcESaIcEEERKSt6locale, version GLIBCXX_3.4.21
make[2]: *** [caffe2/CMakeFiles/ATEN_CUDA_FILES_GEN_TARGET.dir/build.make:7118: aten/src/ATen/ops/bitwise_right_shift_cpu_dispatch.h] Error 127
make[2]: *** Deleting file 'aten/src/ATen/ops/bitwise_right_shift_cpu_dispatch.h'
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:1122: caffe2/CMakeFiles/ATEN_CUDA_FILES_GEN_TARGET.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

I'm not sure why it says GLIBCXX_3.4.21; my Libc version is 2.39.
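
For context, GLIBCXX_3.4.21 is a symbol-version tag of libstdc++ (the GNU C++ standard library), not a glibc version, so it is unrelated to the 2.39 reported above. The symbol name itself also looks odd: in Itanium C++ ABI mangling each identifier is prefixed by its length, yet here `messages` (8 characters) is preceded by `0`. The sketch below checks this. The helper `read_source_names` is hypothetical, and the `expected` string is an assumption about what a well-formed name would look like, not something taken from the thread.

```python
def read_source_names(mangled, start):
    """Read consecutive <length><identifier> components starting at `start`."""
    names, i = [], start
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])
        names.append(mangled[j:j + length])
        i = j + length
    return names, i


# Symbol exactly as it appears in the error message above.
observed = ("_ZNKSt7__cxx110messagesIcE7do_openERKNS_12basic_stringIc"
            "St11char_traitsIcESaIcEEERKSt6locale")
# Assumption: the well-formed spelling uses the length prefix 8 for "messages".
expected = observed.replace("110messages", "118messages", 1)

prefix_len = len("_ZNKSt")  # components start right after the std:: marker
names_observed, _ = read_source_names(observed, prefix_len)
names_expected, _ = read_source_names(expected, prefix_len)
print("observed components:", names_observed)
print("expected components:", names_expected)

# Compare the two spellings character by character.
diffs = [(i, a, b) for i, (a, b) in enumerate(zip(observed, expected)) if a != b]
for index, a, b in diffs:
    flipped_bits = bin(ord(a) ^ ord(b)).count("1")
    print(f"index {index}: {a!r} vs {b!r}, differing bits: {flipped_bits}")
```

The two spellings differ in one character, `0` versus `8`, whose ASCII codes differ in a single bit. That pattern is consistent with data corruption, though on its own it does not prove any particular cause.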

Updated versions from collect_env.py

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Slackware Linux  (x86_64)
GCC version: (GCC) 14.1.0
Clang version: 18.1.5
CMake version: version 3.29.3
Libc version: glibc-2.39

Python version: 3.11.9 (main, Apr  2 2024, 13:43:44) [GCC 13.2.0] (64-bit runtime)
Python platform: Linux-6.9.0-x86_64-AMD_Ryzen_Threadripper_2990WX_32-Core_Processor-with-glibc2.39
Is CUDA available: N/A
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: Quadro RTX 4000
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/share/cuda/lib64/libcudnn.so.9.1.1
/usr/share/cuda/lib64/libcudnn_adv.so.9.1.1
/usr/share/cuda/lib64/libcudnn_cnn.so.9.1.1
/usr/share/cuda/lib64/libcudnn_engines_precompiled.so.9.1.1
/usr/share/cuda/lib64/libcudnn_engines_runtime_compiled.so.9.1.1
/usr/share/cuda/lib64/libcudnn_graph.so.9.1.1
/usr/share/cuda/lib64/libcudnn_heuristic.so.9.1.1
/usr/share/cuda/lib64/libcudnn_ops.so.9.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        43 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               64
On-line CPU(s) list:                  0-63
Vendor ID:                            AuthenticAMD
BIOS Vendor ID:                       Advanced Micro Devices, Inc.
Model name:                           AMD Ryzen Threadripper 2990WX 32-Core Processor
BIOS Model name:                      AMD Ryzen Threadripper 2990WX 32-Core Processor Unknown CPU @ 3.0GHz
BIOS CPU family:                      107
CPU family:                           23
Model:                                8
Thread(s) per core:                   2
Core(s) per socket:                   32
Socket(s):                            1
Stepping:                             2
Frequency boost:                      enabled
CPU(s) scaling MHz:                   74%
CPU max MHz:                          3000.0000
CPU min MHz:                          2200.0000
BogoMIPS:                             5999.96
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization:                       AMD-V
L1d cache:                            1 MiB (32 instances)
L1i cache:                            2 MiB (32 instances)
L2 cache:                             16 MiB (32 instances)
L3 cache:                             64 MiB (8 instances)
NUMA node(s):                         4
NUMA node0 CPU(s):                    0-7,32-39
NUMA node1 CPU(s):                    16-23,48-55
NUMA node2 CPU(s):                    8-15,40-47
NUMA node3 CPU(s):                    24-31,56-63
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] numpy==1.26.3
[conda] Could not collect

Geremia changed the title from "Histogram.cpp:74:6: internal compiler error: Segmentation fault" to "Caffe2 / aten build issue with GCC 13.2.0 and 14.1.0" on May 17, 2024
Geremia changed the title from "Caffe2 / aten build issue with GCC 13.2.0 and 14.1.0" to "symbol lookup error: /usr/lib64/libstdc++.so.6: undefined symbol: _ZNKSt7__cxx110messagesIcE7do_openERKNS_12basic_stringIcSt11char_traitsIcESaIcEEERKSt6locale" on May 19, 2024
@Geremia
Author

Geremia commented May 19, 2024

After I removed the -fPIC flag, the build progressed further, but I then hit another internal compiler error:

Fatal glibc error: malloc.c:4376 (_int_malloc): assertion failed: (unsigned long) (size) >= (unsigned long) (nb)
during GIMPLE pass: dce
/tmp/SBo/pytorch-v2.3.0/aten/src/ATen/FunctionalInverses.cpp: In static member function ‘static at::Tensor at::functionalization::FunctionalInverses::_nested_get_values_inverse(const at::Tensor&, const at::Tensor&, at::functionalization::InverseReturnMode)’:
/tmp/SBo/pytorch-v2.3.0/aten/src/ATen/FunctionalInverses.cpp:315:8: internal compiler error: Aborted
  315 | Tensor FunctionalInverses::_nested_get_values_inverse(const Tensor& base, const Tensor& mutated_view, InverseReturnMode inverse_return_mode) {
      |        ^~~~~~~~~~~~~~~~~~
0x1fc8df8 internal_error(char const*, ...)
        ???:0
0x7feb1f696aab __pthread_kill
        ???:0
0x7feb1f642e11 __GI_raise
        ???:0
0x7feb1f62849e abort
        ???:0
0x7feb1f6292c9 __libc_message_impl.cold
        ???:0
0x7feb1f639e02 __libc_assert_fail
        ???:0
0x7feb1f6a3c84 _int_malloc
        ???:0
0x7feb1f6a3f51 _int_realloc
        ???:0
0x7feb1f6a51a5 __libc_realloc
        ???:0
0x2058b30 xrealloc
        ???:0
0xacce0e get_dominated_to_depth(cdi_direction, basic_block_def*, int)
        ???:0
0xacce9a get_all_dominated_blocks(cdi_direction, basic_block_def*)
        ???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

It seems to be a compiler bug.

There is a reported GCC bug related to compiling GridSamplerKernel.cpp. The workaround suggested there was to use -fno-strict-aliasing, but that did not help in my case.

Geremia changed the title from "symbol lookup error: /usr/lib64/libstdc++.so.6: undefined symbol: _ZNKSt7__cxx110messagesIcE7do_openERKNS_12basic_stringIcSt11char_traitsIcESaIcEEERKSt6locale" to "GCC 14.1 internal compiler errors?" on May 19, 2024
mikaylagawarecki added the "module: build" and "triaged" labels on May 20, 2024
@Geremia
Author

Geremia commented May 20, 2024

These compiler errors turned out to be caused by a hardware issue: my DRAM frequency was set too high.
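
A common heuristic for telling the two cases apart is that a genuine compiler bug tends to fail the same way on the same input every time, whereas crashes that move between files or runs point more toward unstable hardware. Below is a minimal sketch of such a check, assuming the failing compile command is known. The function names are hypothetical, and the stand-in command only exists so the example runs anywhere.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=5):
    """Run `cmd` several times and return one signature per run.

    A signature is (exit code, ICE lines). Backtrace addresses are left out
    on purpose because they can vary between otherwise identical runs.
    """
    signatures = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        ice_lines = tuple(
            line for line in proc.stderr.splitlines()
            if "internal compiler error" in line
        )
        signatures.append((proc.returncode, ice_lines))
    return signatures


def is_deterministic(signatures):
    """True when every run produced the same signature."""
    return len(set(signatures)) <= 1


# Stand-in command; replace it with the real compiler invocation for the
# translation unit that failed.
stand_in = [sys.executable, "-c", "import sys; sys.exit(0)"]
results = run_repeatedly(stand_in, runs=3)
print("deterministic:", is_deterministic(results))
```

A result of `False` for the real command would not identify the faulty component, but it would be a reason to test memory and clock settings before filing a compiler bug.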

Geremia closed this as completed on May 20, 2024