aarch64 build for AWS Linux - Failed to load image Python extension #8305

elkay · 2024-03-09T20:13:46Z

🐛 Describe the bug

I built Torch 2.1.2 and TorchVision 0.16.2 from source and am running into the following problem:

/home/ec2-user/conda/envs/textgen/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/ec2-user/conda/envs/textgen/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1017SymbolicShapeMeta18init_is_contiguousEv'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?

Previously the error was about missing libs rather than an undefined symbol, so I believe the libs are correctly installed now. The build output says:

Compiling extensions with following flags:
   FORCE_CUDA: False
   FORCE_MPS: False
   DEBUG: False
   TORCHVISION_USE_PNG: True
   TORCHVISION_USE_JPEG: True
   TORCHVISION_USE_NVJPEG: True
   TORCHVISION_USE_FFMPEG: True
   TORCHVISION_USE_VIDEO_CODEC: True
   NVCC_FLAGS:
 Compiling with debug mode OFF
 Found PNG library
 Building torchvision with PNG image support
   libpng version: 1.6.37
   libpng include path: /home/ec2-user/conda/envs/textgen/include/libpng16
 Running build on conda-build: False
 Running build on conda: True
 Building torchvision with JPEG image support
   libjpeg include path: /home/ec2-user/conda/envs/textgen/include
   libjpeg lib path: /home/ec2-user/conda/envs/textgen/lib
 Building torchvision without NVJPEG image support
 Building torchvision with ffmpeg support
   ffmpeg version: b'ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers\nbuilt with gcc 10.2.0 (crosstool-NG 1.22.0.1750_510dbc6_dirty)\nconfiguration: --prefix=/opt/conda/conda-bld/ffmpeg_1622823166193/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh --cc=/opt/conda/conda-bld/ffmpeg_1622823166193/_build_env/bin/aarch64-conda-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264\nlibavutil      56. 31.100 / 56. 31.100\nlibavcodec     58. 54.100 / 58. 54.100\nlibavformat    58. 29.100 / 58. 29.100\nlibavdevice    58.  8.100 / 58.  8.100\nlibavfilter     7. 57.100 /  7. 57.100\nlibavresample   4.  0.  0 /  4.  0.  0\nlibswscale      5.  5.100 /  5.  5.100\nlibswresample   3.  5.100 /  3.  5.100\nlibpostproc    55.  5.100 / 55.  5.100\n'
   ffmpeg include path: ['/home/ec2-user/conda/envs/textgen/include']
   ffmpeg library_dir: ['/home/ec2-user/conda/envs/textgen/lib']
 Building torchvision without video codec support

So I believe I do have things set up correctly to be able to do image calls (I don't care about video). Any idea why I would still be getting the undefined symbol warning? Thanks!
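For readers hitting the same warning: the undefined symbol in the message is an Itanium C++ ABI mangled name, and decoding it shows which library is expected to provide it. The sketch below is a deliberately minimal decoder that only handles this simple shape of name (nested identifiers, an optional const qualifier, no parameters); the function name `demangle_nested_name` is made up for illustration, and a full tool such as `c++filt` covers the general case.

```python
import re


def demangle_nested_name(symbol: str) -> str:
    """Decode a simple Itanium-ABI nested name such as _ZNK3c10...Ev.

    Supported subset only: the _ZN prefix, an optional K (const member
    function), length-prefixed identifiers, the closing E, and a single
    'v' meaning "no parameters". Anything else raises ValueError.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    pos = 3
    is_const = symbol[pos:pos + 1] == "K"
    if is_const:
        pos += 1

    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError("unsupported mangled name")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length

    if pos >= len(symbol) or symbol[pos + 1:] != "v":
        raise ValueError("only parameterless functions are supported")
    return "::".join(parts) + "()" + (" const" if is_const else "")


if __name__ == "__main__":
    print(demangle_nested_name(
        "_ZNK3c1017SymbolicShapeMeta18init_is_contiguousEv"
    ))
```

The decoded name lives in the `c10` namespace, which is PyTorch's core library rather than libjpeg or libpng. That suggests, though it does not prove, that `image.so` was compiled against a different torch build than the one loaded at runtime.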

Versions

Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A

OS: Amazon Linux 2023.3.20240304 (aarch64)
GCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.34

Python version: 3.10.9 (main, Mar 8 2023, 10:41:45) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.1.79-99.164.amzn2023.aarch64-aarch64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA T4G
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Neoverse-N1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r3p1
BogoMIPS: 243.75
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
L1d cache: 256 KiB (4 instances)
L1i cache: 256 KiB (4 instances)
L2 cache: 4 MiB (4 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2
[pip3] torchvision==0.16.2+cu121
[pip3] triton==2.1.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi


NicolasHug · 2024-03-12T13:01:38Z

 [pip3] torchvision==0.16.2+cu121
 [conda] torchvision 0.16.2+cu121 pypi_0 pypi

Try uninstalling these versions first?

elkay · 2024-03-12T13:07:53Z

 [pip3] torchvision==0.16.2+cu121
 [conda] torchvision 0.16.2+cu121 pypi_0 pypi

Try uninstalling these versions first?

What would that accomplish? That's literally the package that I'm trying to use and that is throwing the error.

NicolasHug · 2024-03-12T16:01:14Z

Built Torch 2.1.2 and TorchVision 2.1.2 from source

What version of torchvision are you building from source, exactly? There's no torchvision 2.x. The latest stable version is 0.17.

The fact that there already is a stable 0.16.2 version installed while you're trying to build from source is very likely to be causing some issues.
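One way to see which torch and torchvision the interpreter would actually pick up is to ask Python's packaging metadata and import machinery directly, without importing the packages. This is a minimal sketch using only the standard library; the helper name `describe_install` is hypothetical, and the check is illustrative rather than something suggested in the thread.

```python
from importlib import metadata, util


def describe_install(package: str) -> dict:
    """Report the installed version and import location of a package.

    Nothing is imported: the version comes from the distribution
    metadata and the location from the import system's module spec.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    spec = util.find_spec(package)
    origin = spec.origin if spec is not None else None
    return {"package": package, "version": version, "origin": origin}


if __name__ == "__main__":
    for name in ("torch", "torchvision"):
        print(describe_install(name))
```

If the reported origin points at a prebuilt wheel in site-packages rather than the freshly built copy, or the version is not the one just built, the stale install is probably the one being loaded.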

elkay · 2024-03-12T18:01:07Z

Built Torch 2.1.2 and TorchVision 2.1.2 from source

What version of torchvision are you building from source, exactly? There's no torchvision 2.x. The latest stable version is 0.17.

The fact that there already is a stable 0.16.2 version installed while you're trying to build from source is very likely to be causing some issues.

Updated original post, torchvision version was a typo.

I did finally get torchvision to build and be functional, but only by forcibly editing the build scripts to pull in my custom build of torch+cuda 2.1.2. The build scripts were importing a non-cuda build because there is no aarch64 torch+cuda out there for pip to pull down. So finally, after forcing my own torch+cuda 2.1.2 whl into the torchvision build, now my torchvision actually works.

I need to say - it's been PAINFUL dealing with building anything that relies on torch because all the build scripts pull down the non-cuda version and mess up the builds. Every time I want to build something relying on torch, now I need to hack in pulling my own torch whl instead for them to work (this also resolved issues I was having building a few other things).

I reaaaaaally hope official aarch64 torch+cuda builds start to be made available so I don't have to keep doing this hackjob.
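The failure mode described above, where a build silently picks up a non-CUDA torch, can be caught early with a small guard run in the same environment that performs the build. The sketch below is an assumption-laden illustration, not part of torchvision's build: `check_build_torch` and `check_installed_torch` are hypothetical names, and the only torch attributes used are `torch.__version__` and `torch.version.cuda` (which is `None` on CPU-only builds). Separately, pip's `--no-build-isolation` flag makes a build use the packages already installed in the environment; the thread does not confirm whether it would have helped here.

```python
from typing import List, Optional


def check_build_torch(version: str, cuda: Optional[str],
                      expected_version: str) -> List[str]:
    """Return a list of problems with the torch visible to the build.

    version:  torch.__version__, e.g. "2.1.2+cu121"
    cuda:     torch.version.cuda, None for a CPU-only build
    """
    problems = []
    # Drop the local tag ("+cu121") before comparing release numbers.
    public_version = version.split("+", 1)[0]
    if public_version != expected_version:
        problems.append(
            f"torch {public_version} found, expected {expected_version}"
        )
    if cuda is None:
        problems.append("torch was built without CUDA")
    return problems


def check_installed_torch(expected_version: str) -> List[str]:
    """Run the check against whatever torch this interpreter imports."""
    import torch  # imported lazily so the helper above stays standalone

    return check_build_torch(torch.__version__, torch.version.cuda,
                             expected_version)
```

Calling `check_installed_torch("2.1.2")` from the build environment returns an empty list when the expected CUDA build is the one in use, and a description of each mismatch otherwise.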

NicolasHug · 2024-03-12T18:07:47Z

What build script are you referring to? Can you share the build command you used?

elkay · 2024-03-12T18:52:19Z

The box is shut down, but I believe it was pyproject.toml that I had to update to point directly at my torch whl. The command I used was "python setup.py bdist_wheel". I had the same outcome with "pip install -v ." to install directly from source, though.
