aarch64 build for AWS Linux - Failed to load image Python extension #8305
Comments
Try uninstalling these versions first?
What would that accomplish? That's literally the package that I'm trying to use and that is throwing the error.
What version of torchvision are you building from source, exactly? There's no torchvision 2.x. The latest stable version is 0.17. The fact that there already is a stable
Updated original post, the torchvision version was a typo. I did finally get torchvision to build and be functional, but only by forcibly editing the build scripts to pull in my custom build of torch+cuda 2.1.2. The build scripts were importing a non-CUDA build because there is no aarch64 torch+cuda wheel out there for pip to pull down. After forcing my own torch+cuda 2.1.2 whl into the torchvision build, torchvision finally works. I have to say: it's been PAINFUL dealing with building anything that relies on torch, because all the build scripts pull down the non-CUDA version and mess up the builds. Every time I want to build something that relies on torch, I now have to hack the build to use my own torch whl instead (doing this also resolved issues I was having building a few other things). I reaaaaaally hope official aarch64 torch+cuda builds become available so I don't have to keep doing this hack job.
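A quick way to notice that a non-CUDA torch was pulled in is to look at the version string. The sketch below assumes the usual PyTorch convention of a `+cuXYZ` local version label (e.g. `2.1.2+cu121`); `cuda_tag_from_version` is a hypothetical helper, not part of torch. For a source build the label is only whatever the builder set (the environment in this issue reports `+cu121` while saying it was built with CUDA 12.2), so in a live environment `torch.version.cuda`, which is `None` on CPU-only builds, is the more reliable check.

```python
import re
from typing import Optional


def cuda_tag_from_version(version: str) -> Optional[str]:
    """Return the CUDA version encoded in a '+cuXYZ' local label, or None.

    Hypothetical helper: it only reads the label and does not inspect
    the compiled library.
    """
    _, _, local = version.partition("+")
    # 'cu121' -> ('12', '1'): the last digit is the CUDA minor version.
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    if match is None:
        return None  # no label, '+cpu', or some other local tag
    major, minor = match.groups()
    return f"{major}.{minor}"


if __name__ == "__main__":
    for v in ["2.1.2+cu121", "2.1.2+cpu", "2.1.2"]:
        print(v, "->", cuda_tag_from_version(v))
```

In a real environment this would be called as `cuda_tag_from_version(torch.__version__)`, alongside a check of `torch.version.cuda`.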
What build script are you referring to? Can you share the build command you used?
The box is shut down, but I believe it was pyproject.toml that I had to update to point directly at my torch whl, and the command I used was "python setup.py bdist_wheel". I had the same outcome with "pip install -v ." to install directly from source, though.
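One plausible mechanism for the wrong torch appearing during `pip install -v .` is pip's default build isolation: pip creates a temporary build environment and installs whatever `[build-system] requires` in `pyproject.toml` lists from the package index, ignoring the torch already installed in the conda env. The sketch below lists the torch requirements such an isolated build would resolve. It assumes Python 3.11+ for the standard-library `tomllib` (falling back to the third-party `tomli`, which has the same `loads` API, on 3.10); `torch_build_requirements` is a hypothetical helper and the `pyproject.toml` snippet is made up for illustration, not torchvision's actual file. pip's own switch for this is `pip install --no-build-isolation -v .`, which builds against the torch already installed instead of requiring edits to `pyproject.toml`.

```python
import re

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10 and earlier
    import tomli as tomllib  # third-party package with the same loads() API


def torch_build_requirements(pyproject_text: str) -> list[str]:
    """Return the [build-system] requires entries that name the 'torch' package."""
    data = tomllib.loads(pyproject_text)
    requires = data.get("build-system", {}).get("requires", [])
    torch_reqs = []
    for req in requires:
        # The distribution name ends at the first space, version operator,
        # extras bracket, environment-marker ';' or direct-reference '@'.
        name = re.split(r"[\s<>=!~;\[@(]", req, maxsplit=1)[0]
        if name.lower() == "torch":
            torch_reqs.append(req)
    return torch_reqs


# Hypothetical pyproject.toml contents, for illustration only.
EXAMPLE_PYPROJECT = """
[build-system]
requires = ["setuptools", "wheel", "torch>=2.1"]
build-backend = "setuptools.build_meta"
"""

if __name__ == "__main__":
    print(torch_build_requirements(EXAMPLE_PYPROJECT))
```

Pointing the requirement at a local wheel (e.g. `torch @ file:///...`) is what the comment above describes; the name parsing here recognizes that form as well.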
🐛 Describe the bug
I built Torch 2.1.2 and TorchVision 0.16.2 from source and am running into the following problem:
/home/ec2-user/conda/envs/textgen/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/ec2-user/conda/envs/textgen/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1017SymbolicShapeMeta18init_is_contiguousEv'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?

Previously the error was about missing libs rather than an undefined symbol, so I believe the libs are correctly installed now. Building says:
So I believe I do have things set up correctly to be able to do image calls (I don't care about video). Any idea why I would still be getting the undefined symbol warning? Thanks!
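The missing symbol becomes readable once demangled (normally done with `c++filt` from binutils). The minimal sketch below handles only the subset of the Itanium C++ ABI this particular name needs: a nested, possibly const-qualified member function with no parameters. `demangle_nested_noargs` is a hypothetical helper, not a general demangler. It shows the symbol is `c10::SymbolicShapeMeta::init_is_contiguous() const`, which belongs to PyTorch's c10 library rather than to libjpeg or libpng. An undefined symbol of this kind typically points to `image.so` having been compiled against different torch headers than the `libc10` loaded at runtime, which would be consistent with the build picking up a different torch than the one installed.

```python
import re


def demangle_nested_noargs(symbol: str) -> str:
    """Demangle '_ZN[K]<len><name>...Ev' (a nested, no-argument function).

    Only this tiny subset of the Itanium C++ ABI is supported; use c++filt
    for real work.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names starting with '_ZN' are supported")
    pos = 3
    is_const = symbol[pos:pos + 1] == "K"  # 'K' marks a const member function
    if is_const:
        pos += 1
    parts = []
    while True:
        if pos >= len(symbol):
            raise ValueError("nested name is not terminated by 'E'")
        if symbol[pos] == "E":  # 'E' closes the nested name
            pos += 1
            break
        digits = re.match(r"\d+", symbol[pos:])
        if digits is None:
            raise ValueError(f"unsupported encoding at offset {pos}")
        start = pos + len(digits.group())
        end = start + int(digits.group())  # <length><identifier>
        parts.append(symbol[start:end])
        pos = end
    if symbol[pos:] != "v":  # 'v' (void) encodes an empty parameter list
        raise ValueError("only functions with no parameters are supported")
    return "::".join(parts) + "()" + (" const" if is_const else "")


if __name__ == "__main__":
    print(demangle_nested_noargs(
        "_ZNK3c1017SymbolicShapeMeta18init_is_contiguousEv"))
```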
Versions
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A
OS: Amazon Linux 2023.3.20240304 (aarch64)
GCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.34
Python version: 3.10.9 (main, Mar 8 2023, 10:41:45) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.1.79-99.164.amzn2023.aarch64-aarch64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA T4G
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-12.2/targets/sbsa-linux/lib/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Neoverse-N1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r3p1
BogoMIPS: 243.75
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
L1d cache: 256 KiB (4 instances)
L1i cache: 256 KiB (4 instances)
L2 cache: 4 MiB (4 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2
[pip3] torchvision==0.16.2+cu121
[pip3] triton==2.1.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi
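Given the version confusion earlier in the thread, a small check over the `[pip3]` lines of this output can flag an installed torch/torchvision mismatch. This is a sketch under an assumption: for the torch 2.x series, torchvision's minor version has tracked torch's minor version plus 15 (torch 2.1 with torchvision 0.16, torch 2.2 with 0.17); the compatibility table in the torchvision README is the authority, and `parse_pip_lines` and `looks_compatible` are hypothetical helpers. It only compares installed versions, so it cannot detect that `image.so` was compiled against a different torch than the one installed, which is what the undefined-symbol warning suggests.

```python
import re


def parse_pip_lines(text: str) -> dict[str, str]:
    """Map package name -> version from '[pip3] name==version' lines."""
    versions = {}
    for line in text.splitlines():
        m = re.match(r"\[pip3\]\s+([A-Za-z0-9_.-]+)==(\S+)", line.strip())
        if m:
            versions[m.group(1).lower()] = m.group(2)
    return versions


def looks_compatible(torch_version: str, vision_version: str) -> bool:
    """Heuristic for torch 2.x: torchvision 0.N pairs with torch 2.(N - 15)."""
    t_major, t_minor = (int(x) for x in torch_version.split("+")[0].split(".")[:2])
    v_major, v_minor = (int(x) for x in vision_version.split("+")[0].split(".")[:2])
    if t_major != 2 or v_major != 0:
        raise ValueError("heuristic only covers torch 2.x / torchvision 0.x")
    return v_minor == t_minor + 15


# Lines copied from the environment report above.
ENV_OUTPUT = """[pip3] numpy==1.26.4
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2
[pip3] torchvision==0.16.2+cu121
"""

if __name__ == "__main__":
    v = parse_pip_lines(ENV_OUTPUT)
    print(v["torch"], v["torchvision"], looks_compatible(v["torch"], v["torchvision"]))
```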