failed to alloc 2147483648 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS #23

Open
yanchenmochen opened this issue Aug 16, 2022 · 3 comments


@yanchenmochen

When I run the code on T1050.fasta, which consists of 700 residues, the command line outputs this error.
The environment is an A100 GPU on Ubuntu, but I am using a higher version of jax and jaxlib. Could that be what is causing the problem?

(parafold) root@node33-a100:~# pip list | grep jax
jax 0.3.15
jaxlib 0.3.15+cuda11.cudnn82

@yanchenmochen
Author

2022-08-17 11:26:20.226278: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:796] failed to alloc 12524123136 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
2022-08-17 11:26:20.226316: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 12524123136
2022-08-17 11:26:23.693074: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:796] failed to alloc 11271710720 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
2022-08-17 11:26:23.693112: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 11271710720
2022-08-17 11:26:28.900144: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:796] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
2022-08-17 11:26:28.900185: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 17179869184
2022-08-17 11:26:44.115027: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:796] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
2022-08-17 11:26:44.115072: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 17179869184
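
For a sense of scale, the failed requests in the log above can be converted from bytes to GiB. The snippet below is only a minimal sketch: the helper name `failed_alloc_sizes_gib` is hypothetical, the sample text is abbreviated from the log lines above, and the regular expression assumes the message wording shown there.

```python
import re

# Matches the byte count in lines such as
# "failed to alloc 17179869184 bytes on host: CUDA_ERROR_OPERATING_SYSTEM ...".
# The pattern assumes the exact wording seen in the log above.
ALLOC_RE = re.compile(r"failed to alloc (\d+) bytes on host")


def failed_alloc_sizes_gib(log_text):
    """Return the requested size, in GiB, of every failed host allocation."""
    return [int(m.group(1)) / 2**30 for m in ALLOC_RE.finditer(log_text)]


# Abbreviated copy of one error/warning pair from the log above.
sample = (
    "E cuda_driver.cc:796] failed to alloc 17179869184 bytes on host: "
    "CUDA_ERROR_OPERATING_SYSTEM\n"
    "W device_host_allocator.h:46] could not allocate pinned host memory "
    "of size: 17179869184\n"
)

print(failed_alloc_sizes_gib(sample))  # [16.0]
```

Only the "failed to alloc" lines are counted, so each error/warning pair contributes a single entry. The 2147483648 bytes in the issue title correspond to exactly 2 GiB, and the largest requests in the log to 16 GiB.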

@Zuricho
Owner

Zuricho commented Aug 19, 2022

I'm not sure about this. Maybe it's a jax version issue, as you said, but I haven't encountered this before.

@yanchenmochen
Author

I switched to another machine to run the protein prediction, and it seems to work correctly now. Maybe jaxlib was causing the problem, but the original Linux machine is also shared by many users.
