Error on Singularity Pull for NVIDIA TensorFlow Container #672

Open
visakhraja opened this issue Mar 22, 2024 · 4 comments

@visakhraja

Installing the nvcr.io/nvidia/tensorflow:24.02-tf2-py3-igpu container with SHPC (Singularity Registry HPC) fails during the Singularity pull.

Error log:
singularity pull --name /p/home/jusers/sivaprasad1/jureca/easybuild/jurecadc/modules/containers/nvcr.io/nvidia/tensorflow/24.02-tf2-py3-igpu/nvcr.io-nvidia-tensorflow-24.02-tf2-py3-igpu-sha256:3de8a232b25d658d7c5ae34c4fa04d1a9823b0a681636c8864f76d109a9528c9.sif docker://nvcr.io/nvidia/tensorflow@sha256:3de8a232b25d658d7c5ae34c4fa04d1a9823b0a681636c8864f76d109a9528c9
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: conveyor failed to get: while fetching image: initializing source oci:/p/home/jusers/sivaprasad1/jureca/.apptainer/cache/blob:c0cd6cdc1f956b77ac8ce780ac33b216cb41449d438966dd51f487a853ee0578: choosing an image from manifest list docker://nvcr.io/nvidia/tensorflow@sha256:3de8a232b25d658d7c5ae34c4fa04d1a9823b0a681636c8864f76d109a9528c9: no image found in manifest list for architecture amd64, variant "", OS linux

Traceback (most recent call last):
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/bin/shpc", line 8, in <module>
    sys.exit(run_shpc())
             ^^^^^^^^^^
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/client/__init__.py", line 556, in run_shpc
    main(args=args, parser=parser, extra=extra, subparser=helper)
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/client/install.py", line 27, in main
    cli.install(
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/main/modules/base.py", line 467, in install
    if not module.container_path:
           ^^^^^^^^^^^^^^^^^^^^^
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/main/modules/module.py", line 146, in container_path
    return self.add_container()
           ^^^^^^^^^^^^^^^^^^^^
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/main/modules/module.py", line 94, in add_container
    self._container_path = self.container.registry_pull(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/main/container/singularity.py", line 258, in registry_pull
    self.pull(container_uri, container_path)
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/main/container/singularity.py", line 334, in pull
    return self._pull_regular(uri, dest)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/shpc/main/container/singularity.py", line 347, in _pull_regular
    for line in lines:
  File "/p/software/jurecadc/stages/2024/software/shpc/0.1.26-GCCcore-12.3.0/lib/python3.11/site-packages/spython/utils/terminal.py", line 148, in stream_command
    raise subprocess.CalledProcessError(return_code, cmd)
subprocess.CalledProcessError: Command '['singularity', 'pull', '--name', '/p/home/jusers/sivaprasad1/jureca/easybuild/jurecadc/modules/containers/nvcr.io/nvidia/tensorflow/24.02-tf2-py3-igpu/nvcr.io-nvidia-tensorflow-24.02-tf2-py3-igpu-sha256:3de8a232b25d658d7c5ae34c4fa04d1a9823b0a681636c8864f76d109a9528c9.sif', 'docker://nvcr.io/nvidia/tensorflow@sha256:3de8a232b25d658d7c5ae34c4fa04d1a9823b0a681636c8864f76d109a9528c9']' returned non-zero exit status 255.

Support
@surak

@vsoch
Member

vsoch commented Mar 22, 2024

It’s telling you that the manifest list for that digest has no image matching your architecture (amd64). Did you read the error message?
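
To make the error concrete: a multi-architecture image is published as a manifest list whose entries each carry a platform record, and the pull fails when no entry matches the host. Below is a minimal Python sketch of that matching step, assuming the manifest list has already been fetched and parsed as JSON. The sample document and the function names `find_platform_manifest` and `available_platforms` are hypothetical, and the digest is a placeholder, not the real content of the NVIDIA image.

```python
from typing import List, Optional

# Hypothetical manifest list in the OCI image index layout.
# The digest is a placeholder, not a value taken from nvcr.io.
SAMPLE_INDEX = {
    "schemaVersion": 2,
    "manifests": [
        {
            "digest": "sha256:placeholder-arm64",
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}


def find_platform_manifest(
    index: dict, architecture: str, os_name: str = "linux"
) -> Optional[dict]:
    """Return the first manifest entry matching the platform, or None."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if (
            platform.get("architecture") == architecture
            and platform.get("os") == os_name
        ):
            return entry
    return None


def available_platforms(index: dict) -> List[str]:
    """List the platforms the manifest list offers as 'os/architecture'."""
    return [
        f"{m['platform']['os']}/{m['platform']['architecture']}"
        for m in index.get("manifests", [])
        if "platform" in m
    ]


if __name__ == "__main__":
    if find_platform_manifest(SAMPLE_INDEX, "amd64") is None:
        print("no amd64 image; available:", available_platforms(SAMPLE_INDEX))
```

With a manifest list that only offers arm64, the lookup for amd64 returns nothing, which is the situation the FATAL message reports.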

@surak
Contributor

surak commented Mar 22, 2024

Hi @vsoch!

The problem is that this tag is the default "latest" for NVIDIA's TensorFlow container in the registry. So if one runs shpc install nvcr.io/nvidia/tensorflow on an x86_64 machine, it fails, and I find it hard to believe that no x86_64 image is available for it.

docker: nvcr.io/nvidia/tensorflow
url: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow/tags
maintainer: '@vsoch'
description: TensorFlow is an open-source software library for high-performance numerical
  computation. Its flexible architecture allows easy deployment of computation across
  a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers
  to mobile and edge devices.
latest:
  24.02-tf2-py3-igpu: sha256:3de8a232b25d658d7c5ae34c4fa04d1a9823b0a681636c8864f76d109a9528c9

Checking the NVIDIA website, there are two tags: 24.02-tf2-py3 and 24.02-tf2-py3-igpu. The latter is arm64 only.

@vsoch
Member

vsoch commented Mar 22, 2024

There are over 8K containers in the registry. They are added in an automated fashion, and indeed we don't check the architecture of each tag. If you'd like to open a PR against the registry to replace this tag with a better one, or just select another tag when installing, please feel free.
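
A check of that kind could look roughly like the sketch below. It is only an illustration, not how the registry automation works: `pick_latest` and the `TAG_ARCHITECTURES` mapping are hypothetical. The arm64-only entry follows what was reported above, and the architectures listed for the other tag are assumed.

```python
from typing import Dict, List, Optional

# Hypothetical tag -> architectures mapping for illustration only.
TAG_ARCHITECTURES: Dict[str, List[str]] = {
    "24.02-tf2-py3": ["amd64", "arm64"],  # assumed, not verified
    "24.02-tf2-py3-igpu": ["arm64"],      # arm64 only, as reported in this thread
}


def pick_latest(
    tags: Dict[str, List[str]], required_arch: str = "amd64"
) -> Optional[str]:
    """Pick the last tag (in plain string order) that offers required_arch.

    String ordering is a simplification; a real check would need a proper
    version comparison.
    """
    candidates = sorted(
        tag for tag, archs in tags.items() if required_arch in archs
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    print(pick_latest(TAG_ARCHITECTURES, "amd64"))
```

Filtering by architecture before sorting means an arm64-only tag can never become the default for x86_64 users, even when it sorts last by name.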

@surak
Contributor

surak commented Mar 22, 2024

Ah, ok!
