Images availability for OS/ARCH other than linux/amd64 #396

Open
j34ni opened this issue Oct 18, 2022 · 30 comments · May be fixed by #399

Comments

@j34ni

j34ni commented Oct 18, 2022

@scottyhq

I would like to use your docker images (in particular pangeo/pangeo-notebook and pangeo/ml-notebook) on an IBM POWER9 machine and I was wondering if there was any plan to make these available for linux/ppc64le?

@scottyhq
Member

> I was wondering if there was any plan to make these available for linux/ppc64le

Unfortunately, there is no plan for that. We only support and test linux/amd64, mainly because that is the main cloud architecture available for the Pangeo-supported JupyterHubs these images are designed for.

That said, I'd be interested to hear how it goes if anybody does put some time into building these images for other platforms! https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/

@j34ni
Author

j34ni commented Oct 18, 2022

@scottyhq

That is a shame, because besides this POWER9 machine I was planning to run these images on ARM-based machines (and I guess that others may also be interested).

I started to rebuild with `buildx build --platform=linux/ppc64le` and that seems to work for the base-image.
The only changes I had to make to the Dockerfile were `FROM ppc64le/ubuntu:22.04` and `Mambaforge-Linux-ppc64le.sh`.

However the pangeo-notebook failed at

 => ERROR [7/1] RUN echo "Checking for pip 'requirements.txt'..."         ; [ -d binder ] && cd binder         ; [ -d .binder ] && cd .binder         ;   3.5s 
------                                                                                                                                                         
 > [7/1] RUN echo "Checking for pip 'requirements.txt'..."         ; [ -d binder ] && cd binder         ; [ -d .binder ] && cd .binder         ; if test -f "requirements.txt" ; then         /srv/conda/envs/notebook/bin/pip install --no-cache -r requirements.txt         ; fi:                                           
#0 1.559 Checking for pip 'requirements.txt'...                                                                                                                
#0 1.563 /bin/sh: 1: /srv/conda/envs/notebook/bin/pip: not found                                                                                               
------

But this `/srv/conda/envs/notebook/bin/pip` does exist

Any idea about how I should adapt that?

@yuvipanda
Member

I do think we should provide ARM builds though :D

@scottyhq
Member

scottyhq commented Oct 18, 2022

There are two parts for this, and I suggest starting without docker and just getting a conda environment for your architecture:

  1. Do the conda packages exist (either cross-platform or platform-specific)?
  2. If yes, build a docker image that encapsulates that conda environment

I quickly tried re-creating the pangeo-notebook environment on AWS (c6g.medium -> amazon/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-arm64-server-20220912) with no luck because of various packages that I suspect have compiled dependencies:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-aarch64.sh  
bash Mambaforge-Linux-aarch64.sh  
mamba env create --file https://raw.githubusercontent.com/pangeo-data/pangeo-docker-images/master/pangeo-notebook/environment.yml
Encountered problems while solving:
  - nothing provides requested ciso
  - nothing provides requested esmpy
  - nothing provides requested parcels
  - nothing provides requested xcape
  - nothing provides esmpy needed by xesmf-0.1.2-py_0
  - nothing provides pykdtree needed by satpy-0.10.0-pyh326bf55_0
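
A minimal sketch for pulling the missing package names out of solver output like the above. The helper name `missing_packages` is hypothetical, and it assumes the two line shapes seen in this log ("nothing provides requested X" and "nothing provides X needed by Y"); other solver versions print different messages.

```shell
# Hypothetical helper: read solver output on stdin and print the unique
# names of packages that no channel provides, one per line.
missing_packages() {
  sed -n \
    -e 's/^ *- nothing provides requested \([^ ]*\).*$/\1/p' \
    -e 's/^ *- nothing provides \([^ ]*\) needed by .*$/\1/p' |
    sort -u
}
```

For example, `mamba env create --file environment.yml 2>&1 | missing_packages` would give a list that can be compared across architectures.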

@j34ni
Author

j34ni commented Oct 18, 2022

The very same packages are also missing for the linux/ppc64le architecture

@ngam
Contributor

ngam commented Oct 18, 2022

> The very same packages are also missing for the linux/ppc64le architecture

We usually do "arch migrations" in conda-forge where both aarch64 and ppc64le go hand-in-hand. If you want this to progress, you will have to push for these packages to be added. Start here: https://github.com/conda-forge/conda-forge-pinning-feedstock/blob/main/recipe/migrations/arch_rebuild.txt. Submit a PR with the missing packages, and the bots will then take care of the dependencies, opening PRs for any missing packages that those packages depend on. After that, you will want to follow up on progress in each feedstock.
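
Before opening that PR, a rough check of which packages still need to be added might look like the sketch below. It assumes a local copy of `arch_rebuild.txt` with one feedstock name per line, and that each feedstock is named after its package, which is not always the case. `not_in_arch_rebuild` is a hypothetical helper name.

```shell
# Hypothetical helper: print every package name that is NOT listed
# in the given file (first argument), one name per line.
not_in_arch_rebuild() {
  list="$1"
  shift
  for pkg in "$@"; do
    # -F fixed string, -x match the whole line, -q no output
    grep -Fxq -- "$pkg" "$list" || echo "$pkg"
  done
}
```

Usage would be along the lines of `not_in_arch_rebuild arch_rebuild.txt ciso esmpy parcels xcape pykdtree`.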

@ngam
Contributor

ngam commented Oct 18, 2022

For example, just looking at ciso, https://github.com/conda-forge/ciso-feedstock/blob/main/recipe/meta.yaml, it looks rather straightforward, so I suspect it will be an easy migration for that one.

yuvipanda added a commit to yuvipanda/pangeo-docker-images that referenced this issue Oct 19, 2022
The version of mambaforge appropriate for the architecture that
the docker build is running on is built this way.

Ref pangeo-data#396
@yuvipanda yuvipanda linked a pull request Oct 19, 2022 that will close this issue
@yuvipanda
Member

#399 should remove the only hardcoding of amd64 in the Dockerfiles. After this, images can be built to run on arm64 / ppc64le once conda-forge has all the packages required for them.
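
A minimal sketch of the idea, not the actual change in #399: pick the Mambaforge installer from the build machine's architecture instead of hardcoding it. The function name `mambaforge_url` is hypothetical. The URL pattern is the one used earlier in this thread, and the sketch assumes `uname -m` reports one of `x86_64`, `aarch64` or `ppc64le`.

```shell
# Hypothetical helper: print the Mambaforge installer URL for an architecture.
# With no argument, the architecture reported by `uname -m` is used.
mambaforge_url() {
  arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|aarch64|ppc64le)
      echo "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-${arch}.sh"
      ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac
}
```

In a Dockerfile `RUN` step this could be used as `wget "$(mambaforge_url)"`, so the same file builds on each architecture.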

@yuvipanda
Member

yuvipanda commented Oct 19, 2022

Apparently I got nerdsniped by this, and #399 now implements most of what is needed in this repo. Based on #399, here is the list of packages so far that need to be supported in conda-forge:

pangeo-notebook

Both arm64 and ppc64le

ppc64le only

  • tiledb-py

ML Notebook (Tensorflow)

Both arm64 and ppc64le

PyTorch Notebook

Both arm64 and ppc64le

Forge Image (not needed after #477)

Both arm64 and ppc64le

- [ ] apache-beam-with-gcp

Once these are done, #399 should work and we can then add the setup required to build multi-arch images. I think we should definitely support arm64, although I'm not sure about ppc64le. In particular, does this mean if we try to add a new package in the future but it doesn't support ppc64le, we don't add it?

@yuvipanda
Member

I don't have any conda-forge experience so I can't really work on these. If someone else does and gets all these packages working, please ping me here again and I'll try to get #399 to completion.

@ngam
Contributor

ngam commented Oct 19, 2022

  • h5py

when I read this I thought we would be in big trouble, but it is actually already done: https://anaconda.org/conda-forge/h5py (I would have been super surprised if h5py wasn't already migrated!)

Getting tensorflow, pytorch, and jaxlib ready for ppc will be a HUGE undertaking, especially since nobody I know (and I am involved in all three of these packages) has access to ppc machines.

The pangeo-notebook image should be doable though.

@yuvipanda
Member

@ngam yeah, hence I think ARM is more likely than PPC64. I think there are significant cloud cost savings to be had with ARM...

@yuvipanda
Member

@ngam tensorflow already supports aarch64 (see https://pypi.org/project/tensorflow/2.10.0/#files for wheel), and so does torch https://pypi.org/project/torch/#files. jaxlib supports arm64 macos but not an aarch64 wheel (https://pypi.org/project/jaxlib/0.3.22/#files) but i suspect that shouldn't be too much work?

@yuvipanda
Member

@ngam here is the error I get for h5py when making ppc64le:

  - package h5py-2.10.0-mpi_mpich_py36he791a77_6 requires python_abi 3.6.* *_cp36m, but none of the providers can be installed

yuvipanda added a commit to yuvipanda/conda-forge-pinning-feedstock that referenced this issue Oct 19, 2022
We are trying to build arm64 packages for the pangeo project, and these
are the missing packages: pangeo-data/pangeo-docker-images#396 (comment)
@yuvipanda
Member

@ngam okkkk i made a PR conda-forge/conda-forge-pinning-feedstock#3533

@ngam
Contributor

ngam commented Oct 19, 2022

> - package h5py-2.10.0-mpi_mpich_py36he791a77_6 requires python_abi 3.6.* *_cp36m, but none of the providers can be installed

This looks like a wildly misleading error from mamba/conda. Did you try that in a fresh env? It could be some random dep.

--

jaxlib should be okay, we are just waiting for some final touches from upstream before pursuing it. Pytorch (cpu only) is also already available for aarch64 (linux arm) in conda-forge. I will try to get something going for tensorflow soon for aarch64.

We could potentially get a cuda-enabled aarch64 setup going as well!

@yuvipanda
Member

@ngam it is just from conda-lock, I ran:

cd pangeo-notebook
conda-lock lock --mamba -k explicit -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml  -p linux-ppc64le
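
To repeat the same lock attempt for several platforms, something like the following sketch could be used. `lock_command` and `try_platforms` are hypothetical helpers, the log file names are made up for illustration, and the script is assumed to run from the `pangeo-notebook` directory with `conda-lock` installed. Setting `DRY_RUN=1` only prints the commands.

```shell
# Hypothetical helper: print the conda-lock command from this thread
# for the platform given as the first argument.
lock_command() {
  echo "conda-lock lock --mamba -k explicit -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p $1"
}

# Hypothetical helper: try each platform in turn and report the result.
# With DRY_RUN set, the commands are printed instead of executed.
try_platforms() {
  for platform in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      lock_command "$platform"
    elif sh -c "$(lock_command "$platform")" >"lock-$platform.log" 2>&1; then
      echo "$platform: ok"
    else
      echo "$platform: failed (see lock-$platform.log)"
    fi
  done
}
```

For example, `try_platforms linux-aarch64 linux-ppc64le` would leave one log per platform to inspect for solver errors.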

@ngam
Contributor

ngam commented Oct 19, 2022

Ah, I see, the solver is definitely getting confused by the other packages.

@j34ni
Author

j34ni commented Oct 19, 2022

> The pangeo-notebook image should be doable though.

This is the image we use most, so that would be a great start

@j34ni
Author

j34ni commented Oct 19, 2022

For now I simply commented out the lines of the environment related to the missing ppc64le packages, and the build still fails with this error:

#10 582.7 Preparing transaction: ...working... done
#10 676.5 Verifying transaction: ...working... done
#10 890.8 Executing transaction: ...working... /srv/conda/envs/notebook/bin/.gdk-pixbuf-post-link.sh: line 13: /srv/conda/envs/notebook/bin/gdk-pixbuf-query-loaders: No such file or directory
#10 1145.5 ERROR: Failed to update gdk-pixbuf's cache, some plugins may not be found.
#10 1145.5 To fix this, activate the environment and run:
#10 1145.5     gdk-pixbuf-query-loaders --update-cache

@j34ni
Author

j34ni commented Oct 19, 2022

> For now I simply commented out the lines of the environment related to the missing ppc64le packages, and the build still fails

I eventually got it built; the error was due to the presence of the conda-linux-64.lock file in the same directory.

I guess that this would not happen if there was no hard-coded CPU architecture in the Dockerfiles
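
One way to avoid picking up the wrong lock file is to derive its name from the architecture. This is a sketch only: `lock_file_for` is a hypothetical helper, and it assumes the lock files follow the `conda-<platform>.lock` naming that `conda-linux-64.lock` uses.

```shell
# Hypothetical helper: print the lock file name matching an architecture.
# With no argument, the architecture reported by `uname -m` is used.
lock_file_for() {
  case "${1:-$(uname -m)}" in
    x86_64)  echo "conda-linux-64.lock" ;;
    aarch64) echo "conda-linux-aarch64.lock" ;;
    ppc64le) echo "conda-linux-ppc64le.lock" ;;
    *)
      echo "no lock file known for this architecture" >&2
      return 1
      ;;
  esac
}
```

A build step could then fail early with `test -f "$(lock_file_for)"` instead of silently installing amd64 packages.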

@ngam
Contributor

ngam commented Oct 19, 2022

> For now I simply commented out the lines of the environment related to the missing ppc64le packages

Which ones are these? Same as above? I have just merged the PR for pykdtree. So that can be taken out of the list.

@ngam
Contributor

ngam commented Oct 19, 2022

ciso should also be good to go conda-forge/ciso-feedstock#21. We can deal with the timeout issues later.

@j34ni
Author

j34ni commented Oct 19, 2022

> Which ones are these? Same as above? I have just merged the PR for pykdtree. So that can be taken out of the list.

Yes, it is:

ciso
esmpy
parcels
tiledb-py
xcape
xesmf
pykdtree

@j34ni
Author

j34ni commented Oct 20, 2022

It seems OK for ciso and pykdtree, but pyresample >=1.10.3 (also needed by satpy) is now missing.
So I still have satpy commented out.

@scottyhq
Member

scottyhq commented Dec 8, 2022

Just wanted to revive this discussion in case people are motivated to push on package availability for other systems. It looks like win-64 is also an issue for some of the same packages:

Could not lock the environment for platform win-64
Could not solve for environment specs
Encountered problems while solving:
  - nothing provides requested esmpy
  - nothing provides requested xcape
  - nothing provides esmpy needed by xesmf-0.1.2-py_0

@weiji14 weiji14 linked a pull request Jun 27, 2023 that will close this issue
@weiji14
Member

weiji14 commented Jun 27, 2023

Jun 2023 update. Tried running the following for aarch64:

cd pangeo-notebook/
conda-lock lock -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-aarch64

and it looks like only 2 packages in pangeo-notebook (parcels and pyresample) are not working on aarch64:

Could not solve for environment specs
The following packages are incompatible
├─ parcels   does not exist (perhaps a typo or a missing channel);
└─ satpy   is uninstallable because there are no viable options
   ├─ satpy [0.10.0|0.11.0|...|0.42.2] would require
   │  └─ pyresample >=1.10.3 , which does not exist (perhaps a missing channel);
   └─ satpy [0.9.0|0.9.1|0.9.2|0.9.3|0.9.4] would require
      └─ pyresample >=1.10.0 , which does not exist (perhaps a missing channel).

For ppc64le, it looks like more package migrations are needed:

conda-lock lock -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-ppc64le
Could not lock the environment for platform linux-ppc64le
Could not solve for environment specs
The following packages are incompatible
├─ parcels   does not exist (perhaps a typo or a missing channel);
├─ python 3.10**  is installable and it requires
│  └─ python_abi 3.10.* *_cp310, which can be installed;
├─ satpy   is uninstallable because there are no viable options
│  ├─ satpy [0.10.0|0.11.0|...|0.42.2] would require
│  │  └─ pyresample >=1.10.3 , which does not exist (perhaps a missing channel);
│  └─ satpy [0.9.0|0.9.1|0.9.2|0.9.3|0.9.4] would require
│     └─ pyresample >=1.10.0 , which does not exist (perhaps a missing channel);
├─ seaborn   is uninstallable because it requires
│  └─ statsmodels [ |>=0.10 |>=0.5.0 |>=0.8.0 ] but there are no viable options
│     ├─ statsmodels 0.11.1 would require
│     │  └─ python_abi 3.6.* *_cp36m, which conflicts with any installable versions previously reported;
│     ├─ statsmodels 0.11.1 would require
│     │  └─ python_abi 3.7.* *_cp37m, which conflicts with any installable versions previously reported;
│     └─ statsmodels 0.11.1 would require
│        └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
└─ tiledb-py   does not exist (perhaps a typo or a missing channel).

I'll update Yuvi's checklist at #396 (comment) to keep track.
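
For keeping such a checklist in sync, a small filter over the newer solver output can list the packages reported as missing. This is only a sketch: `nonexistent_packages` is a hypothetical helper, and it assumes the "does not exist" wording and tree layout shown in the output above.

```shell
# Hypothetical helper: read solver output on stdin and print the unique
# package names reported as "does not exist", one per line.
# Leading tree-drawing characters and spaces are skipped before the name.
nonexistent_packages() {
  sed -n 's/^[^A-Za-z0-9]*\([A-Za-z0-9._-]*\) .*does not exist.*$/\1/p' |
    sort -u
}
```

Piping the ppc64le output above through it should leave parcels, pyresample and tiledb-py.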

@ngam
Contributor

ngam commented Jul 5, 2023

@weiji14 let's try to get these two packages done:

Note we added aarch64 to jaxlib recently, so jax is good to go (so is pytorch).

On the other hand, ppc64le is going to be much harder, I think... so I'd focus our efforts on aarch64 for now.

@weiji14
Member

weiji14 commented Jul 20, 2023

Thanks @ngam for getting those last packages over the finish line for linux-aarch64! I just tried locking pangeo-notebook and it worked 🎉 Pytorch and tensorflow don't seem ready yet, but we can work on those later.

Shall we revive @yuvipanda's PR at #399 to add in linux-aarch64 first for pangeo-notebook only?

@yuvipanda
Member

yay amazing, that would be great, @weiji14
