Creating an AMUSE binary distribution #984
Solving this would make AMUSE much more accessible and portable, so I think it should be high priority.
I'm not sure if/how mpi4py (used in AMUSE) could be configured to use an MPI library that would be included with AMUSE. This would be an important thing to find out.
A slightly more complex (for the user) but much easier (for us) solution could be to write an installation script that sets up a Python virtual environment with MPI (and other libraries) installed as pre-compiled binaries, and to distribute binary AMUSE packages for this specific environment. We can't use PyPI for this then, since the binaries would fail on any environment other than the one they were compiled for.
Well, that would certainly be a project 😄 I've been looking at the build system in the past couple of weeks (after MESA failed to compile for me), and I've been going through the community codes (I'm about 1/3 of the way through) looking at languages and build systems and licenses and such. A few things that come to mind:
One more thought: we could go native. There would have to be some infrastructure for building on all the various platforms, probably based on Docker or VMs, and we'd have to figure out how it works, but if you support a couple of recent Ubuntu versions and a couple of recent macOS versions then you're probably covering a good chunk of the users.
I agree. HPC will probably always need custom builds - which is fine. That's the way we currently work, and that should always be supported.
Right. Most of the dependencies are searched for by configure, but MESA in particular has its own extra dependencies.
I would prefer not to have a hard requirement on Docker, since that brings its own issues.
Conda is indeed widely used. We have had lots of issues with Conda (see the many open issues); it would be really good to solve these and to have an AMUSE Conda package. Here too, I think HPC would not be the main target.
Yes... This is something we haven't paid much attention to, but probably should, especially if/when we start distributing binaries.
Flash is explicitly not distributed with AMUSE for exactly this reason. Also, indeed, we suggest (but don't require) that people cite our papers and those of the community codes. AMUSE itself uses the Apache license.
A diverse testing environment is certainly a good idea. We started setting this up, but so far we're testing only limited environments/codes.
Yes, this is also something we considered (see open issues on macports/homebrew and debian). It would be helpful. We need to be careful about requirements, but this could probably work.
I don't like the idea of a Docker dependency either, so let's drop that option. We could go with Conda for the desktop and EasyBuild for HPC, which would give us two mostly standardised environments to deal with. That said, not all HPC machines have EasyBuild, Conda is also somewhat messy, and Mac users may prefer Homebrew or MacPorts.

On the other hand, getting this packaged up for Debian is also very attractive. Many people run a Debian derivative, so having it available by default from the repos would be great. And as a long-time Linux user, being in the Debian repo also feels like your software has officially arrived in the FOSS world. A potential downside is that when running, say, Ubuntu LTS (as I do), you may well end up with a two-year-old version. We could solve that with a PPA along the lines of deadsnakes. Still, one way or another it will have to build in a range of environments.

For inclusion into Debian, the licensing situation certainly needs to be clear, which can be tricky. To really do it properly, we'd have to contact the universities, explain that one of their employees wrote some code several decades ago, explain to them what code is, what copyright is, and what an open source license is, and then ask nicely if they're willing to license it to make the status quo of everyone copying it everywhere legal, at which point they'll panic and punt because oh my, legal issues. At least, that's the worst case scenario 😄, and there are places that know how to do this, but it's still pretty early days. Of course, this should be resolved anyway, but I think EB and Conda are a bit less strict, and we're already redistributing these codes, so at least we wouldn't be making it worse.

Perhaps a first step would be to see if we can set up some robust infrastructure?
As far as I can see, one CI currently runs Python tests of the core, and the other does a test with two community codes and different MPI versions, but there's no complete build and test with all the community codes, let alone in multiple environments. This may end up stretching the free GitHub resources a bit, but let's see.

Another issue is the setuptools setup. The development environment uses a custom setup.py command, which is deprecated and will disappear from setuptools at some point. It's also very complicated: there's some model-specific stuff in the setuptools code that should perhaps be in the corresponding per-package part of the system, and there's #814, still partially open.

It seems to me that there could be some room for cleanup there, but I'm completely new to this project and I worry that I'm missing things. So maybe we (I 😄) should focus on tests first? Ensure that they cover everything we need the build system to do, and then get to work on making changes. At least that would reduce the chance of regressions, and make the requirements clear.
Yes - focusing on tests first is probably best. We should sit together and discuss how to move forward from there.
Some relevant issues:
I've been looking at Python wheels with binary code a bit, as the only ones I've done so far are Python-only ones and those are easy. Note that the below assumes a desktop installation; HPC is another kettle of fish.

### Wheels and their limitations

A wheel is a ZIP file with Python files and anything else needed for the package, which for packages that contain native code typically means precompiled dynamically linked libraries. These libraries are not allowed to be dynamically linked against any other libraries, unless those are also included or (for Linux) they are on a very short list of core system libraries. In our case, the wheels would contain precompiled workers, for which presumably the same rules would hold. PyPI will refuse wheels that link to libraries outside the wheel that aren't on the exempt list. Even if we can get around this (e.g. by creating an installer script that determines which system it is on and then downloads an appropriate wheel for that specific OS version from a separate server), it's clear that you're not supposed to do that. We'd also have to support a potentially large number of combinations of installed library versions.

### MPI and CUDA dependencies

We cannot link our workers against an included MPI library, because the Python side uses mpi4py, which links against the system's MPI on installation, and that may be different and incompatible on the wire. So our workers need to be linked against the system MPI library, which would have to happen during installation. At least a compiler is needed then, as well as MPI dev packages. CUDA may be even trickier, as it's proprietary and has a C++ API, and C++ doesn't have a stable ABI. This means any CUDA code would pretty much have to be compiled locally. We could try to compile some of the code into static archives, which could then be shipped in the wheel and linked with local MPI and CUDA libraries on installation.
For MPI this might even work, MPI having a standardised C API and C having a standard ABI, but more likely it will just break in many hard-to-debug ways. For CUDA this very likely won't work unless we have exactly the same version on either side.

### Wheels + AMUSE = a bad idea?

Given the above, it seems that there's no good solution for shipping wheels. So I think it makes more sense to ship binaries using a package manager that's designed for this scenario, such as Anaconda or Homebrew or MacPorts or dpkg or RPM.

### Improving the sdist installation

There are likely always going to be cases where AMUSE will have to be installed from source: HPC machines come to mind, any other place where you cannot use the above package managers or really need to use a virtualenv, development installs, or any system that doesn't have a binary available for some reason. The current source build on PyPI is error-prone; users normally expect installation to just work.

One option may be to get rid of the packages on PyPI altogether, and have users clone the repo or download a tarball and then build from there. Perhaps the least bad solution here would be to make a separate installer program, written in plain Python, which would detect the OS environment, install any necessary packages in consultation with the user ("I see that we're on a Mac and that you have Homebrew installed, shall I use Homebrew to install the dependencies for you?"), and then build from source.
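The environment-detection step such an installer would need could start with something like the sketch below. The MPI compiler wrapper names (`mpicc`, `mpiicc`) are standard; everything else here, including how an installer would use the result, is an assumption on my part.

```python
# Sketch: probe for an MPI compiler wrapper before building workers against
# the system MPI. Purely illustrative, not existing AMUSE code.
import shutil
import subprocess
from typing import Optional


def find_mpi_compiler() -> Optional[str]:
    """Return the path of an MPI C compiler wrapper on PATH, or None."""
    for name in ("mpicc", "mpiicc"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def describe_mpi(path: str) -> str:
    """Ask the wrapper what it fronts, e.g. to log it during installation."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else "unknown MPI compiler"
```

If the probe comes back empty, the installer could fall back to asking the user to install MPI dev packages (or offer to do so via the detected package manager).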
Hi! I've been spending some time trying to get AMUSE installed on a UNIX server (CentOS) without any success (either via pip or the development version), so I would welcome binaries being available for easier installation! I found this thread and just wanted to mention the existence of tools such as auditwheel and delocate, which copy the necessary libraries into wheels and rewrite the references in the binaries to point at the bundled copies. This makes it much easier to install packages that require system libraries, because users don't have to pre-install any requirements. I'm not sure whether this would work with your MPI setup, though.
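Tools like auditwheel can do this rewriting precisely because a wheel is just a ZIP archive with a fixed layout, as noted earlier in the thread. A toy illustration of that layout, with a made-up package name ("demo") and placeholder contents:

```python
# Build a minimal fake wheel in memory and list its contents.
# Everything here is illustrative; real wheels have a few more metadata files.
import io
import zipfile


def make_demo_wheel() -> list:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as wheel:
        wheel.writestr("demo/__init__.py", "VERSION = '0.1'\n")
        # Compiled extensions / worker binaries sit next to the sources;
        # these are the files auditwheel inspects and patches.
        wheel.writestr("demo/worker_code.so", b"\x7fELF...")  # placeholder bytes
        wheel.writestr("demo-0.1.dist-info/WHEEL", "Wheel-Version: 1.0\n")
        wheel.writestr("demo-0.1.dist-info/METADATA", "Name: demo\nVersion: 0.1\n")
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as wheel:
        return wheel.namelist()
```

Repair tools unpack such an archive, copy external shared libraries in, patch the binaries' library search paths, and repack it.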
Thanks Jo!
Thanks for the offer! I'm not quite following the recommended installation guidelines (I think), so I will try a bit further on my end, but otherwise will reach out in another issue.
Likewise, I'll be happy to help. I'm aware of auditwheel and delocate, but indeed they wouldn't solve the problem of mpi4py compatibility. We'd have to convince the mpi4py maintainers to create a binary wheel with some specific MPI library included, then bundle and use exactly the same library in AMUSE, carefully keep those synchronised, and then hope that the bundled MPI library doesn't clash at runtime with any MPI already installed on the system.

I had a look at the CUDA EULA and that does actually permit redistribution of some of the libraries, but there's an issue there with some of the codes or dependencies having GPL licenses, and still the potential for version mismatches between a bundled CUDA library and the installed CUDA driver. Although I guess that latter problem will exist with Anaconda too.
Related mpi4py issue: mpi4py/mpi4py#28
A long standing wish is to have a binary (pre-compiled) distribution of AMUSE.
The main problem in realising this is that AMUSE depends on quite a few libraries, most importantly an MPI distribution. These need to be the same version as the one AMUSE was built with.
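As a sketch of what checking this version match could look like at runtime: `MPI.Get_library_version()` is a real mpi4py call (wrapping the MPI-3 function of the same name), but the recorded build-time string and the function around it are hypothetical.

```python
# Sketch of a runtime check for the version-matching problem: compare the MPI
# library mpi4py actually loaded against the one the AMUSE binaries were
# built with. BUILT_AGAINST is a made-up value that a binary package would
# record at build time.
BUILT_AGAINST = "Open MPI v4.1"


def mpi_matches(expected: str = BUILT_AGAINST) -> bool:
    """True if the runtime MPI library matches the build-time one."""
    try:
        from mpi4py import MPI  # links against the system MPI
    except ImportError:
        return False  # no MPI bindings installed at all
    return expected in MPI.Get_library_version()
```

A binary distribution could run such a check on startup and fail early with a clear message, instead of crashing somewhere inside a worker.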
One possible solution could be to distribute these libraries pre-built with AMUSE. It would be good to check if this is feasible, both technologically and legally (would there be any license conflicts?).
Ideally, installing AMUSE as a binary would work via Pip, and packages would automatically be built on GitHub when a new release is created.
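A release-triggered build on GitHub could look roughly like the workflow sketch below. All job names, the OS matrix, and the helper scripts (`install-deps.sh`, the `make package` target) are assumptions for illustration, not an existing AMUSE workflow.

```yaml
# Hypothetical .github/workflows/release.yml sketch
name: build-binary-packages
on:
  release:
    types: [published]
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-22.04, ubuntu-24.04, macos-13, macos-14]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Install build dependencies (MPI, compilers)
        run: ./support/install-deps.sh   # hypothetical helper script
      - name: Build and package
        run: ./configure && make && make package   # assumed targets
      - uses: actions/upload-artifact@v4
        with:
          name: amuse-${{ matrix.os }}
          path: dist/
```

The resulting artifacts could then be attached to the GitHub release for download.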