Fresh install on Mac OS X segfaults or produces a Fortran Runtime error #57

Open
appetrosyan opened this issue Jul 2, 2020 · 25 comments

Comments

@appetrosyan
Contributor

MWE:

from pypolychord.settings import PolyChordSettings
import matplotlib.pyplot as plt
import pypolychord as ppc
from pypolychord.priors import UniformPrior

def quantile(cube):
    return UniformPrior(-10, 10)(cube)


def lnL(theta):
    return theta**2


settings = PolyChordSettings(2, 0)
ppc.run_polychord(lnL, 2, 0, settings, quantile)

produces

[app-mbp:02918] *** Process received signal ***
[app-mbp:02918] Signal: Segmentation fault: 11 (11)
[app-mbp:02918] Signal code: Address not mapped (1)
[app-mbp:02918] Failing at address: 0x5
[app-mbp:02918] [ 0] 0   libsystem_platform.dylib            0x00007fff6eb4a5fd _sigtramp + 29
[app-mbp:02918] [ 1] 0   ???                                 0x000000011242f53d 0x0 + 4601345341
[app-mbp:02918] *** End of error message ***
fish: 'python3 0.0\ simple\ run.py' terminated by signal SIGSEGV (Address boundary error)
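
For comparison, here is a minimal sketch of a log-likelihood in the form used by the run_pypolychord.py example mentioned later in this thread, which (as an assumption, since the thread does not show it) returns a scalar log-likelihood together with a list of derived parameters. The name gaussian_lnL, the Gaussian shape, and sigma are illustrative. Nothing in this thread establishes that the return type is related to the segfault.

```python
import math


def gaussian_lnL(theta, sigma=1.0):
    """Return (logL, derived) for an isotropic Gaussian centred on zero."""
    n_dims = len(theta)
    r2 = sum(float(x) ** 2 for x in theta)
    logL = -0.5 * n_dims * math.log(2 * math.pi * sigma**2) - r2 / (2 * sigma**2)
    # Assumed convention: a scalar plus a list of derived parameters,
    # empty here to match nDerived = 0 in the MWE.
    return logL, []


logL, derived = gaussian_lnL([0.0, 0.0])
print(type(logL).__name__, derived)
```

Swapping this in for lnL in the MWE would at least rule out the callback's return value as a factor.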
@williamjameshandley
Member

Could you give a bit more detail about what a 'fresh' install is on OSX? The script you provide runs on my machine, so it is likely that this is an OSX issue, possibly specific to your compilers/MPI setup.

The travis setup shows that there are indeed some unresolved issues associated with OSX + MPI, which look very similar to your own. Lacking an OSX machine, testing this is difficult (particularly as the travis spin-up time is 15 minutes for OSX).

Does your code run successfully without MPI?

@appetrosyan
Contributor Author

I can’t run anything PolyChord. The shortest error message I get is

FORTRAN RUNTIME ERROR: array rank of PUT is not 1.

And that’s what I get without MPI. With MPI it segfaults.

@appetrosyan
Contributor Author

A fresh install is a clean venv, brew MPI, brew GFortran, and every dependency installed with pip.

@williamjameshandley
Member

If you run the equivalent commands encoded in the .travis.yml (if MPI=0), do you still get this error?

@appetrosyan
Contributor Author

Exactly the same error messages.

Running with mpirun just wraps the segfaults into MPI warnings.

@williamjameshandley
Member

Did you install polychord and its dependencies in the same manner as travis? (I ask, because we know that it can work on OSX without segfaults if you do that, and we can then try to isolate what in your setup is causing the segfault on your machine).

@appetrosyan
Contributor Author

Nope. It fails with the following.

[app-mbp:07451] *** Process received signal ***
[app-mbp:07451] Signal: Segmentation fault: 11 (11)
[app-mbp:07451] Signal code: Address not mapped (1)
[app-mbp:07451] Failing at address: 0x1
[app-mbp:07451] [ 0] 0   libsystem_platform.dylib            0x00007fff6eb4a5fd _sigtramp + 29
[app-mbp:07451] *** End of error message ***
[app-mbp:07452] *** Process received signal ***
[app-mbp:07452] Signal: Segmentation fault: 11 (11)
[app-mbp:07452] Signal code: Address not mapped (1)
[app-mbp:07452] Failing at address: 0x1
[app-mbp:07452] [ 0] 0   libsystem_platform.dylib            0x00007fff6eb4a5fd _sigtramp + 29
[app-mbp:07452] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node app-mbp exited on signal 11 (Segmentation fault: 11).
--------------------------------------------------------------------------

@williamjameshandley
Member

  1. Do the Fortran versions work on your machine?
make gaussian
./bin/gaussian ini/gaussian.ini
  2. Does compiling with debug flags give us any further detail?
make veryclean
make DEBUG=1 MPI=0 libchord.so
python setup.py install 
python <my script>

@appetrosyan
Contributor Author

  1. Fortran works like a charm.

  2. Nope. Same bare info:

[app-mbp:07995] *** Process received signal ***
[app-mbp:07995] Signal: Segmentation fault: 11 (11)
[app-mbp:07995] Signal code: Address not mapped (1)
[app-mbp:07995] Failing at address: 0x129e97000
[app-mbp:07995] [ 0] 0   libsystem_platform.dylib            0x00007fff6eb4a5fd _sigtramp + 29
[app-mbp:07995] [ 1] 0   ???                                 0x000000000000000f 0x0 + 15
[app-mbp:07995] *** End of error message ***
fish: 'python3 run_pypolychord.py' terminated by signal SIGSEGV (Address boundary error)

@williamjameshandley
Member

How far through does it get? Does python -c 'import pypolychord' run OK?
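
One way to see how far the import gets is to enable Python's faulthandler before importing, so that a segfault prints the Python-level traceback instead of only the signal message. The sketch below is an illustration rather than part of PolyChord; the helper name try_import is hypothetical, and the module name _pypolychord is taken from the import line that appears in a traceback later in this thread.

```python
import faulthandler
import importlib
import sys

# Dump the Python traceback to the original stderr if the process receives SIGSEGV.
faulthandler.enable(file=sys.__stderr__)


def try_import(name):
    """Import a module by name and report whether the import succeeded."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, f"{name}: import failed ({exc})"
    return True, f"{name}: loaded from {getattr(module, '__file__', 'built-in')}"


if __name__ == "__main__":
    # Try the compiled extension first, then the Python package that wraps it.
    for name in ("_pypolychord", "pypolychord"):
        ok, message = try_import(name)
        print(message, file=sys.stdout if ok else sys.stderr)
```

The same effect is available without editing any script by running python3 -X faulthandler run_pypolychord.py.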

@williamjameshandley
Member

Would you be able to paste the stdout during this procedure:

make veryclean
make DEBUG=1 MPI=0 libchord.so
python setup.py install 
python <my script>

@appetrosyan
Contributor Author

On my system python is python2.7, so here's the output of the same with python3.

@appetrosyan
Contributor Author

This is not too critical, as Cobaya still uses PolyChord v1.16, which runs OK on my machine.

@williamjameshandley
Member

ah -- it's still using mpicc for the pypolychord compilation. Does

CC=gcc CXX=g++ python setup.py install

fix that?

Alternatively, you could try

python setup.py install --no-mpi

@williamjameshandley
Member

pypolychord 1.17.1 is already the active version in easy-install.pth

You should also make sure that there aren't any other installations of pypolychord on the system (python virtual environments are not completely failsafe).
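
A quick way to look for stray copies is to scan every directory on sys.path for a module or package with that name. This is a rough sketch using only the standard library; find_copies is a hypothetical helper, and it only looks inside real directories, so copies packed into zipped eggs would be missed.

```python
import os
import sys


def find_copies(name, search_path=None):
    """Return each directory on the search path that contains a module or package `name`."""
    hits = []
    for entry in (sys.path if search_path is None else search_path):
        directory = entry or os.getcwd()  # '' on sys.path means the current directory
        if not os.path.isdir(directory):
            continue  # skips zip archives and stale path entries
        try:
            items = os.listdir(directory)
        except OSError:
            continue  # unreadable directory
        # Match 'name', 'name.py', 'name.cpython-38-darwin.so', 'name.egg-link', ...
        if any(item.split(".", 1)[0] == name for item in items) and directory not in hits:
            hits.append(directory)
    return hits


if __name__ == "__main__":
    for module_name in ("pypolychord", "_pypolychord"):
        print(module_name, find_copies(module_name))
```

More than one directory per name, or a directory outside the active virtual environment, would suggest that an older build is being picked up.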

@appetrosyan
Contributor Author

error: option --no-mpi not recognized

Every other combination with any other compiler (e.g. clang, icc, gcc) gives me the same problem.

python virtual environments are not completely failsafe

Cobaya taught me that the hard way....

@lukashergt

Not sure whether this is relevant here, but on the topic of installation from scratch and virtual environments and MPI, I found that it is safer to use --no-cache-dir when pip installing things (mpi4py in particular...).

pip install --no-cache-dir mpi4py

@williamjameshandley
Member

error: option --no-mpi not recognized

Sorry, 'obviously' it should be

python setup.py --no-mpi install

@appetrosyan
Contributor Author

Compiled with

make veryclean && make DEBUG=1 MPI=0 libchord.so && CC=gcc CXX=g++ python3 setup.py --no-mpi install && python3 run_pypolychord.py

Same segfault.

@williamjameshandley
Member

OK, so on #58 I have now managed to get it to run with both python2 and python3 without MPI. With MPI I get a very similar segfault to the one that you are finding, so it would be good to isolate whether the issue you are finding is with MPI, or a separate segfault.

Could you try to make things as travis-like as possible? This means:

  • uninstalling any other instances of pypolychord
  • making sure that gcc, python2, python3 and openmpi are all brew installed
brew unlink python@2
brew link python@3
python3 -m pip install virtualenv
virtualenv venv -p python3
source venv/bin/activate
pip install numpy scipy

First without mpi:

pip install . --global-option="--no-mpi"
python run_pypolychord.py

Then with (this may segfault):

pip install .
pip install mpi4py
mpirun -np 2 python run_pypolychord.py

Thank you for your help with this -- this seems to be a segfault that only affects a subset of OSX users, and lacking a mac (or a willing user with a segfaulting system), it is likely that this issue has been annoying some prospective users for a while.

@tilmantroester
Contributor

My suspicion is that clang is the culprit, as on macOS clang is always the default compiler. That is, brew packages (such as openmpi) are compiled/linked against clang, even if gcc is installed. Python is built with clang as well.
Since the Python binding to PolyChord is a compiled CPython extension, it has to link to libraries compiled with different compilers (clang for CPython, gfortran for libchord).

With MPI, things get even more complicated, since mpicxx (which is used to compile the python extension) points to clang, while mpifort (used to compile libchord) points to gfortran (i.e., gcc).
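
To check which compilers are actually involved on a given machine, something like the following could be used. It is a sketch under assumptions: compiler_report and wrapped_compiler are hypothetical helper names, and the wrappers are queried with --showme:command (an Open MPI option) and then -show (an MPICH option), whichever succeeds first.

```python
import shutil
import subprocess
import sysconfig


def wrapped_compiler(wrapper):
    """Ask an MPI compiler wrapper which underlying compiler it invokes."""
    path = shutil.which(wrapper)
    if path is None:
        return None  # wrapper is not installed or not on PATH
    # Open MPI understands --showme:command; MPICH understands -show.
    for flag in ("--showme:command", "-show"):
        try:
            result = subprocess.run(
                [path, flag], capture_output=True, text=True, timeout=30
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip().split()[0]
    return None


def compiler_report():
    """Collect the compiler Python was built with and those behind the MPI wrappers."""
    report = {"python_cc": sysconfig.get_config_var("CC")}
    for wrapper in ("mpicc", "mpicxx", "mpifort"):
        report[wrapper] = wrapped_compiler(wrapper)
    return report


if __name__ == "__main__":
    for key, value in compiler_report().items():
        print(f"{key:10s} {value}")
```

If mpicxx reports clang while mpifort reports gfortran, the extension and libchord are being built by different toolchains, which is the situation described above.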

Rewriting the python interface in terms of ctypes might be an option, since it would eliminate the need to compile a python extension.
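
As a rough illustration of what a ctypes-based interface involves, the sketch below wraps a Python likelihood in a C-callable function pointer and invokes it through that pointer. The callback signature double f(double *theta, int n_dims) is a hypothetical one chosen for the example; the real libchord entry point and its argument list are not shown in this thread.

```python
import ctypes

# Hypothetical C callback type: double loglike(double *theta, int n_dims).
LOGLIKE_CB = ctypes.CFUNCTYPE(
    ctypes.c_double, ctypes.POINTER(ctypes.c_double), ctypes.c_int
)


def make_callback(py_loglike):
    """Wrap a Python function of a list of floats as a C function pointer."""
    def trampoline(theta_ptr, n_dims):
        # Copy the raw C array into a Python list before calling user code.
        return float(py_loglike([theta_ptr[i] for i in range(n_dims)]))
    # The returned object must stay referenced for as long as C code may call it.
    return LOGLIKE_CB(trampoline)


def call_through_c(callback, values):
    """Invoke the callback the way C code would, passing a raw double array."""
    array = (ctypes.c_double * len(values))(*values)
    return callback(array, len(values))


cb = make_callback(lambda theta: -0.5 * sum(x * x for x in theta))
print(call_through_c(cb, [1.0, 2.0]))
```

In a real binding, the shared library would be opened with ctypes.CDLL and the callback passed to its run function, so no Python extension would need to be compiled against clang.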

@appetrosyan
Contributor Author

Fresh clean install on a completely new, factory-reset system, with explicit references to Homebrew libraries. This produces

>>> import pypolychord 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/app/Git/PolyChordLite/pypolychord/__init__.py", line 4, in <module>
    import _pypolychord
ImportError: dlopen(/usr/local/lib/python3.8/site-packages/pypolychord-1.17.1-py3.8-macosx-10.15-x86_64.egg/_pypolychord.cpython-38-darwin.so, 2): Symbol not found: ___addtf3
  Referenced from: /usr/local/opt/gcc/lib/gcc/10/libquadmath.0.dylib
  Expected in: /usr/lib/libSystem.B.dylib
 in /usr/local/opt/gcc/lib/gcc/10/libquadmath.0.dylib

@tillahoffmann

I had the same problem and was able to compile things as follows to get a working install.

$ git clone git@github.com:PolyChord/PolyChordLite.git
$ cd PolyChordLite
$ make MPI=0 libchord.so
$ python setup.py --no-mpi install
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.0 (clang-1100.0.33.17)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
$ python --version
Python 3.8.1

@appetrosyan
Contributor Author

Didn't work for me. Besides, I'm trying to debug a problem I had with MPI, so I can't just ignore MPI support.

@zwei-beiner

zwei-beiner commented Jul 8, 2022

Had the same issue with MPI. run_pypolychord.py ran successfully and produced the anesthetic PDF file, but MPI threw an error.

I only managed to install PolyChordLite without problems when building without MPI:

brew install mpich  
python3 -m venv venv  
source venv/bin/activate
pip install pip setuptools --upgrade
pip install numpy scipy mpi4py
pip install git+https://github.com/PolyChord/PolyChordLite@master --global-option="--no-mpi"
pip install git+https://github.com/williamjameshandley/anesthetic@master
$ python3 --version
Python 3.9.10
$ gcc --version
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: x86_64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
