Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patchelf 0.17.0 can lead to broken repairs #1421

Open
behrisch opened this issue Nov 27, 2022 · 13 comments
Open

patchelf 0.17.0 can lead to broken repairs #1421

behrisch opened this issue Nov 27, 2022 · 13 comments
Labels

Comments

@behrisch
Copy link

behrisch commented Nov 27, 2022

After updating to the specified tag several of our applications started to segfault directly on startup, see eclipse-sumo/sumo#12142. Going back to 2022-11-14-1226cfc solved the problem. The given library is the only one that changed.
To reproduce you can compare running marouter after doing pip install -i https://test.pypi.org/simple eclipse-sumo==1.15.0.post434 (last working version before the change) with the versions post450 (first broken) and post616 (fixed).

@mayeut
Copy link
Member

mayeut commented Nov 27, 2022

Thanks for the report.
Do you have an un-repaired wheel somewhere ? This is taking taking ages to build on my laptop and a simple usage of libcrypt.so.1 is not showing a segfault (ported a test from auditwheel, using libcrypt.so.1 instead of libcrypt.so.2).

@behrisch
Copy link
Author

pip install -i https://test.pypi.org/simple eclipse-sumo==1.15.0.post450 should give you a broken wheel

@mayeut
Copy link
Member

mayeut commented Nov 27, 2022

yes but it's been processed by auditwheel. I need the wheel before it's been repaired to investigate. It's been building for more than 4 hours now (I just cloned the project & started ./tools/build/build_wheels.sh).

PS: I could reproduce the behavior mentioned in eclipse-sumo/sumo#12142 with the link provided so thanks for that but that's not enough to investigate.

@behrisch
Copy link
Author

So you need the wheel before auditwheel runs, correct? I do not have that but I could set it up. But it will take probably an hour to build.

@mayeut
Copy link
Member

mayeut commented Nov 27, 2022

So you need the wheel before auditwheel runs, correct?

Exactly. I'd like to see if it comes from the libxcrypt update or the patchelf update. Having the wheel before auditwheel runs is the easiest path for that investigation.

@behrisch
Copy link
Author

The build is here: https://github.com/behrisch/sumo/actions/runs/3561204237 and it should be in the raw wheels artifact

@mayeut mayeut changed the title broken libcrypt since 2022-11-19-1b19e81 patchelf 0.17.0 can lead to broken repairs Dec 4, 2022
@mayeut
Copy link
Member

mayeut commented Dec 4, 2022

@behrisch,

Thanks for the raw wheels.
I could verify that the patchelf update is the culprit for your builds.
Using the following bisect script:

cat ../bisect.sh
#!/bin/bash

set -euxo pipefail

cd $(dirname ${BASH_SOURCE[0]})

pushd patchelf
./bootstrap.sh || exit 125
./configure || exit 125
make || exit 125
make install
make clean
popd
patchelf --version

rm -rf wheelhouse || true
./cp38/bin/pip uninstall -y eclipse_sumo || true
auditwheel repair eclipse_sumo-0.0.1-cp38-cp38-linux_x86_64.whl
./cp38/bin/pip install wheelhouse/eclipse_sumo-0.0.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 
./cp38/bin/marouter --version || exit 1

lead to:

42394e880bc5524122234fe2c2eaa043063ac581 is the first bad commit
commit 42394e880bc5524122234fe2c2eaa043063ac581
Author: Jörg Thalheim <joerg@thalheim.io>
Date:   Sat Nov 5 09:08:17 2022 -0400

    write out replace sections in original order
    
    Libc and other programs sometimes make assumption in which order
    sections.
    
    i.e. glibc expects that the strtab is after the symtab section: https://github.com/bminor/glibc/blob/9cc9d61ee12f2f8620d8e0ea3c42af02bf07fe1e/elf/dl-fptr.c#L179
    
    To decrease the likelyhood of breakages we keep the relative order the
    same when replacing section.

 src/patchelf.cc | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
bisect found first bad commit

A workaround for your builds would be to pin patchelf for now using pipx install -f patchelf==0.16.1.0
It still requires some work to report the breakage with more info upstream.

@mayeut mayeut added the bug label Dec 4, 2022
@mayeut
Copy link
Member

mayeut commented Dec 4, 2022

It seems that only the marouter executable is broken (for this specific test). Using all libraries from patchelf 0.16.1 or 0.17.0 with an 0.16.1 repaired marouter executable works. Using all libraries from patchelf 0.16.1 or 0.17.0 with an 0.17.0 repaired marouter executable doesn't work.

@behrisch
Copy link
Author

behrisch commented Dec 4, 2022

Thanks a lot! I used your workaround and it seems to work just fine now. Also for the other executables which failed before. I am doing more extensive tests now.

@behrisch
Copy link
Author

behrisch commented Dec 9, 2022

Our tests are running fine now. Anything I can do to help with the bugreport for upstream?

@mayeut
Copy link
Member

mayeut commented Dec 10, 2022

@behrisch, it might be related to NixOS/patchelf#446 so checking if NixOS/patchelf#447 solves your issue might be worth a try (& commenting there if that's the case).
Given the bisect returns another commit here than the one mentioned in NixOS/patchelf#447, that might not work in your case in which case opening an issue there with the info found here is probably a good starting point even though the reproducer is not that simple.

@pramodk
Copy link

pramodk commented Jan 20, 2023

Hello @behrisch @mayeut!

We are having the same issue while building wheels for the NEURON project using latest manylinux2014_x86_64 images (containing patchelf >= 0.17.0). Similar to #1421 (comment), our binary also fails starting from the commit 42394e880bc5524122234fe2c2eaa043063ac581 (i.e. NixOS/patchelf/pull/430).

Given the bisect returns another commit here than the one mentioned in NixOS/patchelf#447, that might not work in your case in which case opening an issue there with the info found here is probably a good starting point even though the reproducer is not that simple.

I tried 0.17.2 containing NixOS/patchelf/pull/447 but that didn't help us. I have provided details in the NixOS/patchelf#446 (comment).

@rdb
Copy link
Contributor

rdb commented Aug 3, 2023

We also hit this on the Panda3D project; of the version 1.10.13 wheels we have on PyPI, several of our binaries segfault on run. Even calling ldd on them fails.

Downgrading to patchelf<0.17 fixes the problem.

I would suggest that manylinux ship an older patchelf for now until these issues are fixed upstream, since it is affecting multiple projects.

rdb added a commit to panda3d/buildbot-panda3d that referenced this issue Aug 3, 2023
Due to bugs causing executables to be corrupted, see pypa/manylinux#1421 and NixOS/patchelf#446

Fixes panda3d/panda3d#1504
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants