LTO breaks StaticObject on Fedora 37 #783

Open
benmwebb opened this issue Mar 20, 2023 · 1 comment

Comments

@benmwebb

Code built with g++ and link-time optimization (LTO) fails at run time with a "Trying to save an unregistered polymorphic type" exception. The same code works fine without LTO. This is on a stock Fedora 37 machine (with gcc 12.2.1, cereal 1.3.2).

My code is a large mixed C++ and Python project, but I boiled it down to a minimal reproducer here: cereal_test.zip

A.h defines and registers a polymorphic type Wrapped and a class Container that stores a shared_ptr<Wrapped>. B.h registers a Wrapped subclass BWrapped. We build two dynamic libraries libA.so and libB.so and wrap each with SWIG so they can be used from Python as A.py and B.py (note we build only the A wrapper with -flto):

g++ -fPIC -Wall -shared A.cpp -o libA.so
g++ -fPIC -Wall -shared B.cpp -o libB.so
swig -python -c++ A.i
swig -python -c++ B.i
g++ -flto -fPIC -shared A_wrap.cxx -I/usr/include/python3.11 -o _A.so -L. -lA
g++ -fPIC -shared B_wrap.cxx -I/usr/include/python3.11 -o _B.so -L. -lA -lB

If we then try to serialize a Container object that contains a BWrapped in Python (the _get_as_binary method uses cereal to write Container to a BinaryOutputArchive and then returns the resulting data), it fails:

$ cat test.py
import A, B
w = B.BWrapped()
c = A.Container(w)
print(c._get_as_binary())
$ python3 test.py
terminate called after throwing an instance of 'cereal::Exception'
  what():  Trying to save an unregistered polymorphic type (BWrapped).

If we rebuild A without LTO though, it works fine:

$ g++ -fPIC -shared A_wrap.cxx -I/usr/include/python3.11 -o _A.so -L. -lA
$ python3 test.py
b'\x01\x00\x00\x80\x08\x00\x00\x00\x00\x00\x00\x00BWrapped\x01\x00\x00\x80'

It looks like the problem is that LTO causes StaticObject to not work correctly. If we add the following function to A.h

void show_a_output_binding_map() {
  auto const & bindingMap = cereal::detail::StaticObject<cereal::detail::OutputBindingMap<cereal::BinaryOutputArchive>>::getInstance().map;
  std::cerr << "A map is at " << &bindingMap << std::endl;
}

and a similar function to B.h, then with LTO we see

$ cat test.py
import A, B
A.show_a_output_binding_map()
B.show_b_output_binding_map()
w = B.BWrapped()
c = A.Container(w)
print(c._get_as_binary())
$ python3 test.py
A map is at 0x7f029b6fec40
B map is at 0x7f029b3ff540
terminate called after throwing an instance of 'cereal::Exception'
  what():  Trying to save an unregistered polymorphic type (BWrapped).

i.e. StaticObject is no longer a process-wide singleton: A and B each end up with their own binding map, so when B registers BWrapped in its map, A cannot see it. (Without LTO, the address printed for A map and B map is the same.)
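To make the failure mode concrete, here is a toy model of it in plain C++. It does not use cereal; `Module`, `BindingMap`, and `save_fails_across_modules` are hypothetical names, and each `Module` simply stands in for one loaded library holding its own copy of the binding map, as the two different addresses above suggest.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a per-archive binding map: type name -> save function.
using BindingMap = std::map<std::string, std::function<std::string()>>;

// Toy model of one loaded module that carries its own copy of the "singleton".
struct Module {
    BindingMap bindings;

    // Record a save function for the named type in this module's map only.
    void register_type(const std::string& name) {
        bindings[name] = [name] { return "saved:" + name; };
    }

    // Look the type up in this module's map; throw if it was never registered here.
    std::string save(const std::string& name) const {
        auto it = bindings.find(name);
        if (it == bindings.end()) {
            throw std::runtime_error(
                "Trying to save an unregistered polymorphic type (" + name + ").");
        }
        return it->second();
    }
};

// Returns true when a type registered in module B cannot be saved through module A.
bool save_fails_across_modules() {
    Module a;
    Module b;
    b.register_type("BWrapped");  // B registers into B's map
    try {
        a.save("BWrapped");       // A searches A's map, which is empty
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

With a single shared map both calls would hit the same container and the save would succeed, which matches the non-LTO behaviour where both addresses are equal.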

I see cereal has specific code (in detail/static_object.hpp) to try to prevent link optimization from breaking StaticObject, but it seems not to be working here. Obviously an easy workaround is "don't use LTO" but I'd like to find a better solution. I can modify the SWIG interface, so perhaps I can add some code to the generated modules that explicitly references StaticObject and so persuades the linker not to mangle the code?

benmwebb added a commit to salilab/imp that referenced this issue Mar 22, 2023
In order to correctly serialize a polymorphic pointer
we need the most-derived type. cereal includes machinery
for this but it relies on the linker to make sure that
certain objects are unique in the process, and this doesn't
work well with link time optimization, or on Windows,
as per USCiLab/cereal#783. Provide our own similar
mechanism that registers Object subclasses in precisely
one place - the Object class itself in IMP.kernel.
@benmwebb
Author

FWIW, I see the exact same issue when building for Windows (I use MSVS 2015, for 64-bit). (The reproducer code is similar, except that functions need the usual dllexport/import tags so that DLLs work.)

Our workaround for now, linked above, adds a map of serialize/deserialize functions to our application itself, so we can be sure they're stored in only one place. It works for us, but it is definitely not as general as cereal's polymorphic machinery.
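For anyone wanting to try something similar, a minimal sketch of the idea follows. This is not the actual IMP code: `SerializerRegistry`, `register_bwrapped`, and the string-based save format are all hypothetical, chosen only to show a registry whose storage is defined in exactly one place.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical application-owned registry: type name -> save function.
class SerializerRegistry {
public:
    using SaveFn = std::function<std::string(const void*)>;

    // Declared here, defined out of line below. In a multi-library build the
    // definition would live in one .cpp of one shared library (and be exported),
    // so every other module calls into that library instead of getting a copy.
    static SerializerRegistry& instance();

    void add(const std::string& type_name, SaveFn fn) {
        savers_[type_name] = std::move(fn);
    }

    bool contains(const std::string& type_name) const {
        return savers_.count(type_name) != 0;
    }

    std::string save(const std::string& type_name, const void* obj) const {
        auto it = savers_.find(type_name);
        if (it == savers_.end()) {
            throw std::runtime_error("unregistered type: " + type_name);
        }
        return it->second(obj);
    }

private:
    SerializerRegistry() = default;
    std::map<std::string, SaveFn> savers_;
};

// Non-inline, non-template definition: the function-local static has one home.
SerializerRegistry& SerializerRegistry::instance() {
    static SerializerRegistry registry;
    return registry;
}

// Illustrative subclass payload; the real BWrapped is not shown in this issue.
struct BWrapped {
    int value = 0;
};

// Registration that a module such as B could run when it is loaded.
void register_bwrapped() {
    SerializerRegistry::instance().add("BWrapped", [](const void* obj) {
        return "BWrapped:" + std::to_string(static_cast<const BWrapped*>(obj)->value);
    });
}
```

The design point is that `instance()` is an ordinary exported function rather than a header-only template, so there is no inline or template instantiation for the toolchain to duplicate per module. The cost is the one noted above: every type has to be registered by hand with the application's registry.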
