Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop bundling shared libs into wheels and reconsider your packaging and building practices #334

Open
KOLANICH opened this issue May 22, 2023 · 9 comments

Comments

@KOLANICH
Copy link

KOLANICH commented May 22, 2023

Is your feature request related to a problem? Please describe.
Given the grave bugs related to shared libs building and installation (#329, #330, #332, #333 at least) and taking into account other negative consequences of bundling libraries:

  • bloat
  • not getting improvements of deps
  • incompatibility with distros policies

it is proposed to change the way this lib is built and installed.

Describe the solution you'd like

  1. No dependencies should be vendored, inlined and submoduled. The code must be compatible to latest versions of each of its dependency. The mandatory features of each deps needed should be documented.
  2. All the deps should be installed manually by a builder from packages using the package manager specific to his system before building this lib. It can be automated via a bash script detecting the system and installing the needed packages by names (and given a user directions how to use CMake and CPack to download, build and install the missing packages for his system, if they are not in the repos, but it shouldn't be done automatically.
  3. End users of prebuilt packages usually don't have to install anything manually - if the metadata is right, package manager will do it for them.
  4. For Windows MSYS2 is a free and open-source set of stuff needed for building (compilers, build tools, prebuilt shared libs and their headers and so on) using pacman package manager. For each dep there should be code creating a pacman package, since there is no support of pacman in CPack.
  5. All the packages with C++ libs are made of 2 parts, one is the lib package with lib* name containing the shared lib itself and only it, and lib*-dev (Debian convention) and lib*-devel (RPM-based distros convention) SDK package.
  6. C++ libs should be built separately with CMake, packaged with CPack. It is not the way distros build packages, but the packages generated by CPack are good enough for end users to build and install them.
  7. setup.py builds only the cext for Python and links to the shared libs installed into the system. It DOESN'T bundle any other libs.

Describe alternatives you've considered
There is no alternatives other than keeping the current mess.

@chao-qu-skydio
Copy link
Contributor

No dependencies should be vendored, inlined and submoduled. The code must be compatible to latest versions of each of its dependency. The mandatory features of each deps needed should be documented.

I agree with this. I've previously tried to make a conda package for symforce but wasn't successful due to various reasons.
For example, I don't think we are able to use the latest fmt version (version 10) since it deprecated the ostream operator.

@oursland
Copy link

@chao-qu-skydio I am also interested in a conda package for symforce. Do you have a feedstock repo that you were working on that I may look at?

@chao-qu-skydio
Copy link
Contributor

There is this closed PR from my previous attempt.
conda-forge/staged-recipes#21361

@aaron-skydio
Copy link
Member

Generally I agree with a lot of this, but want to be very specific about what is desired here.

We currently distribute SymForce through PyPI, and not through any system package managers. We would love to also distribute SymForce through other means, and welcome contributions towards this, including Conda (#212 / conda-forge/staged-recipes#21361), or package managers associated with various Linux distros. However, we aren't going to stop distributing through PyPI.

The standard for distribution of python wheels on Linux is PEP 600, as far as I'm aware. The PEP explains all of this and the reasoning better than I would regurgitate it here. Specifically this applies to your point (6) and handling of shared libraries.

No dependencies should be vendored

We currently vendor* exactly two dependencies that are available elsewhere, symengine and symenginepy (eigen_lcm and skymarshal are other Skydio projects that we've open-sourced with SymForce and only live in symforce/third_party). symengine and symenginepy are somewhat heavily modified, SymForce does not work with stock distributed SymEngine. If we could use them unmodified, we wouldn't vendor them.

*I define vendor as "include the source in the SymForce repo"

No dependencies should be inlined and submoduled

I'm not sure exactly what you mean here. I assume you're referring to use of FetchContent to pull in libraries that are not already installed on the system, but this is not mandatory for any of our dependencies, you're welcome to compile SymForce against installed versions of all of those things. I think maybe the one exception is METIS, but we could make that work as well. Generally, I'm not sure what the negative consequences are of this specifically.

The code must be compatible to latest versions of each of its dependency

I don't think anyone would disagree that this is desirable. Feel free to contribute compatibility with newer releases of any dependency you'd like, or call out specific released versions of dependencies that are important / blocking, otherwise we'll do this as we can / as it's needed for other reasons. The main one we're aware of is fmt / spdlog, which already has a PR #326 , discussion should happen there or on a separate issue. The other one is probably SymEngine, which we'd like to upgrade, but is complicated for the reasons mentioned above and below, and isn't a compatibility issue in the same way since we are only compatible with our modified version of SymEngine, which is built and shipped with SymForce.


Primarily, I think one thing we could do here is distribute our copy of SymEngine on its own, and not build our copy of SymEngine as part of the SymForce build - it's by far the most bug-prone and convoluted part of our build. Plus, the majority of people building SymForce from source are not modifying SymEngine. The tradeoff is that it makes the build process more annoying for people who are modifying SymEngine, and requires we track and depend on the specific version of our copy of SymEngine associated with the current version of SymForce, but that's probably worth it.

@isuruf
Copy link

isuruf commented Jul 19, 2023

A SymEngine maintainer here. What modifications of SymEngine are you referring to here? Are they upstreamable?

@aaron-skydio
Copy link
Member

They are! (at least most of them) We would love to upstream them.

The "biggest" one in terms of code changes is making the type of Add::dict_ configurable, so you can use a std::map or other container that's deterministic across compilers/toolchains/stdlib implementations, from talking with Ondrej it sounded like yall are aware of this nondeterminism but I haven't looked at upstream symengine in a while to know if that's already configurable there. That one is cbb2618 in symforce.

1e5d381 has other misc derivatives and functions, which are mostly upstreamable assuming yall haven't already added them. There are some minor assumptions there that we'd want to refine (e.g. for reals vs complex).

And then there are some other assorted changes, the full commit log for git log third_party/symengine* (symengine and symenginepy) is here:
log.txt

@isuruf
Copy link

isuruf commented Jul 19, 2023

The "biggest" one in terms of code changes is making the type of Add::dict_ configurable, so you can use a std::map or other container that's deterministic across compilers/toolchains/stdlib implementation

We do know that they are non-deterministic. However, that should be an implementation detail and the user should not be affected by that. For example, when we print the expression we sort them, so that the user is not affected by this implementation detail. If a user is affected, I would consider that a bug. (How to fix any bug without sacrificing performance might be hard, but probably not impossible)

It's not configurable at the moment, but I don't think anyone would mind making it configurable. However, you'll have to ask distro managers to use that configure option to make symforce work with symengine in distro packages.

Other changes look good from cursory look.

@aaron-skydio
Copy link
Member

aaron-skydio commented Jul 19, 2023

We do know that they are non-deterministic. However, that should be an implementation detail and the user should not be affected by that. For example, when we print the expression we sort them, so that the user is not affected by this implementation detail. If a user is affected, I would consider that a bug. (How to fix any bug without sacrificing performance might be hard, but probably not impossible)

We were definitely seeing effects, our generated functions were different. We could try and put together a minimal example that shows this if yall are interested in digging into that? It's possible we could add sorts in the appropriate places, we didn't attempt this. It'd definitely already be a big step in the right direction for us to be able to just build upstream symengine, or even symengine with some much smaller patches, with some compile flags, even if it doesn't take us all the way to being able to use an already-distributed copy.

We won't necessarily have time soon to actually pull things out and make PRs to upstream symengine, if someone on the symengine team or anyone else is interested in doing that that would be awesome, and we're happy to assist - otherwise we'll get around to it eventually

@isuruf
Copy link

isuruf commented Jul 19, 2023

We could try and put together a minimal example that shows this if yall are interested in digging into that?

Yes, please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants