Remove scipy from required dependencies #1471

Closed
laurenyu opened this issue May 7, 2020 · 6 comments

@laurenyu
Contributor

laurenyu commented May 7, 2020

One recurring issue is that the Python SDK is too big for AWS Lambda (#1200). The Lambda limit is 250 MB, and the Python SDK currently takes up 278 MB, as measured by running `pip install sagemaker --target .` followed by `du -sh`. The two biggest dependencies at this point are numpy and scipy: according to `du -s`, numpy takes 86 MB and scipy takes 126 MB.
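For anyone who wants to reproduce a per-package breakdown without `du`, here is a minimal Python sketch. The helper names `dir_size_bytes` and `largest_packages` are made up for illustration, and the numbers will not match `du` exactly, because this sums apparent file sizes rather than disk blocks.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_packages(target_dir, top=5):
    """Return (name, size_in_bytes) for the largest top-level entries."""
    sizes = []
    for entry in os.scandir(target_dir):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size_bytes(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((entry.name, size))
    # Largest first, so the heaviest dependencies show up at the top.
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Pointing `largest_packages` at the directory passed to `--target` should list numpy and scipy near the top if they dominate the install as described above.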

scipy is used for only one function, and that function isn't used anywhere within the Python SDK itself, so users who call it likely already have scipy installed. We can follow the precedent set by the Local Mode and TensorFlow dependencies (#1130) and use a DeferredError for the scipy import. Removing scipy would bring the Python SDK down to 152 MB.
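As a rough illustration of the deferred-error idea, here is a minimal sketch. It is not the SDK's actual code: `optional_import` and `to_sparse_matrix` are hypothetical names, and the `DeferredError` class here is a simplified stand-in written for this example.

```python
import importlib


class DeferredError:
    """Placeholder for a module that failed to import.

    Any attribute access re-raises the stored exception, so the failure
    surfaces only when the optional dependency is actually used.
    """

    def __init__(self, exception):
        self.exc = exception

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. anything but `exc`.
        raise self.exc


def optional_import(module_name):
    """Import a module, or return a DeferredError if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        return DeferredError(e)


# Importing this file succeeds whether or not scipy is installed.
scipy_sparse = optional_import("scipy.sparse")


def to_sparse_matrix(rows):
    """Hypothetical helper: raises ImportError here if scipy is missing."""
    return scipy_sparse.csr_matrix(rows)
```

With this pattern, users who never call the scipy-backed function never see an error, and users who do call it get the original ImportError at the point of use.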

Removing numpy in addition to scipy would bring the Python SDK down to 66 MB. Unfortunately, numpy is needed more widely: by the 1Ps, Scikit-learn, PyTorch, and Chainer. Given numpy's ubiquity, and the fact that removing it isn't strictly needed to fall under the Lambda limit, let's keep numpy as a required dependency for now.

@ajaykarpur ajaykarpur added this to the v2.0.0 milestone May 7, 2020
@laurenyu laurenyu added this to To do in v2.0.0 May 7, 2020
@laurenyu laurenyu modified the milestones: v2.0.0, v2.0.0.rc0 May 7, 2020
@laurenyu laurenyu moved this from To do to In progress in v2.0.0 May 12, 2020
@rgommers

Hi, NumPy/SciPy maintainer here. We get questions about binary size and AWS Lambda semi-regularly, and try to keep binary sizes reasonable / working for Lambda. Here's an example with perhaps useful info: numpy/numpy#13465.

The total sizes for NumPy and SciPy you report here seem a little high. It's possible that binary stripping is missing for older releases; we just found a problem with that for the wheels on PyPI (MacPython/numpy-wheels#87). If I now check the Linux wheel for the 1.18.4 release, it's 17.4 MB on PyPI and 55.1 MB after extracting the zip archive locally, compared to your 86 MB above. SciPy 1.4.1 extracted from PyPI is 90.7 MB, compared to your 126 MB. Maybe you can strip the binaries of your build to get to a similar size for NumPy?
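The download-versus-extracted comparison can be checked on any wheel file, since a wheel is a zip archive. A minimal sketch, assuming a wheel has already been downloaded locally (`wheel_sizes` is a hypothetical helper name):

```python
import zipfile


def wheel_sizes(wheel_path):
    """Return (compressed_bytes, extracted_bytes) for a wheel file."""
    with zipfile.ZipFile(wheel_path) as archive:
        infos = archive.infolist()
        # compress_size is the stored size of each member inside the archive;
        # file_size is its size once extracted.
        compressed = sum(info.compress_size for info in infos)
        extracted = sum(info.file_size for info in infos)
    return compressed, extracted
```

The compressed total excludes zip headers, so it comes out slightly below the size of the `.whl` file on disk.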

Cc @rlucas7

@laurenyu
Contributor Author

@rgommers thanks for reaching out! @rlucas7 had contacted me separately, and I'd been meaning to send an email to the scipy-dev list.

Good to know that downloading on Mac has an effect - scipy is still 126 MB on Mac for me, but on Ubuntu and Amazon Linux it is 91 MB, so that explains the discrepancy with the sizes.

I read through some of the links you listed in numpy/numpy#13465, but they all look like they require some work to be done after pip installing libraries. We've previously suggested ways for users to reduce the size of this SDK for use with AWS Lambda (e.g. #1200 (comment)), but that still feels like more of a workaround than a solution. Do you know if there's any way of stripping numpy/scipy through how we define it as a dependency in setup.py, setup.cfg, etc.?

@rgommers

Last time I checked (which admittedly is a while ago) the Amazon Linux Python wasn't compatible with PyPI wheels, so one needed to rebuild anyway. Did that change?

> Do you know if there's any way of stripping numpy/scipy through how we define it as a dependency in setup.py, setup.cfg, etc.?

It's kind of manual, one would have to write a script for it that basically cd's into the installed location and then strips binaries. There's not really a good way of hooking that into an install_requires entry.
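A script along those lines could look like the following minimal sketch. It assumes a Linux environment with the binutils `strip` command on the PATH. The function names, the `dry_run` flag, and the choice of `--strip-debug` are illustrative assumptions, not an official NumPy or SciPy recommendation.

```python
import os
import subprocess


def find_shared_objects(site_dir):
    """Yield files that look like shared libraries (*.so or *.so.N)."""
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            if name.endswith(".so") or ".so." in name:
                yield os.path.join(root, name)


def strip_binaries(site_dir, dry_run=True):
    """Strip debug symbols from shared objects; return the paths considered."""
    paths = sorted(find_shared_objects(site_dir))
    if not dry_run:
        for path in paths:
            # check=False: one file that cannot be stripped should not
            # abort the whole run.
            subprocess.run(["strip", "--strip-debug", path], check=False)
    return paths
```

Running it with `dry_run=True` first shows which files would be touched. It would have to be run as a separate step after `pip install`, which is the manual part mentioned above.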

@laurenyu
Contributor Author

> Last time I checked (which admittedly is a while ago) the Amazon Linux Python wasn't compatible with PyPI wheels, so one needed to rebuild anyway. Did that change?

Using the amazonlinux:latest Docker image, I think it worked for me:

bash-4.2# yum install python3
[...]
bash-4.2# pip3 install scipy
[...]
bash-4.2# python3
Python 3.7.6 (default, Feb 26 2020, 20:54:15)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scipy.special import cbrt
>>> cbrt([27, 64])
array([3., 4.])
>>>

> It's kind of manual, one would have to write a script for it that basically cd's into the installed location and then strips binaries. There's not really a good way of hooking that into an install_requires entry.

Yeah, so we definitely could improve our documentation and maybe offer a tool that helps strip down this library, but I think ultimately there still is value in not listing scipy under install_requires because there are plenty of use cases with this library where a user wouldn't need scipy (e.g. most TensorFlow workflows that I've seen). We did something similar with pandas from the outset (source) because of concerns around size.
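One common way to keep a package out of install_requires while still making it easy to install is a setuptools extra. The sketch below is illustrative only: the package name, version, and dependency groupings are hypothetical and not taken from this SDK's setup.py.

```python
# Hypothetical dependency lists; names and groupings are illustrative only.
required_packages = ["numpy"]

extras = {
    "scipy": ["scipy"],
}
# Convenience extra that pulls in every optional dependency.
extras["all"] = sorted({pkg for group in extras.values() for pkg in group})


def run_setup():
    """Call this from setup.py; it is not invoked here, so that the
    sketch can be imported without triggering a build."""
    from setuptools import setup

    setup(
        name="example-sdk",  # hypothetical package name
        version="0.0.0",
        install_requires=required_packages,
        extras_require=extras,
    )
```

Under this layout, `pip install example-sdk` would skip scipy, while `pip install "example-sdk[scipy]"` would include it.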

@rgommers

> I think ultimately there still is value in not listing scipy under install_requires because there are plenty of use cases with this library where a user wouldn't need scipy

Yes definitely, not disagreeing with that.

> Yeah, so we definitely could improve our documentation and maybe offer a tool that helps strip down this library

That would be great.

@laurenyu laurenyu moved this from In progress to Under review in v2.0.0 May 21, 2020
@laurenyu
Contributor Author

laurenyu commented Jun 9, 2020

Since #1518 is merged, closing in favor of #1200.

@laurenyu laurenyu closed this as completed Jun 9, 2020
@laurenyu laurenyu moved this from Under review to Done in v2.0.0 Jun 9, 2020

4 participants