Remove scipy from required dependencies #1471
Comments
Hi, NumPy/SciPy maintainer here. We get questions about binary size and AWS Lambda semi-regularly, and try to keep binary sizes reasonable / working for Lambda. Here's an example with perhaps useful info: numpy/numpy#13465. The total sizes for NumPy and SciPy you report here seem a little high. It's possible that binary stripping is missing for older releases; we just found a problem with that for the wheels on PyPI (MacPython/numpy-wheels#87). If I now check the NumPy 1.18.4 release for Linux wheels, it's 17.4 MB on PyPI and 55.1 MB after extracting the zip archive locally, compared to your 86 MB above. SciPy 1.4.1 extracted from PyPI is 90.7 MB, compared to your 126 MB. Maybe you can strip binaries of your build to get to a similar size for NumPy? Cc @rlucas7
@rgommers thanks for reaching out! @rlucas7 had contacted me separately, and I'd been meaning to send an email to the scipy-dev list. Good to know that downloading on Mac has an effect - scipy is still 126 MB on Mac for me, but on Ubuntu and Amazon Linux it is 91 MB, so that explains the discrepancy with the sizes. I read through some of the links you listed in numpy/numpy#13465, but they all look like they require some work to be done after pip installing libraries. We've previously suggested ways for users to reduce the size of this SDK for use with AWS Lambda (e.g. #1200 (comment)), but that still feels like more of a workaround than a solution. Do you know if there's any way of stripping numpy/scipy through how we define it as a dependency in
Last time I checked (which admittedly is a while ago) the Amazon Linux Python wasn't compatible with PyPI wheels, so one needed to rebuild anyway. Did that change?
It's kind of manual, one would have to write a script for it that basically
Using the
Yeah, so we definitely could improve our documentation and maybe offer a tool that helps strip down this library, but I think ultimately there still is value in not listing scipy under
Yes definitely, not disagreeing with that.
That would be great.
One recurring issue is that the Python SDK is too big for AWS Lambda (#1200). The Lambda limit is 250 MB, and the Python SDK currently takes up 278 MB, as measured with `pip install sagemaker --target .` and `du -sh`. The two biggest dependencies, at this point, are numpy and scipy. According to `du -s`, numpy takes 86 MB and scipy takes 126 MB.
scipy is used only for one function, and that function isn't used anywhere else in the Python SDK, so users who use the function likely already have scipy installed. We can follow the approach taken for the Local Mode and TensorFlow dependencies (#1130), and use a `DeferredError` for the scipy import. Removing scipy would bring the Python SDK down to 152 MB.
Removing numpy in addition to scipy would bring the Python SDK down to 66 MB. Unfortunately, numpy is needed more widely: 1Ps, Scikit-learn, PyTorch, and Chainer. Given numpy's ubiquity, in addition to the fact that removing it is not strictly needed for falling under the Lambda limit, let's keep numpy as a required dependency for now.
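For illustration, here is a minimal sketch of the deferred-import idea described above. It makes a few assumptions: the `DeferredError` class below is a small stand-in written for this example and may differ from the SDK's own implementation, and `optional_import` is a hypothetical helper name. The point is only that a missing optional dependency fails at first use rather than at import time.

```python
import importlib


class DeferredError:
    """Illustrative stand-in: stores an exception and re-raises it
    as soon as any attribute of the placeholder is accessed."""

    def __init__(self, exception):
        self.exc = exception

    def __getattr__(self, name):
        # Only called for attributes not found normally, so `self.exc`
        # itself resolves without recursion.
        raise self.exc


def optional_import(module_name):
    """Hypothetical helper: return the module if it can be imported,
    otherwise a DeferredError wrapping the ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        return DeferredError(e)
```

With this pattern, something like `scipy_sparse = optional_import("scipy.sparse")` at module level would let the package import cleanly without scipy installed, and only code paths that actually touch `scipy_sparse` would raise the original `ImportError`.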