
The example_model.py file grinds my laptop to a halt on 16GB RAM hardware #132

Open
Raynos opened this issue Jan 29, 2023 · 1 comment

Raynos commented Jan 29, 2023

I have a reasonably recent laptop.

The example_model maxed out all 16GB of my RAM, used 9GB of swap, and made my whole laptop unusable for anything else.

Is there a way to run the program with its RAM usage limited to 8GB or so, so that I can keep using my laptop for browsing or code editing while the model trains?

Or should the minimum system requirements be bumped to 32GB of RAM?

Could the parquet files be read from and written to a key-value DB like LMDB or RocksDB, to reduce the need to upgrade my laptop from 16GB to 32GB of RAM?

Alternatively, should we add instructions on how to SSH into an EC2 instance allocated with 32GB of RAM for the purpose of running the example scripts?

Laptop overview: (6-core i7 @ 2.6GHz, 16GB RAM, 256GB SSD)


@Nimmerfall

There are some things you can do to optimize memory usage:

  • Only read the features / targets you actually want to work with. If you don't use all of them, it's worth storing the list of features / targets you are actually using (see the column-subset sketch below the list).
  • Consider downcasting the data types, or work with the integer version of the dataset. You can usually downcast the dtypes to float16 or int8 without noticeably losing precision (see the downcasting sketch below).
  • Take a closer look at the parquet functions. You can read the data in smaller batches and process each batch as it arrives (e.g. downcast dtypes, or train your model iteratively). There is also a filters parameter you can use (e.g. only read every Xth era, or apply other conditions); see the batching / filtering sketch below.
  • As mentioned, train your model iteratively. If your type of model allows it, you can do the training in iterations (see the incremental-training sketch below). This not only helps with memory usage; if you build your pipeline that way, you can also easily feed additional training data into your model as more data is released every week.
  • Try to optimize garbage collection and avoid copies you don't need. Always think about which objects you really need, and when / for how long you need them (see the cleanup sketch below).
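
A minimal sketch of the column-subset idea, assuming a Numerai-style `train.parquet` whose feature columns share a `feature` prefix (the file name, column names, and the size of the subset are assumptions, not the repo's actual layout):

```python
import pandas as pd
import pyarrow.parquet as pq

# Read the schema alone, which touches no row data.
schema = pq.read_schema("train.parquet")  # file name is an assumption
feature_cols = [name for name in schema.names if name.startswith("feature")]

# Keep a hypothetical subset of features plus the columns we always need.
keep = feature_cols[:300] + ["era", "target"]

# Only the requested columns are decoded and loaded into memory.
df = pd.read_parquet("train.parquet", columns=keep)
```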
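A sketch of the downcasting idea with pandas; the file name is again an assumption, and whether float16 / int8 are safe depends on the value ranges in your data:

```python
import numpy as np
import pandas as pd

def downcast(df: pd.DataFrame) -> pd.DataFrame:
    # float64 -> float16 roughly quarters the memory of the feature matrix.
    for col in df.select_dtypes(include="float64").columns:
        df[col] = df[col].astype(np.float16)
    # int64 -> int8 is safe only if all values fit in [-128, 127].
    for col in df.select_dtypes(include="int64").columns:
        df[col] = df[col].astype(np.int8)
    return df

df = pd.read_parquet("train.parquet")  # file name is an assumption
print(f"{df.memory_usage(deep=True).sum() / 2**30:.2f} GiB before")
df = downcast(df)
print(f"{df.memory_usage(deep=True).sum() / 2**30:.2f} GiB after")
```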
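A sketch of batched reads and filter pushdown with pyarrow; the batch size, era count, and zero-padded era naming scheme are assumptions about the data:

```python
import pandas as pd
import pyarrow.parquet as pq

# Stream the file in fixed-size chunks instead of loading it whole.
pf = pq.ParquetFile("train.parquet")
for batch in pf.iter_batches(batch_size=100_000):
    chunk = batch.to_pandas()
    # ...downcast, aggregate, or feed the chunk to an incremental model

# Or push a row filter into the read itself, e.g. only every 4th era.
eras = [f"{i:04d}" for i in range(1, 575, 4)]
df = pd.read_parquet("train.parquet", filters=[("era", "in", eras)])
```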
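A sketch of iterative training, using scikit-learn's SGDRegressor.partial_fit as a stand-in for whatever incremental model you prefer (gradient-boosting libraries such as LightGBM and XGBoost also support continued training); the file and column names are assumptions:

```python
import numpy as np
import pyarrow.parquet as pq
from sklearn.linear_model import SGDRegressor

# An incremental model trained one chunk at a time, so only a single
# batch of rows ever sits in memory.
model = SGDRegressor()

pf = pq.ParquetFile("train.parquet")
for batch in pf.iter_batches(batch_size=100_000):
    chunk = batch.to_pandas().dropna(subset=["target"])
    X = chunk.filter(like="feature").to_numpy(dtype=np.float32)
    y = chunk["target"].to_numpy(dtype=np.float32)
    model.partial_fit(X, y)
```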
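And a sketch of freeing large intermediates eagerly instead of letting them linger until the end of a long function (the random frame is a stand-in for a big dataset):

```python
import gc

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1_000_000, 50))  # stand-in for a big frame
small = df.iloc[:, :25].astype(np.float16)        # keep only what we need
del df                                            # drop the big frame's last reference
gc.collect()                                      # reclaim its memory right away
```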

These approaches can definitely help with memory usage, but some of them come with a tradeoff in runtime (reading in batches) or prediction performance (downcasting dtypes).
