Speed Up The Library #183

Open
Optimox opened this issue Sep 9, 2020 · 6 comments

Labels
enhancement New feature or request

Optimox (Collaborator) commented Sep 9, 2020

Feature request

We should try to implement all of these training tricks in order to speed up the implementation as much as possible: https://www.youtube.com/watch?v=9mS1fIYj1So&ab_channel=ArunMallya

What is the expected behavior?

What is the motivation or use case for adding/changing the behavior?

How should this be implemented in your opinion?

Are you willing to work on this yourself?
Yes.

Optimox added the enhancement (New feature or request) label on Sep 9, 2020
Optimox (Collaborator, Author) commented Sep 9, 2020

Good list here:

Optimox mentioned this issue on Oct 12, 2020
Optimox (Collaborator, Author) commented Oct 12, 2020

  • pin_memory=True in the DataLoader
  • replace optimizer.zero_grad() with setting gradients to None, i.e. optimizer.zero_grad(set_to_none=True) (see the sketch below)
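
A minimal sketch of how these two tricks usually look in plain PyTorch (illustrative only, not pytorch-tabnet's actual training loop; the toy model, shapes and hyperparameters are made up):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data / model, for illustration only.
X = torch.randn(4096, 54)
y = torch.randint(0, 2, (4096,))

# pin_memory=True allocates batches in page-locked host memory,
# which speeds up host-to-GPU copies and allows non_blocking=True.
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True,
                    num_workers=2, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(54, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

for xb, yb in loader:
    xb = xb.to(device, non_blocking=True)  # non_blocking only helps with pinned memory
    yb = yb.to(device, non_blocking=True)
    # set_to_none=True drops the grad tensors instead of zero-filling them,
    # saving a memset (and some memory traffic) on every step.
    optimizer.zero_grad(set_to_none=True)
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()
```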

Optimox (Collaborator, Author) commented Oct 29, 2021

Can ghost batch norm be made to take less time? See #243 for more information and possible ideas.
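
For context on why it costs time: ghost batch norm runs BatchNorm1d over fixed-size virtual sub-batches instead of the whole batch, so one large batch turns into many small BN calls. A minimal illustrative sketch (not pytorch-tabnet's own GBN class, whose exact implementation may differ):

```python
import torch
from torch import nn

class GhostBatchNorm1d(nn.Module):
    """Illustrative ghost batch norm: apply BatchNorm1d to virtual
    sub-batches of roughly `virtual_batch_size` rows each."""

    def __init__(self, num_features, virtual_batch_size=128, momentum=0.01):
        super().__init__()
        self.virtual_batch_size = virtual_batch_size
        self.bn = nn.BatchNorm1d(num_features, momentum=momentum)

    def forward(self, x):
        # One BatchNorm1d call per chunk: this loop over sub-batches is
        # where the extra time goes compared to a single full-batch BN.
        n_chunks = (x.size(0) + self.virtual_batch_size - 1) // self.virtual_batch_size
        chunks = x.chunk(n_chunks, dim=0)
        return torch.cat([self.bn(chunk) for chunk in chunks], dim=0)

gbn = GhostBatchNorm1d(54, virtual_batch_size=128)
out = gbn(torch.randn(1024, 54))  # 8 sub-batches of 128 rows each
```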

Optimox (Collaborator, Author) commented Nov 18, 2022

Try NVIDIA's Merlin: https://github.com/NVIDIA-Merlin/Merlin ?

Optimox reopened this on Nov 18, 2022
eduardocarvp (Collaborator) commented

Are you thinking about using NVTabular?
The package as a whole seems to deal exclusively with recommendation models, but maybe I missed something.

Optimox (Collaborator, Author) commented Nov 21, 2022

I don't see why this would be specific to recommender systems.

from https://medium.com/nvidia-merlin/why-isnt-your-recommender-system-training-faster-on-gpu-and-what-can-you-do-about-it-6cb44a711ad4

Most people take this aspect of training for granted. Point the framework’s dataloader at the directory or files that you want the model to train on and you’re good to go. In situations where the model is dominated by compute this approach is often okay. An asynchronous dataloader only ever has to feed data faster than the forward and backward pass of the model. As long as it’s able to get the next batch ready before the GPU is done processing the current batch it has done its job.

For NLP or Vision architectures compute is significant relative to time taken to get the data to the model. You’re also usually working with small batches of large examples and the strategies that work there aren’t optimal for the kind of data that recommender systems use. Properly tuning the dataloader and I/O is important, and I highly recommend you check out the work being done by the DALI team at NVIDIA if your workload includes images or audio, but you can’t apply the same principles to recommendation.

For starters, not only is the compute smaller, but at the example level and even at the batch level recommender system data is usually quite small. Most dataloaders work by aggregating randomly selected examples into batches and then passing that information to the GPU. In future blog posts we’ll deep dive into the specifics of the different dataloaders available in deep learning frameworks but for this blog we’ll focus on the common dataloader case where batches are aggregated from random examples. Even if we try to solve the problem by piling on more workers to create batches, we’re still hammering on memory (or more likely disk) in an access pattern that is horribly inefficient. Pulling data example by example just doesn’t make sense for tabular data.

So I guess it is worth a try: it might work, it might not.
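
If we go down that road, the core idea from the quote (fetch whole batches by slicing an in-memory tensor instead of collating randomly selected single rows) can be sketched in plain PyTorch; the dataset class and parameters below are hypothetical, for illustration only:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class BatchSliceDataset(Dataset):
    """Hypothetical batch-level dataset: each item is already a full batch,
    obtained by slicing contiguous rows from in-memory tensors, so the
    DataLoader never aggregates examples one by one."""

    def __init__(self, X, y, batch_size=1024):
        self.X, self.y, self.batch_size = X, y, batch_size

    def __len__(self):
        return (len(self.X) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, i):
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        return self.X[sl], self.y[sl]

X = torch.randn(1_000_000, 54)
y = torch.randint(0, 2, (1_000_000,))

# batch_size=None disables the default per-example collation: each item
# yielded by the loader is already a (features, targets) batch, and
# shuffle=True only shuffles the order of the batches, not rows within them.
loader = DataLoader(BatchSliceDataset(X, y), batch_size=None, shuffle=True,
                    pin_memory=True)

xb, yb = next(iter(loader))  # xb: (1024, 54), yb: (1024,)
```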
