Speed Up The Library #183

Open
Optimox opened this issue Sep 9, 2020 · 6 comments

Labels
enhancement New feature or request

Optimox (Collaborator) commented Sep 9, 2020

Feature request

We should try to implement all of these training tricks in order to speed up the implementation as much as possible: https://www.youtube.com/watch?v=9mS1fIYj1So&ab_channel=ArunMallya

What is the expected behavior?

What is the motivation or use case for adding/changing the behavior?

How should this be implemented in your opinion?

Are you willing to work on this yourself?
Yes.

Optimox added the enhancement (New feature or request) label on Sep 9, 2020
Optimox (Collaborator, Author) commented Sep 9, 2020

Good list here:

Optimox mentioned this issue on Oct 12, 2020
Optimox (Collaborator, Author) commented Oct 12, 2020

  • pin_memory=True in the DataLoader
  • replace optimizer.zero_grad() with setting gradients to None, i.e. optimizer.zero_grad(set_to_none=True) (see the sketch below)
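
A minimal sketch of how these two tricks usually look in plain PyTorch (illustrative only, not pytorch-tabnet's actual training loop; the toy model, shapes and hyperparameters are made up):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data / model, for illustration only.
X = torch.randn(4096, 54)
y = torch.randint(0, 2, (4096,))

# pin_memory=True allocates batches in page-locked host memory,
# which speeds up host-to-GPU copies and allows non_blocking=True.
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True,
                    num_workers=2, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(54, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

for xb, yb in loader:
    xb = xb.to(device, non_blocking=True)  # non_blocking only helps with pinned memory
    yb = yb.to(device, non_blocking=True)
    # set_to_none=True drops the grad tensors instead of zero-filling them,
    # saving a memset (and some memory traffic) on every step.
    optimizer.zero_grad(set_to_none=True)
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()
```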

Optimox (Collaborator, Author) commented Oct 29, 2021

Can ghost batch norm be made to take less time? See #243 for more information and possible ideas.
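
For context on why it costs time: ghost batch norm runs BatchNorm1d over fixed-size virtual sub-batches instead of the whole batch, so one large batch turns into many small BN calls. A minimal illustrative sketch (not pytorch-tabnet's own GBN class, whose exact implementation may differ):

```python
import torch
from torch import nn

class GhostBatchNorm1d(nn.Module):
    """Illustrative ghost batch norm: apply BatchNorm1d to virtual
    sub-batches of roughly `virtual_batch_size` rows each."""

    def __init__(self, num_features, virtual_batch_size=128, momentum=0.01):
        super().__init__()
        self.virtual_batch_size = virtual_batch_size
        self.bn = nn.BatchNorm1d(num_features, momentum=momentum)

    def forward(self, x):
        # One BatchNorm1d call per chunk: this loop over sub-batches is
        # where the extra time goes compared to a single full-batch BN.
        n_chunks = (x.size(0) + self.virtual_batch_size - 1) // self.virtual_batch_size
        chunks = x.chunk(n_chunks, dim=0)
        return torch.cat([self.bn(chunk) for chunk in chunks], dim=0)

gbn = GhostBatchNorm1d(54, virtual_batch_size=128)
out = gbn(torch.randn(1024, 54))  # 8 sub-batches of 128 rows each
```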

Optimox (Collaborator, Author) commented Nov 18, 2022

Try NVIDIA's Merlin: https://github.com/NVIDIA-Merlin/Merlin ?

Optimox reopened this on Nov 18, 2022
eduardocarvp (Collaborator) commented

Are you thinking about using NVTabular?
The package as a whole seems to deal exclusively with recommendation models, but maybe I missed something.

Optimox (Collaborator, Author) commented Nov 21, 2022

I don't see why this would be specific to recommender systems.

from https://medium.com/nvidia-merlin/why-isnt-your-recommender-system-training-faster-on-gpu-and-what-can-you-do-about-it-6cb44a711ad4

Most people take this aspect of training for granted. Point the framework’s dataloader at the directory or files that you want the model to train on and you’re good to go. In situations where the model is dominated by compute this approach is often okay. An asynchronous dataloader only ever has to feed data faster than the forward and backward pass of the model. As long as it’s able to get the next batch ready before the GPU is done processing the current batch it has done its job.

For NLP or Vision architectures compute is significant relative to time taken to get the data to the model. You’re also usually working with small batches of large examples and the strategies that work there aren’t optimal for the kind of data that recommender systems use. Properly tuning the dataloader and I/O is important, and I highly recommend you check out the work being done by the DALI team at NVIDIA if your workload includes images or audio, but you can’t apply the same principles to recommendation.

For starters, not only is the compute smaller, but at the example level and even at the batch level recommender system data is usually quite small. Most dataloaders work by aggregating randomly selected examples into batches and then passing that information to the GPU. In future blog posts we’ll deep dive into the specifics of the different dataloaders available in deep learning frameworks but for this blog we’ll focus on the common dataloader case where batches are aggregated from random examples. Even if we try to solve the problem by piling on more workers to create batches, we’re still hammering on memory (or more likely disk) in an access pattern that is horribly inefficient. Pulling data example by example just doesn’t make sense for tabular data.

So I guess it is worth a try: it might work, it might not.
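
If we go down that road, the core idea from the quote (fetch whole batches by slicing an in-memory tensor instead of collating randomly selected single rows) can be sketched in plain PyTorch; the dataset class and parameters below are hypothetical, for illustration only:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class BatchSliceDataset(Dataset):
    """Hypothetical batch-level dataset: each item is already a full batch,
    obtained by slicing contiguous rows from in-memory tensors, so the
    DataLoader never aggregates examples one by one."""

    def __init__(self, X, y, batch_size=1024):
        self.X, self.y, self.batch_size = X, y, batch_size

    def __len__(self):
        return (len(self.X) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, i):
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        return self.X[sl], self.y[sl]

X = torch.randn(1_000_000, 54)
y = torch.randint(0, 2, (1_000_000,))

# batch_size=None disables the default per-example collation: each item
# yielded by the loader is already a (features, targets) batch, and
# shuffle=True only shuffles the order of the batches, not rows within them.
loader = DataLoader(BatchSliceDataset(X, y), batch_size=None, shuffle=True,
                    pin_memory=True)

xb, yb = next(iter(loader))  # xb: (1024, 54), yb: (1024,)
```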
