CUDA out of memory for larger datasets during attribution #191

Open
lsickert opened this issue Jun 3, 2023 · 9 comments
Labels: bug (Something isn't working)
Milestone: v0.5

Comments

lsickert (Collaborator) commented Jun 3, 2023

🐛 Bug Report

When running inseq with a larger dataset on a CUDA device, an out-of-memory error occurs regardless of the defined batch_size. I believe it is caused by the calls to self.encode in attribution_model.py (lines 345 and 347), which operate on the full inputs instead of a single batch and move all inputs to the CUDA device right after encoding.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Load any model without pre-generated targets
  2. Load a larger dataset with at least 1000 samples
  3. Call the .attribute() method with any batch_size parameter

Code sample

Environment

  • OS: macOS

  • Python version: 3.10

  • Inseq version: 0.4.0

Expected behavior

The input texts should ideally only be encoded or moved to the GPU once they are actually processed.
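
For illustration, a minimal sketch of this lazy behaviour using a generic Hugging Face tokenizer and model rather than the actual inseq internals (the model name and the `iter_batches`/`encode_lazily` helpers are illustrative placeholders, not code from attribution_model.py):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative checkpoint; any seq2seq model would do for the sketch.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de").to(device)

def iter_batches(texts, batch_size):
    """Yield successive slices of the input list."""
    for start in range(0, len(texts), batch_size):
        yield texts[start : start + batch_size]

def encode_lazily(texts, batch_size=8):
    """Encode and move each batch to the GPU only when it is processed,
    instead of tokenizing the whole dataset and calling .to(device) up front."""
    for batch in iter_batches(texts, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        yield {name: tensor.to(device) for name, tensor in encoded.items()}
```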

Additional context

lsickert added the bug (Something isn't working) label on Jun 3, 2023
lsickert (Collaborator, Author) commented Jun 3, 2023

@gsarti Maybe you could try to confirm this behavior since I only seem to come across this when I run inseq on GPU.

gsarti (Member) commented Jun 3, 2023

Hi @lsickert, could you confirm that you still face this bug when installing from main? There was a recently fixed bug where batching was not applied to model.generate, causing this same issue. Let me know!

lsickert (Collaborator, Author) commented Jun 3, 2023

> Hi @lsickert, could you confirm that you still face this bug when installing from main? There was a recently fixed bug where batching was not applied to model.generate, causing this same issue. Let me know!

You are correct. Since I can only reproduce this on CUDA (Habrok cluster), I did not have the version from main but the latest published version installed. It seems to be the same issue as #183 and is mostly fixed when I run it with the current version.

I do still encounter some issues, though, when I try to run my full dataset (1m samples) with it at once (which I did not have in previous experiments since I was running inseq in a batched loop). So I think the calls to self.encode() that already move all the inputs to the GPU might still be a bit of an issue with large datasets.

Maybe it makes sense to only move them to the GPU during the generate function and later on during the attribution function when they are batched?
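
For reference, this is roughly the batched-loop workaround mentioned above, sketched with the public inseq.load_model / model.attribute API (the model name, attribution method, chunk size, and the `attribute_in_chunks` helper are illustrative, not part of inseq):

```python
import inseq

# Illustrative model and attribution method.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "saliency")

def attribute_in_chunks(model, texts, chunk_size=100, batch_size=8):
    """Attribute the dataset chunk by chunk so that encoding and device
    transfer never cover more than `chunk_size` inputs at once."""
    outputs = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start : start + chunk_size]
        outputs.append(model.attribute(chunk, batch_size=batch_size))
    return outputs
```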

gsarti (Member) commented Jul 19, 2023

Documenting here another issue reported by @g8a9: when attributing a large set of examples at once, a single example that is too large to fit in GPU memory can crash the entire attribution halfway through.

A possible solution would be to add an option in model.attribute to sort inputs by length in terms of number of tokens (true by default). In this way, the top N (batch_size) longest examples would be attributed first, ensuring that the process either crashes right away or runs to completion. Importantly, if sorting is activated, it should apply to all position-dependent fields (all parameters except step_scores, where the argument is a list of strings).
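
A minimal sketch of this sorting idea, assuming `attribute_batch` is a placeholder for whatever per-batch attribution call ends up being used and `length_fn` stands in for a real token-count function (neither is inseq API):

```python
def attribute_longest_first(texts, attribute_batch, batch_size=8, length_fn=len):
    """Sort inputs by length (longest first) so an out-of-memory error surfaces
    in the very first batch, then restore the original order of the results."""
    order = sorted(range(len(texts)), key=lambda i: length_fn(texts[i]), reverse=True)

    results_in_sorted_order = []
    for start in range(0, len(order), batch_size):
        batch_indices = order[start : start + batch_size]
        batch_texts = [texts[i] for i in batch_indices]
        results_in_sorted_order.extend(attribute_batch(batch_texts))

    # Undo the sort so the caller sees results aligned with the original inputs.
    restored = [None] * len(texts)
    for pos, original_idx in enumerate(order):
        restored[original_idx] = results_in_sorted_order[pos]
    return restored
```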

g8a9 (Contributor) commented Jul 19, 2023

I'm not super keen on the idea of internal sorting by length, as it could break some outer user-defined logic. But I don't know what the best option is here either (maybe looking for the maximum-length batch and running that first?).

Also, keep in mind that this dry run might take quite some time if sentences are very long, so you might not want to repeat it if it succeeds.

gsarti (Member) commented Jul 19, 2023

Yes, the idea of sorting by length was precisely aimed at avoiding the recomputation for the first big batch. As long as results are returned in the original order, there shouldn't be a problem for user logic, right?

g8a9 (Contributor) commented Jul 19, 2023

Yeah, if the original order is preserved, I guess that works.

lsickert (Collaborator, Author) commented

Might it also be an option to catch these errors and either skip the relevant example (with a logged warning, of course) or move just this one to the CPU for processing and then continue with the next batch on GPU?
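
A hedged sketch of that fallback, assuming `model` is a plain PyTorch module with .to() and `attribute_batch(model, texts)` is a placeholder for the per-batch attribution call (catching RuntimeError and checking its message is the usual way to detect a CUDA OOM):

```python
import logging
import torch

logger = logging.getLogger(__name__)

def attribute_with_cpu_fallback(model, batch_texts, attribute_batch):
    """Try the batch on GPU; on an out-of-memory error, log a warning, clear
    the CUDA cache, retry the same batch on CPU, then move the model back."""
    try:
        return attribute_batch(model, batch_texts)
    except RuntimeError as err:
        if "out of memory" not in str(err).lower():
            raise
        logger.warning("Batch did not fit in GPU memory, retrying on CPU: %s", err)
        torch.cuda.empty_cache()
        model.to("cpu")
        try:
            return attribute_batch(model, batch_texts)
        finally:
            model.to("cuda")
```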

gsarti (Member) commented Jul 24, 2023

Moving to CPU seems the better of the two options, but I'm still not sure whether it should be preferable to raising an error at the start to signal that a lower batch size is required.

gsarti added this to the v0.5 milestone on Jul 28, 2023