Hi,
thank you for this awesome project.
I want to apply DiffuSeq to a larger dataset (~17M sentences), but tokenizing it keeps blowing up my RAM, even though I have 200 GB available! Is there functionality I am missing that uses cached tokens, or is this a work in progress?
Thanks again & best!
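To make the request concrete, here is a minimal sketch of the kind of caching I have in mind (not DiffuSeq's actual pipeline): tokenize the corpus once with Hugging Face `datasets`, which writes memory-mapped Arrow files to disk, so later runs reload cached tokens instead of re-tokenizing everything in RAM. The tokenizer name, column names, and file paths below are placeholders.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed tokenizer

def tokenize_batch(batch):
    # "src"/"trg" column names are assumptions; adjust to the actual jsonl schema
    return {
        "src_ids": tokenizer(batch["src"], add_special_tokens=True)["input_ids"],
        "trg_ids": tokenizer(batch["trg"], add_special_tokens=True)["input_ids"],
    }

raw = load_dataset("json", data_files={"train": "train.jsonl"})  # hypothetical path
tokenized = raw["train"].map(
    tokenize_batch,
    batched=True,
    batch_size=10_000,
    num_proc=4,  # parallel workers keep per-process memory bounded
    remove_columns=raw["train"].column_names,
)
tokenized.save_to_disk("cache/tokenized_train")  # reload later with datasets.load_from_disk
```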
It's still not working properly, but I think that has something to do with padding and my sequence lengths. I have to investigate further, but thank you for your help! :)
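For anyone hitting the same padding issue, this is roughly the pattern I'm checking against: pad each batch to its own longest sequence instead of a global max length, so the padded tensors stay small. This is a hypothetical illustration, not the project's code; the pad id and field name are placeholders.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed pad token id

def collate_dynamic(batch):
    # each item is assumed to be a dict holding a list of token ids under "input_ids"
    seqs = [torch.tensor(item["input_ids"], dtype=torch.long) for item in batch]
    input_ids = pad_sequence(seqs, batch_first=True, padding_value=PAD_ID)
    attention_mask = (input_ids != PAD_ID).long()  # mask out the padded positions
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```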
I found a small thing that accelerated the data loading time a lot: