This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

learn_bpe memory issue is back (on LuaJIT)? #481

Open
vince62s opened this issue Jan 1, 2018 · 3 comments

@vince62s
Member

vince62s commented Jan 1, 2018

Hi,
After commit ccd7e03, I had no issues learning a BPE model from many millions of sentences, even on LuaJIT.

On master the memory issue is back:

[01/01/18 12:53:44 INFO] Generating merge operations to output	
/torch/install/bin/luajit: not enough memory

I am trying to go back through the history to see when it reappeared.

@vince62s
Member Author

vince62s commented Jan 2, 2018

Update:
This seems to happen only with one specific corpus. Another, larger corpus works fine.

@jsenellart
Contributor

Hello Vincent, would you mind sending me a link to your corpus so that I can try to reproduce? learn_bpe should not be using much memory.

@vince62s
Member Author

vince62s commented Jan 3, 2018

It's really weird.
Of the 4 files in the corpus, 1 seems to cause the issue.
However, if I trim that file by removing some "very long words" (e.g. words > 40 characters), the file is fine.
BUT if I run learn_bpe on the 4 files together, I still get an error:

[01/03/18 11:03:04 INFO] Getting pair statistics from vocabulary
[01/03/18 11:07:08 INFO] Generating merge operations to output
PANIC: unprotected error in call to Lua API (not enough memory)
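
For anyone trying to reproduce this, the trimming step described above could look roughly like the following Python sketch. It is not part of learn_bpe or OpenNMT. The function names and the whitespace tokenization are assumptions, and the 40-character threshold is just the example figure from the comment. Note that `len` counts Unicode code points here, which may differ from how the Lua tools measure word length.

```python
MAX_WORD_LEN = 40  # example threshold from the comment; not a learn_bpe setting


def drop_long_words(line, max_len=MAX_WORD_LEN):
    """Return the line with whitespace-separated tokens longer than max_len removed."""
    return " ".join(tok for tok in line.split() if len(tok) <= max_len)


def filter_corpus(src, dst, max_len=MAX_WORD_LEN):
    """Copy src to dst line by line, dropping over-long tokens.

    src and dst are open text file objects (hypothetical helper).
    Returns the number of tokens that were dropped.
    """
    dropped = 0
    for line in src:
        tokens = line.split()
        kept = [tok for tok in tokens if len(tok) <= max_len]
        dropped += len(tokens) - len(kept)
        dst.write(" ".join(kept) + "\n")
    return dropped
```

Calling `filter_corpus` on each of the 4 files in turn (opened with `open(path, encoding="utf-8")`) and comparing the returned counts would also show which files contain the most over-long tokens.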
