Hangs with PyTorch data loaders when num_workers > 0 #34

Open
ntoxeg opened this issue Mar 22, 2024 · 0 comments

ntoxeg commented Mar 22, 2024

OS: Ubuntu 22.04
Python version: 3.11.8
PyTorch version: 2.2.1
Tokenmonster package version: 1.1.12
Other libraries: lightning==2.2.1, datasets==2.18.0

As the title says: I load the tokenizer with load_multiprocess_safe, and the dataset is just a set of plain text files that are read and tokenized. I have tested each stage of loading separately and there are no problems until I wrap the dataset in a DataLoader with num_workers > 0, at which point it hangs indefinitely.
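
To help narrow down where the hang occurs, here is a minimal sketch of one way to capture a stack trace of the main process once a timeout passes. It makes no assumption about the cause and uses only Python's standard `faulthandler` module. The helper name `run_with_hang_watchdog`, its parameters, and the default log file name are hypothetical, chosen for illustration. It records the threads of the calling process only, not the DataLoader worker processes.

```python
import faulthandler
import tempfile
from pathlib import Path


def run_with_hang_watchdog(fn, timeout_s=60.0, log_path=None):
    """Run fn(); if it is still running after timeout_s seconds, dump the
    tracebacks of all threads in this process to log_path and keep waiting.

    Hypothetical helper for illustration; it does not inspect child processes.
    """
    if log_path is None:
        log_path = Path(tempfile.gettempdir()) / "hang_traceback.log"
    # The file must stay open while the watchdog is armed, because
    # faulthandler writes to its file descriptor directly.
    with open(log_path, "w") as log_file:
        faulthandler.dump_traceback_later(
            timeout_s, repeat=False, file=log_file, exit=False
        )
        try:
            return fn()
        finally:
            # Disarm the watchdog before the log file is closed.
            faulthandler.cancel_dump_traceback_later()
```

The function passed in would be whatever triggers the hang, for example a zero-argument callable that pulls the first batch from the DataLoader. If the call never returns, the log file should show which call the main process is blocked in.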
