Can we train with parallel computing, multithreading, or multiprocessing? #366

Open · feature request (Add new feature)

joytianya opened this issue Jul 12, 2019 · 6 comments

@joytianya

Can we train with parallel computing, multithreading, or multiprocessing?
The goal is to speed up training.
Thank you.

@yutkin

yutkin commented Jul 22, 2019

@joytianya Yes, we can! For example, look at YouTokenToMe. That BPE implementation makes quite efficient use of parallel processing.
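
For reference, a minimal sketch of training with YouTokenToMe's Python bindings (the file names and vocabulary size below are placeholders, and the parameter names should be double-checked against the library's own docs); `n_threads=-1` asks it to use all available cores:

```python
import youtokentome as yttm

# Train a BPE model on a plain-text corpus ("corpus.txt" is a placeholder path).
# n_threads=-1 lets the library use all available CPU cores.
yttm.BPE.train(
    data="corpus.txt",
    model="bpe.model",
    vocab_size=30000,
    n_threads=-1,
)

# Load the trained model and tokenize a sentence into subword pieces.
bpe = yttm.BPE(model="bpe.model")
print(bpe.encode(["parallel BPE training example"], output_type=yttm.OutputType.SUBWORD))
```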

@taku910
Collaborator

taku910 commented Aug 2, 2019

Thank you. I will take a look. Actually, the current BPE algorithm is a bit conservative in how it finds the most frequent pairs.
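
For context, here is a toy sketch of the standard greedy BPE training loop (not SentencePiece's actual implementation): each merge step counts every adjacent symbol pair over the whole corpus and fuses the single most frequent pair, and it is this repeated global counting pass that parallel implementations shard across cores:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy greedy BPE: `words` maps a space-separated symbol sequence to its corpus frequency."""
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        # This pass touches the whole corpus and is what parallel trainers split over cores.
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Greedily pick and apply the single most frequent pair (this step is sequential).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_words = {}
        for word, freq in words.items():
            symbols, out, i = word.split(), [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_words[" ".join(out)] = new_words.get(" ".join(out), 0) + freq
        words = new_words
    return merges

# Example: the characters of "low" (x5), "lower" (x2), "newest" (x6) as a toy corpus.
print(bpe_train({"l o w": 5, "l o w e r": 2, "n e w e s t": 6}, num_merges=3))
```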

taku910 added the feature request label on Jan 10, 2021
taku910 self-assigned this on May 2, 2023
@taku910
Collaborator

taku910 commented May 2, 2023

Will work on it in the next release.

@lockmatrix

I am really looking forward to parallel training.
Training on Asian-language corpora is extremely slow even on a multi-core machine, so it feels like most of the CPU is going to waste...

@heyaudace

> Will work on it in the next release.

Hi @taku910 - I was wondering whether this feature has been released. Thank you.

@ganeshkrishnan1

Just tagging along: is it possible to use multi-threaded tokenization for multi-CPU training?
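
Not an official SentencePiece feature, but as a stopgap the tokenization side can be spread over multiple processes with Python's standard multiprocessing once a model has been trained. A rough sketch, with placeholder file names, assuming a reasonably recent `sentencepiece` Python package (it accepts `model_file=` in the constructor and `out_type=int` in `encode`):

```python
import multiprocessing as mp
import sentencepiece as spm

MODEL_FILE = "spm.model"   # placeholder: an already-trained SentencePiece model
_sp = None

def _init_worker():
    # Load the model once per worker process rather than once per line.
    global _sp
    _sp = spm.SentencePieceProcessor(model_file=MODEL_FILE)

def _encode(line):
    # Tokenize one line into piece ids.
    return _sp.encode(line, out_type=int)

if __name__ == "__main__":
    with open("corpus.txt", encoding="utf-8") as f:   # placeholder corpus path
        lines = [line.rstrip("\n") for line in f]
    # Spread tokenization over all available CPU cores.
    with mp.Pool(initializer=_init_worker) as pool:
        ids = pool.map(_encode, lines, chunksize=1000)
    print(ids[:2])
```

Training itself (spm_train) still runs as a single process, which is what this issue is asking about.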
