Issues: google/sentencepiece
Issues list
pip subprocess to install build dependencies did not run successfully. │ exit code: 1
#989
opened Mar 21, 2024 by
Anubiiss
High frequency token segmented into letter sequence when input is a tsv file
bug
#967
opened Jan 30, 2024 by
TingxunShi
A recent EMNLP work to share about task-adaptive tokenization with variable segmentation
#924
opened Oct 24, 2023 by
lsy641
Unexpected behavior with sampling of repeated character sequence.
#904
opened Aug 14, 2023 by
kellymarchisio
Python from source on armv7l raises ' undefined symbol: __atomic_fetch_add_8 '
#865
opened May 17, 2023 by
FrancescoScandiffio
tokens listed in user_defined_symbols tokenized as unknowns when using the "word" model_type
bug
#801
opened Dec 15, 2022 by
lintangsutawika
Sentencepiece with pre-defined vocabulary
feature request
Add new feature
help wanted
#571
opened Oct 22, 2020 by
vladmosin
How to create new model file with restricted vocabulary?
feature request
Add new feature
help wanted
#522
opened Jul 23, 2020 by
sshleifer
can we train by Parallel Computing or Multithreading or multi-Progress
feature request
Add new feature
#366
opened Jul 12, 2019 by
joytianya
Guidance on how to implement subword sampling at train time
sample code
Asks to provide sample code
#103
opened Jun 14, 2018 by
sooheon