Getting Runtime error while calling the tokenizer #93

Open
patilaum opened this issue Apr 10, 2023 · 0 comments

Hi,
Thanks for the great repo.

I am getting the following error:

Traceback (most recent call last):
  File "marathi_support_file.py", line 241, in <module>
    print(tokenize(hindi_text, "mr"))
  File "/home/aum/my_tensorflow/marenv/lib/python3.7/site-packages/inltk/inltk.py", line 62, in tokenize
    tok = LanguageTokenizer(language_code)
  File "/home/aum/my_tensorflow/marenv/lib/python3.7/site-packages/inltk/tokenizer.py", line 14, in __init__
    self.base = EnglishTokenizer(lang) if lang == LanguageCodes.english else IndicTokenizer(lang)
  File "/home/aum/my_tensorflow/marenv/lib/python3.7/site-packages/inltk/tokenizer.py", line 63, in __init__
    self.sp.Load(str(model_path))
  File "/home/aum/my_tensorflow/marenv/lib/python3.7/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/aum/my_tensorflow/marenv/lib/python3.7/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] 

while running the following code snippet:

from inltk.inltk import setup

setup('mr')

from inltk.inltk import tokenize

hindi_text = """संभाजीनगरमध्ये घडलेली घटना दुर्दैवी आहे. काही लोकांकडून भडकाऊ भाषण देऊन परिस्थिती चिघळवण्याचा प्रयत्न 
सुरू आहे. अशा परिस्थितीत काय बोलावं, याचं भान प्रत्येकाने ठेवायला हवं. सर्वांनी शांतता राखायला हवी. 
आपलं शहर शांत ठेवण्याची जबाबदारी प्रत्येकाची आहे. या घटनेला कोणी राजकीय रंग देत असतील तर यापेक्षा जास्त दुर्दैवी काहीही नाही, 
अशी प्रतिक्रिया देवेंद्र फडणवीस यांनी दिली."""
print(tokenize(hindi_text, "mr"))

I thought it was an issue with the torch version, so I installed Python 3.7 and then installed torch 0.3.0+cpu in a Python 3.7 virtualenv, but I am still getting the same error.
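
Since the traceback fails inside `sentencepiece` at `ParseFromArray` while loading the model file, rather than anywhere in torch, one possibility is that the model file on disk is missing, empty, or not a real model (for example a saved HTML error page). Below is a minimal sketch for checking that, using only the standard library. The helper `describe_model_file` is hypothetical and is not part of iNLTK or sentencepiece, and the path must be supplied by hand because the traceback does not show where the model is stored. A "plausible" result only means the file is not obviously broken, not that it is a valid model.

```python
from pathlib import Path


def describe_model_file(path):
    """Give a rough diagnosis of a file expected to be a SentencePiece model.

    Hypothetical helper for illustration; not part of iNLTK or sentencepiece.
    Returns one of: "missing", "empty", "html", "plausible".
    """
    p = Path(path)
    if not p.is_file():
        return "missing"

    data = p.read_bytes()
    if len(data) == 0:
        return "empty"

    # A failed download is sometimes saved as an HTML page instead of the
    # binary model, so look for HTML markers at the start of the file.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype") or head.startswith(b"<html"):
        return "html"

    # Not obviously broken; this does not prove the file is a valid model.
    return "plausible"
```

It could be called as `print(describe_model_file(model_path))`, where `model_path` is whatever file `self.sp.Load(...)` receives in `inltk/tokenizer.py`. If the result is anything other than "plausible", deleting the file and running `setup('mr')` again may be worth trying.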

Can you please help me with this?
