ArabicStemmer AttributeError #1852
Comments
@richbalmer Thanks for reporting the issue. @LBenzahia Could you help look into this? Thanks in advance!
Hi @richbalmer, thank you for reporting. For the first word, 'تسدد' is the best possible stem, because the Snowball Arabic stemmer is based on a light stemming algorithm that only deals with prefixes and suffixes. If you are looking for the root of "تسدد", you can use ISRI (a root-based, deep stemmer). The second word, 'من', is a stop word; you should apply a stop word filter before using the Snowball ArabicStemmer. Also, this stemmer doesn't handle words that have only 2 letters.
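The advice above (filter stop words first, and skip words the stemmer does not handle) can be sketched as a small guard around any stemmer. This is only an illustration: `safe_stem`, `ARABIC_STOP_WORDS`, and the `min_length` default are hypothetical names and values, not part of NLTK, and the three-word stop list is a placeholder.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of NLTK: guards a stemmer against
# stop words and very short words.

# Tiny illustrative stop-word set (assumption); a real list would be much longer.
ARABIC_STOP_WORDS = frozenset([u"من", u"في", u"على"])


def safe_stem(word, stem, stop_words=ARABIC_STOP_WORDS, min_length=3):
    """Return stem(word), or the word unchanged if it is a stop word
    or shorter than min_length characters."""
    if word in stop_words or len(word) < min_length:
        return word
    return stem(word)
```

With NLTK installed, `stem` could be the bound method `SnowballStemmer("arabic").stem` from `nltk.stem.snowball`, so that 'من' is returned as-is instead of being passed to the stemmer.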
@LBenzahia thanks for looking into this so quickly! I'm getting:
Which also appears to be causing the tests to fail on Jenkins (https://nltk.ci.cloudbees.com/job/pull_request_tests/454/TOXENV=py27-jenkins,jdk=jdk8latestOnlineInstall/testReport/nose.failure/Failure/runTest/). I think all you need to do is put
Also, after fixing that locally I get a UnicodeWarning:
It might be worth making those stopwords unicode strings. Other than that, it looks like your fix works nicely for me. Thanks again! P.S. One other suggestion: testing set inclusion is quite a lot faster than list inclusion, so it might be worth making that stopword list a set instead.
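As a rough illustration of the set-versus-list suggestion, the sketch below compares membership tests on the same unicode strings stored both ways. The word list is synthetic (an assumption, not the stemmer's real stop words), and the timings will vary by machine.

```python
# -*- coding: utf-8 -*-
import timeit

# Synthetic stand-in for a stop-word list, stored as unicode strings.
words = [u"w%d" % i for i in range(1000)]
stop_list = list(words)
stop_set = frozenset(words)
probe = u"not-a-stop-word"  # worst case for the list: scans every element

# Both containers give the same answer; only the lookup cost differs.
assert (probe in stop_list) == (probe in stop_set)

list_time = timeit.timeit(lambda: probe in stop_list, number=2000)
set_time = timeit.timeit(lambda: probe in stop_set, number=2000)
print("list: %.4fs  set: %.4fs" % (list_time, set_time))
```

A list lookup scans elements one by one, while a set lookup hashes the word, so the gap grows with the size of the stop-word collection.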
@richbalmer are you using Python 2.7?
Done for Python 2.7. Test it again and let me know; it works fine for me. I've updated the PR.
Yup, I'm using 2.7. Looking good @LBenzahia, thanks again!
Still having the error. I'm using Python 3.
Fix issue ArabicStemmer AttributeError #1852
I'm failing to stem certain Arabic terms using the SnowballStemmer. Many terms are stemmed successfully but some terms cause an AttributeError to be raised. Please see below for a minimal example that fails on the term 'from'.