“IndexError: string index out of range” with nltk library - to delete #1732

Diyago · 2017-05-21T19:41:29Z

I'm using the latest available version of the nltk library (3.2.4) with Python 2.7+, but the error, first described here (SO) and in #1261, still persists.

The goal is to apply a stemmer to a DataFrame column:

import re

import pandas as pd
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

porter = PorterStemmer()
train = pd.read_csv("train.csv")

def stem_str(x, stemmer=SnowballStemmer('english')):
    # Replace every non-alphanumeric character with a space
    x = re.sub("[^a-zA-Z0-9]", " ", x)
    # Stem each space-separated token (runs of spaces yield empty tokens)
    x = " ".join([stemmer.stem(z) for z in x.split(" ")])
    # Collapse the repeated whitespace left over from the steps above
    x = " ".join(x.split())
    return x

# Lowercase each value and stem it with the Porter stemmer
train['col2'] = train['col1'].astype(str).apply(lambda x: stem_str(x.lower(), porter))

As a result, I get this error:

/home/.../anaconda2/lib/python2.7/site-packages/nltk/stem/porter.pyc in _ends_double_consonant(self, word)
    212         """
    213         return (
--> 214             len(word) >= 2 and
    215             word[-1] == word[-2] and
    216             self._is_consonant(word, len(word)-1)

IndexError: string index out of range

Full stack trace:

IndexError                                Traceback (most recent call last)
<ipython-input-25-58ca95c5b364> in <module>()
----> 1 main()

<ipython-input-24-1a1fab0e5ac4> in main()
     15     print('Generate porter')
     16 
---> 17     train['question1_porter'] = train['question1'].astype(str).apply(lambda x:stem_str(x.lower(),porter))
     18     test['question1_porter'] = test['question1'].astype(str).apply(lambda x:stem_str(x.lower(),porter))
     19 

/home/analyst/anaconda2/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66440)()

<ipython-input-24-1a1fab0e5ac4> in <lambda>(x)
     15     print('Generate porter')
     16 
---> 17     train['question1_porter'] = train['question1'].astype(str).apply(lambda x:stem_str(x.lower(),porter))
     18     test['question1_porter'] = test['question1'].astype(str).apply(lambda x:stem_str(x.lower(),porter))
     19 

<ipython-input-18-3b87bf648e19> in stem_str(x, stemmer)
     37 def stem_str(x,stemmer=SnowballStemmer('english')):
     38         x = text.re.sub("[^a-zA-Z0-9]"," ", x)
---> 39         x = (" ").join([stemmer.stem(z) for z in x.split(" ")])
     40         x = " ".join(x.split())
     41         return x

/home/analyst/anaconda2/lib/python2.7/site-packages/nltk/stem/porter.pyc in stem(self, word)
    663             return word
    664 
--> 665         stem = self._step1a(stem)
    666         stem = self._step1b(stem)
    667         stem = self._step1c(stem)

/home/analyst/anaconda2/lib/python2.7/site-packages/nltk/stem/porter.pyc in _step1b(self, word)
    374             (
    375                 '',
--> 376                 'e',
    377                 lambda stem: (self._measure(stem) == 1 and
    378                               self._ends_cvc(stem))

/home/analyst/anaconda2/lib/python2.7/site-packages/nltk/stem/porter.pyc in _apply_rule_list(self, word, rules)
    256         """
    257         for rule in rules:
--> 258             suffix, replacement, condition = rule
    259             if suffix == '*d' and self._ends_double_consonant(word):
    260                 stem = word[:-2]

/home/analyst/anaconda2/lib/python2.7/site-packages/nltk/stem/porter.pyc in _ends_double_consonant(self, word)
    212         """
    213         return (
--> 214             len(word) >= 2 and
    215             word[-1] == word[-2] and
    216             self._is_consonant(word, len(word)-1)

IndexError: string index out of range
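When it is not obvious which input triggers the error, one way to narrow it down is to run the stemmer token by token and collect the failures instead of letting the whole `apply` abort. A minimal sketch: `find_failing_tokens` is a hypothetical helper (not part of nltk or pandas) that reuses the same tokenization as `stem_str` above and works with any object that has a `.stem()` method.

```python
import re


def find_failing_tokens(texts, stemmer, exc_type=IndexError):
    """Return (text_index, token, error) for every token whose stem() raises.

    Mirrors stem_str: lowercase, replace non-alphanumerics with spaces,
    then split on single spaces (so empty tokens are checked too).
    """
    failures = []
    for i, raw in enumerate(texts):
        cleaned = re.sub("[^a-zA-Z0-9]", " ", str(raw).lower())
        for token in cleaned.split(" "):
            try:
                stemmer.stem(token)
            except exc_type as err:
                # Record the offending token instead of stopping at the first one
                failures.append((i, token, err))
    return failures
```

For example, `find_failing_tokens(train['col1'], porter)` would list the rows and tokens that make `PorterStemmer.stem` raise `IndexError`, which makes a minimal reproduction easy to post.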


Diyago · 2017-05-21T19:42:45Z

Link to the SO question describing the same problem

Diyago · 2017-05-22T06:13:40Z

Please delete the issue; it's just an update problem. My bad.

alvations · 2017-05-22T06:22:09Z

@Diyago, if it's convenient, before closing this issue, please tell us what the update problem was so that we can document it in case another user runs into the same issue.

Was it something to do with the pip install -U nltk command not upgrading the correct Python site-packages?

Diyago · 2017-05-22T06:45:49Z

@alvations Shame on me, really :) I've been using IPython. Usually a newly installed library is visible right after installation, but upgrading behaves differently: I restarted the kernel manually, but I believe the old version persisted. Only restarting the notebook fixed the original problem.
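One way to detect this situation from inside the notebook is to compare the version the running process has already imported with the version recorded in the installed package metadata. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the issue itself used Python 2.7, where that module does not exist); `version_report` and `is_stale` are hypothetical helpers, not part of nltk or IPython.

```python
import sys
from importlib import metadata  # Python 3.8+; not available on Python 2.7


def version_report(module_name, dist_name=None):
    """Compare an already-imported module with the package metadata on disk.

    'loaded' / 'loaded_from' describe the module in this process (None if it
    was never imported); 'installed' is the version found in the metadata
    visible to this interpreter (None if the distribution is not found).
    """
    module = sys.modules.get(module_name)
    try:
        installed = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        installed = None
    return {
        "python": sys.executable,  # which interpreter this kernel runs
        "loaded": getattr(module, "__version__", None),
        "loaded_from": getattr(module, "__file__", None),
        "installed": installed,
    }


def is_stale(module_name, dist_name=None):
    """True if the imported version differs from the one installed on disk."""
    report = version_report(module_name, dist_name)
    return (report["loaded"] is not None
            and report["installed"] is not None
            and report["loaded"] != report["installed"])
```

After `pip install -U nltk`, a kernel that still holds the old module would report the pre-upgrade version under `loaded` and the new one under `installed`, so `is_stale("nltk")` would be true and a restart is needed. The `python` and `loaded_from` fields also help answer whether pip upgraded the site-packages that this interpreter actually uses.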

alvations · 2017-05-22T06:56:02Z

@Diyago Thank you for documenting the problem.

Diyago changed the title ~~“IndexError: string index out of range” with nltk library~~ “IndexError: string index out of range” with nltk library - do delete May 22, 2017

Diyago changed the title ~~“IndexError: string index out of range” with nltk library - do delete~~ “IndexError: string index out of range” with nltk library - to delete May 22, 2017

Diyago closed this as completed May 22, 2017
