Unclosed file in stopwords corpora #1928

Closed
iliaschalkidis opened this issue Jan 3, 2018 · 11 comments

Comments

@iliaschalkidis

/Users/kiddo/anaconda/lib/python3.6/site-packages/nltk/corpus/reader/wordlist.py:28: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/kiddo/nltk_data/corpora/stopwords/english'>
return concat([self.open(f).read() for f in fileids])

I found this warning while running in debug mode. You may want to fix it before the next release.

@maykulkarni

Hi, can I take this issue?

@sks4903440

sks4903440 commented Jan 21, 2018

@iliaschalkidis @alvations How can I reproduce the warning on linux?

@iliaschalkidis
Author

iliaschalkidis commented Jan 22, 2018

@sks4903440 With version 3.2.5, you can run the following script from the command line:

test.py

import warnings
import nltk
warnings.filterwarnings('error', category=ResourceWarning)
stop_words = nltk.corpus.stopwords.words('english')

$ python test.py

You should get:

ResourceWarning: unclosed file <_io.BufferedReader name='/Users/kiddo/nltk_data/corpora/stopwords/english'>
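
The same kind of warning can be reproduced without NLTK installed. The sketch below is a minimal illustration that assumes only the standard library. The helper names (`read_without_closing`, `read_with_context_manager`, `count_resource_warnings`) and the temporary file are hypothetical stand-ins, not NLTK code. It records warnings instead of turning them into errors, so the unclosed-file pattern and a `with`-based variant can be compared side by side.

```python
import gc
import os
import tempfile
import warnings


def read_without_closing(path):
    # Same shape as the line in the warning: open(...).read() with no close().
    return open(path, "rb").read()


def read_with_context_manager(path):
    # The file is closed deterministically when the with block exits.
    with open(path, "rb") as stream:
        return stream.read()


def count_resource_warnings(reader, path):
    # Record every warning raised while the reader runs, then count
    # the ones that are ResourceWarning.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        reader(path)
        gc.collect()  # makes sure abandoned file objects are finalized
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


# Hypothetical stand-in for a stopwords file.
fd, path = tempfile.mkstemp()
os.write(fd, b"i\nme\nmy\n")
os.close(fd)
try:
    leaky = count_resource_warnings(read_without_closing, path)
    clean = count_resource_warnings(read_with_context_manager, path)
finally:
    os.remove(path)

print("warnings without close:", leaky)
print("warnings with context manager:", clean)
```

On CPython the first count should be at least one and the second zero, which suggests the warning comes from the missing `close()` rather than from anything specific to the stopwords corpus.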

@sks4903440

Fixed in #1945

@alvations
Contributor

alvations commented Jan 26, 2018

Hmm, having StreamCorpusReader inherit from io.BufferedReader is an interesting solution, but closing the file properly with a scoped context manager might be a better fix.

I also think Python 3.6 has some special requirements for files that differ from previous versions. We should read the CPython changelog to make sure what we're doing is not just a band-aid =)

@sks4903440

@alvations Using with would surely be a good idea, and I will try to incorporate it. I had not used it because CPython closes the file automatically once its reference count drops to zero. Also, for the with statement to work, we will have to either use io.BufferedReader or implement the __enter__ and __exit__ methods. Which do you think is better?
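
To make the `__enter__`/`__exit__` option concrete, here is a minimal sketch. `SeekableReader` is a hypothetical stand-in for a reader that wraps a binary stream, not NLTK's actual class, and `io.BytesIO` stands in for a corpus file. The second half shows that `contextlib.closing` from the standard library gives the same guarantee for any object that only has a `close()` method.

```python
import contextlib
import io


class SeekableReader:
    """Hypothetical reader that wraps a binary stream (not NLTK's class)."""

    def __init__(self, stream):
        self.stream = stream

    def read(self):
        return self.stream.read().decode("utf-8")

    def close(self):
        self.stream.close()

    # Context-manager protocol, so `with SeekableReader(...)` works.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the block


# Option 1: rely on the reader's own __enter__/__exit__.
raw = io.BytesIO(b"i\nme\nmy\n")
with SeekableReader(raw) as reader:
    words = reader.read().split()

# Option 2: contextlib.closing only needs a close() method.
raw2 = io.BytesIO(b"our\nours\n")
with contextlib.closing(SeekableReader(raw2)) as reader:
    more_words = reader.read().split()

print(words, raw.closed)
print(more_words, raw2.closed)
```

Either way the underlying stream is closed as soon as the block exits, without relying on reference counting.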

@alvations
Contributor

alvations commented Jan 30, 2018

I think we don't have to implement the enter/exit methods, since we won't be inheriting from BufferedReader. Instead, we can use a context manager to open and close the file and let the io module handle the garbage collection (gc).

This is tricky: io.BufferedReader already has seek()-like functions, and when SeekableUnicodeStreamReader inherits from it without calling super().__init__(), I'm not exactly sure what it takes from BufferedReader.

And actually, we can't simply wrap the with inside read(), because that would prevent the seek and tell functions from working unless we hack the buffer within the with context. Hmm...
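
The seek/tell problem can be seen with plain file objects. The sketch below is only an illustration under assumptions: the temporary file stands in for a corpus file, and copying the contents into an `io.BytesIO` is one possible reading of "hack the buffer", not necessarily the approach NLTK took.

```python
import io
import os
import tempfile

# Hypothetical stand-in for a corpus file.
fd, path = tempfile.mkstemp()
os.write(fd, b"i\nme\nmy\n")
os.close(fd)

try:
    # Closing inside the with block means later seeks on the stream fail.
    with open(path, "rb") as stream:
        first_line = stream.readline()
    try:
        stream.seek(0)
        seek_after_close_failed = False
    except ValueError:
        # Operations on a closed file raise ValueError.
        seek_after_close_failed = True

    # One workaround: copy the bytes into memory while the file is open,
    # then seek and tell on the in-memory buffer instead.
    with open(path, "rb") as stream:
        buffered = io.BytesIO(stream.read())
    buffered.seek(2)
    position = buffered.tell()
    rest = buffered.read()
finally:
    os.remove(path)

print(seek_after_close_failed, position, rest)
```

The trade-off is that the whole file is held in memory, which may not suit large corpora.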

@annargrs

Any news on this? Python 3.6 still complains about NLTK 3.3, on pretty much every resource:

/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1107: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/lexnames'>
  for i, line in enumerate(self.open('lexnames')):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.adj'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.adv'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.noun'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.verb'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/adj.exc'>
  for line in self.open('%s.exc' % suffix):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/adv.exc'>
  for line in self.open('%s.exc' % suffix):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/noun.exc'>
  for line in self.open('%s.exc' % suffix):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/verb.exc'>

@purificant
Member

Proposed fix: #2165

@gmotzespina

This looks like another issue that has been resolved, so it might be good to close it.

@alvations
Contributor

Thanks everyone for raising the issue and @purificant for the fix!
