panlex_lite installation via nltk.download() appears to fail #1338
Comments
@grayben – would you please install the current version of NLTK and report if you still have this issue? |
@stevenbird sorry for my delay in replying - you know how uni assignments can be! |
@grayben How did you install NLTK? Do you have an error when downloading a single corpus, e.g. |
|
Additional information: a number of my classmates have reported what appears to be the same problem, though I can't comment on their configurations or exactly what they did to encounter the issue. |
@grayben could you run the following lines of code and see whether you get the same output:
>>> import zipfile
>>> plzip = '/Users/beng/nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474] |
On the command line outside python, what is the output for the following?:
|
Your code -> my output:
>>> import zipfile
>>> plzip = ' /Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1009, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: ' /Users/beng//nltk_data/corpora/panlex_lite.zip'
I then changed
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1026, in __init__
self._RealGetContents()
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file |
|
This suggests that the file gets corrupted while downloading (possibly due to a broken internet connection):
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1026, in __init__
self._RealGetContents()
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Go to |
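The check used in the comments above can be wrapped in a small helper that tells a missing or truncated file apart from an archive whose members fail their CRC check. This is a minimal sketch that uses only the standard `zipfile` module. The function name `inspect_zip` is hypothetical and is not part of NLTK.

```python
import zipfile


def inspect_zip(path):
    """Return (ok, detail) for the archive at `path`.

    `inspect_zip` is a hypothetical helper name, not an NLTK function.
    """
    # is_zipfile() returns False for a missing file and for a file that
    # lacks a valid zip end-of-central-directory record.
    if not zipfile.is_zipfile(path):
        return False, "missing or not a zip file (possibly a truncated download)"
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first name whose
        # CRC or header is bad, or None if all members are fine.
        bad_member = zf.testzip()
        if bad_member is not None:
            return False, "bad CRC or header in member %r" % bad_member
        # Same list as the one-liner quoted in this thread.
        return True, [info.CRC for info in zf.infolist()]
```

Note that `testzip()` decompresses every member, so on an archive of several gigabytes it can take a while.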
I did the following (three times):
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /Users/beng/nltk_data...
[nltk_data] Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
for msg in self._download_package(info, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
for msg in _unzip_iter(filepath, zipdir, verbose=False):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
outfile.write(contents)
OSError: [Errno 22] Invalid argument
>>>
However, please also note the following command input/output:
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474] |
Can you also do i.e.:
$ rm /Users/beng//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]
I couldn't reproduce your
alvas@ubi:~/nltk_data/corpora$ ls panlex_
panlex_lite.zip panlex_swadesh.zip
alvas@ubi:~/nltk_data/corpora$ cd
alvas@ubi:~$ python
Python 2.7.11 (default, Dec 15 2015, 16:46:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nltk.download('panlex_lite')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'nltk' is not defined
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /home/alvas/nltk_data...
[nltk_data] Package panlex_lite is already up-to-date!
True
>>> exit()
alvas@ubi:~$ python3
Python 3.5.1 (default, Dec 18 2015, 00:00:00)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /home/alvas/nltk_data...
[nltk_data] Package panlex_lite is already up-to-date!
True
BTW, if you're not going to use |
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /Users/beng/nltk_data...
[nltk_data] Package panlex_lite is already up-to-date!
True
Furthermore, through the downloader GUI, downloading "all" finally succeeds, with all fields marked "installed". |
Great! So there's no Enjoy playing around NLTK! Tell your friends/classmates to do the same too:
$ rm /Users/beng//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474] |
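For anyone who prefers to do the cleanup from Python rather than the shell, the two `rm` steps above can be sketched as follows. The helper name `remove_stale_package` and its parameters are hypothetical. The directory layout (`<data_dir>/corpora/<id>.zip` plus an extracted `<data_dir>/corpora/<id>` directory) is taken from the paths quoted in this thread.

```python
import os
import shutil


def remove_stale_package(data_dir, package_id, subdir="corpora"):
    """Delete a possibly corrupted download so it can be fetched again.

    Removes <data_dir>/<subdir>/<package_id>.zip and the extracted
    <data_dir>/<subdir>/<package_id> directory if they exist, and
    returns the list of paths that were actually removed.
    (`remove_stale_package` is a hypothetical helper, not an NLTK API.)
    """
    base = os.path.join(data_dir, subdir)
    removed = []

    zip_path = os.path.join(base, package_id + ".zip")
    if os.path.isfile(zip_path):
        os.remove(zip_path)  # equivalent of: rm .../panlex_lite.zip
        removed.append(zip_path)

    dir_path = os.path.join(base, package_id)
    if os.path.isdir(dir_path):
        shutil.rmtree(dir_path)  # equivalent of: rm -rf .../panlex_lite
        removed.append(dir_path)

    return removed
```

Calling `remove_stale_package(os.path.expanduser("~/nltk_data"), "panlex_lite")` and then `nltk.download("panlex_lite")` mirrors the shell steps quoted above.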
Thanks! |
I get the exact same problem with the latest NLTK 3.2.1, both on Ubuntu 16.04 (where it crashes my whole OS) and on OS X, where I get the same errors as the OP. I'm surprised that this issue has been closed as if there were nothing wrong. When I try the workaround, it fails after this step, because the downloader tries to extract the archive automatically right after downloading it:
Thanks |
@houmie what is your output for:
$ rm /Users/houmie//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/houmie//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/houmie//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474] |
This is not fixed - it's happening for python 2.7, 3.4.3, and 3.5.1. The panlex_lite download hangs for quite a while, and then unzipping freezes the GUI and/or causes the OSError. |
I hit the same issue on my MacBook Pro (OS X El Capitan, Anaconda 1.4.0 + Python 3.5.2). I tried NLTK 3.2.1 installed with "conda install nltk" and the GitHub master branch installed with "sudo python3 setup.py install". The interesting part is that I never got the CRC list [0, 448887900, 85839474]; I always got [0, 448887900, 84607019], even after downloading panlex_lite.zip more than 5 times. Any hint or clue? |
Unfortunately they refuse to acknowledge that the problem even exists. I reported this in May 2016 and there is still no acknowledgement of the problem. I just tried it again via the GUI download and still get this error message shown in the console:
This is a massive pain for me, as I need to go through the code and delete all the references to PanLex in order to get the packages working. |
Hi, same here, hopefully if enough people report it then it's going to get fixed at some point ... |
okay there, here's what I've done
d = nltk.downloader.Downloader()
d._packages.pop('panlex_lite')
d.download()
# error message
d._packages.pop('panlex_lite')
/usr/local/lib/python3.5/site-packages/nltk/downloader.py in info(self, id)
876 if id in self._packages: return self._packages[id]
877 if id in self._collections: return self._collections[id]
--> 878 raise ValueError('Package %r not found in index' % id)
879
880 def xmlinfo(self, id):
I guess, we could add something like
But, as for me, the easiest way looks like this:
Aaaaaaand.... |
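An alternative to editing the private `_packages` dict is to ask the downloader for its package list and download everything except the problematic id, one package at a time. The sketch below is not the approach from the comment above, whose code did not survive in this thread. The names `ids_except` and `download_all_except` are hypothetical; `Downloader.packages()` and `Downloader.download()` are public methods of `nltk.downloader.Downloader`.

```python
def ids_except(package_ids, skip=("panlex_lite",)):
    """Return package ids in their original order, minus the skipped ones.

    (`ids_except` is a hypothetical helper, not an NLTK API.)
    """
    skipped = set(skip)
    return [pid for pid in package_ids if pid not in skipped]


def download_all_except(skip=("panlex_lite",)):
    """Download every package in the index except those in `skip`.

    Assumes NLTK is installed and the index server is reachable.
    Returns a dict mapping each package id to download()'s boolean result.
    """
    # Imported here so that ids_except() can be used without NLTK installed.
    from nltk.downloader import Downloader

    downloader = Downloader()
    wanted = ids_except([pkg.id for pkg in downloader.packages()], skip)
    return {pid: downloader.download(pid) for pid in wanted}
```

Downloading per package id avoids the "all" collection, which still references `panlex_lite` as a child even after the id is popped from `_packages`; that is presumably why the `ValueError` above was raised.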
I'd like to understand that you mention that
that means
=>
|
@demidovakatya, |
Downloading panlex_lite should work fine now |
Again not working. |
I don't have bandwidth to test this. Our nltk_data page points at the April 1 version, which was not touched when the May 1 version was added recently. @kamholz: would you mind doing the following to check if it still works please? |
Sorry this keeps happening. It's hard to debug, because I often can't reproduce the reported errors. In this case, when I run
<package author="David Kamholz" checksum="3156099b9acb623725d63c727fd8591d" id="panlex_lite" license="CC0 1.0 Universal" name="PanLex Lite Corpus" size="2357864277" subdir="corpora" unzip="1" unzipped_size="5993562112" url="https://db.panlex.org/panlex_lite-20170401.zip" webpage="http://panlex.org/" />
I have also updated the URL above (but that shouldn't have made a difference for this issue, since the old one redirects), and the sizes. |
Thanks for this @kamholz . I've pushed a corrected index file using these checksums. |
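To check a downloaded archive against the index entry quoted above, the file size and checksum can be compared by hand. The sketch below assumes that the `checksum` attribute is an MD5 hex digest (its 32 hex characters suggest this, but the thread does not say so explicitly). The function names `md5_of_file` and `matches_index` are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_index(path, expected_md5, expected_size):
    """Compare a local file with the `checksum` and `size` attributes
    of an index entry. The cheap size check runs first."""
    if os.path.getsize(path) != int(expected_size):
        return False
    return md5_of_file(path) == expected_md5.lower()
```

For the entry above, the call would be `matches_index(plzip, "3156099b9acb623725d63c727fd8591d", 2357864277)`, with `plzip` pointing at the local `panlex_lite.zip`. Those values apply to the 20170401 archive only.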
I tried:
python -m nltk.downloader -u https://gist.githubusercontent.com/demidovakatya/61dab385d74065ae825c80496a197980/raw/c6ff7fbf44265c7f8c9e961e3e1158cd812d6af1/index.xml all
and other URLs, but all of them return an HTTP 403 Forbidden error. Any suggestions, or a new URL that will work? |
@SokhnaVor this is caused by #1787 |
@alvations thanks! I see: |
Platform: Python 3.5 on Mac OS X 10.11.2
Steps to reproduce:
Symptoms:
Partial console write:
[nltk_data] | Downloading package panlex_lite to
[nltk_data] | /Users/beng/nltk_data...
[nltk_data] | Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 543, in incr_download
for msg in self.incr_download(info.children, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 529, in incr_download
for msg in self._download_list(info_or_id, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 572, in _download_list
for msg in self.incr_download(item, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
for msg in self._download_package(info, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
for msg in _unzip_iter(filepath, zipdir, verbose=False):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
outfile.write(contents)
OSError: [Errno 22] Invalid argument
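The traceback ends in a single `outfile.write(contents)` call, which means the whole archive member is held in memory and written in one go. One possible explanation, which this thread does not confirm, is that some Python builds on OS X fail with `Errno 22` when a single write exceeds roughly 2 GB, and the unzipped size listed for panlex_lite is far above that. Under that assumption, a manual extraction that streams each member in fixed-size chunks can serve as a workaround. The sketch below uses only the standard library and is not NLTK's own `_unzip_iter`. The name `extract_in_chunks` is hypothetical.

```python
import os
import shutil
import zipfile


def extract_in_chunks(zip_path, dest_dir, chunk_size=16 * 1024 * 1024):
    """Extract every member of `zip_path` into `dest_dir`, copying at
    most `chunk_size` bytes per write instead of one huge write.

    Hypothetical helper for trusted archives only: member names are
    joined to `dest_dir` without sanitising them.
    Returns the list of files written.
    """
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            target = os.path.join(dest_dir, info.filename)
            if info.filename.endswith("/"):
                # Directory entry: create it and move on.
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # zf.open() returns a file-like object that decompresses
            # lazily, so copyfileobj() streams chunk by chunk.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk_size)
            written.append(target)
    return written
```

With the paths from this report, the call would be `extract_in_chunks('/Users/beng/nltk_data/corpora/panlex_lite.zip', '/Users/beng/nltk_data/corpora')`.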