tarfile is not able to extract Enron dataset #310

pg30 · 2020-12-25T15:50:19Z

In lesson 1, after I run startup.py and the dataset finishes downloading, extraction fails with the following error:

IOError: [Errno 22] invalid mode ('wb') or filename: './maildir/blair-l/personnel___promotions/1.'

I think this is because the file names end with a dot.

MedAmr · 2020-12-26T21:44:18Z

Can you post a full screenshot of your issue?

pg30 · 2020-12-26T21:54:51Z

Sure. This is what happens when I run startup.py after the file is downloaded:

checking for nltk
checking for numpy
checking for scipy
checking for sklearn
unzipping Enron dataset (this may take a while)
Traceback (most recent call last):

  File "/media/pranay/PG/PRANAY/udacity courses/Intro to machine learning/ud120-projects/tools/startup.py", line 45, in <module>
    tfile.extractall(".")

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2024, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2065, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2137, in _extract_member
    self.makefile(tarinfo, targetpath)

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2178, in makefile
    with bltn_open(targetpath, "wb") as target:

OSError: [Errno 22] Invalid argument: './maildir/blair-l/personnel___promotions/1.'

pg30 · 2020-12-26T22:04:50Z

As far as I can tell, the problem is the file names ending with a dot (like 1., 2.). When I renamed one file (1. to 1) and extracted it on its own, it extracted successfully, while the others ending with a dot still failed. Renaming every file by hand is not feasible.
I am running Ubuntu 20.04.
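
The manual rename described above could be automated during extraction. The following is only a minimal sketch, not what startup.py does: it assumes the target filesystem rejects names ending in a dot (some Windows-oriented filesystems do, and the traceback shows the repo under /media/, but that is a guess), and it assumes no directory holds both `1` and `1.`. The helper names `strip_trailing_dots` and `extract_without_trailing_dots` are made up for illustration.

```python
import tarfile


def strip_trailing_dots(name):
    """Drop trailing dots from each path component, leaving '.' and '..' alone."""
    cleaned = []
    for part in name.split("/"):
        if part in (".", ".."):
            cleaned.append(part)
        else:
            # Keep the original component if stripping would leave it empty.
            cleaned.append(part.rstrip(".") or part)
    return "/".join(cleaned)


def extract_without_trailing_dots(archive_path, dest="."):
    """Extract every member, renaming entries such as '1.' to '1' on the way out."""
    with tarfile.open(archive_path, "r:*") as tfile:
        for member in tfile:
            member.name = strip_trailing_dots(member.name)
            tfile.extract(member, dest)


# Hypothetical usage, with the archive path taken from the tracebacks in this thread:
# extract_without_trailing_dots("../enron_mail_20150507.tar.gz", ".")
```

Note that any later script that refers to the e-mails by their original names (with the trailing dot) would need the same renaming applied to its paths. Extracting onto a native Linux filesystem instead, if one is available, avoids the rename entirely.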

trsvchn · 2021-01-15T20:29:36Z

Hi @pg30! I've ported the ud120 code to Python 3.8 and Jupyter, reorganized most of the components (e.g. the startup script), and fixed some issues. Feel free to take a look at my fork, trsvchn/ud120-projects-py3-jupyter.

olabod67 · 2021-03-24T05:34:28Z

Hi,
Can someone assist with the problem I'm having running startup.py?
I'm running it on Python 3.8. The error output is below:

checking for nltk
checking for numpy
checking for scipy
checking for sklearn
downloading the Enron dataset (this may take a while)
to check on progress, you can cd up one level, then execute <ls -lthr>
Enron dataset should be last item on the list, along with its current size
download will complete at about 423 MB

AttributeError Traceback (most recent call last)
in
33 import urllib
34 url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tar.gz"
---> 35 urllib.urlretrieve(url, filename="../enron_mail_20150507.tar.gz")
36 print("download complete!")
37

AttributeError: module 'urllib' has no attribute 'urlretrieve'

olabod67 · 2021-03-26T04:49:02Z

Please, I need the community's assistance with this. Can you help?

AkhileshManda · 2021-05-12T07:59:19Z

@olabod67 I am getting the same error. Please let me know if you have found a solution.
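
On the `AttributeError` above: in Python 3, `urlretrieve` lives in the `urllib.request` submodule rather than on the top-level `urllib` package, so the Python 2 call in startup.py fails. A minimal sketch of a Python 3 replacement for that download step follows; the wrapper name `download` is made up for illustration, and the URL and file name in the usage comment are copied from the traceback.

```python
from urllib.request import urlretrieve


def download(url, filename):
    """Fetch url and save it to filename (Python 3 equivalent of urllib.urlretrieve)."""
    urlretrieve(url, filename=filename)
    return filename


# Hypothetical usage, mirroring the call shown in the traceback:
# download("https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tar.gz",
#          "../enron_mail_20150507.tar.gz")
```

This only addresses the download step; other Python 2 idioms in the script may need similar changes.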
