tarfile is not able to extract Enron dataset #310

Open
pg30 opened this issue Dec 25, 2020 · 7 comments

pg30 commented Dec 25, 2020

In lesson 1, running startup.py downloads the dataset and then fails with the following error while extracting it:

IOError: [Errno 22] invalid mode ('wb') or filename: './maildir/blair-l/personnel___promotions/1.'

I think this is because the file names end with a dot.
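
One way to test that hypothesis is to check whether the directory being extracted into accepts a file name ending with a dot at all. The sketch below assumes the failure comes from the target filesystem rather than from the archive; `accepts_trailing_dot` and the probe file name are hypothetical names used only for illustration.

```python
import os


def accepts_trailing_dot(directory="."):
    """Return True if a file whose name ends with '.' can be created in `directory`.

    Hypothetical helper: it creates a small probe file, checks that the name
    was stored unchanged, and removes the probe again.
    """
    probe_name = "trailing_dot_probe."
    probe_path = os.path.join(directory, probe_name)
    try:
        with open(probe_path, "wb"):
            pass
    except OSError:
        # The filesystem (or a missing directory) refused the name.
        return False
    # Some systems silently drop the trailing dot instead of raising.
    kept_name = probe_name in os.listdir(directory)
    os.remove(probe_path)
    return kept_name
```

If this returns False for the project directory but True for, say, the home directory, extracting the dataset on the other filesystem is probably the simplest workaround.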

MedAmr commented Dec 26, 2020

Can you post a full screenshot of the issue?

pg30 commented Dec 26, 2020

Sure. This is what happens when I run startup.py after the file is downloaded:

checking for nltk
checking for numpy
checking for scipy
checking for sklearn
unzipping Enron dataset (this may take a while)
Traceback (most recent call last):

  File "/media/pranay/PG/PRANAY/udacity courses/Intro to machine learning/ud120-projects/tools/startup.py", line 45, in <module>
    tfile.extractall(".")

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2024, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2065, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2137, in _extract_member
    self.makefile(tarinfo, targetpath)

  File "/home/pranay/anaconda3/lib/python3.8/tarfile.py", line 2178, in makefile
    with bltn_open(targetpath, "wb") as target:

OSError: [Errno 22] Invalid argument: './maildir/blair-l/personnel___promotions/1.'

pg30 commented Dec 26, 2020

As far as I can tell, the problem is the file names ending with a dot (like 1., 2.): when I renamed one member from 1. to 1 and extracted that single file, it extracted successfully, while the other files ending with a dot still failed. Renaming every file by hand is not feasible.
I am running Ubuntu 20.04.
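
The manual rename described here can be automated during extraction. The following is a minimal sketch, not the project's startup.py: it assumes the target filesystem rejects names ending with a dot (as Windows-compatible filesystems such as FAT or NTFS mounts may), and the helper names `strip_trailing_dots` and `extract_without_trailing_dots` are hypothetical.

```python
import tarfile


def strip_trailing_dots(name):
    """Drop trailing dots from each path component, e.g. 'a/b/1.' -> 'a/b/1'."""
    parts = []
    for part in name.split("/"):
        if part in ("", ".", ".."):
            # Leave empty and special components untouched.
            parts.append(part)
        else:
            # Fall back to '_' if a component consisted only of dots.
            parts.append(part.rstrip(".") or "_")
    return "/".join(parts)


def extract_without_trailing_dots(archive_path, dest="."):
    """Extract every member, renaming those whose names end with a dot."""
    # Newer Python versions support extraction filters; use the 'data'
    # filter when it is available and fall back to the old behaviour otherwise.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            member.name = strip_trailing_dots(member.name)
            tar.extract(member, dest, **kwargs)
```

Note that this changes the names on disk, so any later code that refers to paths such as `maildir/.../1.` would need the same adjustment, and two members that differ only by a trailing dot would collide.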

trsvchn commented Jan 15, 2021

Hi @pg30! I've ported this ud120 code to Python 3.8 and Jupyter, reorganized most of the components (e.g. the startup script), and fixed some issues. Feel free to take a look at my fork, trsvchn/ud120-projects-py3-jupyter.

@olabod67

Hi,
Can someone help with a problem I'm having running startup.py?
I'm running it on Python 3.8; the error message is below:

checking for nltk
checking for numpy
checking for scipy
checking for sklearn
downloading the Enron dataset (this may take a while)
to check on progress, you can cd up one level, then execute <ls -lthr>
Enron dataset should be last item on the list, along with its current size
download will complete at about 423 MB

AttributeError Traceback (most recent call last)
in
33 import urllib
34 url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tar.gz"
---> 35 urllib.urlretrieve(url, filename="../enron_mail_20150507.tar.gz")
36 print("download complete!")
37

AttributeError: module 'urllib' has no attribute 'urlretrieve'
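
This AttributeError is expected on Python 3, where `urlretrieve` lives in `urllib.request` rather than directly in `urllib`. A minimal sketch of a download step that works on both Python 2 and 3 follows; the URL and file name are copied from the traceback above, and `download_enron` is a hypothetical helper name, not part of the original startup.py.

```python
try:
    # Python 3: urlretrieve lives in urllib.request
    from urllib.request import urlretrieve
except ImportError:
    # Python 2: urlretrieve is a top-level function of urllib
    from urllib import urlretrieve

# Values taken from the traceback in this thread.
ENRON_URL = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tar.gz"
ENRON_FILE = "../enron_mail_20150507.tar.gz"


def download_enron(url=ENRON_URL, filename=ENRON_FILE):
    """Download the archive at `url` to `filename` and return the local path."""
    local_path, _headers = urlretrieve(url, filename=filename)
    return local_path
```

Calling `download_enron()` with no arguments would fetch the full archive (about 423 MB according to the script's own message), so it is left uncalled here.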

@olabod67

Please, I need the community's assistance with this. Can anyone help?

@AkhileshManda

@olabod67 I am getting the same error. Please let me know if you have found a solution.
