Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential solution to SSL certificates issue on MacOS #320

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

peircej
Copy link

@peircej peircej commented Mar 12, 2018

On mac the certificates aren't found properly by urlopen()
Can be resolved by certifi.where() as long as that's available. Currently this code isn't providing an error if not (nor added to the requirements)

On mac the certificates aren't found properly by `urlopen()`
Can be resolved by `certifi.where()` as long as that's available. Currently this code isn't providing an error if not (nor added to the requirements)
@peircej
Copy link
Author

peircej commented Mar 12, 2018

fixes #318

@coveralls
Copy link

coveralls commented Mar 12, 2018

Coverage Status

Coverage decreased (-0.04%) to 91.023% when pulling 94eb84d on peircej:master into 5c8c440 on imageio:master.

@almarklein
Copy link
Member

almarklein commented Mar 13, 2018

I don't feel comfortable with Mac users installing imageio and then still seeing a warning every time that imageio gets imported, unless they install another package.

I'd say we either make certifi a dependency, or display the warning only when urlopen is actually used (and at most once).

@jni here's a chance to help by weighing in on this decision :D

@almarklein
Copy link
Member

For reference, urlopen is used:

  • When the user uses the imageio.plugins.ffmpeg.download() function.
  • When the user reads an image from http, e.g. imageio.imread('http://github.com/favicon.ico')
  • When the user reads an imagio standard image that was not packaged or downloaded earlier, e.g. imageio.imread('imageio:clock.png')

@peircej
Copy link
Author

peircej commented Mar 13, 2018

Yes, I understand that. Actually, this isn't quite working anyway (for a bizarre reason that I don't understand)...

I tested the method by monkey patching your urllib (which is what I'll probably do to get around this in the psychopy package that I write) and that worked fine:

from imageio.core import util
import certifi

def urlopen(*args, **kwargs):
    """ Compatibility function for the urlopen function. Raises an
    RuntimeError if urlopen could not be imported (which can occur in
    frozen applications.
    """
    try:
        import urllib2
        from urllib2 import urlopen
    except ImportError:
        try:
            from urllib.request import urlopen, URLError  # Py3k
        except ImportError:
            raise RuntimeError('Could not import urlopen.')
    if sys.platform=='darwin':
        import certifi
        r = urlopen(cafile=certifi.where(), *args, **kwargs)
    else:
        r = urlopen(*args, **kwargs)
    return r
    
import imageio
util.urlopen = urlopen
imageio.plugins.ffmpeg.download()

BUT when I then copy and paste that exact same code into imageio.core.util (as in the pull request) it no longer works!! I have no idea what that's about

@peircej
Copy link
Author

peircej commented Mar 13, 2018

OK, I did have to resort to using requests package to get this done reliably. The code below replaces _fetch_file() instead of altering url_open. Obviously it's a much bigger change than I was aiming for before, so I haven't done the whole job and made the PR.

Just to say though that switching to use the requests package would help in various ways because:

  • it becomes Py3/Py2 compatible without you having to jump through extra hoops
  • it's more robust then the multiple built-in options
  • it handles streaming chunk iteration for you (no need for your additional functions to do that)

So, it would be a fair bit of refactoring, but I think you'd get something more robust by having the right tool rather than writing requests code yourself! :-)

For my own purposes I'll use a monkey patch, so I don't mind either way!

def _fetch_file(url, file_name, print_destination=True, chunk_size=8192):
    """Load requested file, downloading it if needed or requested

    Parameters
    ----------
    url: string
        The url of file to be downloaded.
    file_name: string
        Name, along with the path, of where downloaded file will be saved.
    print_destination: bool, optional
        If true, destination of where file was saved will be printed after
        download finishes.
    resume: bool, optional
        If true, try to resume partially downloaded files.
    """
    # Adapted from NISL:
    # https://github.com/nisl/tutorial/blob/master/nisl/datasets.py
    
    print('Imageio: %r was not found on your computer; '
          'downloading it now.' % os.path.basename(file_name))
    
    temp_file_name = file_name + ".part"
    local_file = None
    initial_size = 0
    errors = []
    with requests.get(url, timeout=5.0, stream=True) as remote_file:
        file_size = int(remote_file.headers['Content-Length'].strip())
        size_str = _sizeof_fmt(file_size)
        print('Try %i. Download from %s (%s)' % (tries+1, url, size_str))
        #create progress bar
        progress = StdoutProgressIndicator('Downloading')
        progress.start('', 'bytes', file_size)
        with open(temp_file_name, 'wb') as fd:
            for chunk in remote_file.iter_content(chunk_size=chunk_size):
                fd.write(chunk)
                progress.increase_progress(len(chunk))
    
    shutil.move(temp_file_name, file_name)
    if print_destination is True:
        sys.stdout.write('File saved as %s.\n' % file_name)
    
    progress.finish('Done')

@almarklein
Copy link
Member

Would it be possible to use requests when available (preferably in util.urlopen()) similar to how you optionally used certifi? If we then go with a warning on osx only when urlopen is used and requests is not available, I think we should be fine.

@peircej
Copy link
Author

peircej commented Mar 13, 2018

I did it in _fetch_file because of the file streaming aspect (so you still get the progress bar). I don't think there's any way for it to be a simple drop-in replacement for urlopen() though as the response object it returns has different attribs. The output from urlopen has a read() method and the requests response object has contents or iter_contents() depending whether you're streaming

@almarklein
Copy link
Member

Ah, I see. Can we talk about that original idea of using certifi for a bit?

Actually, this isn't quite working anyway (for a bizarre reason that I don't understand)...

Do you have more on that?

OK, I did have to resort to using requests package to get this done reliably

Can you expand on that? Could it be related to urlopen on Py2k not having a cafile argument?

(I'm planning on doing a release of imageio this week. With a bit of luck we can get a fix for this in ...)

@peircej
Copy link
Author

peircej commented Mar 14, 2018

Actually, this isn't quite working anyway (for a bizarre reason that I don't understand)...

Do you have more on that?

Not sure what else I can tell you. As I say, when I used a monkey-patch to replace your urlopen function the SSL errors went away. But when I moved the same certifi.where() code into your own function (as in the pull request) the SSL errors returned! I don't understand it. Best guess is that something else had subsequently changed the certificates or the value returned by certifi.where(). But it's beyond my pay grade ;-)

OK, I did have to resort to using requests package to get this done reliably

Can you expand on that? Could it be related to urlopen on Py2k not having a cafile argument?

No, the cafile arg did exist in Py>2.7.9. I'm just saying that it seemed not to be very consistent (sometimes it worked and sometimes not and I couldn't work out why). The error wasn't about an incorrect call or anything; just that the certificates still weren't being used as expected so the SSL errors returned.

@peircej
Copy link
Author

peircej commented Mar 14, 2018

By the way, I realised I had copy/pasted an error in the requests solution posted above (the file copy step at the end was inside the write-chunk loop by mistake). That's now fixed

@almarklein
Copy link
Member

Thanks for the update!

@jni
Copy link
Contributor

jni commented Mar 15, 2018

Thanks for the ping @almarklein!

I'm over my head here, but from a usability perspective, I'd certainly like the ability to pass quiet=True to silence all the printing and progress bars. In terms of the warnings, I agree that they should only happen when a web request is made, never at import time.

@peircej so are you saying that the certifi solution:

  • works on Python 3+
  • sporadically works on Python 2?

Personally I'm ok with this state of affairs. 😈 😛

@almarklein
Copy link
Member

so are you saying that the certifi solution: [...] works on Python 3+ [...] sporadically works on Python 2?

No, that was just a guess of mine. Apparently, it just works intermittently ... I am hesitant to a solution that adds a requirement on requests, and the certifi-only solution seems not to work reliably. Since @peircej noted that he is fine with a monkey patch (at least for now), let's hold this off and see if other people run into similar issues. Or maybe the certification thing will be fixed in Python itself.

@scivision
Copy link
Contributor

I would respectfully start by adding an osx test to Travis-CI using Miniconda and Python 2.7, to see if @peircej issue is replicated. It could be a Python 2.7 point release issue, where they were backporting security fixes.

I.e. if the latest Python 2.7.x release doesn't have this issue on Mac OSX, maybe we respectfully ask Mac OS Python 2.7 users to update to the latest 2.7.x release, as we don't want to workaround things already fixed in latest CPython 2.7.x

@almarklein
Copy link
Member

This wont be longer necessary for ffmpeg. Though it might still apply to the downloading of standard images I suppose?

@EnyMan EnyMan mentioned this pull request Jan 22, 2020
@almarklein
Copy link
Member

almarklein commented Feb 18, 2020

More ppl are running into this, e.g. #500. I think I'm ready to add requests as a dependency if that resolves things. Anyone willing to make a PR, or revive this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants