aiohttp proxy support #259

Open
jpkeith4 opened this issue Sep 13, 2017 · 9 comments

@jpkeith4

jpkeith4 commented Sep 13, 2017

There seems to be no support for proxies when communicating with external storage services such as Google Drive. In the file: waterbutler/core/provider.py for instance I can see aiohttp being used to make a request:

response = await aiohttp.request(method, url, *args, **kwargs)

but aiohttp doesn't pick up proxy settings on its own, so the proxy has to be coded in manually, like:

response = await aiohttp.request(method, url, connector=aiohttp.ProxyConnector(proxy="PROXY_ADDRESS"), *args, **kwargs)

just as an example. This seems to allow an initial connection from behind a proxy, but it fails shortly afterwards, so I'm guessing that either more places in the code also need proxy support or the example above isn't the right way to do it.
The error given on a failed request (when using the ProxyConnector example above) is:

RuntimeError: File descriptor 10 is used by transport <_SelectorSocketTransport fd=10 read=polling write=<idle, bufsize=0>>

Ideally, I suppose the best solution would be to read environment variables such as $HTTP_PROXY and $NO_PROXY to decide whether to apply a proxy. I'm not sure why the aiohttp library isn't already doing this itself, as it's fairly standard.
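
As a rough illustration of the environment-variable idea above, the sketch below decides which proxy (if any) applies to a URL. It uses only the standard library and is not part of WaterButler or aiohttp; the function name `proxy_for` and the example hosts are hypothetical, and the NO_PROXY matching is deliberately simplified (exact host or domain suffix only), since tools differ in how they interpret that variable.

```python
import os
from urllib.parse import urlsplit


def proxy_for(url, environ=None):
    """Return the proxy URL to use for `url`, or None for a direct connection.

    Hypothetical helper: reads <SCHEME>_PROXY and NO_PROXY (upper or lower case)
    from `environ`, which defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # NO_PROXY is a comma-separated list of hosts or domain suffixes.
    no_proxy = environ.get("NO_PROXY") or environ.get("no_proxy") or ""
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return None

    # Pick HTTP_PROXY or HTTPS_PROXY depending on the URL scheme.
    key = parts.scheme.lower() + "_proxy"
    return environ.get(key.upper()) or environ.get(key) or None


if __name__ == "__main__":
    env = {
        "HTTPS_PROXY": "http://proxy.example:3128",
        "NO_PROXY": "localhost,.internal.example",
    }
    print(proxy_for("https://www.googleapis.com/drive/v2/files", env))
    print(proxy_for("https://api.internal.example/status", env))
```

Passing the environment in as a mapping keeps the decision easy to test without touching the real process environment.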

@jpkeith4
Author

It seems to me that it might be caused by creating the ProxyConnector from within "async" functions.
If I move it outside the asynchronous calls and pass the connector in as a parameter, the error occurs less frequently. However, the async calls look fairly deeply nested.
Is there a good place in the code to create the ProxyConnector outside of any async calls?
I will keep looking.

@jpkeith4
Author

I think I have found a solution. The problem was indeed creating the ProxyConnector inside async calls. To work around this, I made a new module that declares the ProxyConnector and imported it into all of the relevant Python modules that use aiohttp.

@felliott
Member

Hey @jpkeith4!

Did you have to modify the aiohttp.request calls for this or were you able to wrap and inject them externally? If the latter, that sounds like an interesting approach to customizing WaterButler. Would you be willing to write up a doc patch explaining how to do this?

Cheers,
@felliott

@jpkeith4
Author

I added a "connector" parameter to all the aiohttp.request calls. I can write up the changes once I have it all worked out. Currently it seems to work as long as you don't use folders inside Google Drive. Files inside folders end up with "%2F" instead of forward slashes in the URI, which causes them not to load. If I manually change it to a forward slash, the file loads.

I wonder whether that has anything to do with using a version of the OSF that's several months old; maybe something changed in a dependency or in the API Google uses.

@felliott
Member

Hey @jpkeith4!

Can you give me an example of where you're running into this? Our googledrive provider definitely has some issues with encoding. We mostly work around this by using the file/folder IDs where we can.

Cheers,
Fitz

@jpkeith4
Author

jpkeith4 commented Sep 25, 2017

[EDIT] Actually now that I think about it, this issue might be more related to OSF than waterbutler, however the proxy issue earlier was waterbutler related [/EDIT]

For example, just trying to open a text file stored in Google Drive. If the text file is not inside any folder but sits at the top level of Google Drive, then OSF loads and renders it properly. However, if it is inside a folder in Google Drive and I click to open it, I get a 404 and the URI shows

/93vtb/files/googledrive/folderName%2FfileName

If I manually change it to

/93vtb/files/googledrive/folderName/fileName

in the address bar then it loads the page and displays the file.
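
To illustrate the difference between the two URIs above: "%2F" is simply the percent-encoded form of "/". The snippet below is only a standard-library demonstration of how the same path can be encoded either way; it does not show where OSF or WaterButler actually produce the encoded form, which the thread leaves open.

```python
from urllib.parse import quote, unquote

path = "folderName/fileName"

# With no "safe" characters, the slash is escaped and the path looks like one segment.
as_one_segment = quote(path, safe="")

# With "/" marked safe (the default for quote), the slash survives as a separator.
as_two_segments = quote(path, safe="/")

print(as_one_segment)
print(as_two_segments)

# Decoding the escaped form recovers the original path.
print(unquote(as_one_segment) == path)
```

Whether a server treats the two forms as the same resource depends on its routing, which would be consistent with the 404 described above.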

@jpkeith4
Author

jpkeith4 commented Sep 29, 2017

Well, for the time being I'll post the specific changes I made, although it still doesn't completely work. Adding a proxy connector inside the async calls seems to cause issues, so the idea was to create the proxy connector outside of the async calls. Since I don't know the code very well, importing a module seemed like the simplest way to do that. So I created the file waterbutler/proxy_workaround.py with the contents:

import aiohttp

HTTP_PROXY = 'http://my.proxy.com:PORT'
NO_PROXY = ['www.no_proxy_domain_1.com', 'www.no_proxy_domain_2.com']

noproxy_connector = aiohttp.TCPConnector(force_close=False)
proxy_connector = aiohttp.ProxyConnector(proxy=HTTP_PROXY, force_close=False)

def get_connector(url):
    global proxy_connector
    global noproxy_connector

    domain = url.replace('https://','').replace('http://','').split('/')[0].split(':')[0]

    if domain in NO_PROXY:
        return noproxy_connector
    else:
        return proxy_connector

Then, in every file that makes an "aiohttp.request" call, I add from waterbutler import proxy_workaround at the top, and in each aiohttp.request call I add the parameter connector=proxy_workaround.get_connector(URL), where URL is the same url already being passed to the request.

This mostly solves the issues and allows storage services to connect. File uploads work and deleting files works, but there is an issue with renaming files. When attempting to rename a file I get the error:

RuntimeError: Task <Task pending coro=<move() running at /code/waterbutler/core/provider.py:232> cb=[_run_until_complete_cb() at /usr/local/lib/python3.5/asyncio/base_events.py:176]> got Future <Future pending> attached to a different loop

Full traceback: https://pastebin.com/wzmRG5QV

@felliott
Member

Hey @jpkeith4,

This is a shot in the dark, but it looks like the *Connector caches the current event loop when created. Long-running requests in WB (usually moves and copies) get shunted off into a celery queue and aren't guaranteed to have the same event loop when they're woken up. You may need to adjust this approach so that the Connector is created lazily, so that it gets the event loop it's run under, rather than the one it was created under.

Cheers,
@felliott
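
The lazy-creation idea in the comment above can be sketched roughly as follows. This is not WaterButler code: `FakeConnector` is a hypothetical stand-in for an aiohttp connector (it only remembers its loop), and `get_for_running_loop` is an illustrative helper name. The sketch also assumes Python 3.7+ for `asyncio.get_running_loop()`; on the Python 3.5 used in this thread, `asyncio.get_event_loop()` called inside the coroutine would play the same role.

```python
import asyncio

# One cached object per event loop. A long-lived worker would also want to
# drop entries for loops that have been closed; that is omitted here.
_per_loop = {}


def get_for_running_loop(factory):
    """Return the object built for the currently running loop, creating it on first use."""
    loop = asyncio.get_running_loop()
    if loop not in _per_loop:
        _per_loop[loop] = factory(loop)
    return _per_loop[loop]


class FakeConnector:
    """Hypothetical stand-in for a connector; it only records the loop it was built for."""

    def __init__(self, loop):
        self.loop = loop


async def use_connector():
    # Created on first use inside the coroutine, so it is bound to whichever
    # loop is actually running this task, not to an import-time loop.
    connector = get_for_running_loop(FakeConnector)
    assert connector.loop is asyncio.get_running_loop()
    return connector


def run_in_fresh_loop():
    """Mimic a worker that runs the same coroutine under a brand-new event loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(use_connector())
    finally:
        loop.close()


if __name__ == "__main__":
    first = run_in_fresh_loop()
    second = run_in_fresh_loop()
    print(first is second)
```

Each loop gets its own object, so nothing created under one loop is ever awaited under another.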

@jpkeith4
Author

jpkeith4 commented Oct 12, 2017

Got it working, your suggestion helped a lot actually, thanks.
So I think the only changes needed to add proxy support are to pass either a TCPConnector or a ProxyConnector, as appropriate, to every aiohttp.request call, and to pass "loop=asyncio.get_event_loop()" into the connector. For example:

response = await aiohttp.request(method, url, *args, connector=proxy_workaround.get_connector(url, asyncio.get_event_loop()),  **kwargs)

Where the "proxy_workaround.py" module is:

import aiohttp

HTTP_PROXY = 'http://my.proxy.com:PORT'
NO_PROXY = ['www.no_proxy_domain_1.com', 'www.no_proxy_domain_2.com']

def get_connector(url, loop):

    domain = url.replace('https://','').replace('http://','').split('/')[0].split(':')[0]

    if domain in NO_PROXY:
        return aiohttp.TCPConnector(force_close=False, loop=loop)
    else:
        return aiohttp.ProxyConnector(proxy=HTTP_PROXY, force_close=False, loop=loop)
