Curl.perform() blocks SIGINT during the start of a SOCKS transfer #706

Open
fsbs opened this issue Oct 7, 2021 · 10 comments

@fsbs
Contributor

fsbs commented Oct 7, 2021

Curl.perform() and CurlMulti.perform() can't be interrupted during the DNS stage when a "socks5h://" proxy is set, i.e. when the domain name is resolved remotely.

At the same time it is possible to interrupt in the following cases:

  • torsocks + pycurl-without-proxy
  • curl --proxy "socks5h://..."

To see the difference, run the following examples and hit Ctrl+C immediately after starting each one.

Example 1: pycurl.PROXY [not interruptible]

import pycurl
import random

# use new circuit each time to prevent caching by tor
PROXY = f'socks5h://pycurl:{random.randint(0, 1024)}@127.0.0.1:9050'

# onion.debian.org
URL = 'http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/'

c = pycurl.Curl()
c.setopt(pycurl.VERBOSE, 1)
c.setopt(pycurl.PROXY, PROXY)  # or pycurl.PRE_PROXY
c.setopt(pycurl.URL, URL)

def easy():
    c.perform()

def multi():
    m = pycurl.CurlMulti()
    m.add_handle(c)
    while m.perform()[1]:
        m.select(1.0)

#easy()
multi()

Example 2: torsocks + pycurl [interruptible]

Same as above, but without the c.setopt(pycurl.PROXY, PROXY) line. Run with torsocks:

torsocks --isolate python3 example2.py

This is interruptible, probably because pycurl isn't communicating with the SOCKS proxy on its own; that is delegated to the torsocks wrapper without pycurl knowing anything about it. So the above issue is probably located in how pycurl handles proxies.

Example 3: curl --proxy [interruptible]

#!/bin/bash
PROXY="socks5h://curl:$(($RANDOM % 1024))@127.0.0.1:9050"

URL='http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/'

curl --proxy "$PROXY" --verbose "$URL"

This is the same as the first example, but with curl instead of pycurl. It is interruptible, as expected, so the issue doesn't go as deep as the libcurl level: it is pycurl-specific.

Versions

  • pycurl:
    PycURL/7.44.1 libcurl/7.68.0 GnuTLS/3.6.13 zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
  • curl:
    curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
    Release-Date: 2020-01-08
    Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
    Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
  • python:
    Python 3.8.10
@swt2c
Contributor

swt2c commented Oct 8, 2021

Does it make any difference if you set the NOSIGNAL option?

@fsbs
Contributor Author

fsbs commented Oct 10, 2021

@swt2c No, that has no effect.

Same issue also with socket_action.

Might not be specifically related to remotely resolving domains - it's just that this is part of the first stage of a SOCKS connection.

As far as I can see pycurl delegates SOCKS logic to libcurl anyway, but this issue could be caused by pycurl's GIL logic instead. It's possible libcurl is invoking some additional callbacks when you use a proxy, since additional steps are necessary to set up such a connection. Such callbacks are where problems with GIL logic can occur, since you get nested calls leading back into pycurl at a different spot, where wrong assumptions can be made about the GIL state.

Note also that libcurl invokes callbacks mostly at the very start of a transfer, which is when this issue occurs.

@fsbs
Contributor Author

fsbs commented Oct 12, 2021

My bad, there was a regression regarding SOCKS proxies in libcurl itself, already fixed in 7.71.0: curl/curl#5710 (comment)

Those fixes take care of multi.perform and multi.socket_action. (I'll note they also fixed socket_action being a blocking call during SOCKS kickstart, so SOCKS transfers now play nice with async event loops.)

However easy.perform() still blocks SIGINT when a SOCKS proxy is used. Not important for me personally, but I'll leave the issue open and change the title accordingly.
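
For scripts that need Ctrl+C to work with a SOCKS proxy, one option consistent with the above is to drive the easy handle through the multi interface, as multi() does in example 1. A minimal sketch, assuming a libcurl build that includes the SOCKS fix; perform_interruptibly and poll_interval are hypothetical names, not part of pycurl. It relies only on the perform() and select() methods of CurlMulti already used in example 1:

```python
def perform_interruptibly(multi, poll_interval=1.0):
    """Drive a CurlMulti-like object until no transfers remain active.

    Each multi.perform() call returns quickly, so control comes back to
    the Python interpreter between iterations, where a pending
    KeyboardInterrupt can be raised.

    Returns the number of perform() calls made (illustrative only).
    """
    iterations = 0
    while True:
        # pycurl's CurlMulti.perform() returns (return code, active handles)
        _, active = multi.perform()
        iterations += 1
        if not active:
            return iterations
        # Wait for socket activity, for at most poll_interval seconds
        multi.select(poll_interval)
```

With pycurl this would be called as m = pycurl.CurlMulti(); m.add_handle(c); perform_interruptibly(m). A shorter poll_interval bounds how long Ctrl+C can go unnoticed, at the cost of more wakeups.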

@fsbs fsbs changed the title from "Remote domain name resolution (via socks proxy) blocks signals" to "Curl.perform() blocks SIGINT during the start of a SOCKS transfer" Oct 12, 2021
@swt2c
Contributor

swt2c commented Oct 12, 2021

> However easy.perform() still blocks SIGINT when a SOCKS proxy is used. Not important for me personally, but I'll leave the issue open and change the title accordingly.

Are you sure that it is pycurl doing that and not libcurl?

@fsbs
Contributor Author

fsbs commented Oct 12, 2021

@swt2c It doesn't seem to be libcurl, because example 3 doesn't block SIGINT. CLI curl uses curl_easy_perform by default, unless it's run with --parallel. But it probably sets some additional easy options compared to example 1, so I can't say for certain.

When I have some spare time I'll rewrite example 1 in libcurl.

@fsbs
Contributor Author

fsbs commented Nov 3, 2021

Here's a libcurl example and its pycurl equivalent, setting a SOCKS proxy via CURLOPT_PROXY.

The libcurl one terminates immediately on SIGINT. The pycurl one raises KeyboardInterrupt only after perform() returns.

libcurl

#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
    printf("%s\n", curl_version());
    CURL *curl = curl_easy_init();

    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
    curl_easy_setopt(curl, CURLOPT_URL, "http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/");
    curl_easy_setopt(curl, CURLOPT_PROXY, "socks5h://127.0.0.1:9050");

    /* doesn't block SIGINT */
    curl_easy_perform(curl);

    curl_easy_cleanup(curl);
    return 0;
}

pycurl

import pycurl

print(pycurl.version)
curl = pycurl.Curl()

curl.setopt(pycurl.VERBOSE, 1)
curl.setopt(pycurl.URL, 'http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/')
curl.setopt(pycurl.PROXY, 'socks5h://127.0.0.1:9050')

# blocks SIGINT
curl.perform()

curl.close()

I've seen this happen only when a SOCKS proxy is set via pycurl.

Other means of proxifying the same Python script don't have this issue, for example removing the pycurl.PROXY line and using the torsocks wrapper instead: torsocks python3 example.py.

I've also tested with different SSL libraries (openssl, gnutls, nss) when building pycurl and the above libcurl example, and it made no difference.
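
A possible explanation for the difference above, offered as an assumption rather than something confirmed in this thread: CPython installs its own SIGINT handler, which only sets a flag, and KeyboardInterrupt is raised once the main thread gets back to executing Python bytecode. A C program with no handler gets the OS default action, which terminates the process at once. If immediate termination is acceptable, the default action can be restored around the blocking call. A minimal sketch; default_sigint is a hypothetical helper name, and it has not been tested against the SOCKS setup described here:

```python
import signal
from contextlib import contextmanager


@contextmanager
def default_sigint():
    """Temporarily restore the OS default action for SIGINT.

    Must be used from the main thread, because signal.signal() only
    works there.
    """
    previous = signal.signal(signal.SIGINT, signal.SIG_DFL)
    try:
        yield
    finally:
        # signal.signal() returns None if the old handler was not
        # installed from Python; in that case there is nothing to restore.
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```

Usage would look like: with default_sigint(): curl.perform(). The trade-off is that Ctrl+C then kills the process outright: no KeyboardInterrupt is raised, and finally blocks and atexit handlers do not run.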

@p
Member

p commented Jan 11, 2022

If the wait is inside libcurl, then I can suggest experimenting with the NOSIGNAL option and trying a non-blocking DNS resolver (c-ares/threaded?).

@fsbs
Contributor Author

fsbs commented Jan 20, 2022

NOSIGNAL and a non-blocking resolver make no difference on my end. I'm double-checking the presence of the async resolver with:

print('ASYNCHDNS:', pycurl.version_info()[4] & pycurl.VERSION_ASYNCHDNS)
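
The expression above prints the raw masked value, which is 0 when the feature is absent and the flag's own value otherwise. A small sketch that turns the same check into a boolean; has_feature is a hypothetical helper, not a pycurl function:

```python
def has_feature(features, flag):
    """Return True if every bit of `flag` is set in the `features` bitmask."""
    return features & flag == flag
```

With pycurl it could be called as has_feature(pycurl.version_info()[4], pycurl.VERSION_ASYNCHDNS).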

I also tried removing whatever setopts pycurl does internally as well as the GIL code in do_curl_perform() (BEGIN/END_ALLOW_THREADS), and SIGINT still doesn't interrupt curl_easy_perform().

I can't see any other spot that could cause this issue. Could someone else test this with a SOCKS proxy?

@bagder
Member

bagder commented Jan 31, 2022

I suggest trying a modern libcurl version where the SOCKS connect procedure has been remade to be totally non-blocking.

@fsbs
Contributor Author

fsbs commented Jan 31, 2022

@bagder
The issue affects pycurl only, not libcurl. What I described in #706 (comment) is still the case in the latest libcurl (7.81.0) and latest pycurl (7.44.1) release: interrupting curl_easy_perform() works in libcurl but not in pycurl. The pycurl build I tested with is based on the latest libcurl release.

4 participants