Pycurl ignores SIGINT when HEADERFUNCTION is used #413

Open
lorien opened this issue Nov 6, 2016 · 8 comments

Comments

@lorien
Contributor

lorien commented Nov 6, 2016

Minimal Python 3 code to reproduce the problem:

import pycurl
from io import BytesIO


while True:
    def debug(type_, data):
        pass

    body = BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, 'http://yandex.ru/robots.txt')
    curl.setopt(pycurl.WRITEDATA, body)
    curl.setopt(pycurl.DEBUGFUNCTION, debug)
    curl.setopt(pycurl.VERBOSE, 1)
    curl.perform()
    curl.close()
    text = body.getvalue().decode('utf-8')
    assert 'Sitemap: https://yandex.ru/support/sitemap.xml' in text

When I run it and press Ctrl+C, the script does not stop. I have to keep pressing Ctrl+C for a few seconds until one of the SIGINT signals happens to be processed correctly. Example of script output:

$ python test.py
^CTraceback (most recent call last):
  File "test.py", line 6, in debug
    def debug(type_, data):
KeyboardInterrupt

...

^C^C^C^CTraceback (most recent call last):
  File "test.py", line 6, in debug
    def debug(type_, data):
KeyboardInterrupt
^CTraceback (most recent call last):
  File "test.py", line 16, in <module>
    curl.close()
KeyboardInterrupt
@p
Member

p commented Nov 22, 2016

Thanks for the repro code. I suppose the debugfunction callback in pycurl needs to check for Python exceptions prior to returning.

The catch here is that libcurl does not allow the debug callback to fail, so I imagine pycurl would need to store a failure flag and fail the overarching perform. This potentially causes its own issues: for example, the perform might be for a POST/PUT/DELETE that completely succeeded, yet pycurl would report failure because the debug callback failed.

This delayed failure may be even more complicated/less feasible for async calls (multi interface).
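
To make the "store the failure flag and fail the overarching perform" idea concrete, here is a toy model in pure Python. It is not pycurl or libcurl code: the `Transfer` class, its `_emit_debug` method and the hard-coded debug chunks are all hypothetical stand-ins. It only shows the shape of the trade-off, assuming the debug callback has no way to report an error to its caller.

```python
class Transfer:
    """Hypothetical stand-in for a transfer whose debug callback cannot fail."""

    def __init__(self, debug_callback):
        self.debug_callback = debug_callback
        self.pending_error = None  # first exception raised by the callback

    def _emit_debug(self, kind, data):
        # The caller of a debug callback has no error channel, so the
        # exception is stored here instead of being propagated.
        try:
            self.debug_callback(kind, data)
        except BaseException as exc:  # BaseException covers KeyboardInterrupt
            if self.pending_error is None:
                self.pending_error = exc

    def perform(self):
        # Made-up debug events; a real transfer would produce these itself.
        for chunk in (b"> GET / HTTP/1.1", b"< HTTP/1.1 200 OK"):
            self._emit_debug(0, chunk)
        # The transfer itself has finished at this point. Only now is the
        # stored failure reported, which is the drawback described above.
        if self.pending_error is not None:
            error, self.pending_error = self.pending_error, None
            raise error
```

Note that in this model the second debug event is still delivered after the first one failed, so a request that fully succeeded on the wire would still surface as an exception from `perform()`.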

@lorien
Contributor Author

lorien commented Nov 22, 2016

Oleg, is there any other way to get the HTTP headers of the request?

@p
Member

p commented Nov 25, 2016

If you are referring to the headers that libcurl sends by default, I do not recall coming across an API facility to retrieve those.

@lorien
Contributor Author

lorien commented Nov 25, 2016

Yeah, the only way I found is to use the VERBOSE flag and a debug callback, which receives various debug info including the outgoing headers.

@p
Member

p commented Dec 2, 2016

Sent this upstream, curl/curl#1151. The situation is rather messy to resolve in pycurl alone.

@lorien
Contributor Author

lorien commented Feb 2, 2017

OK, let's forget about DEBUGFUNCTION and VERBOSE. These things are not commonly used.

The same happens with HEADERFUNCTION :)

import pycurl
from io import BytesIO


events = {
    'sigint': 0,
    'pycurl.error': 0,
    'ok': 0,
}

while True:
    def header_handler(data):
        return None
    
    try:
        body = BytesIO()
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, 'http://yandex.ru/robots.txt')
        curl.setopt(pycurl.WRITEDATA, body)
        curl.setopt(pycurl.HEADERFUNCTION, header_handler)
        try:
            curl.perform()
        except pycurl.error:
            events['pycurl.error'] += 1
            print('Got pycurl.error instead of KeyboardInterrupt')
        else:
            events['ok'] += 1
            text = body.getvalue().decode('utf-8')
            assert 'Sitemap: https://yandex.ru/support/sitemap.xml' in text
        print(', '.join('%s: %d' % x for x in events.items()))
    except KeyboardInterrupt:
        events['sigint'] += 1
        # ignore KeyboardInterrupt exception silently
    finally:
        curl.close()

Start the script, then start pressing Ctrl+C. You'll see the pycurl.error counter increasing. On my computer I rarely get a KeyboardInterrupt when I press Ctrl+C; most of the time it is a pycurl.error.

In practice this pycurl behaviour leads to various problems. For example, I can't stop the execution of the Grab tests because most of my Ctrl+C signals are caught inside HEADERFUNCTION and converted into pycurl.error, which is not fatal for the unittest runner.

The same is true for WRITEFUNCTION.
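
One application-side way around this, sketched here as an assumption rather than anything pycurl provides, is to stop Python from raising KeyboardInterrupt inside the callbacks at all. A temporary SIGINT handler only records the signal, and the exception is raised once the block has finished. The `deferred_sigint` name is hypothetical, and `signal.signal` can only be called from the main thread.

```python
import signal
from contextlib import contextmanager


@contextmanager
def deferred_sigint():
    """Record SIGINT during the block, raise KeyboardInterrupt afterwards."""
    state = {"hit": False}

    def remember(signum, frame):
        # Runs instead of the default handler, so nothing is raised
        # inside whatever Python callback happens to be executing.
        state["hit"] = True

    previous = signal.signal(signal.SIGINT, remember)
    try:
        yield state
    finally:
        # previous is None when the old handler was not installed from Python.
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
        if state["hit"]:
            # Replaces any error raised by the block with the interrupt.
            raise KeyboardInterrupt
```

Usage would presumably look like `with deferred_sigint(): curl.perform()`. The cost of this design is that the transfer runs to completion (or to its own timeout) before the interrupt is seen.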

@p
Member

p commented Feb 3, 2017

Yep, agree with you there.

@lorien
Contributor Author

lorien commented Feb 4, 2017

I found a workaround:
lorien/grab@f26fd30
I collect the stderr output while the curl.perform() method runs. If I see "KeyboardInterrupt" in the collected output, it is a sign that pycurl got SIGINT and converted it to pycurl.error.
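
A minimal sketch of that idea follows. It is not a copy of the linked commit. The helper name `perform_watching_stderr` and its `perform` / `error_type` parameters are hypothetical; with pycurl they would presumably be `curl.perform` and `pycurl.error`. It also assumes the callback traceback is written through Python's `sys.stderr`.

```python
import contextlib
import io
import sys


def perform_watching_stderr(perform, error_type=Exception):
    """Run perform(); re-raise error_type as KeyboardInterrupt if stderr shows one."""
    captured = io.StringIO()
    try:
        # Temporarily point sys.stderr at an in-memory buffer.
        with contextlib.redirect_stderr(captured):
            perform()
    except error_type:
        if "KeyboardInterrupt" in captured.getvalue():
            # The printed traceback says the callback was interrupted.
            raise KeyboardInterrupt from None
        raise
    finally:
        # Replay the captured text so no diagnostics are lost.
        sys.stderr.write(captured.getvalue())
```

Matching on the text "KeyboardInterrupt" is a heuristic: any other stderr output containing that word during the call would trigger it too.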

@lorien lorien changed the title Combination of DEBUGFUNCTION and VERBOSE ignores SIGINT Pycurl ignores SIGINT when DEBUGFUNCTION/HEADERFUNCTION/etc are used Feb 8, 2017
@lorien lorien changed the title Pycurl ignores SIGINT when DEBUGFUNCTION/HEADERFUNCTION/etc are used Pycurl ignores SIGINT when HEADERFUNCTION is used Feb 8, 2017