Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failed: self->handle != NULL, file src/threadsupport.c, line 29 #748

Open
citrus-it opened this issue Mar 15, 2022 · 12 comments
Open
Labels

Comments

@citrus-it
Copy link

Hi,

I have upgraded pycurl from 7.44.0 to 7.45.1 and am seeing the above assertion fail when running the testsuite for a large python application that uses pycurl. I am still investigating and trying to write a small reproducer, and I'll attach one once I can, but I'm opening this in case it's immediately obvious to the maintainers or if anyone else is seeing it since upgrading.

What happened?

When running the testsuite with pycurl.verbose, this is what I see:

,---------------------------------------------------------------------
| $   /usr/bin/python3 /tmp/ips.test.12726/fakeroot/pkg -R /tmp/ips.test.12726/0/image0 -D plandesc_validate=1 -D manifest_validate=Always -D smf_cmds_dir=/tmp/ips.test.12726/0/smf_cmds search -o action.name -a '<example_path>'
| *   Trying 127.0.0.1:12001...
| * Connected to localhost (127.0.0.1) port 12001 (#0)
| > GET /versions/0/ HTTP/1.1
| Host: localhost:12001
| User-Agent: pkg/1a6ee32f8 (sunos i86pc; 5.11 omnios-master-57a305f45c; full; pkg)
| Accept: */*
| X-IPkg-UUID: 089741e2-a468-11ec-9b51-820820101cb6
| X-IPkg-Variant: global,full
|
| * Mark bundle as not supporting multiuse
| < HTTP/1.1 200 OK
| < Content-Type: text/plain;charset=utf-8
| < Server: CherryPy/18.6.1
| < Date: Tue, 15 Mar 2022 13:58:57 GMT
| < Cache-Control: must-revalidate, no-transform, max-age=300
| < Expires: Tue, 15 Mar 2022 14:03:57 GMT
| < Content-Length: 171
| <
| * Connection #0 to host localhost left intact
| * Found bundle for host localhost: 0x4f1fa0 [serially]
| * Re-using existing connection! (#0) with host localhost
| * Connected to localhost (127.0.0.1) port 12001 (#0)
| > GET /test/search/1/False_2_None_None_p%3C%3A%3A%3Aexample_path%3E HTTP/1.1
| Host: localhost:12001
| User-Agent: pkg/1a6ee32f8 (sunos i86pc; 5.11 omnios-master-57a305f45c; full; pkg)
| Accept: */*
| X-IPkg-UUID: 089741e2-a468-11ec-9b51-820820101cb6
| X-IPkg-Variant: global,full
|
| * Mark bundle as not supporting multiuse
| < HTTP/1.1 200 OK
| < Content-Type: text/plain;charset=utf-8
| < Server: CherryPy/18.6.1
| < Date: Tue, 15 Mar 2022 13:58:58 GMT
| < Cache-Control: must-revalidate, no-transform, max-age=14400
| < Expires: Tue, 15 Mar 2022 17:58:58 GMT
| < Transfer-Encoding: chunked
| <
| Assertion failed: self->handle != NULL, file src/threadsupport.c, line 29, function pycurl_get_thread_state

The stack trace from the core file is:

status: process terminated by SIGABRT (Abort), pid=12754 uid=0 code=-1
libc panic message: Assertion failed: self->handle != NULL, file src/threadsupport.c, line 29, function pycurl_get_thread_state

fffffc7fffdf3790 libc.so.1`abort+0x58()
fffffc7fffdf3a00 0xfffffc7feee15382()
fffffc7fffdf3a30 pycurl.cpython-310.so`pycurl_get_thread_state+0x187()
fffffc7fffdf3a50 pycurl.cpython-310.so`pycurl_acquire_thread+0x11()
fffffc7fffdf3ab0 pycurl.cpython-310.so`progress_callback+0x46()
fffffc7fffdf3ba0 libcurl.so.4.7.0`Curl_pgrsUpdate+0x3c4()
fffffc7fffdf3bc0 libcurl.so.4.7.0`Curl_pgrsDone+0x1a()
fffffc7fffdf3d10 libcurl.so.4.7.0`multi_done+0x318()
fffffc7fffdf3d40 libcurl.so.4.7.0`curl_multi_remove_handle+0xb1()
fffffc7fffdf3d80 pycurl.cpython-310.so`util_curl_xdecref+0x87()
fffffc7fffdf3db0 pycurl.cpython-310.so`util_curl_close+0x7c()
fffffc7fffdf3dd0 pycurl.cpython-310.so`do_curl_dealloc+0x48()
fffffc7fffdf3e00 libpython3.10.so.1.0`free_keys_object+0xa5()
fffffc7fffdf3e50 libpython3.10.so.1.0`dict_dealloc+0x6f()
fffffc7fffdf3e70 pycurl.cpython-310.so`do_multi_clear+0x9d()
fffffc7fffdf3fd0 libpython3.10.so.1.0`gc_collect_main+0x4ef()
fffffc7fffdf3ff0 libpython3.10.so.1.0`_PyGC_CollectNoFail+0x35()
fffffc7fffdf4080 libpython3.10.so.1.0`finalize_modules+0x29f()
fffffc7fffdf40c0 libpython3.10.so.1.0`Py_FinalizeEx+0x85()
fffffc7fffdf40e0 libpython3.10.so.1.0`Py_Exit+0xd()
fffffc7fffdf4110 libpython3.10.so.1.0`_PyErr_PrintEx.part.0.cold()
fffffc7fffdf4130 libpython3.10.so.1.0`PyErr_PrintEx+0x1e()
fffffc7fffdf41a0 libpython3.10.so.1.0`_PyRun_SimpleFileObject+0x3bd()
fffffc7fffdf41e0 libpython3.10.so.1.0`_PyRun_AnyFileObject+0x8b()
fffffc7fffdf42d0 libpython3.10.so.1.0`Py_RunMain+0x3d6()
fffffc7fffdf4320 libpython3.10.so.1.0`Py_BytesMain+0x3c()
fffffc7fffdf4350 _start_crt+0x87()
fffffc7fffdf4360 _start+0x18()

so it seems to be being triggered by garbage collection.

What is the PycURL version?

'PycURL/7.45.1 libcurl/7.82.0 OpenSSL/3.0.1 zlib/1.2.11 brotli/1.0.9 zstd/1.5.2 nghttp2/1.47.0'

What is your Python version?

Python 3.10.2

What is your operating system and its version?

OmniOS v11 r151041 SunOS bloody 5.11 omnios-master-57a305f45c i86pc i386 i86pc

@swt2c
Copy link
Contributor

swt2c commented Mar 15, 2022

Yeah, a reproducer would be very helpful. If not, you could try bisecting to see which commit introduced the problem.

@fsbs any ideas here?

@citrus-it
Copy link
Author

Yeah, a reproducer would be very helpful. If not, you could try bisecting to see which commit introduced the problem.

I'm having trouble writing something standalone to trigger it, the application is pretty big.
I'll bisect and report back.

@citrus-it
Copy link
Author

citrus-it commented Mar 15, 2022

ddde588

ddde58832187211e68e04f16a08342b155baab51 is the first bad commit
commit ddde58832187211e68e04f16a08342b155baab51
Author: fsbs <fsbs@users.noreply.github.com>
Date:   Tue Oct 12 14:18:23 2021 +0000

    Set easy->multi_stack->state for multi's callbacks

    This ensures multi's callbacks invoked by some easy functions have
    access to thread state.

    Fixes callbacks invoked by:
     * do_curl_pause
     * util_curl_xdecref

@swt2c
Copy link
Contributor

swt2c commented Mar 15, 2022

Thanks. That was my suspicion. Hopefully @fsbs will have a chance to look into this.

@swt2c swt2c added the bug label Mar 16, 2022
@citrus-it
Copy link
Author

Sorry, still no luck with a reproducer, I'll have another stab at it tonight.

@swt2c
Copy link
Contributor

swt2c commented May 7, 2022

@citrus-it did you ever get a reproducer for this?

@citrus-it
Copy link
Author

Unfortunately not, I'll give it another go this week. Our test suite triggers it every time, but it's a large project so hard to isolate.

@citrus-it
Copy link
Author

I have a smaller reproducer now, still using a large part of the local application library so I will keep iterating.

@citrus-it
Copy link
Author

I've been unable to make a standalone reproducer for this, the small-ish script I have still pulls in a lot of modules from the application.

However, I can see where the bug is from reading through the code.

Here's the relevant piece of the stack trace from the core file, most recent frame first (so read it backwards), with some in-line annotations.

fffffc7fffdf0520 pycurl.cpython-310.so`pycurl_get_thread_state+0x187()
    if (self->multi_stack != NULL && self->multi_stack->state != NULL)
    {
        /* inside multi_perform() */
        assert(self->handle != NULL);                          <--- assertion fails

fffffc7fffdf0540 pycurl.cpython-310.so`pycurl_acquire_thread+0x11()
fffffc7fffdf05a0 pycurl.cpython-310.so`progress_callback+0x46()
fffffc7fffdf0880 pycurl.cpython-310.so`util_curl_xdecref+0x87()
    This used to (before ddde588):
        self->multi_stack = NULL;
    before calling:
        curl_multi_remove_handle()
    but it no-longer does, causing pycurl_get_thread_sate() to enter the multi_stack cleanup block, where it did not before.

fffffc7fffdf08b0 pycurl.cpython-310.so`util_curl_close+0x78()
    handle = self->handle;
    self->handle = NULL;
    ...
    util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle);

This (^) is being called by the garbage collector at process exit, see below. I can't make my simple test case call this code path, but the failing application does it every time.

fffffc7fffdf08d0 pycurl.cpython-310.so`do_curl_dealloc+0x45()
fffffc7fffdf0970 pycurl.cpython-310.so`do_multi_clear+0x9d()
fffffc7fffdf0af0 libpython3.10.so.1.0`gc_collect_main+0x9d6()
fffffc7fffdf0b10 libpython3.10.so.1.0`_PyGC_CollectNoFail+0x3e()
fffffc7fffdf0ba0 libpython3.10.so.1.0`finalize_modules+0x4ef()
fffffc7fffdf0be0 libpython3.10.so.1.0`Py_FinalizeEx+0x90()
fffffc7fffdf0c00 libpython3.10.so.1.0`Py_Exit+0x10()

@kulikjak
Copy link

kulikjak commented Jun 2, 2023

Hi, we've hit this as well (apparently in the same application on Oracle Solaris). @citrus-it showed the behavior in the trace above, but here are my notes with some new info from my digging.

In our case, the issue arises in progress_callback function (as can be seen in the trace above) because the application is setting hdl.setopt(pycurl.PROGRESSFUNCTION, ...); when I remove it, the test passes again. That said, I believe that other callbacks are affected as well.

Here is what is happening:
util_curl_close is called and self->handle is set to NULL;util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle) is then called and it gets to the following code block:

    PYCURL_BEGIN_ALLOW_THREADS_EASY
    (void) curl_multi_remove_handle(multi_stack->multi_handle, handle);
    PYCURL_END_ALLOW_THREADS_EASY

at this point, self->handle is NULL, self->multi_stack is not yet NULL and since threads are allowed, progress_callback can be called inside the ALLOW_THREADS_EASY block. This gets all the way to pycurl_get_thread_state and:

    if (self->multi_stack != NULL && self->multi_stack->state != NULL)
    {
        /* inside multi_perform() */
        assert(self->handle != NULL);
        assert(self->multi_stack->multi_handle != NULL);
        assert(self->state == NULL);
        return self->multi_stack->state;
    }

where the condition still passes and the assertion fails.

When I move self->multi_stack = NULL; above PYCURL_BEGIN_ALLOW_THREADS_EASY, it still fails only in the other pycurl_get_thread_state branch because it doesn't change anything about self->handle being NULL.

When I remove PYCURL_BEGIN_ALLOW_THREADS_EASY, the issue is gone as no callbacks are allowed any longer. I believe that unless the handle is still available, util_curl_xdecref should not allow threads if called from util_curl_close (there should be no issue when called from do_curl_clear, though).

@kulikjak
Copy link

kulikjak commented Jun 2, 2023

As for the possible fix, I tried the following three patches:

  • don't allow threads at all (reverting part of ddde588)
--- pycurl-7.45.2/src/easy.c
+++ pycurl-7.45.2/src/easy.c
@@ -469,10 +469,7 @@ util_curl_xdecref(CurlObject *self, int
                 from the multi object's easy_object_dict, but this
                 requires us to have a reference to the multi object
                 which right now we don't. */
-                /* Allow threads because callbacks can be invoked */
-                PYCURL_BEGIN_ALLOW_THREADS_EASY
                 (void) curl_multi_remove_handle(multi_stack->multi_handle, handle);
-                PYCURL_END_ALLOW_THREADS_EASY
             }
             self->multi_stack = NULL;
             Py_DECREF(multi_stack);
  • allow threads only when self->handle is available
--- pycurl-7.45.2/src/easy.c
+++ pycurl-7.45.2/src/easy.c
@@ -470,9 +470,13 @@ util_curl_xdecref(CurlObject *self, int
                 requires us to have a reference to the multi object
                 which right now we don't. */
                 /* Allow threads because callbacks can be invoked */
-                PYCURL_BEGIN_ALLOW_THREADS_EASY
-                (void) curl_multi_remove_handle(multi_stack->multi_handle, handle);
-                PYCURL_END_ALLOW_THREADS_EASY
+                if (self->handle) {
+                    PYCURL_BEGIN_ALLOW_THREADS_EASY
+                    (void) curl_multi_remove_handle(multi_stack->multi_handle, handle);
+                    PYCURL_END_ALLOW_THREADS_EASY
+                } else {
+                    (void) curl_multi_remove_handle(multi_stack->multi_handle, handle);
+                }
             }
             self->multi_stack = NULL;
             Py_DECREF(multi_stack);
  • make sure that self->handle is still available when util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle) is called
--- pycurl-7.45.2/src/easy.c
+++ pycurl-7.45.2/src/easy.c
@@ -567,7 +567,6 @@ util_curl_close(CurlObject *self)
     assert(self != NULL);
     assert(PyObject_IsInstance((PyObject *) self, (PyObject *) p_Curl_Type) == 1);
     handle = self->handle;
-    self->handle = NULL;
     if (handle == NULL) {
         /* Some paranoia assertions just to make sure the object
          * deallocation problem is finally really fixed... */
@@ -584,6 +583,7 @@ util_curl_close(CurlObject *self)
 
     /* Decref multi stuff which uses this handle */
     util_curl_xdecref(self, PYCURL_MEMGROUP_MULTI, handle);
+    self->handle = NULL;
     /* Decref share which uses this handle */
     util_curl_xdecref(self, PYCURL_MEMGROUP_SHARE, handle);

All three seem to have fixed the original issue we saw in the application.

I also ran PycURL test suite and the first two changes caused multi_callback_test.py::MultiCallbackTest::test_easy_close failure. The last one doesn't seem to introduce new issues, but I don't know what other issues it might cause (and it seems like the biggest change out of those three).

All that said, I have only very limited knowledge of PycURL (or cURL), so I might be missing important details.

@p
Copy link
Member

p commented Jun 30, 2023

Thank you very much for your research @citrus-it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants