Gotcha breaks OpenCL #146

Closed
bertwesarg opened this issue Apr 15, 2024 · 17 comments · Fixed by #148

@bertwesarg
Contributor

bertwesarg commented Apr 15, 2024

I have a reproducer showing that valid OpenCL applications break when OpenCL functions are wrapped with GOTCHA.

Compile either with or without -DUSE_GOTCHA:

/* -*- c -*- */

#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdbool.h>
#include <math.h>

#ifdef __APPLE__
#  include <OpenCL/opencl.h>
#else
#  include <CL/cl.h>
#endif

#ifdef USE_GOTCHA

#include <gotcha/gotcha.h>

// We need a place to store the pointer to the function we've wrapped
static gotcha_wrappee_handle_t handle_clGetPlatformIDs;
static gotcha_wrappee_handle_t handle_clGetExtensionFunctionAddress;
static gotcha_wrappee_handle_t handle_clGetPlatformInfo;

static cl_int
doClGetPlatformIDs(cl_uint         numEntries,
                   cl_platform_id* platforms,
                   cl_uint*        numPlatforms)
{
    fprintf(stderr, "E clGetPlatformIDs(%u)\n", numEntries);

    typeof(&doClGetPlatformIDs) origClGetPlatformIDs = gotcha_get_wrappee(handle_clGetPlatformIDs);
    cl_int ret = origClGetPlatformIDs(numEntries, platforms, numPlatforms);

    fprintf(stderr, "E clGetPlatformIDs(%u) = %d\n", numEntries, ret);

    return ret;
}

static void*
doClGetExtensionFunctionAddress( const char* funcName )
{
    fprintf(stderr, "E clGetExtensionFunctionAddress(%s)\n", funcName);

    typeof(&doClGetExtensionFunctionAddress) origClGetExtensionFunctionAddress = gotcha_get_wrappee(handle_clGetExtensionFunctionAddress);
    void* ret = origClGetExtensionFunctionAddress( funcName );

    fprintf(stderr, "L clGetExtensionFunctionAddress(%s) = %p\n", funcName, ret);

    return ret;
}

static cl_int
doClGetPlatformInfo(cl_platform_id   platform,
                    cl_platform_info paramName,
                    size_t           paramValueSize,
                    void*            paramValue,
                    size_t*          paramValueSizeRet)
{
    fprintf(stderr, "E clGetPlatformInfo()\n");

    typeof(&doClGetPlatformInfo) origClGetPlatformInfo = gotcha_get_wrappee(handle_clGetPlatformInfo);
    cl_int ret = origClGetPlatformInfo(platform, paramName, paramValueSize, paramValue, paramValueSizeRet);

    fprintf(stderr, "L clGetPlatformInfo()\n");

    return ret;
}

struct gotcha_binding_t bindings[] = {
    { "clGetPlatformIDs", doClGetPlatformIDs, &handle_clGetPlatformIDs},
    { "clGetExtensionFunctionAddress", doClGetExtensionFunctionAddress, &handle_clGetExtensionFunctionAddress},
    { "clGetPlatformInfo", doClGetPlatformInfo, &handle_clGetPlatformInfo}
};

#endif

int
main(int ac, char *av[])
{
#ifdef USE_GOTCHA
    gotcha_wrap(bindings, 2, "ocl_test");
#endif

    cl_uint platform_count = 0;

    clGetPlatformIDs(0, NULL, &platform_count);

    if (platform_count == 0) {
        printf("No OpenCL platforms found!\n");
        return 127;
    } else {
        cl_platform_id* platforms = calloc(platform_count, sizeof(*platforms));
        clGetPlatformIDs(platform_count, platforms, NULL);

        printf("Platforms:\n");

        for (size_t i = 0; i < platform_count; i++) {
            char platform_name[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(platform_name), platform_name, NULL);
            printf(" %zu: %s\n", i, platform_name);
        }

        free(platforms);
    }

    return 0;
}

When running the non-wrapped version and setting a breakpoint at clGetExtensionFunctionAddress, the call chain looks like this:

#0  0x00007ffff7c64400 in clGetExtensionFunctionAddress () from /opt/rocm-6.0.2/lib/libamdocl64.so
#1  0x00007ffff7f8e532 in ?? () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#2  0x00007ffff7f9058c in ?? () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#3  0x00007ffff7d874df in __pthread_once_slow (once_control=0x7ffff7f940f8, init_routine=0x7ffff7f90420) at pthread_once.c:116
#4  0x00007ffff7f8eb95 in clGetPlatformIDs () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#5  0x000000000020190f in main ()

clGetExtensionFunctionAddress is called from inside libOpenCL.so.1 but will be dispatched to the AMD OpenCL ICD libamdocl64.so.

When doing the same with the wrapped binary:

#0  0x00007ffff7f8fc60 in clGetExtensionFunctionAddress () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#1  0x0000000000201c24 in doClGetExtensionFunctionAddress ()
#2  0x00007ffff7f8e532 in ?? () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#3  0x00007ffff7f9058c in ?? () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#4  0x00007ffff7d764df in __pthread_once_slow (once_control=0x7ffff7f940f8, init_routine=0x7ffff7f90420) at pthread_once.c:116
#5  0x00007ffff7f8eb95 in clGetPlatformIDs () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#6  0x0000000000201bb1 in doClGetPlatformIDs ()
#7  0x0000000000201d3d in main ()

The wrapper calls clGetExtensionFunctionAddress from libOpenCL.so.1, not from libamdocl64.so. Because the OpenCL library is still initializing, clGetExtensionFunctionAddress tries to run the initialization again, and that deadlocks because of the recursive call to pthread_once. Here is the backtrace after continuing:

#0  futex_wait (private=0, expected=1, futex_word=0x7ffff7f940f8) at ../sysdeps/nptl/futex-internal.h:141
#1  futex_wait_simple (private=0, expected=1, futex_word=0x7ffff7f940f8) at ../sysdeps/nptl/futex-internal.h:172
#2  __pthread_once_slow (once_control=0x7ffff7f940f8, init_routine=0x7ffff7f90420) at pthread_once.c:105
#3  0x00007ffff7f8fc7f in clGetExtensionFunctionAddress () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#4  0x0000000000201c24 in doClGetExtensionFunctionAddress ()
#5  0x00007ffff7f8e532 in ?? () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#6  0x00007ffff7f9058c in ?? () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#7  0x00007ffff7d764df in __pthread_once_slow (once_control=0x7ffff7f940f8, init_routine=0x7ffff7f90420) at pthread_once.c:116
#8  0x00007ffff7f8eb95 in clGetPlatformIDs () from /opt/rocm-6.0.2/lib/libOpenCL.so.1
#9  0x0000000000201bb1 in doClGetPlatformIDs ()
#10 0x0000000000201d3d in main ()

Note that the arguments to __pthread_once_slow are the same in frames #2 and #7.
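
The re-entry can be illustrated without OpenCL or GOTCHA. The following is a minimal sketch, not the Khronos loader's code: entry_point, init_routine, and the in_init flag are hypothetical names chosen for illustration. It marks the spot where a second pthread_once call on the same once_control, made from inside the init routine, would block, and it uses a thread-local flag to detect the re-entry instead of actually hanging.

```c
#include <pthread.h>
#include <stdbool.h>

/* One-time initialization state, analogous to the once_control in the backtrace. */
static pthread_once_t once_control = PTHREAD_ONCE_INIT;

/* Hypothetical guard: true while this thread is running init_routine. */
static _Thread_local bool in_init = false;

/* Counts how often the entry point was re-entered during initialization. */
static int reentrant_calls = 0;

static int entry_point(void);

/* Init routine that ends up calling the public entry point again,
 * as happens when a wrapper redirects an internal call back into
 * the dispatcher library. */
static void init_routine(void)
{
    in_init = true;
    entry_point();
    in_init = false;
}

/* Hypothetical public entry point that initializes lazily. */
static int entry_point(void)
{
    if (in_init) {
        /* Without this guard, pthread_once would be called here with the
         * same once_control while init_routine is still running, which is
         * the situation that hangs in the backtrace. */
        reentrant_calls++;
        return -1;
    }
    pthread_once(&once_control, init_routine);
    return 0;
}
```

Calling entry_point() once returns 0 and leaves reentrant_calls at 1, because the nested call is detected and rejected rather than allowed to wait on the in-progress initialization.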

My current workaround is to filter out libOpenCL.so.

@hariharan-devarajan
Member

@bertwesarg A recent bug fix, made to return the correct internal link_map, does a dlopen on symbols intercepted by GOTCHA. I think the RTLD_NOW is causing this issue.

I will try and debug this and find the root cause.

Thoughts?

@bertwesarg
Contributor Author

This first showed up over a year ago and I am only now coming back to it, but I can still reproduce it with 1.0.6 (9bff68a). But yes, libOpenCL.so uses RTLD_NOW when loading the ICD.

@bertwesarg
Contributor Author

By the way, if I call clGetExtensionFunctionAddress("") before gotcha_wrap, the deadlock also goes away, but I consider this a rather fragile workaround.

@hariharan-devarajan
Member

@bertwesarg I am new to OpenCL. I tried the program you shared with GOTCHA enabled.

E clGetPlatformIDs(0)
E clGetExtensionFunctionAddress(clIcdGetPlatformIDsKHR)
L clGetExtensionFunctionAddress(clIcdGetPlatformIDsKHR) = (nil)
E clGetPlatformIDs(0) = -1001
No OpenCL platforms found!

Is this the expected output? I compiled the code against the OpenCL library. Is there anything else I should do?

@bertwesarg
Contributor Author

bertwesarg commented Apr 18, 2024

I tested it with the AMD one, whose libOpenCL.so ICD dispatcher is from the Khronos Group, and the deadlock happens only in their code. I checked with the Intel oneAPI OpenCL and it happens there too:

$ icx -g test-ocl.c -DUSE_GOTCHA -o test-ocl-wrapped -lOpenCL -lgotcha
#0  __pthread_once_slow (once_control=0x7ffff7fc6d78 <initialized>, init_routine=0x7ffff7fc3b90 <khrIcdOsVendorsEnumerate>) at pthread_once.c:68
#1  0x00007ffff7fc0671 in clGetExtensionFunctionAddress () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#2  0x00000000004015e4 in doClGetExtensionFunctionAddress (funcName=0x7ffff7fbb508 "clIcdGetPlatformIDsKHR") at test-ocl.c:48
#3  0x00007ffff7fbf8d8 in khrIcdVendorAdd () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#4  0x00007ffff7fc0304 in khrIcdVendorsEnumerateEnv () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#5  0x00007ffff7fc3b9b in khrIcdOsVendorsEnumerate () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#6  0x00007ffff60504df in __pthread_once_slow (once_control=0x7ffff7fc6d78 <initialized>, init_routine=0x7ffff7fc3b90 <khrIcdOsVendorsEnumerate>) at pthread_once.c:116
#7  0x00007ffff7fc0501 in clGetPlatformIDs () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#8  0x0000000000401571 in doClGetPlatformIDs (numEntries=0, platforms=0x0, numPlatforms=0x7fffffffc64c) at test-ocl.c:35
#9  0x00000000004016fd in main (ac=1, av=0x7fffffffc758) at test-ocl.c:88

The output without GOTCHA is

Platforms:
 0: Intel(R) OpenCL
 1: AMD Accelerated Parallel Processing
 2: Intel(R) FPGA Emulation Platform for OpenCL(TM)

With GOTCHA it hangs after:

E clGetPlatformIDs(0)
E clGetExtensionFunctionAddress(clIcdGetPlatformIDsKHR)

I also tested the libOpenCL.so from Ubuntu (ocl-icd-opencl-dev), but this uses a different libOpenCL.so implementation and does not suffer from the deadlock.

@hariharan-devarajan
Member

@bertwesarg To me, it looks like the function doClGetExtensionFunctionAddress is defined in two libraries, and when you wrap the function, GOTCHA picks the first one based on link order. On load, the first library is probably doing a dlsym on the second library's function during initialization. GOTCHA catches that and calls the first library, which is still loading.

Is the intention to wrap both of these libraries or only one?

If you want to wrap only one of these functions, then specifying the library through a filter makes sense to me.

If you want to wrap both, then we should have two wrappers and again use a filter to select which library each one matches.

@hariharan-devarajan
Member

I think I found the issue. When we call gotcha_wrap, libamdocl64.so is not loaded yet, so wrapping it is a no-op: its symbol table was not updated, but GOTCHA's internal state was.

Now, when OpenCL calls dlsym on the same function, it loops back to the wrapper function, which does not point to libamdocl64.so or to anything at all.

With the current design, GOTCHA should not call the wrapper, as the function was not wrapped correctly.

This is a bug that we need to track. However, if I fix it, clGetExtensionFunctionAddress will not be wrapped until the library is loaded, either by loading OpenCL first or by LD_PRELOADing the loading library.

@bertwesarg
Contributor Author

bertwesarg commented Apr 18, 2024

doClGetExtensionFunctionAddress is only defined in the test-ocl.c. clGetExtensionFunctionAddress is defined in two libraries. And this is also expected, as libOpenCL.so is a dispatcher library.

The application is expected to call only into libOpenCL.so, so yes, setting up a filter that excludes function calls inside libOpenCL.so from wrapping does work (as stated in my original post).

I extended my test to also wrap a non-OpenCL symbol, to see whether it is possible to wrap different sets of symbols, but the filtering seems to be ineffective.

/* -*- c -*- */

#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdbool.h>
#include <math.h>

#include <pthread.h>

#ifdef __APPLE__
#  include <OpenCL/opencl.h>
#else
#  include <CL/cl.h>
#endif

#include <link.h>

#ifdef USE_GOTCHA

#include <gotcha/gotcha.h>

static gotcha_wrappee_handle_t handle_clGetPlatformIDs;
static gotcha_wrappee_handle_t handle_clGetExtensionFunctionAddress;
static gotcha_wrappee_handle_t handle_clGetPlatformInfo;
static gotcha_wrappee_handle_t handle_pthread_once;

static cl_int
doClGetPlatformIDs(cl_uint         numEntries,
                   cl_platform_id* platforms,
                   cl_uint*        numPlatforms)
{
    fprintf(stderr, "E clGetPlatformIDs(%u)\n", numEntries);

    typeof(&doClGetPlatformIDs) origClGetPlatformIDs = gotcha_get_wrappee(handle_clGetPlatformIDs);
    cl_int ret = origClGetPlatformIDs(numEntries, platforms, numPlatforms);

    fprintf(stderr, "E clGetPlatformIDs(%u) = %d\n", numEntries, ret);

    return ret;
}

static void*
doClGetExtensionFunctionAddress( const char* funcName )
{
    fprintf(stderr, "E clGetExtensionFunctionAddress(%s)\n", funcName);

    typeof(&doClGetExtensionFunctionAddress) origClGetExtensionFunctionAddress = gotcha_get_wrappee(handle_clGetExtensionFunctionAddress);
    void* ret = origClGetExtensionFunctionAddress( funcName );

    fprintf(stderr, "L clGetExtensionFunctionAddress(%s) = %p\n", funcName, ret);

    return ret;
}

static cl_int
doClGetPlatformInfo(cl_platform_id   platform,
                    cl_platform_info paramName,
                    size_t           paramValueSize,
                    void*            paramValue,
                    size_t*          paramValueSizeRet)
{
    fprintf(stderr, "E clGetPlatformInfo()\n");

    typeof(&doClGetPlatformInfo) origClGetPlatformInfo = gotcha_get_wrappee(handle_clGetPlatformInfo);
    cl_int ret = origClGetPlatformInfo(platform, paramName, paramValueSize, paramValue, paramValueSizeRet);

    fprintf(stderr, "L clGetPlatformInfo()\n");

    return ret;
}

static struct gotcha_binding_t ocl_bindings[] = {
    { "clGetPlatformIDs", doClGetPlatformIDs, &handle_clGetPlatformIDs},
    { "clGetExtensionFunctionAddress", doClGetExtensionFunctionAddress, &handle_clGetExtensionFunctionAddress},
    { "clGetPlatformInfo", doClGetPlatformInfo, &handle_clGetPlatformInfo}
};

static int
do_pthread_once(pthread_once_t *once_control,
                void (*init_routine) (void))
{
    fprintf(stderr, "E pthread_once()\n");

    typeof(&do_pthread_once) orig_pthread_once = gotcha_get_wrappee(handle_pthread_once);
    int ret = orig_pthread_once(once_control, init_routine);

    fprintf(stderr, "L pthread_once()\n");

    return ret;
    
}

static struct gotcha_binding_t pthread_bindings[] = {
    { "pthread_once", do_pthread_once, &handle_pthread_once}
};

#endif

static int
filter_libs(struct link_map *target)
{
    const char* libs[] = {
        "/libOpenCL.so",
        NULL
    };
    for (const char** lib = libs; *lib; lib++) {
        if (strstr(target->l_name, *lib) != 0)
            return 0;
    }

    return 1;
}

int
main(int ac, char *av[])
{
#ifdef USE_GOTCHA
    gotcha_wrap(pthread_bindings, 1, "ocl_test");

    gotcha_set_library_filter_func(filter_libs);

    gotcha_wrap(ocl_bindings, 2, "ocl_test");
#endif

    cl_uint platform_count = 0;
    clGetPlatformIDs(0, NULL, &platform_count);

    if (platform_count == 0) {
        printf("No OpenCL platforms found!\n");
        return 127;
    } else {
        cl_platform_id* platforms = calloc(platform_count, sizeof(*platforms));
        clGetPlatformIDs(platform_count, platforms, NULL);

        printf("Platforms:\n");

        for (size_t i = 0; i < platform_count; i++) {
            char platform_name[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(platform_name), platform_name, NULL);
            printf(" %zu: %s\n", i, platform_name);
        }

        free(platforms);
    }

    return 0;
}

@bertwesarg
Contributor Author

bertwesarg commented Apr 19, 2024

I now have a standalone reproducer. But it still fails with #148 (c09a9ef).

$ tar xf dispatcher.targz
$ cd dispatcher
$ make GOTCHA_ROOT=<path-to-gotcha>
$ ./main
Ed foo()
Ed dispatch_init()
Ei bar()
Li bar()
Ld dispatch_init() = 23
Ei bar()
Li bar()
Ei foo()
Li foo()
Ld foo()
65
$ ./main-gotcha
Ew foo()
Ed foo()
Ed dispatch_init()
Ed bar()
^C

This Ed bar() needs to be Ei bar() to avoid the deadlock in pthread_once.

@hariharan-devarajan
Member

@bertwesarg I tried the reproducer and added it to the CI as well. It seems to be finishing for me with that branch.

@bertwesarg
Contributor Author

thanks, I will give it a final run with the actual tool

@bertwesarg
Contributor Author

works now also without filtering libOpenCL.so. thanks

@bertwesarg
Contributor Author

Is it an intended side effect that symbols resolved via dlsym are no longer wrapped at all? Before this PR, it was possible to install a wrapper for foo, and callers of dlsym(foo) got the wrapper.

@hariharan-devarajan
Member

The intended behavior, as I understand it, is as follows.

When gotcha_wrap is called, it wraps functions in the symbol tables of all libraries that are currently loaded. In this case, that is the dispatcher library.

If the dispatcher library then opens a new library and does a dlsym on it with the same function name, that function in the new library has not been wrapped by GOTCHA yet. We do have handling that can defer wrapping for cases where the first gotcha_wrap did not find the symbol at all: in that case, the unwrapped functions are checked and wrapped on dlopen. In this case, however, GOTCHA finds the symbol name in the dispatcher library, so it does not count as an unwrapped function.

If you call gotcha_wrap after the dlopen of that new library, it should be wrapped.

One potential caveat I see here: if the same function name exists in two different libraries at the time of gotcha_wrap, which internal function would be called as the wrappee, given that we have just one wrapper? I think the best approach in this case is to use the filter and wrap each function separately. I currently don't see any other way around this.
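
The deferred-wrapping behavior described in this comment can be pictured with a toy model. This is only a sketch of the description above, not GOTCHA's implementation: toy_lib, toy_binding, toy_wrap, and toy_on_dlopen are hypothetical names, and a "library" is reduced to a list of exported symbol names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a loaded library: a NULL-terminated list of exports. */
struct toy_lib {
    const char  *name;
    const char **symbols;
    bool         wrapped;   /* true once the model has wrapped the binding here */
};

/* Toy stand-in for one binding passed to the wrap call. */
struct toy_binding {
    const char *symbol;
    bool        found;      /* true once any library provided the symbol */
};

static bool lib_exports(const struct toy_lib *lib, const char *symbol)
{
    for (const char **s = lib->symbols; *s != NULL; s++) {
        if (strcmp(*s, symbol) == 0)
            return true;
    }
    return false;
}

/* Models the wrap call: scan every library that is loaded right now. */
static void toy_wrap(struct toy_binding *b, struct toy_lib *libs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (lib_exports(&libs[i], b->symbol)) {
            libs[i].wrapped = true;
            b->found = true;
        }
    }
}

/* Models the dlopen hook as described: only bindings whose symbol was
 * never found are retried against the newly loaded library. */
static void toy_on_dlopen(struct toy_binding *b, struct toy_lib *new_lib)
{
    if (b->found)
        return;             /* already resolved elsewhere, so nothing is deferred */
    if (lib_exports(new_lib, b->symbol)) {
        new_lib->wrapped = true;
        b->found = true;
    }
}
```

In this model, wrapping clGetExtensionFunctionAddress while only the dispatcher is loaded marks the binding as found, so a later toy_on_dlopen for a library exporting the same name leaves that library unwrapped. Calling toy_wrap again after the load would cover it.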

@bertwesarg
Contributor Author

I think you have at least one misconception. GOTCHA tries to wrap all currently loaded libraries whenever gotcha_wrap is called or a new library is dlopened. This means that if the filter changes between two wraps, the filtering in effect at the time of the first wrap is reverted. You can see this in the new test I made. It sets a filter before dlopening libimplX.so, which is effective. Then I reset the filter and load libimplY.so, and GOTCHA also wraps the call into libimplX.so.
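
A toy model of that sequence, under the assumption stated above that every wrap pass revisits all loaded libraries with whichever filter is active at that moment. This is a sketch of the observed behavior only, not GOTCHA's code: loaded_lib, toy_filter_fn, current_filter, skip_implx, and toy_rewrap_all are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a loaded library. */
struct loaded_lib {
    const char *name;
    bool        wrapped;
};

/* Filter callback: return true to allow wrapping of this library. */
typedef bool (*toy_filter_fn)(const struct loaded_lib *);

/* Currently active filter; NULL means every library is accepted. */
static toy_filter_fn current_filter = NULL;

/* Example filter that rejects libimplX.so. */
static bool skip_implx(const struct loaded_lib *lib)
{
    return strstr(lib->name, "libimplX.so") == NULL;
}

/* Models one wrap pass (triggered by a wrap call or a dlopen): all
 * libraries loaded so far are visited again, and only the filter that
 * is active now decides. Earlier filter decisions are not remembered. */
static void toy_rewrap_all(struct loaded_lib *libs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (current_filter == NULL || current_filter(&libs[i]))
            libs[i].wrapped = true;
    }
}
```

With skip_implx active while only libimplX.so is loaded, the pass leaves it unwrapped. After resetting current_filter to NULL and running the pass again with libimplY.so loaded, both libraries end up wrapped, which matches the reverted filtering described above.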

That it is not possible to change the result of a dlsym that was done before calling gotcha_wrap is expected. That is not what I did when I found that dlsym no longer works, as described in my previous comment.

@hariharan-devarajan
Member

@bertwesarg I revised the dlopen test we had. I also found a bug in my commit, which I fixed. As you can see in the test, when gotcha_wrap is called, neither libnum.so nor libnum2.so is loaded. Therefore, the return_four and return_six functions are not wrapped.

Once we have loaded the libs they are intercepted correctly.

As for the filter, I understand the first part and agree.

I am not sure I understand this part:
"That it is not possible to change the result of a dlsym that was done before calling gotcha_wrap is expected. That is not what I did when I found that dlsym no longer works, as described in my previous comment."

@hariharan-devarajan
Member

@bertwesarg Can you check whether #148 solves the issue?
