Gotcha breaks OpenCL #146
Comments
@bertwesarg A recent bug fix, made to return the correct internal link_map, does a dlopen on symbols intercepted by gotcha here. I think the RTLD_NOW is causing this issue. I will try to debug this and find the root cause. Thoughts?
btw, if I call |
@bertwesarg I am new to OpenCL. I tried the program you shared with gotcha enabled.
Is this the expected output? I compiled the code with the OpenCL library. Is there anything else I should do?
I tested it with the AMD one, which’s
$ icx -g test-ocl.c -DUSE_GOTCHA -o test-ocl-wrapped -lOpenCL -lgotcha
#0 __pthread_once_slow (once_control=0x7ffff7fc6d78 <initialized>, init_routine=0x7ffff7fc3b90 <khrIcdOsVendorsEnumerate>) at pthread_once.c:68
#1 0x00007ffff7fc0671 in clGetExtensionFunctionAddress () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#2 0x00000000004015e4 in doClGetExtensionFunctionAddress (funcName=0x7ffff7fbb508 "clIcdGetPlatformIDsKHR") at test-ocl.c:48
#3 0x00007ffff7fbf8d8 in khrIcdVendorAdd () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#4 0x00007ffff7fc0304 in khrIcdVendorsEnumerateEnv () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#5 0x00007ffff7fc3b9b in khrIcdOsVendorsEnumerate () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#6 0x00007ffff60504df in __pthread_once_slow (once_control=0x7ffff7fc6d78 <initialized>, init_routine=0x7ffff7fc3b90 <khrIcdOsVendorsEnumerate>) at pthread_once.c:116
#7 0x00007ffff7fc0501 in clGetPlatformIDs () from /opt/intel/oneapi/compiler/2024.1/lib/libOpenCL.so.1
#8 0x0000000000401571 in doClGetPlatformIDs (numEntries=0, platforms=0x0, numPlatforms=0x7fffffffc64c) at test-ocl.c:35
#9 0x00000000004016fd in main (ac=1, av=0x7fffffffc758) at test-ocl.c:88
The output without GOTCHA is
With GOTCHA it hangs after:
I also tested the |
@bertwesarg To me, it looks like the function Is the intention to wrap both of these libraries or only one of them? If you want to wrap only one of these functions, then specifying the library through a filter makes sense to me. If you want to wrap both functions, then we should have two wrappers and again use the filter to select which library to match.
I think I found the issue. When we call the gotcha wrap, the Now, when OpenCL calls dlsym on the same function, it loops back to the wrapper function, which doesn't point to the With the current design, gotcha should not call the wrapper, as it was not wrapped correctly. We need to track this, which is a bug. However, if I fix it, it will not wrap
The application is expected to only call into I extended my test to include wrapping a non-OpenCL symbol, to see if it's possible to wrap different sets of symbols, but the filtering seems to be ineffective.
/* -*- c -*- */
#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdbool.h>
#include <math.h>
#include <pthread.h>
#ifdef __APPLE__
# include <OpenCL/opencl.h>
#else
# include <CL/cl.h>
#endif
#include <link.h>
#ifdef USE_GOTCHA
#include <gotcha/gotcha.h>
static gotcha_wrappee_handle_t handle_clGetPlatformIDs;
static gotcha_wrappee_handle_t handle_clGetExtensionFunctionAddress;
static gotcha_wrappee_handle_t handle_clGetPlatformInfo;
static gotcha_wrappee_handle_t handle_pthread_once;
/* Wrapper for clGetPlatformIDs: logs entry (E) and leave (L) around the wrappee. */
static cl_int
doClGetPlatformIDs(cl_uint         numEntries,
                   cl_platform_id* platforms,
                   cl_uint*        numPlatforms)
{
    fprintf(stderr, "E clGetPlatformIDs(%u)\n", numEntries);
    typeof(&doClGetPlatformIDs) origClGetPlatformIDs = gotcha_get_wrappee(handle_clGetPlatformIDs);
    cl_int ret = origClGetPlatformIDs(numEntries, platforms, numPlatforms);
    fprintf(stderr, "L clGetPlatformIDs(%u) = %d\n", numEntries, ret);
    return ret;
}
/* Wrapper for clGetExtensionFunctionAddress. */
static void*
doClGetExtensionFunctionAddress(const char* funcName)
{
    fprintf(stderr, "E clGetExtensionFunctionAddress(%s)\n", funcName);
    typeof(&doClGetExtensionFunctionAddress) origClGetExtensionFunctionAddress = gotcha_get_wrappee(handle_clGetExtensionFunctionAddress);
    void* ret = origClGetExtensionFunctionAddress(funcName);
    fprintf(stderr, "L clGetExtensionFunctionAddress(%s) = %p\n", funcName, ret);
    return ret;
}
/* Wrapper for clGetPlatformInfo. */
static cl_int
doClGetPlatformInfo(cl_platform_id   platform,
                    cl_platform_info paramName,
                    size_t           paramValueSize,
                    void*            paramValue,
                    size_t*          paramValueSizeRet)
{
    fprintf(stderr, "E clGetPlatformInfo()\n");
    typeof(&doClGetPlatformInfo) origClGetPlatformInfo = gotcha_get_wrappee(handle_clGetPlatformInfo);
    cl_int ret = origClGetPlatformInfo(platform, paramName, paramValueSize, paramValue, paramValueSizeRet);
    fprintf(stderr, "L clGetPlatformInfo()\n");
    return ret;
}
static struct gotcha_binding_t ocl_bindings[] = {
    { "clGetPlatformIDs", doClGetPlatformIDs, &handle_clGetPlatformIDs },
    { "clGetExtensionFunctionAddress", doClGetExtensionFunctionAddress, &handle_clGetExtensionFunctionAddress },
    { "clGetPlatformInfo", doClGetPlatformInfo, &handle_clGetPlatformInfo }
};
/* Wrapper for a non-OpenCL symbol, used to test wrapping a second set of symbols. */
static int
do_pthread_once(pthread_once_t* once_control,
                void (*init_routine)(void))
{
    fprintf(stderr, "E pthread_once()\n");
    typeof(&do_pthread_once) orig_pthread_once = gotcha_get_wrappee(handle_pthread_once);
    int ret = orig_pthread_once(once_control, init_routine);
    fprintf(stderr, "L pthread_once()\n");
    return ret;
}
static struct gotcha_binding_t pthread_bindings[] = {
    { "pthread_once", do_pthread_once, &handle_pthread_once }
};
#endif
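/* Library filter: returns 0 for libraries whose path contains one of the
 * listed names (here libOpenCL.so), and 1 for every other library. */
static int
filter_libs(struct link_map* target)
{
    const char* libs[] = {
        "/libOpenCL.so",
        NULL
    };
    for (const char** lib = libs; *lib; lib++) {
        if (strstr(target->l_name, *lib) != NULL)
            return 0;
    }
    return 1;
}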
int
main(int ac, char *av[])
{
#ifdef USE_GOTCHA
    /* pthread_once is wrapped before the filter is installed; the OpenCL
     * bindings are wrapped afterwards, with filter_libs in effect. */
    gotcha_wrap(pthread_bindings, 1, "ocl_test");
    gotcha_set_library_filter_func(filter_libs);
    /* Pass the full length of ocl_bindings so every entry is wrapped. */
    gotcha_wrap(ocl_bindings, sizeof(ocl_bindings) / sizeof(ocl_bindings[0]), "ocl_test");
#endif
    cl_uint platform_count = 0;
    clGetPlatformIDs(0, NULL, &platform_count);
    if (platform_count == 0) {
        printf("No OpenCL platforms found!\n");
        return 127;
    } else {
        cl_platform_id* platforms = calloc(platform_count, sizeof(*platforms));
        clGetPlatformIDs(platform_count, platforms, NULL);
        printf("Platforms:\n");
        for (size_t i = 0; i < platform_count; i++) {
            char platform_name[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(platform_name), platform_name, NULL);
            printf(" %zu: %s\n", i, platform_name);
        }
        free(platforms);
    }
    return 0;
}
I now have a standalone reproducer, but it still fails with #148 (c09a9ef).
$ tar xf dispatcher.targz
$ cd dispatcher
$ make GOTCHA_ROOT=<path-to-gotcha>
$ ./main
Ed foo()
Ed dispatch_init()
Ei bar()
Li bar()
Ld dispatch_init() = 23
Ei bar()
Li bar()
Ei foo()
Li foo()
Ld foo()
65
$ ./main-gotcha
Ew foo()
Ed foo()
Ed dispatch_init()
Ed bar()
^C
this
@bertwesarg I tried the reproducer and added it to the CI as well. It seems to finish for me with that branch.
thanks, I will give it a final run with the actual tool |
works now also without filtering |
Is it an intended side effect that symbols resolved via
The intended behavior, as I understand it, is as follows. When Now, if the dispatcher library opens a new library and calls dlsym on it with the same function name, that function in the new library has not yet been wrapped by gotcha. We do have handling that can defer wrapping for cases where the first gotcha_wrap didn't find the symbol itself. In that case, the unwrapped functions are checked and wrapped on dlopen. However, in this case, gotcha finds the symbol name in the dispatcher library, and thus it's not an unwrapped function. if you called One potential caveat I see here is that if the same function name exists in two different libraries (at the time of gotcha_wrap), which internal function would be called as the wrappee, given that we have just one wrapper? I think in this case the best way is to use the filter and wrap each function differently. I currently don't see any other way around this.
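The "use the filter and wrap each function differently" suggestion could look roughly like the sketch below. It is only an illustration, not GOTCHA code: `base_name`, `wanted_prefix`, and `only_wanted_lib` are hypothetical names, and it assumes the same convention as `filter_libs` in the test program in this thread, where returning 0 excludes a library and a nonzero value keeps it.

```c
#include <link.h>    /* struct link_map (glibc) */
#include <stddef.h>
#include <string.h>

/* Return the file-name part of a path, i.e. the text after the last '/'. */
static const char *
base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical choice: the vendor library named in this issue. */
static const char *wanted_prefix = "libamdocl64.so";

/* Hypothetical filter: returns 1 only for the library whose file name
 * starts with wanted_prefix, and 0 for everything else. */
static int
only_wanted_lib(struct link_map *target)
{
    if (target == NULL || target->l_name == NULL)
        return 0;
    return strncmp(base_name(target->l_name), wanted_prefix,
                   strlen(wanted_prefix)) == 0;
}
```

Under that assumption, installing `only_wanted_lib` with `gotcha_set_library_filter_func` before the corresponding `gotcha_wrap` call would restrict that set of bindings to one library, and a second filter and wrapper could then be used for the other library.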
I think you have at least one misconception. GOTCHA tries to wrap all currently loaded libraries when That it's not possible to change the result of an
@bertwesarg I revised the dlopen test we had. I also found a bug in my commit, which I fixed. As you can see in the test, when gotcha_wrap is called, neither libnum.so nor libnum2.so is loaded. Therefore, the return_four and return_six functions are not wrapped. Once we have loaded the libs, they are intercepted correctly. For the filter part, I understand and agree with the first part. I am not sure I understand this part.
@bertwesarg Can you check if #148 solves the issue?
I have a reproducer that breaks valid OpenCL applications when wrapping OpenCL functions with GOTCHA
Compile either with or without -DUSE_GOTCHA:
When running the non-wrapped version and setting a breakpoint at clGetExtensionFunctionAddress, the call chain looks like this:
clGetExtensionFunctionAddress is called from inside libOpenCL.so.1 but will be dispatched to the AMD OpenCL ICD libamdocl64.so.
When doing the same with the wrapped binary:
the wrapper calls clGetExtensionFunctionAddress from libOpenCL.so.1 and not from libamdocl64.so. But because the OpenCL library is still in its initialization, clGetExtensionFunctionAddress tries to init itself again, but that deadlocks because of the recursive call to pthread_once. Here is the backtrace when continuing:
You will notice that the arguments to __pthread_once_slow are the same in level #2 and #7.
My current workaround is to filter libOpenCL.so.
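The re-entrancy described above can be modelled with a small single-threaded toy. This is a sketch under stated assumptions, not glibc's `pthread_once` and not the OpenCL loader: `toy_once`, `icd_once`, and `icd_init` are hypothetical names. Where the real call in the backtrace waits for the running initialization to finish (and so never returns on the same thread), the toy returns -1 so that the nested call can be observed.

```c
#include <stdbool.h>

/* Toy stand-in for pthread_once_t; NOT glibc's implementation. */
enum toy_once_state { TOY_NOT_STARTED, TOY_IN_PROGRESS, TOY_DONE };

struct toy_once {
    enum toy_once_state state;
    bool reentered;  /* set when a call arrives while init is still running */
};

/* Returns 0 once initialization has completed, or -1 if called again while
 * the init routine is still running on the same control object. */
static int
toy_once(struct toy_once *once, void (*init_routine)(void))
{
    switch (once->state) {
    case TOY_DONE:
        return 0;
    case TOY_IN_PROGRESS:
        once->reentered = true;
        return -1;
    case TOY_NOT_STARTED:
    default:
        once->state = TOY_IN_PROGRESS;
        init_routine();
        once->state = TOY_DONE;
        return 0;
    }
}

static struct toy_once icd_once = { TOY_NOT_STARTED, false };
static int nested_result;

/* Models an init routine that calls back into an entry point guarded by
 * the same once-control, as happens when the wrapper redirects the call
 * to the loader library instead of the vendor library. */
static void
icd_init(void)
{
    nested_result = toy_once(&icd_once, icd_init);
}
```

Calling `toy_once(&icd_once, icd_init)` once from the outside triggers exactly one nested call with the same control object and the same init routine, which mirrors the identical `__pthread_once_slow` arguments noted in the backtrace.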