Intel OpenMP breaks Gotcha #70

Open
DavidPoliakoff opened this issue Jun 6, 2018 · 3 comments

Comments

@DavidPoliakoff
Contributor

During the release process we discussed this issue with @daboehme; it was first reported in #40.

The problem is that Intel's runtimes use several mechanisms that break Gotcha: dlsym with RTLD_NEXT, and even (I believe) GOT rewriting. Fixing this could take multiple steps:

Step 1) Make Gotcha work better alongside dlsym and RTLD_NEXT.
Step 2) (If the problem still exists) Detect when Intel OpenMP libraries are present and adapt to them. Perhaps, if Intel OpenMP is loaded, we wait until we see it do its own rewrite before rewriting GOTs ourselves.
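
For readers unfamiliar with the dlsym/RTLD_NEXT pattern mentioned above, here is a minimal sketch of it. It is not taken from Intel's runtime or from Gotcha; `counting_strlen` and `wrapped_calls` are hypothetical names chosen for illustration. The lookup goes through the dynamic linker's symbol search order rather than through the caller's GOT entry, so a tool that redirects calls by rewriting GOT entries has no influence on the address dlsym returns.

```c
#define _GNU_SOURCE /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Function-pointer type matching strlen's signature. */
typedef size_t (*strlen_fn)(const char *);

/* Hypothetical counter: how many calls went through the wrapper. */
size_t wrapped_calls = 0;

/*
 * Hypothetical wrapper: counts the call, then forwards to the next
 * definition of strlen found after this object in the dynamic
 * linker's search order (normally the one in libc).
 */
size_t counting_strlen(const char *s)
{
    static strlen_fn next_strlen = NULL;

    if (next_strlen == NULL) {
        /* Resolved by the dynamic linker; no GOT entry is consulted. */
        next_strlen = (strlen_fn)dlsym(RTLD_NEXT, "strlen");
        assert(next_strlen != NULL); /* sketch only: abort if lookup fails */
    }

    wrapped_calls++;
    return next_strlen(s);
}
```

Calling `counting_strlen("gotcha")` from a program's `main` forwards to libc's `strlen` and increments `wrapped_calls`. On glibc older than 2.34 the program has to be linked with `-ldl`.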

Closer to the time we fix this, I might ping some people at Intel. This seems like the kind of bug we could repeatedly cause for one another as we each update our code, and I'd like to avoid that.

@mplegendre
Member

@DavidPoliakoff -- I just went back to this, and failed to reproduce it. Do you still have a reproducer for this issue?

@jgalarowicz

jgalarowicz commented Jun 10, 2020

We are attempting to use gotcha for our memory wrappers and have encountered a problem that shows up with the Intel 16.0 compilers but not with the Intel 20 compilers. When the dgemm application is compiled with the older Intel compiler, we see an abort in gotcha:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./dgemm -size 5000 -iterations 6 -threads 8'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaab9f44fb in raise (sig=sig@entry=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:36
36      return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM (THREAD_SELF, tid),
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-39.el7.x86_64
(gdb) where
#0  0x00002aaaab9f44fb in raise (sig=sig@entry=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:36
#1  0x00002aaaab7cf6fe in monitor_signal_handler (sig=11, info=<optimized out>, context=0x7fffffff99c0) at signal.c:242
#2  <signal handler called>
#3  0x00002aaaaf543002 in ?? ()
#4  0x00002aaaab5c3005 in free_wrapper (ptr=0x637550) at /tmp/jgalaro/spack-stage/spack-stage-survey-develop-5hxg7azphjr266ym7ynxrhbnroxair73/spack-src/collector/mem/mem_gotcha.c:116
#5  0x00002aaab0cb6ded in destroy_hashtable (table=table@entry=0x7fffffffa060)
    at /tmp/jgalaro/spack-stage/spack-stage-gotcha-master-mrowqjodoyow7fzbklljl5if5qlparqh/spack-src/src/hash.c:123
#6  0x00002aaab0cb4bad in gotcha_wrap (user_bindings=user_bindings@entry=0x2aaaab7c40c0 <wrap_actions>, num_actions=num_actions@entry=6, tool_name=<optimized out>,
    tool_name@entry=0x2aaaab5c30f1 "mem_survey") at /tmp/jgalaro/spack-stage/spack-stage-gotcha-master-mrowqjodoyow7fzbklljl5if5qlparqh/spack-src/src/gotcha.c:336
#7  0x00002aaaab5c30e8 in init_mem_wrappers () at /tmp/jgalaro/spack-stage/spack-stage-survey-develop-5hxg7azphjr266ym7ynxrhbnroxair73/spack-src/collector/mem/mem_gotcha.c:71
#8  0x00002aaaaacd5d84 in monitor_init_process (argc=argc@entry=0x2aaaab9dc3e8 <monitor_argc>, argv=<optimized out>, data=data@entry=0x0)
    at /tmp/jgalaro/spack-stage/spack-stage-survey-develop-5hxg7azphjr266ym7ynxrhbnroxair73/spack-src/collector/Monitor.c:717
#9  0x00002aaaab7c98b4 in monitor_begin_process_fcn (user_data=user_data@entry=0x0, is_fork=is_fork@entry=0) at main.c:285
#10 0x00002aaaab7c9b4d in monitor_main (argc=argc@entry=7, argv=argv@entry=0x7fffffffa2b8, envp=0x7fffffffa2f8) at main.c:505
#11 0x00002aaaafdbe545 in __libc_start_main (main=0x2aaaab7c9aa8 <monitor_main>, argc=7, argv=0x7fffffffa2b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffffffa2a8) at ../csu/libc-start.c:266
#12 0x00002aaaab7c9f26 in __libc_start_main (main=0x401470 <main>, argc=7, argv=0x7fffffffa2b8, init=0x40d2d0 <__libc_csu_init>, fini=0x40d340 <__libc_csu_fini>,
    rtld_fini=0x2aaaaaabae60 <_dl_fini>, stack_end=0x7fffffffa2a8) at main.c:556
#13 0x00000000004013a9 in _start ()

With the same tool and an Intel 20 compiled dgemm there is no abort and we get the expected results. While we were debugging the issue we also thought it was related to the OpenMP runtime.

@jgalarowicz

jgalarowicz commented Aug 21, 2020

@mplegendre Our abort (above) only occurs when the MKL libraries are included in the executable. We are still trying to use gotcha but with older Intel compilers it aborts. Do you have any suggestions on how to debug? Are there flags to turn on to help with debugging?

100	
101	static void* free_wrapper(void* ptr)
102	{
103	    typeof(&free_wrapper) orig_free = gotcha_get_wrappee(orig_free_handle);
104	    void* retval;
(gdb) 
105	    memt_event event;
106	
107	    bool dotrace = collector_do_trace();
108	
109	    if (ptr == NULL) dotrace = false;
110	
111	    if (dotrace) {
112	        TLS_start_mem_event(&event);
113	        event.start_time = GetTime();
114	        event.mem_type = MEM_FREE;
115	    }
116	    retval =  orig_free(ptr);

(gdb) p (void*)ptr
$2 = (void *) 0x637550
(gdb) p *(void*)ptr
$3 = 0
(gdb) up

It looks like the memory at the pointer gotcha is about to free contains 0.

Thanks
Jim G
