
prov/verbs,util/memhooks: Deadlock on mm_lock when registering memory #9594

Open · sydidelot opened this issue Nov 17, 2023 · 2 comments


Describe the bug
I have experienced a deadlock when the verbs provider is registering memory (the MR cache and memhooks are enabled). The application is compiled with jemalloc (note the je_* symbols in the backtrace below), but I don't think jemalloc is the culprit here.

Here is the backtrace:

[... truncated frames as irrelevant for the issue... ]
#13 0x00007ffff69792c0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x00007ffff6980002 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x00007ffff74bf4c3 in ofi_intercept_handler (addr=0x7ffeb7a30000, len=458752) at prov/util/src/util_mem_hooks.c:423
#16 0x00007ffff74bf6cb in ofi_intercept_madvise (addr=0x7ffeb7a30000, length=458752, advice=4) at prov/util/src/util_mem_hooks.c:475
#17 0x00007ffff772624c in je_pages_purge_forced (addr=addr@entry=0x7ffeb7a30000, size=<optimized out>) at src/pages.c:476
#18 0x00007ffff77134c2 in je_ehooks_default_purge_forced_impl (addr=addr@entry=0x7ffeb7a30000, offset=offset@entry=0, length=<optimized out>) at src/ehooks.c:146
#19 0x00007ffff7719312 in ehooks_purge_forced (length=<optimized out>, offset=0, size=<optimized out>, addr=0x7ffeb7a30000, ehooks=0x7fff56a00480, tsdn=0x7fff5b1f5848) at include/jemalloc/internal/ehooks.h:318
#20 je_extent_dalloc_wrapper (tsdn=0x7fff5b1f5848, pac=0x7fff56a03ab0, ehooks=0x7fff56a00480, edata=0x7fff56a1c880) at src/extent.c:1060
#21 0x00007ffff772524d in pac_decay_stashed (decay=0x7fff56a10d98, decay_extents=<synthetic pointer>, fully_decay=<optimized out>, ecache=0x7fff56a03ae8, decay_stats=0x7fff56a01478, pac=0x7fff56a03ab0, tsdn=0x7fff5b1f5848) at src/pac.c:402
#22 pac_decay_to_limit (tsdn=tsdn@entry=0x7fff5b1f5848, pac=pac@entry=0x7fff56a03ab0, decay=decay@entry=0x7fff56a10d98, decay_stats=decay_stats@entry=0x7fff56a01478, ecache=ecache@entry=0x7fff56a03ae8, fully_decay=fully_decay@entry=false, npages_limit=<optimized out>, npages_decay_max=<optimized out>) at src/pac.c:452
#23 0x00007ffff77259cb in pac_decay_to_limit (npages_decay_max=<optimized out>, npages_limit=<optimized out>, fully_decay=<optimized out>, ecache=<optimized out>, decay_stats=<optimized out>, decay=<optimized out>, pac=<optimized out>, tsdn=<optimized out>) at src/pac.c:474
#24 pac_decay_try_purge (npages_limit=<optimized out>, current_npages=<optimized out>, ecache=<optimized out>, decay_stats=<optimized out>, decay=<optimized out>, pac=<optimized out>, tsdn=<optimized out>) at src/pac.c:474
#25 je_pac_maybe_decay_purge (eagerness=<optimized out>, ecache=<optimized out>, decay_stats=<optimized out>, decay=<optimized out>, pac=<optimized out>, tsdn=<optimized out>) at src/pac.c:512
#26 je_pac_maybe_decay_purge (tsdn=tsdn@entry=0x7fff5b1f5848, pac=pac@entry=0x7fff56a03ab0, decay=decay@entry=0x7fff56a10d98, decay_stats=0x7fff56a01478, ecache=0x7fff56a03ae8, eagerness=<optimized out>) at src/pac.c:481
#27 0x00007ffff76c6690 in arena_decay_impl (all=false, is_background_thread=false, ecache=0x7fff56a03ae8, decay_stats=<optimized out>, decay=0x7fff56a10d98, arena=0x7fff56a01400, tsdn=0x7fff5b1f5848) at src/arena.c:439
#28 arena_decay_dirty (all=false, is_background_thread=false, arena=0x7fff56a01400, tsdn=0x7fff5b1f5848) at src/arena.c:459
#29 je_arena_decay (tsdn=0x7fff5b1f5848, arena=0x7fff56a01400, is_background_thread=false, all=<optimized out>) at src/arena.c:485
#30 0x00007ffff773c46c in je_tcache_alloc_small_hard (tsdn=tsdn@entry=0x7fff5b1f5848, arena=arena@entry=0x7fff56a01400, tcache=tcache@entry=0x7fff5b1f5be0, cache_bin=cache_bin@entry=0x7fff5b1f5c60, binind=binind@entry=5, tcache_success=tcache_success@entry=0x7fff5b1f4500) at src/tcache.c:238
#31 0x00007ffff76b15ff in tcache_alloc_small (slow_path=<optimized out>, zero=false, binind=5, size=<optimized out>, tcache=0x7fff5b1f5be0, arena=0x7fff56a01400, tsd=0x7fff5b1f5848) at include/jemalloc/internal/tcache_inlines.h:68
#32 arena_malloc (slow_path=<optimized out>, tcache=0x7fff5b1f5be0, zero=false, ind=5, size=<optimized out>, arena=0x0, tsdn=0x7fff5b1f5848) at include/jemalloc/internal/arena_inlines_b.h:151
#33 iallocztm (slow_path=<optimized out>, arena=0x0, is_internal=false, tcache=0x7fff5b1f5be0, zero=false, ind=5, size=<optimized out>, tsdn=0x7fff5b1f5848) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:55
#34 imalloc_no_sample (ind=5, usize=80, size=<optimized out>, tsd=0x7fff5b1f5848, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2398
#35 imalloc_body (tsd=0x7fff5b1f5848, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2573
#36 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2687
#37 je_malloc_default (size=<optimized out>) at src/jemalloc.c:2722
#38 0x00007ffff7262e9d in malloc (usr_size=usr_size@entry=40) at src/util/red_malloc_hooks.cpp:327
#39 0x00007ffff74f0b79 in ofi_rbnode_alloc (map=0x7ffee8ed0250) at src/tree.c:59
#40 ofi_rbmap_insert (map=map@entry=0x7ffee8ed0250, key=0x7fff34d647a0, data=0x7fff34d647a0, ret_node=0x7fff34d64808) at src/tree.c:242
#41 0x00007ffff74c0589 in util_mr_cache_create (entry=0x7fff5b1f46e8, info=0x7fff5b1f4750, cache=0x7ffee8ed01b0) at prov/util/src/util_mr_cache.c:292
#42 ofi_mr_cache_search (cache=cache@entry=0x7ffee8ed01b0, info=info@entry=0x7fff5b1f4750, entry=entry@entry=0x7fff5b1f46e8) at prov/util/src/util_mr_cache.c:361
#43 0x00007ffff74c7238 in vrb_mr_cache_reg (device=0, iface=FI_HMEM_SYSTEM, context=0x0, mr=0x7fff5b1f4838, flags=0, requested_key=<optimized out>, offset=<optimized out>, access=16128, len=344064, buf=0x7ffeb79cb8a0, domain=0x7ffee8ed0020) at prov/verbs/src/verbs_mr.c:303
#44 vrb_mr_reg_iface (device=0, iface=FI_HMEM_SYSTEM, context=0x0, mr=0x7fff5b1f4838, flags=0, requested_key=<optimized out>, offset=<optimized out>, access=16128, len=344064, buf=0x7ffeb79cb8a0, fid=0x7ffee8ed0020) at prov/verbs/src/verbs_mr.c:324
#45 vrb_mr_reg (fid=0x7ffee8ed0020, buf=0x7ffeb79cb8a0, len=344064, access=16128, offset=<optimized out>, requested_key=<optimized out>, flags=0, mr=0x7fff5b1f4838, context=0x0) at prov/verbs/src/verbs_mr.c:354
[... truncated frames as irrelevant for the issue... ]

In GDB frame 15, the thread is waiting on the mutex mm_lock at prov/util/src/util_mem_hooks.c:423:

420 void ofi_intercept_handler(const void *addr, size_t len)                 
421 {                                                                        
422     pthread_rwlock_rdlock(&mm_list_rwlock);                              
423     pthread_mutex_lock(&mm_lock);                                        
424     ofi_monitor_notify(memhooks_monitor, addr, len);                     
425     pthread_mutex_unlock(&mm_lock);                                      
426     pthread_rwlock_unlock(&mm_list_rwlock);                              
427 }                                                                        
428                                                                          

mm_lock was already acquired by the same thread in GDB frame 41, in util_mr_cache_create(). The malloc() call issued from ofi_rbnode_alloc() (frame 39) pushed jemalloc into its decay/purge path, which calls madvise(); memhooks intercepts that madvise() and tries to take mm_lock a second time, and because the mutex is not recursive, the owning thread blocks on itself forever.
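
For illustration, the cycle can be boiled down to the following self-contained sketch (this is not libfabric code; the names are invented stand-ins). Re-locking a default, non-recursive pthread mutex from the thread that already owns it hangs, which is exactly what the backtrace shows, so the sketch deliberately never terminates:

#include <pthread.h>
#include <stdio.h>

/* Stand-in for mm_lock: a default (non-recursive) mutex. */
static pthread_mutex_t mm_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for ofi_intercept_handler(): reached from the allocator's
 * madvise() path while the MR cache code already holds the lock. */
static void intercept_handler_sketch(void)
{
    pthread_mutex_lock(&mm_lock_sketch);   /* same thread already owns it -> hangs */
    /* ... would notify the memory monitor here ... */
    pthread_mutex_unlock(&mm_lock_sketch);
}

/* Stand-in for the MR cache insert path: it holds the lock, then allocates,
 * and the allocator's purge path calls madvise(), which is hooked. */
static void mr_cache_insert_sketch(void)
{
    pthread_mutex_lock(&mm_lock_sketch);
    /* malloc() -> jemalloc decay/purge -> madvise() -> memhooks intercept */
    intercept_handler_sketch();
    pthread_mutex_unlock(&mm_lock_sketch);
}

int main(void)
{
    mr_cache_insert_sketch();
    puts("never reached");
    return 0;
}

Any fix has to break this cycle, either by not allocating while mm_lock is held or by making the lock safe to re-enter.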

To Reproduce
The bug reproduced only a few times with our application (which is closed-source). I don't have a standalone reproducer.

Expected behavior
I don't want my application to deadlock when the MR cache is enabled.

Output
See the backtrace above.

Environment:
Ubuntu Linux, libfabric v1.18.1.
I haven't tested the latest main branch, but I see no change to the MR cache that could fix the issue.

Additional context
The workaround I found is to disable the MR cache by setting FI_MR_CACHE_MONITOR=disabled (a programmatic variant is sketched below).
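
If the launch environment cannot be changed, the same workaround can presumably be applied from the application itself, provided the variable is set before the first libfabric call; a minimal sketch with a hypothetical helper name:

#include <stdlib.h>

/* Disable the MR cache memory monitor before libfabric initializes.
 * This must run before the first libfabric call (e.g. fi_getinfo()),
 * so that the setting is read during initialization. */
static void disable_mr_cache_monitor(void)
{
    setenv("FI_MR_CACHE_MONITOR", "disabled", 1);
}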

sydidelot added the bug label Nov 17, 2023

sydidelot (Member Author) commented:

The easiest fix I see is to make the lock recursive, but I'm not sure what performance penalty that might incur (a rough sketch of the change follows this comment).
FYI, I don't plan to fix the issue as I don't need this MR cache.
@iziemba and @shijin-aws, I see that you contributed to this MR cache. Maybe one of you might be interested in fixing the issue?
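
For reference, a recursive pthread mutex cannot portably use PTHREAD_MUTEX_INITIALIZER and needs an explicit attribute; a minimal sketch of the idea (an illustration, not an actual libfabric patch):

#include <pthread.h>

static pthread_mutex_t mm_lock;  /* would replace the statically initialized mutex */

static void mm_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* A recursive mutex lets the owning thread re-acquire the lock from the
     * intercept handler instead of deadlocking on itself; each lock must
     * still be matched by an unlock. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&mm_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

The typical overhead is an owner check plus a recursion counter on each lock/unlock, so the performance concern mentioned above would mostly matter on very hot registration paths.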

j-xiong (Contributor) commented Mar 27, 2024:

This could be similar to #9003.
