`@cfunction`s precompiled into a system image are not executable from a foreign thread due to TLS accesses #43748

vchuravy · 2022-01-10T22:05:58Z

@ericphanson is observing segmentation faults when using a system image with CUDA.jl (x-ref: JuliaGPU/CUDA.jl#1314). The TL;DR is that CUDA.jl uses `:uv_async_send` to trigger `AsyncCondition`s from a foreign thread:

async_send(handle::Ptr{Cvoid}) = ccall(:uv_async_send, Cvoid, (Ptr{Cvoid},), handle)

If we disassemble the code that normally runs, i.e. without a custom system image:

Dump of assembler code for function jlcapi_async_send_15:
   0x00007fff9c9bc9c0 <+0>:     push   %r14
   0x00007fff9c9bc9c2 <+2>:     push   %rbx
   0x00007fff9c9bc9c3 <+3>:     sub    $0x8,%rsp
   0x00007fff9c9bc9c7 <+7>:     movabs $0x7ffff761dfc0,%rdx
   0x00007fff9c9bc9d1 <+17>:    movabs $0x7fffee5cecc8,%rsi
   0x00007fff9c9bc9db <+27>:    mov    %fs:0x0,%rax
   0x00007fff9c9bc9e4 <+36>:    mov    -0x8(%rax),%rax
   0x00007fff9c9bc9e8 <+40>:    mov    %rsp,%rbx
   0x00007fff9c9bc9eb <+43>:    mov    (%rsi),%rcx
   0x00007fff9c9bc9ee <+46>:    mov    (%rdx),%rdx
   0x00007fff9c9bc9f1 <+49>:    lea    0x8(%rax),%r8
   0x00007fff9c9bc9f5 <+53>:    cmp    %rdx,%rcx
   0x00007fff9c9bc9f8 <+56>:    mov    %rcx,%rsi
   0x00007fff9c9bc9fb <+59>:    cmovae %rdx,%rsi
   0x00007fff9c9bc9ff <+63>:    test   %rax,%rax
   0x00007fff9c9bca02 <+66>:    movabs $0x7fff9c9bca40,%rax
   0x00007fff9c9bca0c <+76>:    cmovne %r8,%rbx
   0x00007fff9c9bca10 <+80>:    movabs $0x7fff9c9bc900,%r8
   0x00007fff9c9bca1a <+90>:    cmovne %rdx,%rsi
   0x00007fff9c9bca1e <+94>:    mov    (%rbx),%r14
   0x00007fff9c9bca21 <+97>:    cmove  %r8,%rax
   0x00007fff9c9bca25 <+101>:   cmp    %rdx,%rcx
   0x00007fff9c9bca28 <+104>:   mov    %rsi,(%rbx)
   0x00007fff9c9bca2b <+107>:   cmovae %r8,%rax
   0x00007fff9c9bca2f <+111>:   call   *%rax
   0x00007fff9c9bca31 <+113>:   mov    %r14,(%rbx)
   0x00007fff9c9bca34 <+116>:   add    $0x8,%rsp
   0x00007fff9c9bca38 <+120>:   pop    %rbx
   0x00007fff9c9bca39 <+121>:   pop    %r14
   0x00007fff9c9bca3b <+123>:   ret    
End of assembler dump.

With a custom sysimage, on the other hand:

Dump of assembler code for function jlcapi_async_send_48533:
   0x00007fffe4209300 <+0>:     push   %r15
   0x00007fffe4209302 <+2>:     push   %r14
   0x00007fffe4209304 <+4>:     push   %r13
   0x00007fffe4209306 <+6>:     push   %r12
   0x00007fffe4209308 <+8>:     push   %rbx
   0x00007fffe4209309 <+9>:     sub    $0x20,%rsp
   0x00007fffe420930d <+13>:    mov    0x85b9bfc(%rip),%rax        # 0x7fffec7c2f10 <jl_tls_offset.real>
   0x00007fffe4209314 <+20>:    vxorps %xmm0,%xmm0,%xmm0
   0x00007fffe4209318 <+24>:    mov    %rdi,%r14
   0x00007fffe420931b <+27>:    movq   $0x0,0x10(%rsp)
   0x00007fffe4209324 <+36>:    vmovaps %xmm0,(%rsp)
   0x00007fffe4209329 <+41>:    test   %rax,%rax
   0x00007fffe420932c <+44>:    je     0x7fffe42093db <jlcapi_async_send_48533+219>
   0x00007fffe4209332 <+50>:    mov    %fs:0x0,%rcx
   0x00007fffe420933b <+59>:    mov    (%rcx,%rax,1),%rbx
   0x00007fffe420933f <+63>:    movq   $0x4,(%rsp)
   0x00007fffe4209347 <+71>:    mov    0x136c62(%rip),%rcx        # 0x7fffe433ffb0
   0x00007fffe420934e <+78>:    mov    %rsp,%rdi
   0x00007fffe4209351 <+81>:    mov    $0x570,%esi
   0x00007fffe4209356 <+86>:    mov    $0x10,%edx
   0x00007fffe420935b <+91>:    mov    (%rbx),%rax
   0x00007fffe420935e <+94>:    mov    %rax,0x8(%rsp)
   0x00007fffe4209363 <+99>:    mov    %rdi,(%rbx)
   0x00007fffe4209366 <+102>:   mov    (%rcx),%rax
   0x00007fffe4209369 <+105>:   mov    0x8(%rbx),%r12
   0x00007fffe420936d <+109>:   mov    0x10(%rbx),%rdi
   0x00007fffe4209371 <+113>:   mov    %rax,0x8(%rbx)
   0x00007fffe4209375 <+117>:   mov    0x85b8b54(%rip),%r15        # 0x7fffec7c1ed0 <jl_globalYY.10208>
   0x00007fffe420937c <+124>:   mov    0x859ffc5(%rip),%r13        # 0x7fffec7a9348 <SUM.CoreDOT.Ptr1114>
   0x00007fffe4209383 <+131>:   call   0x7fffe3ac1190 <jl_gc_pool_alloc@plt>
   0x00007fffe4209388 <+136>:   lea    0x18(%rsp),%rsi
   0x00007fffe420938d <+141>:   mov    %r15,%rdi
   0x00007fffe4209390 <+144>:   mov    $0x1,%edx
   0x00007fffe4209395 <+149>:   mov    %r13,-0x8(%rax)
   0x00007fffe4209399 <+153>:   mov    %r14,(%rax)
   0x00007fffe420939c <+156>:   mov    %rax,0x10(%rsp)
   0x00007fffe42093a1 <+161>:   mov    %rax,0x18(%rsp)
   0x00007fffe42093a6 <+166>:   call   0x7fffe3ac1200 <jl_apply_generic@plt>
   0x00007fffe42093ab <+171>:   mov    -0x8(%rax),%rcx
   0x00007fffe42093af <+175>:   mov    0x85a1d12(%rip),%rsi        # 0x7fffec7ab0c8 <SUM.CoreDOT.Int32677>
   0x00007fffe42093b6 <+182>:   and    $0xfffffffffffffff0,%rcx
   0x00007fffe42093ba <+186>:   cmp    %rsi,%rcx
   0x00007fffe42093bd <+189>:   jne    0x7fffe42093e9 <jlcapi_async_send_48533+233>
   0x00007fffe42093bf <+191>:   mov    (%rax),%eax
   0x00007fffe42093c1 <+193>:   mov    %r12,0x8(%rbx)
   0x00007fffe42093c5 <+197>:   mov    0x8(%rsp),%rcx
   0x00007fffe42093ca <+202>:   mov    %rcx,(%rbx)
   0x00007fffe42093cd <+205>:   add    $0x20,%rsp
   0x00007fffe42093d1 <+209>:   pop    %rbx
   0x00007fffe42093d2 <+210>:   pop    %r12
   0x00007fffe42093d4 <+212>:   pop    %r13
   0x00007fffe42093d6 <+214>:   pop    %r14
   0x00007fffe42093d8 <+216>:   pop    %r15
   0x00007fffe42093da <+218>:   ret    
   0x00007fffe42093db <+219>:   call   *0x85b9b1f(%rip)        # 0x7fffec7c2f00 <jl_pgcstack_func_slot.real>
   0x00007fffe42093e1 <+225>:   mov    %rax,%rbx
   0x00007fffe42093e4 <+228>:   jmp    0x7fffe420933f <jlcapi_async_send_48533+63>
   0x00007fffe42093e9 <+233>:   lea    0x32924(%rip),%rdi        # 0x7fffe423bd14 <_j_str175>
   0x00007fffe42093f0 <+240>:   mov    %rax,%rdx
   0x00007fffe42093f3 <+243>:   call   0x7fffe3ac1070 <jl_type_error@plt>

The pointer to disassemble was obtained by calling `Reproducer.launch()`, with:

module Reproducer

async_send(data::Ptr{Cvoid}) = ccall(:uv_async_send, Cint, (Ptr{Cvoid},), data) 

function launch()
    callback = @cfunction(async_send, Cint, (Ptr{Cvoid},))
    return callback
end

end # module

and a precompile.jl like:

using Reproducer

Reproducer.launch()

This is funnily related to #43747, since I wanted to have errors instead of mysterious segmentation faults.

cc: @KristofferC (although I don't think there is anything we can do in PackageCompiler.jl), @maleadt


JeffBezanson · 2022-01-11T20:04:13Z

I think in this case cfunction is generating the wrong (very inefficient) code. But in general calling julia code from a foreign thread will not work. Luckily in this case one can call uv_async_send directly.
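A minimal sketch of that direct route, assuming the foreign library accepts a plain C callback of the shape `int (*)(void *)`: take the address of `uv_async_send` with `cglobal` and hand it over together with the `AsyncCondition` handle, so that no Julia-generated wrapper (and hence no TLS lookup) runs on the foreign thread. The local `ccall` below only stands in for the foreign caller, and the variable names are illustrative.

```julia
# Sketch only: `cond`, `callback`, and `waiter` are illustrative names.
cond = Base.AsyncCondition()

# Address of the C function itself; nothing at this address was compiled by Julia.
callback = cglobal(:uv_async_send)

# Start a task that blocks until the condition fires.
waiter = @async wait(cond)

# Stand-in for the foreign thread: invoke the pointer with the libuv handle.
rc = ccall(callback, Cint, (Ptr{Cvoid},), cond.handle)

wait(waiter)   # returns once the event loop has delivered the notification
close(cond)
```

In a real setup, `callback` and `cond.handle` would be passed to the foreign API as its callback and user-data arguments, and `cond` has to stay alive until the callback has fired.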

JeffBezanson · 2022-01-11T20:10:20Z

Related: #35252 #36977 #17573

vchuravy · 2022-01-11T20:14:19Z

> But in general calling julia code from a foreign thread will not work.

Right. One has to craft very careful code. I have some more complicated examples if you want xD.

The goal of #43747 was to be able to have some guarantees about such code. What cropped up in #41616 was that Base, through `@threadcall`, relies on this.

vtjnash · 2022-10-25T14:41:44Z

now works

vchuravy added the compiler:codegen (Generation of LLVM IR and native code) label Jan 10, 2022

vchuravy mentioned this issue Jan 11, 2022

Add Expr(:funcinfo, ... and unsafe verifier #43747

Closed

maleadt mentioned this issue Jan 12, 2022

Segfault during CUBLAS logging JuliaGPU/CUDA.jl#1062

Open

vtjnash closed this as completed Oct 25, 2022
