running destructors before completion of a primitive #1814

rscohn2 · 2024-02-29T19:33:13Z

I want to launch a primitive in a function and return immediately. I am getting a segv that looks like a memory corruption. If I wait for completion before returning, it works. I think the problem is that destructors cannot run before completion. In the code below what instances cannot be destroyed before the primitive completes?

struct onednn_engine {
  onednn_engine(dnnl::engine::kind engine_kind) :
    engine_(engine_kind, 0),
    stream_(engine_)
  {}

  void softmax(const memory::dims &dims, const int axis, float *data) {
    auto src_md = memory::desc(dims, dt::f32, tag::nc);
    auto dst_md = memory::desc(dims, dt::f32, tag::nc);
    auto src_mem = memory(src_md, engine_, data);

    // This can be expensive, but dnn will cache it                                                                                      
    auto softmax_pd = softmax_forward::primitive_desc(engine_,
            prop_kind::forward_training, algorithm::softmax_accurate, src_md,
            dst_md, axis);

    auto softmax_prim = softmax_forward(softmax_pd);

    // Set up in-place execution by assigning src as DST.                                                                                
    std::unordered_map<int, memory> softmax_args;
    softmax_args.insert({DNNL_ARG_SRC, src_mem});
    softmax_args.insert({DNNL_ARG_DST, src_mem});

    softmax_prim.execute(stream_, softmax_args);
    // Fails with segv if we do not wait                                                                                                
    stream_.wait();
  }

  void wait() {
    stream_.wait();
  }

  dnnl::engine engine_;
  dnnl::stream stream_;
};

The text was updated successfully, but these errors were encountered:

dzarukin · 2024-02-29T20:03:29Z

Hi @rscohn2, thanks for bringing this one.
Could you, please, supply ONEDNN_VERBOSE=all output with an example?

rscohn2 · 2024-02-29T20:20:39Z

It does not fail with logging:

(cs149) rscohn1@gkdse-fcp-19:examples$ ./primitives-gpt149-softmax-cpp
Segmentation fault (core dumped)
(cs149) rscohn1@gkdse-fcp-19:examples$ ONEDNN_VERBOSE=all ./primitives-gpt149-softmax-cpp
onednn_verbose,info,oneDNN v3.3.0 (commit a96e9b1a6769171e74b0b8e031489303438906e5)
onednn_verbose,info,cpu,runtime:DPC++,nthr:128
onednn_verbose,info,cpu,isa:Intel AVX-512 with float16, Intel DL Boost and bfloat16 support and Intel AMX with bfloat16 and 8-bit in\teger support
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,cpu,engine,0,backend:OpenCL,name:Intel(R) Xeon(R) Gold 6438M,driver_version:2023.16.12,binary_kernels:disabled
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,p\roblem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,ba\ckend,exec_time
onednn_verbose,primitive,create:cache_miss,cpu,softmax,jit:avx512_core_fp16,forward_training,src_f32::blocked:ab::f0 dst_f32::blocke\d:ab::f0 diff_dst_undef::undef:::,,alg:softmax_accurate axis:1,3x1000,0.176025
onednn_verbose,primitive,exec,cpu,softmax,jit:avx512_core_fp16,forward_training,src_f32::blocked:ab::f0 dst_f32::blocked:ab::f0 diff\_dst_undef::undef:::,,alg:softmax_accurate axis:1,3x1000,0.677979
Example passed on CPU.
(cs149) rscohn1@gkdse-fcp-19:examples$

It fails in host_ptr:

Thread 129 "primitives-gpt1" received signal SIGSEGV, Segmentation fault.
[Switching to thread 129 (Thread 0x7ffdbbfff640 (LWP 2215990))]
0x00007ffff374bff0 in dnnl::impl::exec_ctx_t::host_ptr(dnnl::impl::memory_storage_t const*) const ()
   from /localdisk/opt/intel/oneapi/dnnl/2024.0/lib/libdnnl.so.3
(gdb)x/i $pc
=> 0x7ffff374bff0 <_ZNK4dnnl4impl10exec_ctx_t8host_ptrEPKNS0_16memory_storage_tE+32>:   mov    (%rsi),%rax
(gdb) x/i $rsi
   0x5550035effaa:      Cannot access memory at address 0x5550035effaa
(gdb) p/x $rsi
$1 = 0x5550035effaa
(gdb)

Source files are in the attached zip
gpt149.zip

densamoilov · 2024-02-29T20:36:40Z

@rscohn2

In the code below what instances cannot be destroyed before the primitive completes?

For the engine kind CPU and CPU runtime SYCL the memory objects are currently required to stay alive until the primitive is complete.

densamoilov · 2024-02-29T20:43:15Z

@rscohn2,

I want to launch a primitive in a function and return immediately

The non-blocking execution model is typically used and beneficial for GPU, is there any reason why you want to use it for CPU?

rscohn2 · 2024-02-29T20:53:07Z

I was looking at making the CPU async to overlap framework time with kernel time. This was a suggestion from @ddkalamk. I am trying to do a prototype to see if it helps.

I guess there is some reference counting for the memory object? I am passing in a copy in the softmax_args The memory object constructor takes a pointer, and does not own the data. It is no problem to keep a memory object live, I am just curious why I can't destruct the original after passing it by value.

rscohn2 · 2024-02-29T21:03:56Z

Storing the memory objects associated with pending operations in a std::vector and clearing the vector after the wait() fixed the problem. Thanks!

densamoilov · 2024-02-29T21:21:17Z

@rscohn2,

I guess there is some reference counting for the memory object?

Actually, there isn't but there should be. For the GPU kind we don't need it because the memory objects themselves are not used by the GPU kernels so the user only needs to keep the underlying buffer alive if it's USM and doesn't have to do anything if it's SYCL buffer. For the CPU kind, we re-use the classic CPU kernels written in C++ or JIT Assembly, which additionally requires the memory objects to stay alive. We plan to add the reference count mechanism in the future though the interest in SYCL CPU is quite low and therefore the task is not a high priority at this point.

rscohn2 added the question label Feb 29, 2024

vpirogov assigned densamoilov Feb 29, 2024

vpirogov added enhancement A feature or an optimization request and removed question labels Feb 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

running destructors before completion of a primitive #1814

running destructors before completion of a primitive #1814

rscohn2 commented Feb 29, 2024

dzarukin commented Feb 29, 2024

rscohn2 commented Feb 29, 2024

densamoilov commented Feb 29, 2024

densamoilov commented Feb 29, 2024

rscohn2 commented Feb 29, 2024

rscohn2 commented Feb 29, 2024

densamoilov commented Feb 29, 2024

running destructors before completion of a primitive #1814

running destructors before completion of a primitive #1814

Comments

rscohn2 commented Feb 29, 2024

dzarukin commented Feb 29, 2024

rscohn2 commented Feb 29, 2024

densamoilov commented Feb 29, 2024

densamoilov commented Feb 29, 2024

rscohn2 commented Feb 29, 2024

rscohn2 commented Feb 29, 2024

densamoilov commented Feb 29, 2024