Minutes_2022_06_07
esc edited this page Jun 8, 2022
Attendees: Todd A. Anderson, Andre Masella, Enrico Guiraud, Graham Markall, Jim Pivarski, stuart, Vincenzo Eduardo Padulano, LI Da, Shannon Quinn, Siu Kwan Lam, brandon willard, Benjamin Graham, Kaustubh Chaudhari, Nickholas Riasanovsky, Luk, Guilherme Leobas
NOTE: All communication is subject to the Numba Code of Conduct.
Please refer to this calendar for the next meeting date.
- Numba 0.56.0
- Creating and posting meeting agenda earlier
  - Siu: create the meeting document, publish it immediately after the meeting, and add topics throughout the week for folks to look at before the meeting
- Multi-threaded, concurrent calls of some Numba-jitted cfuncs are limited in scaling by the Numba runtime (NRT)
- An example of an affected function:
```python
import numba
import numpy as np

def count_muons(ptr):
    arr = numba.carray(ptr, 10)
    return np.count_nonzero((arr > 1.) & (np.abs(arr) < 7.) & (arr > 0.))
```
The full reproducer (will try to come up with something that does not depend on ROOT):
```python
import numba
import numpy as np
import ROOT
from time import time

# Explicit loop: no temporary arrays are allocated.
def count_muons_loop(ptr):
    arr = numba.carray(ptr, 10)
    count = 0
    for i in range(len(arr)):
        if arr[i] > 1. and abs(arr[i]) < 7. and arr[i] > 0.:
            count += 1
    return count

# NumPy-style: the boolean temporaries are allocated and freed through the NRT.
def count_muons(ptr):
    arr = numba.carray(ptr, 10)
    return np.count_nonzero((arr > 1.) & (np.abs(arr) < 7.) & (arr > 0.))

if __name__ == "__main__":
    # Compile both variants as C callbacks with signature int32(float32*).
    loop_func = numba.cfunc(numba.int32(numba.types.CPointer(numba.float32)), nopython=True)(count_muons_loop)
    numpy_func = numba.cfunc(numba.int32(numba.types.CPointer(numba.float32)), nopython=True)(count_muons)
    # Expose the raw callback addresses to C++ as function pointers.
    ROOT.gInterpreter.Declare(f"""
    auto *loopf = reinterpret_cast<int(*)(float*)>({loop_func.address});
    auto *npf = reinterpret_cast<int(*)(float*)>({numpy_func.address});
    """)
    # Call each callback 10^9 times from a ROOT thread pool and time both runs.
    ROOT.gInterpreter.Calc("""
    ROOT::TThreadExecutor t;
    float arr[10]{};
    std::vector<float*> args(1'000'000'000, arr);
    TStopwatch s;
    s.Start();
    auto res = t.Map(loopf, args);
    s.Stop();
    s.Print();
    s.Reset();
    s.Start();
    res = t.Map(npf, args);
    s.Stop();
    s.Print();
    """)
```
And an example stacktrace of where threads get stuck:
```text
#0 0x00007fffc8e11000 in nrt_atomic_add ()
#1 0x00007ffff6dc442d in NRT_MemInfo_destroy (mi=0x7fff900010e0) at numba/core/runtime/nrt.c:331
#2 NRT_MemInfo_call_dtor (mi=0x7fff900010e0) at numba/core/runtime/nrt.c:348
#3 0x00007fffc04e8566 in numba::np::arraymath::np_count_nonzero::_3clocals_3e::impl_244[abi:c8tJTIeFCjyCbUFRqqOAK_2f6h0phxApMogijRBAA_3d](Array<bool, 1, C, mutable, aligned>, omitted_28default_3dNone_29) ()
#4 0x00007fffc04e3b0c in __main__::count_muons_243[abi:c8tJTIeFCjyCbUFRqqOAK_2f6h0ogIRRJAjSSYRBFEjyYA](float32_2a) ()
#5 0x00007fffc04e3b82 in cfunc._ZN8__main__15count_muons_243B46c8tJTIeFCjyCbUFRqqOAK_2f6h0ogIRRJAjSSYRBFEjyYAE10float32_2a ()
#6 0x00007fffb064cb20 in ROOT::TThreadExecutor::MapImpl<int (*)(float*), float*, void>(int (*)(float*), std::vector<float*, std::allocator<float*> >&)::{lambda(unsigned int)#1}::operator()(unsigned int) const (this=0x55556051cfe0, i=<optimized out>) at /home/blue/ROOT/relwithdebinfo/cmake-build-foo/include/ROOT/TThreadExecutor.hxx:330
#7 0x00007fffc04fd043 in std::function<void (unsigned int)>::operator()(unsigned int) const (__args#0=<optimized out>, this=<optimized out>)
    at /usr/include/c++/11.2.0/bits/std_function.h:560
```
- `_nrt=False` can be set to disable reference counting. Future changes being considered:
  - Remove unneeded atomic ops on the internal stats of NRT
  - Inline NRT as LLVM IR for more aggressive optimization
  - Turn off atomicity?
    - A flag to turn off atomicity seems doable
- AOT compilation + njit questions
  - distutils is deprecated. Should it be replaced with setuptools?
    - Yes, but check how Numba extends NumPy's distutils.
  - Can AOT-compiled code be used in njit functions?
    - No direct support, but doable if going through the C calling convention.
- #8134 - Support non-constant values in exception.

```python
from numba import njit

@njit('void(string)', no_cfunc_wrapper=True)
def foo(a):
    # raise IndexError(a + ' world', a, 'test', a, 3)
    raise ValueError(a, IndexError)
```
- #8127 - No out of bounds check during advanced array indexing
- #8128 - `typed.List` is not considered a `types.Sequence` while a reflected list is
- #8131 - All slices of contiguous 2D+ arrays are assumed to be not contiguous (even when they would be)
- #8132 - Record not recognized as a data type
- #8135 - Slow compilation of function taking numpy structured arrays with many fields
  - Siu to produce a Chrome trace profile for further discussion
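For reference, plain NumPy shows the behaviour that #8127 and #8131 expect Numba to match: an out-of-bounds entry in an integer index array raises `IndexError`, and slicing whole rows of a C-contiguous 2D array yields a C-contiguous view. A minimal sketch; the helper `advanced_index_raises` is hypothetical.

```python
import numpy as np

# #8127: advanced (integer-array) indexing bounds-checks every index.
def advanced_index_raises(arr, idx):
    try:
        arr[idx]
    except IndexError:
        return True
    return False

a = np.arange(5)
print(advanced_index_raises(a, np.array([0, 4])))   # False
print(advanced_index_raises(a, np.array([0, 10])))  # True

# #8131: a slice of whole rows of a C-contiguous array is still C-contiguous,
# while a slice of partial rows is not.
b = np.zeros((4, 5))
print(b[1:3].flags.c_contiguous)     # True
print(b[:, 1:3].flags.c_contiguous)  # False
```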
llvmlite:
- #850 - Better error support for creating custom types
- #8130 - NumbaIRAssumptionWarning: variable '_i8_impl_v4_cur_2' is not in scope
- #8122 - WIP: support register_jitable-ed function as njit function argument
- #8123 - Fix CUDA print tests on Windows
- #8124 - Add explicit checks to all allocators in the NRT.
- #8125 - [DO NOT MERGE] Temp/pr8061
- #8126 - Mark gufuncs as having mutable inputs
- #8129 - FIXED :: No out of bounds check during advanced array indexing #8127
- #8133 - Fix #8132. Regression in Record.make_c_struct for handling nestedarray
- #8134 - Support non-constant exception values in JIT
- #8136 - Fix some C++ 11 Issues
- #8137 - CUDA: Fix #7806, Division by zero stops the kernel
llvmlite:
- #849 - added type hints
llvmlite:
- #851 - adding the llvm_11_consecutive_registers.patch