Stan Seibert edited this page Sep 25, 2018 · 1 revision

Numba Meeting: 2018-09-25

Attendees: Stan, Ehsan, Stuart, Siu, Todd

0. Feature Discussion

  • String discussion

    • Things we've learned about Python unicode so far
      • Get parts of the unicode struct using the PyUnicode_GET_LENGTH, PyUnicode_KIND, and PyUnicode_DATA C macros. Need to wrap them in helperlib.c functions.
      • Unicode strings built with the old C API are not automatically in "canonical" form. Need to call PyUnicode_READY first to prepare them, which may do some memory allocation.
      • Create new unicode object with PyUnicode_FromKindAndData.
      • Once ready, data pointer can be used as a uint8, uint16 or uint32 array, depending on the value of kind.
    • Right Numba extension API to use?
      • Built in types use registries
      • Documented extension API does not
      • Can cause strange behavior when using @jit vs. compile_isolated in unit tests?
    • How to upgrade strings without breaking existing partial support that treats them as special constant type?
    • How to handle memory ownership?
      • Similar to array: Numba can have a view on Python-managed string storage. If Numba creates a string, we have to manage it with NRT in case it is put into a jitclass.
      • Different: As soon as we box the Numba string into a Python unicode object, Python copies the string into its own memory, and the refcount on the NRT version should be decremented. (no Python unicode views on NRT-managed memory)
      • Better: Strings being immutable does make reasoning about all of this easier.
  • Release lessons learned:

    • Need to expand build farm. All test configurations take more than 24 hours to run on Windows. At least one more Windows GPU worker needed.
    • Need automated job to clear old build working directories
    • Checklist was very helpful!
    • Automatic build and test of development wheels?
      • Slowest manual part of release
      • Automated testing, as done during the release cycle, to detect issues like #3341
      • Where should we post dev wheels to? Too many to post to PyPI. Is Anaconda.org compatible with most recent wheel package? (YES! latest update fixed it)
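
For the string discussion above, here is a minimal pure-Python sketch of how the value of kind relates to the data layout of a canonical (ready) unicode object. It only emulates what PyUnicode_KIND and PyUnicode_DATA expose and does not call the C API. The helper names `unicode_kind` and `unicode_data` are hypothetical, chosen for illustration, and are not part of Numba or CPython.

```python
import sys

# Values returned by the PyUnicode_KIND macro (PEP 393): bytes per code point.
KIND_1BYTE, KIND_2BYTE, KIND_4BYTE = 1, 2, 4


def unicode_kind(s):
    """Return the kind a canonical str uses: the narrowest width that
    can hold the largest code point in the string."""
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        return KIND_1BYTE
    if max_cp < 0x10000:
        return KIND_2BYTE
    return KIND_4BYTE


def unicode_data(s):
    """Emulate PyUnicode_DATA: one fixed-width unsigned integer per code
    point, in native byte order, viewed as a uint8/uint16/uint32 array."""
    kind = unicode_kind(s)
    raw = b"".join(ord(ch).to_bytes(kind, sys.byteorder) for ch in s)
    fmt = {KIND_1BYTE: "B", KIND_2BYTE: "H", KIND_4BYTE: "I"}[kind]
    return kind, memoryview(raw).cast(fmt)


for text in ("abc", "a\u03c0", "a\U0001F600"):
    kind, data = unicode_data(text)
    print(kind, list(data))
```

The array length always equals the string length in code points, which is why a single kind value is enough to index into the data.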

1. New issues

  • 3341 - numba 0.40.0 installed via pip from PyPI: _dl_check_map_versions: Assertion `needed != NULL' failed!
    • Can fix tbb part by rebuilding wheel
    • auditwheel repair seems to be corrupting something causing the openmp segfault
    • Once we fix or work around this, tag patch release 0.40.1
    • Crashing numba -s
  • 3339 - np.where typing needs to handle scalar selection with array input
    • Bug report from gitter
  • 3336 - Enable automatic parallel execution in pre-compiled code
    • entangled with general problems with AOT and caching with parfors
    • Also highlights missing compiler flag options with AOT compiling.
  • 3334 - Copy to host broken for __cuda_array_interface__-derived strided DeviceNDArray.
    • PR already submitted
  • 3333 - Indexing broken for __cuda_array_interface__-derived DeviceNDArray.
    • PR already submitted
  • 3332 - Nested jit annotations bug
    • Want to mix object mode and nopython mode in unusual way.
  • 3331 - jit bug
    • Hitting reflection of nested list limitation
    • Another reminder that "reflection" is problematic
  • 3330 - Type Refactor
    • Placeholder for type refactoring brainstorming
  • 3328 - Avoid unnecessary compilation in type inference
    • Wishlist item from last meeting
    • Note @overload vs @jit performance difference. Is compiling the wrapper causing the nopython function body to be compiled twice?
    • TODO: check @overload vs @njit speed difference.
  • 3327 - CUDA 10 notes
    • current Numba seems to work with cudatoolkit 9 on new driver and with new cuda 10 toolkit directly
  • 3326 - Heap Allocation Error at parallel execution
    • list of list is very common
    • how do we parallelize this?
    • need legalization pass to raise errors when users do non-threadsafe things
  • 3323 - Initial string support
    • see discussion above

2. Open PRs

New

  • 3340 - Fix typo in error name.
    • easy fix
  • 3337 - [WIP] Support for np.ediff1d
    • in progress
  • 3335 - Fix memory management of __cuda_array_interface__ views.
    • Needs review
  • 3325 - [WIP] Add location information to exceptions.
    • Segfaults in strange ways on CI
    • Very useful, but need to fix
  • 3324 - Support for np.nancumsum and np.nancumprod
    • Needs review

Old

  • 3320 - Support for np.partition
  • 3307 - Adding NUMBA_ENABLE_PROFILING envvar, enabling jit event
  • 3228 - Reduce redundant module linking
    • Stuart will review
  • 3162 - Support constant dtype string in nopython mode in functions like numpy.empty.
    • Need to resolve #3195
  • 3160 - First attempt at parallel diagnostics
    • Stuart will implement Todd's suggestion
  • 3134 - [WIP] Cfunc x86 abi
    • Needs re-review
  • 3124 - Fix 3119, raise for 0d arrays in reductions
    • Stuart needs to implement feedback
  • 3046 - Pairwise sum implementation.
  • 2999 - Support LowLevelCallable
  • 2983 - [WIP] invert mapping b/w binop operators and the operator module
  • 2950 - Fix dispatcher to only consider contiguous-ness.
  • 2942 - Fix linkage nature (declspec(dllexport)) of some test functions
  • 2894 - [WIP] Implement jitclass default constructor arguments.
  • 2817 - [WIP] Emit LLVM optimization remarks

===========================

4. Next Release: Version 0.41, RC=Nov 19, Final=Nov 26, 2018

  • Type refactoring
  • Parallel diagnostics
  • LLVM 7
  • Initial string support
  • Finishing off stalled PRs
  • Usual collection of bug fixes and small features