Minutes_2020_08_11
Valentin Haenel edited this page Aug 11, 2020
Attendees: Siu, Sergey M, Andreas, Dipto D, Ehsan, Eric W, Frank S, Graham M, Hameer A, Hannes P, Ivan B, Juan G, Luk F-A, Matti P, Nick R, Pearu P, Sahil, Sergey P, Todd A, Val, Stu, Alex, Mike W, Mathieu, Reazul H, Alexander, Keith K.
- Slides: https://drive.google.com/file/d/1YUFRy8FqLLBajqbTUuEXzYIYwXU67BFL/view?usp=sharing
- PyDPPL - wrapper library around SYCL and OpenCL (primarily SYCL)
- Predominant idea is to act as an interop-layer across the Python stack, e.g. NumPy, DAAL, Numba
- All libraries can end up sharing the same buffers, USM, queues
- Q: Hameer: `with` statements are notoriously hard for thread-safety.
- A: Dipto: Won't inherit queue/resources across threads.
- Q: Eric: `with` statements need to be TLS and context-local too (there's a PEP 567 for this). (Filed https://github.com/IntelPython/pydppl/issues/11)
- A: Dipto: Will check it out.
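The point about PEP 567 above can be sketched as follows. This is a minimal illustration, not the PyDPPL API: the active device queue is stored in a `contextvars.ContextVar` rather than a plain global, so each thread (and each asyncio task) sees only the queue it entered itself; `device_context` and `get_current_queue` are hypothetical names.

```python
# Hypothetical sketch of a context-local device-queue context (PEP 567),
# not the PyDPPL API. A ContextVar is isolated per thread/async task,
# unlike a module-level global.
import contextvars
from contextlib import contextmanager

_current_queue = contextvars.ContextVar("current_queue", default=None)

@contextmanager
def device_context(queue):
    """Activate `queue` for the current thread/task only."""
    token = _current_queue.set(queue)
    try:
        yield queue
    finally:
        # Restore whatever was active before this context was entered.
        _current_queue.reset(token)

def get_current_queue():
    """Return the queue active in the current context, or None."""
    return _current_queue.get()
```

The token-based `reset` also makes nested contexts unwind correctly, which a bare global assignment would not.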
- Q: Ehsan: Will the Numba function run on the accelerator?
- A: Dipto: Will be answered in the next few slides
- Q: Ehsan: Will passes be modular?
- A: Dipto: At present, separate pass.
- Q: Ehsan: possibility of adding an option like `dppl=True/gpu` to the `@jit` context?
- A: Dipto: A DIY wrapper should be easy to create; it will contain the context.
- Q: Ehsan: concerns over (lack of) fine grained control and integrating it in an existing compiler pipeline.
- A: Dipto: Bit like `parallel=True`; reliance on the optimiser to do the right thing.
- Q: Mike W: How to arrange computations on the GPU, especially with respect to e.g. branch divergence?
- A: Dipto: At present not so much fine grained control. Bulk of efforts once work finalised will go into this. Reliance on driver/compiler to do this (igc).
- Q: Hameer: A user can put a `with` statement at the top of a program, but a library author could put `parallel=False` in; which should win? Hameer thinks the library should win over user control.
- A: Dipto: Useful example for consideration, please add to the discourse discussion.
- Q: Hameer: The `with` statement makes it quite hard to mix CPU and GPU code, have you thought about that?
- A: Dipto: Kernels are synchronous at present; would have to undo that to permit mixing contexts.
- Q: Hameer: Will this be allowed inside a `@njit`'ed function?
- A: Dipto: Initially not permitted. Harder example... with context inside a `prange` loop!? To start with just ban it. May change once more understanding is developed about scenarios.
- A: Todd A: Multi-GPU case is another reason why the with-context is a good idea, e.g. CPU prange dispatch to a number of GPUs.
- A: Dipto: Needs more design time on Intel's side. Please post usecases on discourse!
- A: Hameer: Main case was: once inside a function it is hard to switch contexts.
- A: Todd: There are cases where both make sense/don't make sense; more work needed.
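The multi-GPU dispatch idea mentioned above can be sketched on the CPU side as a round-robin over device contexts. This is only an illustration of the shape of the idea, not PyDPPL or Numba API: `devices`, `run_kernel`, and the round-robin policy are all hypothetical.

```python
# Illustrative sketch of CPU-side dispatch of work chunks over several
# devices (hypothetical names, not PyDPPL/Numba API). In the real design
# each iteration would enter a device `with` context and launch a kernel;
# here run_kernel stands in for that launch.
def dispatch_chunks(data, devices, run_kernel):
    """Assign chunk i to device i % len(devices) and collect results."""
    results = []
    for i, chunk in enumerate(data):
        device = devices[i % len(devices)]  # round-robin assignment
        results.append(run_kernel(device, chunk))
    return results
```

A `prange`-style loop body per chunk, with the device context chosen by loop index, is the scenario Todd's comment points at.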
- Q: Siu: If we want to try Numba with DPPL what sort of hardware/OS combination is the best option.
- A: Dipto: Works only on Linux right now, Windows is on its way (a week or so!). Gen 9 Intel GPUs (integrated graphics) should be fine on the latest CPUs. Also a dependency on oneAPI beta 8 being installed; Intel Python has this as part of its stack. An OpenCL CPU driver is also required.
- Q: Hameer: Will the code work on other GPUs or is it just Intel hardware?
- A: Dipto: Should be platform agnostic but starting with Intel hardware. CUDA support in DPC++ as one option or add CUDA support to PyDPPL.
- Q: Hameer: Could someone else write a SYCL compiler and use that in the same infrastructure?
- A: Dipto: TBD. USM is a DPC++ extension for example. Extension support level will somewhat determine this (largely USM at present). LLVM SYCL compiler is a long term ideal.
- 0.51.0rc1
- New bug labels:
  - ice, miscompile, incorrect behavior, segfault, failure to compile
- #6104 - Numba can't properly match ListType of Arrays in function signature
- #6102 - parallel function compiles with v0.50.1 and fails with v0.51.0rc1
- #6100 - Incorrect results on skylake with AVX512 and icc_rt=2018.0.2
- #6095 - numpy max for arrays of several dimensions not implemented for parallelized code
- #6094 - Numba 0.51: avoid subclassing NamedTuple in LiteralStrKeyDict
- #6093 - Invalid cache replay from function defined in closure capturing another function
  - patch: #6097
- #6091 - Applying numba.typed.List to a nested Python list doesn't result in a nested typed-list
  - need to stop segfault?
  - the full fix again leads to the reflected x typed list problem
- #6088 - NameError from _unlit_non_poison
  - should unban flake8 on it to prevent future mistakes
- #6085 - LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
- #6077 - StructRef initialization SIGABRT while using types.deferred_type
  - probably a problem of ordering of LLVM API calls
- #6103 - Unable to pass a local memory array to a device function?
- #6105 - What's the input of Numba
- #6098 - A problem has occurred in Numba's internals
- #6082 - list of list (nested build_list) fails on master
- #6079 - numba cuda how to lock code block
- #6076 - Shared memory persists between kernel launches
- #6069 - Test failure with Numba master and Numpy 1.19
- #6101 - Restrict lower limit of icc_rt version due to assumed SVML bug.
- #6099 - Restrict upper limit of TBB version due to ABI changes.
- #6097 - Add function code and closure bytes into cache key
- #6096 - [WIP] remove deprecated tbb::task_scheduler_init, use new api
- #6092 - CUDA: Add mapped_array_like and pinned_array_like
- #6090 - doc: Add doc on direct creation of Numba typed-list
- #6089 - Fix invalid reference to TypingError
- #6087 - remove invalid sanity check from randrange tests
- #6086 - Add more accessible version information
- #6075 - add np.float_power and np.cbrt
- #6074 - Add support for math.isclose() and numpy.isclose()
- #6084 - Update CHANGE_LOG for 0.51.0
- #6083 - Fix bug in initial value unify.
- #6081 - Fix issue with cross drive use and relpath.
- #6080 - CUDA: Prevent auto-upgrade of atomic intrinsics
- #6078 - Duplicate NumPy's PyArray_DescrCheck macro
- #6073 - Fixes invalid C prototype in helper function.
- #6072 - Fix for #6005
- #6071 - Remove f-strings in setup.py
- #6070 - Fix overspecialized containers
- #6068 - Add unliteral to despecialize containers with initial_value
- Requests for 0.51
- High-risk stuff for 0.51
- 0.51 potential tasks (to be updated)