Pull requests: NVIDIA/cutlass
feat: support kFactor 8 used in mma tensor op tile iterator
#1512
opened Apr 29, 2024 by
gavinchen430
Fix C++17 version detection in helper_macros.hpp
#1479
opened Apr 12, 2024 by
nickjeliopoulos
Add Faster Neighborhood Attention to PUBLICATIONS
#1471
opened Apr 11, 2024 by
alihassanijr
Add missing #include <memory> for definition of std::addressof.
#1470
opened Apr 10, 2024 by
Gregory-Meyer
Refactor to use FastDivmod for predicated strided dgrad iterators.
#1453
opened Apr 3, 2024 by
ZelboK
add a new epilogue for the case that the output is not packed
inactive-30d
#1437
opened Mar 28, 2024 by
hwu36
Add support for mixed 4-bit/8-bit data types GEMM
#1413
opened Mar 19, 2024 by
alexsamardzic
Add couple configs into generator.py for mixed input MM
inactive-30d
#1350
opened Feb 16, 2024 by
alexsamardzic
Add support for dynamic offsets to DefaultEpilogue
inactive-30d
#1274
opened Dec 19, 2023 by
ezhulenev
Add int4b_t/uint4b_t support for mixed dtypes GEMM
#1190
opened Nov 15, 2023 by
alexsamardzic
Make runtime assert more clear on CUDA
inactive-30d
inactive-90d
#1128
opened Oct 6, 2023 by
sophiawisdom