Kernel Tuner
Updated May 27, 2024 - Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
CUDA C++ Core Libraries
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
(REOS) Radar and Electro-Optical Simulation Framework written in Fortran.
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
From zero to hero: CUDA for accelerating maths and machine learning on GPUs.
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
Safe Rust wrapper around the CUDA toolkit
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
CUDA Kernel Benchmarking Library
Implement Neural Networks in CUDA from Scratch
Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
C++ cross-platform GPU SDK
This repo contains a personal project created within one week. It can generate fractal pictures based on a Julia set and explore such a fractal in real time (zoom in and out; move left, right, up, and down).
Our solutions to the parallel computing labs taught in college 🧮🖥️
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops