A General-purpose Parallel and Heterogeneous Task Programming System
Updated May 7, 2024 (C++)
Sample code for my CUDA programming book
CUDA C++ Core Libraries
Thin, unified, C++-flavored wrappers for the CUDA APIs
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
TinyChatEngine: On-Device LLM Inference Library
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Safe Rust wrapper around the CUDA toolkit
A simple GPU hash table implemented in CUDA using lock-free techniques
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
An implementation of HIP that works on CPUs, across OSes.
CUDA kernel author's tools
A self-learning tutorial for CUDA high-performance programming.
Install CUDA on Windows 11 using WSL2
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
YOLOv9 TensorRT deployment acceleration, with two implementations: C++ and Python 🔥🔥🔥
Simple GPU RANSAC fitting of multiple lines on 2D/3D point clouds