Supplementary Material for Lectures

The PMPP Book: Programming Massively Parallel Processors: A Hands-on Approach (Amazon link)

Lecture 1: Profiling and Integrating CUDA kernels in PyTorch

Video
Date: 2024-01-13, Speaker: Mark Saroufim
Notebook and slides in the lecture_001 folder

Lecture 2: Recap Ch. 1-3 from the PMPP book

Video
Date: 2024-01-20, Speaker: Andreas Koepf
Slides: The PowerPoint file lecture_002/cuda_mode_lecture2.pptx is included in this repository. Alternatively, available here as a Google Docs presentation.

Lecture 3: Getting Started With CUDA

Video
Date: 2024-01-27, Speaker: Jeremy Howard
Notebook: See the lecture_003 folder, or run the Colab version

Lecture 4: Intro to Compute and Memory Architecture

Video
Date: 2024-02-03, Speaker: Thomas Viehmann
Notebook and slides in the lecture_004 folder.

Lecture 5: Going Further with CUDA for Python Programmers

Video
Date: 2024-02-10, Speaker: Jeremy Howard
Notebook in the lecture_005 folder.

Lecture 6: Optimizing PyTorch Optimizers

Video
Date: 2024-02-17, Speaker: Jane Xu
Slides

Lecture 7: Advanced Quantization

Video
Date: 2024-02-25, Speaker: Charles Hernandez
Slides

Lecture 8: CUDA Performance Checklist

Video
Date: 2024-03-09, Speaker: Mark Saroufim
Code in the lecture_008 folder
Slides

Lecture 9: Reductions

Video
Date: 2024-03-09, Speaker: Mark Saroufim
Code in the lecture_009 folder
Slides

Lecture 10: Build a Prod Ready CUDA Library

Video
Date: 2024-03-16, Speaker: Oscar Amoros Huguet
Slides

Lecture 11: Sparsity

Video
Date: 2024-03-23, Speaker: Jesse Cai
Slides

Lecture 12: Flash Attention

Video
Date: 2024-03-30, Speaker: Thomas Viehmann

Lecture 13: Ring Attention

Video
Date: 2024-04-06, Speaker: Andreas Koepf
Slides

Lecture 14: Practitioner's Guide to Triton

Video
Date: 2024-04-13, Speaker: Umer Adil
Notebook

Lecture 17: GPU Collective Communication (NCCL)

Date: 2024-05-04, Speaker: Dan Johnson
Code in the lecture_017 folder

Lecture 18: Fused Kernels

Date: 2024-05-11, Speaker: Kapil Sharma
Code in the lecture_018 folder
