pramodith/kernel-engineering

kernel-engineering

A repo for learning kernel engineering and GPU programming.

Setup

make setup

Notebooks

| Notebook | Description |
| --- | --- |
| Control Divergence | Explores warp divergence in GPU kernels: what happens when threads within a warp take different branches, how divergence serializes execution, and what it costs in benchmarks. |
| TF32 Precision & Performance | Demonstrates TensorFloat-32 (TF32) on Ampere and newer GPUs. Compares matmul precision across TF32, FP32, FP16, and FP64, shows that TF32 has FP16's precision (10 mantissa bits) but FP32's range (8 exponent bits), and benchmarks the throughput speedup. |
| GPU Memory Hierarchy & Data Movement | Walks through the GPU memory hierarchy (registers/RMEM, shared memory/SMEM, global memory/GMEM) using CuTe DSL. Covers scalar and vectorized GMEM/RMEM copies, GMEM-to-SMEM copies via cp.async, commit groups, copy atoms, and PTX analysis of each path. |
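
To give a feel for what the Control Divergence notebook measures, here is a minimal pure-Python sketch of a toy cost model. It is not code from the notebook: `issue_passes` and its arguments are hypothetical names, and the model assumes that a warp whose 32 threads disagree on a branch runs both paths back to back, ignoring compiler predication and other hardware details.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def issue_passes(num_threads, takes_branch):
    """Count serialized branch passes, summed over all warps (toy model).

    A warp whose threads all agree needs one pass. A warp whose threads
    disagree runs the 'if' path and the 'else' path one after the other.
    """
    passes = 0
    for warp_start in range(0, num_threads, WARP_SIZE):
        warp = range(warp_start, min(warp_start + WARP_SIZE, num_threads))
        outcomes = {bool(takes_branch(tid)) for tid in warp}
        passes += len(outcomes)  # 1 if uniform, 2 if divergent
    return passes


# Even and odd threads disagree inside every warp.
divergent = issue_passes(64, lambda tid: tid % 2 == 0)
# Whole warps agree: warp 0 takes the branch, warp 1 does not.
uniform = issue_passes(64, lambda tid: (tid // WARP_SIZE) % 2 == 0)
print(divergent, uniform)  # 4 2
```

Both predicates send half of the threads down each path, but only the first one splits threads within a warp, which is why it needs twice as many passes in this model.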
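
The "FP16's precision but FP32's range" claim about TF32 can be checked on the CPU with a small software emulation. This is a sketch under stated assumptions, not the notebook's code: `to_tf32` and `to_fp16` are hypothetical helpers, and the rounding used here (nearest, ties away from zero) is an assumption that may differ from what a given GPU instruction does.

```python
import struct


def to_tf32(x):
    """Emulate TF32 by keeping 10 of float32's 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Assumption: round to nearest, ties away from zero. NaN/inf not handled.
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp16(x):
    """Round-trip a value through IEEE half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Precision: both formats resolve 2**-10 next to 1.0 but lose 2**-12.
print(to_tf32(1 + 2**-10) == 1 + 2**-10, to_fp16(1 + 2**-10) == 1 + 2**-10)
print(to_tf32(1 + 2**-12) == 1.0, to_fp16(1 + 2**-12) == 1.0)

# Range: 1e38 survives TF32 (8 exponent bits) but overflows FP16 (5 bits).
print(to_tf32(1e38))
try:
    to_fp16(1e38)
except OverflowError:
    print("1e38 does not fit in FP16")
```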
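
For the scalar versus vectorized copies mentioned in the memory hierarchy notebook, the arithmetic behind the instruction count can be sketched as follows. `load_instructions` is a hypothetical helper rather than a CuTe DSL call, and the sketch assumes the data is aligned to the vector width and that a single thread issues every load.

```python
def load_instructions(num_elements, element_bits=32, vector_bits=32):
    """Number of load instructions needed to copy num_elements values.

    vector_bits is the width of one load: 32 for a scalar float32 load,
    up to 128 for the widest per-thread vector load.
    """
    if vector_bits % element_bits != 0:
        raise ValueError("vector width must be a multiple of element width")
    elems_per_load = vector_bits // element_bits
    if num_elements % elems_per_load != 0:
        raise ValueError("element count must fill whole vectors")
    return num_elements // elems_per_load


scalar = load_instructions(1024)                       # one float32 per load
vectorized = load_instructions(1024, vector_bits=128)  # four float32 per load
print(scalar, vectorized)  # 1024 256
```

Fewer, wider loads move the same bytes with a quarter of the instructions here, which is the effect the PTX analysis in the notebook makes visible.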
