Triton
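library
Open-source GPU programming language and compiler for writing high-performance neural network kernels. Kernels are written in Python-like syntax that operates on blocks (tiles) of data. The compiler handles low-level details that CUDA programmers manage by hand, such as memory coalescing and shared-memory allocation. The result typically needs far less code than equivalent CUDA while reaching comparable performance on many workloads. 19K+ GitHub stars.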
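Triton lowered the bar for writing custom GPU kernels: CUDA expertise is no longer required, and familiarity with Python is largely enough. It is used throughout the ML infrastructure stack. For example, PyTorch's TorchInductor compiler (behind `torch.compile`) generates Triton kernels for GPUs, and FlashAttention has Triton implementations. Key enabler of the custom kernel ecosystem for efficient LLM inference and training. MIT License.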