Highly optimized kernels for Multi-head Latent Attention.

Library

GitHub Repository

infrastructure, attention