Lightweight CUDA implementation for maximum LLM inference performance on edge GPUs (RTX series and Jetson).

Library

GitHub Repository

efficiency, on-device, infrastructure