A 7.3B-parameter code model built on the Mamba2 state-space architecture rather than a transformer. Linear-time inference with theoretically unbounded context (tested to 256K tokens). Co-developed with Mamba's original authors. Apache 2.0 licensed.
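The linear-time claim follows from how state-space models process sequences: each token triggers a constant-size state update, so total cost grows linearly with sequence length. A minimal sketch with a toy scalar recurrence (illustrative only; Mamba2 uses learned, input-dependent parameters, and all names here are hypothetical):

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Toy linear state-space recurrence:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    One O(1) update per token, so a length-T sequence costs O(T),
    and the state h stays constant-size regardless of context length.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # constant-size state update
        outputs.append(c * h)  # readout
    return outputs

# Each new token costs the same amount of work no matter how long the
# preceding context is, unlike attention's per-token cost that grows
# with context length.
print(ssm_scan([1.0, 0.0, 0.0]))
```

This is why such models can be run on contexts far beyond their training length: memory for the recurrent state does not grow with the number of tokens seen.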

Model Details

Architecture DENSE
Parameters 7.3B
coding · open-weight · architecture
