A 123B-parameter dense model with a 128K-token context window. Performs on par with GPT-4o, Claude 3 Opus, and Llama 3 405B, scoring 84.0% on MMLU. Supports 80+ programming languages and dozens of natural languages.

Model Details

Architecture: Dense
Parameters: 123B
Context window: 128,000 tokens
Tags: open-weight, frontier
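To give a sense of scale for the 123B parameter count, here is a rough back-of-the-envelope sketch of the memory needed just to hold the weights at common precisions. This is illustrative arithmetic only (names and precisions are my assumptions, not from the card); real deployments also need memory for the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory (GB) to store the model weights alone."""
    return num_params * bytes_per_param / 1e9

PARAMS = 123e9  # 123B parameters, per the model card

# Common serving precisions (hypothetical choices for illustration)
for name, bpp in [("fp16/bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, bpp):.0f} GB")
```

At 16-bit precision the weights alone come to roughly 246 GB, which is why a dense model of this size is typically served across multiple accelerators or in quantized form.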
