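Smaller Trinity variants sharing the same mixture-of-experts (MoE) architecture. Trinity Mini: 26B total / 3B active parameters (128 experts, top-8 routing, 131K context). Trinity Nano: 6B total / 1B active parameters (128 experts, 128K context). Both were trained on 10T tokens using 512 H200 GPUs and are released under the Apache 2.0 license.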
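The card gives only the routing shape for Mini: 128 experts with top-8 selection. The following is a minimal sketch of generic top-k routing for a single token under that shape. It is not Trinity's documented router. Normalizing with a softmax over only the selected experts is one common convention and is assumed here. The names `route_token` and `moe_forward`, and the toy scalar "experts", are hypothetical.

```python
import math
import random

NUM_EXPERTS = 128  # from the card: 128 experts
TOP_K = 8          # from the card (Mini): top-8 routing


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their scores.

    Assumption: softmax over the selected logits only (not Trinity-specific).
    """
    if len(router_logits) < k:
        raise ValueError("need at least k router logits")
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the chosen experts.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def moe_forward(x, router_logits, experts, k=TOP_K):
    """Weighted sum of expert outputs; only the k routed experts are called."""
    idx, w = route_token(router_logits, k)
    return sum(wi * experts[i](x) for i, wi in zip(idx, w))


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    # Hypothetical toy experts: each scales a scalar input by a different factor.
    experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_EXPERTS)]
    idx, w = route_token(logits)
    print("selected experts:", idx)
    print("output:", moe_forward(1.0, logits, experts))
```

Only 8 of the 128 expert functions run per token. This is the mechanism that lets a 26B-parameter model spend compute closer to that of a 3B dense model.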

Model Details (Trinity Mini)

Architecture: MoE (mixture of experts)
Parameters: 26B total
Active params: 3B per token
Context window: 131,000 tokens
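The gap between total and active parameters is easiest to see with back-of-envelope arithmetic. The sketch below uses the card's rounded Mini figures. Two rules are assumptions, not figures from the card: roughly 2 FLOPs per active parameter per token for a forward pass, and weights stored in bf16 (2 bytes each). The function names are hypothetical.

```python
# Trinity Mini figures from the Model Details table, rounded as on the card.
TOTAL_PARAMS = 26e9
ACTIVE_PARAMS = 3e9
BYTES_PER_PARAM = 2  # assumption: weights stored in bf16 (2 bytes each)


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that a single token actually passes through."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule of thumb: about 2 FLOPs (one multiply-add) per active parameter.

    Ignores attention-score and router costs, which grow with context length.
    """
    return 2 * active


def weight_memory_gb(total: float, bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """All experts must stay resident, so memory scales with total parameters."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction:     {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"forward FLOPs/token: {forward_flops_per_token(ACTIVE_PARAMS):.1e}")
    print(f"weight memory:       {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

With these inputs, about 11.5% of the weights are active per token, a forward pass costs roughly 6e9 FLOPs per token, and the bf16 weights take about 52 GB. Sparse routing reduces per-token compute, but it does not reduce the memory needed to hold the model.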

Variants

Name          Parameters (total / active)   Notes
Trinity Mini  26B / 3B                      128 experts, top-8 routing, 131K context
Trinity Nano  6B / 1B                       128 experts, 128K context
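Both variants saw the same 10T training tokens, so their training compute can be compared roughly by active size. The sketch below applies the common dense-model approximation of about 6 FLOPs per active parameter per training token (forward plus backward). Treating active parameters as the relevant N for an MoE is an assumption. The estimate ignores attention and routing overhead and is not a figure reported for Trinity. The `Variant` type and `train_flops` helper are hypothetical.

```python
from dataclasses import dataclass

TRAIN_TOKENS = 10e12  # from the card: both variants trained on 10T tokens


@dataclass(frozen=True)
class Variant:
    name: str
    total_params: float
    active_params: float


# Rounded figures from the Variants table.
VARIANTS = [
    Variant("Trinity Mini", 26e9, 3e9),
    Variant("Trinity Nano", 6e9, 1e9),
]


def train_flops(v: Variant, tokens: float = TRAIN_TOKENS) -> float:
    """Approximate training compute: ~6 FLOPs per active parameter per token."""
    return 6 * v.active_params * tokens


if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.name}: ~{train_flops(v):.1e} training FLOPs")
```

Under this approximation, Mini comes out near 1.8e23 FLOPs and Nano near 6e22, a 3:1 ratio that follows directly from their active sizes.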

Paper

arXiv: 2602.17004

Tags: moe, open-weight, efficiency
