Cohere's current flagship: a 111B-parameter dense Transformer with a novel hybrid attention pattern that interleaves 3 sliding-window attention layers (4,096-token window, with RoPE) with 1 global attention layer that uses no positional embeddings (NoPE). 256K context, 23 languages. Relies on merging domain-expert models rather than a mixture-of-experts architecture to achieve expert-level breadth. The layer interleaving is illustrated in the sketch below.
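
A minimal illustrative sketch (not Cohere's implementation) of the 3:1 interleaving of sliding-window and global attention layers described above. The window size, 3:1 ratio, and RoPE/NoPE split follow the description; the function names and mask construction are assumptions for illustration.

```python
import torch

WINDOW = 4096          # sliding-window size per the description above
PATTERN_PERIOD = 4     # 3 sliding-window layers followed by 1 global layer


def layer_kind(layer_idx: int) -> str:
    """Attention type for a given layer index (illustrative naming)."""
    # Layers 0, 1, 2 -> sliding window with RoPE; layer 3 -> global with NoPE; repeat.
    return "global_nope" if (layer_idx + 1) % PATTERN_PERIOD == 0 else "sliding_rope"


def attention_mask(seq_len: int, kind: str) -> torch.Tensor:
    """Boolean mask where True marks key positions each query may attend to."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions
    k = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = k <= q
    if kind == "sliding_rope":
        return causal & (q - k < WINDOW)     # causal, restricted to the local window
    return causal                            # global layer: full causal attention


if __name__ == "__main__":
    print([layer_kind(i) for i in range(8)])
    # ['sliding_rope', 'sliding_rope', 'sliding_rope', 'global_nope',
    #  'sliding_rope', 'sliding_rope', 'sliding_rope', 'global_nope']
```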

150% higher throughput than Command R+ 08-2024; runs on just 2 GPUs (A100 or H100). Trained with a decentralized approach using self-refinement algorithms. AA Intelligence Index: 13. Weights released under CC-BY-NC. Command A Vision (Jul 2025) adds a SigLIP2 vision encoder.
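
A minimal sketch of loading the open weights with Hugging Face transformers and sharding them across the available GPUs via device_map="auto". The repository id and precision choice are assumptions, not taken from the source; check the actual model listing before use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereLabs/c4ai-command-a-03-2025"  # assumed repo id; verify on the hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed precision; quantization may be needed per GPU budget
    device_map="auto",            # shard layers across however many GPUs are visible
)

messages = [{"role": "user", "content": "Summarize this contract clause in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```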

Model Details

Architecture: Dense
Parameters: 111B
Context window: 256,000 tokens

Paper

arXiv: 2504.00698

open-weight · enterprise · multilingual · frontier
