Aya Vision is a multilingual multimodal model family (8B and 32B) that combines a Command R7B language-model base with a SigLIP2 vision encoder through a multimodal adapter. It supports 23 languages. The 8B model beats Qwen2.5-VL-7B, Pixtral 12B, and Llama-3.2-90B-Vision; the 32B model outperforms Molmo-72B. Released under CC-BY-NC-4.0.
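The multimodal adapter mentioned above is the standard way to bridge a vision encoder and a language model: patch embeddings from the vision encoder (here SigLIP2) are projected into the language model's embedding space so image tokens can be interleaved with text tokens. A minimal sketch of such a connector follows; the two-layer MLP design and all dimensions are illustrative assumptions, not Aya Vision's actual configuration.

```python
import numpy as np

def init_adapter(d_vision, d_lm, d_hidden, seed=0):
    # Two-layer MLP connector (LLaVA-style); weights here are random
    # placeholders standing in for trained parameters.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((d_vision, d_hidden)) * 0.02,
        "b1": np.zeros(d_hidden),
        "W2": rng.standard_normal((d_hidden, d_lm)) * 0.02,
        "b2": np.zeros(d_lm),
    }

def project_patches(patches, adapter):
    # patches: (num_patches, d_vision) output of the vision encoder.
    h = np.maximum(patches @ adapter["W1"] + adapter["b1"], 0.0)  # ReLU
    # Result: (num_patches, d_lm) "image tokens" for the language model.
    return h @ adapter["W2"] + adapter["b2"]

# Example: 729 patches of dim 1152 mapped into a 4096-dim LM embedding
# space (sizes chosen for illustration only).
adapter = init_adapter(d_vision=1152, d_lm=4096, d_hidden=4096)
image_tokens = project_patches(np.zeros((729, 1152)), adapter)
```

After projection, the image tokens are concatenated with the text token embeddings and the language model attends over the combined sequence.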

Model Details

Architecture: Dense
Parameters: 8B and 32B (two variants)

Variants

Name            Parameters
Aya Vision 8B   8B
Aya Vision 32B  32B

Paper

arXiv: 2505.08751

multimodal · multilingual · open-weight · vision
