Pioneering series of open multimodal reasoning models with visual chain-of-thought (CoT). R1V (38B, Apr 2025) introduced multimodal CoT via iterative SFT + GRPO. R1V2 added hybrid RL (MPO + GRPO). R1V3 reached 76.0% on MMMU, rivaling closed-source VLMs. R1V4 (30B total, 3B active) introduced agentic multimodal intelligence, interleaving visual reasoning with web search, and surpassed Gemini 2.5 Flash on 11 of 11 metrics.

Model Details

Architecture: Dense
Parameters: 38B

Paper

arXiv: 2504.05599

reasoning, multimodal, vision, open-weight

Related