Introduces the V-Triune system and the Orsta models (7B and 32B), which unify visual reasoning and perception tasks via reinforcement learning. Reports up to a +14.1 improvement on MEGA-Bench Core.

Paper

arXiv: 2505.18129

vision, reasoning, training, research