A "compact giant" that outperforms models 20x its size (including GPT-4o on certain benchmarks) through fully unfrozen perception-decoder training.

Outputs 2

Step3-VL-10B

model

Architecture: Dense
Parameters: 10B

Released Jan 20, 2026 on Hugging Face.

Step3-VL-10B Technical Report

paper

Focused on "Intrinsic Vision-Language Synergy."

arXiv: 2601.09668

multimodal efficiency open-weight