A multimodal understanding model built on a Mixture-of-Experts (MoE) architecture, with 389B total parameters of which 52B are active. It handles images, videos, and 3D content, and ranked first among Chinese image AI models on the LMArena Vision Leaderboard.

Model Details

Architecture MoE
Parameters 389B (total)
Active params 52B
Tags multimodal, moe
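
To put the two parameter counts in perspective, the short sketch below computes what share of the model is active. Only the 389B and 52B figures come from the details above; the function name `active_fraction` is a hypothetical helper introduced for illustration, and the result is plain arithmetic, not a published benchmark.

```python
# Figures taken from the model details above, in billions of parameters.
TOTAL_PARAMS_B = 389
ACTIVE_PARAMS_B = 52


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters that are active, as a fraction in [0, 1].

    Hypothetical helper for illustration; not part of any model API.
    """
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    if not 0 <= active_b <= total_b:
        raise ValueError("active count must lie between 0 and the total")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{fraction:.1%} of parameters are active")  # prints "13.4% of parameters are active"
```

Roughly one parameter in seven is active, which is the usual appeal of an MoE design: total capacity grows with the full parameter count while the parameters used at a time stay near the active count.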