InternVL 2.5
First open-source multimodal LLM to surpass 70% on MMMU. The report investigates the scaling of vision encoders, language models, datasets, and test-time Chain-of-Thought reasoning. It also retroactively documents InternVL 2.0.
Model Details
Architecture: Dense
Variants
| Name | Parameters | Notes |
|---|---|---|
| InternVL2_5-1B | 1B | — |
| InternVL2_5-8B | 8B | — |
| InternVL2_5-38B | 38B | — |
| InternVL2_5-78B | 78B | — |
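Because the architecture is dense, every parameter is active at inference, so a rough weight-only memory estimate follows directly from the parameter counts in the table. The sketch below is illustrative only. It assumes 16-bit weights (2 bytes per parameter) and uses the nominal sizes from the table rather than exact counts. The helper names `weight_gb` and `largest_fitting` are hypothetical, and the estimate ignores activations, the KV cache, and framework overhead.

```python
from typing import Optional

# Nominal parameter counts (in billions), copied from the Variants table.
VARIANTS = {
    "InternVL2_5-1B": 1,
    "InternVL2_5-8B": 8,
    "InternVL2_5-38B": 38,
    "InternVL2_5-78B": 78,
}

# Assumption: weights are stored in a 16-bit format such as bf16 or fp16.
BYTES_PER_PARAM_16BIT = 2


def weight_gb(name: str, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Approximate weight-only footprint in GB (1 GB = 10^9 bytes)."""
    # Billions of parameters times bytes per parameter gives gigabytes.
    return VARIANTS[name] * bytes_per_param


def largest_fitting(budget_gb: float) -> Optional[str]:
    """Return the largest variant whose weights fit in budget_gb, or None."""
    fitting = [n for n in VARIANTS if weight_gb(n) <= budget_gb]
    return max(fitting, key=VARIANTS.get) if fitting else None


if __name__ == "__main__":
    for name in VARIANTS:
        print(f"{name}: ~{weight_gb(name)} GB of weights at 16-bit precision")
    print("Largest variant for an 80 GB budget:", largest_fitting(80))
```

Treat the result as a lower bound: a real deployment needs headroom beyond the weights, and quantized checkpoints would change the bytes-per-parameter assumption.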
Paper
arXiv: 2412.05271