Aquila-VL-2B is a 2B-parameter vision-language model built on the LLaVA-OneVision framework, with Qwen2.5-1.5B-Instruct as the language model and SigLIP-SO400M as the vision tower. It was trained on the Infinity-MM dataset (~40M image-text pairs). It is the first model to earn the LF AI & Data Model Openness Framework (MOF) Class I "Open Science" rating, and it achieves state-of-the-art performance among models of comparable scale on the MMBench, RealWorldQA, and ScienceQA benchmarks.
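Since the model is built on the LLaVA-OneVision framework, it can in principle be run with the Hugging Face transformers LLaVA-OneVision classes. The sketch below is a minimal, hedged example: the repository id `BAAI/Aquila-VL-2B-llava-qwen` and the exact chat-template shape are assumptions based on the variant names above, not confirmed by this card.

```python
# Hedged sketch of running the model with Hugging Face transformers.
# Assumptions (not confirmed by the model card): the HF repo id below,
# and that the standard LLaVA-OneVision chat template applies.
from typing import Any


def build_messages(question: str) -> list[dict[str, Any]]:
    """Build a single-turn multimodal chat message in the structure
    used by LLaVA-OneVision-style chat templates: one user turn
    containing an image slot followed by the text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]


def run_inference(image_path: str, question: str) -> str:
    """Load the model and answer a question about an image.
    Not called at import time; downloads several GB of weights."""
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

    model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # assumed repo id
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaOnevisionForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    image = Image.open(image_path)
    prompt = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

For a quick check without downloading weights, `build_messages("What is shown in this image?")` returns the message list that the processor's chat template expects.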

Model Details

Variants

Name                        Parameters  Notes
Aquila-VL-2B-llava-qwen     2B
Aquila-VL-2B-Intermediate   2B          Intermediate training checkpoints
Tags: multimodal, open-weight

Related