Aquila-VL-2B is a 2B-parameter vision-language model built on the LLaVA-OneVision framework, with Qwen2.5-1.5B-Instruct as the language model and SigLIP-SO400M as the vision tower. It was trained on the Infinity-MM dataset (~40M image-text pairs). It is the first model to earn the LF AI & Data Model Openness Framework (MOF) Class I "Open Science" rating, and it achieves state-of-the-art performance among models of comparable scale on the MMBench, RealWorldQA, and ScienceQA benchmarks.
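Since the model is built on the LLaVA-OneVision framework, it can in principle be run with the Hugging Face transformers LLaVA-OneVision classes. The sketch below is a minimal, hedged example: the repository id `BAAI/Aquila-VL-2B-llava-qwen` and the exact chat-template shape are assumptions based on the variant names above, not confirmed by this card.

```python
# Hedged sketch of running the model with Hugging Face transformers.
# Assumptions (not confirmed by the model card): the HF repo id below,
# and that the standard LLaVA-OneVision chat template applies.
from typing import Any


def build_messages(question: str) -> list[dict[str, Any]]:
    """Build a single-turn multimodal chat message in the structure
    used by LLaVA-OneVision-style chat templates: one user turn
    containing an image slot followed by the text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]


def run_inference(image_path: str, question: str) -> str:
    """Load the model and answer a question about an image.
    Not called at import time; downloads several GB of weights."""
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

    model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # assumed repo id
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaOnevisionForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    image = Image.open(image_path)
    prompt = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

For a quick check without downloading weights, `build_messages("What is shown in this image?")` returns the message list that the processor's chat template expects.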

Model Details

Variants

Name                        Parameters  Notes
Aquila-VL-2B-llava-qwen     2B
Aquila-VL-2B-Intermediate   2B          Intermediate training checkpoints
Tags: multimodal, open-weight

Related