BGE-VL
model, dataset
State-of-the-art multimodal embedding models for visual search, covering text-to-image, image-to-text, image+prompt-to-image, and text-to-image+text retrieval. Achieves state-of-the-art results on 36 multimodal embedding evaluation tasks (MMEB). Trained on the MegaPairs synthetic dataset (26M+ samples).
Outputs 2
BGE-VL (Multimodal Embedding)
model
State-of-the-art multimodal embedding models for visual search. BGE-VL-MLLM improves on the prior state of the art on the CIRCO benchmark by 8.1 percentage points. Released under the MIT license.
Variants
| Name | Parameters | Notes |
|---|---|---|
| BGE-VL-base | — | — |
| BGE-VL-large | — | — |
| BGE-VL-MLLM-S1 | — | Trained on MegaPairs only |
| BGE-VL-MLLM-S2 | — | Full fine-tuned variant |
| BGE-VL-v1.5-zs | — | Zero-shot variant |
| BGE-VL-v1.5-mmeb | — | MMEB fine-tuned variant |
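
The retrieval modes listed for these models all reduce to the same final step: embed the query (text, image, or image plus prompt) and the candidates into one vector space, then rank candidates by similarity. The sketch below illustrates only that ranking step, assuming cosine similarity as the score. The vectors are hand-written toy values rather than BGE-VL outputs, and `cosine` and `rank_candidates` are hypothetical helper names, not part of any BGE-VL API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (candidate_id, score) pairs, highest similarity first."""
    scored = [(cid, cosine(query, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy placeholder embeddings (assumption: a real system would obtain these
# from an embedding model; here they are written by hand for illustration).
query_embedding = [0.9, 0.1, 0.0]
candidate_embeddings = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
}

ranking = rank_candidates(query_embedding, candidate_embeddings)
ranked_ids = [cid for cid, _ in ranking]
```

With these toy values, `ranked_ids` places `img_a` first because its vector points in nearly the same direction as the query. In practice the sorted scan would usually be replaced by an approximate nearest-neighbor index once the candidate pool grows large.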
MegaPairs
dataset
Large-scale synthetic dataset of 26M+ multimodal retrieval triplets. Accepted at ACL 2025 as an oral presentation. Released under the MIT license.
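
As a rough illustration of what a retrieval triplet could look like in code, the sketch below models one sample as a query image, a text instruction, and a target image. This layout is an assumption made for the example: the listing above does not specify the dataset schema, and the field names, file paths, and the `to_training_pairs` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RetrievalTriplet:
    """One hypothetical sample: (query image, instruction, target image)."""

    query_image: str   # path or id of the query image (assumed field)
    instruction: str   # text describing how the target relates to the query
    target_image: str  # path or id of the image that should be retrieved


def to_training_pairs(
    triplets: List[RetrievalTriplet],
) -> List[Tuple[Tuple[str, str], str]]:
    """Group each triplet as ((query_image, instruction), target_image)."""
    return [((t.query_image, t.instruction), t.target_image) for t in triplets]


# Made-up example records; the paths do not refer to real dataset files.
samples = [
    RetrievalTriplet("images/0001.jpg", "the same object seen from above", "images/0002.jpg"),
    RetrievalTriplet("images/0003.jpg", "a similar scene at night", "images/0004.jpg"),
]

pairs = to_training_pairs(samples)
```

Grouping the image and instruction together mirrors the image+prompt-to-image retrieval mode: the pair forms the composed query, and the target image is the positive example a model would be trained to rank first.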