A benchmark for evaluating multimodal large language models across multiple dimensions. SEED-Bench-2 expands the original SEED-Bench to 24K multiple-choice questions covering 27 evaluation dimensions. Published at CVPR 2024.

Dataset

GitHub Repository

benchmark, multimodal, evaluation