A comprehensive benchmark for evaluating multimodal LLMs on generating code from scientific plots. Published in Findings of NAACL 2025.

Dataset

GitHub Repository

benchmark, code, multimodal