ChartNet | Lab Index

Million-scale multimodal dataset for chart understanding from IBM Granite. 4.2M synthetic chart samples (~1 TB in total, of which 2.5M fall under the permissive subset) plus 94K human-verified charts and 2K human-verified eval samples. Each sample bundles five tightly aligned elements: plotting code, rendered image, underlying data table, natural-language summary, and QA pairs with chain-of-thought reasoning.
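To make the "five tightly aligned elements" concrete, here is a minimal sketch of what one record could look like in Python. The class and field names (`ChartSample`, `QAPair`, `plotting_code`, `image_png`, `data_table_csv`, `missing_elements`, etc.) are hypothetical and are not ChartNet's actual schema; the helper simply reports which of the five elements are empty, i.e. where alignment would be broken.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAPair:
    """One question-answer pair with its chain-of-thought rationale (hypothetical field names)."""
    question: str
    reasoning: str
    answer: str


@dataclass
class ChartSample:
    """Hypothetical record holding the five aligned elements of one sample."""
    plotting_code: str   # source code that renders the chart
    image_png: bytes     # rendered chart image
    data_table_csv: str  # underlying data table, serialized as CSV
    summary: str         # natural-language summary
    qa_pairs: List[QAPair] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Return the names of any empty elements, in declaration order."""
        missing = []
        if not self.plotting_code.strip():
            missing.append("plotting_code")
        if not self.image_png:
            missing.append("image_png")
        if not self.data_table_csv.strip():
            missing.append("data_table_csv")
        if not self.summary.strip():
            missing.append("summary")
        if not self.qa_pairs:
            missing.append("qa_pairs")
        return missing
```

A check like this is one cheap way to filter out records where any of the five views is absent before training on them.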

Covers 24 chart types across 6 plotting libraries (matplotlib, seaborn, plotly, Dask, Polars, etc.) and supports four downstream tasks: chart-to-code, chart-to-CSV, chart-to-text, and grounded chart QA. Used to train the Granite Vision 4 series (Granite-4.0-3B-Vision, Granite-Vision-4.1-4B). CVPR 2026.
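Because every sample carries all five aligned elements, a single record can feed each of the four downstream tasks. The sketch below shows one way to do that, assuming a plain dict with hypothetical keys (`code`, `csv`, `summary`, `qa`) and illustrative prompt wording; neither is ChartNet's actual format. The rendered image is treated as the model input for every task and is therefore omitted, and the grounding information implied by "grounded chart QA" is not modeled here.

```python
from typing import Dict, List, Tuple


def build_examples(sample: Dict) -> List[Tuple[str, str, str]]:
    """Turn one aligned sample into (task, prompt, target) triples.

    Keys "code", "csv", "summary" and "qa" are assumptions for illustration only.
    """
    examples = [
        ("chart-to-code", "Write code that reproduces this chart.", sample["code"]),
        ("chart-to-csv", "Extract the underlying data as CSV.", sample["csv"]),
        ("chart-to-text", "Summarize this chart.", sample["summary"]),
    ]
    for qa in sample["qa"]:
        # Put the reasoning before the answer so the target carries the chain of thought.
        target = f"{qa['reasoning']}\nAnswer: {qa['answer']}"
        examples.append(("chart-qa", qa["question"], target))
    return examples
```

With two QA pairs, one record yields five training examples: one each for code, CSV, and summary, plus one per question.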

The April 2026 core_permissive subset (2.5M rows) is available under CDLA-Permissive-2.0; the original subsets remain research/evaluation-only due to upstream Mistral Research License contamination. The June 3, 2026 release added grounded_qa and completed the reasoning subset.

HuggingFace Paper (arXiv)

Paper

arXiv HTML

dataset, multimodal, vision-language
