A dataset of 747M image-text pairs collected from CommonCrawl (Oct 2020 to Aug 2021) and filtered from 10B raw pairs. Validated by training ALIGN, unCLIP, and ViT from scratch on COYO, reaching performance competitive with the original papers. Used to train Karlo (Kakao's DALL-E 2 variant). Licensed under CC-BY-4.0.

Dataset

GitHub Repository

data, open-source, vision