Multimodal dataset of 200K images and 10K videos, each paired with detailed, dense captions.
multimodal, training-data

Notes

Submitted to arXiv on Apr 25, 2024.