Large-scale video-text dataset for multimodal generation.
video, training-data, multimodal

Notes

arXiv submission Jul 13, 2023.