Multimodal foundation model with unified multi-granularity comprehension and generation. Extends the SEED family to handle both fine-grained visual understanding and flexible image generation within a single framework.

Paper

arXiv: 2404.14396

multimodal, generation, vision, research