Dream 7B
A powerful open diffusion large language model, jointly developed with HKU. It uses discrete diffusion modeling to refine entire sequences in parallel through iterative denoising, rather than generating tokens one at a time autoregressively. Trained on 580B tokens with weights initialized from the autoregressive model Qwen2.5-7B. It matches or exceeds similarly sized AR models on general, math, and coding tasks, and outperforms DeepSeek V3 (671B) on structured planning tasks.
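To make the contrast with autoregressive decoding concrete, the toy sketch below illustrates the general idea of discrete diffusion generation: start from a fully masked sequence and, at each denoising step, commit the most confident predictions in parallel. This is a hypothetical simplification, not Dream's actual architecture or sampler; `toy_denoiser`, the confidence scores, and all names here are illustrative stand-ins.

```python
import random

MASK = "<mask>"
VOCAB = ["a", "b", "c", "d"]

def toy_denoiser(seq):
    # Hypothetical stand-in for the denoising model: proposes a
    # (token, confidence) pair for every masked position in parallel.
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def diffusion_generate(length=8, steps=4, seed=0):
    """Iteratively unmask the highest-confidence positions each step."""
    random.seed(seed)
    seq = [MASK] * length
    per_step = length // steps  # positions to commit per denoising step
    for _ in range(steps):
        preds = toy_denoiser(seq)
        # Sort masked positions by confidence, keep the top ones.
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:per_step]
        for i, (tok, _conf) in best:
            seq[i] = tok
    return seq

print(diffusion_generate())
```

Unlike an AR decoder, which fixes one token per forward pass left to right, each step here can fill in positions anywhere in the sequence, which is why diffusion LLMs can trade steps for parallelism.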
Model Details
Architecture: dense
Parameters: 7B
Paper: arXiv:2508.15487