Dream 7B
A powerful open diffusion large language model, jointly developed with HKU. It uses discrete diffusion modeling to refine entire sequences in parallel through iterative denoising, rather than generating tokens one at a time autoregressively. Trained on 580B tokens with weights initialized from the autoregressive model Qwen2.5-7B. It matches or exceeds similarly sized AR models on general, math, and coding tasks, and outperforms DeepSeek V3 (671B) on structured planning tasks.
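To make the contrast with autoregressive decoding concrete, the toy sketch below illustrates the general idea of discrete diffusion generation: start from a fully masked sequence and, at each denoising step, commit the most confident predictions in parallel. This is a hypothetical simplification, not Dream's actual architecture or sampler; `toy_denoiser`, the confidence scores, and all names here are illustrative stand-ins.

```python
import random

MASK = "<mask>"
VOCAB = ["a", "b", "c", "d"]

def toy_denoiser(seq):
    # Hypothetical stand-in for the denoising model: proposes a
    # (token, confidence) pair for every masked position in parallel.
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def diffusion_generate(length=8, steps=4, seed=0):
    """Iteratively unmask the highest-confidence positions each step."""
    random.seed(seed)
    seq = [MASK] * length
    per_step = length // steps  # positions to commit per denoising step
    for _ in range(steps):
        preds = toy_denoiser(seq)
        # Sort masked positions by confidence, keep the top ones.
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:per_step]
        for i, (tok, _conf) in best:
            seq[i] = tok
    return seq

print(diffusion_generate())
```

Unlike an AR decoder, which fixes one token per forward pass left to right, each step here can fill in positions anywhere in the sequence, which is why diffusion LLMs can trade steps for parallelism.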
Model Details
Architecture: dense
Parameters: 7B
Paper: arXiv:2508.15487