Scales diffusion language models to 100B by converting pretrained AR models. LLaDA 2.1 introduces token-to-token editing for real self-correction, achieving 892 tokens/s on HumanEval+.

Outputs (2)

LLaDA 2.0 (model). Scales diffusion language models to 100B by converting pretrained AR models. arXiv: 2512.15745

LLaDA 2.1 (paper). Introduces token-to-token editing on top of mask-to-token denoising, achieving 892 tokens/s on HumanEval+. arXiv: 2602.08676
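The two mechanism names above suggest a two-phase decode loop: a mask-to-token pass that fills masked positions, followed by a token-to-token pass that may overwrite already-committed tokens (self-correction). A minimal toy sketch of that loop; the function names, the 0.5 threshold, and the confidence heuristic are illustrative assumptions, not the paper's actual method:

```python
MASK = "<mask>"

def denoise_step(tokens, predict):
    # Mask-to-token: fill every masked position with a model prediction.
    out = list(tokens)
    for i, t in enumerate(out):
        if t == MASK:
            out[i] = predict(out, i)
    return out

def edit_step(tokens, predict, confidence):
    # Token-to-token: re-predict low-confidence committed tokens,
    # letting the model overwrite (self-correct) earlier choices.
    out = list(tokens)
    for i, t in enumerate(out):
        if t != MASK and confidence(out, i) < 0.5:
            out[i] = predict(out, i)
    return out

# Toy stand-ins for a real model's sampler and confidence scores.
def predict(tokens, i):
    return "cat"

def confidence(tokens, i):
    return 0.1 if tokens[i] == "dog" else 0.9

seq = [MASK, "dog", MASK]
seq = denoise_step(seq, predict)           # fills both masks with "cat"
seq = edit_step(seq, predict, confidence)  # rewrites low-confidence "dog"
print(seq)  # ['cat', 'cat', 'cat']
```

In a real diffusion LM, both passes would be driven by the model's per-position logits over several denoising steps; the point of the edit pass is that decoded tokens are not frozen, which is what enables self-correction.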

Tags: generation, architecture, moe, scaling, research
