DiffusionGemma
An open-weights discrete-diffusion language model built on the Gemma 4 26B-A4B mixture-of-experts (MoE) base: 25.2B total parameters with 3.8B active, 128 experts plus 1 shared expert with top-8 routing, a 256K context window, and a ~550M vision encoder for image and video input. Instead of generating tokens left to right, DiffusionGemma iteratively denoises blocks of tokens ("canvases") in parallel: an autoregressive encoder caches the prompt context while a bidirectional decoder refines the generation canvas, using an Entropy-Bounded Denoising + Adaptive Stopping sampler capped at 48 steps. Google reports 15–20 tokens per forward pass and more than 1,100 tok/s per user at low batch sizes on H100 with FP8.
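The card does not spell out the sampler, so the following is only a toy sketch of one plausible reading of entropy-bounded denoising with adaptive stopping: at each step, commit the most confident masked positions until their summed entropy would exceed a budget, and stop as soon as the canvas is full or the step cap is hit. The `toy_denoiser` function, the `entropy_budget` parameter, and the greedy commit rule are all assumptions made for illustration; none of them is DiffusionGemma's actual decoder or API.

```python
import math
import random

MASK = None  # placeholder for a canvas position that is not decided yet


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def toy_denoiser(canvas, vocab_size, rng):
    """Hypothetical stand-in for the bidirectional decoder.

    Returns {position: probs} for every masked position. Confidence grows
    as more of the canvas is filled in, imitating iterative refinement.
    """
    filled = sum(tok is not MASK for tok in canvas) / len(canvas)
    preds = {}
    for i, tok in enumerate(canvas):
        if tok is not MASK:
            continue
        peak = min(0.99, rng.uniform(0.3, 0.9) + 0.5 * filled)
        rest = (1.0 - peak) / (vocab_size - 1)
        probs = [rest] * vocab_size
        probs[i % vocab_size] = peak  # toy "correct" token for position i
        preds[i] = probs
    return preds


def denoise_canvas(length=32, vocab_size=16, entropy_budget=1.0,
                   max_steps=48, seed=0):
    """Fill a fully masked canvas; returns (canvas, steps_used)."""
    rng = random.Random(seed)
    canvas = [MASK] * length
    steps = 0
    # Adaptive stopping: exit as soon as nothing is masked, or at the cap.
    while steps < max_steps and any(tok is MASK for tok in canvas):
        preds = toy_denoiser(canvas, vocab_size, rng)
        # Most confident (lowest-entropy) positions first.
        ranked = sorted(preds, key=lambda i: entropy(preds[i]))
        spent = 0.0
        for n, i in enumerate(ranked):
            h = entropy(preds[i])
            # Always commit at least one token, then respect the budget.
            if n > 0 and spent + h > entropy_budget:
                break
            probs = preds[i]
            canvas[i] = max(range(vocab_size), key=probs.__getitem__)
            spent += h
        steps += 1
    return canvas, steps


if __name__ == "__main__":
    canvas, steps = denoise_canvas()
    print(f"filled {len(canvas)} tokens in {steps} denoising steps")
```

Because each step commits at least one token, the loop always terminates, and a larger `entropy_budget` trades more tokens per step against a higher risk of committing uncertain positions together.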
Licensed under Apache-2.0. The diffusion variant trades some quality for speed relative to its autoregressive Gemma 4 sibling (for example, AIME 2026: 69.1 vs 88.3; GPQA Diamond: 73.2 vs 82.3). Self-reported scores: MMLU-Pro 77.6, AIME 2026 69.1, LiveCodeBench v6 69.1, GPQA Diamond 73.2, τ²-Bench 56.2, MMMU-Pro 54.3. Not yet listed on the AA Intelligence Index.
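To put the quoted figures side by side, here is a small back-of-the-envelope calculation using only numbers stated in this card. Dividing the throughput figure by the tokens-per-pass range assumes both were measured under the same conditions, which the card does not confirm, so the implied forward-pass rate is a rough estimate.

```python
# Figures quoted in this card (self-reported).
TOTAL_PARAMS_B, ACTIVE_PARAMS_B = 25.2, 3.8
TOKENS_PER_PASS = (15, 20)
TOKENS_PER_SECOND = 1100  # reported as a lower bound (">1,100")

# benchmark: (DiffusionGemma, autoregressive Gemma 4 sibling)
SCORES = {"AIME 2026": (69.1, 88.3), "GPQA Diamond": (73.2, 82.3)}

# Share of the parameters that is active.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Assumption: throughput and tokens-per-pass describe the same setup.
passes_per_second = tuple(
    TOKENS_PER_SECOND / t for t in reversed(TOKENS_PER_PASS)
)

# Negative gap means the diffusion variant scores lower.
gaps = {
    name: round(diffusion - autoregressive, 1)
    for name, (diffusion, autoregressive) in SCORES.items()
}

print(f"active parameters: {active_fraction:.1%}")
print(
    "implied forward passes/s: "
    f"{passes_per_second[0]:.0f} to {passes_per_second[1]:.0f}"
)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} points vs autoregressive")
```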
Model Details
Benchmark Scores
| Benchmark | Score | Mode |
|---|---|---|
| MMLU-Pro | 77.6 | — |
| AIME 2026 | 69.1 | — |
| LiveCodeBench v6 | 69.1 | — |
| GPQA Diamond | 73.2 | — |
| τ²-Bench | 56.2 | — |
| MMMU-Pro | 54.3 | — |