DiffusionGemma

An open-weights discrete-diffusion language model built on the Gemma 4 26B-A4B MoE base (25.2B total / 3.8B active, 128 experts + 1 shared, top-8; 256K context; ~550M vision encoder for image/video input). Instead of left-to-right token generation, DiffusionGemma iteratively denoises blocks ("canvases") of tokens in parallel: an autoregressive encoder caches prompt context while a bidirectional decoder refines the generation canvas, with an Entropy-Bounded Denoising + Adaptive Stopping sampler (≤48 steps). Google reports 15–20 tokens per forward pass and >1,100 tok/s per user at low batch on H100/FP8.
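The parallel denoising loop described above can be illustrated with a toy sketch. This is not Google's sampler: the card only names "Entropy-Bounded Denoising + Adaptive Stopping" and the 48-step cap, so the details here are assumptions. In particular, `toy_denoiser` is a random stand-in for the bidirectional decoder, and `MASK`, `entropy_budget`, and the commit rule (commit the lowest-entropy positions until a per-step entropy budget is spent) are hypothetical. Adaptive stopping is simplified to "stop as soon as no masked positions remain".

```python
import math
import random

MASK = -1  # hypothetical sentinel for a canvas position that is not yet committed


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def toy_denoiser(canvas, vocab_size, rng):
    """Stand-in for the decoder: one distribution per canvas position.

    Predictions get sharper as more of the canvas is committed, which
    loosely mimics a denoiser that grows more confident with context.
    """
    filled = sum(tok != MASK for tok in canvas)
    sharpness = 1.0 + 4.0 * filled / len(canvas)
    return [
        softmax([sharpness * rng.gauss(0.0, 1.0) for _ in range(vocab_size)])
        for _ in canvas
    ]


def denoise_canvas(canvas_len=32, vocab_size=16, entropy_budget=1.5,
                   max_steps=48, seed=0):
    rng = random.Random(seed)
    canvas = [MASK] * canvas_len
    steps = 0
    # Simplified adaptive stopping: exit once nothing is masked,
    # instead of always running the full step cap.
    while steps < max_steps and MASK in canvas:
        steps += 1
        dists = toy_denoiser(canvas, vocab_size, rng)
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        masked.sort(key=lambda i: entropy(dists[i]))  # most confident first
        spent = 0.0
        for rank, i in enumerate(masked):
            h = entropy(dists[i])
            # Always commit at least one token per step so the loop progresses.
            if rank > 0 and spent + h > entropy_budget:
                break
            spent += h
            canvas[i] = max(range(vocab_size), key=dists[i].__getitem__)
    return canvas, steps


if __name__ == "__main__":
    canvas, steps = denoise_canvas()
    print(f"committed {len(canvas)} tokens in {steps} steps")
```

Because several positions can be committed in a single step, the number of steps is usually smaller than the canvas length. That is the mechanism behind a "tokens per forward pass" figure greater than one.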

Licensed under Apache 2.0. The diffusion variant trades some quality for speed relative to its autoregressive Gemma 4 sibling (e.g. AIME 2026 69.1 vs. 88.3, GPQA Diamond 73.2 vs. 82.3). Self-reported scores: MMLU-Pro 77.6, AIME 2026 69.1, LiveCodeBench v6 69.1, GPQA Diamond 73.2, τ²-Bench 56.2, MMMU-Pro 54.3. Not yet on the AA Intelligence Index.

Links: Announcement (Google blog), HuggingFace, Docs, Artificial Analysis

Model Details

Architecture	MoE
Parameters	25.2B
Active params	3.8B
Experts	128 + 1 shared (top-8)
Context window	262,144 tokens
AA Intelligence	13
License	Apache 2.0
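As a quick sanity check on the figures above, the snippet below works out what share of the model is exercised per token. The variable names are illustrative; the numbers are taken from this card.

```python
# Figures from the model card (parameter counts in billions).
total_params_b = 25.2
active_params_b = 3.8
routed_experts = 128
experts_per_token = 8  # top-8 routing

# Share of all weights used for a single token.
active_fraction = active_params_b / total_params_b

# Share of routed experts selected for a single token.
routed_fraction = experts_per_token / routed_experts

print(f"active parameter share: {active_fraction:.3f}")
print(f"routed expert share:    {routed_fraction:.4f}")
```

The active share (about 15%) is larger than the routed-expert share (6.25%) because components such as attention, embeddings, and the shared expert run for every token regardless of routing.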

Benchmark Scores

Benchmark	Score	Mode
MMLU-Pro	77.6	—
AIME 2026	69.1	—
LiveCodeBench v6	69.1	—
GPQA Diamond	73.2	—
τ²-Bench	56.2	—
MMMU-Pro	54.3	—

Tags: open-weight, moe, multimodal, architecture, research
