Post-training method for aligning flow-matching generative models with human preferences at any generation step count. Compresses long ODE trajectories into two-step shortcuts via strategic leaps, enabling direct gradient propagation from reward signals to early generation steps, a bottleneck for existing methods such as GRPO that optimize only final outputs.
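
A minimal sketch of the core mechanism, under stated assumptions: `velocity(x, t)` stands in for the learned flow-matching velocity field (not the paper's API), the prefix of the ODE rollout is kept out of autograd, and the two-step shortcut covers the remaining time so a differentiable reward backpropagates through just two velocity evaluations.

```python
# Minimal sketch, not the authors' implementation. Assumes `velocity(x, t)`
# is any callable mapping a batch `x` and scalar time `t` to a velocity
# tensor of the same shape as `x`.
import torch

@torch.no_grad()
def rollout(velocity, x, t_start, t_end, dt):
    """Plain Euler integration of dx/dt = v(x, t), kept out of autograd."""
    t = t_start
    while t < t_end - 1e-8:
        x = x + velocity(x, t) * dt
        t += dt
    return x

def two_step_shortcut(velocity, x_t, t):
    """Leap from (x_t, t) to t = 1 in two large steps. Both velocity
    evaluations stay inside the autograd graph, so a reward computed on
    the output reaches the step-t prediction directly."""
    half = (1.0 - t) / 2
    x_mid = x_t + velocity(x_t, t) * half
    return x_mid + velocity(x_mid, t + half) * half
```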

Randomized timestep selection stabilizes updates across the full generation schedule; weighted training favors trajectories most consistent with full-length generation paths. Outperforms GRPO and direct-gradient baselines on the Flux model. By Zhanhao Liang, Tao Yang, Jie Wu, Chengjian Feng (ByteDance Seed), and Liang Zheng (ANU).
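
Continuing the sketch above, one plausible training step combining the randomized start step with a consistency weight. The uniform draw over the schedule and the exponential weight on endpoint distance are illustrative choices, not the paper's formulas; `reward` is a hypothetical differentiable reward model returning one scalar per sample.

```python
# Illustrative training step, reusing `rollout` and `two_step_shortcut`
# from the previous sketch. The weighting rule is an assumption.
import torch

def train_step(velocity, reward, noise, opt, n_steps=32):
    dt = 1.0 / n_steps
    # Randomized timestep selection: the shortcut may start anywhere
    # in the schedule, so every step receives reward gradients.
    k = int(torch.randint(0, n_steps, (1,)))
    t = k * dt
    x_t = rollout(velocity, noise, 0.0, t, dt)   # gradient-free prefix
    x_end = two_step_shortcut(velocity, x_t, t)  # differentiable leap

    # Full-length reference path from the same state, gradient-free.
    x_ref = rollout(velocity, x_t, t, 1.0, dt)

    # Weight samples by agreement between shortcut and full-length
    # endpoints, favoring trajectories consistent with the full path.
    dist = ((x_end.detach() - x_ref) ** 2).flatten(1).mean(dim=1)
    w = torch.exp(-dist)

    loss = -(w * reward(x_end)).mean()           # reward ascent
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Detaching the weight keeps it a pure importance factor: gradients flow only through the reward term, while shortcuts that diverge from the full-length path are down-weighted.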


Venue CVPR 2026
foundational, generation