"Resampling Images into 1D Token Sequences of Flexible Length." Projects 2D images into variable-length, ordered 1D token sequences (1 to 256 tokens for a 256×256 image). Uses a rectified flow decoder and nested dropout for hierarchical, semantic compression.
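As a rough illustration of the nested dropout idea: during training, only a random-length prefix of the ordered token sequence is kept, so the earliest tokens are pushed to carry the coarsest information. The sketch below is a minimal stand-in, not the authors' implementation. The function name `nested_dropout`, the power-of-two `keep_choices`, and simply truncating the sequence (rather than masking the dropped positions) are all assumptions made for illustration.

```python
import random


def nested_dropout(tokens, keep_choices=(1, 2, 4, 8, 16, 32, 64, 128, 256), rng=random):
    """Keep a random-length prefix of an ordered token sequence.

    tokens: list of tokens, assumed to be ordered coarse-to-fine.
    keep_choices: hypothetical set of allowed prefix lengths (an assumption).
    Returns (prefix, k), where k is the number of tokens kept.
    """
    # Only prefix lengths that fit inside the sequence are eligible.
    valid = [k for k in keep_choices if k <= len(tokens)]
    if not valid:
        raise ValueError("token sequence is shorter than every allowed prefix length")
    k = rng.choice(valid)
    # Everything after position k is dropped; the decoder would see only the prefix.
    return tokens[:k], k


if __name__ == "__main__":
    full_sequence = list(range(256))  # placeholder token ids for one image
    prefix, k = nested_dropout(full_sequence)
    print(f"kept {k} of {len(full_sequence)} tokens")
```

Because every training step may truncate at a different length, a decoder trained this way has to reconstruct from any prefix, which is what makes the token count adjustable at inference time.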

Achieves FID < 2 with anywhere from 8 to 128 tokens on ImageNet, outperforming TiTok and matching state-of-the-art results with far fewer tokens. Shows that in autoregressive generation, the number of tokens needed depends on image complexity. ICML 2025. By Bachmann, Allardice, Dehghan et al.

Paper

arXiv: 2502.13967

Venue: ICML 2025

vision, research, open-source

Related