FlexTok
Paper: "Resampling Images into 1D Token Sequences of Flexible Length." Projects 2D images into variable-length, ordered 1D token sequences (1 to 256 tokens for a 256×256 image). Uses a rectified flow decoder and nested dropout for hierarchical, semantic compression.
Achieves FID<2 across 8 to 128 tokens on ImageNet, outperforming TiTok and matching SOTA with far fewer tokens. Shows that in autoregressive generation, the number of tokens needed depends on image complexity. ICML 2025. By Bachmann, Allardice, Dehghan et al.
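As a rough illustration of the nested dropout idea mentioned above, the sketch below keeps only a random-length prefix of an ordered token sequence and masks the rest, which is what pushes the earliest tokens to carry the most important information. This is not the authors' code: the helper names, the power-of-two sampling schedule, and the use of None as a mask value are all assumptions made for the example.

```python
import random


def sample_keep_count(num_tokens, rng):
    """Pick how many leading tokens to keep (hypothetical helper).

    Assumption: the count is drawn uniformly from powers of two up to
    num_tokens. The schedule used in the paper may differ.
    """
    choices = []
    k = 1
    while k <= num_tokens:
        choices.append(k)
        k *= 2
    return rng.choice(choices)


def apply_nested_dropout(tokens, keep, mask_value=None):
    """Keep the first `keep` tokens and mask every later position."""
    return tokens[:keep] + [mask_value] * (len(tokens) - keep)


rng = random.Random(0)            # fixed seed so the example is repeatable
tokens = list(range(256))         # stand-in for 256 ordered 1D tokens
keep = sample_keep_count(len(tokens), rng)
masked = apply_nested_dropout(tokens, keep)
```

Because only prefixes survive, any truncated sequence of 1, 2, 4, ... tokens is a valid input for a decoder trained this way, which is one way to obtain a flexible-length representation.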
Paper
arXiv: 2502.13967
Venue: ICML 2025