FlexTok | Lab Index

"Resampling Images into 1D Token Sequences of Flexible Length." Projects 2D images into variable-length, ordered 1D token sequences (1 to 256 tokens for a 256×256 image). Uses a rectified flow decoder and nested dropout for hierarchical, semantic compression.
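The nested dropout idea mentioned above can be illustrated with a minimal sketch: during training, a random prefix length is drawn and every token after that prefix is dropped, so earlier tokens are pushed to carry the coarsest information. The function names, the power-of-two length schedule, and the use of plain truncation are assumptions made for illustration; this entry does not specify how FlexTok samples lengths or masks tokens.

```python
import random


def candidate_lengths(max_tokens=256):
    """Powers of two up to max_tokens (an assumed schedule, not taken from the paper)."""
    lengths = []
    k = 1
    while k <= max_tokens:
        lengths.append(k)
        k *= 2
    return lengths


def nested_dropout_mask(num_tokens, keep):
    """Return True for the first `keep` positions and False for the dropped tail."""
    if not 1 <= keep <= num_tokens:
        raise ValueError("keep must be in [1, num_tokens]")
    return [i < keep for i in range(num_tokens)]


def apply_nested_dropout(tokens, rng, lengths=None):
    """Sample a keep length and return the ordered prefix of that length."""
    if lengths is None:
        lengths = candidate_lengths(len(tokens))
    keep = rng.choice(lengths)
    return tokens[:keep], keep


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = list(range(256))  # stand-in for 256 ordered 1D tokens
    prefix, keep = apply_nested_dropout(tokens, rng)
    print(f"kept {keep} of {len(tokens)} tokens")
```

Because only prefixes are ever kept, any truncated sequence is still a valid input to the decoder, which is what allows the same image to be represented with 1 token or with 256.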

Achieves FID<2 across 8 to 128 tokens on ImageNet, outperforming TiTok and matching SOTA with far fewer tokens. Shows that in autoregressive generation, the number of tokens needed depends on image complexity. ICML 2025. By Bachmann, Allardice, Dehghan et al.

Paper (arXiv) | GitHub | Apple ML Research

Paper

Venue: ICML 2025

arXiv HTML

vision, research, open-source

Related