VQ-VAE
Paper: "Neural Discrete Representation Learning." Introduced the Vector Quantised Variational AutoEncoder (VQ-VAE), which learns discrete latent representations via vector quantization: encoder outputs are snapped to their nearest entries in a learned codebook of embeddings. This sidesteps the posterior collapse that plagues standard VAEs when paired with powerful autoregressive decoders.
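The core quantization step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation; the codebook size, embedding dimension, and random inputs are hypothetical stand-ins for learned values.

```python
# Minimal sketch of VQ-VAE's quantization step (illustrative shapes, not from the paper).
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 4                          # codebook size and embedding dim (hypothetical)
codebook = rng.normal(size=(K, D))   # stands in for the learned embeddings e_1..e_K
z_e = rng.normal(size=(5, D))        # stands in for encoder outputs at 5 latent positions

# Nearest-neighbour lookup: each encoder output is replaced by its closest code vector.
dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (5, K) squared distances
indices = dists.argmin(axis=1)       # discrete latent codes
z_q = codebook[indices]              # quantized latents passed to the decoder

print(indices.shape, z_q.shape)
```

Because the argmin is non-differentiable, the paper trains the encoder with a straight-through estimator (copying decoder gradients past the quantization step) plus codebook and commitment loss terms.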
VQ-VAE became foundational infrastructure for generative AI: DALL-E's dVAE, audio codecs (SoundStream, EnCodec), video tokenization, and discrete visual representations all build on this work. The follow-up VQ-VAE-2 (2019) demonstrated high-fidelity image generation competitive with GANs. NeurIPS 2017. By van den Oord, Vinyals, and Kavukcuoglu (DeepMind).
Paper
arXiv: 1711.00937
Venue: NeurIPS 2017