GLM (Original)
model paper
The GLM paper (ACL 2022) is a seminal work that unified the three major pretraining paradigms — autoencoding (BERT), autoregressive (GPT), and encoder-decoder (T5) — into a single flexible architecture via Autoregressive Blank Infilling: masking continuous spans of text and training the model to reconstruct them sequentially.
Core innovations include: (1) a unified objective where varying the number and length of masked spans tunes the model for NLU, conditional generation, or unconditional generation; (2) 2D positional encoding with absolute position in corrupted text and relative position within generated spans; and (3) a single GLM outperforming BERT, T5, and GPT on SuperGLUE at smaller parameter counts.
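The input construction behind points (1) and (2) can be sketched as follows. This is a simplified illustration, not the paper's implementation: span sampling, span shuffling in Part B, and end-of-span tokens are omitted, and the token names are placeholders.

```python
def glm_infilling_input(tokens, spans):
    """Build a GLM-style blank-infilling sequence with 2D positions.

    tokens: list of token strings
    spans:  list of (start, end) index pairs to mask (end exclusive)
    Returns (sequence, pos_1, pos_2), where pos_1 is the absolute
    position in the corrupted text and pos_2 is the position within
    a generated span (0 for all Part A tokens).
    """
    spans = sorted(spans)
    # Part A: corrupted text, each span replaced by a single [MASK]
    part_a, mask_pos = [], []
    i = 0
    for start, end in spans:
        part_a.extend(tokens[i:start])
        mask_pos.append(len(part_a))
        part_a.append("[MASK]")
        i = end
    part_a.extend(tokens[i:])

    seq = list(part_a)
    pos_1 = list(range(len(part_a)))  # absolute position in corrupted text
    pos_2 = [0] * len(part_a)         # intra-span position is 0 in Part A

    # Part B: each masked span, prefixed by a start token, to be
    # generated autoregressively
    for (start, end), m in zip(spans, mask_pos):
        span = ["[S]"] + tokens[start:end]
        seq.extend(span)
        pos_1.extend([m] * len(span))           # all point at the [MASK] slot
        pos_2.extend(range(1, len(span) + 1))   # count within the span
    return seq, pos_1, pos_2
```

For example, masking `x3` and `x5 x6` in `x1 … x6` yields Part A `x1 x2 [MASK] x4 [MASK]` followed by Part B `[S] x3 [S] x5 x6`, with every Part B token sharing the first position of its `[MASK]` slot and counting up in the second position.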
The blank infilling approach saw widespread adoption as Fill-in-the-Middle (FIM) training — used by OpenAI (GPT-3.5/4 code models), Meta (CodeLLaMA), BigCode (StarCoder), and DeepSeek. With ~2,200 citations by early 2026, the paper launched the GLM architecture that underpins the entire ChatGLM/GLM-4/GLM-5 series. Its authors went on to found Zhipu AI (Jie Tang), Moonshot AI (Zhilin Yang, also known for XLNet), and contribute to DeepSeek (Xuehai Pan).
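The FIM data transformation mentioned above can be sketched in a few lines. This is a minimal illustration assuming StarCoder-style sentinel strings and PSM (prefix-suffix-middle) ordering; other models use different sentinels and orderings.

```python
import random

def fim_transform(code, rng=random.Random(0)):
    """Fill-in-the-Middle: split a document into prefix/middle/suffix
    and reorder it so that ordinary left-to-right training teaches the
    model to infill the middle given both sides of context.
    Sentinel names here follow StarCoder; they are not universal."""
    # Pick two cut points to define prefix/middle/suffix
    a, b = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:a], code[a:b], code[b:]
    # PSM ordering: the middle comes last, so the autoregressive loss
    # on it conditions on both the prefix and the suffix
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"
```

In training, this transformation is applied to some fraction of documents, leaving the rest as plain left-to-right text, so the same model handles both completion and infilling.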
Outputs (2)
GLM-10B
model
Initial general-purpose foundation model.
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
paper
Foundational architecture paper introducing the autoregressive blank-infilling objective.
arXiv: 2103.10360