GLM (Original)
model paper
The GLM paper (ACL 2022) is a seminal work that unified the three major pretraining paradigms — autoencoding (BERT), autoregressive (GPT), and encoder-decoder (T5) — into a single flexible architecture via Autoregressive Blank Infilling: masking continuous spans of text and training the model to reconstruct them sequentially.
Core innovations include: (1) a unified objective where varying the number and length of masked spans tunes the model for NLU, conditional generation, or unconditional generation; (2) 2D positional encoding with absolute position in corrupted text and relative position within generated spans; and (3) a single GLM outperforming BERT, T5, and GPT on SuperGLUE at smaller parameter counts.
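The input construction behind points (1) and (2) can be sketched as follows. This is a simplified illustration, not the paper's implementation: span sampling, span shuffling in Part B, and end-of-span tokens are omitted, and the token names are placeholders.

```python
def glm_infilling_input(tokens, spans):
    """Build a GLM-style blank-infilling sequence with 2D positions.

    tokens: list of token strings
    spans:  list of (start, end) index pairs to mask (end exclusive)
    Returns (sequence, pos_1, pos_2), where pos_1 is the absolute
    position in the corrupted text and pos_2 is the position within
    a generated span (0 for all Part A tokens).
    """
    spans = sorted(spans)
    # Part A: corrupted text, each span replaced by a single [MASK]
    part_a, mask_pos = [], []
    i = 0
    for start, end in spans:
        part_a.extend(tokens[i:start])
        mask_pos.append(len(part_a))
        part_a.append("[MASK]")
        i = end
    part_a.extend(tokens[i:])

    seq = list(part_a)
    pos_1 = list(range(len(part_a)))  # absolute position in corrupted text
    pos_2 = [0] * len(part_a)         # intra-span position is 0 in Part A

    # Part B: each masked span, prefixed by a start token, to be
    # generated autoregressively
    for (start, end), m in zip(spans, mask_pos):
        span = ["[S]"] + tokens[start:end]
        seq.extend(span)
        pos_1.extend([m] * len(span))           # all point at the [MASK] slot
        pos_2.extend(range(1, len(span) + 1))   # count within the span
    return seq, pos_1, pos_2
```

For example, masking `x3` and `x5 x6` in `x1 … x6` yields Part A `x1 x2 [MASK] x4 [MASK]` followed by Part B `[S] x3 [S] x5 x6`, with every Part B token sharing the first position of its `[MASK]` slot and counting up in the second position.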
The blank infilling approach saw widespread adoption as Fill-in-the-Middle (FIM) training — used by OpenAI (GPT-3.5/4 code models), Meta (CodeLLaMA), BigCode (StarCoder), and DeepSeek. With ~2,200 citations by early 2026, the paper launched the GLM architecture that underpins the entire ChatGLM/GLM-4/GLM-5 series. Its authors went on to found Zhipu AI (Jie Tang), Moonshot AI (Zhilin Yang, also known for XLNet), and contribute to DeepSeek (Xuehai Pan).
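The FIM data transformation mentioned above can be sketched in a few lines. This is a minimal illustration assuming StarCoder-style sentinel strings and PSM (prefix-suffix-middle) ordering; other models use different sentinels and orderings.

```python
import random

def fim_transform(code, rng=random.Random(0)):
    """Fill-in-the-Middle: split a document into prefix/middle/suffix
    and reorder it so that ordinary left-to-right training teaches the
    model to infill the middle given both sides of context.
    Sentinel names here follow StarCoder; they are not universal."""
    # Pick two cut points to define prefix/middle/suffix
    a, b = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:a], code[a:b], code[b:]
    # PSM ordering: the middle comes last, so the autoregressive loss
    # on it conditions on both the prefix and the suffix
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"
```

In training, this transformation is applied to some fraction of documents, leaving the rest as plain left-to-right text, so the same model handles both completion and infilling.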
Outputs (2)
GLM-10B
model
Initial general-purpose foundation model.
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
paper
Foundational architecture paper introducing the autoregressive blank-infilling objective.
arXiv: 2103.10360