PanGu-pi is an efficient LLM architecture that addresses feature collapse via nonlinearity compensation: it introduces a series-based activation function in the FFN and augmented shortcuts in MSA. PanGu-pi-7B achieves performance comparable to similarly sized models with a roughly 10% inference speed-up. PanGu-pi Pro further optimizes for tiny language models (1B-1.5B), achieving an 8.87-point average improvement through tokenizer compression, architecture tweaking, and parameter inheritance.
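The two nonlinearity-compensation ideas above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names, the choice of ReLU as the base activation, and the toy shapes are assumptions; the series activation follows the paper's general form sigma_s(x) = sum_i a_i * sigma(x + b_i), and the augmented shortcut adds lightweight linear paths alongside the identity skip.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def series_activation(x, scales, biases):
    """Series-based activation for the FFN (sketch).
    Sums n scaled-and-shifted copies of a base activation:
        sigma_s(x) = sum_i a_i * relu(x + b_i)
    which adds nonlinearity without widening the layer."""
    return sum(a * relu(x + b) for a, b in zip(scales, biases))

def augmented_msa(x, attn_out, shortcut_weights):
    """Augmented shortcut for MSA (sketch).
    Besides the identity skip, adds cheap linear projections
    of the input: y = MSA(x) + x + sum_i x @ T_i, which helps
    keep token features from collapsing together."""
    y = attn_out + x
    for T in shortcut_weights:
        y = y + x @ T
    return y

# Toy demo on hypothetical shapes (4 tokens, dim 8).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
act = series_activation(x, scales=[1.0, 0.5], biases=[0.0, -1.0])
print(act.shape)  # (4, 8)
```

With a single term (a=1, b=0) the series activation reduces to plain ReLU, so the extra terms can be read as a learnable perturbation around a standard activation.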


PanGu-pi

model

Variants

Name                Parameters  Notes
PanGu-pi-1B         1B
PanGu-pi-7B         7B
PanGu-pi-1B Pro     1B          Released 2024-02
PanGu-pi-1.5B Pro   1.5B        Released 2024-02

PanGu-pi: Enhancing Language Model Architectures via Nonlinearity Compensation

paper

arXiv: 2312.17276

PanGu-pi Pro: Rethinking Optimization and Architecture for Tiny Language Models

paper

arXiv: 2402.02791

nlp · efficiency · architecture