PanGu-pi is an efficient LLM architecture that addresses feature collapse via nonlinearity compensation: it introduces a series-based activation function in the FFN and augmented shortcuts in MSA. PanGu-pi-7B achieves performance comparable to similarly sized models with a roughly 10% inference speed-up. PanGu-pi Pro further optimizes for tiny language models (1B-1.5B), achieving an 8.87-point average improvement through tokenizer compression, architecture tweaking, and parameter inheritance.
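The two nonlinearity-compensation ideas above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names, the choice of ReLU as the base activation, and the toy shapes are assumptions; the series activation follows the paper's general form sigma_s(x) = sum_i a_i * sigma(x + b_i), and the augmented shortcut adds lightweight linear paths alongside the identity skip.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def series_activation(x, scales, biases):
    """Series-based activation for the FFN (sketch).
    Sums n scaled-and-shifted copies of a base activation:
        sigma_s(x) = sum_i a_i * relu(x + b_i)
    which adds nonlinearity without widening the layer."""
    return sum(a * relu(x + b) for a, b in zip(scales, biases))

def augmented_msa(x, attn_out, shortcut_weights):
    """Augmented shortcut for MSA (sketch).
    Besides the identity skip, adds cheap linear projections
    of the input: y = MSA(x) + x + sum_i x @ T_i, which helps
    keep token features from collapsing together."""
    y = attn_out + x
    for T in shortcut_weights:
        y = y + x @ T
    return y

# Toy demo on hypothetical shapes (4 tokens, dim 8).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
act = series_activation(x, scales=[1.0, 0.5], biases=[0.0, -1.0])
print(act.shape)  # (4, 8)
```

With a single term (a=1, b=0) the series activation reduces to plain ReLU, so the extra terms can be read as a learnable perturbation around a standard activation.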


PanGu-pi

model

Variants

Name                Parameters  Notes
PanGu-pi-1B         1B
PanGu-pi-7B         7B
PanGu-pi-1B Pro     1B          Released 2024-02
PanGu-pi-1.5B Pro   1.5B        Released 2024-02

PanGu-pi: Enhancing Language Model Architectures via Nonlinearity Compensation

paper

arXiv: 2312.17276

PanGu-pi Pro: Rethinking Optimization and Architecture for Tiny Language Models

paper

arXiv: 2402.02791

nlp · efficiency · architecture