PanGu-pi
Efficient LLM architecture addressing feature collapse via nonlinearity compensation. Introduces a series-informed activation function in the FFN and augmented shortcuts in the MSA module. PanGu-pi-7B achieves comparable performance with a 10% inference speed-up. PanGu-pi Pro further optimizes the recipe for tiny language models (1B–1.5B), achieving an 8.87-point average improvement through tokenizer compression, architecture tweaking, and parameter inheritance.
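The two architectural ideas above can be sketched in a few lines. This is a hypothetical illustration, not the paper's reference implementation: the exact shift/scale parameterization of the series activation and the form of the shortcut branches (`shifts`, `scales`, `proj_weights`) are assumptions for the sketch.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def series_activation(x, shifts, scales, act=relu):
    """Sketch of a series-informed activation:
    sum_i scales[i] * act(x + shifts[i]).
    Summing several shifted copies of a base activation adds
    nonlinearity to the FFN without extra matrix multiplies."""
    return sum(s * act(x + b) for s, b in zip(scales, shifts))

def augmented_shortcut(x, msa_out, proj_weights):
    """Sketch of an augmented shortcut: the attention output plus the
    identity path plus cheap learned linear branches, which keeps
    features from collapsing toward a low-rank subspace."""
    return msa_out + x + sum(x @ W for W in proj_weights)

# usage: three shifted ReLUs with equal weights
x = np.array([-1.0, 0.0, 1.0])
y = series_activation(x, shifts=[-0.5, 0.0, 0.5], scales=[1/3, 1/3, 1/3])
```

With these inputs, only the shifted copies whose argument is positive contribute, so the result is a smoothed, piecewise-linear curve rather than a single ReLU kink.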
PanGu-pi
Variants
| Name | Parameters | Notes |
|---|---|---|
| PanGu-pi-1B | 1B | — |
| PanGu-pi-7B | 7B | — |
| PanGu-pi-1B Pro | 1B | Released 2024-02 |
| PanGu-pi-1.5B Pro | 1.5B | Released 2024-02 |
PanGu-pi: Enhancing Language Model Architectures via Nonlinearity Compensation
arXiv: 2312.17276
PanGu-pi Pro: Rethinking Optimization and Architecture for Tiny Language Models
arXiv: 2402.02791