A 135B-parameter dense LLM trained entirely on 8,192 Ascend NPUs on 13.2T tokens. Demonstrates that frontier-scale dense models can be trained on domestic Chinese hardware without NVIDIA GPUs.
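For a rough sense of scale, here is a minimal back-of-the-envelope sketch of the training compute implied by 135B parameters and 13.2T tokens. It assumes the standard ~6·N·D FLOPs-per-token rule of thumb and an illustrative per-chip throughput; neither figure comes from the entry itself.

```python
# Back-of-the-envelope training-compute estimate.
# Assumption: the common ~6 * N * D FLOPs approximation for dense transformer training.
params = 135e9    # 135B parameters (from the entry)
tokens = 13.2e12  # 13.2T training tokens (from the entry)
npus = 8192       # Ascend NPUs used (from the entry)

total_flops = 6 * params * tokens  # ~1.07e25 FLOPs total
print(f"Estimated training compute: {total_flops:.2e} FLOPs")

# Hypothetical sustained throughput per NPU, purely illustrative (not from the entry).
assumed_flops_per_npu = 150e12  # 150 TFLOP/s
days = total_flops / (npus * assumed_flops_per_npu) / 86400
print(f"Implied wall-clock time at that throughput: ~{days:.0f} days")
```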

Outputs (2)

Pangu Ultra 135B (model)
Architecture: dense
Parameters: 135B

Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs (paper)
arXiv: 2504.07866

Tag: frontiernlp
