A 100B-parameter dense Transformer with QK normalization and z-loss for training stability. Trained on 2T tokens (1.3T English, 0.7T Japanese) in two phases on NVIDIA H100 GPUs with FP8 precision. Funded under Japan's GENIAC/NEDO program.
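QK normalization and z-loss are the two stability techniques named above. The sketch below shows one common formulation, not necessarily the paper's exact variant: queries and keys are RMS-normalized (here without the usual learned gain) before the attention dot product, which bounds the attention logits, and an auxiliary penalty on the squared log-normalizer of the output logits keeps them from drifting. The z-loss weight of 1e-4 follows the PaLM convention; all shapes and constants here are assumptions.

```python
import torch
import torch.nn.functional as F

def qk_norm_attention(q, k, v, eps=1e-6):
    """Attention with RMS-normalized queries and keys; q, k, v: [B, H, T, D]."""
    q = q * torch.rsqrt(q.pow(2).mean(-1, keepdim=True) + eps)  # RMSNorm (no learned gain)
    k = k * torch.rsqrt(k.pow(2).mean(-1, keepdim=True) + eps)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5        # bounded attention logits
    return F.softmax(scores, dim=-1) @ v

def z_loss(logits, weight=1e-4):
    """Auxiliary loss: weight * mean((log Z)^2), where Z is the softmax normalizer."""
    log_z = torch.logsumexp(logits, dim=-1)
    return weight * log_z.pow(2).mean()

# Usage (assumed): add z_loss to the standard LM objective, e.g.
#   total = F.cross_entropy(logits.view(-1, vocab), targets.view(-1)) + z_loss(logits)
```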

Beats GPT-4 on Japanese benchmarks: Jaster 0-shot average 0.738 (vs GPT-4's 0.722) and 4-shot average 0.775 (vs 0.772); Japanese MT-Bench score 7.78.

Model Details

Architecture: Dense
Parameters: 100B

Paper

arXiv: 2410.07563 (PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency)

Tags: open-weight, multilingual
