A 172B-parameter dense Transformer (96 layers, hidden size 12,288, 96 attention heads). Pre-trained on 2.1T tokens (50% Japanese, 50% English and code) from llm-jp-corpus v3 using Megatron-LM, with compute provided by Google Cloud Japan and SAKURA Internet. Post-trained with SFT followed by DPO.
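As a rough sanity check on the stated size, the classic 12·L·d² estimate for a dense Transformer's non-embedding parameters can be computed directly from the configuration above. This is a sketch only: the vocabulary size below is a hypothetical round figure, and a gated FFN would redistribute the per-layer split without much changing the total.

```python
# Back-of-the-envelope parameter count for the 172B dense configuration.
# Assumes a standard GPT-style block: ~4*d^2 attention + ~8*d^2 MLP weights.
L, d, V = 96, 12288, 100_000  # layers, hidden size; V is a placeholder vocab size

block = 12 * d * d   # per-layer attention + MLP weights
embed = V * d        # token embeddings (an untied output head would add another V*d)
total = L * block + embed
print(f"~{total / 1e9:.0f}B parameters")  # ~175B, within a few percent of the stated 172B
```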

Scores an average of 7.57 on Japanese MT-Bench (writing 9.20, humanities 9.56). It was the largest fully open Japanese model at release. The release also includes 3.1 variants: a 13B mid-trained model (2.5T tokens, Japanese MT-Bench 7.37) and an 8x13B MoE (73B total parameters, 32K context). The smaller variants are Apache 2.0; the 172B model uses a custom license.
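The 73B total for the 8x13B MoE is consistent with replicating only the FFN blocks of a 13B dense model across experts while sharing attention and embeddings. A minimal sketch, assuming a standard per-layer split of roughly 1/3 attention and 2/3 FFN:

```python
# Check that an 8-expert MoE built from a 13B dense model lands near 73B total.
# Assumption: only the FFN (~2/3 of dense params) is replicated per expert.
dense = 13e9
attn, ffn = dense / 3, dense * 2 / 3
moe_total = attn + 8 * ffn
print(f"~{moe_total / 1e9:.0f}B")  # ~74B, consistent with the stated 73B total
```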

Model Details

Architecture: Dense
Parameters: 172B
Context window: 4,096 tokens
Tags: open-weight, multilingual, frontier
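A minimal inference sketch using Hugging Face transformers. The repository id below is an assumption, not confirmed by this card; check the publisher's Hub organization for the exact checkpoint name and its license terms before use.

```python
# Hedged usage sketch: load the instruct checkpoint and generate a short reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llm-jp/llm-jp-3-172b-instruct3"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The context window is 4,096 tokens, so keep prompt + generation under that budget.
inputs = tokenizer("日本の首都はどこですか？", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```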
