A 172B-parameter dense Transformer (96 layers, hidden size 12,288, 96 attention heads). Pre-trained on 2.1T tokens (50% Japanese, 50% English and code) from llm-jp-corpus v3 using Megatron-LM, with compute provided by Google Cloud Japan and SAKURA Internet. Post-trained with SFT followed by DPO.
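As a rough sanity check on the stated size, the classic 12·L·d² estimate for a dense Transformer's non-embedding parameters can be computed directly from the configuration above. This is a sketch only: the vocabulary size below is a hypothetical round figure, and a gated FFN would redistribute the per-layer split without much changing the total.

```python
# Back-of-the-envelope parameter count for the 172B dense configuration.
# Assumes a standard GPT-style block: ~4*d^2 attention + ~8*d^2 MLP weights.
L, d, V = 96, 12288, 100_000  # layers, hidden size; V is a placeholder vocab size

block = 12 * d * d   # per-layer attention + MLP weights
embed = V * d        # token embeddings (an untied output head would add another V*d)
total = L * block + embed
print(f"~{total / 1e9:.0f}B parameters")  # ~175B, within a few percent of the stated 172B
```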

Scores an average of 7.57 on Japanese MT-Bench (writing 9.20, humanities 9.56). It was the largest fully open Japanese model at release. The release also includes 3.1 variants: a 13B mid-trained model (2.5T tokens, Japanese MT-Bench 7.37) and an 8x13B MoE (73B total parameters, 32K context). The smaller variants are Apache 2.0; the 172B model uses a custom license.
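The 73B total for the 8x13B MoE is consistent with replicating only the FFN blocks of a 13B dense model across experts while sharing attention and embeddings. A minimal sketch, assuming a standard per-layer split of roughly 1/3 attention and 2/3 FFN:

```python
# Check that an 8-expert MoE built from a 13B dense model lands near 73B total.
# Assumption: only the FFN (~2/3 of dense params) is replicated per expert.
dense = 13e9
attn, ffn = dense / 3, dense * 2 / 3
moe_total = attn + 8 * ffn
print(f"~{moe_total / 1e9:.0f}B")  # ~74B, consistent with the stated 73B total
```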

Model Details

Architecture: Dense
Parameters: 172B
Context window: 4,096 tokens
Tags: open-weight, multilingual, frontier
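A minimal inference sketch using Hugging Face transformers. The repository id below is an assumption, not confirmed by this card; check the publisher's Hub organization for the exact checkpoint name and its license terms before use.

```python
# Hedged usage sketch: load the instruct checkpoint and generate a short reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llm-jp/llm-jp-3-172b-instruct3"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The context window is 4,096 tokens, so keep prompt + generation under that budget.
inputs = tokenizer("日本の首都はどこですか？", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```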
