PLaMo 2
Model
Hybrid Mamba2 + Sliding Window Attention architecture. PLaMo 2.0-31B (2T tokens), PLaMo 2-8B (6T tokens; ~45% English, ~30% Japanese, ~15% code), PLaMo 2-1B. Weight reuse and structural pruning: PLaMo 2.1-8B (pruned from the 31B model, then retrained on 500B tokens with knowledge distillation) matches PLaMo-100B quality at ~7x less compute (55 vs. 372 exaFLOPs).
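The sliding-window attention component above restricts each token to a fixed-size causal window, which is what keeps attention cost linear in sequence length between the Mamba2 layers. A minimal mask sketch (the window size of 3 is purely illustrative; the source does not state PLaMo 2's actual window):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window attention mask: position i may attend to
    positions j with i - window < j <= i (True = attend)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# With window=3, position 5 attends only to positions 3, 4, 5.
mask = sliding_window_mask(6, 3)
```

In a hybrid stack like this, the attention layers use such a mask while the Mamba2 layers carry long-range information through their recurrent state.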
32K context via continual pretraining with full attention + RoPE (theta = 1M). Released under the PLaMo Community License (commercial use permitted up to 1B yen in annual revenue).
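Raising the RoPE base to theta = 1M slows the rotation of the higher-frequency dimension pairs, so positional phases remain distinguishable out to 32K tokens. A self-contained sketch of standard rotary embeddings with that base (the pairing convention shown is one common layout, not necessarily PLaMo 2's exact implementation):

```python
import numpy as np

def rope_frequencies(head_dim: int, theta: float = 1_000_000.0) -> np.ndarray:
    """Per-pair rotation frequencies; a larger theta makes them decay
    faster, stretching the usable position range."""
    return theta ** (-np.arange(0, head_dim, 2) / head_dim)

def apply_rope(x: np.ndarray, pos: int, theta: float = 1_000_000.0) -> np.ndarray:
    """Rotate consecutive (even, odd) feature pairs of a head vector
    by angle pos * freq; a pure rotation, so the vector norm is preserved."""
    angles = pos * rope_frequencies(x.shape[-1], theta)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

Because RoPE only rotates query/key pairs, extending context this way needs no new parameters, which is why it pairs naturally with continual pretraining.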
Model Details
Architecture DENSE
Parameters 31B
Context window 32,000
Variants
| Name | Parameters | Notes |
|---|---|---|
| PLaMo 2-1B | 1B | — |
| PLaMo 2-8B | 8B | — |
| PLaMo 2.0-31B | 31B | — |
| PLaMo 2.1-8B | 8B | Pruned from 31B, matches PLaMo-100B |
Paper
arXiv: 2509.04897