Apertus
Switzerland's sovereign multilingual LLM: 8B and 70B dense Transformers trained on 15 trillion tokens across 1,811 languages (~40% non-English), using 4,096 NVIDIA GH200 GPUs on the CSCS Alps supercomputer (10M+ GPU hours). 65K-token context window. 101+ authors.
Novel contributions: the xIELU activation function, the AdEMAMix optimizer, and the Goldfish objective for suppressing verbatim memorization (sketched below). Trained exclusively on openly available data with retroactive robots.txt compliance; the authors claim it is the first EU AI Act-compliant large model. Post-trained via SFT and QRPO alignment. Apache 2.0 license.
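Of the three, the Goldfish objective is the most self-contained to illustrate: a pseudorandom subset of target tokens is excluded from the training loss, so the model cannot drive loss down by reproducing any passage verbatim. Below is a minimal PyTorch sketch of the hashed-mask variant; the drop rate `k`, context width `h`, and all function names are illustrative assumptions, not Apertus's published settings.

```python
import torch
import torch.nn.functional as F

def goldfish_mask(targets: torch.Tensor, k: int = 4, h: int = 13) -> torch.Tensor:
    """Boolean mask over target positions; False marks tokens dropped from
    the loss. A token is dropped when a hash of the h preceding tokens lands
    in a fixed 1/k bucket, so repeated passages always drop the same
    positions (hashed variant; k and h here are illustrative, not Apertus's)."""
    B, T = targets.shape
    mask = torch.ones(B, T, dtype=torch.bool, device=targets.device)
    for b in range(B):
        for t in range(h, T):
            ctx = tuple(targets[b, t - h:t].tolist())
            if hash(ctx) % k == 0:  # deterministic for tuples of ints
                mask[b, t] = False
    return mask

def goldfish_loss(logits: torch.Tensor, targets: torch.Tensor,
                  k: int = 4, h: int = 13) -> torch.Tensor:
    """Cross-entropy averaged only over the ~(1 - 1/k) kept positions."""
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")  # (B, T)
    mask = goldfish_mask(targets, k=k, h=h)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```

Because the mask is keyed to token content rather than position or fresh randomness, duplicated documents in the corpus drop the same tokens on every pass, which is what blocks verbatim reproduction while leaving most tokens available for learning.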
The 70B model is competitive with Llama 3.1-70B on average across multilingual benchmarks (67.5% vs. 67.3%). Artificial Analysis Intelligence Index: 6 (8B). Both models trail frontier models significantly on English-only benchmarks, but Apertus is the strongest fully open model for extreme multilingual breadth.
Model Details
Variants
| Name | Parameters | Notes |
|---|---|---|
| Apertus-8B | 8B | — |
| Apertus-70B | 70B | — |
Paper
arXiv: 2509.14233