Tülu 3

Post-training framework introducing RLVR (Reinforcement Learning with Verifiable Rewards). Applied to Llama 3.1 at 8B, 70B, and 405B. Surpasses instruct versions of Llama 3.1, Qwen 2.5, Mistral, and closed models GPT-4o-mini and Claude 3.5-Haiku.
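
The idea behind a verifiable reward can be sketched as follows: instead of scoring a completion with a learned reward model, a deterministic check compares the model's answer with a known ground truth and returns a fixed reward only on a match. This is a minimal sketch for numeric answers, not the Tülu 3 implementation. The function names, the "last number in the completion" extraction rule, and the `alpha` default are all assumptions made for illustration.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number in the completion, or None if there is none.

    Hypothetical extraction rule: thousands separators are dropped and the
    final numeric token is treated as the model's answer.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return matches[-1] if matches else None


def verifiable_reward(completion: str, ground_truth: str, alpha: float = 1.0) -> float:
    """Return alpha if the extracted answer equals the ground truth, else 0.0.

    alpha is an illustrative reward scale, not a value taken from the source.
    """
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    try:
        correct = float(predicted) == float(ground_truth)
    except ValueError:
        # Non-numeric ground truth cannot be verified by this simple checker.
        return 0.0
    return alpha if correct else 0.0
```

In an RL loop, this scalar would stand in for the reward-model score on prompts that have a checkable answer, such as math problems.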

At release, no model in the LMSYS Chatbot Arena top 50 had published its post-training data. Tülu 3 releases all datasets, training code, and recipes, with comprehensive decontamination of the open datasets. Apache 2.0.
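
The text above does not say how decontamination is performed. One common approach, used here purely for illustration, is to drop any training example that shares a long word n-gram with an evaluation prompt. The function names, whitespace tokenization, and the default `n` are assumptions, not details taken from the source.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase whitespace-token n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(train_text: str, eval_texts: Iterable[str], n: int = 8) -> bool:
    """Flag a training example that shares any n-gram with an evaluation text.

    n=8 is an illustrative default; texts shorter than n tokens never match.
    """
    train_grams = ngrams(train_text, n)
    return any(train_grams & ngrams(eval_text, n) for eval_text in eval_texts)
```

A filtering pass would keep only the training examples for which `is_contaminated` returns `False`.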
