LIMO
Paper: "Less Is More for Reasoning." Demonstrates that 817 curated training samples can match models trained on 100x more data. Results: AIME24 63.3% (vs. 6.5% for prior SFT), MATH500 95.6% (vs. 59.2%), and a 45.8% absolute improvement on out-of-distribution benchmarks. Published at COLM 2025.
LIMR (Feb 2025) extended the approach to RL data scaling: 1,389 selected samples outperformed training on the full 8,523-sample dataset. LIMI (Sep 2025) applied it to agentic tasks: 78 samples achieved 73.5% on agency benchmarks, a 128x data reduction.
Paper
arXiv: 2502.03387
Venue: COLM 2025