MiMo-Audio
Specialized model for few-shot audio understanding and environmental sound classification. It is pre-trained on 100M+ hours of audio and achieves state-of-the-art (SOTA) results on speech intelligence and audio understanding benchmarks.
MiMo-Audio: Audio Language Models are Few-Shot Learners
Audio language model pre-trained on 100M+ hours of audio. It achieves SOTA results on speech intelligence and audio understanding benchmarks.
arXiv: 2512.23808