AI Lab Tracker
Voxtral
model
2025-07-01
Mistral
Speech model family. Voxtral Mini (3B) and Voxtral Small (24B) handle long-form audio understanding, with inputs of up to 30-40 minutes. Voxtral TTS (4B) adds zero-shot voice cloning. All models are natively multilingual and released under Apache 2.0.
Paper (arXiv)
HuggingFace (Mini)
HuggingFace (TTS)
Paper: arXiv:2507.13264
audio
multimodal
open-weight