Baichuan-Audio
paper, model
Open-source end-to-end audio large language model that integrates speech understanding and generation for real-time bilingual Chinese-English dialogue. Uses multi-codebook discretization at a 12.5 Hz frame rate to retain both semantic and acoustic information. Achieves 3.2% WER on the Fleurs zh test set, compared with 12.4% for Whisper-large-v3. Includes the OpenAudio-Bench evaluation benchmark.
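To make the 12.5 Hz frame rate concrete, here is a minimal arithmetic sketch of what it implies for frame duration and token counts. Only the 12.5 Hz figure comes from the description above; the function names are hypothetical, and the codebook count of 8 in the usage lines is an illustrative assumption, not a value stated in this entry.

```python
import math

# Frame rate quoted in the description above.
FRAME_RATE_HZ = 12.5


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Duration of audio covered by one discrete frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


def num_frames(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of frames needed to cover `duration_s` seconds (rounded up)."""
    return math.ceil(duration_s * frame_rate_hz)


def num_tokens(duration_s: float, num_codebooks: int,
               frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Total discrete tokens if every frame emits one token per codebook.

    `num_codebooks` is a hypothetical parameter; this entry does not
    state how many codebooks the tokenizer uses.
    """
    return num_frames(duration_s, frame_rate_hz) * num_codebooks


print(frame_duration_ms())                 # 80.0 ms per frame
print(num_frames(10.0))                    # 125 frames for 10 s of audio
print(num_tokens(10.0, num_codebooks=8))   # 1000 tokens under the assumed 8 codebooks
```

At this rate each frame spans 80 ms of audio, so sequence length grows slowly with audio duration, while the additional codebooks are what carry the finer acoustic detail.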
Outputs 2
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
paper
Technical paper describing the multi-codebook speech discretization, text-guided aligned speech generation, and two-stage pre-training strategy.
arXiv: 2502.17239
Baichuan-Audio (model)
model
Open-source end-to-end speech interaction model with Base and Instruct variants for bilingual Chinese-English audio dialogue.
Architecture DENSE
Parameters 10B