Qwen2.5-Omni-7B
End-to-end multimodal model that processes text, images, audio, and video and generates speech in real time. Uses a Thinker-Talker architecture. Over 80k downloads in its first week on Hugging Face.
Model Details
Architecture DENSE
Parameters 7B