MiniCPM-o 2.6
Omni-modal model (8B) capable of real-time speech-to-speech interaction and multimodal live streaming on mobile devices. Built on SigLip-400M + Whisper-medium-300M + ChatTTS-200M + Qwen2.5-7B. The first on-device model to reach GPT-4o-level performance across vision, speech, and multimodal live streaming.
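As a rough sanity check on the 8B figure, the nominal sizes in the component names can be summed. This is only a sketch: the numbers below are read off the names listed above (400M, 300M, 200M, 7B), not measured parameter counts, and the real checkpoints may differ. The dictionary and function names are illustrative.

```python
# Nominal component sizes in millions of parameters, taken from the
# component names in the description. These are labels, not measured
# counts (assumption); actual checkpoints may be larger or smaller.
NOMINAL_SIZES_M = {
    "SigLip-400M": 400,          # vision encoder
    "Whisper-medium-300M": 300,  # speech encoder
    "ChatTTS-200M": 200,         # speech decoder
    "Qwen2.5-7B": 7000,          # language model backbone
}


def total_billions(sizes_m):
    """Sum nominal sizes given in millions and return the total in billions."""
    return sum(sizes_m.values()) / 1000


total = total_billions(NOMINAL_SIZES_M)
print(f"Nominal total: {total:.1f}B")  # Nominal total: 7.9B
```

The nominal total of about 7.9B is consistent with the rounded 8B parameter count listed for the model.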
Model Details
Architecture DENSE
Parameters 8B