MiniCPM-V 2.6
An 8B-parameter multimodal model (built on Qwen2-7B) reported to surpass GPT-4V on single-image, multi-image, and video understanding tasks. It supports real-time inference on mobile devices, including iPads, and introduces spatio-temporal compression for video processing and Needle-in-a-Haystack retrieval for long-context multimodal inputs.
Model Details
Parameters: 8B
Paper
arXiv: 2408.01800