An efficient on-device ("end-side") multimodal LLM (MLLM) that combines the MiniCPM-2.4B language model with a SigLIP-400M vision encoder. It introduced high-resolution image support (up to 1.8M pixels) through adaptive tiling and was the first mobile-optimized MLLM aligned with RLHF-V to reduce hallucinations.
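The adaptive tiling idea can be illustrated with a small sketch. This is not the model's actual slicing code: the 448-pixel tile side, the helper names `choose_grid` and `tile_boxes`, and the selection rule (match the image's aspect ratio, then prefer more tiles) are all assumptions made for illustration. The pixel budget is set to 1344 x 1344, which is roughly the 1.8M pixels quoted above.

```python
import math

TILE = 448                 # assumed encoder input side in pixels (illustrative)
MAX_PIXELS = 1344 * 1344   # about 1.8M pixels, i.e. at most 9 tiles of 448 x 448


def choose_grid(width, height, tile=TILE, max_pixels=MAX_PIXELS):
    """Pick a (cols, rows) tile grid for an image (hypothetical helper).

    The grid uses no more tiles than the pixel budget allows and no more
    than the image area needs; among the candidates it picks the one whose
    aspect ratio is closest to the image's, preferring more tiles on ties.
    """
    tile_area = tile * tile
    max_tiles = max(1, max_pixels // tile_area)
    needed = max(1, math.ceil(width * height / tile_area))
    budget = min(max_tiles, needed)

    target = math.log(width / height)
    candidates = [
        (cols, rows)
        for cols in range(1, budget + 1)
        for rows in range(1, budget + 1)
        if cols * rows <= budget
    ]
    # Compare aspect ratios in log space so 2:1 and 1:2 are equally far from 1:1.
    return min(
        candidates,
        key=lambda g: (abs(math.log(g[0] / g[1]) - target), -(g[0] * g[1])),
    )


def tile_boxes(width, height, cols, rows):
    """Return (left, top, right, bottom) crop boxes in row-major order."""
    return [
        (c * width // cols, r * height // rows,
         (c + 1) * width // cols, (r + 1) * height // rows)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    cols, rows = choose_grid(4000, 3000)
    print(cols, rows, len(tile_boxes(4000, 3000, cols, rows)))
```

Each crop box would then be resized to the encoder's input size and encoded separately, so a wide or tall image keeps its aspect ratio instead of being squashed into a single square input.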

Model Details

Parameters: 2.8B
on-device, multimodal, vision, open-weight
