MiniCPM-V

Vision-language model series achieving GPT-4V level performance on mobile devices. MiniCPM-V 2.0 was the first multimodal model deployed natively on a smartphone. The series progressed from V 2.0 through V 2.6, adding world-class OCR and video understanding.

Paper (arXiv) | GitHub | HuggingFace (V 2.0) | HuggingFace (V 2.6)

Outputs 4

MiniCPM-V 2.0

model

First multimodal model deployed natively on a smartphone.

HuggingFace

MiniCPM-Llama3-V 2.5

model 2024-05-20

GPT-4V level performance in a 9B parameter package with world-class OCR capabilities.

HuggingFace

Parameters 9B

MiniCPM-V 2.6

model 2024-08-06

Introduced multi-image and video understanding, outperforming GPT-4V on major benchmarks.

HuggingFace

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

paper 2024-08-03

Paper (arXiv)

Citations 23

arXiv HTML

multimodal, on-device, vision, video
