Chinese text embedding model from SenseTime Research that achieves state-of-the-art results on the CMTEB benchmark across its 6 task types. It uses efficient multi-task hybrid loss training, a scaled-up embedding dimension, and Matryoshka Representation Learning (MRL) training so output vectors can be truncated to flexible dimensions.
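The MRL idea above means a long embedding can be cut down to its leading coordinates and re-normalized, trading accuracy for cheaper storage and search. A minimal numpy sketch of that truncation step (illustrative only, not Piccolo2's code; the 1792-d size matches the paper's scaled-up dimension, but the vectors here are random stand-ins for model output):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka-style truncation: keep the first `dim` coordinates
    and L2-renormalize so cosine similarity remains meaningful.
    (Sketch; the dims actually supported depend on the MRL training config.)"""
    truncated = emb[..., :dim]
    norm = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / np.clip(norm, 1e-12, None)

# Toy stand-ins for full-size model embeddings (hypothetical data)
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 1792))
full /= np.linalg.norm(full, axis=-1, keepdims=True)

small = truncate_embedding(full, 256)   # flexible, cheaper 256-d vectors
cos = float(small[0] @ small[1])        # cosine similarity still applies
print(small.shape, round(cos, 3))
```

Smaller truncation dims give cheaper vectors at some quality cost; MRL training is what makes the leading coordinates carry most of the signal.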

Outputs: 2

Piccolo2 Models

model

Variants

| Name | Parameters | Notes |
| --- | --- | --- |
| piccolo-large-zh-v2 | | Current v2 model |
| piccolo-large-zh | | Original v1 model |
| piccolo-base-zh | | |

Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

paper

arXiv: 2405.06932
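The "multi-task hybrid loss" in the title refers to pairing each training task type with a suitable objective, e.g. an InfoNCE-style contrastive loss for retrieval and a CoSENT-style ranking loss for STS. A toy numpy sketch of that routing idea (simplified; names and details here are illustrative, not the authors' implementation):

```python
import numpy as np

def info_nce(q, p, temperature=0.05):
    """InfoNCE over in-batch negatives (retrieval-style tasks).
    q, p: L2-normalized query/passage embeddings, shape (batch, dim);
    the positive for q[i] is p[i], on the diagonal."""
    logits = (q @ p.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())

def cosent(a, b, labels, temperature=0.05):
    """CoSENT-style ranking loss (STS-style tasks): pairs with a higher
    gold similarity score should receive a higher cosine similarity."""
    cos = (a * b).sum(axis=1) / temperature
    diffs = [cos[j] - cos[i]
             for i in range(len(cos)) for j in range(len(cos))
             if labels[i] > labels[j]]  # penalize wrongly ordered pairs
    return float(np.log1p(np.exp(diffs).sum()))

def hybrid_loss(batch):
    """Route each batch to a task-appropriate objective: one model,
    several losses (the 'multi-task hybrid' idea, simplified)."""
    if batch["task"] == "retrieval":
        return info_nce(batch["q"], batch["p"])
    return cosent(batch["a"], batch["b"], batch["labels"])

# Toy normalized embeddings (hypothetical data)
rng = np.random.default_rng(1)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
p = q + 0.1 * rng.normal(size=(4, 8)); p /= np.linalg.norm(p, axis=1, keepdims=True)
labels = np.array([3.0, 1.0, 2.0, 0.0])

loss_r = hybrid_loss({"task": "retrieval", "q": q, "p": p})
loss_s = hybrid_loss({"task": "sts", "a": q, "b": p, "labels": labels})
print(round(loss_r, 3), round(loss_s, 3))
```

In real training the two batches would share one encoder, so gradients from both objectives update the same embedding model.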

embedding · nlp · open-weight