DINO-X
model paper
Unified vision model for open-world object detection and understanding. It accepts text, visual, and customized prompts, and also supports prompt-free universal object detection. Trained on Grounding-100M, a dataset of more than 100M high-quality grounding samples. The Pro variant reports state-of-the-art zero-shot results: 56.0 AP on COCO, 59.8 AP on LVIS-minival, and 52.4 AP on LVIS-val. Supports detection, segmentation, pose estimation, and region captioning.
Outputs 2
DINO-X
model
Vision model for open-world object detection, reporting state-of-the-art zero-shot results on COCO and LVIS. Available in Pro and Edge variants, with support for text, visual, and customized prompts.
Variants
| Name | Parameters | Notes |
|---|---|---|
| DINO-X Pro | — | SOTA 56.0 AP COCO, 59.8 AP LVIS-minival zero-shot |
| DINO-X Edge | — | Efficient variant for edge deployment |
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
paper
Presents DINO-X, which uses a universal object prompt for prompt-free open-world detection and understanding, together with the Grounding-100M dataset.
arXiv: 2411.14347