Mask DINO

Unified Transformer-based framework for object detection and segmentation. Extends DINO with a mask prediction branch supporting instance, panoptic, and semantic segmentation. Achieves 54.7 AP on COCO instance, 59.5 PQ on COCO panoptic, and 60.8 mIoU on ADE20K semantic segmentation.

Paper (arXiv) | GitHub

Outputs 2

library

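Official implementation of Mask DINO. With a single unified architecture, the authors report the best results at the time of release on instance, panoptic, and semantic segmentation among models under one billion parameters.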

GitHub

Stars 1.5k

GitHub Repository →

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

paper

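Extends DINO with a mask prediction branch that is shared across instance, panoptic, and semantic segmentation. Each query embedding from the decoder is combined by dot product with a high-resolution pixel embedding map to predict one binary mask per query.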
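The dot-product step described above can be illustrated with a minimal NumPy sketch. This is not the official implementation: the function name `predict_masks`, the toy tensor shapes, and the sigmoid-plus-0.5 threshold used to binarize the logits are all assumptions made for illustration.

```python
import numpy as np


def predict_masks(query_embed, pixel_embed, threshold=0.5):
    """Hypothetical helper: one mask per query via a dot product.

    query_embed: (Q, C) array, one embedding per object query.
    pixel_embed: (C, H, W) array, per-pixel embedding map.
    Returns (logits, masks), both of shape (Q, H, W).
    """
    # Dot product between every query and every pixel embedding.
    logits = np.einsum("qc,chw->qhw", query_embed, pixel_embed)
    # Assumed binarization: sigmoid followed by a fixed threshold.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return logits, probs > threshold


# Toy inputs: 3 queries, 8 channels, a 4x5 pixel grid.
rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 8))
pixels = rng.standard_normal((8, 4, 5))
logits, masks = predict_masks(queries, pixels)
```

Because every query is scored against the same pixel embedding map, the same branch can serve instance, panoptic, and semantic outputs; only the interpretation of the queries changes.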

Paper (arXiv)

Venue CVPR 2023

Citations 20

arXiv HTML

vision, open-source
