DaX (大象)
Pathology vision foundation model from DAMO Academy (Alibaba Group), CAS Institute of Automation, and Hupan Lab. ViT-L backbone (plus a ViT-B "DaX-Base" sibling for scaling analysis), initialized from natural-image DINOv3 weights and adapted to whole-slide histopathology via a two-stage DINOv3-style self-supervised framework.
Pathology-specific design choices: continuous magnification training (multi-resolution patches at 2.5×, 5×, 10×, 20× anchors), cross-scale tissue views, orientation-agnostic and acquisition-robust augmentation, multi-input-size training, and Gram-anchored dense consistency for stable token-level representations across scales. The aim is to connect local cellular morphology with global tissue architecture in one model.
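To make the Gram-anchored dense consistency idea concrete, here is a minimal pure-Python sketch. It assumes the term follows the general Gram-anchoring recipe: compare the pairwise cosine-similarity (Gram) matrix of the student's patch tokens against that of a fixed anchor model, and penalize the squared difference. The function names, the mean-squared reduction, and the toy inputs are illustrative assumptions, not the paper's implementation.

```python
import math

def l2_normalize(tokens):
    """Scale each patch-token vector to unit length (zero vectors are left as-is)."""
    normalized = []
    for vec in tokens:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normalized.append([x / norm for x in vec])
    return normalized

def gram_matrix(tokens):
    """Pairwise cosine similarities between all patch tokens of one image."""
    z = l2_normalize(tokens)
    return [[sum(a * b for a, b in zip(u, v)) for v in z] for u in z]

def gram_consistency_loss(student_tokens, anchor_tokens):
    """Mean squared difference between student and anchor Gram matrices.

    Hypothetical helper: the exact loss and reduction used by DaX are not
    given in the summary above; this is one plausible form.
    """
    if len(student_tokens) != len(anchor_tokens):
        raise ValueError("student and anchor must have the same number of tokens")
    gs = gram_matrix(student_tokens)
    ga = gram_matrix(anchor_tokens)
    n = len(gs)
    total = sum((gs[i][j] - ga[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)

# Toy example: two patch tokens with 2-dimensional features.
student = [[1.0, 0.0], [0.0, 1.0]]
rescaled_anchor = [[2.0, 0.0], [0.0, 3.0]]   # same directions, different norms
collapsed_anchor = [[1.0, 0.0], [1.0, 0.0]]  # both tokens point the same way

loss_same_structure = gram_consistency_loss(student, rescaled_anchor)
loss_different_structure = gram_consistency_loss(student, collapsed_anchor)
```

Because the loss only looks at similarities between tokens, it constrains the relational structure of the patch features rather than their absolute values, which is why a rescaled anchor incurs no penalty in this sketch while a collapsed one does.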
Alongside the model, the paper introduces a WSI-level benchmark of 161 clinically meaningful tasks spanning 44 public datasets, 28,182 patients, and 34,394 slides, across four clinical domains (diagnostic pathology, biomarker/molecular profiling, tissue/specimen context, prognosis) and nine task categories. All models are evaluated under a fixed patient-level cross-validation protocol with fold-level statistical ranking. DaX achieves the highest mean performance and is consistently top-ranked at the task level across the full benchmark.
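The key property of a patient-level protocol is that all slides from one patient land in the same fold, so no patient contributes to both training and evaluation. The sketch below shows one simple way to build such folds. The function name, the greedy balancing rule, and the toy slide and patient IDs are assumptions for illustration; the benchmark's actual fold assignment is not described in this summary.

```python
from collections import defaultdict

def patient_level_folds(slide_to_patient, n_folds=5):
    """Assign every patient, with all of their slides, to exactly one fold.

    Hypothetical helper: patients are placed largest-first into the fold
    that currently holds the fewest slides, which keeps fold sizes roughly
    balanced and the result deterministic.
    """
    slides_by_patient = defaultdict(list)
    for slide_id, patient_id in slide_to_patient.items():
        slides_by_patient[patient_id].append(slide_id)

    folds = [[] for _ in range(n_folds)]
    ordered_patients = sorted(
        slides_by_patient, key=lambda p: (-len(slides_by_patient[p]), p)
    )
    for patient_id in ordered_patients:
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(sorted(slides_by_patient[patient_id]))
    return folds

# Toy example: six slides from four patients, split into two folds.
slide_to_patient = {
    "s1": "pA", "s2": "pA",
    "s3": "pB",
    "s4": "pC", "s5": "pC",
    "s6": "pD",
}
folds = patient_level_folds(slide_to_patient, n_folds=2)

# Each fold serves once as the held-out set; the rest form the training set.
for held_out_index, held_out in enumerate(folds):
    train = [s for k, fold in enumerate(folds) if k != held_out_index for s in fold]
    held_out_patients = {slide_to_patient[s] for s in held_out}
    train_patients = {slide_to_patient[s] for s in train}
    assert held_out_patients.isdisjoint(train_patients)
```

Fixing the fold assignment once and reusing it for every model is what makes fold-level comparisons between models meaningful, since each model then sees identical train and held-out patients.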