DynaBERT
model, paper
DynaBERT is a dynamic BERT model with adaptive width and depth, so its size and latency can be adjusted at runtime. It is trained by knowledge distillation from the full-sized model to smaller sub-networks, and it uses network rewiring so that important attention heads are shared across sub-networks. Published at NeurIPS 2020.
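The width-adaptive idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the importance scores, and the width multipliers below are hypothetical and chosen only for illustration. It assumes each attention head already has an importance score, reorders ("rewires") the heads from most to least important, and keeps the leading fraction for a given width multiplier, so every narrower sub-network reuses the heads of the wider ones.

```python
import math


def rewire_heads(importance):
    """Return head indices ordered from most to least important."""
    return sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)


def select_heads(importance, width_mult):
    """Keep the leading fraction of heads after rewiring (at least one head)."""
    order = rewire_heads(importance)
    keep = max(1, math.ceil(width_mult * len(order)))
    return order[:keep]


# Hypothetical importance scores for one 12-head layer (illustrative values only).
scores = [0.10, 0.90, 0.30, 0.70, 0.05, 0.60, 0.20, 0.80, 0.40, 0.50, 0.15, 0.25]

for width_mult in (1.0, 0.75, 0.5, 0.25):
    print(width_mult, select_heads(scores, width_mult))
```

Because the heads are sorted once and each sub-network takes a prefix of that ordering, the heads kept at a smaller width are always a subset of those kept at a larger width, which is one simple way to picture how important heads end up shared.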
Outputs (2)
DynaBERT
model: DynaBERT: Dynamic BERT with Adaptive Width and Depth
paper: arXiv:2004.04037