TinyBERT
Compressed BERT model using a novel Transformer distillation method. Achieves 96.8% of BERT-base performance on GLUE while being 7.5x smaller and 9.4x faster at inference. Introduces a two-stage learning framework that performs distillation at both the pre-training and fine-tuning stages. One of the most influential model-compression works for NLP.
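The layer-wise Transformer distillation mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the full TinyBERT objective also covers embedding-layer and prediction-layer losses, and all names here (`layer_distillation_loss`, `W`, the toy shapes) are illustrative assumptions.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def layer_distillation_loss(student_attn, teacher_attn,
                            student_hidden, teacher_hidden, W):
    """Per-layer distillation loss in the spirit of TinyBERT:
    MSE between attention matrices plus MSE between the student's
    hidden states (projected up by W to the teacher's width) and
    the teacher's hidden states."""
    attn_loss = mse(student_attn, teacher_attn)
    hidden_loss = mse(student_hidden @ W, teacher_hidden)
    return attn_loss + hidden_loss

# Toy shapes: batch=2, heads=4, seq=8; student hidden 128, teacher 256.
rng = np.random.default_rng(0)
s_attn = rng.random((2, 4, 8, 8))
t_attn = rng.random((2, 4, 8, 8))
s_hid = rng.random((2, 8, 128))
t_hid = rng.random((2, 8, 256))
W = rng.random((128, 256)) * 0.01  # stand-in for the learned projection
loss = layer_distillation_loss(s_attn, t_attn, s_hid, t_hid, W)
```

In the two-stage framework, a loss of this shape is minimized first against a general-domain teacher during pre-training, then against a fine-tuned teacher on task data.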
Outputs (2)
- Model: TinyBERT
- Paper: TinyBERT: Distilling BERT for Natural Language Understanding (arXiv:1909.10351)