BERT
Bidirectional Encoder Representations from Transformers. Pre-trains deep bidirectional representations by jointly conditioning on both left and right context in all layers via masked language modeling. 110M (Base) and 340M (Large) parameters.
BERT revolutionized NLP, pushing GLUE to 80.5% (a 7.7-point absolute improvement) and SQuAD v1.1 to 93.2 F1. It spawned an entire generation of models (RoBERTa, ALBERT, DeBERTa, XLNet) and became the dominant approach for search, classification, and NER. NAACL 2019. ~100K+ citations. By Devlin, Chang, Lee, and Toutanova. Apache 2.0.
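The masked-language-modeling corruption described above can be sketched in a few lines. This is an illustrative reimplementation of the paper's token-masking scheme (select ~15% of positions; of those, 80% become [MASK], 10% become a random token, 10% are left unchanged), not the authors' code; `mask_tokens` and the toy vocabulary are hypothetical names introduced here.

```python
import random

MASK = "[MASK]"
TOY_VOCAB = ["the", "cat", "sat", "on", "mat"]  # stand-in for a real wordpiece vocab

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Apply BERT-style masked-LM corruption to a token list.

    Returns (corrupted_tokens, targets), where targets is a list of
    (position, original_token) pairs the model must predict.
    """
    rng = rng or random.Random(0)
    out, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:          # select ~15% of positions
            targets.append((i, tok))
            r = rng.random()
            if r < 0.8:                        # 80%: replace with [MASK]
                out[i] = MASK
            elif r < 0.9:                      # 10%: replace with a random token
                out[i] = rng.choice(TOY_VOCAB)
            # remaining 10%: keep the original token (still predicted)
    return out, targets
```

The 10% random / 10% unchanged cases keep the pretraining input distribution closer to fine-tuning inputs, where [MASK] never appears.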
Model Details
Architecture: Dense
Parameters: 340M (Large) / 110M (Base)
Paper
arXiv: 1810.04805
Venue: NAACL 2019