OPT (Open Pre-trained Transformer)
Suite of decoder-only Transformers ranging from 125M to 175B parameters, openly released with pre-trained weights, training code, and logbooks. OPT-175B matched GPT-3 performance while requiring an estimated 1/7th the carbon footprint to develop.
OPT was the first GPT-3-scale model to be openly released, enabling researchers outside major industrial labs to study and build on frontier-scale models. The release included model weights, training code, and detailed training logbooks documenting failures and mid-run decisions. By Zhang, Roller, Goyal, et al.
Model Details
Architecture DENSE
Parameters 175B
Paper arXiv: 2205.01068