A suite of decoder-only Transformers ranging from 125M to 175B parameters, fully open-sourced with pre-trained weights, code, and training logbooks. OPT-175B matched GPT-3 performance at roughly 1/7th the carbon footprint.

OPT was the first fully open GPT-3-scale model, enabling researchers outside major industrial labs to study and build on frontier-scale models. The release included model weights, training code, and detailed logbooks documenting mid-run failures and decisions. By Zhang, Roller, Goyal et al.

Model Details

Architecture DENSE
Parameters 175B

Paper

arXiv: 2205.01068

open-weight · open-source

Related