OPT (Open Pre-trained Transformer)
Suite of decoder-only Transformers ranging from 125M to 175B parameters, openly released with pre-trained weights, training code, and logbooks. OPT-175B matched GPT-3 performance while requiring an estimated 1/7th the carbon footprint to develop.
OPT was the first GPT-3-scale model to be openly released, enabling researchers outside major industrial labs to study and build on frontier-scale models. The release included model weights, training code, and detailed training logbooks documenting failures and mid-run decisions. By Zhang, Roller, Goyal, et al.
Model Details
Architecture DENSE
Parameters 175B
Paper arXiv: 2205.01068