Protein language model. A 1.2B-parameter autoregressive Transformer trained on 280M protein sequences, conditioned on control tags encoding taxonomic and functional (keyword) annotations. Generates functional protein sequences steered by these conditioning tags.
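
The conditioning mechanism is simple to sketch: control tags are tokenized and prepended to the sequence, and the decoder then samples amino-acid tokens left to right. Below is a minimal toy sketch of that loop, assuming hypothetical tag names and a random-logit stand-in (`next_token_logits`) in place of the real 1.2B-parameter model:

```python
import numpy as np

# Toy vocabulary: 20 amino acids, two hypothetical control tags, end-of-sequence.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
CONTROL_TAGS = ["<tax:Bacteria>", "<kw:lysozyme>"]  # hypothetical tag names
VOCAB = CONTROL_TAGS + AMINO_ACIDS + ["<eos>"]
TOK = {t: i for i, t in enumerate(VOCAB)}

rng = np.random.default_rng(0)

def next_token_logits(context_ids):
    """Stand-in for the Transformer forward pass: a real model returns
    logits over VOCAB given the full left context (tags + residues)."""
    return rng.normal(size=len(VOCAB))

def generate(control_tags, max_len=300, temperature=1.0):
    """Autoregressive sampling: control tags form the prefix, then
    amino-acid tokens are sampled one at a time until <eos>."""
    ids = [TOK[t] for t in control_tags]  # conditioning prefix
    residues = []
    for _ in range(max_len):
        logits = next_token_logits(ids) / temperature
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        tok = int(rng.choice(len(VOCAB), p=probs))
        if VOCAB[tok] == "<eos>":
            break
        ids.append(tok)
        if VOCAB[tok] in AMINO_ACIDS:
            residues.append(VOCAB[tok])
    return "".join(residues)

print(generate(CONTROL_TAGS))
```

With the real model, the sampled residue string is a candidate protein whose taxonomy and function are steered by the chosen tags; the stub here just emits random residues.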

Published in Nature Biotechnology (2023). Generated artificial proteins that were experimentally validated as functional. One of the earliest demonstrations of large language models applied to protein engineering. By Madani, Krause, et al.

Model Details

Architecture: Dense
Parameters: 1.2B

Paper

arXiv: 2004.03497

Venue: Nature Biotechnology 2023

Tags: scientific, open-source