"Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning." Builds on BioT5 by incorporating IUPAC nomenclature, extensive biomedical data from bioRxiv and PubChem, multi-task instruction tuning, and improved numerical tokenization. Bridges molecular representations with textual descriptions for comprehensive biological entity understanding.

Evaluated on 3 problem types (classification, regression, generation) spanning 15 task types and 21 benchmark datasets, achieving state-of-the-art results on most of them. Targets drug discovery and bioinformatics applications. By Pei, Wu, Gao, Liang, Fang, Zhu, Xie, Qin, and Yan from Microsoft Research AI for Science Asia.


