Adaptive Language Model Training for Molecular Design

Publication Type
Journal Article
Publication Year
2023
Authors
Blanchard, Andrew
Bhowmik, Debsindhu
Fox, Zachary
Gounley, John
Glaser, Jens
Akpa, Belinda S.
Irle, Stephan
Abstract

The vast size of chemical space necessitates computational approaches to automate and accelerate the design of molecular sequences to guide experimental efforts for drug discovery. Genetic algorithms provide a useful framework to incrementally generate molecules by applying mutations to known chemical structures. Recently, the community has applied masked language models to automate the mutation process by leveraging large compound libraries to learn commonly occurring chemical sequences (that is, using tokenization) and predict rearrangements (that is, using mask prediction). Here, the authors consider how they can adapt language models to improve molecule generation for different optimization tasks. The authors use two different generation strategies for comparison, fixed and adaptive: 

  • The fixed strategy uses a pre-trained model to generate mutations; 
  • The adaptive strategy trains the language model on each new generation of molecules selected for target properties during optimization. 

The results show that the adaptive strategy allows the language model to more closely fit the distribution of molecules in the population. Therefore, for enhanced fitness optimization, the authors suggest using the fixed strategy during an initial phase, followed by the adaptive strategy. The authors demonstrate the impact of adaptive training by searching for molecules that optimize heuristic metrics (drug-likeness and synthesizability) as well as predicted protein binding affinity from a surrogate model. The results show that the adaptive strategy provides a significant improvement in fitness optimization compared to the fixed pre-trained model, empowering the application of language models to molecular design tasks.
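
The difference between the two strategies, and the suggested fixed-then-adaptive schedule, can be illustrated with a toy sketch. This is not the authors' implementation: `ToyMaskModel` is a hypothetical stand-in that learns only token frequencies instead of a real masked language model, molecules are plain strings over an assumed three-letter alphabet, and the names `optimize`, `adaptive`, and `warmup` are chosen here for illustration only.

```python
import random
from collections import Counter


class ToyMaskModel:
    """Hypothetical stand-in for a masked language model (token frequencies only)."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        # Uniform prior plays the role of the "pre-trained" state.
        self.counts = Counter({t: 1 for t in self.alphabet})

    def train(self, sequences):
        # Refit the token distribution to the given population.
        self.counts = Counter({t: 1 for t in self.alphabet})
        for seq in sequences:
            self.counts.update(seq)

    def fill_mask(self, rng):
        # Sample a replacement token in proportion to its learned frequency.
        weights = [self.counts[t] for t in self.alphabet]
        return rng.choices(self.alphabet, weights=weights, k=1)[0]


def mutate(seq, model, rng):
    # Mask one random position and let the model predict its replacement.
    i = rng.randrange(len(seq))
    return seq[:i] + model.fill_mask(rng) + seq[i + 1:]


def optimize(population, fitness, model, generations, adaptive, warmup=0, seed=0):
    """Genetic loop; with adaptive=True the model is retrained on each
    selected generation once `warmup` fixed generations have passed."""
    rng = random.Random(seed)
    size = len(population)
    for gen in range(generations):
        children = [mutate(rng.choice(population), model, rng) for _ in range(size)]
        # Keep the fittest individuals among parents and children.
        population = sorted(population + children, key=fitness, reverse=True)[:size]
        if adaptive and gen >= warmup:
            model.train(population)
    return population


if __name__ == "__main__":
    start = ["NNOO", "CNOC", "OOON", "NCNO"]
    toy_fitness = lambda s: s.count("C")  # placeholder score, not a real metric
    final = optimize(start, toy_fitness, ToyMaskModel("CNO"),
                     generations=5, adaptive=True, warmup=2)
    print(final)
```

Setting `adaptive=False` corresponds to the fixed strategy, where the model is never updated. Setting `adaptive=True` with `warmup > 0` mimics the suggested schedule of an initial fixed phase followed by adaptive training.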

Citation
Issue
1
Volume
15
Publication Title
Journal of Cheminformatics
ISSN
1758-2946
DOI
https://doi.org/10.1186/s13321-023-00719-7