What is ATOM?

The Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium began as a public-private partnership with a vision to revolutionize drug discovery by accelerating the development of more effective therapies for patients. Now a collaborative open-source effort, ATOM still aims to transform drug discovery from a slow, sequential, high-failure process into a rapid, integrated, patient-centric model.

The consortium was officially established in October 2017, with GSK, Lawrence Livermore National Laboratory, Frederick National Laboratory for Cancer Research, and the University of California, San Francisco as founding members. The ATOM Consortium grew to include the U.S. Department of Energy’s Brookhaven and Oak Ridge National Laboratories. ATOM is part of the NCI-DOE Collaboration and continues to deliver open-source predictive and generative AI software, models, and educational resources to the drug discovery community.

Current Contributing Organizations
  • Brookhaven National Laboratory
  • Frederick National Laboratory for Cancer Research
  • Lawrence Livermore National Laboratory 
  • Oak Ridge National Laboratory
  • University of California, San Francisco

Goal

A main goal of ATOM is to develop a sustainable preclinical drug design and optimization platform that uses computation to shorten the time needed to reach a successful drug candidate. ATOM's model-driven, patient-centric approach is intended to provide a faster, more streamlined process. By adopting this integrated model, the growing community aims to make drug discovery more effective and efficient for patients and researchers globally.

Datasets and Methods

ATOM is developing and validating a precompetitive, preclinical, model-driven platform for small-molecule drug discovery that addresses pharmacokinetics, toxicity, protein-ligand interactions, systems-level models, molecular design, and novel compound generation. To this end, the ATOM Modeling PipeLine (AMPL) was developed to enable advanced and emerging machine learning (ML) approaches to create FAIR (findable, accessible, interoperable, and reusable) computational models for drug discovery.

This modular pipeline has been designed to couple with generative molecular design to concurrently optimize multiple therapeutic, physical, and synthetic parameters necessary for drug discovery. ATOM’s active learning design platform aims to selectively incorporate results from mechanistic simulation and human-relevant data to generate and optimize new drug candidates significantly faster and with greater success than prior laboratory-intensive processes.

Modifications and Limitations

The combination of computing and machine learning has been shown to accelerate molecular optimization for applications ranging from cancer to infectious disease therapeutics. ATOM has successfully demonstrated multiparameter property optimization across efficacy, safety, pharmacokinetics, and developability. The AMPL system and the upcoming Generalized Generative Molecular Design platform work together to optimize candidate compounds to meet pharmaceutical parameters.

What is the ATOM Modeling PipeLine (AMPL)?

AMPL is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery. A key requirement for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of model building and evaluation. With this in mind, AMPL was developed as an end-to-end pipeline for building and sharing ML models that predict key pharma-relevant parameters.
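
As a rough illustration of what traceability can mean in practice, the sketch below records a fingerprint of the training data together with the settings used to build a model, so the same inputs can be identified later. This is not AMPL's implementation or API; the function names, the record fields, and the example rows are all hypothetical placeholders, and only the Python standard library is used.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a SHA-256 hex digest of a JSON-serializable object."""
    # Canonical JSON (sorted keys, fixed separators) gives a stable digest.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_model_record(training_rows, hyperparameters, random_seed):
    """Bundle the information needed to trace a model back to its inputs."""
    return {
        "data_sha256": fingerprint(training_rows),
        "hyperparameters": hyperparameters,
        "random_seed": random_seed,
        "n_training_rows": len(training_rows),
    }


# Hypothetical training data: [SMILES, placeholder response value] pairs.
rows = [["CCO", 0.12], ["c1ccccc1", 1.90], ["CC(=O)O", -0.17]]
# Illustrative setting names only; these are not AMPL parameters.
params = {"model_type": "random_forest", "n_trees": 100}

record = build_model_record(rows, params, random_seed=42)
print(json.dumps(record, indent=2))
```

Storing such a record next to a saved model makes it possible to check whether a later run used the same data and settings, since any change to the training rows changes the digest.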

As a general system, AMPL can create molecular predictive models based on nearly any data associated with a 2D SMILES string.
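
To make "data associated with a SMILES string" concrete, here is a minimal sketch of a table that pairs each compound's SMILES string with a measured value, plus a basic check that drops unusable rows. The column names (`compound_id`, `smiles`, `response`) and all values are invented placeholders for illustration, not a format required by AMPL.

```python
import csv
import io

# Hypothetical dataset: column names and values are placeholders.
RAW_CSV = """compound_id,smiles,response
cmpd-1,CCO,0.50
cmpd-2,c1ccccc1,-1.60
cmpd-3,,0.10
cmpd-4,CC(=O)O,not_measured
"""


def load_rows(text):
    """Keep rows that have a non-empty SMILES string and a numeric response."""
    kept, dropped = [], []
    for row in csv.DictReader(io.StringIO(text)):
        smiles = row["smiles"].strip()
        try:
            value = float(row["response"])
        except ValueError:
            # Non-numeric response: the row cannot be used for regression.
            dropped.append(row["compound_id"])
            continue
        if not smiles:
            # Missing structure: nothing to featurize.
            dropped.append(row["compound_id"])
            continue
        kept.append({"compound_id": row["compound_id"], "smiles": smiles, "value": value})
    return kept, dropped


kept, dropped = load_rows(RAW_CSV)
print(f"kept {len(kept)} rows, dropped {dropped}")
```

This check only tests that a SMILES field is present; validating that the string encodes a real structure would require a cheminformatics toolkit, which this sketch deliberately leaves out.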

To help grow the community, models may be shared through the NCI Predictive Oncology Model and Data Clearinghouse (MoDaC).

Datasets and Methods

The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open-source library DeepChem and supports an array of ML and molecular featurization tools. AMPL is an end-to-end, data-driven modeling pipeline that generates machine learning models to predict key safety and pharmacokinetic-relevant parameters. AMPL has been benchmarked on a large collection of pharmaceutical datasets covering a wide range of parameters.

Modifications and Limitations

This release marks the first public availability of the ATOM Modeling PipeLine (AMPL). Installation instructions for setting up and running AMPL are available on GitHub, along with basic examples of model fitting and prediction. AMPL has been deployed to and tested in multiple computing environments by ATOM members and contributors. Detailed documentation is included for the majority of the available features. ATOM is a living software project with active development. Check back for continued updates. Feedback is welcome and appreciated, and the project is open to contributions.

AMPL has been run on platforms ranging from Linux laptops and cloud computing to the largest exascale systems.

Limitations: A few hundred data points are typically recommended for initial model creation, with the expectation that more data points, guided by active learning, will be added to improve the initial predictive model. With larger datasets (on the order of a few thousand data points), initial models are likely to be more predictive in the chemical space covered by the data.
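
The idea of choosing which data points to add next can be sketched with a simple stand-in heuristic: pick the candidates that lie farthest from anything already measured, so that new measurements extend the chemical space covered by the data. This greedy farthest-point selection is not ATOM's active learning method, and it uses distance as a crude proxy for model uncertainty; the two-number descriptors below are hypothetical placeholders for real molecular features.

```python
import math


def nearest_labeled_distance(candidate, labeled):
    """Distance from a candidate to its closest already-measured point."""
    return min(math.dist(candidate, point) for point in labeled)


def select_batch(pool, labeled, batch_size):
    """Greedily pick candidates that are farthest from the measured set."""
    pool = list(pool)
    labeled = list(labeled)
    chosen = []
    for _ in range(min(batch_size, len(pool))):
        best = max(pool, key=lambda c: nearest_labeled_distance(c, labeled))
        chosen.append(best)
        pool.remove(best)
        # Treat the pick as measured so the next pick avoids its neighbors.
        labeled.append(best)
    return chosen


# Hypothetical 2-D descriptors for measured and unmeasured compounds.
measured = [(0.0, 0.0), (1.0, 0.0)]
candidates = [(0.1, 0.0), (0.5, 0.5), (3.0, 3.0), (2.9, 3.1)]

batch = select_batch(candidates, measured, batch_size=2)
print(batch)
```

Because each pick is added to the measured set before the next one is chosen, the batch avoids spending two measurements on near-duplicate candidates such as the two points close to (3, 3).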

Educational Materials

AMPL provides a series of tutorials, ranging from Google Colab notebooks for beginners to more advanced features for experienced programmers. AMPL can be run from the command line or imported into Python scripts and Jupyter notebooks. See the tutorials at this link.

Contact Information

Contact us at this link or by email at computational-cancer-tech@nih.gov.