ATOM Modeling PipeLine (AMPL)

Short Description

Offers an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.

Description and Impact
User Community
  • Primary:
    • ATOM partners and collaborators are the first adopters
    • Cancer research community with chemical datasets would benefit
  • Secondary: Free and open-source chemical informatics community
Impact
  • Free and open-source chemical property and activity modeling and prediction using machine learning
  • Industry/government/academic drug discovery
Description

The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.

AMPL extends the functionality of DeepChem and supports an array of machine learning and molecular featurization tools. AMPL is an end-to-end data-driven modeling pipeline to generate machine learning models that can predict key safety and pharmacokinetic-relevant parameters. AMPL has been benchmarked on a large collection of pharmaceutical datasets covering a wide range of parameters.

Hypothesis/Objective

The objective was to build global and local “baseline” models for a wide variety of molecular properties necessary for in silico drug discovery.

Technical Elements
Uniqueness

Other software programs can build the same types of models. However, AMPL is a free and open-source tool designed specifically for drug discovery datasets and models.

Usability
  • Robust enough to run at scale at ATOM and partner organizations.
  • Basic knowledge of chemistry datasets and machine learning is recommended.
  • Data pre-processing is included with AMPL.
  • Because of the pipeline's complexity, a new user needs approximately two weeks to get started.
  • Machine learning experience would be very beneficial.
Level of Documentation
Minimal
Components

Component details will be added as they become available.

Inputs
  • Type of data required: 2D chemical structures with associated quantitative chemical property or activity measurements
  • Source of data required: Any source
  • Public vs. Restricted: Both
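
As a minimal sketch of the kind of input described above, the following Python snippet reads a small table of 2D structures paired with quantitative measurements. The CSV layout, the column names `smiles` and `pIC50`, and the helper `load_measurements` are assumptions made for illustration; they are not a format or function defined by AMPL.

```python
import csv
import io

# Hypothetical example data: SMILES strings (2D structures) with one
# quantitative measurement per compound. Column names are illustrative only.
EXAMPLE_CSV = """smiles,pIC50
CCO,4.2
c1ccccc1,5.1
CC(=O)O,
CCN,3.8
"""

def load_measurements(text, structure_col="smiles", value_col="pIC50"):
    """Return (structure, value) pairs, skipping rows with no measurement."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        structure = (record.get(structure_col) or "").strip()
        raw_value = (record.get(value_col) or "").strip()
        if not structure or not raw_value:
            continue  # a row missing either field is unusable for supervised training
        rows.append((structure, float(raw_value)))
    return rows

pairs = load_measurements(EXAMPLE_CSV)
```

Here `pairs` holds only the three rows that have both a structure and a measurement; the row with an empty value is dropped.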
Input Data Format
Unspecified
Results and Publications
Results

Overall, AMPL produced many useful models for pharmaceutical safety properties. Predictive performance can vary based on assay type, feature type, model type, dataset size, and dataset split type.  
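
To illustrate why the dataset split type can change measured performance, the sketch below contrasts a random split with a group-based split in plain Python. The group labels stand in for a structural grouping such as a chemical scaffold; the function names and the toy data are hypothetical and do not represent AMPL's API.

```python
import random

def random_split(items, test_fraction=0.25, seed=0):
    """Shuffle items and hold out a fraction, ignoring any grouping."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def group_split(items, group_of, test_fraction=0.25):
    """Hold out whole groups so no group appears in both train and test."""
    groups = {}
    for item in items:
        groups.setdefault(group_of(item), []).append(item)
    # Assign the smallest groups to the test set until it is large enough.
    ordered = sorted(groups.values(), key=len)
    target = len(items) * test_fraction
    train, test = [], []
    for members in ordered:
        if len(test) < target:
            test.extend(members)
        else:
            train.extend(members)
    return train, test

# Toy compounds as (identifier, group) pairs; the groups are made up.
compounds = [("c1", "A"), ("c2", "A"), ("c3", "A"), ("c4", "B"),
             ("c5", "B"), ("c6", "C"), ("c7", "D"), ("c8", "D")]

train_r, test_r = random_split(compounds)
train_g, test_g = group_split(compounds, group_of=lambda c: c[1])
```

With the group-based split, test compounds come from groups the model never saw during training, which typically gives a more conservative performance estimate than a random split.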

Outputs

Machine learning model to predict chemical properties or activities

Use Cases
Use Cases

AMPL is an end-to-end, modular, and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. It extends the functionality of the open-source library DeepChem and supports an array of machine learning and molecular featurization tools.