Cellular-level Pilot | Computational Resources for Cancer Research

What is the Cellular-level Pilot?

The cellular-level pilot focused on developing predictive models of drug response in pre-clinical cancer screening to improve and expedite the selection and development of new targeted therapies for cancer patients. Drawing on both NCI and external data sources, the pilot created machine learning and deep learning models and methods for tasks including tumor type classification of omics data, drug response prediction, and dimension reduction of genetic features.

The cellular-level pilot produced numerous computational models, software, and datasets. Some of the resources developed in this pilot have been incorporated into the Innovative Methodologies and New Data for Predictive Oncology Model Evaluation (IMPROVE) project.

The cellular-level pilot is part of the NCI-DOE Collaboration.

Collaborators

The cellular-level pilot concluded in 2020. The co-leads were:
• James Doroshow, National Cancer Institute
• Yvonne Evrard, Frederick National Laboratory for Cancer Research
• Rick Stevens, Argonne National Laboratory/University of Chicago

Available Project Assets

The cellular-level pilot resources are publicly available in the Models & Software Catalogs, the Dataset Finder, and GitHub.

Drug Response Prediction Models

Unified Drug Response Predictor (Uno) – A dense neural network (DNN)-based model for single drug response prediction. Uno is part of the IMPROVE project.

Combination Drug Response Predictor (Combo) – A DNN-based model for combination drug response prediction. Combo is part of the IMPROVE project.

Imaging Generator for Tabular Data (IGTD) – A convolutional neural network (CNN)-based model that accepts any tabular data (e.g., the Cancer Drug Response Prediction dataset) and can be applied to different prediction tasks. IGTD is part of the IMPROVE project as a single drug response prediction model.

Single Drug Response Predictor (P1B3) – A DNN- or CNN-based model for single drug response prediction.

Classification Models

Tissue Type Classifier (TC1) – A CNN-based model that predicts the tumor type of a sample using RNA-seq data.

TUmor CLassIfication Predictor (TULIP) – An updated version of TC1 that can distinguish among more than thirty tumor types.

Canine TUmor CLassIfication Predictor (cTULIP) – A modified version of TULIP for canine tumor type prediction.

Normal-Tumor Pair Classifier (NT3) – A CNN-based model that uses RNA-seq data to predict whether a sample is from a tumor.

Mutation Classifier (P1B2) – A DNN-based model that predicts the tumor type using a patient’s somatic single nucleotide polymorphisms (SNPs).

Software

Learning Curves (LC) – An empirical method that evaluates whether more training data improves the performance of supervised learning models.
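
The idea behind a learning curve can be sketched in a few lines: train the same model on progressively larger subsets of the training data, score each model on a fixed validation set, and check whether the error keeps falling. The sketch below is only an illustration of that general technique, not the pilot's LC software. The function names (learning_curve, fit_power_law, fit_line, mean_abs_error), the toy linear model, and the synthetic data are all assumptions made for the example, as is the choice of a simple power-law fit to summarize the trend.

```python
import math
import random


def learning_curve(train, val, sizes, fit, score):
    """Train on growing prefixes of `train` and score each model on `val`."""
    curve = []
    for m in sizes:
        model = fit(train[:m])
        curve.append((m, score(model, val)))
    return curve


def fit_power_law(sizes, errors):
    """Least-squares fit of error ~ a * m**b in log-log space; returns (a, b)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_mean - b * x_mean)
    return a, b


def fit_line(rows):
    """Hypothetical toy model: closed-form 1D least squares, y = w * x + c."""
    n = len(rows)
    x_mean = sum(x for x, _ in rows) / n
    y_mean = sum(y for _, y in rows) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    w = sum((x - x_mean) * (y - y_mean) for x, y in rows) / sxx
    return w, y_mean - w * x_mean


def mean_abs_error(model, rows):
    """Mean absolute error of the fitted line on held-out rows."""
    w, c = model
    return sum(abs(y - (w * x + c)) for x, y in rows) / len(rows)


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): y = 3x + 1 plus noise.
    rng = random.Random(0)
    data = []
    for _ in range(2200):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 1.0)))
    train, val = data[:2000], data[2000:]

    curve = learning_curve(train, val, [5, 20, 80, 320, 1280], fit_line, mean_abs_error)
    for m, err in curve:
        print(f"train size {m:5d}  validation MAE {err:.4f}")

    # A negative exponent b suggests that more data is still reducing the error.
    # A pure power law ignores the noise floor, so treat the fit as a rough summary.
    a, b = fit_power_law([m for m, _ in curve], [e for _, e in curve])
    print(f"fitted trend: error ~ {a:.3f} * m^{b:.3f}")
```

In practice the toy model and metric would be swapped for the supervised model under study, and the subsets would usually be drawn at random rather than taken as prefixes, so that each size is a fair sample of the training set.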

Gene Expression Autoencoder (P1B1) – A dimension reduction method using a sparse autoencoder for gene expression data.

Enhanced COXEN – An enhanced version of the co-expression extrapolation (COXEN) method to select genes that are predictive of multiple drug activities for improving drug response prediction performance.

Autoencoder Node Saliency (ANS) – Methods that interpret what an autoencoder has learned during training by identifying the hidden nodes that contribute to the unsupervised learning task.

Semi-Supervised Feature Learning with Center Loss (CLRNA) – A semi-supervised, autoencoder-based machine learning procedure that learns a smaller set of gene expression features that are robust to batch effects.

Datasets

Cancer Drug Response Prediction – This dataset contains gene expression data, dose-response data, and drug descriptors for developing the cellular-level project drug response prediction models (e.g., Combo).

Available Use Cases

TUmor CLassIfication Predictor (TULIP)

Prediction Generalization Of Drug Response Prediction Models Across Cancer Cell Lines Datasets

Contact Information