
Cancer Drug Response Prediction Dataset
CDRP
Short Description:

Collection of metadata and DataFrames used by machine learning models in the Cellular-Level Pilot project to predict drug response in various cancer cell lines

Long Description:

This dataset contains:

  • DataFrames and supporting metadata used by Combo, Single Drug Response Predictor (formerly P1B3), Uno, UNOMT, CLRNA, and benchmarking machine learning models in the Cellular-Level Pilot project to predict drug response in various cancer cell lines.
  • Gene expression and drug response data for cancer cell lines from the NCI-60 Human Cancer Cell Line Screen (NCI-60), NCI ALMANAC, NCI Sarcoma (SCL), NCI Small Cell Lung Cancer (SCLC), Cancer Cell Line Encyclopedia (CCLE), Genomics of Drug Sensitivity in Cancer (GDSC), Genentech Cell Line Screening Initiative (gCSI), and Cancer Therapeutics Response Portal (CTRP) studies.
  • Drug molecular descriptors generated using the Dragon 7.0 and Mordred software packages.
  • Relevant metadata for the cancer cell lines and drug compounds.
  • A list of genes from the Library of Integrated Network-Based Cellular Signatures (LINCS) 1000 study. The LINCS1000 gene set was used as a reference to filter cancer cell line data.
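The LINCS1000 filtering step can be pictured as keeping only the gene columns that appear in the reference gene list. The sketch below assumes a hypothetical in-memory layout (cell line ID mapped to gene symbol mapped to expression value); the actual DataFrame layout is not described here, and the cell line IDs, genes, and values are placeholders.

```python
from typing import Dict, Set

# Hypothetical layout: cell line ID -> {gene symbol -> expression value}.
Expression = Dict[str, Dict[str, float]]


def filter_to_gene_set(expr: Expression, gene_set: Set[str]) -> Expression:
    """Keep only the genes that appear in the reference gene set (e.g., LINCS1000)."""
    return {
        cell_line: {gene: value for gene, value in profile.items() if gene in gene_set}
        for cell_line, profile in expr.items()
    }


# Placeholder data for illustration only; not taken from the dataset.
reference_genes = {"TP53", "EGFR", "MYC"}
expr = {
    "CELL_A": {"TP53": 5.1, "EGFR": 7.3, "GAPDH": 12.0},
    "CELL_B": {"MYC": 6.4, "ACTB": 11.2},
}

filtered = filter_to_gene_set(expr, reference_genes)
print(filtered)
# {'CELL_A': {'TP53': 5.1, 'EGFR': 7.3}, 'CELL_B': {'MYC': 6.4}}
```

With real data the same idea is a column selection on the gene expression DataFrame, restricted to the gene symbols in the LINCS1000 list.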

The TopN DataFrames for the Cellular-Level Pilot combine drug response data, gene expression data, and drug molecular descriptors into a single DataFrame to support building binary classification or regression machine learning models that predict drug response. These DataFrames cover the top N cancer types, that is, the cancer types with the most cell lines for which both RNA-Seq and drug response data are available. Models built on these DataFrames can be further evaluated and improved using learning curves, an empirical method that tracks model performance as the training set grows. For more information, refer to the following links.

GitHub repository links:

CLRNA

https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Semi-Supervised-Feature-Learning-with-Center-Loss

Combo

https://github.com/CBIIT/NCI-DOE-Colab-Pilot1-Combo-combination-drug-response-predictor

Learning Curve

https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Learning-Curve

Single Drug Response Predictor

https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Single-Drug-Response-Predictor

Uno

https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Unified-Drug-Response-Predictor

Source links:

Aspuru-Guzik VAE

https://github.com/aspuru-guzik-group/chemical_vae

CCLE

https://portals.broadinstitute.org/ccle/data

CTRP

https://portals.broadinstitute.org/ctrp/

Dose Response AUC

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753377/

GDC

https://portal.gdc.cancer.gov/

GDSC

https://www.cancerrxgene.org/downloads/bulk_download

LINCS1000

http://lincsportal.ccs.miami.edu/dcic-portal/

NCI ALMANAC

https://dtp.cancer.gov/ncialmanac/initializePage.do

NCI PDMR

https://pdmdb.cancer.gov/web/apex/f?p=101:41

NCI Sarcoma

https://sarcoma.cancer.gov/sarcoma/downloads.xhtml

NCI Small Cell Lung Cancer

https://sclccelllines.cancer.gov/sclc/

NCI-60 - CellMiner

https://discover.nci.nih.gov/cellminer/loadDownload.do

NCI-60 - DTP

https://dtp.cancer.gov/databases_tools/bulk_data.htm

gCSI

https://pharmacodb.pmgenomics.ca/datasets/4
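The TopN construction described for this dataset (rank cancer types by the number of cell lines with both RNA-Seq and drug response data, keep the top N, then join response, expression, and descriptor features into one table) can be sketched as follows. This is an illustration under assumptions, not the project's pipeline: the dict-based layout, the `GE_`/`DD_` column prefixes, the alphabetical tie-breaking, and the inner-join behavior are all hypothetical choices.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Profile = Dict[str, float]


def top_n_cancer_types(
    cancer_type: Dict[str, str], with_rna: Set[str], with_response: Set[str], n: int
) -> List[str]:
    """Rank cancer types by the number of cell lines that have both RNA-Seq and drug response data."""
    eligible = with_rna & with_response & set(cancer_type)
    counts = Counter(cancer_type[cell] for cell in eligible)
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [ctype for ctype, _ in ranked[:n]]


def build_rows(
    responses: List[Tuple[str, str, float]],
    cancer_type: Dict[str, str],
    expression: Dict[str, Profile],
    descriptors: Dict[str, Profile],
    keep_types: List[str],
) -> List[Dict[str, object]]:
    """Inner-join (cell line, drug, response) records with expression and descriptor features."""
    rows = []
    for cell, drug, value in responses:
        if cancer_type.get(cell) not in keep_types:
            continue
        if cell not in expression or drug not in descriptors:
            continue  # drop pairs missing either feature block
        row: Dict[str, object] = {"CELL": cell, "DRUG": drug, "RESPONSE": value}
        # Hypothetical prefixes keep gene names and descriptor names from colliding.
        row.update({f"GE_{gene}": v for gene, v in expression[cell].items()})
        row.update({f"DD_{name}": v for name, v in descriptors[drug].items()})
        rows.append(row)
    return rows


# Placeholder data for illustration only.
cancer_type = {"c1": "lung", "c2": "lung", "c3": "breast", "c4": "skin"}
expression = {"c1": {"TP53": 1.0}, "c2": {"TP53": 2.0}, "c3": {"TP53": 0.5}}
descriptors = {"d1": {"MW": 300.0}}
responses = [("c1", "d1", 0.8), ("c2", "d1", 0.4), ("c3", "d1", 0.9), ("c4", "d1", 0.1)]

top = top_n_cancer_types(cancer_type, set(expression), {c for c, _, _ in responses}, n=1)
rows = build_rows(responses, cancer_type, expression, descriptors, top)
print(top)        # ['lung']
print(len(rows))  # 2
```

An inner join is used here so that every row has a complete feature vector; a real pipeline could instead impute missing blocks.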

VERSION: Version 1
CONTENT TYPE: RNA-Seq, Drug Response, Drug Molecular Descriptors, SMILES
CDRP Models & Software
Drug MoA Information
Drug MoA
Short Description:

Collection of drug mechanism of action (MoA) information on FDA-approved and anti-cancer drugs

Long Description:

This dataset contains two text files with drug MoA information on both FDA-approved anti-cancer drugs and investigational drugs and compounds.

  • One text file provides the MoA information of compounds collected from the Drug Repurposing Hub of the Broad Institute. The data have been further processed to include compound name, PubChem ID, Broad Institute ID, SMILES, MoA description, and target gene symbols.
  • The other text file provides the MoA information of compounds/drugs included in the CTRP, GDSC, CCLE, and gCSI drug screening studies. The MoA information is curated from multiple sources and is grouped into categories. Target genes are represented by both gene symbols and Entrez IDs. Drug IDs used by the Cellular-Level Pilot project are also included.
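A typical first step with these files is to group compounds by MoA and to split each compound's target list into individual gene symbols. The sketch below assumes a tab-delimited layout with hypothetical `name`, `moa`, and `target` columns and a `|`-separated target field; the actual column names and delimiters in the released files may differ, and the compound names are placeholders.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical tab-delimited excerpt with placeholder compound names;
# the real files' column names and target-list delimiter may differ.
SAMPLE = (
    "name\tmoa\ttarget\n"
    "compound_1\tEGFR inhibitor\tEGFR\n"
    "compound_2\tEGFR inhibitor\tEGFR|ERBB2\n"
    "compound_3\ttubulin polymerization inhibitor\tTUBB\n"
)


def read_moa_table(text: str) -> List[Dict[str, str]]:
    """Parse a tab-delimited MoA table into one dict per compound."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def group_by_moa(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each MoA description to the compounds annotated with it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[rec["moa"]].append(rec["name"])
    return dict(groups)


def targets_by_compound(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Split the assumed pipe-separated target field into a list of gene symbols."""
    return {rec["name"]: rec["target"].split("|") for rec in records}


records = read_moa_table(SAMPLE)
print(group_by_moa(records))
print(targets_by_compound(records)["compound_2"])  # ['EGFR', 'ERBB2']
```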
VERSION: Version 1
CONTENT TYPE: Drug Molecular Descriptors, SMILES, Cell Line Drugs
CDRP Models & Software
Drug Molecular Descriptors
Drug Mol. Descrip.
Short Description:

Collection of drug molecular descriptor data

Long Description:

This dataset contains drug molecular descriptors generated using Dragon 7.0 and Mordred software packages.

  • One file provides the molecular descriptors for the drugs generated using the Dragon 7.0 software package, which calculates 5,270 molecular descriptors. These range from simple atom-type, functional-group, and fragment counts to topological, geometrical, and three-dimensional descriptors, and also include property estimates (such as logP) and drug-like and lead-like alerts (such as Lipinski’s alert). The Dragon 7.0 software package also generates path-based fingerprints (PFP) and extended connectivity fingerprints (ECFP) for drugs.
  • The other file provides the molecular descriptors for the drugs generated using the Mordred software package, which calculates 1,826 molecular descriptors.

For more information, refer to the GitHub Repository (https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Learning-Curve).
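The PFP and ECFP fingerprints in this dataset are binary vectors, and a common way to compare two drugs by their fingerprints is Tanimoto similarity: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below assumes each fingerprint is available as a string of `0`/`1` characters; how the Dragon output actually encodes fingerprints is not described here, so the `on_bits` parsing helper is hypothetical.

```python
from typing import Set


def on_bits(bit_string: str) -> Set[int]:
    """Convert a '0'/'1' fingerprint string into the set of indices whose bit is set."""
    return {i for i, bit in enumerate(bit_string) if bit == "1"}


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| for binary fingerprints."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here when both fingerprints are empty
    return len(fp_a & fp_b) / union


# Placeholder 8-bit fingerprints; real ECFP vectors are much longer.
a = on_bits("11110000")
b = on_bits("00111000")
print(tanimoto(a, b))  # 0.4
```

Representing fingerprints as sets of on-bit indices keeps the computation short; for large compound libraries a packed bit-array representation would be faster.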

VERSION: Version 1
CONTENT TYPE: Drug Molecular Descriptors
CDRP Models & Software
Integrated DataFrames of Most Prevalent Cancer Types - TopN [Top6/Top21]
TopN Cancer Types
Short Description:

Combined DataFrame that includes drug response data, gene expression data, and drug molecular descriptors for the top N cancer types

Long Description:

This asset contains five files. The TopN DataFrames for the Cellular-Level Pilot combine drug response data, gene expression data, and drug molecular descriptors into a single DataFrame to support building binary classification or regression machine learning models that predict drug response. These DataFrames cover the top N cancer types, that is, the cancer types with the most cell lines for which both RNA-Seq and drug response data are available. For more information, refer to the following source links:

Source CCLE

https://portals.broadinstitute.org/ccle/data

Source CTRP

https://portals.broadinstitute.org/ctrp/

Source GDSC

https://www.cancerrxgene.org/downloads/bulk_download

Source NCI-60 - DTP

https://dtp.cancer.gov/databases_tools/bulk_data.htm

Source gCSI

https://pharmacodb.pmgenomics.ca/datasets/4
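Because the TopN DataFrames are meant to support both regression and binary classification, a continuous response value has to be converted into class labels for the classification case. A minimal sketch, assuming the response is a dose-response AUC in which lower values indicate a stronger response, and assuming a cutoff of 0.5; both the cutoff and the direction are assumptions to check against the data documentation, not values stated on this page.

```python
from typing import List, Tuple


def binarize_response(auc_values: List[float], threshold: float = 0.5) -> List[int]:
    """Label 1 (responsive) when AUC is below the threshold, else 0.

    The default threshold and the "lower AUC = more responsive" direction are assumptions.
    """
    return [1 if auc < threshold else 0 for auc in auc_values]


def class_balance(labels: List[int]) -> Tuple[int, int]:
    """Return (number of responsive, number of non-responsive) samples."""
    positives = sum(labels)
    return positives, len(labels) - positives


# Placeholder AUC values for illustration only.
labels = binarize_response([0.21, 0.50, 0.87, 0.34])
print(labels)                 # [1, 0, 0, 1]
print(class_balance(labels))  # (2, 2)
```

A regression model would use the continuous value directly. For classification, checking the class balance after thresholding is worthwhile, because a fixed cutoff can leave one class much smaller than the other.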

VERSION: Version 1
CONTENT TYPE: RNA-Seq, Drug Response, Drug Molecular Descriptors
CDRP Models & Software