View Dataset Finder | Computational Resources for Cancer Research

Cancer Drug Response Prediction Dataset (CDRP)

PROJECT: Cellular-Level Pilot

DATASET DESCRIPTION: Collection of metadata and DataFrames used by machine learning models in the Cellular-Level Pilot project to predict drug response in various cancer cell lines

Long Description:

This dataset contains:

DataFrames and supporting metadata used by Combo, Single Drug Response Predictor (formerly P1B3), Uno, UNOMT, CLRNA, and benchmarking machine learning models in the Cellular-Level Pilot project to predict drug response in various cancer cell lines.
Gene expression and drug response data for cancer cell lines from the NCI-60 Human Cancer Cell Line Screen (NCI-60), NCI ALMANAC, NCI Sarcoma (SCL), NCI Small Cell Lung Cancer (SCLC), Cancer Cell Line Encyclopedia (CCLE), Genomics of Drug Sensitivity in Cancer (GDSC), Genentech Cell Line Screening Initiative (gCSI), and Cancer Therapeutics Response Portal (CTRP) studies, along with molecular descriptors generated using the Dragon 7.0 and Mordred software packages.
Relevant metadata for the cancer cell lines and drug compounds.
A list of genes from the Library of Integrated Network-Based Cellular Signatures (LINCS) 1000 study. The LINCS1000 gene set was used as a reference to filter cancer cell line data.
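
As an illustration of the gene-set filtering described in the last item, the following is a minimal sketch, assuming the expression data is held in a pandas DataFrame with one row per cell line and one column per gene symbol. The table contents, the cell line names, and the three-gene reference list are hypothetical stand-ins; the actual file layout in the dataset may differ.

```python
from typing import Iterable

import pandas as pd


def filter_to_gene_set(expr: pd.DataFrame, genes: Iterable[str]) -> pd.DataFrame:
    """Keep only the expression columns whose gene symbol is in the reference set."""
    wanted = set(genes)
    keep = [col for col in expr.columns if col in wanted]
    return expr.loc[:, keep]


# Toy expression table (hypothetical values): rows are cell lines, columns are genes.
expr = pd.DataFrame(
    {"TP53": [5.1, 4.8], "EGFR": [7.2, 6.9], "GAPDH": [11.0, 10.7]},
    index=["CELL_A", "CELL_B"],
)

# Hypothetical stand-in for the LINCS1000 gene list shipped with the dataset.
reference_genes = ["TP53", "EGFR", "MYC"]

filtered = filter_to_gene_set(expr, reference_genes)
print(filtered.columns.tolist())  # ['TP53', 'EGFR']
```

Genes in the reference list that are absent from the table (MYC here) are simply skipped, so the same list can be applied to studies that measured different gene panels.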

The TopN DataFrames for the Cellular-Level Pilot combine drug response data, gene expression data, and drug molecular descriptors into a single DataFrame to support building binary classification or regression machine learning models that predict drug response. These DataFrames include the top N cancer types, that is, those with the most cell lines for which both RNA-Seq and drug response data are available. The models can be further evaluated and improved using learning curves, an empirical method. For more information, refer to the following links.

GitHub repository links:

CLRNA	https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Semi-Supervised-Feature-Learning-with-Center-Loss
Combo	https://github.com/CBIIT/NCI-DOE-Colab-Pilot1-Combo-combination-drug-response-predictor
Learning Curve	https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Learning-Curve
Single Drug Response Predictor	https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Single-Drug-Response-Predictor
Uno	https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Unified-Drug-Response-Predictor
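
To make the learning-curve idea concrete, here is a generic sketch that is not taken from the Learning Curve repository: a model is trained on progressively larger subsets of the training data and its error on a fixed validation set is recorded at each size. The synthetic data, the one-feature linear model, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(train, valid, sizes):
    """Return (training-set size, validation MSE) pairs for growing subsets."""
    valid_x, valid_y = zip(*valid)
    curve = []
    for size in sizes:
        train_x, train_y = zip(*train[:size])
        model = fit_line(train_x, train_y)
        curve.append((size, mse(model, valid_x, valid_y)))
    return curve


# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)))

train, valid = data[:200], data[200:]
sizes = [10, 25, 50, 100, 200]
curve = learning_curve(train, valid, sizes)
for size, err in curve:
    print(f"{size:>4} samples -> validation MSE {err:.3f}")
```

Where the curve flattens, adding more training samples is unlikely to help much, which is the kind of guidance a learning curve is meant to give.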

Source links:

Aspuru-Guzik VAE	https://github.com/aspuru-guzik-group/chemical_vae
CCLE	https://portals.broadinstitute.org/ccle/data
CTRP	https://portals.broadinstitute.org/ctrp/
Dose Response AUC	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753377/
GDC	https://portal.gdc.cancer.gov/
GDSC	https://www.cancerrxgene.org/downloads/bulk_download
LINCS1000	http://lincsportal.ccs.miami.edu/dcic-portal/
NCI ALMANAC	https://dtp.cancer.gov/ncialmanac/initializePage.do
NCI PDMR	https://pdmdb.cancer.gov/web/apex/f?p=101:41
NCI Sarcoma	https://sarcoma.cancer.gov/sarcoma/downloads.xhtml
NCI Small Cell Lung Cancer	https://sclccelllines.cancer.gov/sclc/
NCI-60 - CellMiner	https://discover.nci.nih.gov/cellminer/loadDownload.do
NCI-60 - DTP	https://dtp.cancer.gov/databases_tools/bulk_data.htm
gCSI	https://pharmacodb.pmgenomics.ca/datasets/4

VERSION: Version 1

CONTENT TYPE: RNA-Seq, Drug Response, Drug Molecular Descriptors, SMILES

MoDaC Link: https://modac.cancer.gov/assetDetails?dme_data_id=NCI-DME-MS01-8088592

Drug MoA Information (Drug MoA)

PROJECT: Cellular-Level Pilot

DATASET DESCRIPTION: Collection of drug mechanism of action (MoA) information on FDA-approved and anti-cancer drugs

Long Description:

This dataset contains drug MoA information on both FDA-approved anti-cancer drugs and investigational drugs/compounds.

One text file provides the MoA information of compounds collected from the Drug Repurposing Hub of the Broad Institute. The data have been further processed to include compound name, PubChem ID, Broad Institute ID, SMILES, MoA description, and target gene symbols.
The other text file provides the MoA information of compounds/drugs included in the CTRP, GDSC, CCLE, and gCSI drug screening studies. The MoA information is curated from multiple sources and is grouped into categories. Target genes are represented by both gene symbols and Entrez IDs. Drug IDs used by the Cellular-Level Pilot project are also included.

VERSION: Version 1

CONTENT TYPE: Drug Molecular Descriptors, SMILES, Cell Line Drugs

MoDaC Link: https://modac.cancer.gov/assetDetails?dme_data_id=NCI-DME-MS01-5103786

Drug Molecular Descriptors (Drug Mol. Descrip.)

PROJECT: Cellular-Level Pilot

DATASET DESCRIPTION: Collection of drug molecular descriptor data

Long Description:

This dataset contains drug molecular descriptors generated using the Dragon 7.0 and Mordred software packages.

One file provides the molecular descriptors for the drugs generated using the Dragon 7.0 software package, which calculates 5,270 molecular descriptors. These range from the simplest atom types, functional groups, and fragment counts to topological, geometrical, and three-dimensional descriptors, and also include several property estimates (such as logP) and drug-like and lead-like alerts (such as the Lipinski alert). The Dragon 7.0 software package also generates path-based fingerprints (PFP) and extended connectivity fingerprints (ECFP) for drugs.
The other file provides the molecular descriptors for the drugs generated using the Mordred software package, which calculates 1,826 molecular descriptors.
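
As a rough illustration of what a drug-likeness alert such as the Lipinski alert checks, the sketch below applies the commonly cited rule-of-five thresholds to four precomputed properties. It does not call Dragon or Mordred, and the exact alert definition used by Dragon 7.0 may differ; the class, the function names, and the two example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient (logP)
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        p.mol_weight > 500,
        p.log_p > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(checks)


def lipinski_alert(p: Properties, max_violations: int = 1) -> bool:
    """Raise the alert when more than `max_violations` thresholds are exceeded."""
    return lipinski_violations(p) > max_violations


# Hypothetical property values, not taken from the dataset.
small = Properties(mol_weight=350.4, log_p=2.1, h_bond_donors=2, h_bond_acceptors=5)
large = Properties(mol_weight=812.0, log_p=6.3, h_bond_donors=7, h_bond_acceptors=14)

print(lipinski_violations(small), lipinski_alert(small))  # 0 False
print(lipinski_violations(large), lipinski_alert(large))  # 4 True
```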

For more information, refer to the GitHub Repository (https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Learning-Curve).

VERSION: Version 1

CONTENT TYPE: Drug Molecular Descriptors

MoDaC Link: https://modac.cancer.gov/assetDetails?dme_data_id=NCI-DME-MS01-5103467

Integrated DataFrames of Most Prevalent Cancer Types - TopN [Top6/Top21] (TopN Cancer Types)

PROJECT: Cellular-Level Pilot

DATASET DESCRIPTION: Combined DataFrame that includes drug response data, gene expression data, and drug molecular descriptors of top N cancer types

Long Description:

This asset contains five files. The TopN DataFrames for the Cellular-Level Pilot combine drug response data, gene expression data, and drug molecular descriptors into a single DataFrame to support building binary classification or regression machine learning models that predict drug response. These DataFrames include the top N cancer types, that is, those with the most cell lines for which both RNA-Seq and drug response data are available. For more information, refer to the following source links:

Source CCLE	https://portals.broadinstitute.org/ccle/data
Source CTRP	https://portals.broadinstitute.org/ctrp/
Source GDSC	https://www.cancerrxgene.org/downloads/bulk_download
Source NCI-60 – DTP	https://dtp.cancer.gov/databases_tools/bulk_data.htm
Source gCSI	https://pharmacodb.pmgenomics.ca/datasets/4
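
The way a TopN DataFrame could be assembled from its parts can be sketched as follows. This is only an illustration under stated assumptions: the column names (cell_line, drug, auc, cancer_type, and the ge_ and dd_ prefixes), the toy values, and the 0.5 AUC cutoff used to derive a binary label are all hypothetical and are not taken from the published files.

```python
import pandas as pd

# Toy inputs (hypothetical values and column names).
response = pd.DataFrame({
    "cell_line": ["C1", "C1", "C2", "C3", "C4"],
    "drug": ["D1", "D2", "D1", "D1", "D2"],
    "auc": [0.42, 0.81, 0.35, 0.90, 0.55],
})
cell_meta = pd.DataFrame({
    "cell_line": ["C1", "C2", "C3", "C4"],
    "cancer_type": ["lung", "lung", "breast", "skin"],
})
expression = pd.DataFrame({
    "cell_line": ["C1", "C2", "C3", "C4"],
    "ge_TP53": [5.1, 4.8, 6.0, 5.5],
    "ge_EGFR": [7.2, 6.9, 3.1, 4.4],
})
descriptors = pd.DataFrame({
    "drug": ["D1", "D2"],
    "dd_logP": [1.2, 3.4],
    "dd_MW": [310.5, 452.1],
})


def build_top_n(response, cell_meta, expression, descriptors, n):
    """Join response, expression, and descriptors for the n best-covered cancer types."""
    # Cell lines that have both drug response and expression data.
    have_both = set(response["cell_line"]) & set(expression["cell_line"])
    counts = (
        cell_meta[cell_meta["cell_line"].isin(have_both)]
        .groupby("cancer_type")["cell_line"]
        .nunique()
        .sort_values(ascending=False)
    )
    top_types = counts.head(n).index
    kept = cell_meta[cell_meta["cancer_type"].isin(top_types)]
    # Inner joins keep only rows present in every table.
    return (
        response.merge(kept, on="cell_line")
        .merge(expression, on="cell_line")
        .merge(descriptors, on="drug")
    )


top = build_top_n(response, cell_meta, expression, descriptors, n=1)
# Hypothetical binary label for classification: low AUC is treated as a response.
top = top.assign(responder=(top["auc"] < 0.5).astype(int))
print(top[["cell_line", "drug", "cancer_type", "auc", "responder"]])
```

The auc column can serve directly as a regression target, while the derived responder column supports binary classification; the cutoff would need to be chosen for the study at hand.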

VERSION: Version 1

CONTENT TYPE: RNA-Seq, Drug Response, Drug Molecular Descriptors

MoDaC Link: https://modac.cancer.gov/assetDetails?dme_data_id=NCI-DME-MS01-5103861
