Collection of metadata and DataFrames used by machine learning models in the Cellular-Level Pilot project to predict drug response in various cancer cell lines
This dataset contains:
- DataFrames and supporting metadata used by Combo, Single Drug Response Predictor (formerly P1B3), Uno, UNOMT, CLRNA, and benchmarking machine learning models in the Cellular-Level Pilot project to predict drug response in various cancer cell lines.
- Gene expression and drug response data for cancer cell lines from the NCI-60 Human Cancer Cell Line Screen (NCI-60), NCI ALMANAC, NCI Sarcoma (SCL), NCI Small Cell Lung Cancer (SCLC), Cancer Cell Line Encyclopedia (CCLE), Genomics of Drug Sensitivity in Cancer (GDSC), Genentech Cell Line Screening Initiative (gCSI), and Cancer Therapeutics Response Portal (CTRP) studies, along with drug molecular descriptors generated using the Dragon 7.0 and Mordred software packages.
- Relevant metadata for the cancer cell lines and drug compounds.
- A list of genes from the Library of Integrated Network-Based Cellular Signatures (LINCS) 1000 study. The LINCS1000 gene set was used as a reference to filter cancer cell line data.
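As an illustration of the gene-set filtering mentioned in the last item, the following is a minimal sketch using pandas. It assumes a layout with one row per cell line and one column per gene symbol; the actual files in this dataset may be organized differently. The gene names, cell line names, and the `filter_to_gene_set` helper are hypothetical placeholders, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy expression table: rows are cell lines, columns are genes.
expression = pd.DataFrame(
    {
        "GENE_A": [1.2, 0.4],
        "GENE_B": [3.1, 2.8],
        "GENE_X": [0.0, 5.5],
    },
    index=["CELL_1", "CELL_2"],
)

# Hypothetical stand-in for a reference gene list such as LINCS1000.
reference_genes = ["GENE_B", "GENE_A", "GENE_C"]


def filter_to_gene_set(expr, genes):
    """Keep only the expression columns whose gene symbol is in `genes`.

    Column order follows the expression table; genes in the reference
    list that are absent from the table are silently ignored.
    """
    wanted = set(genes)
    keep = [col for col in expr.columns if col in wanted]
    return expr[keep]


filtered = filter_to_gene_set(expression, reference_genes)
print(filtered)
```

Filtering on a fixed reference list keeps the same feature columns across studies, which makes it easier to combine cell line data from different sources.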
The TopN DataFrames for the Cellular-Level Pilot combine drug response data, gene expression data, and drug molecular descriptors into a single DataFrame to support building binary classification or regression machine learning models that predict drug response. Each DataFrame covers the top N cancer types, that is, the N cancer types with the most cell lines for which both RNA-Seq and drug response data are available. The models can be further evaluated and improved using learning curves, an empirical method. For more information, refer to the following links.
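The construction described above can be sketched roughly as follows. This is not the project's actual pipeline: the column names (`cell_line`, `drug`, `auc`, `cancer_type`), the toy values, the `build_top_n` helper, and the rule that an AUC below 0.5 marks a responder are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy inputs; real tables would be far larger.
response = pd.DataFrame({
    "cell_line": ["C1", "C1", "C2", "C3", "C4"],
    "drug": ["D1", "D2", "D1", "D1", "D2"],
    "auc": [0.30, 0.80, 0.45, 0.90, 0.20],
})
cell_meta = pd.DataFrame({
    "cell_line": ["C1", "C2", "C3", "C4"],
    "cancer_type": ["lung", "lung", "breast", "skin"],
})
expression = pd.DataFrame({
    "cell_line": ["C1", "C2", "C3", "C4"],
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [0.5, 0.6, 0.7, 0.8],
})
descriptors = pd.DataFrame({
    "drug": ["D1", "D2"],
    "desc_1": [10.0, 20.0],
    "desc_2": [0.1, 0.2],
})


def build_top_n(response, cell_meta, expression, descriptors, n,
                auc_threshold=0.5):
    """Join response, expression, and descriptors for the top-n cancer types."""
    # Keep cell lines that have both expression and response data.
    has_both = (cell_meta["cell_line"].isin(response["cell_line"])
                & cell_meta["cell_line"].isin(expression["cell_line"]))
    usable = cell_meta[has_both]

    # Rank cancer types by number of usable cell lines and keep the top n.
    top_types = usable["cancer_type"].value_counts().nlargest(n).index
    selected = usable[usable["cancer_type"].isin(top_types)]

    # One row per (cell line, drug) pair, with all features side by side.
    merged = (response
              .merge(selected, on="cell_line")
              .merge(expression, on="cell_line")
              .merge(descriptors, on="drug"))

    # Assumed binarization for classification: low AUC counts as a response.
    merged["responder"] = (merged["auc"] < auc_threshold).astype(int)
    return merged


top1 = build_top_n(response, cell_meta, expression, descriptors, n=1)
print(top1)
```

For a regression model, the continuous `auc` column would serve as the target and the `responder` column could be dropped. The binarization threshold is a modeling choice, not something prescribed by the dataset.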
GitHub repository links:
Source links:
- Aspuru-Guzik VAE
- CCLE
- CTRP
- Dose Response AUC
- GDC
- GDSC
- LINCS1000
- NCI ALMANAC
- NCI PDMR
- NCI Sarcoma
- NCI Small Cell Lung Cancer
- NCI-60 - CellMiner
- NCI-60 - DTP
- gCSI