Extends the original COXEN method to select genes that are predictive of the efficacies of multiple drugs for building general drug response prediction models that are not specific to a particular drug.
Users interested in the following topics:
- Primary: Cancer biology data modeling
- Secondary: Machine learning, bioinformatics, and computational biology
Enables building anticancer drug response prediction models using selected genes and drugs.
The Enhanced Co-Expression Extrapolation (COXEN) method extends the original COXEN method to select genes that are predictive of the efficacies of multiple drugs, so that general drug response prediction models can be built that are not specific to a particular drug. It was designed for applications in which the drug efficacy data of one set of cancer cases are used to predict the response of another set of cancer cases.
The objective was to create a method that selects genes that are predictive of response to multiple anticancer drugs, rather than to a single anticancer drug as in previous work.
This resource could be used for gene feature selection in drug response models, such as Single Drug Response Predictor, Unified Drug Response Predictor, and Combination Drug Response Predictor.
The original COXEN method has been successfully used in multiple studies to select genes for predicting the response of tumor cells to a specific drug treatment. The enhanced COXEN method instead selects genes that are predictive of the efficacies of multiple drugs, for building general drug response prediction models that are not specific to a particular drug. It first ranks the genes according to their prediction power for each individual drug and then takes the union of the top predictive genes across all drugs. From this union, the algorithm further selects the genes whose co-expression patterns are well preserved between the two sets of cancer cases, and these genes are used to build prediction models.
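The steps described above (per-drug ranking, union of top genes, co-expression preservation filter) can be illustrated with a minimal sketch. This is not the code in EnhancedCOXEN_Functions.py: the function names `pearson` and `select_genes_sketch`, the parameters `n_top_per_drug` and `n_final`, the use of absolute Pearson correlation as the measure of prediction power, the cases-by-genes data layout, and the toy numbers are all assumptions made for illustration. The sketch also assumes there are no missing values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when the correlation is undefined (fewer than two
    points, or one of the sequences is constant).
    """
    n = len(x)
    if n < 2:
        return 0.0
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def column(matrix, j):
    """Extract column j from a list-of-rows matrix."""
    return [row[j] for row in matrix]


def select_genes_sketch(expr1, resp1, expr2, n_top_per_drug, n_final):
    """Hypothetical illustration of the three selection steps.

    expr1, expr2: cases x genes expression values for set 1 and set 2.
    resp1: cases x drugs response values for set 1.
    Returns the sorted, zero-based indices of the selected genes.
    """
    n_genes = len(expr1[0])
    n_drugs = len(resp1[0])

    # Steps 1 and 2: rank genes per drug (assumed score: |Pearson r|
    # with the drug response) and take the union of each drug's top genes.
    candidates = set()
    for d in range(n_drugs):
        response = column(resp1, d)
        scores = [abs(pearson(column(expr1, g), response)) for g in range(n_genes)]
        ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
        candidates.update(ranked[:n_top_per_drug])
    candidates = sorted(candidates)

    # Step 3: for each candidate, compare its correlations with the other
    # candidates in set 1 against the same correlations in set 2.
    def coexpression(expr):
        cols = {g: column(expr, g) for g in candidates}
        return {
            g: [pearson(cols[g], cols[h]) for h in candidates if h != g]
            for g in candidates
        }

    co1 = coexpression(expr1)
    co2 = coexpression(expr2)
    preservation = {g: pearson(co1[g], co2[g]) for g in candidates}

    # Keep the candidates whose co-expression pattern is best preserved.
    ranked = sorted(candidates, key=lambda g: preservation[g], reverse=True)
    return sorted(ranked[:n_final])


# Toy data (made up): 5 cases x 4 genes in set 1, 2 drugs, 4 cases in set 2.
expr1 = [
    [1.0, 5.0, 2.0, 7.0],
    [2.0, 3.0, 1.0, 7.5],
    [3.0, 6.0, 4.0, 6.0],
    [4.0, 2.0, 3.0, 8.0],
    [5.0, 4.0, 6.0, 6.5],
]
resp1 = [[0.1, 0.9], [0.2, 0.4], [0.3, 0.8], [0.4, 0.2], [0.5, 0.6]]
expr2 = [
    [2.0, 4.0, 1.0, 6.0],
    [3.0, 5.0, 2.0, 7.0],
    [1.0, 6.0, 3.0, 5.0],
    [4.0, 3.0, 5.0, 8.0],
]

selected = select_genes_sketch(expr1, resp1, expr2, n_top_per_drug=2, n_final=2)
print(selected)
```

Taking the union before the preservation filter means that a gene that is highly predictive for only one drug can still reach the final set, provided its co-expression pattern carries over from set 1 to set 2.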
To use the software package in this repository for enhanced COXEN analyses, users must meet the following criteria:
- Possess the basic skills to run Python scripts.
- Be able to process the gene expression data and drug response data into the data format accepted by the enhanced COXEN package.
The scripts folder in this repository (https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Enhanced-COXEN/blob/main/Scripts) includes the following Python scripts:
- EnhancedCOXEN_Functions.py provides all the functions used by the enhanced COXEN method.
- Example_Run.py provides example code demonstrating how to use the functions for enhanced COXEN analysis.
The data folder in this repository (https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Enhanced-COXEN/blob/main/Data) includes a small dataset, composed of the following data files, for demonstrating the utility of this software package:
- Gene_Expression_Data_Of_Set_1.txt provides the gene expression data of cancer case set 1.
- Drug_Response_Data_Of_Set_1.txt provides the drug response data of cancer cases in set 1.
- Gene_Expression_Data_Of_Set_2.txt provides the gene expression data of cancer case set 2, for which drug response needs to be predicted.
The required input data are:
- The gene expression data of cancer case set 1 and cancer case set 2.
- The drug response data of cancer case set 1.
The results demonstrate that genes selected by the enhanced COXEN method always provide a statistically significant improvement in prediction performance (adjusted p-value ≤ 0.05) and increase the power of gene expression data for drug response prediction.
The output includes the indices of selected genes. For details of input and output data, such as data format, refer to the readme file, code comments, and example data included in the package.
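As a rough illustration of how the selected gene indices might be used downstream, the sketch below restricts an expression matrix to the selected genes before it is passed to a prediction model. The helper name `subset_genes`, the cases-by-genes layout, and zero-based indexing are assumptions; check the readme file and code comments for the conventions the package actually uses.

```python
def subset_genes(expression_rows, gene_indices):
    """Keep only the selected gene columns in a cases x genes matrix.

    Assumes zero-based indices; this is a hypothetical helper, not part
    of the enhanced COXEN package.
    """
    return [[row[g] for g in gene_indices] for row in expression_rows]


# Made-up values: two cancer cases, three genes.
expression = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
selected_gene_indices = [0, 2]  # stand-in for indices returned by the method

reduced = subset_genes(expression, selected_gene_indices)
print(reduced)
```

The same indices would be applied to both set 1 (for training) and set 2 (for prediction) so that the two matrices share the same gene features.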