Canine TUmor CLassIfication Predictor
(cTULIP)

Short Description

A human-data-trained model for canine primary tumor prediction using gene expression data.

Description and Impact
Impact

Enables identification of primary tumor types in misclassified or outlier samples in canine oncological datasets.

Abstract Summary

Introduction: The domestic dog, Canis familiaris, is quickly gaining traction as an advantageous model for use in the study of cancer, one of the leading causes of death worldwide. Naturally occurring canine cancers share clinical, histological, and molecular characteristics with the corresponding human diseases.

Methods: In this study, we take a deep-learning approach to test how similar the gene expression profiles of canine glioma and bladder cancer (BLCA) tumors are to those of the corresponding human tumors. We likewise develop a tool for identifying misclassified or outlier samples in large canine oncological datasets, analogous to the one developed for human datasets.

Results: We tested a number of machine learning algorithms and found that a convolutional neural network outperformed logistic regression and random forest approaches. We used a recently developed RNA-seq-based convolutional neural network, TULIP, to test the robustness of a human-data-trained primary tumor classification tool on cross-species primary tumor prediction. Our study ultimately highlights the molecular similarities between canine and human BLCA and glioma tumors, showing that protein-coding one-to-one homologs shared between humans and canines are sufficient to distinguish between BLCA and gliomas.

Discussion: The results of this study indicate that using protein-coding one-to-one homologs as the input-layer features of TULIP yields good primary tumor prediction in both humans and canines. Furthermore, our analysis shows that the selected features also contain the majority of features with known clinical relevance in BLCA and gliomas. Our success in using a human-data-trained model for cross-species primary tumor prediction also sheds light on the conservation of oncological pathways in humans and canines, further underscoring the importance of the canine model system in the study of human disease.

Hypothesis/Objective

To test the robustness of a human-data-trained primary tumor classification tool on cross-species primary tumor prediction.

Technical Elements
Uniqueness

The canine-adapted version of TULIP (cTULIP) is a Python-based deep learning classification tool that uses a 1-dimensional (1D) convolutional neural network (CNN) framework. It was trained on human RNA-seq data from 18 primary tumor types and uses 14,761 protein-coding genes shared between humans and canines (one-to-one orthologous mapping).
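
To make the idea of a 1D CNN classifier concrete, the following is a minimal standard-library sketch of one forward pass: a single convolution filter slides along an expression vector, followed by ReLU, max pooling, a dense layer, and a softmax over 18 classes. It is not the cTULIP architecture. The layer count, kernel width, pooling size, toy gene count, and random untrained weights are all assumptions chosen for illustration.

```python
import math
import random


def conv1d(signal, kernel):
    """Valid 1D cross-correlation: slide `kernel` along `signal`."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def dense(values, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax turning logits into class probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


N_GENES = 32    # toy size (assumption); the description cites 14,761 genes
N_CLASSES = 18  # number of primary tumor types cited in the description

rng = random.Random(0)  # fixed seed so the toy weights are reproducible
expression = [rng.random() for _ in range(N_GENES)]       # stand-in expression vector
kernel = [rng.uniform(-1, 1) for _ in range(5)]           # hypothetical, untrained filter
features = max_pool(relu(conv1d(expression, kernel)), 2)  # 28 values pooled to 14
weights = [[rng.uniform(-1, 1) for _ in features] for _ in range(N_CLASSES)]
biases = [0.0] * N_CLASSES

probabilities = softmax(dense(features, weights, biases))
predicted_class = max(range(N_CLASSES), key=probabilities.__getitem__)
```

Because the weights are random, the predicted class here is meaningless. The sketch only shows how a per-gene expression vector flows through convolution and pooling to a probability for each tumor type.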

Usability

Users must be familiar with GitHub and the command line (e.g., Linux/Unix) to run cTULIP. Some experience with Python may be useful for users who want to modify the code.

Components

The model weight files are located in the Model and Data Clearinghouse (MoDaC), and the source code is on GitHub.

Inputs

cTULIP accepts RNA-seq data expressed as FPKM-UQ in CSV, XLSX, and TSV file formats.
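
As a rough illustration of the input side, the sketch below reads a CSV or TSV expression table with Python's standard library and reorders one sample's values to a fixed gene list. The file layout (genes in rows, first column a gene identifier, one column per sample), the function names, and the choice to fill missing genes with 0.0 are assumptions for illustration and are not taken from the cTULIP source. XLSX input would need a third-party reader and is not handled here.

```python
import csv
import io
from pathlib import Path


def read_expression_table(handle, delimiter):
    """Parse a genes-by-samples table into {sample: {gene: value}}.

    Assumed layout (hypothetical): header row = gene column + sample IDs,
    each following row = gene ID followed by one FPKM-UQ value per sample.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    table = {sample: {} for sample in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        for sample, value in zip(samples, values):
            table[sample][gene] = float(value)
    return table


def load_expression_file(path):
    """Pick the delimiter from the file extension (CSV or TSV only)."""
    suffix = Path(path).suffix.lower()
    delimiters = {".csv": ",", ".tsv": "\t"}
    if suffix not in delimiters:
        raise ValueError(f"unsupported extension for this sketch: {suffix}")
    with open(path, newline="") as handle:
        return read_expression_table(handle, delimiters[suffix])


def align_to_genes(sample_values, gene_order, fill=0.0):
    """Return values in a fixed gene order; absent genes get `fill`."""
    return [sample_values.get(gene, fill) for gene in gene_order]


# Tiny in-memory example with made-up gene IDs and values.
demo = io.StringIO("gene,dog_1,dog_2\nGENE_A,1.5,0.0\nGENE_B,3.0,2.5\n")
demo_table = read_expression_table(demo, ",")
demo_vector = align_to_genes(demo_table["dog_1"], ["GENE_B", "GENE_C", "GENE_A"])
```

Aligning every sample to one fixed gene order matters because a trained network expects each input position to correspond to the same gene it saw during training.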

Results
Outputs

Prediction files in CSV format.
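
One way such a prediction file might be used to find potentially misclassified samples is sketched below: each predicted tumor type is compared with the annotated type, and disagreements are collected for review. The column names `sample` and `predicted`, the annotation mapping, and the demo rows are hypothetical. The actual layout of the cTULIP prediction files is not described here and may differ.

```python
import csv
import io


def flag_discordant(prediction_rows, annotated_types):
    """Return samples whose predicted tumor type differs from the annotation.

    `prediction_rows` yields dicts with hypothetical keys "sample" and
    "predicted"; `annotated_types` maps sample ID -> annotated tumor type.
    Samples without an annotation are skipped.
    """
    flagged = []
    for row in prediction_rows:
        annotated = annotated_types.get(row["sample"])
        if annotated is not None and annotated != row["predicted"]:
            flagged.append((row["sample"], annotated, row["predicted"]))
    return flagged


# Made-up prediction file contents, for illustration only.
demo_csv = io.StringIO(
    "sample,predicted\n"
    "dog_1,BLCA\n"
    "dog_2,glioma\n"
    "dog_3,BLCA\n"
)
annotations = {"dog_1": "BLCA", "dog_2": "BLCA", "dog_3": "BLCA"}
discordant = flag_discordant(csv.DictReader(demo_csv), annotations)
```

A discordant sample is only a candidate for re-examination, since the disagreement could come from either the annotation or the model.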