This is a winning submission from the 2024 AI Data Readiness Challenge.
It explores the AI data readiness of Cancer Research Data Commons (CRDC) data.
This asset contains the submission from Abhishek Jha and team at Elucidata, the first-place winner of Tier 2: Multi-Modal Data Challenge. In this tier, participants had to train an AI/ML model using data from more than one data class.
Use Case: Tier 2 (Multimodal data), Category 4 (Diagnosis)
General use case: Classify cancer cells versus healthy cells in a specific tissue
Specific use case: Use transcriptomics (RNA-seq) data from GDC and proteomics data from PDC to distinguish primary tumor from normal solid tissue in the lung, in the context of lung squamous cell carcinoma
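To make the multimodal idea concrete, the following is a minimal sketch of one common way to combine two data classes: keep only the samples measured in both modalities, concatenate their features (early fusion), and fit a simple classifier. It is not the winning team's method. The nearest-centroid classifier is a stand-in chosen for brevity, and the sample IDs, feature values, label strings, and function names (`fuse_modalities`, `fit_centroids`, `predict`) are all hypothetical.

```python
from statistics import mean


def fuse_modalities(rna, protein):
    """Early fusion: keep samples present in both modalities, concatenate features."""
    shared = sorted(set(rna) & set(protein))
    return {sid: rna[sid] + protein[sid] for sid in shared}


def fit_centroids(features, labels):
    """Compute the mean feature vector for each class label."""
    by_class = {}
    for sid, vec in features.items():
        by_class.setdefault(labels[sid], []).append(vec)
    return {cls: [mean(col) for col in zip(*vecs)] for cls, vecs in by_class.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls], vec))


# Toy, made-up values standing in for RNA-seq expression and protein abundance.
rna = {
    "S1": [8.0, 1.0], "S2": [7.5, 1.2],
    "S3": [2.0, 6.0], "S4": [2.2, 5.5],
    "S5": [9.0, 0.5],  # no proteomics measurement, so it is dropped by fusion
}
protein = {"S1": [3.0], "S2": [2.8], "S3": [0.5], "S4": [0.7]}
labels = {
    "S1": "Primary Tumor", "S2": "Primary Tumor",
    "S3": "Solid Tissue Normal", "S4": "Solid Tissue Normal",
}

fused = fuse_modalities(rna, protein)
centroids = fit_centroids(fused, labels)
prediction = predict(centroids, [7.8, 1.1, 2.9])
print(prediction)
```

In practice the two modalities would need matching on a shared case or sample identifier and normalization before fusion; those steps are omitted here.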
A data scientist can run the provided scripts after obtaining the appropriate data from the CGC.
The documentation, pre-processing, and model-related files are available in the Model and Data Clearinghouse (MoDaC). The data can be accessed via the Cancer Genomics Cloud (CGC).
The submission includes an assessment of dataset readiness and model predictions.