AIDR Challenge Tier 2: Jha
(AIDR)

Short Description

This is a winning submission from the 2024 AI Data Readiness Challenge.

Description and Impact
Impact

Explores the AI data readiness of data from the Cancer Research Data Commons (CRDC).

Hypothesis/Objective

This asset contains the submission from Abhishek Jha and team at Elucidata, the first-place winner of the Tier 2: Multi-Modal Data Challenge. In this tier, participants must train an AI/ML model using data from more than one data class.

Technical Elements
Uniqueness

Use Case: Tier 2 (Multimodal data), Category 4 (Diagnosis)  

General use case: Classify cancer cells versus healthy cells in a specific tissue  

Specific use case: Use transcriptomics (RNA-seq) data from GDC and proteomics data from PDC to distinguish primary tumor from normal solid tissue in the lung, in the context of lung squamous cell carcinoma
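
As a rough illustration of what combining two data classes for this kind of classification can look like, the sketch below joins per-sample RNA-seq and proteomics feature vectors on a shared sample ID (early fusion) and then applies a nearest-centroid classifier. This is not the submission's method. The sample IDs, feature values, function names, and the choice of a nearest-centroid classifier are all assumptions made for illustration, and the toy numbers are not real GDC or PDC data.

```python
# Illustrative sketch only: toy values, hypothetical sample IDs and function names.
# Not the submission's pipeline and not real GDC/PDC data.
from statistics import mean


def fuse_modalities(rna, protein):
    """Early fusion: keep samples present in both modalities and
    concatenate their feature vectors (RNA features first)."""
    shared = sorted(set(rna) & set(protein))
    return {sid: rna[sid] + protein[sid] for sid in shared}


def fit_centroids(features, labels):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(labels[sid] for sid in features):
        rows = [features[sid] for sid in features if labels[sid] == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, vector):
    """Return the label of the nearest centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical per-sample features: two RNA-seq values and one protein value.
rna = {
    "S1": [5.0, 1.0], "S2": [4.8, 1.2],
    "S3": [1.0, 4.9], "S4": [1.1, 5.2],
    "S5": [3.0, 3.0],  # no proteomics measurement, so it is dropped on fusion
}
protein = {"S1": [2.0], "S2": [2.1], "S3": [0.2], "S4": [0.1]}
labels = {"S1": "tumor", "S2": "tumor", "S3": "normal", "S4": "normal"}

fused = fuse_modalities(rna, protein)
centroids = fit_centroids(fused, labels)
prediction = predict(centroids, [4.9, 1.1, 2.0])
```

Joining on the shared sample ID first means that only samples measured in both modalities reach the classifier, which is one simple way to handle cases that are missing one data class.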

Usability

A data scientist can run the provided scripts after obtaining the appropriate data from the Cancer Genomics Cloud (CGC).

Components

The documentation, pre-processing files, and model-related files are available in the Model and Data Clearinghouse (MoDaC). The data can be accessed via the Cancer Genomics Cloud (CGC).

Results
Outputs

Assessment of dataset readiness and model predictions.