Image Generator for Tabular Data
(IGTD)

Short Description

Transforms tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image.

Description and Impact
User Community

Users interested in the following subjects:

  • Primary: Machine learning and computational data modeling
  • Secondary: Bioinformatics and computational biology
Impact

The image representations allow convolutional neural networks (CNNs) to be trained on tabular data for prediction tasks.

Description

Image Generator for Tabular Data (IGTD) is an algorithm for transforming tabular data into images. It assigns each feature to a unique pixel position in the image representation, placing similar features at neighboring pixels and dissimilar features at pixels that are far apart. The algorithm then generates one image per sample, in which each pixel's intensity reflects the value of the corresponding feature in that sample. One of the most important applications of the generated images is building convolutional neural networks (CNNs) on the image representations in subsequent analysis.
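
The final step described above, turning each sample into an image once a feature-to-pixel assignment is known, can be sketched roughly as follows. This is a minimal NumPy illustration, not the package's code: the function name table_to_images, the feature_order array, the row-by-row pixel numbering, and the per-feature min-max scaling are all assumptions made for the example.

```python
import numpy as np

def table_to_images(data, feature_order, num_row, num_col):
    """Turn each row of `data` into a num_row x num_col image.

    feature_order[k] is the index of the feature shown at pixel k,
    with pixels counted row by row (hypothetical convention).
    """
    data = np.asarray(data, dtype=float)
    n_samples, n_features = data.shape
    if n_features != num_row * num_col:
        raise ValueError("need exactly one feature per pixel")
    # Assumption: scale each feature (column) to [0, 1] across samples.
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant features map to 0
    scaled = (data - col_min) / col_range
    # Reorder columns by the assignment, then fold each row into an image.
    return scaled[:, feature_order].reshape(n_samples, num_row, num_col)

# Toy table: 3 samples, 4 features, placed on 2 x 2 images.
data = np.array([[0.0, 10.0, 5.0, 1.0],
                 [2.0, 20.0, 5.0, 3.0],
                 [4.0, 30.0, 5.0, 2.0]])
feature_order = np.array([3, 0, 1, 2])
images = table_to_images(data, feature_order, num_row=2, num_col=2)
```

Here images has one 2 x 2 array per sample, and the same feature always lands on the same pixel, so a CNN sees a consistent layout across samples.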

Hypothesis/Objective

The objective was to transform tabular data into images for subsequent deep learning analysis using CNNs.

Resource Role

This resource can be used with other resources utilizing a CNN framework.

Technical Elements
Uniqueness

IGTD is a novel algorithm for transforming tabular data into images. Compared with existing methods for converting tabular data into images, IGTD has several advantages:

  • IGTD does not require prior knowledge about the features. Thus, users can use it even without domain knowledge.
  • IGTD generates compact image representations, in which each pixel represents a unique feature. Deep learning based on compact image representations usually requires less memory and time to train the prediction model.
  • IGTD generates compact image representations quickly, and these representations better preserve the feature neighborhood structure.
  • CNNs trained on IGTD images achieve better (or similar) prediction performance compared with both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
  • IGTD provides a flexible framework that users can extend to accommodate diverse data and requirements. Within this framework, users can choose the size and shape of the image representation.
Usability

To use the software package in this repository, users must meet the following criteria:

  • Possess the basic skills to program and run Python scripts.
  • Be able to process the input data into the format accepted by the package.
  • Understand the input parameters of the IGTD algorithm, so that they can set the parameters appropriately to execute the algorithm.
Level of Documentation
Minimal
Components

The Scripts folder in this repository includes the following Python scripts:

  • IGTD_Functions.py provides all the functions used by the IGTD algorithm. Comments in the code explain the input and output of each function.
  • Example_Run.py provides examples that demonstrate how to run the IGTD algorithm.

The Data folder in this repository includes a small dataset for demonstrating the algorithm: a gene expression dataset covering 100 cancer cell lines and 1600 genes.

Inputs

The required input data are tabular data, in which rows are samples and columns are features.
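
Because each pixel holds exactly one feature, the number of feature columns has to match the number of pixels in the chosen image shape. The sketch below shows one possible way to prepare such a table. It is an illustrative assumption, not the package's documented preprocessing: the function name select_features_for_image and the choice to keep the highest-variance columns are hypothetical, and the table is synthetic.

```python
import numpy as np

def select_features_for_image(table, num_row, num_col):
    """Keep the num_row * num_col highest-variance columns of `table`.

    `table` has samples in rows and features in columns.
    Returns the reduced table and the indices of the kept columns.
    """
    table = np.asarray(table, dtype=float)
    if table.ndim != 2:
        raise ValueError("expected a 2-D table: rows = samples, columns = features")
    n_pixels = num_row * num_col
    if table.shape[1] < n_pixels:
        raise ValueError("fewer features than pixels")
    # Indices of the highest-variance columns, kept in original column order.
    keep = np.sort(np.argsort(table.var(axis=0))[-n_pixels:])
    return table[:, keep], keep

rng = np.random.default_rng(0)
# Synthetic table: 20 samples, 12 features with increasing spread.
table = rng.normal(size=(20, 12)) * np.arange(1, 13)
subset, kept_columns = select_features_for_image(table, num_row=3, num_col=3)
```

With the demonstration dataset's dimensions, 1600 genes would fill a 40 x 40 image exactly, so no selection step would be needed for that shape.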

Input Data Type
Agnostic
Input Data Format
Tabular
Results and Publications
Results

The IGTD work makes the following contributions:

  • IGTD transforms tabular data into images using a novel approach, which minimizes the difference between feature distance ranking and pixel distance ranking. The optimization keeps similar features close in the image representation.
  • Compared with existing approaches for transforming tabular data into images, IGTD does not require domain knowledge and provides compact image representations that better preserve the feature neighborhood structure.
  • Using drug response prediction as an example, CNNs trained on IGTD image representations provide better (or similar) prediction performance compared with CNNs trained on other image representations and prediction models trained on the original tabular data.
  • IGTD is a flexible framework that can be extended to accommodate diversified data and requirements.
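
The ranking-based objective in the first contribution can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the package's implementation: feature distance is taken as 1 minus the Pearson correlation, pixel distance as the Euclidean distance between pixel coordinates, tied distances receive distinct ranks by position, and a plain greedy pairwise-swap search stands in for the package's optimization procedure. All function names are hypothetical.

```python
import numpy as np

def rank_lower_triangle(dist):
    """Rank pairwise distances in the strict lower triangle (1 = smallest)."""
    rows, cols = np.tril_indices_from(dist, k=-1)
    values = dist[rows, cols]
    ranks = np.empty(values.size)
    # Ordinal ranks; ties are broken by position (a simplification).
    ranks[np.argsort(values, kind="stable")] = np.arange(1, values.size + 1)
    ranked = np.zeros_like(dist, dtype=float)
    ranked[rows, cols] = ranks
    return ranked + ranked.T  # symmetric, zeros on the diagonal

def feature_rank_matrix(data):
    """Rank feature pairs by 1 - Pearson correlation (assumed distance)."""
    return rank_lower_triangle(1.0 - np.corrcoef(data, rowvar=False))

def pixel_rank_matrix(num_row, num_col):
    """Rank pixel pairs by Euclidean distance between their coordinates."""
    coords = np.array([(r, c) for r in range(num_row) for c in range(num_col)],
                      dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return rank_lower_triangle(np.sqrt((diff ** 2).sum(axis=-1)))

def ranking_error(feature_ranks, pixel_ranks, order):
    """Sum of absolute rank differences when feature order[k] sits at pixel k."""
    reordered = feature_ranks[np.ix_(order, order)]
    return np.abs(np.tril(reordered - pixel_ranks, k=-1)).sum()

def greedy_swaps(feature_ranks, pixel_ranks, n_sweeps=5):
    """Keep a swap of two positions only if it lowers the error."""
    order = np.arange(feature_ranks.shape[0])
    best = ranking_error(feature_ranks, pixel_ranks, order)
    for _ in range(n_sweeps):
        for i in range(order.size):
            for j in range(i + 1, order.size):
                order[[i, j]] = order[[j, i]]
                err = ranking_error(feature_ranks, pixel_ranks, order)
                if err < best:
                    best = err
                else:
                    order[[i, j]] = order[[j, i]]  # undo the swap
    return order, best

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 9))  # synthetic: 50 samples, 9 features
feature_ranks = feature_rank_matrix(data)
pixel_ranks = pixel_rank_matrix(3, 3)
start_error = ranking_error(feature_ranks, pixel_ranks, np.arange(9))
order, final_error = greedy_swaps(feature_ranks, pixel_ranks)
```

Because a swap is kept only when it lowers the error, final_error can never exceed start_error. The resulting order array plays the role of the rearrangement indices: feature order[k] is placed at pixel k.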

When evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of cancer cell lines (CCLs) and drugs predict anti-cancer drug response better than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.

Outputs

The output includes:

  • Generated image representations in both picture and text file formats,
  • Errors and feature rearrangement indices obtained during the optimization process,
  • Visualization plots of the distance rank matrices of features (before and after optimization), and
  • A visualization plot of the distance rank matrix of pixels in the image.

For details of input and output data, such as data format and input parameters, refer to the readme file, code comments, and example data included in the package.