Dynamic Importance Sampling (DynIm)

Short Description

Performs “dynamic” sampling, in which the input distribution can change over time and the sampler adapts to the new distribution.

Description and Impact
User Community

Researchers who want to run multiscale simulations.

Impact

Enables machine learning-based adaptive multiscale simulations for cancer biology.

Description

DynIm is a pure-Python package for dynamic-importance (DynIm) sampling of high-dimensional datasets. It is designed to minimize redundancy and maximize coverage among the sampled points. DynIm defines the importance of each candidate by its dissimilarity from previously selected samples, and then selects the most dissimilar candidates. In short, DynIm provides a farthest-point sampling approach. Currently, DynIm measures dissimilarity with L2 distances in the given high-dimensional space and can be configured to use either exact or approximate distances. Approximate distances keep the computation tractable when the dataset is large or has many dimensions. DynIm also provides a random sampler as a baseline for comparing sampling quality.
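The farthest-point idea described above can be sketched in a few lines of NumPy. This is a minimal illustration with exact L2 distances, not DynIm's own code: the function name `farthest_point_sample`, its parameters, and the toy data are assumptions made for this example.

```python
import numpy as np

def farthest_point_sample(points, k, seed_index=0):
    """Greedy farthest-point sampling with exact L2 distances (illustrative only)."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    selected = [seed_index]
    # Distance from every point to its nearest already-selected sample.
    min_dist = np.linalg.norm(points - points[seed_index], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # the most dissimilar remaining point
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        # Selected points end up at distance 0, so they are never picked again.
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected

# Toy data (hypothetical): two near-duplicates, two moderate points, one outlier.
points = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.0, 0.9], [5.0, 5.0]])
picked = farthest_point_sample(points, k=3)
```

Starting from the first point, the sketch picks the outlier next and then the point at (1.0, 0.0); the near-duplicate at (0.1, 0.0) is left for last because it adds almost no coverage.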

Hypothesis/Objective

The objective was to develop software that minimizes redundancy and maximizes the coverage of the sampled points using a dynamic-importance (DynIm) sampling strategy.

Resource Role

This resource is related to other software for assessing protein-protein interactions and molecular simulations, such as MuMMI and MemSurfer.

Technical Elements
Uniqueness

This is the first tool to perform “dynamic” sampling. That is, the input distribution can change over time, and the sampling adapts itself to the new distribution. This key feature makes it possible to use DynIm in situ and to enable large multiscale simulations.
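To make the “dynamic” behavior concrete, the sketch below keeps a pool of candidates that can grow between selection requests, and recomputes each candidate's importance against everything selected so far. The class `DynamicFarthestSampler` and its methods are hypothetical names invented for this illustration; they are not DynIm's API, and the brute-force distance computation stands in for whatever exact or approximate search the package uses.

```python
import numpy as np

class DynamicFarthestSampler:
    """Hypothetical sketch of dynamic farthest-point sampling; not DynIm's API."""

    def __init__(self, dim):
        self.dim = dim
        self.selected = np.empty((0, dim))    # samples chosen so far
        self.candidates = np.empty((0, dim))  # pool that can grow over time

    def add_candidates(self, points):
        """Add newly generated points; the input distribution may have shifted."""
        points = np.asarray(points, dtype=float).reshape(-1, self.dim)
        self.candidates = np.vstack([self.candidates, points])

    def _importance(self):
        """L2 distance from each candidate to its nearest selected sample."""
        if len(self.selected) == 0:
            return np.full(len(self.candidates), np.inf)
        diff = self.candidates[:, None, :] - self.selected[None, :, :]
        return np.linalg.norm(diff, axis=2).min(axis=1)

    def select(self, k):
        """Return up to k of the most dissimilar candidates, one at a time."""
        chosen = []
        for _ in range(min(k, len(self.candidates))):
            idx = int(np.argmax(self._importance()))
            point = self.candidates[idx]
            chosen.append(point)
            self.selected = np.vstack([self.selected, point])
            self.candidates = np.delete(self.candidates, idx, axis=0)
        return np.array(chosen).reshape(-1, self.dim)

sampler = DynamicFarthestSampler(dim=2)
sampler.add_candidates([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0]])
first = sampler.select(2)
# New data arrives from a different region of the space.
sampler.add_candidates([[10.0, 10.0], [0.2, 0.0]])
second = sampler.select(1)
```

In this toy run the second request picks the new point at (10.0, 10.0), because it lies far from everything selected earlier, while the leftover candidate near the origin stays in the pool.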

Usability

DynIm requires Faiss, which must be installed separately. Beyond that, the code is pure Python and straightforward to use and edit. The code is robust; however, users may need to experiment to find suitable parameters for the approximate distance calculations. Data does not need to be preprocessed.

Level of Documentation
Minimal
Components

Refer to the software in the Dynamic Importance Sampling GitHub repository (https://github.com/CBIIT/NCI-DOE-Collab-Pilot2-DynIm).

Inputs
  • Type of data required: High-dimensional dataset as NumPy arrays.
  • Source of data required: Generated from the target simulations.
Input Data Type
NumPy Arrays
Input Data Format
Unspecified
Results and Publications
Results

DynIm was designed to minimize redundancy and maximize the coverage of the sampled points. It uses L2 distances in the given high-dimensional space to define similarity and can be configured to use either exact or approximate distances. The tool performs "dynamic" sampling, in which the input distribution can change over time and the sampling adapts itself to the new distribution. This feature makes it possible to use DynIm in situ and to enable large multiscale simulations.

Outputs

The code returns “k” samples at each request, which the user can use as needed. It also creates checkpoint files and a history file that lists the selections made.
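As a rough picture of what bookkeeping around each request might look like, the sketch below appends every selection to a history file and rewrites a checkpoint holding all samples selected so far. The file names, the CSV and `.npz` formats, and the helper `record_selection` are assumptions for illustration only; the source does not describe the formats DynIm actually writes.

```python
import csv
import os
import tempfile

import numpy as np

def record_selection(selected, request_id, history_path, checkpoint_path):
    """Append a request's samples to a history file and refresh a checkpoint.

    Hypothetical helper: the file layout here is an assumption, not DynIm's format.
    """
    selected = np.asarray(selected, dtype=float)
    # History: one row per selected sample, tagged with the request number.
    with open(history_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for row in selected:
            writer.writerow([request_id, *row.tolist()])
    # Checkpoint: every sample selected so far, so a restart can resume.
    if os.path.exists(checkpoint_path):
        with np.load(checkpoint_path) as data:
            selected = np.vstack([data["selected"], selected])
    np.savez(checkpoint_path, selected=selected)
    return selected

workdir = tempfile.mkdtemp()
history = os.path.join(workdir, "history.csv")
checkpoint = os.path.join(workdir, "checkpoint.npz")

record_selection([[0.0, 0.0], [1.0, 0.0]], 0, history, checkpoint)  # request 0, k=2
all_selected = record_selection([[10.0, 10.0]], 1, history, checkpoint)  # request 1, k=1
```

After the two requests, the checkpoint holds all three selected samples and the history file has one line per selection.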