Multiscale Machine-Learned Modeling Infrastructure (MuMMI) | Computational Resources for Cancer Research

Short Description

Supports very large and multiscale simulations of molecular dynamic interactions between proteins (or their domains) with each other or with cell membranes.

User Community

Experienced data scientists, computational scientists, artificial intelligence researchers, clinical researchers, and all researchers dealing with sensitive data assets.

Impact

Produces simulation data, such as the KRas4B Campaign 1 trajectory dataset, for use in downstream models.

Description

The ADMIRRAL team developed the Multiscale Machine-learned Modeling Infrastructure (MuMMI) methodology to study the interaction of active KRAS with the plasma membrane (and related phenomena) on very large temporal and spatial scales. To achieve this, MuMMI connects a macro model of the system to a micro model using “dynamic importance sampling”, based on machine learning (ML), implementing the workflow on world-class supercomputing resources. The ADMIRRAL team also applied MuMMI to a different biological system consisting of a new lipid bilayer with a new type of protein embedded. MuMMI connects biological models of the membrane-protein system on two different scales:

A macro model based on the classical approximation theory for liquids in which 300 proteins (modeled as single-particle beads) move around on a 1 x 1 μm² cross section of a perfectly flat (2D) plasma membrane.
A micro model that uses Martini coarse-grained (CG) molecular dynamics (MD) simulations to model a 30 x 30 nm² “patch” of the macro model that necessarily contains at least one RAS protein.

(ADMIRRAL is the AI-Driven Multi-Scale Investigation of the RAS/RAF Activation Lifecycle.)
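
The idea behind importance sampling of patches can be illustrated with a minimal sketch. It assumes each candidate patch has already been encoded as a point in an ML latent space, and it uses greedy farthest-point selection to pick the candidates least similar to patches already simulated. The function name, array layout, and Euclidean distance are illustrative assumptions; this is not the DynIm API and may differ from the selection rule MuMMI actually uses.

```python
import numpy as np

def select_novel_patches(candidates, simulated, k):
    """Greedy farthest-point selection (hypothetical helper).

    candidates: (n, d) latent encodings of candidate patches
    simulated:  (m, d) latent encodings of patches already simulated
    Returns up to k candidate indices, most novel first.
    """
    candidates = np.asarray(candidates, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    if simulated.size:
        # Distance from each candidate to its nearest simulated patch.
        diff = candidates[:, None, :] - simulated[None, :, :]
        min_dist = np.linalg.norm(diff, axis=2).min(axis=1)
    else:
        min_dist = np.full(len(candidates), np.inf)

    chosen = []
    for _ in range(min(k, len(candidates))):
        idx = int(np.argmax(min_dist))
        chosen.append(idx)
        # A chosen patch now counts as simulated for the remaining picks.
        d_new = np.linalg.norm(candidates - candidates[idx], axis=1)
        min_dist = np.minimum(min_dist, d_new)
        min_dist[idx] = -np.inf  # never pick the same candidate twice
    return chosen
```

Because every pick updates the nearest-neighbor distances, two nearly identical candidates are unlikely to be selected together, which is the property that makes this kind of sampling useful for covering rare configurations.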

Hypothesis/Objective

The objective was to design a multiscale simulation infrastructure to study the dynamic heterogeneity of RAS signaling regulated by the cell membrane.

Resource Role

This resource is related to other software for assessing protein-protein interactions and molecular simulations, such as MemSurfer and DynIm.

Uniqueness

MuMMI performs massively parallel multiscale simulations using an ML-driven sampling framework. The first layer is the macro scale, a dynamic density functional theory (DDFT) model with an overlaid MD simulation of RAS particles. The second layer is the micro scale: for example, the ADMIRRAL team extracted 30 x 30 nm² patches from the 1 x 1 μm² macro snapshots and simulated them at the CG MD level. MuMMI runs the selected patches concurrently, occupying as much of the available resources as possible.
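
As a rough illustration of the patch extraction described above, the sketch below cuts a 30 x 30 nm² window, centered on a protein position, out of a density field that covers the 1 x 1 μm² macro snapshot. The square grid layout, the periodic wrap at the box edges, and the function name are assumptions made for this example; they are not taken from the MuMMI code.

```python
import numpy as np

def extract_patch(density, center_nm, box_nm=1000.0, patch_nm=30.0):
    """Cut a square patch around center_nm from a macro density grid.

    density:   (n, n) array covering a box_nm x box_nm area
               (first index = x, second index = y; an assumed layout)
    center_nm: (x, y) position of the protein in nm
    Periodic boundaries are assumed, so indices wrap around the box.
    """
    n = density.shape[0]
    h = box_nm / n                      # grid spacing in nm
    p = int(round(patch_nm / h))        # patch width in grid cells
    ci = int(center_nm[0] // h)         # grid cell containing the center
    cj = int(center_nm[1] // h)
    rows = np.arange(ci - p // 2, ci - p // 2 + p) % n
    cols = np.arange(cj - p // 2, cj - p // 2 + p) % n
    return density[np.ix_(rows, cols)]
```

Calling the function once per RAS position yields one patch per protein, which matches the requirement that every patch contains at least one RAS protein.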

Usability

Requirements for MuMMI

Initial macro model parameters (from CG training simulations):
1. MuMMI takes radial distribution functions (RDFs) from analysis of the Martini MD CG force field parameters and converts them to free-energy functionals that the macro model needs.
2. MuMMI also needs lipid self-diffusion coefficients from the CG simulations to derive the following items:
  - The mobility parameters for the macro model
  - Potentials of mean force
  - Direct correlation function
  - Self-diffusion coefficients
  - Protein diffusivity
  - Initial protein conformations (These conformations require 30 CG MD simulations of standard patch size.)
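
To make the first item concrete, the sketch below applies the textbook relation between an RDF g(r) and a potential of mean force, W(r) = -k_B T ln g(r). It only illustrates the general idea of turning RDFs into free-energy-like quantities; the actual conversion MuMMI performs for the macro model is not described here and may be more involved. The temperature default and the function name are assumptions.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def pmf_from_rdf(g, temperature=310.0):
    """Potential of mean force W(r) = -k_B T ln g(r), in kJ/mol.

    g: RDF values sampled on a radial grid.
    temperature: in K; 310 K is an assumed default, not a MuMMI setting.
    Bins where g(r) == 0 are never visited and get W = +inf.
    """
    g = np.asarray(g, dtype=float)
    w = np.full(g.shape, np.inf)
    visited = g > 0
    w[visited] = -K_B * temperature * np.log(g[visited])
    return w
```

Values of g(r) above 1 give a negative (attractive) W(r), and values below 1 give a positive (repulsive) W(r).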
Stable CG (and macro) simulations.
Pre-training of model for encoding lipid configurations.
Protein density on membrane.
Initial library of protein conformations to sample from during CG simulations.
Working Martini parameter set, structures of the proteins and lipids.
Other physical parameters such as CG setup pull-protein-to-membrane speed, cut-off radii, and so on.
Optimization of analysis routines so that, by using three CPU cores for each simulation, the MuMMI workflow manager (WM) can keep up with the frequency of incoming frames from ddcMD.
Crystal structure of active proteins in the lipid membrane context.
Required experimental measurements.
Biologically relevant membrane compositions and test of the membrane protein's stability in both models.
Preparation of the lipid bilayer, such as lipid spacing in each leaflet.
Modeled and optimized (minimized and equilibrated) protein.
CG bead version of the protein structure, generated using martinize.py.
CG-modeled/parametrized protein with all sanity checks.
Extensive sets of CG simulations that perform the following actions:
1. Validate the behavior of mixed lipid systems with and without RAS.
2. Provide input parameters for the macro model resulting in preliminary CG simulation data, CG MD Martini parameterization simulations, or training data.
These actions yield the following parameters for the macro model:
1. Diffusion coefficients for the different lipids.
2. Diffusion coefficients for RAS in the two different orientational states.
3. Lipid-lipid correlation functions.
4. Potentials for lipid-RAS and RAS-RAS interactions.
5. State change rates for RAS.
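
A diffusion coefficient such as those in items 1 and 2 is commonly estimated from the mean squared displacement (MSD) of particles moving in the membrane plane, using the two-dimensional Einstein relation MSD(t) = 4 D t. The sketch below is a generic estimator under that assumption, not the analysis code used by the ADMIRRAL team; it expects unwrapped in-plane coordinates, and the function name is hypothetical.

```python
import numpy as np

def diffusion_coefficient_2d(positions, dt):
    """Estimate D from MSD(t) = 4 D t for motion in a plane.

    positions: (n_frames, n_particles, 2) unwrapped coordinates
    dt:        time between consecutive frames
    Returns D in (length unit)^2 / (time unit).
    """
    positions = np.asarray(positions, dtype=float)
    n_frames = positions.shape[0]
    lags = np.arange(1, n_frames // 2 + 1)
    # Average squared displacement over particles and time origins.
    msd = np.array([
        np.mean(np.sum((positions[lag:] - positions[:-lag]) ** 2, axis=-1))
        for lag in lags
    ])
    t = lags * dt
    # Least-squares slope of a line through the origin.
    slope = np.sum(t * msd) / np.sum(t * t)
    return slope / 4.0
```

With positions in nm and dt in ns, the result is in nm²/ns. Only lag times up to half the trajectory length are used because longer lags are averaged over few time origins.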
Hidden Markov model (HMM) analysis to determine orientational states of the protein: The HMM analysis found RAS is generally in two metastable states in the macro model and three states in the micro model.
Hyperparameter optimization and data augmentation (rotations) on the variational autoencoder model to work for the data for the particular biological system.
MemSurfer to perform basic analysis of membrane simulations (such as local areal densities) in preparation for creating a macro model from CG MD data.
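
The HMM analysis listed above assigns each simulation frame to a metastable orientational state. As a simplified illustration of the decoding step only, the sketch below runs the Viterbi algorithm on a one-dimensional observable (for example, a tilt angle) with Gaussian emissions. All model parameters are supplied by the caller and are hypothetical; fitting the HMM and the choice of observables used in the actual study are not shown.

```python
import numpy as np

def viterbi_gaussian(obs, means, stds, trans, start):
    """Most likely state sequence for a Gaussian-emission HMM.

    obs:   (T,) observations
    means, stds: (S,) emission parameters per state
    trans: (S, S) transition matrix, trans[i][j] = P(next=j | current=i)
    start: (S,) initial state probabilities
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    log_trans = np.log(trans)
    # Log emission probabilities, shape (T, S).
    log_emit = (-0.5 * ((obs[:, None] - means) / stds) ** 2
                - np.log(stds * np.sqrt(2.0 * np.pi)))
    n_steps, n_states = log_emit.shape

    score = np.log(start) + log_emit[0]
    back = np.zeros((n_steps, n_states), dtype=int)
    for t in range(1, n_steps):
        cand = score[:, None] + log_trans      # cand[i, j]: from i to j
        back[t] = np.argmax(cand, axis=0)      # best predecessor of j
        score = cand[back[t], np.arange(n_states)] + log_emit[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(n_steps, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_steps - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Because switching states carries a transition penalty, a single noisy frame usually does not flip the decoded state, which is why an HMM is preferred over simple thresholding for identifying metastable states.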

Level of Documentation

Minimal

Components

Suite Components

Maestro Workflow Conductor: This component is a Python-based WM that MuMMI uses to run the macro model on partitions of the nodes, run inference on lipid patches to determine their importance, instantiate the CG setup jobs, spawn and track the CG simulations on the important patches, and run the in-situ analysis. It interfaces with Flux in the backend. For more information, refer to Maestro on GitHub.
Flux: This component is the resource manager that MuMMI uses to allow the WM to break up the allocated nodes in custom, optimized ways. You can configure and run it inside of allocated jobs after the scheduler optimally places them on the nodes. Flux assigns the jobs selected in Maestro to the backend scheduler. MuMMI uses a Maestro plugin for Flux to allow the WM’s interface to remain virtually independent of the ongoing development within Flux and to allow the option to switch schedulers in the future. For more information, refer to Flux on GitHub.
ddcMD: This component is Lawrence Livermore National Laboratory's own GPU-accelerated MD software that supports the Martini force field. It is described as faster than comparable MD codes such as AMBER and GROMACS. MuMMI uses ddcMD in two ways:
- MuMMI uses a CPU-only version of it to integrate protein equations of motion in the macro model.
- MuMMI uses a customized GPU version of it for the micro model CG simulations using the Martini force field.
For more information, refer to ddcMD on GitHub and ddcMD-utilities on GitHub.
GridSim2D/Moose: This component is the finite element software implementing the equations of motion for the lipids within the dynamic density functional theory framework that is the larger part of the macro model. MuMMI implements the other part of the macro model using a CPU-only version of ddcMD to simulate the protein beads on the lipid membrane, which interact through potentials of mean force. For more information, refer to Moose.
Data Broker: This component improves data management, improves input/output operations, and allows fast data storage and retrieval with database-level fault tolerance. For more information, refer to pytaridx on GitHub.
DynIm: This component is the dynamic importance sampling software that interfaces with the MuMMI WM (for running inference). For more information, refer to DynIm on GitHub.
MemSurfer: This component is an analysis tool that is not part of the MuMMI workflow. MemSurfer is an efficient and versatile tool to compute and analyze membrane surfaces found in a wide variety of large-scale molecular simulations. For more information, refer to MemSurfer on GitHub.

Input Data Format

Unspecified

Results

MuMMI simulations found the RAS-RAS interactions to be interfacially nonspecific, which validates all previously proposed interfaces as members of a broad ensemble of possible interactions. In addition, these interactions suggest that RAS multimer formation is mediated by lipids rather than by specific interfacial contacts. These findings could help identify mutations that affect tumor growth without being enzymatically relevant. MuMMI reveals both the broad scope and the fine details of the membrane remodeling that underlies functionally relevant RAS-lipid dynamics.

Primary Publication

Machine Learning–driven Multiscale Modeling Reveals Lipid-dependent Dynamics of RAS Signaling Proteins

Other Publications

A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer

Outputs

The output of this resource is a trajectory dataset.

Available on GitHub

https://github.com/CBIIT/NCI-DOE-Collab-Pilot2-MuMMI