CANcer Distributed Learning Environment (CANDLE)

Short Description

Improves machine/deep learning models by performing hyperparameter optimization.

Description and Impact
Impact

Enables hyperparameter optimization of machine learning and deep learning models.

Description

CANDLE is software for deep learning at scale. In its early stages, the software was developed to run on the Department of Energy’s leadership computing resources as part of the Exascale Computing Project (ECP). In principle, however, it can run on any reasonably configured cluster. This repository serves as a place for top-level documents, tests, and other artifacts that span multiple repositories in the ECP-CANDLE organization.

Hypothesis/Objective

The objective was to create a tool that performs hyperparameter optimization to find high-performing configurations for neural networks.
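Hyperparameter optimization searches a space of training settings for the configuration that scores best on a validation metric. The simplest strategy, random search, can be sketched in plain Python as follows. This is a minimal illustration of the general technique, not CANDLE's API or workflow code: the search space, the `random_search` and `toy_objective` functions, and the toy loss are all hypothetical stand-ins, and in a real run each objective evaluation would train and validate a neural network.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space: each hyperparameter maps to its candidate values.
SEARCH_SPACE: Dict[str, List] = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
    "num_layers": [2, 3, 4],
}

Config = Dict[str, object]


def sample_config(space: Dict[str, List], rng: random.Random) -> Config:
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(
    objective: Callable[[Config], float],
    space: Dict[str, List],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Evaluate n_trials random configurations and return the lowest-loss one."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        loss = objective(config)  # in practice: train a model, return validation loss
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def toy_objective(config: Config) -> float:
    """Hypothetical stand-in for training: a cheap loss minimized at (1e-3, 64, 3)."""
    return (
        abs(config["learning_rate"] - 1e-3) * 1000
        + abs(config["batch_size"] - 64) / 64
        + abs(config["num_layers"] - 3)
    )


if __name__ == "__main__":
    best, loss = random_search(toy_objective, SEARCH_SPACE, n_trials=50)
    print(best, loss)
```

Because each trial is independent of the others, random search parallelizes trivially across nodes, which is relevant to the utilization comparison under Results.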

Technical Elements
Uniqueness

This will be added as available.

Usability

This will be added as available.

Level of Documentation
Minimal
Components

This will be added as available.

Input Data Format
Unspecified
Results and Publications
Results

The study evaluated performance through several quantities relevant to a workflow system:

  • System utilization: In benchmarks on 360 nodes with both random search and model-based search, random search achieved better resource utilization than model-based search, because a model-based search cannot proceed to its next sampling iteration until it has finished evaluating all configurations from the previous iteration.
  • Task start-up latency: Measurements of this latency show that increasing the number of nodes in a run increases the amount of work done.
  • Task rate scaling: Loading the software alone (not including the training data) takes almost a minute, even at modest scale. The ability to keep modules loaded in the Python and R interpreters from task to task, a capability unique to Swift/T, is therefore critical for these workflows.
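The utilization gap between random search and model-based search comes from a barrier: a model-based search waits for its whole batch to finish before proposing new points, while random search can hand new work to an idle worker immediately. The idealized simulation below illustrates that effect. It assumes (hypothetically) uniformly random task durations, zero scheduling overhead, and no time spent fitting the search model between iterations; the worker count and duration range are arbitrary, and none of this is CANDLE or Swift/T code.

```python
import heapq
import random
from typing import List


def async_utilization(durations: List[float], workers: int) -> float:
    """Random-search style: a free worker immediately starts the next task."""
    free_at = [0.0] * workers  # min-heap of times at which each worker becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-available worker takes the task
        heapq.heappush(free_at, start + d)
    makespan = max(free_at)
    return sum(durations) / (workers * makespan)  # busy time / available time


def sync_utilization(durations: List[float], workers: int) -> float:
    """Model-based style: each batch of `workers` tasks must all finish
    before the next batch is proposed (one barrier per iteration)."""
    makespan = 0.0
    for i in range(0, len(durations), workers):
        batch = durations[i : i + workers]
        makespan += max(batch)  # every worker waits for the slowest task
    return sum(durations) / (workers * makespan)


if __name__ == "__main__":
    rng = random.Random(0)
    workers = 36  # hypothetical worker count
    durations = [rng.uniform(1.0, 10.0) for _ in range(workers * 20)]
    print(f"asynchronous (random search): {async_utilization(durations, workers):.2f}")
    print(f"synchronous (model-based):    {sync_utilization(durations, workers):.2f}")
```

Both functions report busy time divided by (workers × makespan). The synchronous version loses utilization because every iteration lasts as long as its slowest task; adding model-fitting time between iterations would widen the gap further.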
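Why keeping modules loaded matters can be stated as simple arithmetic. Let L be the time to load the software (almost a minute, per the measurement above), T the running time of one task, and n the number of tasks a single worker executes. The symbols and the example values of T and n are illustrative assumptions, not measured results.

```latex
\begin{aligned}
\text{reload per task:} \quad & t_{\text{reload}} = n\,(L + T), & \text{load fraction} &= \frac{L}{L + T},\\
\text{persistent interpreter:} \quad & t_{\text{persist}} = L + nT, & \text{load fraction} &= \frac{L}{L + nT}.
\end{aligned}
```

For example, if a task also takes about a minute (T ≈ L), reloading for every task wastes about half of each worker's time, whereas with n = 100 tasks per worker a persistent interpreter reduces the load fraction to 1/101, under 1%.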
Outputs

This will be added as available.