The first NVIDIA DGX-2 AI supercomputers in the U.S. have arrived at the nation’s leading research labs for work driving important scientific discoveries.
DGX-2 systems provide more than two petaflops of deep learning computing power from 16 NVIDIA Tesla V100 Tensor Core GPUs interconnected with NVSwitch technology.
The recipients are Brookhaven National Laboratory, in Upton, New York; Oak Ridge National Laboratory, in Oak Ridge, Tennessee; Pacific Northwest National Laboratory, in Richland, Washington; and Sandia National Laboratories, in Albuquerque, New Mexico.
The labs are engaged in work ranging from fusion research and climate simulation to human genomics.
Designed to handle the most compute-intensive applications, DGX-2 systems offer performance breakthroughs in the most demanding areas of scientific computing, AI and machine learning.
Here’s how each of the labs intends to use the systems:
Brookhaven National Laboratory
Brookhaven Lab’s Machine Learning group is using the DGX-2 to evaluate several deep learning algorithms for advanced image analysis. To start, the team is examining how well certain algorithms scale, whether the scaling is linear and where it tops out. They’re also determining how best to optimize their algorithms to run on the GPU-dense DGX-2.
Additionally, Brookhaven’s cryogenic electron microscopy group and National Synchrotron Light Source II facility team are planning to test how well their machine learning-based streaming, real-time analysis workflows will perform on the system, particularly with high data throughput and multiple users.
“We want to take HPC workloads that are not machine learning-centric, many of which are employed in Brookhaven’s core high-energy physics and nuclear science research, and conduct performance analyses comparing DGX-2 to our earlier DGX systems,” said Nicholas D’Imperio, Computational Science Laboratory chair with Brookhaven’s Computational Science Initiative, who is spearheading some of the early work using the DGX-2. “This will provide a better picture of performance gains overall that may improve how we enable legacy codes and HPC workflows to operate on such GPU-dense systems.”
Oak Ridge National Laboratory
Oak Ridge National Laboratory’s state-of-the-art experimental facilities and instrumentation produce enormous, and unique, scientific datasets. ORNL will use the DGX-2 systems for data analytics on these datasets, and also for production and development work.
Researchers Valentine Anantharaj and Drew Schmidt, both of ORNL’s National Center for Computational Sciences, are using the DGX-2 systems to develop machine learning-based techniques that improve the fidelity of complex physical processes in weather and climate simulations.
The system will also provide an onramp to Summit — the world’s most powerful supercomputer — by enabling smaller and more experimental projects to be developed and tested on a Summit-like platform, freeing up Summit to conduct world-class science.
“The NVIDIA DGX-2 platform allows us to analyze data in unique ways, revealing insights that might otherwise remain hidden,” said Jeff Nichols, associate laboratory director for computing and computational sciences at ORNL. “And because its architecture is so similar to Summit’s, the DGX-2 enables the experimentation necessary to ensure that Summit reaches its full potential, particularly in terms of analytics and artificial intelligence.”
Oak Ridge National Laboratory also received the newly available NVIDIA DGX-2H, which contains upgraded CPUs and faster-clocked Tesla V100 GPUs, for use in its most computationally intensive workloads.
Pacific Northwest National Laboratory
Pacific Northwest National Laboratory intends to model atmospheric phenomena, such as hurricane intensity, using 4-dimensional temperature and pressure profiles across thousands of square kilometers. Additionally, PNNL intends to use the DGX-2 to improve the precision of whole-body millimeter wave scanning technology, improving safety at airports and reducing false alarms.
The hope is to give scientists the ability to study systems much larger and more complex than was previously possible, opening up new scientific questions and deep learning approaches that advance the state of the art.
Sandia National Laboratories
Sandia has acquired the DGX-2 systems to serve as the backbone infrastructure for its newly developed Machine Learning as a Service offering. A team of machine learning subject matter experts will provide a portal and support for a wide range of engineering, code development and scientific problems. The goal is to let engineers and scientists unfamiliar with machine learning use this capability to rapidly solve difficult problems.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2018-12918 W.