Publications

Results 26–44 of 44

Search results

Toward performance portability of the Albany finite element analysis code using the Kokkos library

International Journal of High Performance Computing Applications

Demeshko, Irina; Watkins, Jerry E.; Kalashnikova, Irina; Guba, Oksana G.; Spotz, William S.; Salinger, Andrew G.; Pawlowski, Roger P.; Heroux, Michael A.

Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This article presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA Graphics Processing Units (GPUs), Intel Xeon Phis, and multicore CPUs.

Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos

Watkins, Jerry E.; Kalashnikova, Irina

The subject of this report is the performance portability of the Aeras global atmosphere dynamical core (implemented within the Albany multi-physics code) to new and emerging architecture machines using the Kokkos library and programming model. We describe the process of refactoring the finite element assembly process for the 3D hydrostatic model in Aeras and highlight common issues associated with development on GPU architectures. After giving detailed build and execute instructions for Aeras with MPI, OpenMP and CUDA on the Shannon cluster at Sandia National Laboratories and the Titan supercomputer at Oak Ridge National Laboratory, we evaluate the performance of the code on a canonical test case known as the baroclinic instability problem. We show a speedup of up to 4 times on 8 OpenMP threads, but we were unable to achieve a speedup on the GPU due to memory constraints. We conclude by providing methods for improving the performance of the code for future optimization.

The Aeras Next Generation Global Atmosphere Model

Bosler, Peter A.; Bova, S.W.; Demeshko, Irina P.; Fike, Jeffrey A.; Guba, Oksana G.; Overfelt, James R.; Roesler, Erika L.; Salinger, Andrew G.; Smith, Thomas M.; Kalashnikova, Irina; Watkins, Jerry E.

The Next Generation Global Atmosphere Model LDRD project developed a suite of atmosphere models (a shallow water model, an x-z hydrostatic model, and a 3D hydrostatic model) using Albany, a finite element code. Albany provides access to a large suite of leading-edge Sandia high-performance computing technologies enabled by Trilinos, Dakota, and Sierra. The next-generation capabilities most relevant to a global atmosphere model are performance portability and embedded uncertainty quantification (UQ). Performance portability is the capability for a single code base to run efficiently on a diverse set of advanced computing architectures, such as multi-core threading or GPUs. Embedded UQ refers to simulation algorithms that have been modified to aid in quantifying uncertainties. In our case, this means running multiple samples for an ensemble concurrently, and reaping certain performance benefits. We demonstrate the effectiveness of these approaches here as a prelude to introducing them into ACME.
