Publications

Results 1–25 of 43

HOMMEXX 1.0: A performance-portable atmospheric dynamical core for the Energy Exascale Earth System Model

Geoscientific Model Development

Bertagna, Luca B.; Deakin, Michael; Guba, Oksana G.; Sunderland, Daniel S.; Bradley, Andrew M.; Kalashnikova, Irina; Taylor, Mark A.; Salinger, Andrew G.

We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPU (e.g., Intel Xeon), many-core CPU (e.g., Intel Xeon Phi Knights Landing), and Nvidia V100 GPU.

Making OpenMP Ready for C++ Executors

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Scogland, Thomas R.W.; Sunderland, Daniel S.; Olivier, Stephen L.; Hollman, David S.; Evans, Noah; De Supinski, Bronis R.

For at least the last 20 years, many have tried to create a general resource management system to support interoperability across various concurrent libraries. The previous strategies all suffered from additional toolchain requirements and/or the use of a shared programming model that assumed it owned or controlled access to all resources available to the program. None of these techniques have achieved widespread adoption. The ubiquity of OpenMP, coupled with C++ developing a standard way to describe many different concurrent paradigms (C++23 executors), would allow OpenMP to assume the role of a general resource manager without requiring user code written directly in OpenMP. With a few added features, such as the ability to use otherwise idle threads to execute tasks and to specify a task “width”, many interesting concurrent frameworks could be developed in native OpenMP and achieve high performance. Further, one could create concrete C++ OpenMP executors that enable support for general C++ executor-based codes, which would allow Fortran, C, and C++ codes to use the same underlying concurrent framework when expressed as native OpenMP or using language-specific features. Effectively, OpenMP would become the de facto solution for a problem that has long plagued the HPC community.

WBS STPR 04 Milestone 4 Report

Sunderland, Daniel S.; Hoemmen, Mark F.; Trott, Christian R.

This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backend to support the software stack and hardware of DOE's Sierra, Summit, and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exascale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important for the HPC community into the C++ standard.

WBS STPR 04 Milestone 4 Report

Trott, Christian R.; Sunderland, Daniel S.; Hoemmen, Mark F.

This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backend to support the software stack and hardware of DOE's Sierra, Summit, and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exascale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important for the HPC community into the C++ standard.

Profiling and Debugging Support for the Kokkos Programming Model

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Hammond, Simon D.; Trott, Christian R.; Ibanez-Granados, Daniel A.; Sunderland, Daniel S.

Supercomputing hardware is undergoing a period of significant change. To cope with the rapid pace of hardware and, in many cases, programming model innovation, we have developed the Kokkos Programming Model, a C++-based abstraction that permits performance portability across diverse architectures. Our experience has shown that the abstractions developed can significantly frustrate debugging and profiling activities because they break expected code proximity and layout assumptions. In this paper, we present the Kokkos Profiling interface, a lightweight suite of hooks to which debugging and profiling tools can attach to gain deep insights into the execution and data structure behaviors of parallel programs written to the Kokkos interface.

ASC ATDM Level 2 Milestone #6015: Asynchronous Many-Task Software Stack Demonstration

Bennett, Janine C.; Bettencourt, Matthew T.; Clay, Robert L.; Edwards, Harold C.; Glass, Micheal W.; Hollman, David S.; Kolla, Hemanth K.; Lifflander, Jonathan; Littlewood, David J.; Markosyan, Aram H.; Moore, Stan G.; Olivier, Stephen L.; Phipps, Eric T.; Rizzi, Francesco N.; Slattengren, Nicole S.; Sunderland, Daniel S.; Wilke, Jeremiah J.

This report is an outcome of the ASC ATDM Level 2 Milestone 6015: Asynchronous Many-Task Software Stack Demonstration. It comprises a summary and in-depth analysis of DARMA and a DARMA-compliant Asynchronous Many-Task (AMT) runtime software stack. Herein, the performance and productivity of the overall approach are assessed on benchmarks and proxy applications representative of the Sandia ATDM applications. As part of the effort to assess the perceived strengths and weaknesses of AMT models compared to more traditional methods, experiments were performed on ATS-1 (Advanced Technology Systems) test bed machines and Trinity. In addition to productivity and performance assessments, this report includes findings on the generality of DARMA's backend API as well as findings on interoperability with node-level and network-level system libraries. Together, this information provides a clear understanding of the strengths and limitations of the DARMA approach in the context of Sandia's ATDM codes, to guide our future research and development in this area.
