As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3 such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations to prepare for exascale era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.
Scientific applications run on high-performance computing (HPC) systems are critical for many national security missions within Sandia and the NNSA complex. However, these applications often face performance degradation and even failures that are challenging to diagnose. To provide unprecedented insight into these issues, the HPC Development, HPC Systems, Computational Science, and Plasma Theory & Simulation departments at Sandia crafted and completed their FY21 ASC Level 2 milestone entitled "Integrated System and Application Continuous Performance Monitoring and Analysis Capability." The milestone created a novel integrated HPC system and application monitoring and analysis capability by extending Sandia's Kokkos application portability framework, Lightweight Distributed Metric Service (LDMS) monitoring tool, and scalable storage, analysis, and visualization pipeline. The extensions to Kokkos and LDMS enable collection and storage of application data during run time, as it is generated, with negligible overhead. This data is combined with HPC system data within the extended analysis pipeline to present relevant visualizations of derived system and application metrics that can be viewed at run time or post run. This new capability was evaluated using several week-long, 290-node runs of Sandia's ElectroMagnetic Plasma In Realistic Environments ( EMPIRE ) modeling and design tool and resulted in 1TB of application data and 50TB of system data. EMPIRE developers remarked this capability was incredibly helpful for quickly assessing application health and performance alongside system state. In short, this milestone work built the foundation for expansive HPC system and application data collection, storage, analysis, visualization, and feedback framework that will increase total scientific output of Sandia's HPC users.
The final review for the FY21 Advanced Simulation and Computing (ASC) Computational Systems and Software Environments (CSSE) L2 Milestone #7840 was conducted on August 25th, 2021 at Sandia National Laboratories in Albuquerque, New Mexico. The review committee/panel unanimously agreed that the milestone has been successfully completed, exceeding expectations on several of the key deliverables.