Milestone 1241
Harden and optimize the ROCm based AMD GPU backend, develop a prototype backend for the Intel ECP Path Forward architecture, and improve the existing prototype Remote Memory Space capabilities.
This report documents the completion of milestone STPRO4-26 Engaging the C++ Committee. Multiple members of the Kokkos team attended the three C++ Committee meetings in San Diego, Hawaii, and Cologne, updated multiple in-flight proposals (e.g. MDSpan, atomic ref), contributed to numerous proposals central to future capabilities in C++ (e.g. executors, affinity), and organized a new effort to introduce a Basic Linear Algebra library into the C++ standard. We also implemented a production-quality version of mdspan as the basis for replacing the vast majority of the implementation of Kokkos::View, thereby starting the transition of one of the core features in Kokkos to its future replacement.
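To make the idea of a non-owning multidimensional view concrete, the following is a minimal sketch of a two-dimensional, row-major view over existing memory. The class name `view2d` and its members are hypothetical and chosen for illustration only; this is neither the mdspan proposal's interface nor the Kokkos::View implementation, both of which are far more general (arbitrary rank, layouts, and accessors).

```cpp
#include <cstddef>

// Hypothetical non-owning 2D view (illustration only, not mdspan or Kokkos::View).
// It maps an index pair (i, j) onto a flat buffer using a row-major layout.
template <typename T>
class view2d {
public:
    view2d(T* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    // Row-major mapping: element (i, j) lives at offset i * cols + j.
    T& operator()(std::size_t i, std::size_t j) const {
        return data_[i * cols_ + j];
    }

    // Extent of dimension 0 (rows) or 1 (columns).
    std::size_t extent(std::size_t dim) const {
        return dim == 0 ? rows_ : cols_;
    }

private:
    T* data_;            // memory is owned elsewhere
    std::size_t rows_;
    std::size_t cols_;
};
```

Separating the index mapping from the storage is what would let a view of this kind be retargeted to different memory layouts without changing the code that indexes it.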
This report documents the completion of milestone STPRO4-25 Harden and optimize the ROCm based AMD GPU backend, develop a prototype backend for the Intel ECP Path Forward architecture, and improve the existing prototype Remote Memory Space capabilities. The ROCm code was hardened to the point of passing all Kokkos unit tests; AMD then deprecated the programming model, forcing us to start over in FY20 with HIP. The Intel ECP Path Forward architecture prototype was developed with some initial capabilities on simulators, but plans changed and that work will not continue. Instead, SYCL will be developed as a backend for Aurora. The Remote Spaces capability was improved, and development is ongoing as part of a collaboration with NVIDIA.
This report documents the completion of milestone STPM12-19 Documented Kokkos application usecases. The goal of this milestone was to develop use case examples for common patterns users implement with Kokkos. This work was performed in the fourth quarter of FY19 and resulted in use case descriptions available in the Kokkos Wiki, with code examples.
This report documents the completion of milestone STPM12-17 Kokkos Training Bootcamp. The goal of this milestone was to hold a combined tutorial and hackathon bootcamp event for the Kokkos community and prospective users. The Kokkos Bootcamp event was held at Argonne National Laboratory from August 27 to August 29, 2019. Because attendance was lower than expected (we believe largely due to bad timing), the team focused, with a select set of ECP partners, on early work in preparation for Aurora. In particular, we evaluated issues posed by exposing SYCL and OpenMP target offload to applications via the Kokkos Programming Model.
This report documents the completion of milestone STPRO4-13 "Documented Kokkos API", which is part of the Exascale Computing Project (ECP). The goal of this milestone was to generate documentation for the Kokkos programming model that is accessible to the open HPC community, beyond what was available via the tutorials. The total documentation for Kokkos now contains the equivalent of about 250 pages in textbook format. About a third of it is written in a textbook-like style, as in the Kokkos Programming Guide, while most of the rest is an API reference modelled after popular C++ reference webpages. On the order of 175 pages were newly written as part of the work for this milestone.
Due to the cost of hardware failures within mission-critical and scientific applications, it is necessary for software to provide a mechanism to prevent or recover from interruptions. The Kokkos ecosystem is a programming environment that provides performance and portability to many applications that run on DOE supercomputers as well as smaller-scale systems. These applications require a higher level of service due to the cost associated with each simulation or the critical nature of the mission. Software resilience enables an application to manage hardware failures, reducing the cost of an interruption. Two different resilience methodologies have been added to the Kokkos ecosystem: checkpointing has been added for restart capabilities, and a resilient execution model has been added to account for failures in compute devices. The design and implementation of each of these additions are described, and appropriate examples are included for end users.
Scope and Objectives: Kokkos Support provides cyber resources and conducts training events for current and prospective Kokkos users. In-person training events are organized at various venues, providing both generic Kokkos tutorials with lectures and exercises and hands-on work on users' applications.
Parallel Computing
Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, KKSPGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
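The two-phase structure and the role of the accumulator mentioned above can be illustrated with a small serial sketch. It assumes CSR storage and uses a dense accumulator, one of several accumulator types one could choose; it is not the KKSPGEMM algorithm or any Kokkos Kernels interface, and the names `Csr`, `spgemm_symbolic`, and `spgemm_numeric` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Compressed sparse row matrix (illustrative type, not a Kokkos Kernels API).
struct Csr {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<double> values;        // size nnz
};

// Phase 1 (symbolic): count the nonzeros in each row of C = A * B.
// The result depends only on the sparsity patterns, so it can be reused
// whenever the values change but the structure does not.
std::vector<std::size_t> spgemm_symbolic(const Csr& a, const Csr& b) {
    std::vector<std::size_t> row_ptr(a.rows + 1, 0);
    // marker[j] == i + 1 means column j was already seen in row i.
    std::vector<std::size_t> marker(b.cols, 0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        std::size_t count = 0;
        for (std::size_t ka = a.row_ptr[i]; ka < a.row_ptr[i + 1]; ++ka) {
            const std::size_t k = a.col_idx[ka];
            for (std::size_t kb = b.row_ptr[k]; kb < b.row_ptr[k + 1]; ++kb) {
                const std::size_t j = b.col_idx[kb];
                if (marker[j] != i + 1) {
                    marker[j] = i + 1;
                    ++count;
                }
            }
        }
        row_ptr[i + 1] = row_ptr[i] + count;
    }
    return row_ptr;
}

// Phase 2 (numeric): fill column indices and values of C, given the row
// pointers from the symbolic phase. Uses a dense accumulator per row.
Csr spgemm_numeric(const Csr& a, const Csr& b,
                   const std::vector<std::size_t>& row_ptr) {
    Csr c;
    c.rows = a.rows;
    c.cols = b.cols;
    c.row_ptr = row_ptr;
    c.col_idx.resize(row_ptr.back());
    c.values.resize(row_ptr.back());
    std::vector<double> acc(b.cols, 0.0);
    std::vector<std::size_t> marker(b.cols, 0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        std::size_t next = row_ptr[i];
        for (std::size_t ka = a.row_ptr[i]; ka < a.row_ptr[i + 1]; ++ka) {
            const std::size_t k = a.col_idx[ka];
            for (std::size_t kb = b.row_ptr[k]; kb < b.row_ptr[k + 1]; ++kb) {
                const std::size_t j = b.col_idx[kb];
                if (marker[j] != i + 1) {  // first touch of column j in row i
                    marker[j] = i + 1;
                    c.col_idx[next++] = j;
                    acc[j] = 0.0;
                }
                acc[j] += a.values[ka] * b.values[kb];
            }
        }
        // Gather accumulated values; columns stay in first-touch order.
        for (std::size_t p = row_ptr[i]; p < next; ++p) {
            c.values[p] = acc[c.col_idx[p]];
        }
    }
    return c;
}
```

In this sketch the dense accumulator costs memory proportional to the number of columns of B per worker; a hash-based accumulator would trade that for lookup overhead, which is the kind of choice a meta-algorithm would have to make per problem. Column indices within each output row are left unsorted.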
This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backends to support the software stack and hardware of DOE's Sierra, Summit, and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exascale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important for the HPC community into the C++ standard.
This report documents the completion of milestone STPRO4-5 Kokkos interoperability with general SIMD types to force vectorization on ATS-1. The Kokkos team worked with application developers to enable the use of SIMD intrinsics, which yielded up to a 3.7x improvement in the affected kernels on ATS-1 in a proxy application. SIMD types are now deployed in the production code base.
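As a rough illustration of what a SIMD type is, the sketch below wraps a fixed number of lanes in a value type with element-wise operators, so that a kernel is written in terms of packs rather than scalars. The type `simd_pack` and the function `axpy` are hypothetical; this is not the SIMD interface used with Kokkos, and a production type would map its operators onto hardware intrinsics rather than plain loops.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width pack of N lanes (illustration only).
template <typename T, std::size_t N>
struct simd_pack {
    std::array<T, N> lane;

    // Fill every lane with the same scalar.
    static simd_pack broadcast(T v) {
        simd_pack r;
        r.lane.fill(v);
        return r;
    }
    // Read N contiguous elements starting at p.
    static simd_pack load(const T* p) {
        simd_pack r;
        for (std::size_t i = 0; i < N; ++i) r.lane[i] = p[i];
        return r;
    }
    // Write the N lanes to contiguous memory starting at p.
    void store(T* p) const {
        for (std::size_t i = 0; i < N; ++i) p[i] = lane[i];
    }
};

// Element-wise addition; the fixed trip count makes the loop easy to vectorize.
template <typename T, std::size_t N>
simd_pack<T, N> operator+(const simd_pack<T, N>& a, const simd_pack<T, N>& b) {
    simd_pack<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Element-wise multiplication.
template <typename T, std::size_t N>
simd_pack<T, N> operator*(const simd_pack<T, N>& a, const simd_pack<T, N>& b) {
    simd_pack<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// y = alpha * x + y, processed N elements at a time with a scalar tail.
template <std::size_t N>
void axpy(double alpha, const double* x, double* y, std::size_t n) {
    const simd_pack<double, N> va = simd_pack<double, N>::broadcast(alpha);
    std::size_t i = 0;
    for (; i + N <= n; i += N) {
        const simd_pack<double, N> vx = simd_pack<double, N>::load(x + i);
        const simd_pack<double, N> vy = simd_pack<double, N>::load(y + i);
        (va * vx + vy).store(y + i);
    }
    for (; i < n; ++i) y[i] = alpha * x[i] + y[i];  // remainder elements
}
```

Writing the kernel against the pack type means the vector width appears explicitly in the code rather than being left to the compiler's auto-vectorizer.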
This report documents the completion of milestone STPRO4-6 Kokkos Support for ASC applications and libraries. The team provided consultation and support for numerous ASC code projects, including Sandia's SPARC, EMPIRE, Aria, GEMMA, Alexa, Trilinos, LAMMPS, and nimbleSM. Over the year, more than 350 Kokkos GitHub issues were resolved, with over 220 requiring fixes and enhancements to the code base. Resolving these requests, many of them issued by ASC code teams, provided applications with the capabilities in Kokkos that they need to be successful.
This report documents the completion of milestone STPRO4-7 Kokkos R&D: Remote Memory Spaces for One-Sided Halo-Exchange. The goal of this milestone was to develop and deploy an initial capability to support PGAS-like communication models integrated into Kokkos via Remote Memory Spaces. The team developed semantic requirements for Remote Memory Spaces and implemented a prototype library leveraging four different communication libraries: libQUO, SHMEM, MPI-OneSided, and NVSHMEM. In conjunction with ADCD02-COPA, the Remote Memory Space capability was used in ExaMiniMD, a molecular dynamics proxy application, to explore the current state of the technology and its usability. The results demonstrate that usability is very good, allowing a significant simplification of communication routines, but performance is still lacking.
This report documents the completion of milestone STPM12-4 Kokkos Training Bootcamp. The goal of this milestone was to hold a combined tutorial and hackathon bootcamp event for the Kokkos community and prospective users. The Kokkos Bootcamp event was held on-site at Oak Ridge National Lab from July 24 to July 27, 2018. There were over 40 registered participants from 12 institutions, including 7 Kokkos project staff from SNL, LANL, and ORNL. The event consisted of a roughly two-day tutorial session including hands-on exercises, followed by 1.5 days of intensive porting work on codes that the participants brought to explore, port, and optimize the use of Kokkos with the help of Kokkos project experts.