Kokkos Paving the Way to ExaScale
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
IEEE Transactions on Parallel and Distributed Systems
As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. Here, we describe the novel abstractions that have been added to Kokkos recently such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.
Abstract not provided.
Abstract not provided.
Abstract not provided.
2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
Both the data science and scientific computing communities are embracing GPU acceleration for their most demanding workloads. For scientific computing applications, the massive volume of code and diversity of hardware platforms at supercomputing centers has motivated a strong effort toward performance portability. This property of a program, denoting its ability to perform well on multiple architectures and varied datasets, is heavily dependent on the choice of parallel programming model and which features of the programming model are used. In this paper, we evaluate performance portability in the context of a data science workload in contrast to a scientific computing workload, evaluating the same sparse matrix kernel on both. Among our implementations of the kernel in different performance-portable programming models, we find that many struggle to consistently achieve performance improvements using the GPU compared to simple one-line OpenMP parallelization on high-end multicore CPUs. We show one that does, and its performance approaches and sometimes even matches that of vendor-provided GPU math libraries.
ACM International Conference Proceeding Series
Sparse triangular solver is an important kernel in many computational applications. However, a fast, parallel, sparse triangular solver on a manycore architecture such as GPU has been an open issue in the field for several years. In this paper, we develop a sparse triangular solver that takes advantage of the supernodal structures of the triangular matrices that come from the direct factorization of a sparse matrix. We implemented our solver using Kokkos and Kokkos Kernels such that our solver is portable to different manycore architectures. This has the additional benefit of allowing our triangular solver to use the team-level kernels and take advantage of the hierarchical parallelism available on the GPU. We compare the effects of different scheduling schemes on the performance and also investigate an algorithmic variant called the partitioned inverse. Our performance results on an NVIDIA V100 or P100 GPU demonstrate that our implementation can be 12.4 × or 19.5 × faster than the vendor optimized implementation in NVIDIA's CuSPARSE library.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This SAND report documents the findings of the LDRD project, "Modeling Complex Relationships in Large-Scale Data using Hypergraphs". The project ran from October 2017 through September 2019. The focus of the project was the development and application of hypergraph data analytics to Sandia relational data applications. In this project, we attempted to apply a hypergraph data analysis method—specifically, hypergraph eigenvector centrality—to Sandia mission problems to identify influential entities (people, location, times, etc.) in the data. Unfortunately, the application data led to graph and hypergraph representations containing disconnected components. To date, there are no well-established techniques for applying eigenvector centrality to such graphs and hypergraphs. In this report, we present several heuristics for computing eigenvector centrality for disconnected graphs. We believe this is an important start to understanding how to approach the similar problem for hypergraphs, but this project concluded before we made progress on that problem. The ideas, methods, and suggestions presented here can be used for further research into this challenging problem. We also present our ideas for generating graphs with known degree and centrality distributions. The goal in presenting this work is to identify a procedure for analyzing such graphs once the problem of addressing disconnected components has been addressed. When working with a single data set, this generator can be used to create many instances of graphs that can be used to analyze the robustness of the centrality computations for the original data set. Although the results did not match perfectly in the case of the Facebook Ego dataset used in the experiments presented here, this again represents a good start in the direction of a graph generator for such problems. We note that there are potential trade-offs between how the degree and centrality distributions are fit to the original data and suggested several possible avenues for follow-on research efforts.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This report documents the completion of milestone STPRO4-6 Kokkos Support for ASC applications and libraries. The team provided consultation and support for numerous ASC code projects including Sandias SPARC, EMPIRE, Aria, GEMMA, Alexa, Trilinos, LAMMPS and nimbleSM. Over the year more than 350 Kokkos github issues were resolved, with over 220 requiring fixes and enhancements to the code base. Resolving these requests, with many of them issued by ASC code teams, provided applications with the necessary capabilities in Kokkos to be successful.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.