Publications Search

A broad set of data science and engineering questions may be organized as graphs, providing a powerful means for describing relational data. Although experts now routinely compute graph algorithms on huge, unstructured graphs using high performance computing (HPC) or cloud resources, this practice hasn't yet broken into the mainstream. Such computations require great expertise, yet users often need rapid prototyping and development to quickly customize existing code. Toward that end, we are exploring the use of the Chapel programming language as a means of making some important graph analytics more accessible, examining the breadth of characteristics that would make for a productive programming environment, one that is expressive, performant, portable, and robust.

More Details

TYPE Conference Poster YEAR 2020

DOI OSTI Scopus

Slota, George M.; Berry, Jonathan; Hammond, Simon; Olivier, Stephen L.; Phillips, Cynthia A.; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

DOI OSTI

Multi-Level Memory Algorithmics for Large, Sparse Problems

Berry, Jonathan; Butcher, Neil; Catalyurek, Umit; Kogge, Peter; Lin, Paul; Olivier, Stephen L.; Phillips, Cynthia A.; Rajamanickam, Sivasankaran; Slota, George M.; Voskuilen, Gwendolyn R.; Yasar, Abdurrahman; Young, Jeffrey S.

In this report, we abstract eleven papers published during the project and describe preliminary unpublished results that warrant follow-up work. The topic is multi-level memory algorithmics, or how to effectively use multiple layers of main memory. Modern compute nodes all have this feature in some form.

More Details

TYPE SAND Report YEAR 2019

DOI OSTI

Olivier, Stephen L.; Brightwell, Ronald B.; Foulk, James W.; Younge, Andrew J.; Evans, Noah; Levy, Scott L.N.; Ferreira, Kurt; Grant, Ryan

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Allocators in OpenMP 5.0

Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

OpenMP Tasks: New Features in 5.0

Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

The Ongoing Evolution of OpenMP

Proceedings of the IEEE

De Supinski, Bronis R.; Scogland, Thomas R.W.; Duran, Alejandro; Klemm, Michael; Mateo, Sergi; Olivier, Stephen L.; Terboven, Christian; Mattson, Timothy G.

This paper presents an overview of the past, present and future of the OpenMP application programming interface (API). While the API originally specified a small set of directives that guided shared memory fork-join parallelization of loops and program sections, OpenMP now provides a richer set of directives that capture a wide range of parallelization strategies that are not strictly limited to shared memory. As we look toward the future of OpenMP, we immediately see further evolution of the support for that range of parallelization strategies and the addition of direct support for debugging and performance analysis tools. Looking beyond the next major release of the specification of the OpenMP API, we expect the specification eventually to include support for more parallelization strategies and to embrace closer integration into its Fortran, C and, in particular, C++ base languages, which will likely require the API to adopt additional programming abstractions.

More Details

TYPE Journal Article YEAR 2018

DOI OSTI Scopus

FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment

Foulk, James W.; Foulk, James W.; Hammond, Simon; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

More Details

TYPE SAND Report YEAR 2018

DOI OSTI

Investigating Factors Impacting Exascale Performance of ASC Codes: A Co-design Effort

Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Klinkenberg, Jannis; Samfass, Philipp; Terboven, Christian; Duran, Alejandro; Klemm, Michael; Teruel, Xavier; Mateo, Sergi; Olivier, Stephen L.; Muller, Matthias S.

In modern shared-memory NUMA systems which typically consist of two or more multi-core processor packages with local memory, affinity of data to computation is crucial for achieving high performance with an OpenMP program. OpenMP* 3.0 introduced support for task-parallel programs in 2008 and has continued to extend its applicability and expressiveness. However, the ability to support data affinity of tasks is missing. In this paper, we investigate several approaches for task-to-data affinity that combine locality-aware task distribution and task stealing. We introduce the task affinity clause that will be part of OpenMP 5.0 and provide the reasoning behind its design. Evaluation with our experimental implementation in the LLVM OpenMP runtime shows that task affinity improves execution performance up to 4.5x on an 8-socket NUMA machine and significantly reduces runtime variability of OpenMP tasks. Our results demonstrate that a variety of applications can benefit from task affinity and that the presented clause is closing the gap of task-to-data affinity in OpenMP 5.0.

More Details

TYPE Conference Poster YEAR 2018

DOI OSTI Scopus

ATDM Operating Systems and On-Node Runtime

Olivier, Stephen L.; Foulk, James W.; Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Enhancing Qthreads for ECP Science and Energy Impact

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

December 2017 ECP ST Project Review: ECP Project WBS 2.3.1.15 (Qthreads)

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

OpenMPIR: Implementing OpenMP tasks with tapir

Proceedings of LLVM-HPC 2017: 4th Workshop on the LLVM Compiler Infrastructure in HPC - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis

Olivier, Stephen L.; Hammond, Simon; Duran, Alejandro

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Qthreads Node Threading Runtime and NoRMa Node Resource Manager: A HiHAT teaser

Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

High Performance Computing - Power Application Programming Interface Specification Version 2.0

Foulk, James W.; Grant, Ryan; Levenhagen, Michael; Olivier, Stephen L.; Foulk, James W.; Ward, Harry L.; Younge, Andrew J.

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI

Qthreads and On-Node Run time Coordination

Olivier, Stephen L.; Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Double buffering for MCDRAM on second generation intel® Xeon Phi™ processors with OpenMP

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George W.; Edwards, Harold C.; Olivier, Stephen L.

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-byblocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms i.e., Kokkos. A performance evaluation is presented on both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Choleskyby- blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.

More Details

TYPE Other Report YEAR 2015

DOI OSTI

Qthreads: A library for lightweight threading

Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

An Overview of Sandia National Laboratory?s High Performance Computing Power Application Programming Interface (API) Specification

Foulk, James W.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Debonis, David

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Early experiences with node-level power capping on the cray XC40 platform

Proceedings of E2SC 2015: 3rd International Workshop on Energy Efficient Supercomputing - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis

Publications

Search results