Hierarchical Task-Data Parallelism using Kokkos and Qthreads

Edwards, Harold C.; Olivier, Stephen L.; Berry, Jonathan; Mackey, Greg E.; Rajamanickam, Sivasankaran; Wolf, Michael; Kim, Kyungjoo; Stelle, George W.

This report describes a new capability for hierarchical task-data parallelism using Sandia's Kokkos and Qthreads, and an evaluation of this capability with sparse matrix Cholesky factorization and social network triangle enumeration mini-applications. Hierarchical task-data parallelism consists of a collection of tasks with executes-after dependences, where each task contains data-parallel operations performed on a team of hardware threads. The collection of tasks and dependences forms a directed acyclic graph of tasks: a task DAG. Major challenges of this research and development effort include portability and performance across multicore CPU, manycore Intel Xeon Phi, and NVIDIA GPU architectures; scalability with respect to hardware concurrency and size of the task DAG; and usability of the application programmer interface (API).
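
The two levels described in this abstract can be illustrated with a minimal Python sketch. It is not the Kokkos or Qthreads API: the names `run_task_dag`, `team_sum_of_squares`, and `TEAM_SIZE` are hypothetical, a standard-library thread pool stands in for a team of hardware threads, and ready tasks are run one at a time for simplicity, whereas a real runtime would also run independent tasks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+

TEAM_SIZE = 4  # hypothetical team width, chosen only for illustration


def team_sum_of_squares(values, team):
    """Data-parallel step: split `values` across the team, then reduce."""
    chunks = [values[i::TEAM_SIZE] for i in range(TEAM_SIZE)]
    partials = team.map(lambda chunk: sum(v * v for v in chunk), chunks)
    return sum(partials)


def run_task_dag(inputs, executes_after):
    """Run each task only after every task it executes after has finished."""
    order = []
    results = {}
    # TopologicalSorter takes a mapping from each node to its predecessors.
    sorter = TopologicalSorter(executes_after)
    sorter.prepare()
    with ThreadPoolExecutor(max_workers=TEAM_SIZE) as team:
        while sorter.is_active():
            for task in sorter.get_ready():
                # Combine the predecessors' results with this task's own work.
                carried = sum(results[p] for p in executes_after[task])
                results[task] = carried + team_sum_of_squares(inputs[task], team)
                order.append(task)
                sorter.done(task)
    return order, results


# A diamond-shaped task DAG: b and c execute after a; d executes after b and c.
inputs = {"a": [1, 2], "b": [3], "c": [4], "d": [5, 6]}
executes_after = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
order, results = run_task_dag(inputs, executes_after)
```

Here the dictionary of executes-after dependences plays the role of the task DAG, and the work inside each task is divided among the team before being reduced to a single value.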

TYPE SAND Report YEAR 2016

DOI OSTI

Cactus Environment Machine: Shared Environment Call-by-Need

Stelle, George W.; Stefanovic, Darko; Olivier, Stephen L.; Forrest, Stephanie

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Qthreads: Run Time Library Support for Task Parallel Programming

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Kokkos/Qthreads Task Parallel Approach to Linear Algebra Based Graph Analytics

Wolf, Michael; Edwards, Harold C.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2016

DOI OSTI

Software Requirements for ATDM On-Node Resource Management

Olivier, Stephen L.; Foulk, James W.; Brightwell, Ronald B.

This report outlines the software requirements for on-node resource management in the Advanced Simulation and Computing (ASC) Advanced Technology Development and Mitigation (ATDM) project at Sandia National Laboratories (SNL). The need for on-node resource management has arisen from the componentization of the software stack. Componentization aids in managing complexity and making software more composable and reusable. However, components must compete for limited on-node resources for execution (e.g., cores and hardware threads) and memory. The requirements documented in this report support an effort to manage this contention, avoiding oversubscription of resources and enabling their efficient deployment for application execution.

TYPE SAND Report YEAR 2016

DOI OSTI

High Performance Computing: Power Application Programming Interface Specification (V.1.3)

Foulk, James W.; Kelly, Suzanne M.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Debonis, David

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

TYPE SAND Report YEAR 2016

DOI OSTI

An Overview of Sandia National Laboratory's High Performance Computing Power Application Programming Interface (API) Specification

Foulk, James W.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Debonis, David

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

ACES and Cray Collaborate on Advanced Power Management for Trinity

Foulk, James W.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Analysis of Application Sensitivity to System Performance Variability in a Dynamic Task Based Runtime

Shipman, Galen; Mccormick, Patrick; Foulk, James W.; Olivier, Stephen L.; Ferreira, Kurt; Sankaran, Ramanan; Treichler, Sean; Aiken, Alex; Bauer, Michael

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Overcoming Challenges in Scalable Power Monitoring with the Power API

Grant, Ryan; Levenhagen, Michael; Olivier, Stephen L.; Debonis, David; Foulk, James W.; Foulk, James W.

Abstract not provided.

TYPE Conference Poster YEAR 2016

DOI OSTI

Approaches for task affinity in OpenMP

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Terboven, Christian; Hahnfeld, Jonas; Teruel, Xavier; Mateo, Sergi; Duran, Alejandro; Klemm, Michael; Olivier, Stephen L.; De Supinski, Bronis R.

OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extended tasking to increase functionality and to support optimizations, for instance with the taskloop construct. However, task scheduling remains opaque, which leads to inconsistent performance on NUMA architectures. We assess design issues for task affinity and explore several approaches to enable it. We evaluate these proposals with implementations in the Nanos++ and LLVM OpenMP runtimes that improve performance by up to 40% and significantly reduce execution time variation.

TYPE Conference Poster YEAR 2016

DOI OSTI Scopus

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George W.; Edwards, Harold C.; Olivier, Stephen L.

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using Kokkos, an open-source framework for manycore platforms. A performance evaluation is presented on both Intel Sandy Bridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate the merits of the proposed task-based factorization. Experimental results demonstrate that, using 56 threads on the Intel Xeon Phi processor, our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky, which does not carry tasking overhead, for sparse matrices arising from various application problems.
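
How an algorithm-by-blocks induces a task graph can be sketched for the simpler dense case. The Python below is an illustration under stated assumptions, not the sparse incomplete factorization or the Kokkos tasking API from the report: it uses the standard right-looking blocked Cholesky, and the function name `cholesky_by_blocks_tasks` and the task labels `chol`, `trsm`, and `update` are hypothetical. Each task executes after the last task that wrote any block it reads or overwrites.

```python
def cholesky_by_blocks_tasks(nblocks):
    """List (task, executes_after) pairs for a dense blocked lower Cholesky.

    Assumption: dense right-looking variant on an nblocks x nblocks grid,
    shown only to illustrate how block operations become dependent tasks.
    """
    last_writer = {}  # block (i, j) -> task that most recently wrote it
    tasks = []

    def add(name, reads, writes):
        # A task executes after the last writer of every block it touches.
        touched = reads + [writes]
        deps = sorted({last_writer[b] for b in touched if b in last_writer})
        tasks.append((name, deps))
        last_writer[writes] = name

    for k in range(nblocks):
        # Factor the diagonal block.
        add(("chol", k, k), [], (k, k))
        # Triangular solves for the blocks below the diagonal block.
        for i in range(k + 1, nblocks):
            add(("trsm", i, k), [(k, k)], (i, k))
        # Update the trailing lower-triangular blocks.
        for i in range(k + 1, nblocks):
            for j in range(k + 1, i + 1):
                add(("update", i, j, k), [(i, k), (j, k)], (i, j))
    return tasks


# Print the induced task graph for a 2 x 2 grid of blocks.
for task, deps in cholesky_by_blocks_tasks(2):
    print(task, "executes after", deps)
```

In the sparse case many blocks are empty, so the corresponding tasks and dependences would simply be absent from the graph.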

TYPE Other Report YEAR 2015

DOI OSTI

Qthreads: A library for lightweight threading

Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

An Overview of Sandia National Laboratory's High Performance Computing Power Application Programming Interface (API) Specification

Foulk, James W.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Debonis, David

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Early experiences with node-level power capping on the Cray XC40 platform

Proceedings of E2SC 2015: 3rd International Workshop on Energy Efficient Supercomputing - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis

Foulk, James W.; Olivier, Stephen L.; Ferreira, Kurt; Shipman, Galen; Shu, Wei

Power consumption of extreme-scale supercomputers has become a key performance bottleneck. Yet current practices do not leverage power management opportunities, instead running at maximum power. This is not sustainable. Future systems will need to manage power as a critical resource, directing it to where it has the greatest benefit. Power capping is one mechanism for managing power budgets; however, its behavior is not well understood. This paper presents an empirical evaluation of several key HPC workloads running under a power cap on a Cray XC40 system, and provides a comparison of this technique with p-state control, demonstrating the performance differences of each. These results show: 1) maximum performance requires ensuring the cap is not reached; 2) performance slowdown under a cap can be attributed to cascading delays, which result in unsynchronized performance variability across nodes; and 3) due to lag in reaction time, considerable time is spent operating above the set cap. This work provides a timely and much needed comparison of HPC application performance under a power cap and attempts to enable users and system administrators to understand how best to optimize application performance on power-constrained HPC systems.

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Early Experiences with Node-Level Power Capping on the Cray XC40 Platform [PowerPoint]

Foulk, James W.; Olivier, Stephen L.; Ferreira, Kurt; Shipman, Galen; Shu, Wei

Abstract not provided.

TYPE Conference Poster YEAR 2015

DOI OSTI

Power API for HPC: Standardizing Power Measurement and Control

Foulk, James W.; Foulk, James W.; Kelly, Suzanne M.; Levenhagen, Michael; Debonis, David; Olivier, Stephen L.; Grant, Ryan

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Qthreads and Thoughts on ULT Standardization

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

OpenMP Tasks: New Features for 4.5 [PowerPoint]

Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Task-parallel Sparse Incomplete Cholesky Factorization using Kokkos Portable APIs

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Edwards, Harold C.; Olivier, Stephen L.; Stelle, George W.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

High Performance Computing - Power Application Programming Interface Specification

Foulk, James W.; Kelly, Suzanne M.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Debonis, David

Achieving practical exascale supercomputing will require massive increases in energy efficiency. The bulk of this improvement will likely be derived from hardware advances such as improved semiconductor device technologies and tighter integration, hopefully resulting in more energy efficient computer architectures. Still, software will have an important role to play. With every generation of new hardware, more power measurement and control capabilities are exposed. Many of these features require software involvement to maximize feature benefits. This trend will allow algorithm designers to add power and energy efficiency to their optimization criteria. Similarly, at the system level, opportunities now exist for energy-aware scheduling to meet external utility constraints such as time of day cost charging and power ramp rate limitations. Finally, future architectures might not be able to operate all components at full capability for a range of reasons including temperature considerations or power delivery limitations. Software will need to make appropriate choices about how to allocate the available power budget given many, sometimes conflicting considerations.

TYPE SAND Report YEAR 2015

DOI OSTI

Asynchronous Many-Task Programming Models for Next Generation Platforms

Wilke, Jeremiah; Bettencourt, Matthew T.; Bova, Steven W.; Franko, Ken; Gamell, Marc; Grant, Ryan; Hammond, Simon; Hollman, David S.; Knight, Samuel; Kolla, Hemanth; Lin, Paul T.; Olivier, Stephen L.; Sjaardema, Gregory D.; Slattengren, Nicole L.; Teranishi, Keita; Bennett, Janine C.; Clay, Robert L.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

High Performance Computing - Power Application Programming Interface Specification. Version 1.1 [DRAFT]

Foulk, James W.; Kelly, Suzanne M.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Debonis, David

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

TYPE SAND Report YEAR 2015

DOI OSTI

Exploring MPI Application Performance Under Power Capping on the Cray XC40 Platform

Foulk, James W.; Olivier, Stephen L.; Ferreira, Kurt; Shipman, Galen; Shu, Wei

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI
