Publications Search

Enabling Advanced Operational Analysis Through Multi-subsystem Data Integration on Trinity

Brandt, James M.; Debonis, David; Gentile, Ann C.; Lujan, Jim; Martin, Cindy; Martinez, David; Olivier, Stephen L.; Foulk, James W.; Taerat, Narate; Velarde, Ron

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Dynamic Task Scheduling to Mitigate System Performance Variability

Shipman, Galen; Mccormick, Patrick; Foulk, James W.; Olivier, Stephen L.; Ferreira, Kurt; Chen, Jacqueline H.; Sankaran, Ramanan; Treichler, Sean; Aiken, Alex; Bauer, Michael

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Enabling Advanced Operational Analysis Through Multi-Subsystem Data Integration on Trinity

Brandt, James M.; Debonis, David; Gentile, Ann C.; Lujan, James; Martin, Cindy; Martinez, David; Olivier, Stephen L.; Foulk, James W.; Taerat, Narate; Velarde, Ron

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

A Power Application Programming Interface (API) Specification for High Performance Computers (HPC)

Foulk, James W.; Foulk, James W.; Grant, Ryan; Levenhagen, Michael; Debonis, David; Olivier, Stephen L.; Kelly, Suzanne M.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Toward an evolutionary task parallel integrated MPI + X Programming Model

Proceedings of the 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015

Barrett, Richard F.; Stark, Dylan T.; Vaughan, Courtenay T.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.

The Bulk Synchronous Parallel programming model is showing performance limitations at high processor counts. We propose over-decomposition of the domain, operated on as tasks, to smooth out utilization of the computing resource, in particular the node interconnect and processing cores, and hide intra- and inter-node data movement. Our approach maintains the existing coding style commonly employed in computational science and engineering applications. Although we show improved performance on existing computers, up to 131,072 processor cores, the effectiveness of this approach on expected future architectures will require the continued evolution of capabilities throughout the codesign stack. Success then will not only result in decreased time to solution, but will also make better use of the hardware capabilities and reduce power and energy requirements, while fundamentally maintaining the current code configuration strategy.

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Towards task-parallel reductions in OpenMP

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Ciesko, Jan; Mateo, Sergi; Teruel, Xavier; Martorell, Xavier; Ayguade, Eduard; Labarta, Jesus; Duran, Alex; De Supinski, Bronis R.; Olivier, Stephen L.; Li, Kelvin; Eichenberger, Alexandre E.

Reductions represent a common algorithmic pattern in many scientific applications. OpenMP has always supported them on parallel and worksharing constructs. OpenMP 3.0’s tasking constructs enable new parallelization opportunities through the annotation of irregular algorithms. Unfortunately, the tasking model does not easily allow the expression of concurrent reductions, which limits the general applicability of the programming model to such algorithms. In this work, we present an extension to OpenMP that supports task-parallel reductions on task and taskgroup constructs to improve productivity and programmability. We present a specification of the feature and explore issues for programmers and software vendors regarding programming transparency as well as the impact on the current standard with respect to nesting, untied task support and task data dependencies. Our performance evaluation demonstrates comparable results to hand-coded task reductions.

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Toward an Evolutionary Task Parallel Integrated MPI + X Programming Model

Barrett, Richard F.; Stark, Dylan T.; Vaughan, Courtenay T.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.

Abstract not provided.

TYPE Conference Poster YEAR 2014

DOI OSTI

Metrics for evaluating energy saving techniques for resilient HPC systems

Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014

Grant, Ryan; Olivier, Stephen L.; Laros, James H.; Brightwell, Ronald B.; Porterfield, Allan K.

The metrics used for evaluating energy saving techniques for future HPC systems are critical to the correct assessment of proposed methods. Current predictions forecast that overcoming reduced system reliability, increased power requirements and energy consumption will be a major design challenge for future systems. Modern runtime energy-saving research efforts do not take into account the energy spent providing reliability. They also do not account for the increase in the probability of failure during application execution due to runtime overhead from energy saving methods. While this is very reasonable for current systems, it is insufficient for future generation systems. By taking into account the energy consumption ramifications of increased runtimes on system reliability, better energy saving techniques can be developed. This paper demonstrates how to determine the impact of runtime energy conservation methods within the context of failure-prone large scale systems. In addition, a survey of several energy savings methodologies is conducted and an analysis is performed with respect to their effectiveness in an environment in which failures occur.

TYPE Conference YEAR 2014

Scopus OSTI DOI

Run Time Systems R&D with the Qthreads Multithreading Library

Stark, Dylan T.; Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Recent and Upcoming Enhancements to OpenMP

Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Using architecture information and real-time resource state to reduce power consumption and communication costs in parallel applications

Brandt, James M.; Devine, Karen; Gentile, Ann C.; Leung, Vitus J.; Olivier, Stephen L.; Foulk, James W.; Rajamanickam, Sivasankaran; Bunde, David P.; Deveci, Mehmet; Catalyurek, Umit V.

As computer systems grow in both size and complexity, the need for applications and run-time systems to adjust to their dynamic environment also grows. The goal of the RAAMP LDRD was to combine static architecture information and real-time system state with algorithms to conserve power, reduce communication costs, and avoid network contention. We developed new data collection and aggregation tools to extract static hardware information (e.g., node/core hierarchy, network routing) as well as real-time performance data (e.g., CPU utilization, power consumption, memory bandwidth saturation, percentage of used bandwidth, number of network stalls). We created application interfaces that allowed this data to be used easily by algorithms. Finally, we demonstrated the benefit of integrating system and application information for two use cases. The first used real-time power consumption and memory bandwidth saturation data to throttle concurrency to save power without increasing application execution time. The second used static or real-time network traffic information to reduce or avoid network congestion by remapping MPI tasks to allocated processors. Results from our work are summarized in this report; more details are available in our publications [2, 6, 14, 16, 22, 29, 38, 44, 51, 54].

TYPE SAND Report YEAR 2014

DOI OSTI

A Power API for the HPC Community

Debonis, David; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Kelly, Suzanne M.; Foulk, James W.; Foulk, James W.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

High Performance Computing - Power Application Programming Interface Specification (V.1.0)

Foulk, James W.; Kelly, Suzanne M.; Foulk, James W.; Grant, Ryan; Olivier, Stephen L.; Levenhagen, Michael; Debonis, David

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

TYPE SAND Report YEAR 2014

DOI OSTI

The Qthreads Lightweight Multithreading Library

Olivier, Stephen L.; Stark, Dylan T.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Metrics for Evaluating Energy Saving Techniques for Resilient HPC Systems

Grant, Ryan; Olivier, Stephen L.; Laros, James H.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Addressing Power/Energy Challenges for Extreme Scale HPC

Laros, James H.; Kelly, Suzanne M.; Pedretti, Kevin P.; Grant, Ryan; Levenhagen, Michael; Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Zoltan2: Exploiting Geometric Partitioning in Task Mapping for Parallel Computers

Leung, Vitus J.; Rajamanickam, Sivasankaran; Pedretti, Kevin; Olivier, Stephen L.; Devine, Karen

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Unified Task + Data Parallelism on Manycore Architectures

Edwards, Harold C.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Using a complementary emulation-simulation co-design approach to assess application readiness for Processing-in-Memory systems

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stelle, George W.; Olivier, Stephen L.; Stark, Dylan T.; Rodrigues, Arun; Hemmert, Karl S.

Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope but thrive with these changes. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.

TYPE Conference Poster YEAR 2014

DOI OSTI Scopus

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications

Proceedings of ExaMPI 2014: Exascale MPI 2014 - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stark, Dylan T.; Barrett, Richard F.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.; Vaughan, Courtenay T.

Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard scaling and subsequent increases in CPU core counts with each successive generation of general-purpose processors have made the ability to leverage parallelism for communication an increasingly critical aspect for future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be unfeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massive multithreading runtime system supporting dynamic parallelism to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.

TYPE Conference Poster YEAR 2014

DOI OSTI Scopus

Exploiting Geometric Partitioning in Task Mapping for Parallel Computers

Rajamanickam, Sivasankaran; Leung, Vitus J.; Pedretti, Kevin P.; Olivier, Stephen L.; Devine, Karen

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A Proposal for Task-Generating Loops in OpenMP

Olivier, Stephen L.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI DOI

Design issues in the semantics and scheduling of asynchronous tasks

Olivier, Stephen L.

The asynchronous task model serves as a useful vehicle for shared memory parallel programming, particularly on multicore and manycore processors. As adoption of the model among programmers has increased, support has emerged for the integration of task parallel language constructs into mainstream programming languages, e.g., C and C++. This paper examines some of the design decisions in Cilk and OpenMP concerning semantics and scheduling of asynchronous tasks with the aim of informing the efforts of committees considering language integration, as well as developers of new task parallel languages and libraries.

TYPE SAND Report YEAR 2013

DOI OSTI

Parallel Scientific Computing at the DOE National Laboratories: Successes and Challenges

Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2012

OSTI
