Comparison of CTH and miniAMR

Abstract not provided.

TYPE Presentation YEAR 2015

Assessing the role of mini-applications in predicting key performance characteristics of scientific and engineering applications

Journal of Parallel and Distributed Computing

Barrett, R.F.; Crozier, Paul; Doerfler, Douglas W.; Heroux, Michael A.; Lin, Paul T.; Thornquist, Heidi K.; Trucano, Timothy G.; Vaughan, Courtenay T.

Computational science and engineering application programs are typically large, complex, and dynamic, and are often constrained by distribution limitations. To make rapid exploration of scientific and engineering application programs tractable in the context of new, emerging, and future computing architectures, a suite of "miniapps" has been created to serve as proxies for full-scale applications. Each miniapp is designed to represent a key performance characteristic that significantly impacts, or is expected to significantly impact, the runtime performance of an application program. In this paper we introduce a methodology for assessing the ability of these miniapps to effectively represent these performance issues. We applied this methodology to three miniapps, examining the linkage between them and an application they are intended to represent. Herein we evaluate the fidelity of that linkage. This work represents the initial steps required to begin to answer the question, "Under what conditions does a miniapp represent a key performance characteristic in a full app?"

TYPE Journal Article YEAR 2015

Preparation of Codes for Trinity

Vaughan, Courtenay T.; Rajan, Mahesh; Dinge, Dennis; Dohrmann, Clark R.; Franko, Kenneth; Glass, Micheal W.; Pierson, Kendall H.; Tupek, Michael R.

Abstract not provided.

TYPE Conference Poster YEAR 2014

Toward an Evolutionary Task Parallel Integrated MPI + X Programming Model

Barrett, Richard F.; Stark, Dylan T.; Vaughan, Courtenay T.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.

Abstract not provided.

TYPE Conference Poster YEAR 2014

Preliminary Performance of Salinas

Vaughan, Courtenay T.; Dohrmann, Clark R.

Abstract not provided.

TYPE Presentation YEAR 2014

Communication Explorations for Adaptive Mesh Refinement Motivated by Emerging and Future Architectures

Vaughan, Courtenay T.; Barrett, Richard F.; Roweth, Duncan

Abstract not provided.

TYPE Presentation YEAR 2014

An Evaluation of BitTorrent's Performance In HPC Environments

Dosanjh, Matthew G.; Kelly, Suzanne M.; Foulk, James W.; Vaughan, Courtenay T.; Bridges, Patrick

Abstract not provided.

TYPE Presentation YEAR 2014

An Evaluation of BitTorrent's Performance In HPC Environments

Dosanjh, Matthew G.; Kelly, Suzanne M.; Laros, James H.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2014

Exploring Workloads of Adaptive Mesh Refinement

Vaughan, Courtenay T.; Barrett, Richard F.; Jayaraj, Jagan

Abstract not provided.

TYPE Conference YEAR 2014

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications

Proceedings of ExaMPI 2014: Exascale MPI 2014 - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stark, Dylan T.; Barrett, Richard F.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.; Vaughan, Courtenay T.

Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard scaling and the resulting increase in CPU core counts with each successive generation of general-purpose processors have made the ability to leverage parallelism for communication an increasingly critical aspect of future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, and many proposed approaches require code changes that can be infeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension to a massive multithreading runtime system that supports dynamic parallelism, enabling it to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous three-dimensional stencil computation with halo exchange as the driving example, which has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition, using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.

TYPE Conference Poster YEAR 2014

Reducing the bulk of the bulk synchronous parallel model

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.

TYPE Journal Article YEAR 2013

An Evaluation of BitTorrent's Performance In HPC Environments

Kelly, Suzanne M.; Laros, James H.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2013

NNSA/ASC Test Bed Update

Hammond, Simon; Barrett, Richard F.; Vaughan, Courtenay T.; Trott, Christian R.; Laros, James H.; Kelly, Suzanne M.; Ang, James A.

Abstract not provided.

TYPE Presentation YEAR 2013

A first look at miniAMR

Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2013

Application Explorations for Future Interconnects

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon

Abstract not provided.

TYPE Conference YEAR 2013

Using the Cray Gemini Performance Counters

Pedretti, Kevin; Vaughan, Courtenay T.; Barrett, Richard F.; Devine, Karen; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2013

Ensuring Continued Scalability of Mesh Based Hydrocodes

Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2013

Using the Cray Gemini Performance Counters

Pedretti, Kevin; Vaughan, Courtenay T.; Hemmert, Karl S.; Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2012

Navigating an evolutionary fast path to exascale

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Barrett, Richard F.; Hammond, Simon; Vaughan, Courtenay T.; Doerfler, Douglas W.; Heroux, Michael A.

The computing community is in the midst of a disruptive architectural change. The advent of manycore and heterogeneous computing nodes forces us to reconsider every aspect of the system software and application stack. To address this challenge there is a broad spectrum of approaches, which we roughly classify as either revolutionary or evolutionary. With the former, the entire code base is re-written, perhaps using a new programming language or execution model. The latter, which is the focus of this work, seeks a piecewise path of effective incremental change. The end effect of our approach will be revolutionary in that the control structure of the application will be markedly different in order to utilize single-instruction multiple-data/thread (SIMD/SIMT), manycore and heterogeneous nodes, but the physics code fragments will be remarkably similar. Our approach is guided by a set of mission driven applications and their proxies, focused on balancing performance potential with the realities of existing application code bases. Although the specifics of this process have not yet converged, we find that there are several important steps that developers of scientific and engineering application programs can take to prepare for making effective use of these challenging platforms. Aiding an evolutionary approach is the recognition that the performance potential of the architectures is, in a meaningful sense, an extension of existing capabilities: vectorization, threading, and a re-visiting of node interconnect capabilities. Therefore, as architectures, programming models, and programming mechanisms continue to evolve, the preparations described herein will provide significant performance benefits on existing and emerging architectures. © 2012 IEEE.

TYPE Conference YEAR 2012

Assessing the predictive capabilities of mini-applications

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Barrett, Richard F.; Crozier, Paul; Doerfler, Douglas W.; Hammond, Simon; Heroux, Michael A.; Lin, Paul T.; Trucano, Timothy G.; Vaughan, Courtenay T.; Williams, Alan B.

The push to exascale computing is informed by the assumption that the architecture, regardless of the specific design, will be fundamentally different from petascale computers. The Mantevo project has been established to produce a set of proxies, or 'miniapps,' which enable rapid exploration of key performance issues that impact a broad set of scientific application programs of interest to ASC and the broader HPC community. The conditions under which a miniapp can confidently be used to predict an application's behavior must be clearly elucidated. Toward this end, we have developed a methodology for assessing the predictive capabilities of application proxies. Adhering to the spirit of experimental validation, our approach provides a framework for comparing data from the application with data provided by its proxies. In this poster we present this methodology and apply it to three miniapps developed by the Mantevo project. © 2012 IEEE.

TYPE Conference YEAR 2012