Publications Search

FY22 Proxy App Suite Release

Cook, Jeanine; Aaziz, Omar R.; Vaughan, Courtenay T.; Alexeev, Yuri; Balakrishnan, Ramesh; Fletcher, Graham; Junghans, Christoph; Kim, Youngdae; Liber, Nevin; Liu, Geng; Lund, Amanda; Mayagoitia, Alvaro; Mc Corquodale, Peter; Pavel, Robert; Ramakrishnaiah, Vinay

The FY22 Proxy App Suite Release milestone includes the following activities: Curate a collection of proxy applications that represents the breadth of ECP applications, including application domains, programming models, supporting libraries, numerical methods, etc. Identify gaps in coverage and work with application teams to commission or develop proxies to cover gaps. From within this collection, designate the ”ECP Proxy Application Suite” of 10–15 proxies that balance breadth of coverage with ease of use and quality of implementation. Also designate approximately 6–10 proxies to form the “ECP Machine Learning Proxy Suite”. The ML suite will represent algorithms, use cases, and programming methods typically used by ECP science workloads to incorporate machine learning into their workflows.

More Details

TYPE Other Report YEAR 2022

DOI OSTI

Quantitative Performance Assessment of Proxy Apps and Parents

Cook, Jeanine; Aaziz, Omar R.; Vaughan, Courtenay T.; Watson, Gregory; Mccorquodale, Peter; Godoy, William; Delozier, Jenna; Carroll, Mark

The ECP Proxy Application Project has an annual milestone to assess the state of ECP proxy applications. Our FY21 milestone (ADCD-504-11) proposed to: Assess the performance and fidelity of proxy applications, including those in the ECP Proxy App Suite, relative to the ECP Application workload on heterogeneous platforms. Use proxy applications and selected ECP applications to assess the utility of critical elements of the Exascale toolchain, especially tools used to collect performance data. Identify gaps in coverage and/or common situations in which proxies may fail to adequately represent ECP applications.

More Details

TYPE Other Report YEAR 2022

DOI OSTI

The ECP Proxy App Project: Highlights and Lessons Learned

Aaziz, Omar R.; Cook, Jeanine

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

Rebooting Computing with 3D and Architecture/Superstrider Associative Array Architecture

Debenedictis, Erik; Cook, Jeanine; Srikanth, Sriseshan; Conte, Thomas

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2022

OSTI

Proxy Apps for X (Co-design, Real life and Fun!)

Aaziz, Omar R.; Cook, Jeanine; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2022

DOI OSTI

Quantitative Performance Assessment of Proxy Apps and Parents (Report for ECP Proxy App Project Milestone ADCD-504-28)

Cook, Jeanine; Aaziz, Omar R.; Chen, Si; Godoy, William; Powell, Amy J.; Watson, Gregory; Vaughan, Courtenay T.; Wildani, Avani

The ECP Proxy Application Project has an annual milestone to assess the state of ECP proxy applications and their role in the overall ECP ecosystem. Our FY22 March/April milestone (ADCD- 504-28) proposed to: Assess the fidelity of proxy applications compared to their respective parents in terms of kernel and I/O behavior, and predictability. Similarity techniques will be applied for quantitative comparison of proxy/parent kernel behavior. MACSio evaluation will continue and support for OpenPMD backends will be explored. The execution time predictability of proxy apps with respect to their parents will be explored through a carefully designed scaling study and code comparisons. Note that in this FY, we also have quantitative assessment milestones that are due in September and are, therefore, not included in the description above or in this report. Another report on these deliverables will be generated and submitted upon completion of these milestones. To satisfy this milestone, the following specific tasks were completed: Study the ability of MACSio to represent I/O workloads of adaptive mesh codes. Re-define the performance counter groups for contemporary Intel and IBM platforms to better match specific hardware components and to better align across platforms (make cross-platform comparison more accurate). Perform cosine similarity study based on the new performance counter groups on the Intel and IBM P9 platforms. Perform detailed analysis of performance counter data to accurately average and align the data to maintain phases across all executions and develop methods to reduce the set of collected performance counters used in cosine similarity analysis. Apply a quantitative similarity comparison between proxy and parent CPU kernels. Perform scaling studies to understand the accuracy of predictability of the parent performance using its respective proxy application. This report presents highlights of these efforts.

More Details

TYPE Other Report YEAR 2022

DOI OSTI

Integrated System and Application Continuous Performance Monitoring and Analysis Capability

Brandt, James M.; Cook, Jeanine; Aaziz, Omar R.; Allan, Benjamin A.; Devine, Karen; Foulk, James W.; Gentile, Ann C.; Hammond, Simon; Kelley, Brian M.; Lopatina, Lena; Moore, Stan G.; Olivier, Stephen L.; Foulk, James W.; Poliakoff, David; Pawlowski, Roger; Regier, Phillip; Schmitz, Mark E.; Schwaller, Benjamin; Surjadidjaja, Vanessa; Swan, Matthew S.; Tucker, Tom; Tucker, Nick; Vaughan, Courtenay T.; Walton, Sara P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2021

OSTI

Integrated System and Application Continuous Performance Monitoring and Analysis Capability

Aaziz, Omar R.; Allan, Benjamin A.; Brandt, James M.; Cook, Jeanine; Devine, Karen; Elliott, James E.; Gentile, Ann C.; Hammond, Simon; Kelley, Brian M.; Lopatina, Lena; Moore, Stan G.; Olivier, Stephen L.; Foulk, James W.; Poliakoff, David; Pawlowski, Roger; Regier, Phillip; Schmitz, Mark E.; Schwaller, Benjamin; Surjadidjaja, Vanessa; Swan, Matthew S.; Tucker, Nick; Tucker, Thomas; Vaughan, Courtenay T.; Walton, Sara P.

Scientific applications run on high-performance computing (HPC) systems are critical for many national security missions within Sandia and the NNSA complex. However, these applications often face performance degradation and even failures that are challenging to diagnose. To provide unprecedented insight into these issues, the HPC Development, HPC Systems, Computational Science, and Plasma Theory & Simulation departments at Sandia crafted and completed their FY21 ASC Level 2 milestone entitled "Integrated System and Application Continuous Performance Monitoring and Analysis Capability." The milestone created a novel integrated HPC system and application monitoring and analysis capability by extending Sandia's Kokkos application portability framework, Lightweight Distributed Metric Service (LDMS) monitoring tool, and scalable storage, analysis, and visualization pipeline. The extensions to Kokkos and LDMS enable collection and storage of application data during run time, as it is generated, with negligible overhead. This data is combined with HPC system data within the extended analysis pipeline to present relevant visualizations of derived system and application metrics that can be viewed at run time or post run. This new capability was evaluated using several week-long, 290-node runs of Sandia's ElectroMagnetic Plasma In Realistic Environments ( EMPIRE ) modeling and design tool and resulted in 1TB of application data and 50TB of system data. EMPIRE developers remarked this capability was incredibly helpful for quickly assessing application health and performance alongside system state. In short, this milestone work built the foundation for expansive HPC system and application data collection, storage, analysis, visualization, and feedback framework that will increase total scientific output of Sandia's HPC users.

More Details

TYPE SAND Report YEAR 2021

DOI OSTI

Evaluation of oneAPI for FPGAs

Miller, Nicholas; Cook, Jeanine; Hughes, Clayton

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Evaluation of oneAPI for FPGAs

Miller, Nicholas; Cook, Jeanine; Hughes, Clayton

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Developing a Graph Analytics Code : An Analysis of the Chapel Programming Environment

Barrett, Richard F.; Aaziz, Omar R.; Cook, Jeanine; Lehoucq, Rich; Olivier, Stephen L.; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Enabling Application and System Data Fusion

Gentile, Ann C.; Brandt, James M.; Cook, Jeanine; Hammond, Simon; Poliakoff, David; Schwaller, Benjamin; Surjadidjaja, Vanessa; Tucker, Thomas O.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

miniAMR port using oneAPI

Miller, Nicholas; Hughes, Clayton; Cook, Jeanine

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

Evaluation of oneAPI for FPGAs

Miller, Nicholas; Hughes, Clayton; Cook, Jeanine

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

LDMS-GPU: Lightweight Distributed Metric Service (LDMS) for NVIDIA GPGPUs

Elwazir, Ammar; Badawy, Abdel-Hameed A.; Aaziz, Omar R.; Cook, Jeanine

GPUs are now a fundamental accelerator for many high-performance computing applications. They are viewed by many as a technology facilitator for the surge in fields like machine learning and Convolutional Neural Networks. To deliver the best performance on a GPU, we need to create monitoring tools to ensure that we optimize the code to get the most performance and efficiency out of a GPU. Since NVIDIA GPUs are currently the most commonly implemented in HPC applications and systems, NVIDIA tools are the solution for performance monitoring. The Light-Weight Distributed Metric System (LDMS) at Sandia is an infrastructure widely adopted for large-scale systems and application monitoring. Sandia has developed CPU application monitoring capability within LDMS. Therefore, we chose to develop a GPU monitoring capability within the same framework. In this report, we discuss the current limitations in the NVIDIA monitoring tools, how we overcame such limitations, and present an overview of the tool we built to monitor GPU performance in LDMS and its capabilities. Also, we discuss our current validation results. Most of the performance counter results are the same in both vendor tools and our tool when using LDMS to collect these results. Furthermore, our tool provides these statistics during the entire runtime of the tool as a time series and not just aggregate statistics at the end of the application run. This allows the user to see the progress of the behavior of the applications during their lifetime.

More Details

TYPE SAND Report YEAR 2020

DOI OSTI

Validating Communication Similarity between Proxy and Parent Applications through Network Performance Characterization

Aaziz, Omar R.; Cook, Jeanine; Vaughan, Courtenay T.; Jeffery, Kuehn

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

Exploring chapel productivity using some graph algorithms

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Barrett, Richard F.; Cook, Jeanine; Olivier, Stephen L.; Aaziz, Omar R.; Jenkins, Chris; Vaughan, Courtenay T.

A broad set of data science and engineering questions may be organized as graphs, providing a powerful means for describing relational data. Although experts now routinely compute graph algorithms on huge, unstructured graphs using high performance computing (HPC) or cloud resources, this practice hasn't yet broken into the mainstream. Such computations require great expertise, yet users often need rapid prototyping and development to quickly customize existing code. Toward that end, we are exploring the use of the Chapel programming language as a means of making some important graph analytics more accessible, examining the breadth of characteristics that would make for a productive programming environment, one that is expressive, performant, portable, and robust.

More Details

TYPE Conference Poster YEAR 2020

DOI OSTI Scopus

Attributing Performance Variation from Integrated Application and System Data

Aaziz, Omar R.; Allan, Benjamin A.; Brandt, James M.; Cook, Jeanine; Devine, Karen; Foulk, James W.; Gentile, Ann C.; Olivier, Stephen L.; Foulk, James W.; Tucker, Thomas

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Using LDMS for Performance and Proxy Representativeness Characterization

Cook, Jeanine; Kuehn, Jeffery; Aaziz, Omar R.; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

The HIdden Mystery Behind Proxy Applications

Azziz, Omar; Cook, Jeanine; Vaughan, Courtenay T.; Mccorquodale, Peter; Finkel, Hal; Homerding, Brian; Moore, Shirley; Mintz, Tiffany; Watson, Greg; Ramakrishnaiah; Vinay; Pavel, Robert; Uram, Thomas; Liber, Nevin; Foulk, James W.; Lujan, Xavier E.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Using Cosine Similarity to Quantify Representativeness of ECP Proxy Apps

Cook, Jeanine; Kuehn, Jefferey; Aaziz, Omar R.; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Fine-Grained Analysis of Communication Similarity between Real and Proxy Applications

Proceedings of PMBS 2019: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Aaziz, Omar R.; Vaughan, Courtenay T.; Cook, Jonathan; Cook, Jeanine; Kuehn, Jeffery; Richards, David

In this work we investigate the dynamic communication behavior of parent and proxy applications, and investigate whether or not the dynamic communication behavior of the proxy matches that of its respective parent application. The idea of proxy applications is that they should match their parent well, and should exercise the hardware and perform similarly, so that from them lessons can be learned about how the HPC system and the application can best be utilized. We show here that some proxy/parent pairs do not need the extra detail of dynamic behavior analysis, while others can benefit from it, and through this we also identified a parent/proxy mismatch and improved the proxy application.

More Details

TYPE Conference Poster YEAR 2019

OSTI Scopus

MetaStrider: Architectures for scalable memory-centric reduction of sparse data streams

ACM Transactions on Architecture and Code Optimization

Srikanth, Sriseshan; Jain, Anirudh; Lennon, Joseph M.; Conte, Thomas M.; Debenedictis, Erik; Cook, Jeanine

Reduction is an operation performed on the values of two or more key-value pairs that share the same key. Reduction of sparse data streams finds application in a wide variety of domains such as data and graph analytics, cybersecurity, machine learning, and HPC applications. However, these applications exhibit low locality of reference, rendering traditional architectures and data representations inefficient. This article presents MetaStrider, a significant algorithmic and architectural enhancement to the state-of-the-art, SuperStrider. Furthermore, these enhancements enable a variety of parallel, memory-centric architectures that we propose, resulting in demonstrated performance that scales near-linearly with available memory-level parallelism.

More Details

TYPE Journal Article YEAR 2019

DOI OSTI Scopus

Abstract Machine Models and Proxy Architectures for Exascale Computing

Ang, James A.; Barrett, Richard F.; Benner, Robert E.; Burke, Daniel; Chan, Cy; Cook, Jeanine; Daley, Christopher S.; Donofrio, David; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Ibrahim, Khaled; Kelly, Suzanne M.; Le, Hoang; Leung, Vitus J.; Michelogiannakis, George; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan; Unat, D.; Wright, Nick J.; Voskuilen, Gwendolyn R.

To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.

More Details

TYPE SAND Report YEAR 2019

DOI OSTI

Fine-Grained Analysis of Communication Similarity between Real And Proxy Applications

Aaziz, Omar R.; Vaughan, Courtenay T.; Cook, Jonathan; Cook, Jeanine; Kuehn, Jeffrey

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Publications

Search results