Publications

Results 1–25 of 57
Skip to search filters

Quantitative Performance Assessment of Proxy Apps and Parents (Report for ECP Proxy App Project Milestone ADCD-504-28)

Cook, Jeanine C.; Aaziz, Omar R.; Chen, Si C.; Godoy, William F.; Powell, Amy J.; Watson, Gregory W.; Vaughan, Courtenay T.; Wildani, Avani W.

The ECP Proxy Application Project has an annual milestone to assess the state of ECP proxy applications and their role in the overall ECP ecosystem. Our FY22 March/April milestone (ADCD- 504-28) proposed to: Assess the fidelity of proxy applications compared to their respective parents in terms of kernel and I/O behavior, and predictability. Similarity techniques will be applied for quantitative comparison of proxy/parent kernel behavior. MACSio evaluation will continue and support for OpenPMD backends will be explored. The execution time predictability of proxy apps with respect to their parents will be explored through a carefully designed scaling study and code comparisons. Note that in this FY, we also have quantitative assessment milestones that are due in September and are, therefore, not included in the description above or in this report. Another report on these deliverables will be generated and submitted upon completion of these milestones. To satisfy this milestone, the following specific tasks were completed: Study the ability of MACSio to represent I/O workloads of adaptive mesh codes. Re-define the performance counter groups for contemporary Intel and IBM platforms to better match specific hardware components and to better align across platforms (make cross-platform comparison more accurate). Perform cosine similarity study based on the new performance counter groups on the Intel and IBM P9 platforms. Perform detailed analysis of performance counter data to accurately average and align the data to maintain phases across all executions and develop methods to reduce the set of collected performance counters used in cosine similarity analysis. Apply a quantitative similarity comparison between proxy and parent CPU kernels. Perform scaling studies to understand the accuracy of predictability of the parent performance using its respective proxy application. This report presents highlights of these efforts.

More Details

Integrated System and Application Continuous Performance Monitoring and Analysis Capability

Aaziz, Omar R.; Allan, Benjamin A.; Brandt, James M.; Cook, Jeanine C.; Devine, Karen D.; Elliott, James E.; Gentile, Ann C.; Hammond, Simon D.; Kelley, Brian M.; Lopatina, Lena L.; Moore, Stan G.; Olivier, Stephen L.; Pedretti, Kevin P.; Poliakoff, David Z.; Pawlowski, Roger P.; Regier, Phillip A.; Schmitz, Mark E.; Schwaller, Benjamin S.; Surjadidjaja, Vanessa S.; Swan, Matthew S.; Tucker, Nick T.; Tucker, Tom T.; Vaughan, Courtenay T.; Walton, Sara P.

Scientific applications run on high-performance computing (HPC) systems are critical for many national security missions within Sandia and the NNSA complex. However, these applications often face performance degradation and even failures that are challenging to diagnose. To provide unprecedented insight into these issues, the HPC Development, HPC Systems, Computational Science, and Plasma Theory & Simulation departments at Sandia crafted and completed their FY21 ASC Level 2 milestone entitled "Integrated System and Application Continuous Performance Monitoring and Analysis Capability." The milestone created a novel integrated HPC system and application monitoring and analysis capability by extending Sandia's Kokkos application portability framework, Lightweight Distributed Metric Service (LDMS) monitoring tool, and scalable storage, analysis, and visualization pipeline. The extensions to Kokkos and LDMS enable collection and storage of application data during run time, as it is generated, with negligible overhead. This data is combined with HPC system data within the extended analysis pipeline to present relevant visualizations of derived system and application metrics that can be viewed at run time or post run. This new capability was evaluated using several week-long, 290-node runs of Sandia's ElectroMagnetic Plasma In Realistic Environments ( EMPIRE ) modeling and design tool and resulted in 1TB of application data and 50TB of system data. EMPIRE developers remarked this capability was incredibly helpful for quickly assessing application health and performance alongside system state. In short, this milestone work built the foundation for expansive HPC system and application data collection, storage, analysis, visualization, and feedback framework that will increase total scientific output of Sandia's HPC users.

More Details

Integrated System and Application Continuous Performance Monitoring and Analysis Capability

Brandt, James M.; Cook, Jeanine C.; Aaziz, Omar R.; Allan, Benjamin A.; Devine, Karen D.; Elliott, James J.; Gentile, Ann C.; Hammond, Simon D.; Kelley, Brian M.; Lopatina, Lena L.; Moore, Stan G.; Olivier, Stephen L.; Pedretti, Kevin P.; Poliakoff, David Z.; Pawlowski, Roger P.; Regier, Phillip A.; Schmitz, Mark E.; Schwaller, Benjamin S.; Surjadidjaja, Vanessa S.; Swan, Matthew S.; Tucker, Tom T.; Tucker, Nick T.; Vaughan, Courtenay T.; Walton, Sara P.

Abstract not provided.

LDMS-GPU: Lightweight Distributed Metric Service (LDMS) for NVIDIA GPGPUs

Elwazir, Ammar E.; Badawy, Abdel-Hameed, B.; Aaziz, Omar R.; Cook, Jeanine C.

GPUs are now a fundamental accelerator for many high-performance computing applications. They are viewed by many as a technology facilitator for the surge in fields like machine learning and Convolutional Neural Networks. To deliver the best performance on a GPU, we need to create monitoring tools to ensure that we optimize the code to get the most performance and efficiency out of a GPU. Since NVIDIA GPUs are currently the most commonly implemented in HPC applications and systems, NVIDIA tools are the solution for performance monitoring. The Light-Weight Distributed Metric System (LDMS) at Sandia is an infrastructure widely adopted for large-scale systems and application monitoring. Sandia has developed CPU application monitoring capability within LDMS. Therefore, we chose to develop a GPU monitoring capability within the same framework. In this report, we discuss the current limitations in the NVIDIA monitoring tools, how we overcame such limitations, and present an overview of the tool we built to monitor GPU performance in LDMS and its capabilities. Also, we discuss our current validation results. Most of the performance counter results are the same in both vendor tools and our tool when using LDMS to collect these results. Furthermore, our tool provides these statistics during the entire runtime of the tool as a time series and not just aggregate statistics at the end of the application run. This allows the user to see the progress of the behavior of the applications during their lifetime.

More Details

Fine-Grained Analysis of Communication Similarity between Real and Proxy Applications

Proceedings of PMBS 2019: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Aaziz, Omar R.; Vaughan, Courtenay T.; Cook, Jonathan E.; Cook, Jeanine C.; Kuehn, Jeffery; Richards, David

In this work we investigate the dynamic communication behavior of parent and proxy applications, and investigate whether or not the dynamic communication behavior of the proxy matches that of its respective parent application. The idea of proxy applications is that they should match their parent well, and should exercise the hardware and perform similarly, so that from them lessons can be learned about how the HPC system and the application can best be utilized. We show here that some proxy/parent pairs do not need the extra detail of dynamic behavior analysis, while others can benefit from it, and through this we also identified a parent/proxy mismatch and improved the proxy application.

More Details

Exploring and quantifying how communication behaviors in proxies relate to real applications

Proceedings of PMBS 2018: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis

Aaziz, Omar R.; Cook, Jeanine C.; Cook, Jonathan E.; Vaughan, Courtenay T.

Proxy applications, or proxies, are simple applications meant to exercise systems in a way that mimics real applications (their parents). However, characterizing the relationship between the behavior of parent and proxy applications is not an easy task. In prior work [1], we presented a data-driven methodology to characterize the relationship between parent and proxy applications based on collecting runtime data from both and then using data analytics to find their correspondence or divergence. We showed that it worked well for hardware counter data, but our initial attempt using MPI function data was less satisfactory. In this paper, we present an exploratory effort at making an improved quantification of the correspondence of communication behavior for proxies and their respective parent applications. We present experimental evidence of positive results using four proxy applications from the current ECP Proxy Application Suite and their corresponding parent applications (in the ECP application portfolio). Results show that each proxy analyzed is representative of its parent with respect to communication data. In conjunction with our method presented in [1] (correspondence between computation and memory behavior), we get a strong understanding of how well a proxy predicts the comprehensive performance of its parent.

More Details

A Methodology for Characterizing the Correspondence between Real and Proxy Applications

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Aaziz, Omar R.; Cook, Jeanine C.; Cook, Jonathan E.; Juedeman, Tanner; Richards, David; Vaughan, Courtenay T.

Proxy applications are a simplified means for stake-holders to evaluate how both hardware and software stacks might perform on the class of real applications that they are meant to model. However, characterizing the relationship between them and their behavior is not an easy task. We present a data-driven methodology for characterizing the relationship between real and proxy applications based on collecting runtime data from both and then using data analytics to find their correspondence and divergence. We use new capabilities for application-level monitoring within LDMS (Lightweight Distributed Monitoring System) to capture hardware performance counter and MPI-related data. To demonstrate the utility of this methodology, we present experimental evidence from two system platforms, using four proxy applications from the current ECP Proxy Application Suite and their corresponding parent applications (in the ECP application portfolio). Results show that each proxy analyzed is representative of its parent with respect to computation and memory behavior. We also analyze communication patterns separately using mpiP data and show that communication for these four proxy/parent pairs is also similar.

More Details
Results 1–25 of 57
Results 1–25 of 57