Publications

Results 51–57 of 57

Search results

The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Cassella, Paul; Enos, Jeremy; Fullop, Joshi; Gentile, Ann C.; Monk, Stephen T.; Naksinehaboon, Nichamon; Ogden, Jeffry B.; Rajan, Mahesh R.; Showerman, Michael; Stevenson, Joel O.; Taerat, Narate; Tucker, Tom

Understanding how resources of High Performance Compute platforms are utilized by applications both individually and as a composite is key to application and platform performance. Typical system monitoring tools do not provide sufficient fidelity while application profiling tools do not capture the complex interplay between applications competing for shared resources. To gain new insights, monitoring tools must run continuously, system wide, at frequencies appropriate to the metrics of interest while having minimal impact on application performance. We introduce the Lightweight Distributed Metric Service for scalable, lightweight monitoring of large scale computing systems and applications. We describe issues and constraints guiding deployment in Sandia National Laboratories' capacity computing environment and on the National Center for Supercomputing Applications' Blue Waters platform including motivations, metrics of choice, and requirements relating to the scale and specialized nature of Blue Waters. We address monitoring overhead and impact on application performance and provide illustrative profiling results.

Simulation information regarding Sandia National Laboratories Trinity capability improvement metric

Agelastos, Anthony M.; Lin, Paul L.

Sandia National Laboratories, Los Alamos National Laboratory, and Lawrence Livermore National Laboratory each selected a representative simulation code to be used as a performance benchmark for the Trinity Capability Improvement Metric. Sandia selected SIERRA Low Mach Module: Nalu, which is a fluid dynamics code that solves many variable-density, acoustically incompressible problems of interest spanning from laminar to turbulent flow regimes, since it is fairly representative of implicit codes that have been developed under ASC. The simulations for this metric were performed on the Cielo Cray XE6 platform during dedicated application time and the chosen case utilized 131,072 Cielo cores to perform a canonical turbulent open jet simulation within an approximately 9-billion-element unstructured-hexahedral computational mesh. This report will document some of the results from these simulations as well as provide instructions to perform these simulations for comparison.
