Publications Search

Holistic Measurement Driven System Assessment

Kramer, Bill; Bauer, Greg; Bode, Brett; Showerman, Mike; Enos, Jeremy; Saxton, Aaron; Jha, Saurabh; Kalbarczyk, Zbigniew; Iyer, Ravi; Brandt, James M.; Gentile, Ann C.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Application and System Performance Metrics

Gentile, Ann C.; Brandt, James M.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Large-Scale System Monitoring Experiences and Recommendations

Ahlgren, V.; Andersson, S.; Brandt, James M.; Cardo, N.; Chunduri, S.; Enos, J.; Fields, P.; Gentile, Ann C.; Gerber, R.; Gienger, M.; Greenseid, J.; Greiner, A.; Hadri, B.; He, Y.; Hoppe, D.; Kaila, U.; Kelly, K.; Klein, M.; Kristiansen, A.; Leak, S.; Mason, M.; Bays, Nathan R.; Piccinali, J-G; Repik, Jason J.; Rogers, J.; Salminen, S.; Showerman, M.; Whitney, C.; Williams, J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

DOI OSTI

Integrating low-latency analysis into HPC system monitoring

ACM International Conference Proceeding Series

Izadpanah, Ramin; Naksinehaboon, Nichamon; Brandt, James M.; Gentile, Ann C.; Dechev, Damian

The growth of High Performance Computing (HPC) systems increases the complexity of understanding resource utilization, system management, and performance issues. While raw performance data is increasingly exposed at the component level, the usefulness of the data depends on the ability to perform meaningful analysis on actionable timescales. However, current system monitoring infrastructures largely focus on data collection, with analysis performed off-system in post-processing mode. This increases the time required to provide analysis and feedback to a variety of consumers. In this work, we enhance the architecture of a monitoring system used on large-scale computational platforms to integrate streaming analysis capabilities at arbitrary locations within its data collection, transport, and aggregation facilities. We leverage the flexible communication topology of the monitoring system to enable placement of transformations based on overhead concerns, while still enabling low-latency exposure on-node. Our design internally supports and exposes the raw and transformed data uniformly for both node-level and off-system consumers. We show the viability of our implementation for a production-relevant case: run-time determination of the relative per-node file system demands.

TYPE Conference Poster YEAR 2018

DOI OSTI Scopus

OVIS Update 08/24/18

Brandt, James M.; Tucker, Thomas; Gentile, Ann C.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Characterizing Supercomputer Traffic Networks Through Link-Level Analysis

Jha, Saurabh; Brandt, James M.; Gentile, Ann C.; Kalbarczyk, Zbigniew; Iyer, Ravishankar

Abstract not provided.

TYPE Conference Poster YEAR 2018

DOI OSTI

Application Performance Insights via System Monitoring

Brandt, James M.; Gentile, Ann C.; Hammond, Simon; Cook, Jeanine; Allan, Benjamin A.; Tucker, Thomas; Naksinehaboon, Nichamon; Taerat, Narate; Aaziz, Omar R.; Ates, Emre; Tuncer, Ozan; Egele, Manuel; Turk, Ata; Coskun, Ayse; Izadpanah, Ramin; Dechev, Damian

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Supporting Failure Analysis with Discoverable Annotated Log Datasets

Leak, Stephen; Greiner, Annette; Brandt, James M.; Gentile, Ann C.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Application Performance Insights via System Monitoring

Brandt, James M.; Enos, Jeremy; Gentile, Ann C.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Cray System Monitoring: Successes, Requirements, and Priorities

Ahlgren, Ville; Andersson, Stefan; Brandt, James M.; Cardo, Nicholas; Chunduri, Sudheer; Enos, Jeremy; Fields, Parks; Gentile, Ann C.; Gerber, Richard; Greenseid, Joe; Greiner, Annette; Hadri, Bilel; He, Yun; Hoppe, Dennis; Kaila, Urpo; Kelly, Kaki; Klein, Mark; Kristiansen, Alex; Leak, Steve; Mason, Mike; Pedretti, Kevin; Piccinali, Jean-Guillaume; Repik, Jason J.; Rogers, Jim; Salminen, Susanna; Showerman, Mike; Whitney, Cary; Williams, Jim

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Runtime HPC System and Application Performance Assessment and Diagnostics

Brandt, James M.; Gentile, Ann C.; Cook, Jonathan; Allan, Benjamin A.; Cook, Jeanine; Aaziz, Omar R.; Tucker, Thomas; Naksinehaboon, Nichamon; Taerat, Narate; Ates, Emre; Tuncer, Ozan; Egele, Manuel; Turk, Ata; Coskun, Ayse

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Continuous Performance Tracking for Kokkos Applications Using LDMS

Brandt, James M.; Hammond, Simon; Tucker, Thomas; Gentile, Ann C.; Cook, Jeanine

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Enhanced Profiling for Kokkos Applications

Hammond, Simon; Trott, Christian R.; Ibanez, Daniel A.; Edwards, Harold C.; Sunderland, Daniel; Ellingwood, Nathan D.; Brandt, James M.; Gentile, Ann C.; Cook, Jeanine; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Dynamic Assessment and Feedback

Gentile, Ann C.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Live feed Sandia CAPVIZ HPC cluster performance analysis & visualization demonstration

Allan, Benjamin A.; Schmitz, Mark E.; Walsh, Edward J.; Aguilar, Michael J.; Brandt, James M.; Gentile, Ann C.; Ogden, Jeffry B.; Monk, Stephen T.; Noe, John P.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Holistic measurement-driven system assessment

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Jha, Saurabh; Brandt, James M.; Gentile, Ann C.; Kalbarczyk, Zbigniew; Bauer, Greg; Enos, Jeremy; Showerman, Michael; Kaplan, Larry; Bode, Brett; Greiner, Annette; Bonnie, Amanda; Mason, Mike; Iyer, Ravishankar K.; Kramer, William

In high-performance computing systems, application performance and throughput depend on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources, improve root cause diagnosis, and predict performance impacts. We present work developing integrated capabilities for holistic monitoring and analysis to understand and characterize the propagation of performance-degrading events. These characterizations can be used to determine and invoke mitigating responses by system administrators, applications, and system software.

TYPE Conference Poster YEAR 2017

DOI OSTI Scopus

Final Review of FY17 ASC CSSE L2 Milestone #6018 entitled "Analyzing Power Usage Characteristics of Workloads Running on Trinity"

Hoekstra, Robert J.; Hammond, Simon; Hemmert, Karl S.; Gentile, Ann C.; Oldfield, Ron; Lang, Mike; Martin, Steve

The presentation documented the team's technical approach and summarized the results in sufficient detail to demonstrate both the value and the completion of the milestone. A separate SAND report was also generated to supplement the presentation with more detail.

TYPE Other Report YEAR 2017

DOI OSTI

Task Placement to Reduce Application Communication Costs

Devine, Karen; Brandt, James M.; Deveci, Mehmet; Gentile, Ann C.; Leung, Vitus J.; Olivier, Stephen L.; Bays, Nathan R.; Rajamanickam, Sivasankaran; Taylor, Mark A.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Holistic Measurement Driven System Assessment

Jha, Saurabh; Brandt, James M.; Gentile, Ann C.; Kalbarczyk, Zbigniew; Bauer, Greg; Enos, Jeremy; Showerman, Michael; Kaplan, Larry; Bode, Brett; Greiner, Annette; Bonnie, Amanda; Mason, Mike; Iyer, Ravishankar; Kramer, William

Abstract not provided.

TYPE Conference Poster YEAR 2017

DOI OSTI

Discovering Metrics of Network Contention

Brandt, James M.; Gentile, Ann C.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo

Formicola, Valerio; Jha, Saurabh; Chen, Daniel; Dong, Wen; Bonnie, Amanda; Mason, Mike; Brandt, James M.; Gentile, Ann C.; Kaplan, Larry; Repik, Jason J.; Enos, Jeremy; Showerman, Mike; Greiner, Annette; Kalbarczyk, Zbigniew; Iyer, Ravishankar; Kramer, Bill

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI