Publications
Brandt, J.M., Stroup, K.D., Gentile, A.C., Lueninghoener, C.D., & Donato, E. (2024). Building LDMS Samplers for Slingshot Switches [Conference Presentation]. 10.2172/2565145
Gentile, A.C. (2023). LDMS: New Features and New Directions [Conference Presentation]. 10.2172/2430825
Boito, F., Brandt, J.M., Cardellini, V., Carns, P., Ciorba, F.M., Egan, H., Eleliemy, A., Gentile, A.C., Gruber, T., Hanson, J., Haus, U.U., Huck, K., Ilsche, T., Jakobsche, T., Jones, T., Karlsson, S., Mueen, A., Ott, M., Patki, T., … Yamamoto, K. (2023). Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations [Conference Paper]. Proceedings - IEEE International Conference on Cluster Computing, ICCC. 10.1109/CLUSTERWorkshops61457.2023.00016
Brandt, J.M., & Gentile, A.C. (2022). AppSysFusion: CoMingling of appropriate data to drive Codesign of Applications, HPC Platforms, and Monitoring, Analysis, and Feedback Infrastructure [Conference Presentation]. 10.2172/2006042
Brandt, J.M., Gentile, A.C., Walton, S.P., Allan, B.A., & Tucker, T. (2021). LDMS Version 4.3 Tutorial Part 1: Basics [Conference Presentation]. 10.2172/1899500
Brandt, J.M., Gentile, A.C., & Tucker, T. (2021). LDMS Version 4.3.8 Advanced Tutorial: Part 1 [Conference Presentation]. 10.2172/1898488
Brandt, J.M., Gentile, A.C., & Tucker, T. (2021). LDMS Version 4.3.8 Advanced Tutorial: Part 2 [Conference Presentation]. 10.2172/1898478
Brandt, J.M., Cook, J., Aaziz, O.R., Allan, B.A., Devine, K., Foulk, J.W., Gentile, A.C., Hammond, S., Kelley, B.M., Lopatina, L., Moore, S.G., Olivier, S.L., Foulk, J.W., Poliakoff, D., Pawlowski, R., Regier, P., Schmitz, M.E., Schwaller, B., Surjadidjaja, V., … Walton, S.P. (2021). Integrated System and Application Continuous Performance Monitoring and Analysis Capability [Presentation]. https://www.osti.gov/biblio/1886175
Aaziz, O.R., Allan, B.A., Brandt, J.M., Cook, J., Devine, K., Elliott, J., Gentile, A.C., Hammond, S., Kelley, B.M., Lopatina, L., Moore, S.G., Olivier, S.L., Foulk, J.W., Poliakoff, D., Pawlowski, R., Regier, P., Schmitz, M.E., Schwaller, B., Surjadidjaja, V., … Walton, S.P. (2021). Integrated System and Application Continuous Performance Monitoring and Analysis Capability. https://doi.org/10.2172/1819812
Gentile, A.C., & Brandt, J.M. (2021). Integrating Systems Operations into CoDesign [Conference Presentation]. 10.2172/1877538
Gentile, A.C. (2021). Integrating Systems Management into CoDesign [Presentation]. https://www.osti.gov/biblio/1868437
Gentile, A.C. (2021). Integrating Systems Management into CoDesign [Presentation]. https://www.osti.gov/biblio/1868440
Gentile, A.C. (2021). Integrating Systems Management into CoDesign [Presentation]. https://www.osti.gov/biblio/1868447
Gentile, A.C., Brandt, J.M., Cook, J., Hammond, S., Poliakoff, D., Schwaller, B., Surjadidjaja, V., & Tucker, T.O. (2021). Enabling Application and System Data Fusion [Conference Presentation]. 10.2172/1863505
Brandt, J.M., Enos, J., Gentile, A.C., & Kramer, W. (2021). Including Operations Analytics & Communication In Next Generation CoDesign [Conference Presentation]. 10.2172/1856310
Brightwell, R.B., Ferreira, K., Grant, R., Levy, S.L.N., Lofstead, G.F., Olivier, S.L., Foulk, J.W., Younge, A.J., Gentile, A.C., & Foulk, J.W. (2021). ALAMO: Autonomous lightweight allocation, management, and optimization [Conference Poster]. Communications in Computer and Information Science. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85107303666&origin=inward
Gentile, A.C. (2020). AI/ML for HPC Operations [Conference Presentation]. 10.2172/1831568
Tucker, T., Gentile, A.C., & Brandt, J.M. (2020). Supporting Dynamic Event Monitoring in the Lightweight Distributed Metric Service (LDMS) [Conference Poster]. https://www.osti.gov/biblio/1812466
Gentile, A.C. (2020). LDMS v4: Writing Sampler and Store Plugins (updated for LDMSCON2020) [Conference Poster]. https://www.osti.gov/biblio/1812470
Aaziz, O.R., Allan, B.A., Brandt, J.M., Cook, J., Devine, K., Foulk, J.W., Gentile, A.C., Olivier, S.L., Foulk, J.W., & Tucker, T. (2020). Attributing Performance Variation from Integrated Application and System Data [Conference Poster]. https://www.osti.gov/biblio/1765520
Gauntt, N.E., Davis, K., Repik, J.J., Brandt, J.M., Gentile, A.C., & Hammond, S. (2019). Design Installation and Operation of the Vortex ART Platform. 10.2172/1562796
Jha, S., Patke, A., Brandt, J.M., Gentile, A.C., Showerman, M., Roman, E., Kalbarczyk, Z.T., Kramer, B., & Iyer, R.K. (2019). A study of network congestion in two supercomputing high-speed interconnects [Conference Poster]. Proceedings - 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85076149891&origin=inward
Brandt, J.M., Brown, C.J., Foulk, J.W., Gentile, A.C., Greenseid, J., Kramer, W., Langer, P., Rashid, A., Rhem, K., & Showerman, M. (2019). Exploring New Monitoring and Analysis Capabilities on Cray's Software Preview System [Conference Poster]. https://www.osti.gov/biblio/1639961
Brandt, J.M., Brown, C.J., Foulk, J.W., Gentile, A.C., Greenseid, J., Kramer, W., Langer, P., Rashid, A., Rhem, K., & Showerman, M. (2019). Exploring New Monitoring and Analysis Capabilities on Cray's Software Preview System (Final Version) [Conference Poster]. https://www.osti.gov/biblio/1640116
Kramer, B., Bauer, G., Bode, B., Showerman, M., Enos, J., Saxton, A., Jha, S., Kalbarczyk, Z., Iyer, R., Brandt, J.M., & Gentile, A.C. (2018). Holistic Measurement Driven System Assessment [Presentation]. https://www.osti.gov/biblio/1592279
Gentile, A.C., & Brandt, J.M. (2018). Application and System Performance Metrics [Presentation]. https://www.osti.gov/biblio/1594278
Ahlgren, V., Andersson, S., Brandt, J.M., Cardo, N., Chunduri, S., Enos, J., Fields, P., Gentile, A.C., Gerber, R., Gienger, M., Greenseid, J., Greiner, A., Hadri, B., He, Y., Hoppe, D., Kaila, U., Kelly, K., Klein, M., Kristiansen, A., … Williams, J. (2018). Large-Scale System Monitoring Experiences and Recommendations [Conference Poster]. 10.1109/CLUSTER.2018.00069
Izadpanah, R., Naksinehaboon, N., Brandt, J.M., Gentile, A.C., & Dechev, D. (2018). Integrating low-latency analysis into HPC system monitoring [Conference Poster]. ACM International Conference Proceeding Series. 10.1145/3225058.3225086
Jha, S., Brandt, J.M., Gentile, A.C., Kalbarczyk, Z., & Iyer, R. (2018). Characterizing Supercomputer Traffic Networks Through Link-Level Analysis [Conference Poster]. 10.1109/CLUSTER.2018.00072
Brandt, J.M., Tucker, T., & Gentile, A.C. (2018). OVIS Update 08/24/18 [Presentation]. https://www.osti.gov/biblio/1583057
Brandt, J.M., Gentile, A.C., Hammond, S., Cook, J., Allan, B.A., Tucker, T., Naksinehaboon, N., Taerat, N., Cook, J., Aaziz, O.R., Ates, E., Tuncer, O., Egele, M., Turk, A., Coskun, A., Izadpanah, R., & Dechev, D. (2018). Application Performance Insights via System Monitoring [Presentation]. https://www.osti.gov/biblio/1532642
Ahlgren, V., Andersson, S., Brandt, J.M., Cardo, N., Chunduri, S., Enos, J., Fields, P., Gentile, A.C., Gerber, R., Greenseid, J., Greiner, A., Hadri, B., He, Y., Hoppe, D., Kaila, U., Kelly, K., Klein, M., Kristiansen, A., Leak, S., … Williams, J. (2018). Cray System Monitoring: Successes Requirements and Priorities [Conference Poster]. https://www.osti.gov/biblio/1515628
Brandt, J.M., Enos, J., & Gentile, A.C. (2018). Application Performance Insights via System Monitoring [Conference Poster]. https://www.osti.gov/biblio/1515734
Leak, S., Greiner, A., Brandt, J.M., & Gentile, A.C. (2018). Supporting Failure Analysis with Discoverable Annotated Log Datasets [Conference Poster]. https://www.osti.gov/biblio/1515735
Ahlgren, V., Andersson, S., Brandt, J.M., Cardo, N., Chunduri, S., Enos, J., Fields, P., Gentile, A.C., Gerber, R., Greenseid, J., Greiner, A., Hadri, B., He, Y., Hoppe, D., Kaila, U., Kelly, K., Klein, M., Kristiansen, A., Leak, S., … Williams, J. (2018). Cray System Monitoring: Successes Requirements and Priorities [Conference Poster]. https://www.osti.gov/biblio/1508917
Brandt, J.M., Gentile, A.C., Cook, J., Allan, B.A., Cook, J., Aaziz, O.R., Tucker, T., Nichamon, N., Taerat, N., Ates, E., Tuncer, O., Egele, M., Turk, A., & Coskun, A. (2018). Runtime HPC System and Application Performance Assessment and Diagnostics [Conference Poster]. https://www.osti.gov/biblio/1500155
Hammond, S., Trott, C.R., Ibanez-Granados, D.A., Edwards, H.C., Sunderland, D., Ellingwood, N.D., Brandt, J.M., Gentile, A.C., Cook, J., & Hoekstra, R.J. (2018). Enhanced Profiling for Kokkos Applications [Conference Poster]. https://www.osti.gov/biblio/1495756
Brandt, J.M., Hammond, S., Tucker, T., Gentile, A.C., & Cook, J. (2018). Continuous Performance Tracking for Kokkos Applications Using LDMS [Presentation]. https://www.osti.gov/biblio/1495760
Allan, B.A., Schmitz, M.E., Walsh, E.J., Aguilar, M.J., Brandt, J.M., Gentile, A.C., Ogden, J.B., Monk, S.T., & Noe, J.P. (2017). Live feed Sandia CAPVIZ HPC cluster performance analysis & visualization demonstration [Conference Poster]. https://www.osti.gov/biblio/1511131
Gentile, A.C. (2017). Dynamic Assessment and Feedback [Conference Poster]. https://www.osti.gov/biblio/1484090
Jha, S., Brandt, J.M., Gentile, A.C., Kalbarczyk, Z., Bauer, G., Enos, J., Showerman, M., Kaplan, L., Bode, B., Greiner, A., Bonnie, A., Mason, M., Iyer, R.K., & Kramer, W. (2017). Holistic measurement-driven system assessment [Conference Poster]. Proceedings - IEEE International Conference on Cluster Computing, ICCC. 10.1109/CLUSTER.2017.124
Hoekstra, R.J., Hammond, S., Hemmert, K.S., Gentile, A.C., Oldfield, R., Lang, M., & Martin, S. (2017). Final Review of FY17 ASC CSSE L2 Milestone #6018 entitled "Analyzing Power Usage Characteristics of Workloads Running on Trinity". 10.2172/1395433
Devine, K., Brandt, J.M., Deveci, M., Gentile, A.C., Leung, V.J., Olivier, S.L., Foulk, J.W., Rajamanickam, S., & Taylor, M.A. (2017). Task Placement to Reduce Application Communication Costs [Presentation]. https://www.osti.gov/biblio/1467790
Jha, S., Brandt, J.M., Gentile, A.C., Kalbarczyk, Z., Bauer, G., Enos, J., Showerman, M., Kaplan, L., Bode, B., Greiner, A., Bonnie, A., Mason, M., Iyer, R., & Kramer, W. (2017). Holistic Measurement Driven System Assessment [Conference Poster]. 10.1109/CLUSTER.2017.124
Brandt, J.M., & Gentile, A.C. (2017). Discovering Metrics of Network Contention [Conference Poster]. https://www.osti.gov/biblio/1455357
Deconinck, A., Nam, H.A., Mortin, D., Bonnie, A., Lueninghoener, C., Brandt, J.M., Gentile, A.C., Foulk, J.W., Agelastos, A.M., Vaughan, C.T., Hammond, S., Allan, B.A., Davis, M., & Repik, J.J. (2017). Runtime collection and analysis of system metrics for production monitoring of Trinity Phase II [Conference Poster]. https://www.osti.gov/biblio/1457978
Formicola, V., Jha, S., Chen, D., Dong, W., Bonnie, A., Mason, M., Brandt, J.M., Gentile, A.C., Kaplan, L., Repik, J.J., Enos, J., Showerman, M., Greiner, A., Kalbarczyk, Z., Iyer, R., & Kramer, B. (2017). Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo [Conference Poster]. https://www.osti.gov/biblio/1458099
Deconinck, A., Nam, H.A., Morton, D., Bonnie, A., Lueninghoener, C., Brandt, J.M., Gentile, A.C., Foulk, J.W., Agelastos, A.M., Vaughan, C.T., Hammond, S., Allan, B.A., Davis, M., & Repik, J.J. (2017). Runtime collection and analysis of system metrics for production monitoring of Trinity Phase II (Paper) [Conference Poster]. https://www.osti.gov/biblio/1458138
Gentile, A.C., Brandt, J.M., Agelastos, A.M., Lamb, J.M., Ruggirello, K.P., & Stevenson, J.O. (2017). Contention and Congestion: Challenges and Approaches to Understanding Application Impact [Conference Poster]. https://www.osti.gov/biblio/1425315
Grant, R., Groves, T.L., Foulk, J.W., Gentile, A.C., & Arnold, D. (2017). Understanding and Avoiding Performance Variability in High Performance Networks [Conference Poster]. https://www.osti.gov/biblio/1424876
Agelastos, A.M., Brandt, J.M., Gentile, A.C., Lamb, J.M., Ruggirello, K.P., & Stevenson, J.O. (2016). Defining Metrics to Distill Large-Scale HPC Platform and Application Performance Data into Actionable Quantities - Resource Contention of File System and Aries Interconnect [Presentation]. https://www.osti.gov/biblio/1806302
Brandt, J.M., & Gentile, A.C. (2016). Discovery interpretation and communication of meaningful information in HPC monitoring data [Presentation]. https://www.osti.gov/biblio/1428043
Agelastos, A.M., Allan, B.A., Brandt, J.M., Gentile, A.C., Lefantzi, S., Monk, S.T., Ogden, J.B., Rajan, M., & Stevenson, J.O. (2016). Continuous whole-system monitoring toward rapid understanding of production HPC applications and systems. Parallel Computing, 58, pp. 90-106. 10.1016/j.parco.2016.05.009
Agelastos, A.M., Brandt, J.M., Gentile, A.C., Lamb, J.M., Ruggirello, K.P., & Stevenson, J.O. (2016). High Performance Computing Metrics to Enable Application-Platform Communication. 10.2172/1562429
Brandt, J.M., Gentile, A.C., Showerman, M., Enos, J., Fullop, J., & Bauer, G. (2016). Large-scale persistent numerical data source monitoring system experiences [Conference Poster]. Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. 10.1109/IPDPSW.2016.188
Bauer, G., Brandt, J.M., Gentile, A.C., Kot, A., & Showerman, M. (2016). Dynamic Machine Specific Register (MSR) Data Collection as a System Service [Conference Poster]. https://www.osti.gov/biblio/1367074
Brandt, J.M., Froese, E., Gentile, A.C., Kaplan, L., Allan, B.A., & Walsh, E.J. (2016). Network Performance Counter Monitoring and Analysis on the Cray XC Platform [Conference Poster]. https://www.osti.gov/biblio/1367075
Bauer, G., Brandt, J.M., Gentile, A.C., Kot, A., & Showerman, M. (2016). Dynamic Machine Specific Register (MSR) Data Collection as a System Service [Conference Poster]. https://www.osti.gov/biblio/1422086
Brandt, J.M., Froese, E., Gentile, A.C., Kaplan, L., Allan, B.A., & Walsh, E.J. (2016). Network Performance Counter Monitoring and Analysis on the Cray XC Platform [Conference Poster]. https://www.osti.gov/biblio/1422085
Brandt, J.M., Gentile, A.C., Allan, B.A., Lefantzi, S., & Aguilar, M.J. (2016). Monitoring High Speed Network Fabrics: Experiences and Needs [Conference Poster]. https://www.osti.gov/biblio/1365039
Deconinck, A., Bonnie, A., Kelly, K., Sanchez, S., Martin, C., Mason, M., Brandt, J.M., Gentile, A.C., Allan, B.A., Agelastos, A.M., Davis, M., & Berry, M. (2016). Design and Implementation of a Scalable Monitoring System for Trinity [Conference Poster]. https://www.osti.gov/biblio/1365079
Brandt, J.M., Gentile, A.C., Martin, C., Allan, B.A., & Devine, K. (2016). Smart HPC Centers: data analysis feedback and response [Conference Poster]. https://www.osti.gov/biblio/1365213
Brandt, J.M., Devine, K., & Gentile, A.C. (2015). Infrastructure for in situ system monitoring and application data analysis [Conference Poster]. Proceedings of ISAV 2015: 1st International Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. 10.1145/2828612.2828621
Brandt, J.M., Gentile, A.C., Martin, C., Repik, J.J., & Taerat, N. (2015). New systems, new behaviors, new patterns: Monitoring insights from system standup [Conference Poster]. Proceedings - IEEE International Conference on Cluster Computing, ICCC. 10.1109/CLUSTER.2015.116
Gentile, A.C., Brandt, J.M., Lujan, J., Martin, C., Wright, N., & Butler, T. (2015). Monitoring for Cori and Trinity [Presentation]. https://www.osti.gov/biblio/1576137
Brandt, J.M., Devine, K., & Gentile, A.C. (2015). Infrastructure for In Situ System Monitoring and Application Data Analysis [Conference Poster]. 10.1145/2828612.2828621
Brandt, J.M., Collins, W., Gentile, A.C., Martinez, M., McRee, S., Sands, D., & Yaklin, A.C. (2015). Uncovering Bottlenecks in Data Transfer from a Filesystem to HPSS using the Lightweight Distributed Metric Service (LDMS) [Conference Poster]. https://www.osti.gov/biblio/1325529
Grant, R., Foulk, J.W., & Gentile, A.C. (2015). Overtime: A Tool for Analyzing Performance Variation due to Network Interference [Conference Poster]. 10.1145/2831129.2831133
Brandt, J.M., & Gentile, A.C. (2015). Monitoring and Analysis Tools for Numeric and Log Data [Presentation]. https://www.osti.gov/biblio/1307278
Brandt, J.M., Gentile, A.C., Martin, C., Repik, J.J., & Taerat, N. (2015). New Systems New Behaviors New Patterns: Monitoring Insights from System Standup [Conference Poster]. 10.1109/CLUSTER.2015.116
Brandt, J.M., Debonis, D., Gentile, A.C., Lujan, J., Martin, C., Martinez, D., Olivier, S.L., Foulk, J.W., Taerat, N., & Velarde, R. (2015). Enabling Advanced Operational Analysis Through Multi-subsystem Data Integration on Trinity [Conference Poster]. https://www.osti.gov/biblio/1248686
Agelastos, A.M., Allan, B.A., Brandt, J.M., Gentile, A.C., Lefantzi, S., Monk, S.T., Ogden, J.B., Rajan, M., & Stevenson, J.O. (2015). Toward Rapid Understanding of Production HPC Applications and Systems [Conference Poster]. https://www.osti.gov/biblio/1248841
Brandt, J.M., & Gentile, A.C. (2015). Scalable Integrated High-Fidelity Continuous Monitoring [Conference Poster]. https://www.osti.gov/biblio/1251361
Brandt, J.M., Debonis, D., Gentile, A.C., Lujan, J., Martin, C., Martinez, D., Olivier, S.L., Foulk, J.W., Taerat, N., & Velarde, R. (2015). Enabling Advanced Operational Analysis Through Multi-Subsystem Data Integration on Trinity [Conference Poster]. https://www.osti.gov/biblio/1251362
Brandt, J.M., Devine, K., Gentile, A.C., & Foulk, J.W. (2015). Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping [Conference Poster]. https://www.osti.gov/biblio/1245921
Brandt, J.M., & Gentile, A.C. (2015). Monitoring Application Resource Utilization on the [Presentation]. https://www.osti.gov/biblio/1238546
Brandt, J.M., Devine, K., Gentile, A.C., Leung, V.J., Olivier, S.L., Foulk, J.W., Rajamanickam, S., Bunde, D.P., Deveci, M., & Catalyurek, U.V. (2014). Using architecture information and real-time resource state to reduce power consumption and communication costs in parallel applications. 10.2172/1158537
Brandt, J.M., Devine, K., Gentile, A.C., & Foulk, J.W. (2014). Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping [Presentation]. 10.1109/CLUSTER.2014.6968670
Brandt, J.M., & Gentile, A.C. (2014). Lightweight Distributed Metric Service (LDMS) [Presentation]. https://www.osti.gov/biblio/1496673
Brandt, J.M., & Gentile, A.C. (2014). SNL-Monitoring-Overview_talk [Presentation]. https://www.osti.gov/biblio/1496544
Brandt, J.M., Devine, K., Gentile, A.C., & Foulk, J.W. (2014). Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping [Conference Poster]. 10.1109/CLUSTER.2014.6968670
Grant, R., Pedretti, K., & Gentile, A.C. (2014). Overtime: A Benchmark for Analyzing Performance Variation due to Network Interference [Conference]. 10.1145/2831129.2831133
Brandt, J.M., Gentile, A.C., & Allan, B.A. (2014). Large Scale System Monitoring and Analysis on Blue Waters using OVIS [Conference]. https://www.osti.gov/biblio/1142598
Agelastos, A.M., Allan, B.A., Brandt, J.M., Gentile, A.C., Monk, S.T., Ogden, J.B., Rajan, M., & Stevenson, J.O. (2014). The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications [Conference]. https://doi.org/10.1109/SC.2014.18
Agelastos, A.M., Allan, B.A., Brandt, J.M., Gentile, A.C., Monk, S.T., Ogden, J.B., Rajan, M., & Stevenson, J.O. (2014). Toward Rapid Understanding of Production HPC Applications and Systems [Conference]. 10.1109/CLUSTER.2015.71
Brandt, J.M., & Gentile, A.C. (2014). Large Scale HPC Monitoring [Presentation]. https://www.osti.gov/biblio/1706394
Brandt, J.M., Gentile, A.C., & Allan, B.A. (2014). Large Scale System Monitoring and Analysis on Blue Waters using OVIS (presentation) [Conference]. https://www.osti.gov/biblio/1143220
Agelastos, A.M., Allan, B.A., Brandt, J.M., Cassella, P., Enos, J., Fullop, J., Gentile, A.C., Monk, S.T., Naksinehaboon, N., Ogden, J.B., Rajan, M., Showerman, M., Stevenson, J.O., Taerat, N., & Tucker, T.O. (2014). The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications [Conference Poster]. International Conference for High Performance Computing, Networking, Storage and Analysis, SC. 10.1109/SC.2014.18
Brandt, J.M., & Gentile, A.C. (2013). Lightweight Distributed Metric Service (LDMS): Run-time Resource Utilization Monitoring [Conference]. https://www.osti.gov/biblio/1106397
Gentile, A.C. (2013). LANL_Monitoring_summit_7-23-2013 [Presentation]. https://www.osti.gov/biblio/1666201
Brandt, J.M., & Gentile, A.C. (2013). Copy of High Fidelity Data Collection and Transport Service Applied to the Cray XE6/XK6 (Overheads) [Conference]. https://www.osti.gov/biblio/1078604
Brandt, J.M., & Gentile, A.C. (2013). High Fidelity Data Collection and Transport Service Applied to the Cray XE6/XK6 (Paper) [Conference]. https://www.osti.gov/biblio/1078653
Gentile, A.C. (2013). OVIS Suite of Tools [Presentation]. https://www.osti.gov/biblio/1660615
Brandt, J.M., & Gentile, A.C. (2013). LDMS: Lightweight Distributed Metric Service for HPC Monitoring [Conference]. https://www.osti.gov/biblio/1063385
Brandt, J.M., & Gentile, A.C. (2012). SOS_High_Level_Documentation [Presentation]. https://www.osti.gov/biblio/1649758
Gentile, A.C., & Brandt, J.M. (2012). Lightweight Distributed Metric Service [Presentation]. https://www.osti.gov/biblio/1648269
Gentile, A.C., & Brandt, J.M. (2012). Copy of Lightweight Distributed Metric Service [Presentation]. https://www.osti.gov/biblio/1686350
Barrett, B., Kelly, S.M., Klundt, R.A., Laros, J.H., Leung, V.J., Levenhagen, M., Lofstead, G.F., Moreland, K.D., Oldfield, R., Pedretti, K., Rodrigues, A., Barrett, R.F., Ward, H.L., Vandyke, J.P., Vaughan, C.T., Wheeler, K.B., Brandt, J.M., Brightwell, R.B., Curry, M.L., … Hemmert, K.S. (2012). Demonstration of a Legacy Application's Path to Exascale - ASC L2 Milestone 4467 [Presentation]. https://www.osti.gov/biblio/1688616
Barrett, B., Kelly, S.M., Klundt, R.A., Laros, J.H., Leung, V.J., Levenhagen, M., Lofstead, G.F., Moreland, K.D., Oldfield, R., Pedretti, K.T.T., Rodrigues, A., Barrett, R.F., Thompson, D., Ward, H.L., Vandyke, J.P., Vaughan, C.T., Wheeler, K.B., Brandt, J.M., Brightwell, R.B., … Hemmert, K.S. (2012). Report of experiments and evidence for ASC L2 milestone 4467: demonstration of a legacy application's path to exascale. 10.2172/1039013
Thompson, D., Mayo, J.R., Brandt, J.M., Gentile, A.C., & Wong, M.H. (2012). Modeling Failures in Large-Scale Computer Systems [Presentation]. https://www.osti.gov/biblio/1658072
Brandt, J.M., Gentile, A.C., & Thompson, D. (2011). Develop feedback system for intelligent dynamic resource allocation to improve application performance. 10.2172/1029818
Taerat, N., Brandt, J., Gentile, A.C., Wong, M.H., & Leangsuksun, C. (2011). Baler: Deterministic, lossless log message clustering tool [Conference]. Computer Science - Research and Development. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=80051666459&origin=inward
Brandt, J.M., Chen, F.X., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2011). Framework for Enabling System Understanding [Conference]. https://www.osti.gov/biblio/1107192
Gentile, A.C., Brandt, J.M., & Wong, M.H. (2011). Cleansed Glory Dataset [Presentation]. https://www.osti.gov/biblio/1671903
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., & Wong, M.H. (2010). Scalable HPC monitoring and analysis for understanding and automated response [Conference]. https://www.osti.gov/biblio/1028465
Brandt, J.M., Gentile, A.C., Houf, C.A., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). OVIS 3.2 user's guide. 10.2172/1010855
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pébay, P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). Quantifying effectiveness of failure prediction and response in HPC systems: Methodology and example [Conference]. Proceedings of the International Conference on Dependable Systems and Networks. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=77956576732&origin=inward
Brandt, J.M., Gentile, A.C., Roe, D.C., Pebay, P.P., & Wong, M.H. (2010). Understanding large scale HPC systems through scalable monitoring and analysis [Conference]. https://www.osti.gov/biblio/1028363
Brandt, J.M., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). The OVIS analysis architecture. 10.2172/993631
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). Copy of Combining Virtualization Resource Characterization and Resource Management to Enable Efficient High Performance Compute Platforms Through Intelligent Dynamic Resource Allocation [Conference]. https://www.osti.gov/biblio/1123732
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). Copy of Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems [Conference]. https://www.osti.gov/biblio/1123734
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). Copy of Copy of Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems [Conference]. https://www.osti.gov/biblio/1123727
Brandt, J.M., Gentile, A.C., Mayo, J.R., Pebay, P.P., Wong, M.H., de Sapio, V., & Roe, D.C. (2010). Scalable modeling and analysis for resilience [Conference]. https://www.osti.gov/biblio/1017005
Brandt, J.M., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., & Wong, M.H. (2010). Are there observable precursors to HPC platform failures? [Conference]. https://www.osti.gov/biblio/1014644
Brandt, J.M., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., & Wong, M.H. (2010). Are there observable precursors to HPC platform resource failures? [Conference]. https://www.osti.gov/biblio/1014643
de Sapio, V., Brandt, J.M., Gentile, A.C., Kegelmeyer, W.P., Mayo, J.R., Pebay, P.P., Roe, D.C., & Wong, M.H. (2010). A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data [Conference]. https://www.osti.gov/biblio/1002065
Brandt, J.M., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). Scalable Information Fusion for Fault Tolerance in Large-Scale HPC [Conference]. https://www.osti.gov/biblio/1124333
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2010). Combining Virtualization Resource Characterization and Resource Management to Enable Efficient High Performance Compute Platforms Through Intelligent Dynamic Resource Allocation [Conference]. https://www.osti.gov/biblio/1124395
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems [Conference]. https://www.osti.gov/biblio/1141760
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Data Fusion and Statistical Analysis: Piercing the Darkness of the Black Box [Conference]. https://www.osti.gov/biblio/1141377
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Interactive Data Fusion Capabilities for Large-Scale Compute Cluster Architects and Administrators [Conference]. https://www.osti.gov/biblio/1141611
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Resource Health Characterizations for Interactive and Autonomous Proactive System Administration and Scheduling Decisions [Conference]. https://www.osti.gov/biblio/1141399
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Quantifying failure prediction in large scale HPC systems: A case study [Conference]. https://www.osti.gov/biblio/1141919
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Scalable Information Fusion for Fault Tolerance in Large-Scale HPC [Conference]. https://www.osti.gov/biblio/1142166
Brandt, J.M., Chen, F.X., de Sapio, V., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Quantifying Failure Prediction in Large Scale HPC Systems: A Case Study [Conference]. https://www.osti.gov/biblio/1141371
Brandt, J.M., Wong, M.H., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., & Thompson, D. (2009). Resource Monitoring and Management with OVIS to Enable HPC in Cloud Computing Environments [Conference]. https://www.osti.gov/biblio/1142202
Adalsteinsson, H., Brandt, J.M., Gentile, A.C., Debusschere, B., Mayo, J.R., Pebay, P.P., Thompson, D., & Wong, M.H. (2009). Combining System Characterization and Novel Execution Models to Achieve Scalable Robust Computing [Conference]. https://www.osti.gov/biblio/1142087
Brandt, J.M., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). Copy of Methodologies for Advance Warning of Compute Cluster Problems via Statistical Analysis: A Case Study (Conference Presentation) [Conference]. https://www.osti.gov/biblio/1142252
Brandt, J.M., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2009). OVIS 2.0 user's guide. 10.2172/1028957
Brandt, J.M., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., & Wong, M.H. (2009). Methodologies for advance warning of compute cluster problems via statistical analysis : a case study [Conference]. https://www.osti.gov/biblio/950658
Brandt, J.M., Gentile, A.C., Mayo, J.R., Pebay, P.P., Roe, D.C., Thompson, D., & Wong, M.H. (2008). Resource Monitoring and Management with OVIS to Enable HPC in Cloud Computing Environments [Conference]. https://www.osti.gov/biblio/1142365
Gentile, A.C., Kegelmeyer, W.P., & Ulmer, C. (2008). FCLib: The Feature Characterization Library. 10.2172/1130401
Adalsteinsson, H., Armstrong, R.C., Chiang, K., Gentile, A.C., Lloyd, L., Minnich, R.G., Vanderveen, K., Vanrandwyk, J., & Rudish, D.W. (2008). Using Emulation and Simulation to Understand the Large-scale Behavior of the Internet. 10.2172/1130403
Brandt, J.M., Gentile, A.C., Wong, M.H., Thompson, D., Pebay, P.P., Debusschere, B., & Mayo, J.R. (2008). OVIS-2: A Robust Distributed Architecture for Scalable RAS [Conference]. https://www.osti.gov/biblio/1145785
Wong, M.H., Thompson, D., Pebay, P.P., Mayo, J.R., Gentile, A.C., Debusschere, B., & Brandt, J.M. (2007). OVIS-2: A Robust Distributed Architecture for Scalable RAS [Conference]. https://www.osti.gov/biblio/1146338
Brandt, J.M., Gentile, A.C., Pebay, P.P., Thompson, D., Wong, M.H., Debusschere, B., & Mayo, J.R. (2007). Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems [Conference]. https://www.osti.gov/biblio/1146097
Brandt, J.M., Gentile, A.C., Pebay, P.P., Thompson, D., Wong, M.H., & Jolly, J. (2007). OVIS reliably monitors computers using novel parallel calculations [Presentation]. https://www.osti.gov/biblio/1721458
Pebay, P.P., Brandt, J.M., Gentile, A.C., & Wong, M.H. (2006). Monitoring computational clusters with OVIS. 10.2172/899078
Gentile, A.C., Wong, M.H., & Brandt, J.M. (2006). OVIS: A Tool for Intelligent Real-time Monitoring of Computational Clusters [Conference]. https://www.osti.gov/biblio/1264637
Gentile, A.C., Wong, M.H., & Brandt, J.M. (2006). OVIS: A Tool for Intelligent Real-time Monitoring of Computational Clusters [Conference]. https://www.osti.gov/biblio/1264638
Gentile, A.C., & Kegelmeyer, W.P. (2006). Extracting Information from Data: Ease Data Analysis Development with FCLib [Conference]. https://www.osti.gov/biblio/1142260
Gentile, A.C., Marzouk, Y.M., & Pebay, P.P. (2005). Meaningful statistical analysis of large computational clusters. 10.2172/958384
Brandt, J.M., Marzouk, Y.M., Pebay, P.P., & Gentile, A.C. (2005). Meaningful statistical analysis of large computational clusters [Conference]. https://www.osti.gov/biblio/876355