Scott Larson Nicoll Levy
Scalable System Software

(505) 844-7292
Sandia National Laboratories, New Mexico
P.O. Box 5800
Albuquerque, NM 87185-1319
Biography
I am a Senior Member of Technical Staff in the Scalable System Software department of the Center for Computing Research (CCR). I research system software for next-generation extreme-scale systems. Specifically, I study the impact of system failures, and other sources of performance interference, on the execution of scientific simulations. I am also investigating application performance in power-constrained environments. I earned my Ph.D. from the University of New Mexico, where I worked with Prof. Patrick Bridges in the Scalable Systems Lab. At Sandia, I work with Kurt Ferreira, Patrick Widener and the 9lives research group on improving the resilience and fault tolerance of large-scale parallel systems.
Education
- Ph.D., Computer Science, University of New Mexico
- B.S., Electrical Engineering, Cornell University
Publications
Stephen Lecler Olivier, Ronald B. Brightwell, Matthew Dosanjh, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Andrew J Younge, (2022). SNL ATDM Software Ecosystem Then and Now: Operating Systems and On-Node Runtime 2023 Exascale Computing Project Annual Meeting Document ID: 1677926
Kurt Brian Ferreira, Scott Larson Nicoll Levy, Joshua David Hemmert, Kevin Pedretti, (2022). Understanding Memory Failures on a Petascale Arm System The 31st International Symposium on High-Performance Parallel and Distributed Computing Document ID: 1527788
Stephen Lecler Olivier, Ronald B. Brightwell, Matthew Dosanjh, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Andrew J Younge, (2022). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime 2022 Exascale Computing Project Annual Meeting (Virtual) Document ID: 1505231
Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2022). Characterizing Failures in HPC Using Benford's Law The SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP22) Document ID: 1471261
Sara Karamati, Clayton Hughes, Karl Scott Hemmert, Ryan E. Grant, William Whitney Schonbein, Scott Larson Nicoll Levy, Thomas M. Conte, Jeffrey Young, Richard W. Vuduc, (2022). "Smarter" NICs for Faster Molecular Dynamics: A Case Study 36th IEEE International Parallel & Distributed Processing Symposium Document ID: 1470639
Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Characterizing Per-node Memory Failures Using Benford's Law FTXS 2021 Workshop on Fault Tolerance for HPC at eXtreme Scale held in conjunction with SC21 Document ID: 1381184
Keira Haskins, Patrick Bridges, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications ExaMPI21 Workshop on Exascale MPI Document ID: 1370401
Keira Haskins, Patrick Bridges, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications ExaMPI21 Workshop on Exascale MPI Document ID: 1380992
Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Characterizing Memory Failures Using Benford's Law 14th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids Document ID: 1357464
Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Characterizing Per-node Memory Failures Using Benford's Law Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS 2021) Document ID: 1356401
Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Evaluating MPI Resource Usage Summary Statistics Journal of Parallel Computing https://www.osti.gov/search/identifier:1822241 Document ID: 1344897
Kurt Brian Ferreira, Scott Larson Nicoll Levy, Victor G. Kuhns, Nathan DeBardeleben, Sean Blanchard, (2021). Understanding the Effects of DRAM Correctable Error Logging at Scale IEEE Cluster Conference Document ID: 1343103
William Pepper Marts, Matthew Dosanjh, William Whitney Schonbein, Scott Larson Nicoll Levy, Ryan Eric Grant, Patrick Bridges, (2021). MiniMod: A Modular Miniapplication Benchmarking Framework for HPC IEEE Cluster 2021 Document ID: 1331961
Luke Logan, Gerald Fredrick Lofstead, Scott Larson Nicoll Levy, Patrick Widener, Xian-He Sun, Anthony Kougkas, (2021). pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory REX-IO at IEEE Cluster 2021 Document ID: 1331395
Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2021). An Initial Examination of the Effect of Container Resource Constraints on Application Perturbation Workshop on Resource Arbitration for Dynamic Runtimes (RADR) https://www.osti.gov/search/identifier:1869756 Document ID: 1307404
Stephen Lecler Olivier, Ronald B. Brightwell, Kurt Brian Ferreira, Ryan Eric Grant, Scott Larson Nicoll Levy, Kevin Pedretti, Andrew J Younge, (2021). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime 2021 Exascale Computing Project Annual Meeting (Virtual) https://www.osti.gov/search/identifier:1861479 Document ID: 1293055
Ryan Eric Grant, Scott Larson Nicoll Levy, William Whitney Schonbein, (2021). Co-design of System Software for Compute Accelerators and SmartNICs ASCR Workshop on Reimagining Codesign https://www.osti.gov/search/identifier:1847622 Document ID: 1269039
Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2020). Examining the Impact of Approximate Coordination on Checkpoint/Restart https://ckpt-symposium.lbl.gov/home Document ID: 1254795
William Whitney Schonbein, Ryan Eric Grant, Scott Larson Nicoll Levy, Matthew Dosanjh, William Pepper Marts, (2020). Low-cost MPI Multithreaded Message Matching Benchmarking International Conference on High Performance Computing and Communication (HPCC) Document ID: 1243267
William Whitney Schonbein, Scott Larson Nicoll Levy, William Pepper Marts, Matthew Dosanjh, Ryan Eric Grant, (2020). Low-cost MPI Multithreaded Message Matching Benchmarking International Conference on High Performance Computing and Communications (HPCC) https://www.osti.gov/search/identifier:1830913 Document ID: 1231990
Ryan Eric Grant, William Whitney Schonbein, Scott Larson Nicoll Levy, (2020). RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs International Conference for High Performance Computing, Networking, Storage and Analysis (SC) https://www.osti.gov/search/identifier:1825980 Document ID: 1209354
Ryan Eric Grant, Whit Schonbein, Scott Larson Nicoll Levy, (2020). RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs Fourth Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware https://www.osti.gov/search/identifier:1825981 Document ID: 1209357
Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2020). Evaluating MPI Message Size Summary Statistics EuroMPI/USA ’20 https://www.osti.gov/search/identifier:1825984 Document ID: 1209370
Gary J. Templet Jr., Matthew R. Glickman, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Jeff Mauldin, Thomas Jay Otahal, Craig D. Ulmer, Patrick Widener, Ron A. Oldfield, (2020). FY20 CSSE L2 Milestone 7186 Completion of L2 Milestone 7186 https://www.osti.gov/search/identifier:1820290 Document ID: 1196144
Gary J. Templet Jr., Matthew R. Glickman, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Jeff Mauldin, Thomas Jay Otahal, Craig D. Ulmer, Patrick Widener, Ron A. Oldfield, (2020). Data Services for Visualization and Analysis – ASC Level II Milestone (7186) https://www.osti.gov/search/identifier:1663267 Document ID: 1196150
Ronald B. Brightwell, Kurt Brian Ferreira, Ryan Eric Grant, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Stephen Lecler Olivier, Kevin Pedretti, Andrew J Younge, Ann C. Gentile, Bradley Keith Brandt, (2020). ALAMO: Autonomous Lightweight Allocation, Management and Optimization Smoky Mountains Computational Sciences and Engineering Conference https://www.osti.gov/search/identifier:1818044 Document ID: 1195366
Scott Larson Nicoll Levy, Patrick Widener, Craig D. Ulmer, Todd Henry Kordenbrock, (2020). The Case for Explicit Reuse Semantics for RDMA Communication Workshop on Scalable Networks for Advanced Computing Systems (SNACS) https://www.osti.gov/search/identifier:1771921 Document ID: 1104627
Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2019). Evaluating Tradeoffs Between MPI Message Matching Offload Hardware Capacity and Performance EuroMPI’19 26th European MPI Users’ Group Meeting https://www.osti.gov/search/identifier:1641378 Document ID: 996487
Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2019). Space-Efficient Reed-Solomon Encoding to Detect and Correct Pointer Corruption International European Conference on Parallel and Distributed Computing https://www.osti.gov/search/identifier:1641289 Document ID: 985494
Matthew Dosanjh, Ryan Eric Grant, Nathan (LANL) Hjelm, Scott Larson Nicoll Levy, William Whitney Schonbein, (2019). The Upcoming Storm: The Implications of Increasing Core Count on Scalable System Software https://www.osti.gov/search/identifier:1762669 Document ID: 984485
Patrick Widener, Craig D. Ulmer, Scott Larson Nicoll Levy, Todd Henry Kordenbrock, Gary J. Templet, (2019). Mediating data center storage diversity in HPC applications with FAODEL HPC I/O in the Data Center Workshop (HPC-IODC) https://www.osti.gov/search/identifier:1640775 Document ID: 973832
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Whit Schonbein, Ryan Eric Grant, Matthew Dosanjh, (2019). Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance Parallel Computing: Systems & Applications https://www.osti.gov/search/identifier:1502976 Document ID: 937350
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Taniya Siddiqua, Nathan DeBardeleben, Vilas Sridharan, Elisabeth Baseman, (2019). Lessons learned from memory errors observed over the lifetime of Cielo SIAM Conference on Computational Science and Engineering (CSE19) https://www.osti.gov/search/identifier:1639464 Document ID: 935561
Kurt Brian Ferreira, Ryan Eric Grant, Michael J. Levenhagen, Scott Larson Nicoll Levy, Taylor Groves, (2019). Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design Concurrency and Computation: Practice and Experience https://www.osti.gov/search/identifier:1501630 Document ID: 913436
Stephen Lecler Olivier, Ronald B. Brightwell, Kevin Pedretti, Andrew J Younge, Noah Evans, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Ryan Eric Grant, (2019). SNL ATDM Software Ecosystem 2019 Exascale Computing Project Annual Meeting https://www.osti.gov/search/identifier:1583026 Document ID: 902074
Matthew Tyler Bettencourt, Richard Michael Jack Kramer, Keith Cartwright, Edward Geoffrey Phillips, Curtis C. Ober, Roger P. Pawlowski, Matthew Scot Swan, Irina Kalashnikova Tezaur, Eric T. Phipps, Sidafa Conde, Eric Christopher Cyr, Craig D. Ulmer, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gary J. Templet, Jonathan J. Hu, Paul Lin, Christian Alexander Glusa, Christopher Siefert, Micheal W. Glass, (2018). ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE https://www.osti.gov/search/identifier:1493832 Document ID: 854521
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Nathan DeBardeleben, Taniya Siddiqua, Vilas Sridharan, Elisabeth Baseman, (2018). Lessons Learned from Errors Observed over the Lifetime of Cielo SC18 https://www.osti.gov/search/identifier:1582542 Document ID: 853852
Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2018). Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance EuroMPI 2018 https://www.osti.gov/search/identifier:1569677 Document ID: 830451
Scott Larson Nicoll Levy, Kevin Pedretti, Kurt Brian Ferreira, (2018). Open Science on Trinity’s Knights Landing Partition: An Analysis of User Job Data The 14th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS 2018) https://www.osti.gov/search/identifier:1529450 Document ID: 809168
Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Ryan Eric Grant, (2018). Characterizing MPI Matching via Trace-based Simulation Parallel Computing https://www.osti.gov/search/identifier:1457519 Document ID: 809042
Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Ryan Eric Grant, (2018). Characterizing MPI Matching via Trace-based Simulation Parallel Computing https://www.osti.gov/search/identifier:1444084 Document ID: 807378
Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Patrick Widener, Margaret Rose Lawson, (2018). Faodel: Data Management for Next-Generation Application Workflows ScienceCloud '18: 9th Workshop on Scientific Cloud Computing https://www.osti.gov/search/identifier:1514784 Document ID: 797030
Craig D. Ulmer, Todd Henry Kordenbrock, Margaret Rose Lawson, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Shyamali Mukherjee, Gregory D. Sjaardema, Gary J. Templet, Harry Lee Ward, Patrick Widener, (2018). SNL ATDM: I/O and Data Management ECP Annual Meeting https://www.osti.gov/search/identifier:1806512 Document ID: 760274
Margaret Rose Lawson, Gerald Fredrick Lofstead, Scott Larson Nicoll Levy, Patrick Widener, Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Todd Henry Kordenbrock, (2017). EMPRESS: Extensible Metadata PRovider for Extreme-scale Scientific Simulations 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems https://www.osti.gov/search/identifier:1513597 Document ID: 727046
Kurt Brian Ferreira, Ryan Eric Grant, Michael J. Levenhagen, Scott Larson Nicoll Levy, Taylor Groves, (2017). Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design ExaMPI2017 – Workshop on Exascale MPI 2017 https://www.osti.gov/search/identifier:1511803 Document ID: 726260
Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Patrick Widener, Todd Henry Kordenbrock, Margaret Rose Lawson, (2017). Faodail: Enabling In Situ Analytics for Next-Generation Systems ISAV 2017: In Situ Infrastructures for Enabling Extreme-scale Analysis and Visualization https://www.osti.gov/search/identifier:1482474 Document ID: 726225
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, (2017). The Unexpected Virtue of Almost: Exploiting MPI Collective Operations to Approximately Coordinate Checkpoints ExaMPI2017 – Workshop on Exascale MPI 2017 https://www.osti.gov/search/identifier:1482473 Document ID: 726227
Margaret Rose Lawson, Gerald Fredrick Lofstead, Scott Larson Nicoll Levy, Patrick Widener, Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Todd Henry Kordenbrock, (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations Parallel Data Storage Workshop-Data Intensive Scalable Computing Workshop (PDSW-DISCS'17) https://www.osti.gov/search/identifier:1481718 Document ID: 725717
Rebecca Kreitinger, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis SC17 The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1478158 Document ID: 703829
Rebecca Kreitinger, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis SC17 The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1573776 Document ID: 703831
Craig D. Ulmer, Ron A. Oldfield, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Shyamali Mukherjee, Gary J. Templet, Patrick Widener, (2017). ATDM Data Warehouse: Data Management Services for Exascale Computing Sandia CIS ERB https://www.osti.gov/search/identifier:1466487 Document ID: 670434
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2017). Evaluating the Viability of Using Compression to Mitigate Silent Corruption of Read-Mostly Application Data 2017 IEEE International Conference on Cluster Computing (CLUSTER) https://www.osti.gov/search/identifier:1463961 Document ID: 659342
Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Ryan Eric Grant, (2017). Characterizing MPI Matching via Trace-based Simulation EuroMPI/USA 2017 https://www.osti.gov/search/identifier:1462518 Document ID: 638253
Taniya (AMD) Siddiqua, Vilas (AMD) Sridharan, Steven E. (AMD) Raasch, Nathan (LANL) DeBardeleben, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Elisabeth (LANL) Baseman, Qiang (LANL) Guan, (2017). Lifetime Memory Reliability Data from the Field IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems https://www.osti.gov/search/identifier:1506882 Document ID: 637678
Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2017). It’s not the heat, it’s the humidity: scheduling resilience activity at scale 23rd International European Conference On Parallel And Distributed Computing https://www.osti.gov/search/identifier:1367189 Document ID: 624407
Craig D. Ulmer, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Shyamali Mukherjee, Gregory D. Sjaardema, Gary J. Templet, Patrick Widener, Ron A. Oldfield, (2017). ATDM Data Warehouse Exascale Computing Project Annual Meeting https://www.osti.gov/search/identifier:1427407 Document ID: 577888
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2016). Improving Application Resilience to Memory Errors with Lightweight Compression The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1410251 Document ID: 554663
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick G Bridges, Oscar H. Mondragon, (2016). How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms The Message Passing Interface (MPI) Users and Developers Conference https://www.osti.gov/search/identifier:1394099 Document ID: 530057
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2016). Improving Application Resilience to Memory Errors with Lightweight Compression The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) https://www.osti.gov/search/identifier:1372148 Document ID: 476343
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick G Bridges, Oscar H. Mondragon, (2016). Understanding Performance Interference in Next-Generation HPC Systems The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1372149 Document ID: 476344
Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2016). An Examination of the Impact of Failure Distribution on Coordinated Checkpoint/Restart Fault Tolerance for HPC at Extreme Scale (FTXS) Workshop https://www.osti.gov/search/identifier:1368866 Document ID: 463972
Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2016). Horseshoes and Hand Grenades: The Case for Approximate Coordination in Local Checkpointing Protocols 9th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids @ EuroPar 2016 https://www.osti.gov/search/identifier:1368839 Document ID: 463931
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick (UNM) Bridges, Oscar (UNM) Mondragon, (2016). How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging latent synchronization in MPI collective algorithms EuroMPI 2016: The Message Passing Interface (MPI) Users and Developers Conference https://www.osti.gov/search/identifier:1364728 Document ID: 453704
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick G Bridges, Oscar Mondragon, (2016). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale Meeting to discuss fault tolerance research with Los Alamos Nat’l Lab staff https://www.osti.gov/search/identifier:1428024 Document ID: 443447
Elisabeth Baseman, Nathan DeBardeleben, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Steven Raasch, Vilas Sridharan, Taniya Siddiqua, Qiang Guan, (2016). Improving DRAM Fault Characterization Through Machine Learning IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) https://www.osti.gov/search/identifier:1365234 Document ID: 432113
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2016). Similarity Engine: Using Content Similarity to Improve Memory Resilience ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC) https://www.osti.gov/search/identifier:1239385 Document ID: 387058
Scott Larson Nicoll Levy, (2015). Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems https://www.osti.gov/search/identifier:1226922 Document ID: 354819
Oscar H. Mondragon, Patrick G Bridges, Kurt Brian Ferreira, Patrick Widener, Scott Larson Nicoll Levy, (2015). Scheduling In-Situ Analytics in Next-generation Applications 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing https://www.osti.gov/search/identifier:1333466 Document ID: 354815
Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Nathan D. Fabian, (2015). Canaries in a Coal Mine: Using Application-level Checkpoints to Detect Memory Failures 8th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids in connection with Euro-Par 2015 https://www.osti.gov/search/identifier:1256569 Document ID: 286581
Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2015). Similarity Engine: Using Content Similarity to Improve Memory Resilience International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1530987 Document ID: 264517
Patrick Widener, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Torsten Hoefler, (2014). On noise and the performance benefit of nonblocking collectives International Journal of High Performance Computing Applications https://www.osti.gov/search/identifier:1257977 Document ID: 208093
Kurt Brian Ferreira, Scott Larson Nicoll Levy, Patrick Widener, Dorian Arnold, (2014). Using Machine Learning to Optimize Uncoordinated Checkpointing Performance ASCR Machine Learning Workshop https://www.osti.gov/search/identifier:1319751 Document ID: 187255
Kirk A. Rackow, Scott Larson Nicoll Levy, (2014). Exploring the effect of noise on the performance benefit of non-blocking MPI_Allreduce EuroMPI/Asia 2014 https://www.osti.gov/search/identifier:1145671 Document ID: 5336447
Kirk A. Rackow, Scott Larson Nicoll Levy, Patrick Widener, Dorian Arnold, Torsten Hoefler, (2014). Understanding the Effects of Communication and Coordination on Checkpointing at Scale International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1142798 Document ID: 5335351
Kirk A. Rackow, Patrick G. Bridges, Scott Larson Nicoll Levy, (2014). Characterizing the Impact of Rollback Avoidance at Extreme-Scale: A Modeling Approach The 43rd International Conference on Parallel Processing (ICPP-2014) https://www.osti.gov/search/identifier:1141101 Document ID: 5333630
Kirk A. Rackow, Patrick Widener, Scott Larson Nicoll Levy, Dorian Arnold, Torsten Hoefler, (2014). Understanding the Effects of Communication on Uncoordinated Checkpointing at Scale ACM International Conference on Supercomputing (ICS 2014) https://www.osti.gov/search/identifier:1140761 Document ID: 5331977
Kirk A. Rackow, Scott Larson Nicoll Levy, Bryan Topp, Dorian Arnold, Torsten Hoefler, (2013). Predicting Coordinated and Uncoordinated Checkpoint/Restart Protocol Performance at Extreme Scales 28th IEEE International Parallel & Distributed Processing Symposium https://www.osti.gov/search/identifier:1115087 Document ID: 5329457
Kirk A. Rackow, Scott Larson Nicoll Levy, Ronald B. Brightwell, Dorian Arnold, Patrick Bridges, (2013). A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration DOE/ASCR A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration https://www.osti.gov/search/identifier:1111081 Document ID: 5324016
Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Ronald B. Brightwell, Patrick G. Bridges, Dorian Arnold, (2013). Asking the right questions: benchmarking fault-tolerant extreme-scale systems Workshop on Resiliency in High-Performance Computing https://www.osti.gov/search/identifier:1083655 Document ID: 5323574
Kirk A. Rackow, Scott Larson Nicoll Levy, Patrick G. Bridges, (2013). A Simulation Infrastructure for Examining the Performance of Resilience Strategies at Scale https://www.osti.gov/search/identifier:1088091 Document ID: 5321449
Kirk A. Rackow, Scott Larson Nicoll Levy, Patrick Bridges, (2013). A Simulation Infrastructure for Examining the Performance of Resilience Strategies at Scale EuroMPI 2013 — The 20th European MPI Users’ Group Meeting https://www.osti.gov/search/identifier:1078709 Document ID: 5321390
Kirk A. Rackow, Kevin Pedretti, Dorian Arnold, Scott Larson Nicoll Levy, Patrick Bridges, (2013). Protect Yourself: Why Your OS Must Protect Against DRAM Failures International Conference on Architectural Support for Programming Languages and OSs https://www.osti.gov/search/identifier:1062878 Document ID: 5318212
Kirk A. Rackow, Aidan P. Thompson, Christian Robert Trott, Patrick G. Bridges, Scott Larson Nicoll Levy, (2013). An Examination of Content Similarity within the Memory of HPC Applications https://www.osti.gov/search/identifier:1088105 Document ID: 5317155