Scott Larson Nicoll Levy

Scalable System Software

sllevy@sandia.gov

(505) 844-7292

Sandia National Laboratories, New Mexico
P.O. Box 5800
Albuquerque, NM 87185-1319

Biography

I am a Senior Member of Technical Staff in the Scalable System Software department of the Center for Computing Research (CCR). I research system software for next-generation extreme-scale systems. Specifically, I study the impact of system failures, and other sources of performance interference, on the execution of scientific simulations. I am also investigating application performance in power-constrained environments. I earned my Ph.D. from the University of New Mexico, where I worked with Prof. Patrick Bridges in the Scalable Systems Lab. At Sandia, I work with Kurt Ferreira, Patrick Widener and the 9lives research group on improving the resilience and fault tolerance of large-scale parallel systems.

Education

  • Ph.D., Computer Science, University of New Mexico
  • B.S., Electrical Engineering, Cornell University

Publications

Kurt Brian Ferreira, Scott Larson Nicoll Levy, Joshua David Hemmert, Kevin Pedretti, (2022). Understanding Memory Failures on a Petascale Arm System The 31st International Symposium on High-Performance Parallel and Distributed Computing Document ID: 1527788

Stephen Lecler Olivier, Ronald B. Brightwell, Matthew Dosanjh, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Andrew J Younge, (2022). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime 2022 Exascale Computing Project Annual Meeting (Virtual) Document ID: 1505231

Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2022). Characterizing Failures in HPC Using Benford’s Law The SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP22) Document ID: 1471261

Sara Karamati, Clayton Hughes, Karl Scott Hemmert, Ryan E. Grant, William Whitney Schonbein, Scott Larson Nicoll Levy, Thomas M. Conte, Jeffrey Young, Richard W. Vuduc, (2022). "Smarter" NICs for Faster Molecular Dynamics: A Case Study 36th IEEE International Parallel & Distributed Processing Symposium Document ID: 1470639

Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Characterizing Per-node Memory Failures Using Benford’s Law FTXS 2021 Workshop on Fault Tolerance for HPC at eXtreme Scale held in conjunction with SC21 Document ID: 1381184

Keira Haskins, Patrick Bridges, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications ExaMPI21 Workshop on Exascale MPI Document ID: 1370401

Keira Haskins, Patrick Bridges, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications ExaMPI21 Workshop on Exascale MPI Document ID: 1380992

Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Characterizing Memory Failures Using Benford’s Law 14th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids Document ID: 1357464

Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Characterizing Per-node Memory Failures Using Benford’s Law Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS 2021) Document ID: 1356401

Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2021). Evaluating MPI Resource Usage Summary Statistics Journal of Parallel Computing https://www.osti.gov/search/identifier:1822241 Document ID: 1344897

Kurt Brian Ferreira, Scott Larson Nicoll Levy, Victor G. Kuhns, Nathan DeBardeleben, Sean Blanchard, (2021). Understanding the Effects of DRAM Correctable Error Logging at Scale IEEE Cluster Conference Document ID: 1343103

William Pepper Marts, Matthew Dosanjh, William Whitney Schonbein, Scott Larson Nicoll Levy, Ryan Eric Grant, Patrick Bridges, (2021). MiniMod: A Modular Miniapplication Benchmarking Framework for HPC IEEE Cluster 2021 Document ID: 1331961

Luke Logan, Gerald Fredrick Lofstead, Scott Larson Nicoll Levy, Patrick Widener, Xian-He Sun, Anthony Kougkas, (2021). pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory REX-IO at IEEE Cluster 2021 Document ID: 1331395

Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2021). An Initial Examination of the Effect of Container Resource Constraints on Application Perturbation Workshop on Resource Arbitration for Dynamic Runtimes (RADR) https://www.osti.gov/search/identifier:1869756 Document ID: 1307404

Stephen Lecler Olivier, Ronald B. Brightwell, Kurt Brian Ferreira, Ryan Eric Grant, Scott Larson Nicoll Levy, Kevin Pedretti, Andrew J Younge, (2021). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime 2021 Exascale Computing Project Annual Meeting (Virtual) https://www.osti.gov/search/identifier:1861479 Document ID: 1293055

Ryan Eric Grant, Scott Larson Nicoll Levy, William Whitney Schonbein, (2021). Co-design of System Software for Compute Accelerators and SmartNICs ASCR Workshop on Reimagining Codesign https://www.osti.gov/search/identifier:1847622 Document ID: 1269039

Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2020). Examining the Impact of Approximate Coordination on Checkpoint/Restart https://ckpt-symposium.lbl.gov/home Document ID: 1254795

William Whitney Schonbein, Ryan Eric Grant, Scott Larson Nicoll Levy, Matthew Dosanjh, William Pepper Marts, (2020). Low-cost MPI Multithreaded Message Matching Benchmarking International Conference on High Performance Computing and Communication (HPCC) Document ID: 1243267

William Whitney Schonbein, Scott Larson Nicoll Levy, William Pepper Marts, Matthew Dosanjh, Ryan Eric Grant, (2020). Low-cost MPI Multithreaded Message Matching Benchmarking International Conference on High Performance Computing and Communications (HPCC) https://www.osti.gov/search/identifier:1830913 Document ID: 1231990

Ryan Eric Grant, William Whitney Schonbein, Scott Larson Nicoll Levy, (2020). RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs International Conference for High Performance Computing, Networking, Storage and Analysis (SC) https://www.osti.gov/search/identifier:1825980 Document ID: 1209354

Ryan Eric Grant, Whit Schonbein, Scott Larson Nicoll Levy, (2020). RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs Fourth Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware https://www.osti.gov/search/identifier:1825981 Document ID: 1209357

Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2020). Evaluating MPI Message Size Summary Statistics EuroMPI/USA ’20 https://www.osti.gov/search/identifier:1825984 Document ID: 1209370

Gary J. Templet Jr., Matthew R. Glickman, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Jeff Mauldin, Thomas Jay Otahal, Craig D. Ulmer, Patrick Widener, Ron A. Oldfield, (2020). FY20 CSSE L2 Milestone 7186 Completion of L2 Milestone 7186 https://www.osti.gov/search/identifier:1820290 Document ID: 1196144

Gary J. Templet Jr., Matthew R. Glickman, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Jeff Mauldin, Thomas Jay Otahal, Craig D. Ulmer, Patrick Widener, Ron A. Oldfield, (2020). Data Services for Visualization and Analysis - ASC Level II Milestone (7186) https://www.osti.gov/search/identifier:1663267 Document ID: 1196150

Ronald B. Brightwell, Kurt Brian Ferreira, Ryan Eric Grant, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Stephen Lecler Olivier, Kevin Pedretti, Andrew J Younge, Ann C. Gentile, Bradley Keith Brandt, (2020). ALAMO: Autonomous Lightweight Allocation, Management and Optimization Smoky Mountains Computational Sciences and Engineering Conference https://www.osti.gov/search/identifier:1818044 Document ID: 1195366

Scott Larson Nicoll Levy, Patrick Widener, Craig D. Ulmer, Todd Henry Kordenbrock, (2020). The Case for Explicit Reuse Semantics for RDMA Communication Workshop on Scalable Networks for Advanced Computing Systems (SNACS) https://www.osti.gov/search/identifier:1771921 Document ID: 1104627

Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2019). Evaluating Tradeoffs Between MPI Message Matching Offload Hardware Capacity and Performance EuroMPI’19 26th European MPI Users’ Group Meeting https://www.osti.gov/search/identifier:1641378 Document ID: 996487

Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2019). Space-Efficient Reed-Solomon Encoding to Detect and Correct Pointer Corruption International European Conference on Parallel and Distributed Computing https://www.osti.gov/search/identifier:1641289 Document ID: 985494

Matthew Dosanjh, Ryan Eric Grant, Nathan (LANL) Hjelm, Scott Larson Nicoll Levy, William Whitney Schonbein, (2019). The Upcoming Storm: The Implications of Increasing Core Count on Scalable System Software https://www.osti.gov/search/identifier:1762669 Document ID: 984485

Patrick Widener, Craig D. Ulmer, Scott Larson Nicoll Levy, Todd Henry Kordenbrock, Gary J. Templet, (2019). Mediating data center storage diversity in HPC applications with FAODEL HPC I/O in the Data Center Workshop (HPC-IODC) https://www.osti.gov/search/identifier:1640775 Document ID: 973832

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Whit Schonbein, Ryan Eric Grant, Matthew Dosanjh, (2019). Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance Parallel Computing: Systems & Applications https://www.osti.gov/search/identifier:1502976 Document ID: 937350

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Taniya Siddiqua, Nathan DeBardeleben, Vilas Sridharan, Elisabeth Baseman, (2019). Lessons learned from memory errors observed over the lifetime of Cielo SIAM Conference on Computational Science and Engineering (CSE19) https://www.osti.gov/search/identifier:1639464 Document ID: 935561

Kurt Brian Ferreira, Ryan Eric Grant, Michael J. Levenhagen, Scott Larson Nicoll Levy, Taylor Groves, (2019). Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design Concurrency and Computation: Practice and Experience https://www.osti.gov/search/identifier:1501630 Document ID: 913436

Stephen Lecler Olivier, Ronald B. Brightwell, Kevin Pedretti, Andrew J Younge, Noah Evans, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Ryan Eric Grant, (2019). SNL ATDM Software Ecosystem 2019 Exascale Computing Project Annual Meeting https://www.osti.gov/search/identifier:1583026 Document ID: 902074

Matthew Tyler Bettencourt, Richard Michael Jack Kramer, Keith Cartwright, Edward Geoffrey Phillips, Curtis C. Ober, Roger P. Pawlowski, Matthew Scot Swan, Irina Kalashnikova Tezaur, Eric T. Phipps, Sidafa Conde, Eric Christopher Cyr, Craig D. Ulmer, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gary J. Templet, Jonathan J. Hu, Paul Lin, Christian Alexander Glusa, Christopher Siefert, Micheal W. Glass, (2018). ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE https://www.osti.gov/search/identifier:1493832 Document ID: 854521

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Nathan Debardeleben, Taniya Siddiqua, Vilas Sridharan, Elisabeth Baseman, (2018). Lessons Learned from Errors Observed over the Lifetime of Cielo SC18 https://www.osti.gov/search/identifier:1582542 Document ID: 853852

Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2018). Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance EuroMPI 2018 https://www.osti.gov/search/identifier:1569677 Document ID: 830451

Scott Larson Nicoll Levy, Kevin Pedretti, Kurt Brian Ferreira, (2018). Open Science on Trinity’s Knights Landing Partition: An Analysis of User Job Data The 14th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS 2018) https://www.osti.gov/search/identifier:1529450 Document ID: 809168

Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Ryan Eric Grant, (2018). Characterizing MPI Matching via Trace-based Simulation Parallel Computing https://www.osti.gov/search/identifier:1457519 Document ID: 809042

Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Ryan Eric Grant, (2018). Characterizing MPI Matching via Trace-based Simulation Parallel Computing https://www.osti.gov/search/identifier:1444084 Document ID: 807378

Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Patrick Widener, Margaret Rose Lawson, (2018). Faodel: Data Management for Next-Generation Application Workflows ScienceCloud ’18: 9th Workshop on Scientific Cloud Computing https://www.osti.gov/search/identifier:1514784 Document ID: 797030

Craig D. Ulmer, Todd Henry Kordenbrock, Margaret Rose Lawson, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Shyamali Mukherjee, Gregory D. Sjaardema, Gary J. Templet, Harry Lee Ward, Patrick Widener, (2018). SNL ATDM: I/O and Data Management ECP Annual Meeting https://www.osti.gov/search/identifier:1806512 Document ID: 760274

Margaret Rose Lawson, Gerald Fredrick Lofstead, Scott Larson Nicoll Levy, Patrick Widener, Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Todd Henry Kordenbrock, (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems https://www.osti.gov/search/identifier:1513597 Document ID: 727046

Kurt Brian Ferreira, Ryan Eric Grant, Michael J. Levenhagen, Scott Larson Nicoll Levy, Taylor Groves, (2017). Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design ExaMPI2017 – Workshop on Exascale MPI 2017 https://www.osti.gov/search/identifier:1511803 Document ID: 726260

Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Patrick Widener, Todd Henry Kordenbrock, Margaret Rose Lawson, (2017). Faodail: Enabling In Situ Analytics for Next-Generation Systems ISAV 2017: In Situ Infrastructures for Enabling Extreme-scale Analysis and Visualization https://www.osti.gov/search/identifier:1482474 Document ID: 726225

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, (2017). The Unexpected Virtue of Almost: Exploiting MPI Collective Operations to Approximately Coordinate Checkpoints ExaMPI2017 – Workshop on Exascale MPI 2017 https://www.osti.gov/search/identifier:1482473 Document ID: 726227

Margaret Rose Lawson, Gerald Fredrick Lofstead, Scott Larson Nicoll Levy, Patrick Widener, Craig D. Ulmer, Shyamali Mukherjee, Gary J. Templet, Todd Henry Kordenbrock, (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations Parallel Data Storage Workshop-Data Intensive Scalable Computing Workshop (PDSW-DISCS’17) https://www.osti.gov/search/identifier:1481718 Document ID: 725717

Rebecca Kreitinger, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis SC17 The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1478158 Document ID: 703829

Rebecca Kreitinger, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis SC17 The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1573776 Document ID: 703831

Craig D. Ulmer, Ron A. Oldfield, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Shyamali Mukherjee, Gary J. Templet, Patrick Widener, (2017). ATDM Data Warehouse: Data Management Services for Exascale Computing Sandia CIS ERB https://www.osti.gov/search/identifier:1466487 Document ID: 670434

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2017). Evaluating the Viability of Using Compression to Mitigate Silent Corruption of Read-Mostly Application Data 2017 IEEE International Conference on Cluster Computing (CLUSTER) https://www.osti.gov/search/identifier:1463961 Document ID: 659342

Kurt Brian Ferreira, Scott Larson Nicoll Levy, Kevin Pedretti, Ryan Eric Grant, (2017). Characterizing MPI Matching via Trace-based Simulation EuroMPI/USA 2017 https://www.osti.gov/search/identifier:1462518 Document ID: 638253

Taniya (AMD) Siddiqua, Vilas (AMD) Sridharan, Steven E. (AMD) Raasch, Nathan (LANL) DeBardeleben, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Elisabeth (LANL) Baseman, Guan Qiang (LANL), (2017). Lifetime Memory Reliability Data from the Field IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems https://www.osti.gov/search/identifier:1506882 Document ID: 637678

Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2017). It’s not the heat, it’s the humidity: scheduling resilience activity at scale 23rd International European Conference On Parallel And Distributed Computing https://www.osti.gov/search/identifier:1367189 Document ID: 624407

Craig D. Ulmer, Todd Henry Kordenbrock, Scott Larson Nicoll Levy, Gerald Fredrick Lofstead, Shyamali Mukherjee, Gregory D. Sjaardema, Gary J. Templet, Patrick Widener, Ron A. Oldfield, (2017). ATDM Data Warehouse Exascale Computing Project Annual Meeting https://www.osti.gov/search/identifier:1427407 Document ID: 577888

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2016). Improving Application Resilience to Memory Errors with Lightweight Compression The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1410251 Document ID: 554663

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick G Bridges, Oscar H. Mondragon, (2016). How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms The Message Passing Interface (MPI) Users and Developers Conference https://www.osti.gov/search/identifier:1394099 Document ID: 530057

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2016). Improving Application Resilience to Memory Errors with Lightweight Compression The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) https://www.osti.gov/search/identifier:1372148 Document ID: 476343

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick G Bridges, Oscar H. Mondragon, (2016). Understanding Performance Interference in Next-Generation HPC Systems The International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1372149 Document ID: 476344

Scott Larson Nicoll Levy, Kurt Brian Ferreira, (2016). An Examination of the Impact of Failure Distribution on Coordinated Checkpoint/Restart Fault Tolerance for HPC at Extreme Scale (FTXS) Workshop https://www.osti.gov/search/identifier:1368866 Document ID: 463972

Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, (2016). Horseshoes and Hand Grenades: The Case for Approximate Coordination in Local Checkpointing Protocols 9th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids @ EuroPar 2016 https://www.osti.gov/search/identifier:1368839 Document ID: 463931

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick (UNM) Bridges, Oscar (UNM) Mondragon, (2016). How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging latent synchronization in MPI collective algorithms EuroMPI 2016: The Message Passing Interface (MPI) Users and Developers Conference https://www.osti.gov/search/identifier:1364728 Document ID: 453704

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick Widener, Patrick G Bridges, Oscar Mondragon, (2016). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale Meeting to discuss fault tolerance research with Los Alamos Nat’l Lab staff https://www.osti.gov/search/identifier:1428024 Document ID: 443447

Elisabeth Baseman, Nathan DeBardeleben, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Steven Raasch, Vilas Sridharan, Taniya Siddiqua, Qiang Guan, (2016). Improving DRAM Fault Characterization Through Machine Learning IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) https://www.osti.gov/search/identifier:1365234 Document ID: 432113

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2016). Similarity Engine: Using Content Similarity to Improve Memory Resilience ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC) https://www.osti.gov/search/identifier:1239385 Document ID: 387058

Scott Larson Nicoll Levy, (2015). Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems https://www.osti.gov/search/identifier:1226922 Document ID: 354819

Oscar H. Mondragon, Patrick G Bridges, Kurt Brian Ferreira, Patrick Widener, Scott Larson Nicoll Levy, (2015). Scheduling In-Situ Analytics in Next-generation Applications 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing https://www.osti.gov/search/identifier:1333466 Document ID: 354815

Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Nathan D. Fabian, (2015). Canaries in a Coal Mine: Using Application-level Checkpoints to Detect Memory Failures 8th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids in connection with Euro-Par 2015 https://www.osti.gov/search/identifier:1256569 Document ID: 286581

Scott Larson Nicoll Levy, Kurt Brian Ferreira, Patrick G Bridges, (2015). Similarity Engine: Using Content Similarity to Improve Memory Resilience International Conference for High Performance Computing, Networking, Storage and Analysis https://www.osti.gov/search/identifier:1530987 Document ID: 264517

Patrick Widener, Scott Larson Nicoll Levy, Kurt Brian Ferreira, Torsten Hoefler, (2014). On noise and the performance benefit of nonblocking collectives International Journal of High Performance Computing Applications https://www.osti.gov/search/identifier:1257977 Document ID: 208093

Kurt Brian Ferreira, Scott Larson Nicoll Levy, Patrick Widener, Dorian Arnold, (2014). Using Machine Learning to Optimize Uncoordinated Checkpointing Performance ASCR Machine Learning Workshop https://www.osti.gov/search/identifier:1319751 Document ID: 187255

Kirk A. Rackow, Scott Larson Nicoll Levy, (2014). Exploring the effect of noise on the performance benefit of non-blocking MPI_Allreduce EuroMPI/Asia 2014 https://www.osti.gov/search/identifier:1145671 Document ID: 5336447

Kirk A. Rackow, Scott Larson Nicoll Levy, Patrick Widener, Dorian Arnold, Torsten Hoefler, (2014). Understanding the Effects of Communication and Coordination on Checkpointing at Scale International Conference for High Performance Computing, Networking, Storage, and Analysis https://www.osti.gov/search/identifier:1142798 Document ID: 5335351

Kirk A. Rackow, Patrick G. Bridges, Scott Larson Nicoll Levy, (2014). Characterizing the Impact of Rollback Avoidance at Extreme-Scale: A Modeling Approach The 43rd International Conference on Parallel Processing (ICPP-2014) https://www.osti.gov/search/identifier:1141101 Document ID: 5333630

Kirk A. Rackow, Patrick Widener, Scott Larson Nicoll Levy, Dorian Arnold, Torsten Hoefler, (2014). Understanding the Effects of Communication on Uncoordinated Checkpointing at Scale ACM International Conference on Supercomputing (ICS 2014) https://www.osti.gov/search/identifier:1140761 Document ID: 5331977

Kirk A. Rackow, Scott Larson Nicoll Levy, Bryan Topp, Dorian Arnold, Torsten Hoefler, (2013). Predicting Coordinated and Uncoordinated Checkpoint/Restart Protocol Performance at Extreme Scales 28th IEEE International Parallel & Distributed Processing Symposium https://www.osti.gov/search/identifier:1115087 Document ID: 5329457

Kirk A. Rackow, Scott Larson Nicoll Levy, Ronald B. Brightwell, Dorian Arnold, Patrick Bridges, (2013). A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration DOE/ASCR A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration https://www.osti.gov/search/identifier:1111081 Document ID: 5324016

Patrick Widener, Kurt Brian Ferreira, Scott Larson Nicoll Levy, Ronald B. Brightwell, Patrick G. Bridges, Dorian Arnold, (2013). Asking the right questions: benchmarking fault-tolerant extreme-scale systems Workshop on Resiliency in High-Performance Computing https://www.osti.gov/search/identifier:1083655 Document ID: 5323574

Kirk A. Rackow, Scott Larson Nicoll Levy, Patrick G. Bridges, (2013). A Simulation Infrastructure for Examining the Performance of Resilience Strategies at Scale https://www.osti.gov/search/identifier:1088091 Document ID: 5321449

Kirk A. Rackow, Scott Larson Nicoll Levy, Patrick Bridges, (2013). A Simulation Infrastructure for Examining the Performance of Resilience Strategies at Scale EuroMPI 2013 — The 20th European MPI Users’ Group Meeting https://www.osti.gov/search/identifier:1078709 Document ID: 5321390

Kirk A. Rackow, Kevin Pedretti, Dorian Arnold, Scott Larson Nicoll Levy, Patrick Bridges, (2013). Protect Yourself: Why Your OS Must Protect Against DRAM Failures International Conference on Architectural Support for Programming Languages and OSs https://www.osti.gov/search/identifier:1062878 Document ID: 5318212

Kirk A. Rackow, Aidan P. Thompson, Christian Robert Trott, Patrick G. Bridges, Scott Larson Nicoll Levy, (2013). An Examination of Content Similarity within the Memory of HPC Applications https://www.osti.gov/search/identifier:1088105 Document ID: 5317155
