Scott Larson Nicoll Levy

Scalable System Software

sllevy@sandia.gov

(505) 844-7292

Sandia National Laboratories, New Mexico
P.O. Box 5800
Albuquerque, NM 87185-1319

Biography

I am a Senior Member of Technical Staff in the Scalable System Software department of the Center for Computing Research (CCR). I research system software for next-generation extreme-scale systems. Specifically, I study the impact of system failures, and other sources of performance interference, on the execution of scientific simulations. I am also investigating application performance in power-constrained environments. I earned my Ph.D. from the University of New Mexico, where I worked with Prof. Patrick Bridges in the Scalable Systems Lab. At Sandia, I work with Kurt Ferreira, Patrick Widener and the 9lives research group on improving the resilience and fault tolerance of large-scale parallel systems.

Education

  • Ph.D., Computer Science, University of New Mexico
  • B.S., Electrical Engineering, Cornell University

Publications

  • Schonbein, W., Levy, S.L.N., Grant, R., & Temucin, Y. (2023). A Dynamic Network-Native MPI Partitioned Aggregation Over InfiniBand Verbs [Conference Proceeding]. https://www.osti.gov/biblio/2431172 Publication ID: 128064
  • Marts, W.P., Dosanjh, M.G.F., Schonbein, W., Levy, S.L.N., & Bridges, P.G. (2023). Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery [Conference Proceeding]. ACM International Conference Proceeding Series. https://doi.org/10.1145/3605731.3605884 Publication ID: 131420
  • Schonbein, W., Levy, S.L.N., Dosanjh, M.G.F., Marts, W.P., Reid, E., & Grant, R.E. (2023). Modeling and Benchmarking the Potential Benefit of Early-Bird Transmission in Fine-Grained Communication [Conference Presentation]. ACM International Conference Proceeding Series. https://doi.org/10.2172/2430419 Publication ID: 125440
  • Olivier, S.L., Brightwell, R.B., Dosanjh, M.G.F., Ferreira, K., Levy, S.L.N., Bachman, W.B., & Younge, A.J. (2022). SNL ATDM Software Ecosystem Then and Now: Operating Systems and On-Node Runtime [Presentation]. https://www.osti.gov/biblio/2006330 Publication ID: 122008
  • Ferreira, K., Levy, S.L.N., Hemmert, J., & Bachman, W.B. (2022). Understanding Memory Failures on a Petascale Arm System [Conference Paper]. HPDC 2022 – Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing. https://doi.org/10.1145/3502181.3531465 Publication ID: 112016
  • Olivier, S.L., Brightwell, R.B., Dosanjh, M.G.F., Ferreira, K., Levy, S.L.N., Bachman, W.B., & Younge, A.J. (2022). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime [Presentation]. https://www.osti.gov/biblio/2002316 Publication ID: 110212
  • Ferreira, K., & Levy, S.L.N. (2022). Characterizing Failures in HPC Using Benford’s Law [Conference Presentation]. https://doi.org/10.2172/2001912 Publication ID: 108664
  • Karamati, S., Hughes, C., Hemmert, K.S., Grant, R.E., Schonbein, W., Levy, S.L.N., Conte, T.M., Young, J., & Buduc, R.W. (2022). ‘Smarter’ NICs for faster molecular dynamics: a case study [Conference Proceeding]. Proceedings – 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022. https://doi.org/10.1109/IPDPS53621.2022.00063 Publication ID: 108652
  • Ferreira, K., & Levy, S.L.N. (2022). Characterizing Memory Failures Using Benford’s Law [Conference Paper]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85133026898&origin=inward Publication ID: 75682
  • Ferreira, K., & Levy, S.L.N. (2021). Evaluating MPI resource usage summary statistics. Parallel Computing, 108. https://doi.org/10.1016/j.parco.2021.102825 Publication ID: 75299
  • Haskins, K., Bridges, P., Ferreira, K., & Levy, S.L.N. (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications [Conference Paper]. https://www.osti.gov/biblio/1899492 Publication ID: 76415
  • Haskins, K., Bridges, P., Ferreira, K., & Levy, S.L.N. (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications [Conference Paper]. https://www.osti.gov/biblio/1899493 Publication ID: 76416
  • Ferreira, K., & Levy, S.L.N. (2021). Characterizing Per-node Memory Failures Using Benford’s Law [Conference Paper]. https://www.osti.gov/biblio/1886179 Publication ID: 75504
  • Marts, W.P., Dosanjh, M.G.F., Schonbein, W., Levy, S.L.N., Grant, R., & Bridges, P. (2021). MiniMod: A Modular Miniapplication Benchmarking Framework for HPC [Conference Paper]. https://doi.org/10.1109/Cluster48925.2021.00028 Publication ID: 79517
  • Levy, S.L.N., & Ferreira, K. (2021). An Initial Examination of the Effect of Container Resource Constraints on Application Perturbation [Conference Presentation]. https://doi.org/10.2172/1869756 Publication ID: 78565
  • Olivier, S.L., Brightwell, R.B., Ferreira, K., Grant, R., Levy, S.L.N., Bachman, W.B., & Younge, A.J. (2021). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime [Presentation]. https://www.osti.gov/biblio/1861479 Publication ID: 77902
  • Grant, R., Levy, S.L.N., & Schonbein, W. (2021). Co-design of System Software for Compute Accelerators and SmartNICs [Conference Paper]. https://www.osti.gov/biblio/1847622 Publication ID: 77227
  • Logan, L.M., Lofstead, G.F., Levy, S.L.N., Widener, P., Sun, X.H., & Kougkas, A. (2021). PMEMCPY: A simple, lightweight, and portable I/O library for storing data in persistent memory [Conference Paper]. Proceedings – IEEE International Conference on Cluster Computing, ICCC. https://doi.org/10.1109/Cluster48925.2021.00098 Publication ID: 79209
  • Ferreira, K., Levy, S.L.N., Kuhns, V., Debardeleben, N., & Blanchard, S. (2021). Understanding the Effects of DRAM Correctable Error Logging at Scale [Conference Paper]. Proceedings – IEEE International Conference on Cluster Computing, ICCC. https://doi.org/10.1109/Cluster48925.2021.00060 Publication ID: 79606
  • Brightwell, R.B., Ferreira, K., Grant, R., Levy, S.L.N., Lofstead, G.F., Olivier, S.L., Bachman, W.B., Younge, A.J., Gentile, A.C., & Bachman, W.B. (2021). ALAMO: Autonomous lightweight allocation, management, and optimization [Conference Poster]. Communications in Computer and Information Science. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85107303666&origin=inward Publication ID: 74680
  • Schonbein, W., Levy, S.L.N., Marts, W.P., Dosanjh, M.G.F., & Grant, R. (2020). Low-cost MPI Multithreaded Message Matching Benchmarking [Conference Paper]. Proceedings – 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020. https://doi.org/10.1109/HPCC-SmartCity-DSS50907.2020.00022 Publication ID: 71718
  • Grant, R., Schonbein, W., & Levy, S.L.N. (2020). RaDD runtimes: Radical and different distributed runtimes with SmartNICs [Conference Paper]. Proceedings of IPDRM 2020: 4th Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85100923740&origin=inward Publication ID: 71234
  • Schonbein, W., Grant, R., Levy, S.L.N., Dosanjh, M.G.F., & Marts, W.P. (2020). Low-cost MPI Multithreaded Message Matching Benchmarking [Conference Presentation]. https://doi.org/10.2172/1882338 Publication ID: 72015
  • Levy, S.L.N., & Ferreira, K. (2020). Evaluating MPI Message Size Summary Statistics [Conference Proceeding]. https://www.osti.gov/biblio/1825984 Publication ID: 71238
  • Grant, R., Schonbein, W., & Levy, S.L.N. (2020). RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs [Conference Presentation]. https://doi.org/10.2172/1825980 Publication ID: 71233
  • Templet, G.J., Glickman, M.R., Kordenbrock, T., Levy, S.L.N., Lofstead, G.F., Mauldin, J., Otahal, T.J., Ulmer, C.D., Widener, P., & Oldfield, R. (2020). FY20 CSSE L2 Milestone 7186 [Presentation]. https://www.osti.gov/biblio/1820290 Publication ID: 74812
  • Templet, G.J., Glickman, M.R., Kordenbrock, T., Levy, S.L.N., Lofstead, G.F., Mauldin, J., Otahal, T.J., Ulmer, C.D., Widener, P., & Oldfield, R. (2020). Data Services for Visualization and Analysis – ASC Level II Milestone (7186). https://doi.org/10.2172/1663267 Publication ID: 99700
  • Levy, S.L.N., Widener, P., Ulmer, C.D., & Kordenbrock, T. (2020). The case for explicit reuse semantics for RDMA communication [Conference Poster]. Proceedings – 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020. https://doi.org/10.1109/IPDPSW50202.2020.00148 Publication ID: 73080
  • Levy, S.L.N., & Ferreira, K. (2020). Space-Efficient Reed-Solomon Encoding to Detect and Correct Pointer Corruption [Conference Poster]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85086273467&origin=inward Publication ID: 69979
  • Levy, S.L.N., & Ferreira, K. (2019). Evaluating tradeoffs between MPI message matching offload hardware capacity and performance [Conference Poster]. ACM International Conference Proceeding Series. https://doi.org/10.1145/3343211.3343223 Publication ID: 70063
  • Levy, S.L.N., Ferreira, K., Schonbein, W., Grant, R., & Dosanjh, M.G.F. (2019). Using simulation to examine the effect of MPI message matching costs on application performance. Parallel Computing, 84, pp. 63-74. https://doi.org/10.1016/j.parco.2019.02.008 Publication ID: 67578
  • Levy, S.L.N., Ferreira, K., Siddiqua, T., Debardeleben, N., Sridharan, V., & Baseman, E. (2019). Lessons learned from memory errors observed over the lifetime of Cielo [Conference Poster]. https://doi.org/10.1109/SC.2018.00046 Publication ID: 67575
  • Ferreira, K., Grant, R., Levenhagen, M., Levy, S.L.N., & Groves, T. (2019). Hardware MPI message matching: Insights into MPI matching behavior to inform design. Concurrency and Computation: Practice and Experience, 32(3). https://doi.org/10.1002/cpe.5150 Publication ID: 64546
  • Dosanjh, M.G.F., Grant, R., Hjelm, N., Levy, S.L.N., & Schonbein, W. (2019). The upcoming storm: The implications of increasing core count on scalable system software. Advances in Parallel Computing. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85112602289&origin=inward Publication ID: 69426
  • Widener, P., Ulmer, C.D., Levy, S.L.N., Kordenbrock, T., & Templet, G.J. (2019). Mediating Data Center Storage Diversity in HPC Applications with FAODEL [Conference Poster]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-030-34356-9_22 Publication ID: 69148
  • Olivier, S.L., Brightwell, R.B., Bachman, W.B., Younge, A.J., Evans, N., Levy, S.L.N., Ferreira, K., & Grant, R. (2019). SNL ATDM Software Ecosystem [Presentation]. https://www.osti.gov/biblio/1583026 Publication ID: 64200
  • Levy, S.L.N., & Ferreira, K. (2018). Using simulation to examine the effect of MPI message matching costs on application performance [Conference Poster]. ACM International Conference Proceeding Series. https://doi.org/10.1145/3236367.3236375 Publication ID: 63034
  • Bettencourt, M.T., Kramer, R.M.J., Cartwright, K.L., Phillips, E., Ober, C.C., Pawlowski, R., Swan, M.S., Tezaur, I.K., Phipps, E.T., Conde, S., Cyr, E.C., Ulmer, C.D., Kordenbrock, T., Levy, S.L.N., Templet, G.J., Hu, J.J., Lin, P.T., Glusa, C., Siefert, C., & Glass, M.W. (2018). ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE. https://doi.org/10.2172/1493832 Publication ID: 58868
  • Ferreira, K., Levy, S.L.N., Bachman, W.B., & Grant, R. (2018). Characterizing MPI matching via trace-based simulation [Conference Poster]. Parallel Computing. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85048343916&origin=inward Publication ID: 57396
  • Levy, S.L.N., Bachman, W.B., & Ferreira, K. (2018). Open science on Trinity’s Knights Landing partition: An analysis of user job data [Conference Poster]. ACM International Conference Proceeding Series. https://doi.org/10.1145/3229710.3229753 Publication ID: 62662
  • Levy, S.L.N., Ferreira, K., Debardeleben, N., Siddiqua, T., Sridharan, V., & Baseman, E. (2018). Lessons Learned from Errors Observed over the Lifetime of Cielo [Conference Poster]. https://doi.org/10.1109/SC.2018.00046 Publication ID: 63939
  • Ulmer, C.D., Mukherjee, S., Templet, G.J., Levy, S.L.N., Lofstead, G.F., Widener, P., & Lawson, M. (2018). Faodel: Data Management for Next-Generation Application Workflows [Conference Poster]. https://doi.org/10.1145/3217880.3217888 Publication ID: 62079
  • Ulmer, C.D., Kordenbrock, T., Lawson, M., Levy, S.L.N., Lofstead, G.F., Mukherjee, S., Sjaardema, G.D., Templet, G.J., Ward, H.L., & Widener, P. (2018). SNL ATDM: I/O and Data Management [Presentation]. https://www.osti.gov/biblio/1806512 Publication ID: 59268
  • Widener, P., Ferreira, K., & Levy, S.L.N. (2018). It’s not the heat, it’s the humidity: Scheduling resilience activity at scale [Conference Poster]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85042475218&origin=inward Publication ID: 56360
  • Lawson, M., Lofstead, G.F., Levy, S.L.N., Widener, P., Ulmer, C.D., Mukherjee, S., Templet, G.J., & Kordenbrock, T. (2017). EMPRESS-Extensible metadata provider for extreme-scale scientific simulations [Conference Poster]. Proceedings of PDSW-DISCS 2017 – 2nd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems – Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis. https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85045872946&origin=inward Publication ID: 54054
  • Levy, S.L.N., Ferreira, K., & Widener, P. (2017). The Unexpected Virtue of Almost: Exploiting MPI Collective Operations to Approximately Coordinate Checkpoints [Conference Poster]. https://doi.org/10.1002/cpe.4890 Publication ID: 54218
  • Ferreira, K., Grant, R., Levenhagen, M., Levy, S.L.N., & Groves, T. (2017). Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design [Conference Poster]. https://doi.org/10.1002/cpe.5150 Publication ID: 54225
  • Lawson, M., Lofstead, G.F., Levy, S.L.N., Widener, P., Ulmer, C.D., Mukherjee, S., Templet, G.J., & Kordenbrock, T. (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations [Conference Poster]. https://www.osti.gov/biblio/1513597 Publication ID: 54449
  • Ulmer, C.D., Mukherjee, S., Templet, G.J., Levy, S.L.N., Lofstead, G.F., Widener, P., Kordenbrock, T., & Lawson, M. (2017). Faodail: Enabling In Situ Analytics for Next-Generation Systems [Conference Poster]. https://www.osti.gov/biblio/1482474 Publication ID: 54217
  • Kreitinger, R., Levy, S.L.N., Ferreira, K., & Widener, P. (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis [Conference Poster]. https://www.osti.gov/biblio/1478158 Publication ID: 53562
  • Kreitinger, R., Levy, S.L.N., Ferreira, K., & Widener, P. (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis [Conference Poster]. https://www.osti.gov/biblio/1573776 Publication ID: 53563
  • Ferreira, K., Levy, S.L.N., Bachman, W.B., & Grant, R. (2017). Characterizing MPI matching via trace-based simulation. ACM International Conference Proceeding Series, 2017, pp. 1-45. https://doi.org/10.1145/3127024.3127040 Publication ID: 98292
  • Levy, S.L.N., Ferreira, K., & Bridges, P.G. (2017). Evaluating the Viability of Using Compression to Mitigate Silent Corruption of Read-Mostly Application Data [Conference Poster]. Proceedings – IEEE International Conference on Cluster Computing, ICCC. https://doi.org/10.1109/CLUSTER.2017.99 Publication ID: 57799
  • Ulmer, C.D., Oldfield, R., Kordenbrock, T., Levy, S.L.N., Lofstead, G.F., Mukherjee, S., Templet, G.J., & Widener, P. (2017). ATDM Data Warehouse: Data Management Services for Exascale Computing [Presentation]. https://www.osti.gov/biblio/1466487 Publication ID: 58113
  • Siddiqua, T., Sridharan, V., Raasch, S.E., Debardeleben, N., Ferreira, K., Levy, S.L.N., Baseman, E., & Guan, Q. (2017). Lifetime memory reliability data from the field [Conference Poster]. 2017 IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2017. https://doi.org/10.1109/DFT.2017.8244428 Publication ID: 57295
  • Ulmer, C.D., Kordenbrock, T., Levy, S.L.N., Lofstead, G.F., Mukherjee, S., Sjaardema, G.D., Templet, G.J., Widener, P., & Oldfield, R. (2017). ATDM Data Warehouse [Conference Poster]. https://www.osti.gov/biblio/1427407 Publication ID: 53054
  • Widener, P., Ferreira, K., & Levy, S.L.N. (2017). Horseshoes and hand grenades: The case for approximate coordination in local checkpointing protocols [Conference Poster]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-319-58943-5_50 Publication ID: 50229
  • Levy, S.L.N., Ferreira, K., & Bridges, P.G. (2016). Improving Application Resilience to Memory Errors with Lightweight Compression [Conference Poster]. https://doi.org/10.1109/SC.2016.27 Publication ID: 47905
  • Levy, S.L.N., Ferreira, K., Widener, P., Bridges, P.G., & Mondragon, O.H. (2016). How I learned to stop worrying and love in situ analytics: Leveraging latent synchronization in MPI collective algorithms [Conference Poster]. ACM International Conference Proceeding Series. https://doi.org/10.1145/2966884.2966920 Publication ID: 52299
  • Baseman, E., Debardeleben, N., Ferreira, K., Levy, S.L.N., Raasch, S., Sridharan, V., Siddiqua, T., & Guan, Q. (2016). Improving DRAM Fault Characterization through Machine Learning [Conference Poster]. Proceedings – 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2016. https://doi.org/10.1109/DSN-W.2016.13 Publication ID: 49553
  • Levy, S.L.N., Ferreira, K., & Bridges, P.G. (2016). Improving Application Resilience to Memory Errors with Lightweight Compression [Conference Poster]. International Conference for High Performance Computing, Networking, Storage and Analysis, SC. https://doi.org/10.1109/SC.2016.27 Publication ID: 51067
  • Levy, S.L.N., Ferreira, K., Widener, P., Bridges, P.G., & Mondragon, O.H. (2016). Understanding Performance Interference in Next-Generation HPC Systems [Conference Poster]. https://www.osti.gov/biblio/1372149 Publication ID: 51068
  • Levy, S.L.N., & Ferreira, K. (2016). An examination of the impact of failure distribution on coordinated checkpoint/restart [Conference Poster]. FTXS 2016 – Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale. https://doi.org/10.1145/2909428.2909430 Publication ID: 50259
  • Levy, S.L.N. (2016). Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems. https://www.osti.gov/biblio/1226922 Publication ID: 41675
  • Levy, S.L.N., Ferreira, K., Widener, P., Bridges, P., & Mondragon, O. (2016). How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging latent synchronization in MPI collective algorithms [Conference Poster]. https://www.osti.gov/biblio/1364728 Publication ID: 50139
  • Levy, S.L.N., Ferreira, K., Widener, P., Bridges, P.G., & Mondragon, O. (2016). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale [Presentation]. https://doi.org/10.1007/978-3-319-10214-6_5 Publication ID: 50027
  • Widener, P., Levy, S.L.N., Ferreira, K., & Hoefler, T. (2016). On noise and the performance benefit of nonblocking collectives. International Journal of High Performance Computing Applications, 30(1), pp. 121-133. https://doi.org/10.1177/1094342015611952 Publication ID: 39411
  • Levy, S.L.N., Ferreira, K., & Bridges, P.G. (2016). Similarity Engine: Using Content Similarity to Improve Memory Resilience [Conference Poster]. https://www.osti.gov/biblio/1239385 Publication ID: 46804
  • Mondragon, O.H., Bridges, P.G., Ferreira, K., Widener, P., & Levy, S.L.N. (2015). Scheduling In-Situ Analytics in Next-generation Applications [Conference Poster]. https://www.osti.gov/biblio/1333466 Publication ID: 41676
  • Levy, S.L.N., Ferreira, K., & Bridges, P.G. (2015). Similarity Engine: Using Content Similarity to Improve Memory Resilience [Conference Poster]. https://www.osti.gov/biblio/1530987 Publication ID: 43098
  • Widener, P., Ferreira, K., Levy, S.L.N., & Fabian, N. (2015). Canaries in a coal mine: Using application-level checkpoints to detect memory failures [Conference Poster]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84952050378&origin=inward Publication ID: 43835
  • Ferreira, K., Levy, S.L.N., Widener, P., & Arnold, D. (2014). Using Machine Learning to Optimize Uncoordinated Checkpointing Performance [Conference Poster]. https://www.osti.gov/biblio/1319751 Publication ID: 39111
  • Ferreira, K.B., & Levy, S.L.N. (2014). Exploring the effect of noise on the performance benefit of non-blocking MPI_Allreduce [Conference]. https://www.osti.gov/biblio/1145671 Publication ID: 40799
  • Ferreira, K.B., Levy, S.L.N., & Widener, P. (2014). Understanding the Effects of Communication and Coordination on Checkpointing at Scale [Conference]. https://doi.org/10.1109/SC.2014.77 Publication ID: 40506
  • Ferreira, K.B., & Levy, S.L.N. (2014). Characterizing the Impact of Rollback Avoidance at Extreme-Scale: A Modeling Approach [Conference]. https://www.osti.gov/biblio/1141101 Publication ID: 38842
  • Levy, S.L.N., Ferreira, K.B., & Widener, P. (2014). Using simulation to evaluate the performance of resilience strategies and process failures. https://doi.org/10.2172/1204092 Publication ID: 36991
  • Ferreira, K.B., Widener, P., & Levy, S.L.N. (2014). Understanding the Effects of Communication on Uncoordinated Checkpointing at Scale [Conference]. https://www.osti.gov/biblio/1140761 Publication ID: 36856
  • Levy, S.L.N., & Ferreira, K.B. (2013). Predicting the Impact of Failure Avoidance on Checkpoint/Restart in Extreme-Scale Systems [Conference]. https://www.osti.gov/biblio/1118703 Publication ID: 36607
  • Ferreira, K.B., & Levy, S.L.N. (2013). Predicting Coordinated and Uncoordinated Checkpoint/Restart Protocol Performance at Extreme Scales [Conference]. https://www.osti.gov/biblio/1115087 Publication ID: 36309
  • Levy, S.L.N., Ferreira, K.B., & Widener, P. (2013). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale [Conference]. https://doi.org/10.1007/978-3-319-10214-6_5 Publication ID: 35680
  • Ferreira, K.B., Levy, S.L.N., & Brightwell, R.B. (2013). A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration [Conference]. https://www.osti.gov/biblio/1111081 Publication ID: 34214
  • Ferreira, K.B., & Levy, S.L.N. (2013). A simulation infrastructure for examining the performance of resilience strategies at scale. https://doi.org/10.2172/1088091 Publication ID: 33098
  • Ferreira, K.B., & Levy, S.L.N. (2013). A Simulation Infrastructure for Examining the Performance of Resilience Strategies at Scale [Conference]. https://www.osti.gov/biblio/1078709 Publication ID: 33050
  • Levy, S.L.N. (2013). Exploiting Content Similarity to Improve Memory Performance in Large-Scale High-Performance Computing Systems [Conference]. https://www.osti.gov/biblio/1064175 Publication ID: 32294
  • Levy, S.L.N., & Ferreira, K.B. (2013). Using Unreliable Virtual Hardware to Inject Errors in Extreme-Scale Systems [Conference]. https://www.osti.gov/biblio/1063319 Publication ID: 32166
  • Ferreira, K.B., Pedretti, K., & Levy, S.L.N. (2013). Protect Yourself: Why Your OS Must Protect Against DRAM Failures [Conference]. https://www.osti.gov/biblio/1062878 Publication ID: 31581
  • Ferreira, K.B., Thompson, A.P., Trott, C.R., & Levy, S.L.N. (2013). An examination of content similarity within the memory of HPC applications. https://doi.org/10.2172/1088105 Publication ID: 31234