Publications

5 Results

Site Preparation Integration and Operation of Mini-Sierra ART System

Davis, Kevin D.; Gauntt, Nathan E.

ATS platforms are some of the largest, most complex, and most expensive computer systems in the United States, installed at just a few major national laboratories. This milestone describes our recent efforts to procure, install, and test Vortex, a machine at Sandia National Laboratories that is compatible with the larger ATS platform Sierra at LLNL. In this milestone, we have 1) configured and procured a machine with hardware characteristics similar to those of Sierra ATS, 2) installed the machine, verified its physical hardware, and measured its baseline performance, and 3) demonstrated the machine's compatibility with Sierra ATS and its capacity for useful development and testing of Sandia computer codes (such as SPARC), including workloads such as nightly regression testing.

ACKNOWLEDGEMENTS

We would like to acknowledge the special contributions of several organizations and individuals. We would like to thank Mike Glass and Si Hammond for assisting with configuring the procurement within budget and certifying the results. Special thanks to the SNL SPARC code team, including Micah Howard and Sam Browne, for executing end-to-end regression tests and providing valuable feedback. Thanks to SNL SIERRA code team member Rich Drake and SNL EMPIRE code team member Matthew Bettencourt for their testing efforts. We thank all our SNL early system user testers from the division 1400 and 1500 code teams for finding and reporting bugs. We would like to thank all the division 9000 staff, including HPC management, facilities support staff, and system administration support staff. We would also like to thank IBM vendor staff Juanice Campbell, Jeff Gerhart, Mike Stevens, Sean McCombe, and many others for providing high-quality logistical and technical support. We would especially like to thank Lawrence Livermore National Laboratory and its HPC support staff, including Scott Futral, Adam Bertsch, John Gyllenhaal, Py Watson, and Matt Legendre.
For accommodating site visits, sharing code, setting up accounts, and supporting our ongoing synchronization efforts, we very much appreciate the significant effort they spent to help make this tri-lab partnership a success.


Standardized Environment for Monitoring Heterogeneous Architectures

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Brown, Connor J.; Schwaller, Benjamin S.; Gauntt, Nathan E.; Allan, Benjamin A.; Davis, Kevin D.

Increasingly diverse architectures and operating systems continue to emerge in the HPC industry. As a result, HPC centers are becoming more heterogeneous, which introduces a variety of challenges for system administrators. Monitoring a wide array of different platforms is difficult by itself, but the problem compounds in an environment where new platforms are frequently added. In such situations, it becomes necessary to create a standard monitoring environment across these platforms that allows for simple administration with minimal setup. This paper presents the solutions introduced in the HPC Development department at Sandia National Laboratories to meet these challenges. These include our adoption of a multi-stage data-collection pipeline across our clusters that is implemented from the ground up with our Golden Image. We also discuss our infrastructure to support a heterogeneous environment and activities in progress to improve our center. These advances simplify system standup and make monitoring integration easier and faster for new systems, which is necessary in our center's domain.
