Sandia National Laboratories is a premier United States national security laboratory that develops science-based technologies in areas such as nuclear deterrence, energy production, and climate change. Computing plays a key role in its diverse missions, and within that environment, Research Software Engineers (RSEs) and other scientific software developers use testing automation to ensure the quality and maintainability of their work. We conducted a Participatory Action Research study to explore the challenges of and strategies for testing automation through the lens of academic literature. Drawing on the experiences we collected and a comparison with the open literature, we identify challenges in testing automation and present mitigation strategies grounded in evidence-based practice and experience reports, which other, similar institutions can assess against their own automation needs.
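To make the object of study concrete, the snippet below is a minimal sketch of the kind of automated test an RSE might maintain, assuming a pytest-style workflow; the integrator and tolerance are illustrative inventions, not artifacts from the study.

```python
# Minimal sketch of an automated regression test for scientific code,
# in the spirit of the testing practices the study examines.
# The integrator and tolerance below are hypothetical illustrations.
import math

def trapezoid_integrate(f, a, b, n=1000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def test_integrate_sine():
    # Integral of sin(x) over [0, pi] is exactly 2.
    result = trapezoid_integrate(math.sin, 0.0, math.pi)
    assert abs(result - 2.0) < 1e-5
```

Run with `pytest` in a continuous-integration job, tests like this one catch numerical regressions automatically on every change.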
A discussion of systems like ChatGPT: the legal issues they may raise, how they are affecting society, and the ethical considerations surrounding their existence and use.
It is essential to Sandia National Laboratories' continued success in scientific and technological advances and mission delivery to embrace a hybrid workforce culture under which current and future employees can thrive. This report presents the findings of the Hybrid Work Team for the Center for Computing Research, which met weekly from March to June 2023 and conducted a survey across the Center at Sandia. Conclusions are drawn from the experiences of this report's 9 authors, who constitute the Hybrid Work Team; from 15 responses to a Center-wide survey; and from numerous conversations with colleagues. A major finding was widespread dissatisfaction with the quantity, execution, and tooling surrounding formal meetings with remote participants. While there was consensus that remote work enables people to produce high-quality individual and technical work, there was also consensus that social disconnect was widespread, with particular concern for employees hired after the onset of the COVID-19 pandemic. There were many concerns about the tooling and policy that facilitate remote collaboration both within Sandia and with its external collaborators. This report includes recommendations for mitigating these problems; where obvious recommendations cannot be made, it describes what a successful solution might look like.
Scientific discovery increasingly relies on interoperable, multi-module workflows that generate intermediate data. The complexity of managing this intermediate data can cause performance losses or unexpected costs. This paper defines an approach to composing such scientific workflows on cloud services, focusing on workflow data orchestration, management, and scalability. We demonstrate the effectiveness of our approach with the SOMOSPIE scientific workflow, which deploys machine learning (ML) models to predict high-resolution soil moisture using an HPC service (LSF), an open-source cloud-native service (Kubernetes, or K8s), and object storage. Our approach enables scientists to scale from coarse-grained to fine-grained resolution and from a small to a larger region of interest. Using our empirical observations, we derive a cost model for executing workflows with hidden intermediate data on cloud services.
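As a concrete sketch of one piece of such data orchestration, the snippet below stages an intermediate artifact to S3-compatible object storage between workflow stages; boto3, the bucket, and the key names are illustrative assumptions, not the paper's actual tooling.

```python
# Minimal sketch of staging a workflow's intermediate data in S3-compatible
# object storage between stages. Bucket and key names are hypothetical.
import boto3

def stage_intermediate(local_path: str, bucket: str, key: str) -> str:
    """Upload an intermediate artifact and return its object URI."""
    s3 = boto3.client("s3")  # honors AWS_* env vars / ~/.aws config
    s3.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"

def fetch_intermediate(bucket: str, key: str, local_path: str) -> None:
    """Download an intermediate artifact for the next workflow stage."""
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, local_path)

# Example: one stage produces terrain features, the next consumes them.
# stage_intermediate("terrain_features.parquet", "somospie-intermediate",
#                    "run-42/terrain_features.parquet")
```

Making the object URI the handoff between stages is what lets the same workflow run unchanged against either an HPC parallel file system gateway or a cloud object store.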
The National Academies of Sciences, Engineering, and Medicine (NASEM) defines reproducibility as 'obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis,' and replicability as 'obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data' [1]. Due to the increasing number of applications of artificial intelligence and machine learning (AI/ML) in fields such as healthcare and digital medicine, there is a growing need for verifiable AI/ML results, and therefore for reproducible research and replicable experiments. This paper establishes examples of irreproducible AI/ML applications in the medical sciences and quantifies the variance of common AI/ML models (Artificial Neural Network, Naive Bayes, and Random Forest classifiers) on tasks over medical data sets.
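A minimal sketch of how such variance can be quantified, assuming scikit-learn implementations of the three model families; the dataset and seed range below are illustrative stand-ins for the paper's medical data sets.

```python
# Quantify run-to-run variance of three common model families by varying
# the random seed for both the train/test split and the model itself.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "ANN": lambda seed: MLPClassifier(max_iter=500, random_state=seed),
    "NaiveBayes": lambda seed: GaussianNB(),  # deterministic given a split
    "RandomForest": lambda seed: RandomForestClassifier(random_state=seed),
}

for name, make_model in models.items():
    scores = []
    for seed in range(10):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed)
        scores.append(make_model(seed).fit(X_tr, y_tr).score(X_te, y_te))
    print(f"{name}: mean={np.mean(scores):.4f} var={np.var(scores):.6f}")
```

Even this toy setup shows why a single reported accuracy number is insufficient: seed-sensitive models such as neural networks can spread noticeably across runs while Naive Bayes varies only with the data split.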
The use of containerization technology in high performance computing (HPC) workflows has increased substantially in recent years because it makes workflows much easier to develop and deploy. Although many HPC workflows involve multiple datasets and multiple applications, they have traditionally been bundled together into one monolithic container. This hinders the ability to trace the thread of execution, preventing scientists from establishing data provenance or achieving workflow reproducibility. To address this problem, we extend the functionality of a popular HPC container runtime, Singularity. We implement the ability both to compose fine-grained containerized workflows and to execute these workflows within the Singularity runtime with automatic metadata collection. Specifically, the new functionality collects a record trail of execution and establishes data provenance. We demonstrate our augmented Singularity with an earth science workflow, SOMOSPIE: the workflow is composed via augmented Singularity, which creates fine-grained containers and collects the metadata needed to trace, explain, and reproduce the prediction of soil moisture at fine resolution.
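For intuition, the sketch below approximates the metadata-collection idea with an external wrapper around `singularity exec`; the authors' actual implementation lives inside the augmented runtime itself, and the record fields here are illustrative assumptions.

```python
# Illustrative approximation of per-step provenance collection: run one
# containerized workflow step and append a record to a provenance log.
import json
import subprocess
import time
from datetime import datetime, timezone

def run_step(image: str, command: list[str],
             log_path: str = "provenance.jsonl"):
    """Run one containerized step and append a provenance record."""
    start = time.time()
    proc = subprocess.run(
        ["singularity", "exec", image, *command],
        capture_output=True, text=True)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "image": image,
        "command": command,
        "returncode": proc.returncode,
        "duration_s": round(time.time() - start, 3),
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return proc

# Example: run_step("preprocess.sif", ["python", "compute_features.py"])
```

Chaining such records across fine-grained containers is what yields the record trail of execution described above, one entry per step rather than one opaque monolithic run.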
This work seeks to advance the state of the art in HPC I/O performance analysis and interpretation. In particular, we demonstrate effective techniques to: (1) model output performance in the presence of I/O interference from production loads; (2) build features from write patterns and key parameters of the system architecture and configuration; (3) employ suitable machine learning algorithms to improve model accuracy. We train models with five popular regression algorithms and conduct experiments on two distinct production HPC platforms. We find that the lasso and random forest models predict output performance with high accuracy on both target systems. We also explore using the models to guide adaptation in I/O middleware, showing potential improvements of at least 15% from model-guided adaptation on 70% of samples, and improvements of up to 10× on some samples, for both target systems.
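A minimal sketch of the modeling setup, assuming scikit-learn's Lasso and RandomForestRegressor; the synthetic features below are placeholders for the real write-pattern and system-configuration measurements.

```python
# Train lasso and random forest regressors on features built from write
# patterns and system parameters, then compare held-out accuracy.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Placeholder features: e.g., aggregate write size, process count,
# stripe count, and a background-interference proxy.
X = rng.uniform(size=(n, 4))
y = 5 * X[:, 0] + 2 * X[:, 1] * X[:, 2] - 3 * X[:, 3] \
    + rng.normal(0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, model in [("lasso", Lasso(alpha=0.01)),
                    ("random forest", RandomForestRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(f"{name}: R^2 = {model.score(X_te, y_te):.3f}")
```

The interaction term in the synthetic target hints at why the two models can diverge: lasso captures only linear effects, while the random forest can pick up nonlinear interactions between write patterns and configuration parameters.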