Publications

Results 1–25 of 117

Search results

Jump to search filters

Measuring Reproduciblity of Machine Learning Methods for Medical Diagnosis

Proceedings - 2022 4th International Conference on Transdisciplinary AI, TransAI 2022

Ahmed, Hana A.; Tchoua, Roselyne; Lofstead, Gerald F.

The National Academy of Sciences, Engineering, and Medicine (NASEM) defines reproducibility as 'obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis,' and replicability as 'obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data' [1]. Due to an increasing number of applications of artificial intelligence and machine learning (AI/ML) to fields such as healthcare and digital medicine, there is a growing need for verifiable AI/ML results, and therefore reproducible research and replicable experiments. This paper establishes examples of irreproducible AI/ML applications to medical sciences and quantifies the variance of common AI/ML models (Artificial Neural Network, Naive Bayes classifier, and Random Forest classifiers) for tasks on medical data sets.

More Details

Augmenting Singularity to Generate Fine-grained Workflows, Record Trails, and Data Provenance

Proceedings - 2022 IEEE 18th International Conference on e-Science, eScience 2022

Kennedy, Dominic; Olaya, Paula; Lofstead, Gerald F.; Vargas, Rodrigo; Taufer, Michela

The use of containerization technology in high performance computing (HPC) workflows has substantially increased recently because it makes workflows much easier to develop and deploy. Although many HPC workflows include multiple data and multiple applications, they have traditionally all been bundled together into one monolithic container. This hinders the ability to trace the thread of execution, thus preventing scientists from establishing data provenance, or having workflow reproducibility. To provide a solution to this problem we extend the functionality of a popular HPC container runtime, Singularity. We implement both the ability to compose fine-grained containerized workflows and execute these workflows within the Singularity runtime with automatic metadata collection. Specifically, the new functionality collects a record trail of execution and creates data provenance. The use of our augmented Singularity is demonstrated with an earth science workflow, SOMOSPIE. The workflow is composed via our augmented Singularity which creates fine-grained containers and collects the metadata to trace, explain, and reproduce the prediction of soil moisture at a fine resolution.

More Details

Interpreting write performance of supercomputer I/O systems with regression models

Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021

Xie, Bing; Tan, Zilong; Carns, Philip; Chase, Jeff; Harms, Kevin; Lofstead, Gerald F.; Oral, Sarp; Vazhkudai, Sudharshan S.; Wang, Feiyi

This work seeks to advance the state of the art in HPC I/O performance analysis and interpretation. In particular, we demonstrate effective techniques to: (1) model output performance in the presence of I/O interference from production loads; (2) build features from write patterns and key parameters of the system architecture and configurations; (3) employ suitable machine learning algorithms to improve model accuracy. We train models with five popular regression algorithms and conduct experiments on two distinct production HPC platforms. We find that the lasso and random forest models predict output performance with high accuracy on both of the target systems. We also explore use of the models to guide adaptation in I/O middleware systems, and show potential for improvements of at least 15% from model-guided adaptation on 70% of samples, and improvements up to 10 × on some samples for both of the target systems.

More Details

ALAMO: Autonomous lightweight allocation, management, and optimization

Communications in Computer and Information Science

Brightwell, Ronald B.; Ferreira, Kurt B.; Grant, Ryan E.; Levy, Scott L.; Lofstead, Gerald F.; Olivier, Stephen L.; Laros, James H.; Younge, Andrew J.; Gentile, Ann C.; Laros, James H.

Several recent workshops conducted by the DOE Advanced Scientific Computing Research program have established the fact that the complexity of developing applications and executing them on high-performance computing (HPC) systems is rising at a rate which will make it nearly impossible to continue to achieve higher levels of performance and scalability. Absent an alternative approach to managing this ever-growing complexity, HPC systems will become increasingly difficult to use. A more holistic approach to designing and developing applications and managing system resources is required. This paper outlines a research strategy for managing the increasing the complexity by providing the programming environment, software stack, and hardware capabilities needed for autonomous resource management of HPC systems. Developing portable applications for a variety of HPC systems of varying scale requires a paradigm shift from the current approach, where applications are painstakingly mapped to individual machine resources, to an approach where machine resources are automatically mapped and optimized to applications as they execute. Achieving such automated resource management for HPC systems is a daunting challenge that requires significant sustained investment in exploring new approaches and novel capabilities in software and hardware that span the spectrum from programming systems to device-level mechanisms. This paper provides an overview of the functionality needed to enable autonomous resource management and optimization and describes the components currently being explored at Sandia National Laboratories to help support this capability.

More Details

SCTuner: An Autotuner Addressing Dynamic I/O Needs on Supercomputer I/O Subsystems

Proceedings of PDSW 2021: IEEE/ACM 6th International Parallel Data Systems Workshop, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis

Tang, Houjun; Xie, Bing; Byna, Suren; Carns, Philip; Koziol, Quincey; Kannan, Sudarsun; Lofstead, Gerald F.; Oral, Sarp

In high-performance computing (HPC), scientific applications often manage a massive amount of data using I/O libraries. These libraries provide convenient data model abstractions, help ensure data portability, and, most important, empower end users to improve I/O performance by tuning configurations across multiple layers of the HPC I/O stack. We propose SCTuner, an autotuner integrated within the I/O library itself to dynamically tune both the I/O library and the underlying I/O stack at application runtime. To this end, we introduce a statistical benchmarking method to profile the behaviors of individual supercomputer I/O subsystems with varied configurations across I/O layers. We use the benchmarking results as the built-in knowledge in SCTuner, implement an I/O pattern extractor, and plan to implement an online performance tuner as the SCTuner runtime. We conducted a benchmarking analysis on the Summit supercomputer and its GPFS file system Alpine. The preliminary results show that our method can effectively extract the consistent I/O behaviors of the target system under production load, building the base for I/O autotuning at application runtime.

More Details

PMEMCPY: A simple, lightweight, and portable I/O library for storing data in persistent memory

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Logan, Luke; Lofstead, Gerald F.; Levy, Scott L.; Widener, Patrick W.; Sun, Xian H.; Kougkas, Anthony

Persistent memory (PMEM) devices can achieve comparable performance to DRAM while providing significantly more capacity. This has made the technology compelling as an expansion to main memory. Rethinking PMEM as storage devices can offer a high performance buffering layer for HPC applications to temporarily, but safely store data. However, modern parallel I/O libraries, such as HDF5 and pNetCDF, are complicated and introduce significant software and metadata overheads when persisting data to these storage devices, wasting much of their potential. In this work, we explore the potential of PMEM as storage through pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory. We demonstrate that our approach is up to 2x faster than other popular parallel I/O libraries under real workloads.

More Details

User-Centric System Fault Identification Using IO500 Benchmark

Proceedings of PDSW 2021: IEEE/ACM 6th International Parallel Data Systems Workshop, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis

Liem, Radita; Povaliaiev, Dmytro; Lofstead, Gerald F.; Kunkel, Julian; Terboven, Christian

I/O performance in a multi-user environment is difficult to predict. Users do not know what I/O performance to expect when running and tuning applications. We propose to use the IO500 benchmark as a way to guide user expectations on their application's performance and to aid identifying root causes of their I/O problems that might come from the system. Our experiments describe how we manage user expectation with IO500 and provide a mechanism for system fault identification. This work also provides us with information of the tail latency problem that needs to be addressed and granular information about the impact of I/O technique choices (POSIX and MPI-IO).

More Details
Results 1–25 of 117
Results 1–25 of 117