Publications

Results 1001–1025 of 9,998

PMEMCPY: A simple, lightweight, and portable I/O library for storing data in persistent memory

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Logan, Luke; Lofstead, Gerald F.; Levy, Scott L.; Widener, Patrick W.; Sun, Xian H.; Kougkas, Anthony

Persistent memory (PMEM) devices can achieve performance comparable to DRAM while providing significantly more capacity. This has made the technology compelling as an expansion to main memory. Rethinking PMEM as a storage device can offer a high-performance buffering layer in which HPC applications can temporarily but safely store data. However, modern parallel I/O libraries, such as HDF5 and pNetCDF, are complex and introduce significant software and metadata overheads when persisting data to these devices, wasting much of their potential. In this work, we explore the potential of PMEM as storage through pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory. We demonstrate that our approach is up to 2x faster than other popular parallel I/O libraries under real workloads.
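The abstract does not show pMEMCPY's interface, but the idea it exploits (PMEM is byte-addressable, so persisting a buffer can reduce to a memory copy plus a flush, with a tiny fixed header instead of a rich self-describing metadata layer) can be sketched generically. Below, an ordinary memory-mapped file stands in for a PMEM pool; the file name, header layout, and pool size are illustrative assumptions, not pMEMCPY's actual design:

```python
import mmap
import os
import struct
import tempfile

# Stand-in "PMEM pool": a fixed-size file we access through a memory mapping.
path = os.path.join(tempfile.mkdtemp(), "pmem_pool.bin")
payload = b"simulation checkpoint bytes"

with open(path, "wb") as f:
    f.truncate(4096)  # size the pool up front

# Persist: one small length header, one memory copy, one flush to the device.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as m:
        m[0:8] = struct.pack("<Q", len(payload))  # minimal metadata: length
        m[8:8 + len(payload)] = payload           # the "memcpy" itself
        m.flush()                                 # force data out of caches

# Restore through the same mapping discipline.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 4096, access=mmap.ACCESS_READ) as m:
        (n,) = struct.unpack("<Q", m[0:8])
        restored = bytes(m[8:8 + n])
```

The contrast with HDF5/pNetCDF in the abstract is exactly this: the fast path is a copy and a flush rather than a layered metadata protocol.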

Exploration of multifidelity UQ sampling strategies for computer network applications

International Journal for Uncertainty Quantification

Geraci, Gianluca G.; Crussell, Jonathan C.; Swiler, Laura P.; Debusschere, Bert D.

Network modeling is a powerful tool for rapid analysis of complex systems that can be challenging to study directly through physical testing. Two approaches are considered: emulation and simulation. The former runs real software on virtualized hardware, while the latter mimics the behavior of network components and their interactions in software. Although emulation provides an accurate representation of physical networks, this approach alone cannot guarantee characterization of the system under realistic operating conditions. Operating conditions for physical networks are often characterized by intrinsic variability (payload size, packet latency, etc.) or by a lack of precise knowledge of the network configuration (bandwidth, delays, etc.); therefore, uncertainty quantification (UQ) strategies should also be employed. UQ strategies require multiple evaluations of the system, with a number of evaluation instances that roughly increases with the problem dimensionality, i.e., the number of uncertain parameters. It follows that a typical UQ workflow for network modeling based on emulation can easily become unattainable due to its prohibitive computational cost. In this paper, a multifidelity sampling approach is discussed and applied to network modeling problems. The main idea is to optimally fuse information coming from simulations, which are a low-fidelity version of the emulation problem of interest, in order to decrease the estimator variance. Reducing the estimator variance in a sampling approach usually yields more reliable statistics and therefore a more reliable system characterization. Several network problems of increasing difficulty are presented. For each, the performance of the multifidelity estimator is compared with that of its single-fidelity counterpart, Monte Carlo (MC) sampling. For all the test problems studied in this work, the multifidelity estimator is more efficient than MC.
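The variance-reduction idea behind the multifidelity estimator can be illustrated with a two-fidelity control-variate Monte Carlo sketch. The model functions, sample sizes, and uniform input below are illustrative stand-ins (a cheap `f_lo` correlated with an expensive `f_hi`), not the paper's emulation/simulation models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fidelities: f_hi plays the role of the expensive emulator,
# f_lo the cheap but correlated simulator.
def f_hi(x):
    return np.sin(x) + 0.05 * x**2

def f_lo(x):
    return np.sin(x)

# Estimate E[f_hi(X)], X ~ U(0, 1), with a two-fidelity control variate.
n_hi, n_lo = 100, 10_000
x_hi = rng.uniform(0, 1, n_hi)   # small sample, evaluated on BOTH models
x_lo = rng.uniform(0, 1, n_lo)   # large sample, cheap model only

y_hi = f_hi(x_hi)
y_lo_paired = f_lo(x_hi)
y_lo = f_lo(x_lo)

# Control-variate weight minimizing variance: cov(hi, lo) / var(lo).
alpha = np.cov(y_hi, y_lo_paired)[0, 1] / np.var(y_lo_paired, ddof=1)

# Multifidelity estimator: correct the small high-fidelity mean with the
# well-resolved low-fidelity mean.
mf_estimate = y_hi.mean() + alpha * (y_lo.mean() - y_lo_paired.mean())
mc_estimate = y_hi.mean()  # single-fidelity baseline at the same cost
```

When the fidelities are strongly correlated, `mf_estimate` has markedly lower variance than `mc_estimate` for the same number of expensive evaluations, which is the efficiency gain the abstract reports.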

Solving Inverse Problems for Process-Structure Linkages Using Asynchronous Parallel Bayesian Optimization

Minerals, Metals and Materials Series

Laros, James H.; Wildey, Timothy M.

Process-structure linkage is one of the most important topics in materials science, because virtually all information about a material, including its manufacturing process, lies in the microstructure itself. Therefore, to learn more about the process, one must start by thoroughly examining the microstructure. This gives rise to inverse problems in the context of process-structure linkages, which attempt to identify the processes that were used to manufacture a given microstructure. In this work, we present an inverse problem for structure-process linkages, which we solve using asynchronous parallel Bayesian optimization that exploits parallel computing resources. We demonstrate the effectiveness of the method using a kinetic Monte Carlo model for grain growth simulation.

USING DEEP NEURAL NETWORKS TO PREDICT MATERIAL TYPES IN CONDITIONAL POINT SAMPLING APPLIED TO MARKOVIAN MIXTURE MODELS

Proceedings of the International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering, M&C 2021

Davis, Warren L.; Olson, Aaron J.; Popoola, Gabriel A.; Bolintineanu, Dan S.; Rodgers, Theron R.; Vu, Emily

Conditional Point Sampling (CoPS) is a recently developed stochastic media transport algorithm that has demonstrated a high degree of accuracy in 1-D and 3-D calculations for binary mixtures with Markovian mixing statistics. In theory, CoPS has the capacity to be accurate for material structures beyond just those with Markovian statistics. However, realizing this capability will require development of conditional probability functions (CPFs) that are based, not on explicit Markovian properties, but rather on latent properties extracted from material structures. Here, we describe a first step towards extracting these properties by developing CPFs using deep neural networks (DNNs). Our new approach lays the groundwork for enabling accurate transport on many classes of stochastic media. We train DNNs on ternary stochastic media with Markovian mixing statistics and compare their CPF predictions to those made by existing CoPS CPFs, which are derived based on Markovian mixing properties. We find that the DNN CPF predictions usually outperform the existing approximate CPF predictions, but with wider variance. In addition, even when trained on only one material volume realization, the DNN CPFs are shown to make accurate predictions on other realizations that have the same internal mixing behavior. We show that it is possible to form a useful CoPS CPF by using a DNN to extract correlation properties from realizations of stochastically mixed media, thus establishing a foundation for creating CPFs for mixtures other than those with Markovian mixing, where it may not be possible to derive an accurate analytical CPF.

Scalable3-BO: Big data meets HPC - A scalable asynchronous parallel high-dimensional Bayesian optimization framework on supercomputers

Proceedings of the ASME Design Engineering Technical Conference

Laros, James H.

Bayesian optimization (BO) is a flexible and powerful framework that is well suited to computationally expensive simulation-based applications and guarantees statistical convergence to the global optimum. While it remains one of the most popular optimization methods, its capability is hindered by the size of the data, the dimensionality of the problem, and the sequential nature of the optimization. These scalability issues are intertwined and must be tackled simultaneously. In this work, we propose the Scalable3-BO framework, which employs a sparse Gaussian process (GP) as the underlying surrogate model to cope with big data and is equipped with a random embedding to efficiently optimize high-dimensional problems with low effective dimensionality. The Scalable3-BO framework is further augmented with an asynchronous parallelization feature, which fully exploits the computational resources on HPC systems within a computational budget. As a result, the proposed Scalable3-BO framework is scalable along three independent axes: data size, dimensionality, and computational resources on HPC. The goal of this work is to push the frontiers of BO beyond its well-known scalability issues and to minimize the wall-clock waiting time for optimizing high-dimensional, computationally expensive applications. We demonstrate the capability of Scalable3-BO with 1 million data points on 10,000-dimensional problems, with 20 concurrent workers in an HPC environment.
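The random-embedding ingredient can be sketched in a few lines: when only a handful of the ambient coordinates matter, searching a random low-dimensional subspace suffices. The test function, dimensions, and the plain random search standing in for the BO inner loop below are all illustrative assumptions, not Scalable3-BO's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

D, d = 10_000, 5  # ambient and embedding dimensions (illustrative)

# Objective with low effective dimensionality: only 5 of the 10,000
# coordinates matter (hypothetical test function, not from the paper).
active = rng.choice(D, size=5, replace=False)

def f(x):
    return float(np.sum((x[active] - 0.3) ** 2))

# Random embedding: optimize over y in R^d and lift to R^D via x = A y.
A = rng.normal(size=(D, d)) / np.sqrt(d)

def lift(y):
    # Map a low-dimensional point into the ambient box [-1, 1]^D.
    return np.clip(A @ y, -1.0, 1.0)

# Plain random search in the embedded space stands in for the BO inner loop.
best_y, best_val = None, np.inf
for _ in range(2000):
    y = rng.uniform(-1, 1, d)
    val = f(lift(y))
    if val < best_val:
        best_y, best_val = y, val
```

The point of the embedding is that the search happens in 5 dimensions regardless of the 10,000-dimensional ambient space, which is what makes the high-dimensional regime in the abstract tractable.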

Inelastic equation of state for solids

Computer Methods in Applied Mechanics and Engineering

Sanchez, Jason J.

In this study, a complete inelastic equation of state (IEOS) for solids is developed based on a superposition of thermodynamic energy potentials. The IEOS allows for a tensorial stress state by including an isochoric hyperelastic Helmholtz potential in addition to the zero-kelvin isotherm and lattice vibration energy contributions. Inelasticity is introduced through the nonlinear equations of finite strain plasticity which utilize the temperature dependent Johnson–Cook yield model. Material failure is incorporated into the model by a coupling of the damage history variable to the energy potentials. The numerical evaluation of the IEOS requires a nonlinear solution of stress, temperature and history variables associated with elastic trial states for stress and temperature. The model is implemented into the ALEGRA shock and multi-physics code and the applications presented include single element deformation paths, the Taylor anvil problem and an energetically driven thermo-mechanical problem.

A framework to evaluate IMEX schemes for atmospheric models

Geoscientific Model Development (Online)

Guba, Oksana G.; Taylor, Mark A.; Bradley, Andrew M.; Bosler, Peter A.; Steyer, Andrew S.

We present a new evaluation framework for implicit-explicit (IMEX) Runge–Kutta time-stepping schemes. The new framework uses a linearized nonhydrostatic system of normal modes. We utilize the framework to investigate the stability of IMEX methods and their dispersion and dissipation of gravity, Rossby, and acoustic waves. We test the new framework on a variety of IMEX schemes and use it to develop and analyze a set of second-order low-storage IMEX Runge–Kutta methods with a high Courant–Friedrichs–Lewy (CFL) number. We show that the new framework is more selective than the 2-D acoustic system previously used in the literature. Schemes that are stable for the 2-D acoustic system are not stable for the system of normal modes.
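The IMEX idea itself (treat the stiff term implicitly and the nonstiff term explicitly within one step) can be illustrated with a first-order scheme on a scalar model problem. This toy ODE and step size are illustrative, not the paper's linearized normal-mode system or its second-order schemes:

```python
import math

# Stiff test problem:  y' = -k*(y - cos t)  +  (-sin t),   y(0) = 1,
# whose exact solution is y(t) = cos(t). The first (stiff) term is treated
# implicitly, the second (nonstiff) term explicitly.
k, h, t_end = 1000.0, 0.01, 1.0  # k*h = 10: fully explicit Euler would blow up

y, t = 1.0, 0.0
while t < t_end - 1e-12:
    # IMEX Euler step:
    #   y_new = y + h*(-sin t) + h*(-k)*(y_new - cos(t + h))
    # Solving the linear implicit part for y_new gives:
    y = (y - h * math.sin(t) + h * k * math.cos(t + h)) / (1.0 + h * k)
    t += h
```

Despite a step size ten times larger than the explicit stability limit for the stiff term, the computed `y` tracks cos(t) closely, which is exactly the property (stability at large CFL) the paper's framework is built to assess.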

Extreme Scale Infrasound Inversion and Prediction for Weather Characterization and Acute Event Detection

van Bloemen Waanders, Bart G.; Ober, Curtis C.

Accurate and timely weather predictions are critical to many aspects of society, with a profound impact on our economy, general well-being, and national security. In particular, our ability to forecast severe weather systems is necessary to avoid injuries and fatalities, but it is also important for minimizing infrastructure damage and maximizing mitigation strategies. The weather community has developed a range of sophisticated numerical models that are executed at various spatial and temporal scales in an attempt to issue global, regional, and local forecasts in pseudo real time. The accuracy, however, depends on the time period of the forecast, the nonlinearities of the dynamics, and the target spatial resolution. Significant uncertainties plague these predictions, including errors in initial conditions, material properties, data, and model approximations. To address these shortcomings, data are collected continuously, at an effort level even larger than that of the modeling process. It has been demonstrated that the accuracy of the predictions depends on the quality of the data and is, to a certain extent, independent of the sophistication of the numerical models. Data assimilation has become one of the most critical steps in the overall weather prediction enterprise, and consequently substantial improvements in the quality of the data would have transformational benefits. This paper describes the use of infrasound inversion technology, enabled through exascale computing, that could potentially achieve orders-of-magnitude improvement in data quality and therefore transform weather prediction, with significant impact on many aspects of our society.

The Hardware of Smaller Clusters (V.3.0)

Lacy, Susan L.; Brightwell, Ronald B.

Chris Saunders and three technologists are in high demand from Sandia’s deep learning teams, and they’re kept busy by building new clusters of computer nodes for researchers who need the power of supercomputing on a smaller scale. Sandia researchers working on Laboratory Directed Research & Development (LDRD) projects, or innovative ideas for solutions on short timeframes, formulate new ideas on old themes and frequently rely on smaller cluster machines to help solve problems before introducing their code to larger HPC resources. These research teams need an agile hardware and software environment where nascent ideas can be tested and cultivated on a smaller scale.

Efficient, Predictive Tomography of Multi-Qubit Quantum Processors

Blume-Kohout, Robin J.; Nielsen, Erik N.; Rudinger, Kenneth M.; Sarovar, Mohan S.; Young, Kevin C.

After decades of R&D, quantum computers comprising more than 2 qubits are appearing. If this progress is to continue, the research community requires a capability for precise characterization ("tomography") of these enlarged devices, which will enable benchmarking, improvement, and finally certification as mission-ready. As world leaders in characterization, whose gate set tomography (GST) method is the current state of the art, the project team is keenly aware that every existing protocol is either (1) catastrophically inefficient for more than 2 qubits or (2) not rich enough to predict device behavior. GST scales poorly, while the popular randomized benchmarking technique only measures a single aggregated error probability. This project explored a new insight: that the combinatorial explosion plaguing standard GST could be avoided by using an ansatz of few-qubit interactions to build a complete, efficient model for multi-qubit errors. We developed this approach, prototyped it, and tested it on a cutting-edge quantum processor developed by Rigetti Quantum Computing (RQC), a US-based startup. We implemented our new models within Sandia's PyGSTi open-source code and tested them experimentally on the RQC device by probing crosstalk. We found two major results: first, our schema worked and is viable for further development; second, while the Rigetti device is indeed a "real" 8-qubit quantum processor, its behavior fluctuated significantly over time while we were experimenting with it, and this drift made it difficult to fit our models of crosstalk to the data.

Review of the Carbon Capture Multidisciplinary Science Center (CCMSC) at the University of Utah (2017)

Hoekstra, Robert J.; Malone, C.M.; Montoya, D.R.; Ferencz, M.R.; Kuhl, A.L.; Wagner, J.

The review was conducted on May 8-9, 2017 at the University of Utah. Overall the review team was impressed with the work presented and found that the CCMSC had met or exceeded the Year 3 milestones. Specific details, comments, and recommendations are included in this document.

Low-cost MPI Multithreaded Message Matching Benchmarking

Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020

Schonbein, William W.; Levy, Scott L.; Marts, William P.; Dosanjh, Matthew D.; Grant, Ryan E.

The Message Passing Interface (MPI) standard allows user-level threads to concurrently call into an MPI library. While this feature is currently rarely used, there is considerable interest from developers in adopting it in the near future. There is reason to believe that multithreaded communication may incur additional message-processing overheads, in terms of the number of items searched during demultiplexing and the amount of time spent searching, because it has the potential to increase the number of messages exchanged and to introduce non-deterministic message ordering. Therefore, understanding the implications of adding multithreading to MPI applications is important for future application development. One strategy for advancing this understanding is through 'low-cost' benchmarks that emulate full communication patterns using fewer resources. For example, while a complete, 'real-world' multithreaded halo exchange requires 9 or 27 nodes, the low-cost alternative needs only two, making it deployable on systems where acquiring resources is difficult because of high utilization (e.g., busy capacity-computing systems) or impossible because the necessary resources do not exist (e.g., testbeds with too few nodes). While such benchmarks have been proposed, the reported results have been limited to a single architecture or derived indirectly through simulation, and no attempt has been made to confirm that a low-cost benchmark accurately captures features of full (non-emulated) exchanges. Moreover, benchmark code has not been made publicly available. The purpose of the study presented in this paper is to quantify how accurately the low-cost benchmark captures the matching behavior of the full, real-world benchmark. In the process, we also advocate for the feasibility and utility of the low-cost benchmark. We present a 'real-world' benchmark implementing a full multithreaded halo exchange on 9 and 27 nodes, as defined by 5-point and 9-point 2D stencils and 7-point and 27-point 3D stencils. Likewise, we present a 'low-cost' benchmark that emulates these communication patterns using only two nodes. We then confirm, across multiple architectures, that the low-cost benchmark gives accurate estimates of both the number of items searched during message processing and the time spent processing those messages. Finally, we demonstrate the utility of the low-cost benchmark by using it to profile the performance impact of state-of-the-art Mellanox ConnectX-5 hardware support for offloaded MPI message demultiplexing. To facilitate further research on the effects of multithreaded MPI on message-matching behavior, the source of our two benchmarks is to be included in the next release of the Sandia MPI Micro-Benchmark Suite.
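The matching behavior these benchmarks measure can be sketched with a toy model of an MPI receive side: posted receives wait in a queue, each arriving message is matched by a linear search on (source, tag), and the search depth is the quantity being counted. The functions and wildcard handling below are illustrative, not the benchmark's code:

```python
from collections import deque

# Sentinel standing in for MPI_ANY_SOURCE / MPI_ANY_TAG wildcards.
ANY = object()

posted = deque()  # posted-receive queue, kept in posting order


def post_recv(source, tag):
    posted.append((source, tag))


def match(msg_source, msg_tag):
    """Linear search for an arriving message; returns (search depth, entry)."""
    for depth, (src, tag) in enumerate(posted, start=1):
        if (src is ANY or src == msg_source) and (tag is ANY or tag == msg_tag):
            posted.remove((src, tag))
            return depth, (src, tag)
    # No match: in real MPI the message would go to the unexpected queue.
    return len(posted), None


# Interleaved posts (as from multiple threads) deepen the search for
# messages whose receive was posted later.
post_recv(0, 10)
post_recv(1, 20)
post_recv(ANY, 30)
depth, entry = match(1, 20)  # must scan past the (0, 10) entry first
```

This is why multithreading matters for matching cost: non-deterministic interleaving of posts and arrivals lengthens exactly this search, which is what both the real-world and low-cost benchmarks instrument.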

Evaluating Trade-offs in Potential Exascale Interconnect Technologies

Hemmert, Karl S.; Bair, Ray; Bhatale, Abhinav; Groves, Taylor; Jain, Nikhil; Lewis, Cannada L.; Mubarak, Misbah; Pakin, Scott D.; Ross, Robert; Wilke, Jeremiah J.

This report details work to study trade-offs in topology and network bandwidth for potential interconnects in the exascale (2021-2022) timeframe. The work was done using multiple interconnect models across two parallel discrete event simulators. Results from each independent simulator are shown and discussed and the areas of agreement and disagreement are explored.

Detecting and tracking drift in quantum information processors

Nature Communications

Proctor, Timothy J.; Revelle, Melissa R.; Nielsen, Erik N.; Rudinger, Kenneth M.; Lobser, Daniel L.; Maunz, Peter; Blume-Kohout, Robin J.; Young, Kevin C.

If quantum information processors are to fulfill their potential, the diverse errors that affect them must be understood and suppressed. But errors typically fluctuate over time, and the most widely used tools for characterizing them assume static error modes and rates. This mismatch can cause unheralded failures, misidentified error modes, and wasted experimental effort. Here, we demonstrate a spectral analysis technique for resolving time dependence in quantum processors. Our method is fast, simple, and statistically sound. It can be applied to time-series data from any quantum processor experiment. We use data from simulations and trapped-ion qubit experiments to show how our method can resolve time dependence when applied to popular characterization protocols, including randomized benchmarking, gate set tomography, and Ramsey spectroscopy. In the experiments, we detect instability and localize its source, implement drift control techniques to compensate for this instability, and then demonstrate that the instability has been suppressed.
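The core of such a spectral analysis can be sketched on simulated data: take the discrete Fourier transform of a time-ordered outcome record and flag frequencies whose power rises above the noise level expected from a static process. The drift model, threshold, and record length below are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy drifting "qubit measurement": a binary outcome whose success
# probability oscillates slowly in time at frequency f_d.
N, f_d = 1024, 5  # record length and drift frequency (cycles per record)
t = np.arange(N)
p_t = 0.5 + 0.2 * np.sin(2 * np.pi * f_d * t / N)  # time-varying probability
x = rng.random(N) < p_t                            # 0/1 outcome record

# Power spectrum of the mean-subtracted record. A static (drift-free)
# process would show only flat, noise-level power at nonzero frequencies.
z = x.astype(float) - x.mean()
power = np.abs(np.fft.rfft(z)) ** 2 / N

# Flag frequencies whose power exceeds a simple noise threshold.
noise_level = np.median(power[1:])
detected = np.flatnonzero(power > 10 * noise_level)
```

A sharp peak at the drift frequency stands well above the shot-noise floor, which is the signature that lets this kind of analysis both detect instability and localize its timescale, as described in the abstract.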
