Publications

Results 2201–2400 of 9,998

Data-driven material models for atomistic simulation

Physical Review B

Wood, Mitchell A.; Thompson, Aidan P.; Cusentino, Mary A.; Wirth, B.D.

The central approximation made in classical molecular dynamics simulation of materials is the interatomic potential used to calculate the forces on the atoms. Great effort and ingenuity are required to construct viable functional forms and find accurate parametrizations for potentials using traditional approaches. Machine learning has emerged as an effective alternative for developing accurate and robust interatomic potentials. Starting with a very general model form, the potential is learned directly from a database of electronic structure calculations and can therefore be viewed as a multiscale link between quantum and classical atomistic simulations. Outside the narrow range of time and length scales where the two methods can be directly compared, there is a risk of inaccurate extrapolation. In this work, we use the spectral neighbor analysis potential (SNAP) and show how a fit can be produced with minimal interpolation error that is also robust when extrapolating beyond the training data. To demonstrate the method, we have developed a tungsten-beryllium potential suitable for the full range of binary compositions. Subsequently, large-scale molecular dynamics simulations were performed of high-energy Be atom implantation onto the (001) surface of solid tungsten. The machine-learned W-Be potential generates a population of implantation structures consistent with quantum calculations of defect formation energies. A very shallow (<2 nm) average Be implantation depth is predicted, which may explain ITER divertor degradation in the presence of beryllium.
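At its core, a SNAP-style fit reduces to linear least squares: the configuration energy is modeled as a linear combination of per-atom descriptors, with coefficients fit against reference electronic structure energies. The sketch below illustrates that idea with a hypothetical two-descriptor model and made-up training data, not the actual bispectrum descriptors or database used in the paper.

```python
# Toy illustration of a linear-in-descriptors potential fit. Each training
# configuration is summarized by summed descriptor values (stand-ins for
# SNAP bispectrum components); coefficients are found by solving the
# 2x2 normal equations A^T A beta = A^T y in closed form.

def fit_linear_potential(descriptor_sums, ref_energies):
    """Least-squares fit of E = beta1*d1 + beta2*d2 to reference energies."""
    a11 = sum(d[0] * d[0] for d in descriptor_sums)
    a12 = sum(d[0] * d[1] for d in descriptor_sums)
    a22 = sum(d[1] * d[1] for d in descriptor_sums)
    b1 = sum(d[0] * e for d, e in zip(descriptor_sums, ref_energies))
    b2 = sum(d[1] * e for d, e in zip(descriptor_sums, ref_energies))
    det = a11 * a22 - a12 * a12
    beta1 = (a22 * b1 - a12 * b2) / det
    beta2 = (a11 * b2 - a12 * b1) / det
    return beta1, beta2

def predict_energy(descriptor_sum, beta):
    """Energy of a new configuration from its descriptor sums."""
    return beta[0] * descriptor_sum[0] + beta[1] * descriptor_sum[1]

# Hypothetical training set: (descriptor sums) -> reference energy
train = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
beta = fit_linear_potential([d for d, _ in train], [e for _, e in train])
```

Real SNAP fits solve the same kind of overdetermined linear system, but with thousands of configurations, forces and stresses as additional rows, and regularization.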

Using simulation to examine the effect of MPI message matching costs on application performance

Parallel Computing

Levy, Scott L.; Ferreira, Kurt B.; Schonbein, Whit; Grant, Ryan E.; Dosanjh, Matthew D.

Attaining high performance with MPI applications requires efficient message matching to minimize message processing overheads and the latency these overheads introduce into application communication. In this paper, we use a validated simulation-based approach to examine the relationship between MPI message matching performance and application time-to-solution. Specifically, we examine how the performance of several important HPC workloads is affected by the time required for matching. Our analysis yields several important contributions: (i) the performance of current workloads is unlikely to be significantly affected by MPI matching unless match queue operations get much slower or match queues get much longer; (ii) match queue designs that provide sublinear performance as a function of queue length are unlikely to yield much benefit unless match queue lengths increase dramatically; and (iii) we provide guidance on how long the mean time per match attempt may be without significantly affecting application performance. The results and analysis in this paper provide valuable guidance on the design and development of MPI message match queues.
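The match queue whose cost this paper studies can be modeled minimally as an ordered list of posted receives searched linearly on message arrival, with MPI's wildcard semantics. The sketch below is an illustrative cost model tracking traversal length, not the validated simulator used in the paper; the queue contents are made up.

```python
from collections import deque

# Minimal model of an MPI posted-receive match queue with linear search.
# ANY stands in for MPI_ANY_SOURCE / MPI_ANY_TAG; `attempts` counts queue
# entries inspected, the quantity whose cost the paper examines.

ANY = -1

class MatchQueue:
    def __init__(self):
        self.posted = deque()   # (source, tag) receives, in post order
        self.attempts = 0       # total match attempts across all messages

    def post_recv(self, source, tag):
        self.posted.append((source, tag))

    def match(self, source, tag):
        """Return the queue depth traversed to match an incoming message."""
        for i, (s, t) in enumerate(self.posted):
            self.attempts += 1
            if (s == ANY or s == source) and (t == ANY or t == tag):
                del self.posted[i]   # MPI matches the first eligible entry
                return i
        return None                  # unexpected message

q = MatchQueue()
for rank in range(4):
    q.post_recv(rank, tag=7)
depth = q.match(source=3, tag=7)     # must walk past three non-matches
```

Sublinear match-queue designs aim to shrink `attempts` relative to this linear baseline; the paper's finding is that for current workloads the baseline is rarely the bottleneck.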

Creating stable productive CSE software development and integration processes in unstable environments on the path to exascale

Proceedings - 2019 IEEE/ACM 14th International Workshop on Software Engineering for Science, SE4Science 2019

Bartlett, Roscoe B.; Frye, Joseph R.

The Sandia National Laboratories (SNL) Advanced Technology Development and Mitigation (ATDM) project focuses on R&D for exascale computational science and engineering (CSE) software. Exascale application (APP) codes are co-developed and integrated with a large number of second-generation Trilinos packages built on top of Kokkos to achieve portable performance. These efforts are challenged by the need to develop and test on many unstable and constantly changing pre-exascale platforms using immature compilers and other system software. Challenges, experiences, and lessons learned are presented for creating stable development and integration workflows for these types of difficult projects. In particular, we describe automated workflows, testing, and integration processes, as well as new tools and multi-team collaboration processes, for effectively keeping a large number of automated builds and tests working on these unstable platforms.

Position paper: Towards usability as a first-class quality of HPC scientific software

Proceedings - 2019 IEEE/ACM 14th International Workshop on Software Engineering for Science, SE4Science 2019

Milewicz, Reed M.; Rodeghero, Paige

The modern HPC scientific software ecosystem is instrumental to the practice of science. However, software can only fulfill that role if it is readily usable. In this position paper, we discuss usability in the context of scientific software development, how usability engineering can be incorporated into current practice, and how software engineering research can help satisfy that objective.

Characterizing the roles of contributors in open-source scientific software projects

IEEE International Working Conference on Mining Software Repositories

Milewicz, Reed M.; Pinto, Gustavo; Rodeghero, Paige

The development of scientific software is, more than ever, critical to the practice of science, and this is accompanied by a trend towards more open and collaborative efforts. Unfortunately, there has been little investigation into who is driving the evolution of such scientific software or how the collaboration happens. In this paper, we address this problem. We present an extensive analysis of seven open-source scientific software projects in order to develop an empirically-informed model of the development process. This analysis was complemented by a survey of 72 scientific software developers. In the majority of the projects, we found senior research staff (e.g. professors) to be responsible for half or more of the commits (an average commit share of 72%) and heavily involved in architectural concerns (seniors were more likely to interact with files related to the build system, project meta-data, and developer documentation). Juniors (e.g. graduate students) also contribute substantially - in one studied project, juniors made almost 100% of its commits. Among juniors, graduate students had the longest contribution periods (1.72 years of commit activity, compared to 0.98 years for postdocs and 4 months for undergraduates). Moreover, we found that third-party contributors are scarce, typically contributing to a project for just one day. The results from this study aim to help scientists better understand their own projects, communities, and contributors' behavior, while paving the road for future software engineering research.

Fuzzy matching: Hardware accelerated MPI communication middleware

Proceedings - 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2019

Dosanjh, Matthew D.; Schonbein, William W.; Grant, Ryan E.; Bridges, Patrick G.; Gazimirsaeed, S.M.; Afsahi, Ahmad

Contemporary parallel scientific codes often rely on message passing for inter-process communication. However, inefficient coding practices or multithreading (e.g., via MPI_THREAD_MULTIPLE) can severely stress the underlying message processing infrastructure, resulting in potentially unacceptable impacts on application performance. In this article, we propose and evaluate a novel method for addressing this issue: 'Fuzzy Matching'. This approach has two components. First, it exploits the fact that most server-class CPUs include vector operations to parallelize message matching. Second, based on a survey of point-to-point communication patterns in representative scientific applications, the method further increases parallelization by allowing matches based on 'partial truth', i.e., by identifying probable rather than exact matches. We evaluate the impact of this approach on memory usage and performance on Knights Landing and Skylake processors. At scale (262,144 Intel Xeon Phi cores), the method shows up to 1.13 GiB of memory savings per node in the MPI library and an improvement in matching time of 95.9%; smaller-scale runs show run-time improvements of up to 31.0% for full applications and up to 6.1% for optimized proxy applications.
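The 'partial truth' idea can be sketched as a two-pass search: a fast scan on a cheap partial key (the kind of wide compare a vector unit performs in parallel), followed by exact verification of only the probable matches. The 4-bit partial key and queue below are illustrative choices, not the paper's actual encoding.

```python
# Fuzzy-matching sketch: scan the posted-receive queue on a few low bits of
# (source, tag) to find probable matches cheaply, then verify candidates
# exactly. False positives from the partial key are filtered in the slow pass.

MASK = 0xF  # compare only 4 low bits of each field in the fast pass

def partial_key(source, tag):
    return ((source & MASK) << 4) | (tag & MASK)

def fuzzy_match(queue, source, tag):
    """queue: ordered list of (source, tag) posted receives."""
    key = partial_key(source, tag)
    # Fast pass: probable matches (this compare is what vectorizes well).
    candidates = [i for i, (s, t) in enumerate(queue)
                  if partial_key(s, t) == key]
    # Slow pass: exact verification on the (few) candidates, in order.
    for i in candidates:
        s, t = queue[i]
        if s == source and t == tag:
            return i
    return None

queue = [(0, 1), (16, 1), (2, 3), (0, 17)]  # (16,1) and (0,17) alias (0,1)
idx = fuzzy_match(queue, 0, 1)
```

The trade-off mirrors the paper's: a smaller key means a wider, faster parallel compare but more false positives to verify.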

Lagrangian relaxation based heuristics for a chance-constrained optimization model of a hybrid solar-battery storage system

Optimization Online Repository

Singh, Bismark S.; Knueven, Ben

Here, we develop a stochastic optimization model for scheduling a hybrid solar-battery storage system that promises a given power delivery in advance. Solar power in excess of the promise can be used to charge the battery, while power short of the promise is met by discharging the battery. We ensure reliable operations by using a joint chance constraint. Models with a few hundred scenarios are relatively tractable; for larger models, we demonstrate how a Lagrangian relaxation scheme provides improved results. To further accelerate the Lagrangian scheme, we embed the progressive hedging algorithm within the subgradient iterations of the Lagrangian relaxation. Lastly, we investigate several enhancements of the progressive hedging algorithm and find that bundling scenarios yields the best bounds.
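A Lagrangian relaxation with subgradient updates, the workhorse behind the scheme described above, can be shown in miniature: relax a coupling demand constraint, solve the now-separable subproblems in closed form, and iterate on the multiplier. The costs, capacities, and 1/k step rule below are illustrative choices, not the paper's model.

```python
# Toy Lagrangian relaxation of: min c1*x1 + c2*x2  s.t.  x1 + x2 >= d,
# 0 <= x_i <= cap_i. Relaxing the demand constraint with multiplier `lam`
# makes the problem separable; every dual value is a valid lower bound.

def solve_subproblem(c, cap, lam):
    """min (c - lam) * x over 0 <= x <= cap (closed form)."""
    return cap if c < lam else 0.0

def lagrangian_bound(costs, caps, d, lam):
    xs = [solve_subproblem(c, cap, lam) for c, cap in zip(costs, caps)]
    value = sum((c - lam) * x for c, x in zip(costs, xs)) + lam * d
    return value, xs

def subgradient(costs, caps, d, iters=200):
    lam, best = 0.0, float("-inf")
    for k in range(1, iters + 1):
        value, xs = lagrangian_bound(costs, caps, d, lam)
        best = max(best, value)      # best lower bound seen so far
        g = d - sum(xs)              # subgradient of the dual function
        lam = max(0.0, lam + g / k)  # diminishing step, projected to >= 0
    return lam, best

# Optimal primal cost here is 8 (x1 = 5 at cost 1, x2 = 1 at cost 3).
lam, best = subgradient(costs=[1.0, 3.0], caps=[5.0, 5.0], d=6.0)
```

The paper's scheme applies the same dual machinery per scenario, with progressive hedging embedded inside the subgradient loop to recover implementable solutions.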

Editorial: The World Is Nonlocal

Journal of Peridynamics and Nonlocal Modeling

Silling, Stewart A.

Nonlocal modeling has come a long way. Researchers in the continuum mechanics and computational mechanics communities increasingly recognize that nonlocality is critical in realistic mathematical models of many aspects of the physical world. Physical interaction over a finite distance is fundamental at the atomic and nanoscale levels, where atoms and molecules interact through multibody potentials. Long-range forces partially determine the mechanics of surfaces and the behavior of dissolved molecules and suspended particles in a fluid. Nonlocality is therefore a vital feature of any continuum model that represents these physical systems at small length scales.

Simultaneous inversion of shear modulus and traction boundary conditions in biomechanical imaging

Inverse Problems in Science and Engineering

Seidl, Daniel T.; van Bloemen Waanders, Bart G.; Wildey, Timothy M.

We present a formulation to simultaneously invert for a heterogeneous shear modulus field and traction boundary conditions in an incompressible linear elastic plane stress model. Our approach utilizes scalable deterministic methods, including adjoint-based sensitivities and quasi-Newton optimization, to reduce the computational requirements for large-scale inversion with partial differential equation (PDE) constraints. Here, we address the use of regularization for such formulations and explore the use of different types of regularization for the shear modulus and boundary traction. We apply this PDE-constrained optimization algorithm to a synthetic dataset to verify the accuracy in the reconstructed parameters, and to experimental data from a tissue-mimicking ultrasound phantom. In all of these examples, we compare inversion results from full-field and sparse data measurements.

HOMMEXX 1.0: A performance-portable atmospheric dynamical core for the Energy Exascale Earth System Model

Geoscientific Model Development

Bertagna, Luca B.; Deakin, Michael; Guba, Oksana G.; Sunderland, Daniel S.; Bradley, Andrew M.; Kalashnikova, Irina; Taylor, Mark A.; Salinger, Andrew G.

We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPUs (e.g., Intel Xeon), manycore CPUs (e.g., Intel Xeon Phi Knights Landing), and the Nvidia V100 GPU.

Virtually the Same: Comparing Physical and Virtual Testbeds

2019 International Conference on Computing, Networking and Communications, ICNC 2019

Crussell, Jonathan C.; Kroeger, Thomas M.; Brown, Aaron B.; Phillips, Cynthia A.

Network designers, planners, and security professionals increasingly rely on large-scale testbeds based on virtualization to emulate networks and make decisions about real-world deployments. However, there has been limited research on how well these virtual testbeds match their physical counterparts. Specifically, does the virtualization that these testbeds depend on actually capture real-world behaviors sufficiently well to support decisions? As a first step, we perform simple experiments on both physical and virtual testbeds to begin to understand where and how the testbeds differ. We set up a web service on one host and run ApacheBench against this service from a different host, instrumenting each system during these tests. We define an initial repeatable methodology (algorithm) to quantitatively compare physical and virtual testbeds. Specifically, we compare the testbeds at three levels of abstraction: application, operating system (OS), and network. For the application level, we use the ApacheBench results. For OS behavior, we compare patterns of system call orderings using Markov chains. This provides a unique visual representation of the workload and OS behavior in our testbeds. We also drill down into read-system-call behaviors and show how at one level both systems are deterministic and identical, but as we move up in abstraction, that consistency declines. Finally, we use packet captures to compare network behaviors and performance. We reconstruct flows and compare per-flow and per-experiment statistics. From these comparisons, we find that the behavior of the workload in the testbeds is similar but that the underlying processes to support it do vary. The low-level network behavior can vary quite widely in packetization depending on the virtual network driver. While these differences can be important, and knowing about them will help experiment designers, the core application and OS behaviors still represent similar processes.
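The Markov-chain comparison of system-call orderings can be sketched directly: estimate transition probabilities from each testbed's trace, then measure how far the two chains diverge. The traces and L1 distance below are made-up illustrations, not the paper's data or its exact metric.

```python
from collections import Counter, defaultdict

# Model a system-call trace as a first-order Markov chain: for each call,
# estimate the probability distribution over the next call, then compare
# two testbeds by the total absolute difference of their transition tables.

def transition_probs(trace):
    counts = defaultdict(Counter)
    for a, b in zip(trace, trace[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

def l1_distance(p, q):
    """Sum of |p - q| over all transitions observed in either chain."""
    total = 0.0
    for a in set(p) | set(q):
        succ = set(p.get(a, {})) | set(q.get(a, {}))
        for b in succ:
            total += abs(p.get(a, {}).get(b, 0.0) - q.get(a, {}).get(b, 0.0))
    return total

# Hypothetical traces from a physical and a virtual host:
physical = ["read", "write", "read", "write", "read", "poll"]
virtual  = ["read", "write", "read", "poll", "read", "poll"]
p_phys = transition_probs(physical)
p_virt = transition_probs(virtual)
```

Identical chains give distance zero, so the distance serves as a simple scalar summary of how much the OS-level behavior of the two testbeds diverges.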

Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning

IEEE Transactions on Parallel and Distributed Systems

Tuncer, Ozan; Ates, Emre; Zhang, Yijia; Turk, Ata; Brandt, James M.; Leung, Vitus J.; Egele, Manuel; Coskun, Ayse K.

As the size and complexity of high performance computing (HPC) systems grow in line with advancements in hardware and software technology, HPC systems increasingly suffer from performance variations due to shared resource contention as well as software- and hardware-related problems. Such performance variations can lead to failures and inefficiencies, which impact the cost and resilience of HPC systems. To minimize the impact of performance variations, one must quickly and accurately detect and diagnose the anomalies that cause the variations and take mitigating actions. However, it is difficult to identify anomalies based on the voluminous, high-dimensional, and noisy data collected by system monitoring infrastructures. This paper presents a novel machine-learning-based framework to automatically diagnose performance anomalies at runtime. Our framework leverages historical resource usage data to extract signatures of previously observed anomalies. We first convert collected time series data into easy-to-compute statistical features. We then identify the features that are required to detect anomalies and extract the signatures of these anomalies. At runtime, we use these signatures to diagnose anomalies with negligible overhead. We evaluate our framework using experiments on a real-world HPC supercomputer and demonstrate that our approach successfully identifies 98 percent of injected anomalies and consistently outperforms existing anomaly diagnosis techniques.
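The first stage of such a framework, reducing raw monitoring time series to cheap statistical features, can be sketched as follows. The feature set, threshold, and toy series are illustrative assumptions; the paper trains a classifier on a much richer feature set rather than using a fixed rule.

```python
import statistics

# Reduce a window of a monitoring metric (e.g. % CPU utilization sampled
# over time) to summary statistics that can serve as an anomaly signature.

def window_features(series):
    return {
        "mean": statistics.fmean(series),
        "stdev": statistics.pstdev(series),
        "min": min(series),
        "max": max(series),
        "last_minus_first": series[-1] - series[0],  # crude trend measure
    }

healthy = [50, 52, 51, 49, 50, 51]   # stable utilization
leaky   = [50, 55, 61, 68, 74, 82]   # steady climb, e.g. a memory leak

f_ok, f_bad = window_features(healthy), window_features(leaky)

def looks_anomalous(feats, trend_threshold=10.0):
    """Trivial stand-in for the trained classifier described in the paper."""
    return abs(feats["last_minus_first"]) > trend_threshold
```

Because the features are cheap to compute per window, classification against stored anomaly signatures can run online with the negligible overhead the paper reports.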

Low-Power Deep Learning Inference using the SpiNNaker Neuromorphic Platform

ACM International Conference Proceeding Series

Vineyard, Craig M.; Dellana, Ryan; Aimone, James B.; Rothganger, Fredrick R.; Severa, William M.

With the successes that deep neural networks have achieved across a range of applications, researchers have been exploring computational architectures to more efficiently execute their operation. In addition to the prevalent role of graphics processing units (GPUs), many accelerator architectures have emerged. Neuromorphic computing is one such approach, taking inspiration from the brain to guide the computational principles of the architecture, with varying levels of biological realism. In this paper, we present results on using the SpiNNaker neuromorphic platform (48-chip model) for deep learning neural network inference. We use the Whetstone spiking deep learning library, developed at Sandia National Laboratories, to train deep multi-layer perceptrons and convolutional neural networks suitable for the spiking substrate on the neural hardware architecture. By using the massively parallel nature of SpiNNaker, we are able to achieve, under certain network topologies, substantial network tiling and consequently impressive inference throughput. Such high-throughput systems may have eventual application in remote sensing, where large images need to be chipped, scanned, and processed quickly. Additionally, we explore complex topologies that push the limits of the SpiNNaker routing hardware and investigate how that impacts mapping software-implemented networks to on-hardware instantiations.

Adaptive wavelet compression of large additive manufacturing experimental and simulation datasets

Computational Mechanics

Salloum, Maher S.; Johnson, Kyle J.; Bishop, Joseph E.; Aytac, Jon M.; Dagel, Daryl D.; van Bloemen Waanders, Bart G.

New manufacturing technologies such as additive manufacturing require research and development to minimize the uncertainties in the produced parts. The research involves experimental measurements and large simulations, which result in huge quantities of data to store and analyze. We address this challenge by alleviating the data storage requirements using lossy data compression. We select wavelet bases as the mathematical tool for compression. Unlike images, additive manufacturing data is often represented on irregular geometries and unstructured meshes. Thus, we use Alpert tree-wavelets as bases for our data compression method. We first analyze different basis functions for the wavelets and find the one that results in maximal compression and minimal error in the reconstructed data. We then devise a new adaptive thresholding method that is data-agnostic and allows a priori estimation of the reconstruction error. Finally, we propose metrics to quantify the global and local errors in the reconstructed data. One of the error metrics addresses the preservation of physical constraints in reconstructed data fields, such as a divergence-free stress field in structural simulations. While our compression and decompression method is general, we apply it to both experimental and computational data obtained from measurements and thermal/structural modeling of the sintering of a hollow cylinder from metal powders using a Laser Engineered Net Shape process. The results show that monomials achieve optimal compression performance when used as wavelet bases. The new thresholding method results in compression ratios that are two to seven times larger than those obtained with commonly used thresholds. Overall, adaptive Alpert tree-wavelets can achieve compression ratios between one and three orders of magnitude, depending on which features of the data must be preserved. These results show that Alpert tree-wavelet compression is a viable and promising technique for reducing the size of the large data structures found in both experiments and simulations.
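The transform-threshold-reconstruct pipeline can be illustrated with the simplest wavelet there is. The paper uses Alpert tree-wavelets on unstructured meshes with an adaptive threshold; the one-level Haar transform, fixed threshold, and toy signal below are simplified stand-ins for that machinery.

```python
import math

# Lossy wavelet compression in miniature: one-level Haar transform of an
# even-length signal, zeroing of small detail coefficients (the lossy step),
# and exact inverse transform. Zeroed details need not be stored, which is
# where the compression comes from.

def haar_forward(data):
    avgs = [(a + b) / math.sqrt(2) for a, b in zip(data[::2], data[1::2])]
    dets = [(a - b) / math.sqrt(2) for a, b in zip(data[::2], data[1::2])]
    return avgs, dets

def haar_inverse(avgs, dets):
    out = []
    for s, d in zip(avgs, dets):
        out += [(s + d) / math.sqrt(2), (s - d) / math.sqrt(2)]
    return out

def compress(data, threshold):
    avgs, dets = haar_forward(data)
    kept = [d if abs(d) > threshold else 0.0 for d in dets]
    return avgs, kept

signal = [4.0, 4.1, 8.0, 8.2, 1.0, 0.9, 5.0, 5.1]
avgs, dets = compress(signal, threshold=0.2)
recon = haar_inverse(avgs, dets)
max_err = max(abs(a - b) for a, b in zip(signal, recon))
```

The threshold directly controls the trade-off the abstract describes: raising it zeroes more coefficients (higher compression ratio) at the cost of a larger, but boundable, reconstruction error.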

Asynchronous Ballistic Reversible Fluxon Logic

IEEE Transactions on Applied Superconductivity

Frank, Michael P.; Lewis, Rupert; Missert, Nancy A.; Wolak, Matthaeus W.; Henry, Michael D.

In a previous study, we described a new abstract circuit model for reversible computation called Asynchronous Ballistic Reversible Computing (ABRC), in which localized information-bearing pulses propagate ballistically along signal paths between stateful abstract devices and elastically scatter off those devices serially, while updating the device state in a logically reversible and deterministic fashion. The ABRC model has been shown to be capable of universal computation. In the research reported here, we begin exploring how the ABRC model might be realized in practice using single flux quantum (SFQ) solitons (fluxons) in superconducting Josephson junction (JJ) circuits. One natural family of realizations could utilize fluxon polarity to represent binary data in individual pulses propagating near-ballistically along discrete or continuous long Josephson junctions (LJJs) or microstrip passive transmission lines (PTLs), and utilize the flux charge (-1, 0, +1) of a JJ-containing superconducting loop with Φ0 < IcL < 2Φ0 to encode a ternary state variable internal to a device. A natural question then arises as to which of the definable abstract ABRC device functionalities using this data representation might be implementable using a JJ circuit that dissipates only a small fraction of the input fluxon energy. We discuss conservation rules and symmetries considered as constraints to be obeyed in these circuits, and begin the process of classifying the possible ABRC devices in this family having up to 3 bidirectional I/O terminals and up to 3 internal states.

WearGP: A computationally efficient machine learning framework for local erosive wear predictions via nodal Gaussian processes

Wear

Laros, James H.; Furlan, John M.; Pagalthivarthi, Krishnan V.; Visintainer, Robert J.; Wildey, Timothy M.; Wang, Yan

Computational fluid dynamics (CFD)-based wear predictions are computationally expensive to evaluate, even with a high-performance computing infrastructure. Thus, it is difficult to provide accurate local wear predictions in a timely manner. Data-driven approaches provide a more computationally efficient way to approximate the CFD wear predictions without running the actual CFD wear models. In this paper, a machine learning (ML) approach, termed WearGP, is presented to approximate the 3D local wear predictions, using numerical wear predictions from steady-state CFD simulations as training and testing datasets. The proposed framework is built on Gaussian processes (GPs) and used to predict wear in a much shorter time. The WearGP framework can be segmented into three stages. In the first stage, the training dataset is built using a number of CFD simulations on the order of O(10^2). In the second stage, the data cleansing and data mining processes are performed, and the nodal wear solutions are extracted from the solution database to build a training dataset. In the third stage, the wear predictions are made using trained GP models. Two CFD case studies, comprising a 3D slurry pump impeller and casing, are used to demonstrate the WearGP framework, in which 144 training and 40 testing data points are used to train and test the proposed method, respectively. The numerical accuracy, computational efficiency, and effectiveness of the WearGP framework are compared against the CFD wear model for both slurry pump impellers and casings. It is shown that the WearGP framework can achieve highly accurate results comparable with the CFD results using a relatively small training dataset, with a computational time reduction on the order of 10^5 to 10^6.
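The prediction stage of a GP surrogate like this reduces to a kernel regression: build a covariance matrix over the training inputs, solve for the weights, and take a kernel-weighted sum at the query point. The sketch below shows a tiny one-dimensional case with a made-up dataset; the paper's nodal GPs operate on CFD operating parameters and per-node wear values, not these numbers.

```python
import math

# Minimal Gaussian-process regression: RBF kernel, dense solve of
# (K + noise*I) alpha = y, and the predictive mean k(x*)^T alpha.

def rbf(x, y, length=1.0):
    return math.exp(-0.5 * ((x - y) / length) ** 2)

def solve(a_mat, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a_mat, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gp_predict(xs, ys, x_new, noise=1e-6):
    k_mat = [[rbf(a, b) + (noise if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(k_mat, ys)                 # alpha = (K + noise*I)^-1 y
    return sum(rbf(x_new, xi) * ai for xi, ai in zip(xs, alpha))

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]    # hypothetical wear vs. parameter
pred = gp_predict(xs, ys, 1.0)               # query at a training point
```

Once `alpha` is cached, each new prediction costs only one kernel evaluation per training point, which is the source of the large speedup over rerunning CFD.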

Single and double hole quantum dots in strained Ge/SiGe quantum wells

Nanotechnology

Hardy, Will H.; Harris, Charles T.; Su, Yi H.; Chuang, Yen; Moussa, Jonathan; Maurer, Leon M.; Li, Jiun Y.; Lu, Tzu-Ming L.; Luhman, Dwight R.

Even as today's most prominent spin-based qubit technologies are maturing in terms of capability and sophistication, there is growing interest in exploring alternate material platforms that may provide advantages, such as enhanced qubit control, longer coherence times, and improved extensibility. Recent advances in heterostructure material growth have opened new possibilities for employing hole spins in semiconductors for qubit applications. Undoped, strained Ge/SiGe quantum wells are promising candidate hosts for hole spin-based qubits due to their low disorder, large intrinsic spin-orbit coupling strength, and absence of valley states. Here, we use a simple one-layer gated device structure to demonstrate both a single quantum dot and coupling between two adjacent quantum dots. The hole effective mass in these undoped structures, m* ∼ 0.08 m₀, is significantly lower than for electrons in Si/SiGe, pointing to the possibility of enhanced tunnel couplings in quantum dots and favorable qubit-qubit interactions in an industry-compatible semiconductor platform.

Global Solution Strategies for the Network-Constrained Unit Commitment Problem with AC Transmission Constraints

IEEE Transactions on Power Systems

Castillo, Anya; Watson, Jean-Paul W.; Laird, Carl D.

We propose a novel global solution algorithm for the network-constrained unit commitment problem that incorporates a nonlinear alternating current (ac) model of the transmission network, which is a nonconvex mixed-integer nonlinear programming problem. Our algorithm is based on the multi-tree global optimization methodology, which iterates between a mixed-integer lower-bounding problem and a nonlinear upper-bounding problem. We exploit the mathematical structure of the unit commitment problem with ac power flow constraints and leverage second-order cone relaxations, piecewise outer approximations, and optimization-based bounds tightening to provide a globally optimal solution at convergence. Numerical results on four benchmark problems illustrate the effectiveness of our algorithm, both in terms of convergence rate and solution quality.

Small scale to extreme: Methods for characterizing energy efficiency in supercomputing applications

Sustainable Computing: Informatics and Systems

Younge, Andrew J.

Power measurement capabilities are becoming commonplace on large-scale HPC system deployments. Several different approaches to providing power measurements are in use today, primarily in-band and out-of-band measurements. Both of these fundamental techniques can be augmented with application-level profiling, and combinations of different techniques are also possible. However, it can be difficult to assess the type and detail of measurement needed to obtain insights and knowledge of the power profile of an application. In addition, the heterogeneity of modern hybrid supercomputing platforms requires that different CPU architectures be examined as well. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three relevant HPC mini-applications are analyzed across systems of multicore and manycore nodes to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that a combination of out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. Furthermore, we confirm the energy and power profile of these mini-applications at extreme scale on the Trinity supercomputer. This finding validates the extrapolation of the power profiling techniques from a testbed scale of just several dozen nodes to extreme-scale petaflop supercomputing systems, and we provide a set of recommendations on how to best profile future HPC workloads.
