Publications


Global Solution Strategies for the Network-Constrained Unit Commitment Problem with AC Transmission Constraints

IEEE Transactions on Power Systems

Castillo, Anya; Watson, Jean-Paul W.; Laird, Carl D.

We propose a novel global solution algorithm for the network-constrained unit commitment problem that incorporates a nonlinear alternating current (ac) model of the transmission network; the resulting problem is a nonconvex mixed-integer nonlinear program. Our algorithm is based on the multi-tree global optimization methodology, which iterates between a mixed-integer lower-bounding problem and a nonlinear upper-bounding problem. We exploit the mathematical structure of the unit commitment problem with ac power flow constraints and leverage second-order cone relaxations, piecewise outer approximations, and optimization-based bounds tightening to provide a globally optimal solution at convergence. Numerical results on four benchmark problems illustrate the effectiveness of our algorithm, both in terms of convergence rate and solution quality.
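
The iteration the abstract describes can be summarized in a short sketch. This is a generic multi-tree loop, not the authors' implementation: the three callables are hypothetical stand-ins, and the specific relaxations (second-order cones, piecewise outer approximations, bounds tightening) are hidden inside them.

```python
def multi_tree_solve(solve_milp_lower_bound, solve_nlp_upper_bound, add_cuts,
                     tol=1e-4, max_iter=50):
    """Generic multi-tree loop: alternate between a mixed-integer lower-bounding
    problem and a nonlinear upper-bounding problem with the commitment fixed.

    solve_milp_lower_bound() -> (lower_bound, commitment)
    solve_nlp_upper_bound(commitment) -> (upper_bound, dispatch)
    add_cuts(commitment, dispatch) tightens the relaxation for the next pass.
    """
    best_ub, incumbent = float("inf"), None
    for _ in range(max_iter):
        lb, commitment = solve_milp_lower_bound()         # relaxation: valid lower bound
        ub, dispatch = solve_nlp_upper_bound(commitment)  # feasible point: upper bound
        if ub < best_ub:
            best_ub, incumbent = ub, (commitment, dispatch)
        if best_ub - lb <= tol * max(1.0, abs(best_ub)):
            break  # bounds have closed to the requested global tolerance
        add_cuts(commitment, dispatch)  # cut off or penalize this commitment next pass
    return best_ub, incumbent
```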


An adaptive local reduced basis method for solving PDEs with uncertain inputs and evaluating risk

Computer Methods in Applied Mechanics and Engineering

Kouri, Drew P.; Aquino, Wilkins A.; Zou, Zilong

Many physical systems are modeled using partial differential equations (PDEs) with uncertain or random inputs. For such systems, naively propagating a fixed number of samples of the input probability law (or an approximation thereof) through the PDE is often inadequate to accurately quantify the “risk” associated with critical system responses. In this paper, we develop a goal-oriented, adaptive sampling and local reduced basis approximation for PDEs with random inputs. Our method determines a set of samples and an associated (implicit) Voronoi partition of the parameter domain on which we build local reduced basis approximations of the PDE solution. The samples are selected in an adaptive manner using an a posteriori error indicator. A notable advantage of the proposed approach is that the computational cost of the approximation during the adaptive process remains constant. We provide theoretical error bounds for our approximation and numerically demonstrate the performance of our method when compared to widely used adaptive sparse grid techniques. In addition, we tailor our approach to accurately quantify the risk of quantities of interest that depend on the PDE solution. We demonstrate our method on an advection–diffusion example and a Helmholtz example.
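
A minimal sketch of the greedy, indicator-driven sampling loop described above; `error_indicator` and `build_local_rb` are hypothetical placeholders for the paper's a posteriori indicator and local reduced basis construction, and the selected samples implicitly define the Voronoi cells.

```python
import numpy as np

def adaptive_local_rb(candidates, error_indicator, build_local_rb, budget):
    """Greedily grow the sample set: build one local reduced basis per selected
    sample (one per implicit Voronoi cell), then add the candidate parameter
    where the a posteriori error indicator is largest."""
    samples = [candidates[0]]
    while len(samples) < budget:
        local_bases = {tuple(s): build_local_rb(s, samples) for s in samples}
        errors = [error_indicator(c, samples, local_bases) for c in candidates]
        samples.append(candidates[int(np.argmax(errors))])
    local_bases = {tuple(s): build_local_rb(s, samples) for s in samples}
    return samples, local_bases
```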


Small scale to extreme: Methods for characterizing energy efficiency in supercomputing applications

Sustainable Computing: Informatics and Systems

Younge, Andrew J.

Power measurement capabilities are becoming commonplace on large scale HPC system deployments. There exist several different approaches to providing power measurements that are used today, primarily in-band and out-of-band measurements. Both of these fundamental techniques can be augmented with application-level profiling and the combination of different techniques is also possible. However, it can be difficult to assess the type and detail of measurement needed to obtain insights and knowledge of the power profile of an application. In addition, the heterogeneity of modern hybrid supercomputing platforms requires that different CPU architectures must be examined as well. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three relevant HPC mini-applications are analyzed across systems of multicore and manycore nodes to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that a combination of out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. Furthermore, we confirm the energy and power profile of these mini applications at an extreme scale with the Trinity supercomputer. This finding validates the extrapolation of the power profiling techniques from testbed scale of just several dozen nodes to extreme scale Petaflops supercomputing systems, along with providing a set of recommendations on how to best profile future HPC workloads.


Spectral risk measures: the risk quadrangle and optimal approximation

Mathematical Programming

Kouri, Drew P.

We develop a general risk quadrangle that gives rise to a large class of spectral risk measures. The statistic of this new risk quadrangle is the average value-at-risk at a specific confidence level. As such, this risk quadrangle generates a continuum of error measures that can be used for superquantile regression. For risk-averse optimization, we introduce an optimal approximation of spectral risk measures using quadrature. We prove the consistency of this approximation and demonstrate our results through numerical examples.
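
For context, the standard representation underlying this construction (textbook definitions, not text from the paper): a spectral risk measure can be written as a mixture of average values-at-risk, and the quadrature approximation replaces the mixing integral with a finite weighted sum,

\[
\mathcal{R}(X) \;=\; \int_0^1 \mathrm{AVaR}_\alpha(X)\, d\mu(\alpha)
\;\approx\; \sum_{i=1}^{n} w_i\, \mathrm{AVaR}_{\alpha_i}(X),
\qquad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1,
\]

where \mu is a probability measure on [0, 1) and (\alpha_i, w_i) are quadrature nodes and weights.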


Hardware MPI message matching: Insights into MPI matching behavior to inform design

Concurrency and Computation: Practice and Experience

Ferreira, Kurt B.; Grant, Ryan E.; Levenhagen, Michael J.; Levy, Scott L.; Groves, Taylor

This paper explores key differences in the MPI match lists of several important United States Department of Energy (DOE) applications and proxy applications. This understanding is critical in determining the most promising hardware matching design for any given high-speed network. We present the results of MPI match list studies for the major open-source MPI implementations, MPICH and Open MPI, and we modify an MPI simulator, LogGOPSim, to provide match list statistics. These results are discussed in the context of several different potential design approaches to MPI matching-capable hardware. The data illustrate the requirements for different hardware designs in terms of performance and memory capacity. Finally, this paper's contributions are the collection and analysis of data to help inform hardware designers of common MPI requirements and to highlight the difficulties in determining these requirements by examining only a single MPI implementation.


Compressed optimization of device architectures for semiconductor quantum devices

Physical Review Applied

Ward, Daniel R.; Frees, Adam; Gamble, John K.; Blume-Kohout, Robin J.; Eriksson, M.A.; Friesen, Mark; Coppersmith, S.N.

Recent advances in nanotechnology have enabled researchers to manipulate small collections of quantum-mechanical objects with unprecedented accuracy. In semiconductor quantum-dot qubits, this manipulation requires controlling the dot orbital energies, the tunnel couplings, and the electron occupations. These properties all depend on the voltages placed on the metallic electrodes that define the device, the positions of which are fixed once the device is fabricated. While there has been much success with small numbers of dots, as the number of dots grows, it will be increasingly useful to control these systems with as few electrode voltage changes as possible. Here, we introduce a protocol, which we call the "compressed optimization of device architectures" (CODA), in order both to efficiently identify sparse sets of voltage changes that control quantum systems and to introduce a metric that can be used to compare device designs. As an example of the former, we apply this method to simulated devices with up to 100 quantum dots and show that CODA automatically tunes devices more efficiently than other common nonlinear optimizers. To demonstrate the latter, we determine the optimal lateral scale for a triple quantum dot, yielding a simulated device that can be tuned with small voltage changes on a limited number of electrodes.


SST-GPU: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model

Khairy, Mahmoud; Zhang, Mengchi; Green, Roland; Hammond, Simon D.; Hoekstra, Robert J.; Rogers, Timothy; Hughes, Clayton H.

Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel acceleration where the contexts for thousands of concurrent threads are resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. The design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. To address the need for a scalable, decentralized GPU model that can represent large GPUs, chiplet-based GPUs, and multi-node GPUs, this report details the first steps in integrating the open-source, execution-driven GPGPU-Sim into the SST framework. The first stage of this project creates two elements: a kernel-scheduler SST element that accepts work from SST CPU models and schedules it onto an SM-collection element, which performs cycle-by-cycle timing using SST's memHierarchy to model a flexible memory system.


Curvature Based Analysis to Identify and Categorize Trajectory Segments

Schrum Jr., Paul T.; Laros, James H.; Newton, Benjamin D.

Since the attacks carried out against the United States on September 11, 2001, which involved the commandeering of commercial aircraft, interest has increased in performing trajectory analysis of vehicle types not constrained by roadways or railways, i.e., aircraft and watercraft. Anomalous trajectories need to be automatically identified along with other trajectories of interest to flag them for further investigation. There is also interest in analyzing trajectories without a focus on anomaly detection. Various approaches to analyzing these trajectories have been undertaken with useful results to date. In this research, we seek to augment trajectory analysis by carrying out analysis of the trajectory curvature along with other parameters, including distance and total deflection (change in direction). At each point triplet in the ordered sequence of points, these parameters are computed. Adjacent point triplets with similar values are grouped together to form a higher level of semantic categorization. These categorizations are then analyzed to form a yet higher level of categorization which has more specific semantic meaning. This top level of categorization is then summarized for all trajectories under study, allowing for fast identification of trajectories with various semantic characteristics.
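
As an illustration of the per-triplet computation (a standard discrete-curvature formula; the paper's exact quantities may differ), the curvature at each point triplet can be taken from the circumscribed circle of the three points, and the deflection from the change in heading:

```python
import math

def triplet_curvature_and_deflection(p0, p1, p2):
    """Curvature and deflection for one (x, y) point triplet.

    Curvature uses Menger's formula k = 4 * area / (|a| * |b| * |c|), i.e. the
    reciprocal of the circumscribed circle's radius; deflection is the signed
    heading change between the incoming and outgoing segments."""
    a, b, c = math.dist(p0, p1), math.dist(p1, p2), math.dist(p0, p2)
    # Cross product of the two segments equals twice the signed triangle area.
    cross = (p1[0] - p0[0]) * (p2[1] - p1[1]) - (p1[1] - p0[1]) * (p2[0] - p1[0])
    curvature = 2.0 * abs(cross) / (a * b * c) if a * b * c > 0 else 0.0
    heading_in = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    heading_out = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    deflection = math.atan2(math.sin(heading_out - heading_in),
                            math.cos(heading_out - heading_in))
    return curvature, deflection
```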


An optimization-based framework to define the probabilistic design space of pharmaceutical processes with model uncertainty

Processes

Laky, Daniel; Xu, Shu; Rodriguez, Jose S.; Vaidyaraman, Shankar; Munoz, Salvador G.; Laird, Carl D.

To increase manufacturing flexibility and system understanding in pharmaceutical development, the FDA launched the quality by design (QbD) initiative. Within QbD, the design space is the multidimensional region (of the input variables and process parameters) where product quality is assured. Given the high cost of extensive experimentation, there is a need for computational methods to estimate the probabilistic design space that considers interactions between critical process parameters and critical quality attributes, as well as model uncertainty. In this paper we propose two algorithms that extend the flexibility test and flexibility index formulations to replace simulation-based analysis and identify the probabilistic design space more efficiently. The effectiveness and computational efficiency of these approaches is shown on a small example and an industrial case study.
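
For reference, the classical flexibility test and flexibility index formulations that the paper's algorithms extend (standard forms from the flexibility analysis literature; notation is ours, with d the design, z the recourse variables, \theta the uncertain parameters, and f_j \le 0 the quality constraints):

\[
\chi(d) \;=\; \max_{\theta \in T}\; \min_{z}\; \max_{j \in J} f_j(d, z, \theta) \;\le\; 0,
\qquad
F(d) \;=\; \max\Big\{\, \delta \ge 0 \;:\; \max_{\theta \in T(\delta)}\; \min_{z}\; \max_{j \in J} f_j(d, z, \theta) \le 0 \Big\},
\]

where T(\delta) = \{\theta : \theta^N - \delta\,\Delta\theta^- \le \theta \le \theta^N + \delta\,\Delta\theta^+\}; the test asks whether the design is feasible for every \theta in T, and the index measures the largest such parameter box.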


ASCR Workshop on In Situ Data Management

Peterka, Tom; Bard, Deborah; Bennett, Janine C.; Bethel, E.W.; Oldfield, Ron A.; Pouchard, Line; Sweeney, Christine; Wolf, Matthew

In January 2019, the U.S. Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions for in situ data management (ISDM). The workshop defined ISDM as the practices, capabilities, and procedures to control the organization of data and enable the coordination and communication among heterogeneous tasks, executing simultaneously in a high-performance computing system, cooperating toward a common objective. The workshop revealed two primary, interdependent motivations for processing and managing data in situ. The first motivation is that the in situ methodology enables scientific discovery from a broad range of data sources over a wide scale of computing platforms: leadership-class systems, clusters, clouds, workstations, and embedded devices at the edge. The successful development of ISDM capabilities will benefit real-time decision-making, design optimization, and data-driven scientific discovery. The second motivation is the need to decrease data volumes. ISDM can make critical contributions to managing large data volumes from computations and experiments to minimize data movement, save storage space, and boost resource efficiency, often while simultaneously increasing scientific precision.


Description and evaluation of the Community Ice Sheet Model (CISM) v2.1

Geoscientific Model Development

Lipscomb, William H.; Price, Stephen F.; Hoffman, Matthew J.; Leguy, Gunter R.; Bennett, Andrew R.; Bradley, Sarah L.; Evans, Katherine J.; Fyke, Jeremy G.; Kennedy, Joseph H.; Perego, Mauro P.; Ranken, Douglas M.; Sacks, William J.; Salinger, Andrew G.; Vargo, Lauren J.; Worley, Patrick H.

We describe and evaluate version 2.1 of the Community Ice Sheet Model (CISM). CISM is a parallel, 3-D thermomechanical model, written mainly in Fortran, that solves equations for the momentum balance and the thickness and temperature evolution of ice sheets. CISM's velocity solver incorporates a hierarchy of Stokes flow approximations, including shallow-shelf, depth-integrated higher order, and 3-D higher order. CISM also includes a suite of test cases, links to third-party solver libraries, and parameterizations of physical processes such as basal sliding, iceberg calving, and sub-ice-shelf melting. The model has been verified for standard test problems, including the Ice Sheet Model Intercomparison Project for Higher-Order Models (ISMIP-HOM) experiments, and has participated in the initMIP-Greenland initialization experiment. In multimillennial simulations with modern climate forcing on a 4 km grid, CISM reaches a steady state that is broadly consistent with observed flow patterns of the Greenland ice sheet. CISM has been integrated into version 2.0 of the Community Earth System Model, where it is being used for Greenland simulations under past, present, and future climates. The code is open-source with extensive documentation and remains under active development.


VideoSwarm: Analyzing video ensembles

IS&T International Symposium on Electronic Imaging Science and Technology

Martin, Shawn; Sielicki, Milosz A.; Gittinger, Jaxon M.; Letter, Matthew L.; Hunt, Warren L.; Crossno, Patricia J.

We present VideoSwarm, a system for visualizing video ensembles generated by numerical simulations. VideoSwarm is a web application in which linked views of the ensemble each represent the data using a different level of abstraction. VideoSwarm uses multidimensional scaling to reveal relationships between a set of simulations relative to a single moment in time, and to show the evolution of video similarities over a span of time. VideoSwarm is a plug-in for Slycat, a web-based visualization framework that provides a web server, database, and Python infrastructure. The Slycat framework provides support for managing multiple users, maintains access control, and requires only a Slycat-supported commodity browser (such as Firefox, Chrome, or Safari).
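
A minimal sketch of the multidimensional-scaling projection that VideoSwarm relies on, using scikit-learn on a synthetic frame-distance matrix; this is illustrative only, not Slycat or VideoSwarm code:

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_videos = 12
# Hypothetical per-video feature vectors for one moment in time (e.g., one frame
# per simulation video); in practice these come from the ensemble's video frames.
frames = rng.normal(size=(n_videos, 256))
dissimilarity = np.linalg.norm(frames[:, None, :] - frames[None, :, :], axis=-1)

# Embed the videos in 2-D so that similar simulations land near each other.
embedding = MDS(n_components=2, dissimilarity="precomputed",
                random_state=0).fit_transform(dissimilarity)
print(embedding.shape)  # (12, 2): one 2-D point per video at this time step
```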


Understanding the Machine Learning Needs of ECP Applications

Ellis, John E.; Rajamanickam, Sivasankaran R.

To support the codesign needs of ECP applications on current and future hardware in the area of machine learning, the ExaLearn team at Sandia studied the machine learning use cases in three different ECP applications. This report summarizes the needs of those three applications. The Sandia ExaLearn team will develop a proxy application representative of ECP application needs, specifically the ExaSky and EXAALT ECP projects. The proxy application will allow us to demonstrate performance-portable kernels within machine learning codes. Furthermore, the current training scalability of machine learning networks in these applications is negatively affected by large batch sizes: training throughput increases as batch size increases, but network accuracy and generalization worsen. The proxy application will contain hybrid model- and data-parallelism to improve training efficiency while maintaining network accuracy. The proxy application will also target optimizing 3D convolutional layers, specific to scientific machine learning, which have not been as thoroughly explored by industry.


ECP Milestone Memo WBS 2.3.4.13 ECP/VTK-m FY19Q1 [MS-19/01-03] ZFP / Release / Clip STDA05-17

Moreland, Kenneth D.

The STDA05-17 milestone comprises three deliverables. (1) VTK-m Release 2: We will provide a release of VTK-m software and associated documentation. The source code repository will be tagged at a stable state, and, at a minimum, tarball captures of the source code will be made available from the web site. A version of the VTK-m User's Guide documenting this release will also be made available. (2) Productionize ZFP compression: The "ZFP: Compressed Floating-Point Arrays" project (WBS 1.3.4.13) is creating an implementation of ZFP compression in VTK-m. Their implementation will be focused on operating in CUDA. The VTK-m project will assist by generalizing the implementation to other devices (such as multi-core CPUs). We will also assist in productionizing the code such that it can be used by external projects and products. (3) Clip: Clip operations intersect meshes with implicit functions. Clipping is the foundation of spatial subsetting algorithms, such as "box," and of data-based subsetting, such as "isovolume." The algorithm requires considering thousands of possible cases and is thus quite difficult to implement. This milestone will implement clipping sufficient for VisIt's and ParaView's needs.


ECP Milestone Memo WBS 2.3.4.13 ECP/VTK-m FY18Q4 [MS-18/09-10] Dynamic Types / Rendering Topologies STDA05-16

Moreland, Kenneth D.

The STDA05-16 milestone comprises three distinct deliverables. (1) OpenMP: VTK-m currently supports three types of devices: serial CPU, TBB, and CUDA. To run algorithms on multicore CPU-type devices (such as Xeon and Xeon Phi), TBB is required. However, there are known issues with integrating a software product using TBB with another one using OpenMP. Therefore, we will add an OpenMP device to the VTK-m software. When engaged, this device will run parallel algorithms using OpenMP directives, which will mesh more nicely with other code also using OpenMP. (2) Rendering topological entities: VTK-m currently supports surface rendering by tessellating data structures and rendering the resulting triangles. We will extend the current functionality to include face, edge, and point rendering. (3) Better dynamic types implementation: For the best efficiency across all platforms, VTK-m algorithms use static typing with C++ templates. However, many libraries like VTK, ParaView, and VisIt use dynamic types with virtual functions because data types often cannot be determined at compile time. We have an interface in VTK-m to merge these two typing mechanisms by generating all possible combinations of static types when faced with a dynamic type. Although this mechanism works, it generates very large executables and takes a very long time to compile. As we move forward, it is clear that these problems will get worse and become infeasible at exascale. We will rectify the problem by introducing some level of virtual methods, which require only a single code path, within VTK-m algorithms. This first milestone produces a design document proposing an approach to the new system.


Recent advancements in multilevel-multifidelity techniques for forward UQ in the DARPA SEQUOIA project

AIAA Scitech 2019 Forum

Geraci, Gianluca G.; Eldred, Michael S.; Gorodetsky, Alex A.; Jakeman, John D.

In the context of the DARPA-funded project SEQUOIA, we are interested in the design under uncertainty of a jet engine nozzle subject to the performance requirements of a reconnaissance mission for a small unmanned military aircraft. This design task involves complex and expensive aero-thermo-structural computational analyses, where it is of paramount importance to also include the effect of the uncertain variables to obtain reliable predictions of the device's performance. In this work we focus on the forward propagation analysis, which is a key part of the design under uncertainty workflow. This task cannot be tackled directly by means of single-fidelity approaches due to the prohibitive computational cost associated with each realization. We report here a summary of our latest advancements regarding several multilevel and multifidelity strategies designed to alleviate these challenges. The overall goal of these techniques is to reduce the computational cost of analyzing a high-fidelity model by resorting to less accurate, but less computationally demanding, lower-fidelity models. The features of these multifidelity UQ approaches are initially illustrated and demonstrated on several model problems and afterward on the aero-thermo-structural analysis of the jet engine nozzle.
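
As a reminder of the basic mechanism behind such estimators (a generic two-fidelity control-variate form, not the specific estimators developed in the project):

\[
\hat{Q}^{\mathrm{MF}} \;=\; \hat{Q}^{\mathrm{HF}}_{N} \;+\; \alpha\left( \hat{Q}^{\mathrm{LF}}_{M} - \hat{Q}^{\mathrm{LF}}_{N} \right), \qquad M \gg N,
\]

where \hat{Q}^{\mathrm{HF}}_{N} is the sample mean of the high-fidelity model over N expensive realizations, the low-fidelity model is evaluated both on those N samples and on a much larger set of M samples, and \alpha is chosen from the estimated correlation between the models to minimize the variance of \hat{Q}^{\mathrm{MF}}.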


Communication-efficient property preservation in tracer transport

SIAM Journal on Scientific Computing

Bradley, Andrew M.; Bosler, Peter A.; Guba, Oksana G.; Taylor, Mark A.; Barnett, Gregory A.

Atmospheric tracer transport is a computationally demanding component of the atmospheric dynamical core of weather and climate simulations. Simulations typically have tens to hundreds of tracers. A tracer field is required to preserve several properties, including mass, shape, and tracer consistency. To improve computational efficiency, it is common to apply different spatial and temporal discretizations to the tracer transport equations than to the dynamical equations. Using different discretizations increases the difficulty of preserving properties. This paper provides a unified framework to analyze the property preservation problem and classes of algorithms to solve it. We examine the primary problem and a safety problem; describe three classes of algorithms to solve these; introduce new algorithms in two of these classes; make connections among the algorithms; analyze each algorithm in terms of correctness, bound on its solution magnitude, and its communication efficiency; and study numerical results. A new algorithm, QLT, has the smallest communication volume, and in an important case it redistributes mass approximately locally. These algorithms are only very loosely coupled to the underlying discretizations of the dynamical and tracer transport equations and thus are broadly and efficiently applicable. In addition, they may be applied to remap problems in applications other than tracer transport.


High performance erasure coding for very large stripe sizes

Simulation Series

Haddock, Walker; Bangalore, Purushotham V.; Curry, Matthew L.; Skjellum, Anthony

Exascale computing demands high-bandwidth and low-latency I/O on the computing edge. Object storage systems can provide higher bandwidth and lower latencies than tape archives. File transfer nodes present a single point of mediation through which data moving between these storage systems must pass. By increasing the performance of erasure coding, stripes can be subdivided into large numbers of shards. This paper's contribution is a prototype nearline disk object storage system based on Ceph. We show that using general-purpose graphics processing units (GPGPUs) for erasure coding on file transfer nodes is effective when using a large number of shards. We describe an architecture for nearline disk archive storage for use with high performance computing (HPC) and demonstrate the performance with benchmarking results. We compare the benchmark performance of our design with the Intel® Storage Acceleration Library (ISA-L) CPU-based erasure coding libraries using the native Ceph erasure coding feature.


Finepoints: Partitioned multithreaded MPI communication

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Grant, Ryan E.; Dosanjh, Matthew D.; Levenhagen, Michael J.; Brightwell, Ronald B.; Skjellum, Anthony

The MPI multithreading model has been historically difficult to optimize; the interface that it provides for threads was designed as a process-level interface. This model has led to implementations that treat function calls as critical regions and protect them with locks to avoid race conditions. We hypothesize that an interface designed specifically for threads can provide superior performance to current approaches and even outperform single-threaded MPI. In this paper, we describe a design for partitioned communication in MPI that we call finepoints. First, we assess the existing communication models for MPI two-sided communication and then introduce finepoints as a hybrid of MPI models that has the best features of each existing MPI communication model. In addition, "partitioned communication" created with finepoints leverages new network hardware features that cannot be exploited with current MPI point-to-point semantics, making this new approach both innovative and useful now and in the future. To demonstrate the validity of our hypothesis, we implement a finepoints library and show improvements against a state-of-the-art multithreaded optimized Open MPI implementation on a Cray XC40 with an Aries network. Our experiments demonstrate up to a 12× reduction in wait time for completion of send operations. This new model is shown working on a nuclear reactor physics neutron-transport proxy application, providing up to 26.1% improvement in communication time and up to 4.8% improvement in runtime over the best performing MPI communication mode, single-threaded MPI.


Robust uncertainty quantification using response surface approximations of discontinuous functions

International Journal for Uncertainty Quantification

Wildey, Timothy M.; Gorodetsky, A.A.; Belme, A.C.; Shadid, John N.

This paper considers response surface approximations for discontinuous quantities of interest. Our objective is not to adaptively characterize the interface defining the discontinuity. Instead, we utilize an epistemic description of the uncertainty in the location of a discontinuity to produce robust bounds on sample-based estimates of probabilistic quantities of interest. We demonstrate that two common machine learning strategies for classification, one based on nearest neighbors (Voronoi cells) and one based on support vector machines, provide reasonable descriptions of the region where the discontinuity may reside. In higher dimensional spaces, we demonstrate that support vector machines are more accurate for discontinuities defined by smooth interfaces. We also show how gradient information, often available via adjoint-based approaches, can be used to define indicators to effectively detect a discontinuity and to decompose the samples into clusters using an unsupervised learning technique. Numerical results demonstrate the epistemic bounds on probabilistic quantities of interest for simplistic models and for a compressible fluid model with a shock-induced discontinuity.
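
A small sketch of the classification step, using scikit-learn's support vector classifier to label which side of a synthetic discontinuity each sample falls on; it illustrates the general strategy rather than the paper's implementation:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(400, 2))  # samples in the uncertain-parameter space
# Synthetic discontinuity: the response jumps across a smooth curved interface.
labels = (x[:, 0] + 0.5 * np.sin(np.pi * x[:, 1]) > 0).astype(int)

clf = SVC(kernel="rbf", C=10.0).fit(x, labels)  # separate the two smooth regions

# New samples are assigned to a region before evaluating the response surface;
# samples near the decision boundary delimit the uncertain "interface" zone.
x_new = rng.uniform(-1.0, 1.0, size=(5, 2))
print(clf.predict(x_new))
```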


Exploration of multifidelity approaches for uncertainty quantification in network applications

Proceedings of the 3rd International Conference on Uncertainty Quantification in Computational Sciences and Engineering, UNCECOMP 2019

Geraci, Gianluca G.; Swiler, Laura P.; Crussell, Jonathan C.; Debusschere, Bert D.

Communication networks have evolved to a level of sophistication that requires computer models and numerical simulations to understand and predict their behavior. A network simulator is software that enables the network designer to model several components of a computer network, such as nodes, routers, switches, and links, and events such as data transmissions and packet errors, in order to obtain device- and network-level metrics. Network simulations, like many other numerical approximations that model complex systems, are subject to the specification of parameters and operative conditions of the system. Very often the full characterization of the system and its inputs is not possible, therefore Uncertainty Quantification (UQ) strategies need to be deployed to evaluate the statistics of its response and behavior. UQ techniques, despite the advancements of the last two decades, still suffer in the presence of a large number of uncertain variables and when the regularity of the system's response cannot be guaranteed. In this context, multifidelity approaches have recently gained popularity in the UQ community due to their flexibility and robustness with respect to these challenges. The main idea behind these techniques is to extract information from a limited number of high-fidelity model realizations and complement them with a much larger set of lower-fidelity evaluations. The final result is an estimator with a much lower variance, i.e., a more accurate and reliable estimator. In this contribution we investigate the possibility of deploying multifidelity UQ strategies for computer network analysis. Two numerical configurations are studied, based on a simplified network with one client and one server. Preliminary results for these tests suggest that multifidelity sampling techniques can be effective tools for UQ in network applications.


Methods of sensitivity analysis in geologic disposal safety assessment (GDSA) framework

International High-Level Radioactive Waste Management 2019, IHLRWM 2019

Stein, Emily S.; Swiler, Laura P.; Sevougian, Stephen D.

Probabilistic simulations of the post-closure performance of a generic deep geologic repository for commercial spent nuclear fuel in shale host rock provide a test case for comparing sensitivity analysis methods available in Geologic Disposal Safety Assessment (GDSA) Framework, the U.S. Department of Energy's state-of-the-art toolkit for repository performance assessment. Simulations assume a thick low-permeability shale with aquifers (potential paths to the biosphere) above and below the host rock. Multi-physics simulations on the 7-million-cell grid are run in a high-performance computing environment with PFLOTRAN. Epistemic uncertain inputs include properties of the engineered and natural systems. The output variables of interest, maximum I-129 concentrations (independent of time) at observation points in the aquifers, vary over several orders of magnitude. Variance-based global sensitivity analyses (i.e., calculations of sensitivity indices) conducted with Dakota use polynomial chaos expansion (PCE) and Gaussian process (GP) surrogate models. Results of analyses conducted with raw output concentrations and with log-transformed output concentrations are compared. Using log-transformed concentrations results in larger sensitivity indices for more influential input variables, smaller sensitivity indices for less influential input variables, and more consistent values for sensitivity indices between methods (PCE and GP) and between analyses repeated with samples of different sizes.
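
For reference, the variance-based indices computed from the PCE and GP surrogates are the standard main-effect and total-effect Sobol' indices (generic definitions, not specific to GDSA Framework):

\[
S_i = \frac{\mathrm{Var}_{x_i}\!\big(\mathbb{E}[\,Y \mid x_i\,]\big)}{\mathrm{Var}(Y)},
\qquad
S_i^{T} = 1 - \frac{\mathrm{Var}_{x_{\sim i}}\!\big(\mathbb{E}[\,Y \mid x_{\sim i}\,]\big)}{\mathrm{Var}(Y)},
\]

where Y is the output of interest (here, maximum I-129 concentration, possibly log-transformed) and x_{\sim i} denotes all inputs except x_i.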


Gate-defined quantum dots in Ge/SiGe quantum wells as a platform for spin qubits

ECS Transactions

Hardy, Will H.; Su, Y.H.; Chuang, Y.; Maurer, Leon M.; Brickson, Mitchell I.; Baczewski, Andrew D.; Li, J.Y.; Lu, Tzu-Ming L.; Luhman, Dwight R.

In the field of semiconductor quantum dot spin qubits, there is growing interest in leveraging the unique properties of hole-carrier systems and their intrinsically strong spin-orbit coupling to engineer novel qubits. Recent advances in semiconductor heterostructure growth have made available high quality, undoped Ge/SiGe quantum wells, consisting of a pure strained Ge layer flanked by Ge-rich SiGe layers above and below. These quantum wells feature heavy hole carriers and a cubic Rashba-type spin-orbit interaction. Here, we describe progress toward realizing spin qubits in this platform, including development of multi-metal-layer gated device architectures, device tuning protocols, and charge-sensing capabilities. Iterative improvement of a three-layer metal gate architecture has significantly enhanced device performance over that achieved using an earlier single-layer gate design. We discuss ongoing, simulation-informed work to fine-tune the device geometry, as well as efforts toward a single-spin qubit demonstration.


Geometric uncertainty quantification and robust design for 2D satellite shielding

International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering, M and C 2019

Pautz, Shawn D.; Adams, Brian M.; Bruss, Donald E.

The design of satellites usually includes the objective of minimizing mass due to high launch costs, which is challenging due to the need to protect sensitive electronics from the space radiation environment by means of radiation shielding. This is further complicated by the need to account for uncertainties, e.g. in manufacturing. There is growing interest in automated design optimization and uncertainty quantification (UQ) techniques to help achieve that objective. Traditional optimization and UQ approaches that rely exclusively on response functions (e.g. dose calculations) can be quite expensive when applied to transport problems. Previously we showed how adjoint-based transport sensitivities used in conjunction with gradient-based optimization algorithms can be quite effective in designing mass-efficient electron and/or proton shields in one- or two-dimensional Cartesian geometries. In this paper we extend that work to UQ and to robust design (i.e. optimization that considers uncertainties) in 2D. This consists primarily of using the sensitivities to geometric changes, originally derived for optimization, within relevant algorithms for UQ and robust design. We perform UQ analyses on previous optimized designs given some assumed manufacturing uncertainties. We also conduct a new optimization exercise that accounts for the same uncertainties. Our results show much improved computational efficiencies over previous approaches.


SBF-BO-2CoGP: A sequential bi-fidelity constrained Bayesian optimization for design applications

Proceedings of the ASME Design Engineering Technical Conference

Laros, James H.; Wildey, Timothy M.; Mccann, Scott

Bayesian optimization is an effective surrogate-based optimization method that has been widely used for simulation-based applications. However, the traditional Bayesian optimization (BO) method is only applicable to single-fidelity applications, whereas multiple levels of fidelity exist in reality. In this work, we propose a bi-fidelity known/unknown constrained Bayesian optimization method for design applications. The proposed framework, called sBF-BO-2CoGP, is built on a two-level CoKriging method to predict the objective function. An external binary classifier, which is also another CoKriging model, is used to distinguish between feasible and infeasible regions. The sBF-BO-2CoGP method is demonstrated using a numerical example and a flip-chip application for design optimization to minimize the warpage deformation under thermal loading conditions.


Checkpointing Strategies for Shared High-Performance Computing Platforms

International Journal of Networking and Computing

Ferreira, Kurt B.

Input/output (I/O) from various sources often contends for scarcely available bandwidth. For example, checkpoint/restart (CR) protocols can help to ensure application progress in failure-prone environments. However, CR I/O alongside an application's normal, requisite I/O can increase I/O contention and might negatively impact performance. In this work, we consider different aspects (system-level scheduling policies and hardware) that optimize the overall performance of concurrently executing CR-based applications that share I/O resources. We provide a theoretical model and derive a set of necessary constraints to minimize the global waste on a given platform. Our results demonstrate that Young/Daly's optimal checkpoint interval, despite providing a sensible metric for a single, undisturbed application, is not sufficient to optimally address resource contention at scale. We show that by combining optimal checkpointing periods with contention-aware system-level I/O scheduling strategies, we can significantly improve overall application performance and maximize the platform throughput. Finally, we evaluate how specialized hardware, namely burst buffers, may help to mitigate the I/O contention problem. Altogether, these results provide critical analysis and direct guidance on how to design efficient, CR-ready, large-scale platforms without a large investment in the I/O subsystem.
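
For context, the Young/Daly interval referred to above is the classical single-application optimum (Young's first-order form; Daly's refinement adds higher-order corrections):

\[
\tau_{\mathrm{opt}} \;\approx\; \sqrt{2\, C\, \mu},
\]

where C is the time to write one checkpoint and \mu is the platform's mean time between failures. The paper's argument is that this per-application optimum is no longer sufficient once several applications contend for shared I/O bandwidth.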


An asymptotically compatible meshfree quadrature rule for nonlocal problems with applications to peridynamics

Computer Methods in Applied Mechanics and Engineering

Trask, Nathaniel A.; You, Huaiqian; Yu, Yue; Parks, Michael L.

We present a meshfree quadrature rule for compactly supported nonlocal integro-differential equations (IDEs) with radial kernels. We apply this rule to develop a meshfree discretization of a peridynamic solid mechanics model that requires no background mesh. Existing discretizations of peridynamic models have been shown to exhibit a lack of asymptotic compatibility to the corresponding linearly elastic local solution. By posing the quadrature rule as an equality constrained least squares problem, we obtain asymptotically compatible convergence by introducing polynomial reproduction constraints. Our approach naturally handles traction-free conditions, surface effects, and damage modeling for both static and dynamic problems. We demonstrate high-order convergence to the local theory by comparing to manufactured solutions and to cases with crack singularities for which an analytic solution is available. Finally, we verify the applicability of the approach to realistic problems by reproducing high-velocity impact results from the Kalthoff–Winkler experiments.
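
The equality-constrained least-squares construction can be sketched as follows (schematic form based on our reading of the abstract; notation is ours):

\[
\min_{\{\omega_j\}} \;\sum_{j} \omega_j^{2}
\quad \text{subject to} \quad
\sum_{j} \omega_j\, p(\mathbf{x}_j) \;=\; \int_{B_\delta(\mathbf{x})} p(\mathbf{y})\, d\mathbf{y}
\quad \text{for all } p \in \mathcal{P}_m,
\]

i.e., the quadrature weights \omega_j attached to the neighbor points \mathbf{x}_j are the minimum-norm weights that integrate polynomials up to degree m exactly over the nonlocal neighborhood; this polynomial reproduction is what drives asymptotic compatibility with the local limit.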


An efficient, globally convergent method for optimization under uncertainty using adaptive model reduction and sparse grids

SIAM-ASA Journal on Uncertainty Quantification

Zahr, Matthew J.; Carlberg, Kevin T.; Kouri, Drew P.

This work introduces a new method to efficiently solve optimization problems constrained by partial differential equations (PDEs) with uncertain coefficients. The method leverages two sources of inexactness that trade accuracy for speed: (1) stochastic collocation based on dimension-adaptive sparse grids (SGs), which approximates the stochastic objective function with a limited number of quadrature nodes, and (2) projection-based reduced-order models (ROMs), which generate efficient approximations to PDE solutions. These two sources of inexactness lead to inexact objective function and gradient evaluations, which are managed by a trust-region method that guarantees global convergence by adaptively refining the SG and ROM until a proposed error indicator drops below a tolerance specified by trust-region convergence theory. A key feature of the proposed method is that the error indicator, which accounts for errors incurred by both the SG and ROM, need only be an asymptotic error bound, i.e., a bound that holds up to an arbitrary constant that need not be computed. This enables the method to be applicable to a wide range of problems, including those where sharp, computable error bounds are not available; this distinguishes the proposed method from previous works. Numerical experiments performed on a model problem from optimal flow control under uncertainty verify global convergence of the method and demonstrate the method's ability to outperform previously proposed alternatives.


Quantifying hydraulic and water quality uncertainty to inform sampling of drinking water distribution systems

Journal of Water Resources Planning and Management

Hart, David B.; Rodriguez, J.S.; Burkhardt, Jonathan; Borchers, Brian; Laird, Carl D.; Murray, Regan; Klise, Katherine A.; Haxton, Terranna

Sampling of drinking water distribution systems is performed to ensure good water quality and protect public health. Sampling also satisfies regulatory requirements and is done to respond to customer complaints or emergency situations. Water distribution system modeling techniques can be used to plan and inform sampling strategies. However, a high degree of accuracy and confidence in the hydraulic and water quality models is required to support real-time response. One source of error in these models is related to uncertainty in model input parameters. Effective characterization of these uncertainties and their effect on contaminant transport during a contamination incident is critical for providing confidence estimates in model-based design and evaluation of different sampling strategies. In this paper, the effects of uncertainty in customer demand, isolation valve status, bulk reaction rate coefficient, contaminant injection location, start time, duration, and rate on the size and location of the contaminant plume are quantified for two example water distribution systems. Results show that the most important parameter was the injection location. The size of the plume was also affected by the reaction rate coefficient, injection rate, and injection duration, whereas the exact location of the plume was additionally affected by the isolation valve status. Uncertainty quantification provides a more complete picture of how contaminants move within a water distribution system and more information when using modeling results to select sampling locations.


Cactus Environment Machine: Shared Environment Call-by-Need

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Stelle, George; Stefanovic, Darko; Olivier, Stephen L.; Forrest, Stephanie

Existing machines for lazy evaluation use a flat representation of environments, storing the terms associated with free variables in an array. Combined with a heap, this structure supports the shared intermediate results required by lazy evaluation. We propose and describe an alternative approach that uses a shared environment to minimize the overhead of delayed computations. We show how a shared environment can act as both an environment and a mechanism for sharing results. To formalize this approach, we introduce a calculus that makes the shared environment explicit, as well as a machine to implement the calculus, the Cactus Environment Machine. A simple compiler implements the machine and is used to run experiments for assessing performance. The results show reasonable performance and suggest that incorporating this approach into real-world compilers could yield performance benefits in some scenarios.


Second-Order Multiplier Updates to Accelerate ADMM Methods in Optimization Under Uncertainty

Computer Aided Chemical Engineering

Rodriguez, Jose S.; Hackebeil, Gabriel; Siirola, John D.; Zavala, Victor M.; Laird, Carl D.

There is a need for efficient optimization strategies to solve large-scale, nonlinear optimization problems. Many problem classes, including design under uncertainty, are inherently structured and can be accelerated with decomposition approaches. This paper describes a second-order multiplier update for the alternating direction method of multipliers (ADMM) to solve nonlinear stochastic programming problems. We exploit connections between ADMM and the Schur-complement decomposition to derive an accelerated version of ADMM. Specifically, we study the effectiveness of performing a Newton-Raphson algorithm to compute multiplier estimates for the method of multipliers (MM). We interpret ADMM as a decomposable version of MM and propose modifications to the multiplier update of the standard ADMM scheme based on improvements observed in MM. The modifications to the ADMM algorithm seek to accelerate solutions of optimization problems for design under uncertainty, and the numerical effectiveness of the approaches is demonstrated on a set of ten stochastic programming problems. Practical strategies for improving computational performance are discussed along with comparisons between the algorithms. We observe that the second-order update achieves convergence in fewer unconstrained minimizations for MM on general nonlinear problems. In the case of ADMM, the second-order update significantly reduces the number of subproblem solves for convex quadratic programs (QPs).
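
To make the distinction concrete in standard notation (ours, not the paper's): for min f(x) + g(z) subject to Ax + Bz = c with multipliers y and penalty \rho, the usual ADMM multiplier update is the first-order step on the left, while a second-order variant replaces it with a Newton-type step on the dual function d(y), whose curvature information is available from a Schur-complement decomposition:

\[
y^{k+1} = y^{k} + \rho\,\big(A x^{k+1} + B z^{k+1} - c\big)
\qquad \text{versus} \qquad
y^{k+1} = y^{k} - \big[\nabla^2 d(y^{k})\big]^{-1} \big(A x^{k+1} + B z^{k+1} - c\big),
\]

where the constraint residual A x^{k+1} + B z^{k+1} - c plays the role of the dual gradient.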


High fidelity surrogate modeling of fuel dissolution for probabilistic assessment of repository performance

International High-Level Radioactive Waste Management 2019, IHLRWM 2019

Mariner, Paul M.; Swiler, Laura P.; Seidl, Daniel T.; Debusschere, Bert J.; Vo, Jonathan; Frederick, Jennifer M.; Jerden, James L.

Two surrogate models are under development to rapidly emulate the effects of the Fuel Matrix Degradation (FMD) model in GDSA Framework. One is a polynomial regression surrogate with linear and quadratic fits, and the other is a k-Nearest Neighbors regressor (kNNr) method that operates on a lookup table. Direct coupling of the FMD model to GDSA Framework is too computationally expensive. Preliminary results indicate these surrogate models will enable GDSA Framework to rapidly simulate spent fuel dissolution for each individual breached spent fuel waste package in a probabilistic repository simulation. This capability will allow uncertainties in spent fuel dissolution to be propagated and sensitivities in FMD inputs to be quantified and ranked against other inputs.
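
A minimal sketch of the lookup-table surrogate idea using scikit-learn's k-nearest-neighbors regressor; the table below is synthetic stand-in data, not the actual FMD lookup table or its inputs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
# Hypothetical lookup table: rows are FMD-style input conditions (scaled to [0, 1]),
# and the response column is the tabulated fuel dissolution rate.
table_inputs = rng.uniform(size=(5000, 4))
table_rates = np.exp(-table_inputs @ np.array([1.0, 2.0, 0.5, 3.0]))  # synthetic rates

knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(table_inputs, table_rates)

# In a probabilistic repository simulation, each breached waste package's local
# conditions are mapped to an interpolated dissolution rate with a cheap query.
query = rng.uniform(size=(3, 4))
print(knn.predict(query))
```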


Statistical models of dengue fever

Communications in Computer and Information Science

Link, Hamilton E.; Richter, Samuel N.; Leung, Vitus J.; Brost, Randolph B.; Phillips, Cynthia A.; Staid, Andrea S.

We use Bayesian data analysis to predict dengue fever outbreaks and quantify the link between outbreaks and meteorological precursors tied to the breeding conditions of vector mosquitos. We use Hamiltonian Monte Carlo sampling to estimate a seasonal Gaussian process modeling infection rate, and aperiodic basis coefficients for the rate of an “outbreak level” of infection beyond seasonal trends across two separate regions. We use this outbreak level to estimate an autoregressive moving average (ARMA) model from which we extrapolate a forecast. We show that the resulting model has useful forecasting power in the 6–8 week range. The forecasts are not significantly more accurate with the inclusion of meteorological covariates than with infection trends alone.
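
A compact illustration of the final forecasting step: fitting an ARMA model to a synthetic "outbreak level" series with statsmodels and extrapolating over the 6-8 week horizon; the seasonal Gaussian-process stage described in the abstract is omitted here.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
# Synthetic weekly "outbreak level" series, i.e., infection rate beyond seasonal trend.
outbreak_level = 0.05 * rng.normal(scale=0.5, size=156).cumsum()

# ARMA(2, 1) expressed as an ARIMA(p, d, q) model with no differencing (d = 0).
fit = ARIMA(outbreak_level, order=(2, 0, 1)).fit()
forecast = fit.forecast(steps=8)  # extrapolate eight weeks ahead
print(forecast)
```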


The Tularosa study: An experimental design and implementation to quantify the effectiveness of cyber deception

Proceedings of the Annual Hawaii International Conference on System Sciences

Ferguson-Walter, Kimberly J.; Shade, Temmie B.; Rogers, Andrew V.; Trumbo, Michael C.; Nauer, Kevin S.; Divis, Kristin; Jones, Aaron P.; Combs, Angela C.; Abbott, Robert G.

The Tularosa study was designed to understand how defensive deception, including both cyber and psychological deception, affects cyber attackers. Over 130 red teamers participated in a network penetration task over two days in which we controlled both the presence of deceptive defensive techniques and the explicit mention of them. To our knowledge, this represents the largest study of its kind ever conducted on a professional red team population. The study included a battery of questionnaires (e.g., experience, personality) and cognitive tasks (e.g., fluid intelligence, working memory), allowing for the characterization of a "typical" red teamer, as well as physiological measures (e.g., galvanic skin response, heart rate) to be correlated with the cyber events. This paper describes the design, implementation, data, and population characteristics, and begins to examine preliminary results.


MueLu User's Guide

Berger-Vergiat, Luc B.; Glusa, Christian A.; Hu, Jonathan J.; Siefert, Christopher S.; Tuminaro, Raymond S.; Mayr, Matthias; Prokopenko, Andrey; Wiesner, Tobias

This is the official user guide for the MUELU multigrid library in Trilinos version 12.13 (Dev). This guide provides an overview of MUELU, its capabilities, and instructions for new users who want to start using MUELU with a minimum of effort. Detailed information is given on how to drive MUELU through its XML interface. Links to more advanced use cases are given. This guide gives information on how to achieve good parallel performance, as well as how to introduce new algorithms. Finally, readers will find a comprehensive listing of available MUELU options. Any options not documented in this manual should be considered strictly experimental.


Progress in scramjet design optimization under uncertainty using simulations of the HIFiRE direct connect rig

AIAA Scitech 2019 Forum

Geraci, Gianluca G.; Menhorn, Friedrich; Huan, Xun; Safta, Cosmin S.; Marzouk, Youssef M.; Najm, H.N.; Eldred, Michael S.

We present an overview of optimization under uncertainty efforts under the DARPA Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) ScramjetUQ project. We introduce the mathematical frameworks and computational tools employed for performing this task. In particular, we provide details on the optimization and multilevel uncertainty quantification algorithms, which are available through the SNOWPAC and DAKOTA software packages. The overall workflow is first demonstrated on a simplified model design problem with non-reacting inviscid supersonic flows. Preliminary results and updates are then reported for an in-progress scramjet design optimization case using large-eddy simulations of supersonic reactive flows inside the HIFiRE Direct Connect Rig.


Making OpenMP ready for C++ executors

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Scogland, Thomas R.W.; Sunderland, Daniel S.; Olivier, Stephen L.; Hollman, David S.; Evans, Noah; De Supinski, Bronis R.

For at least the last 20 years, many have tried to create a general resource management system to support interoperability across various concurrent libraries. The previous strategies all suffered from additional toolchain requirements and/or use of a shared programming model that assumed it owned and controlled access to all resources available to the program. None of these techniques has achieved widespread adoption. The ubiquity of OpenMP, coupled with C++ developing a standard way to describe many different concurrent paradigms (C++23 executors), would allow OpenMP to assume the role of a general resource manager without requiring user code written directly in OpenMP. With a few added features, such as the ability to use otherwise idle threads to execute tasks and to specify a task "width", many interesting concurrent frameworks could be developed in native OpenMP and achieve high performance. Further, one could create concrete C++ OpenMP executors that enable support for general C++ executor based codes, which would allow Fortran, C, and C++ codes to use the same underlying concurrent framework when expressed as native OpenMP or using language-specific features. Effectively, OpenMP would become the de facto solution for a problem that has long plagued the HPC community.


Making bread: Biomimetic strategies for artificial intelligence now and in the future

Frontiers in Neuroscience

Krichmar, Jeffrey L.; Severa, William M.; Khan, Muhammad S.; Olds, James L.

The Artificial Intelligence (AI) revolution foretold during the 1960s is well underway in the second decade of the twenty-first century. Its period of phenomenal growth likely lies ahead. AI-operated machines and technologies will extend the reach of Homo sapiens far beyond the biological constraints imposed by evolution: outwards further into deep space, as well as inwards into the nano-world of DNA sequences and relevant medical applications. And yet, we believe, there are crucial lessons that biology can offer that will enable a prosperous future for AI. For machines in general, and for AIs especially, operating over extended periods or in extreme environments will require energy usage orders of magnitude more efficient than exists today. In many operational environments, energy sources will be constrained. An AI's design and function may depend upon the type of energy source, as well as its availability and accessibility. Any plans for AI devices operating in a challenging environment must begin with the question of how they are powered, where fuel is located, how energy is stored and made available to the machine, and how long the machine can operate on specific energy units. While one of the key advantages of AI use is to reduce the dimensionality of a complex problem, the fact remains that some energy is required for functionality. Hence, the materials and technologies that provide the needed energy represent a critical challenge toward future use scenarios of AI and should be integrated into their design. Here we look to the brain and other aspects of biology as inspiration for Biomimetic Research for Energy-efficient AI Designs (BREAD).


Physics–Dynamics Coupling with Element-Based High-Order Galerkin Methods: Quasi-Equal-Area Physics Grid

Monthly Weather Review

Herrington, Adam R.; Lauritzen, Peter H.; Taylor, Mark A.; Goldhaber, Steve; Eaton; Reed; Ullrich, Paul A.

Atmospheric modeling with element-based high-order Galerkin methods presents a unique challenge to the conventional physics–dynamics coupling paradigm, due to the highly irregular distribution of nodes within an element and the distinct numerical characteristics of the Galerkin method. The conventional coupling procedure is to evaluate the physical parameterizations (physics) on the dynamical core grid. Evaluating the physics at the nodal points exacerbates numerical noise from the Galerkin method, enabling and amplifying local extrema at element boundaries. Grid imprinting may be substantially reduced through the introduction of an entirely separate, approximately isotropic finite-volume grid for evaluating the physics forcing. Integration of the spectral basis over the control volumes provides an area-average state to the physics, which is more representative of the state in the vicinity of the nodal points rather than the nodal point itself and is more consistent with the notion of a “large-scale state” required by conventional physics packages. This study documents the implementation of a quasi-equal-area physics grid into NCAR’s Community Atmosphere Model Spectral Element and is shown to be effective at mitigating grid imprinting in the solution. The physics grid is also appropriate for coupling to other components within the Community Earth System Model, since the coupler requires component fluxes to be defined on a finite-volume grid, and one can be certain that the fluxes on the physics grid are, indeed, volume averaged.


Mediating Data Center Storage Diversity in HPC Applications with FAODEL

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Widener, Patrick W.; Ulmer, Craig D.; Levy, Scott L.; Kordenbrock, Todd H.; Templet, Gary J.

Composition of computational science applications into both ad hoc pipelines for analysis of collected or generated data and into well-defined and repeatable workflows is becoming increasingly popular. Meanwhile, dedicated high performance computing storage environments are rapidly becoming more diverse, with both significant amounts of non-volatile memory storage and mature parallel file systems available. At the same time, computational science codes are being coupled to data analysis tools which are not filesystem-oriented. In this paper, we describe how the FAODEL data management service can expose different available data storage options and mediate among them in both application- and FAODEL-directed ways. These capabilities allow applications to exploit their knowledge of the different types of data they may exchange during a workflow execution, and also provide FAODEL with mechanisms to proactively tune data storage behavior when appropriate. We describe the implementation of these capabilities in FAODEL and how they are used by applications, and present preliminary performance results demonstrating the potential benefits of our approach.

More Details

Code-verification techniques for hypersonic reacting flows in thermochemical nonequilibrium

AIAA Aviation 2019 Forum

Freno, Brian A.; Carnes, Brian C.; Weirs, Vincent G.

The study of hypersonic flows and their underlying aerothermochemical reactions is particularly important in the design and analysis of vehicles exiting and reentering Earth’s atmosphere. Computational physics codes can be employed to simulate these phenomena; however, code verification is necessary to certify their credibility. To date, few approaches have been presented for verifying codes that simulate hypersonic flows, especially flows reacting in thermochemical nonequilibrium. In this paper, we present our code-verification techniques for hypersonic reacting flows in thermochemical nonequilibrium, as well as their deployment in the Sandia Parallel Aerodynamics and Reentry Code (SPARC).
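
The abstract does not spell out the specific verification procedure; purely as a generic illustration of one common code-verification step, the sketch below computes the observed order of accuracy from discretization errors on two successively refined grids (the error values are hypothetical, not SPARC results).

```python
import math

def observed_order(err_coarse, err_fine, refinement_ratio=2.0):
    """Observed order of accuracy from errors on two grids differing by refinement_ratio."""
    return math.log(err_coarse / err_fine) / math.log(refinement_ratio)

# Hypothetical L2 errors against an exact (e.g., manufactured) solution on grids h and h/2.
e_h, e_h2 = 3.2e-3, 8.1e-4
print(f"observed order ~ {observed_order(e_h, e_h2):.2f}")  # roughly 2 for a second-order scheme
```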

More Details

DARMA-EMPIRE Integration and Performance Assessment – Interim Report

Lifflander, Jonathan; Bettencourt, Matthew T.; Slattengren, Nicole S.; Templet, Gary J.; Miller, Phil; Perrinel, Meriadeg; Rizzi, Francesco N.; Pebay, Philippe P.

We begin by presenting an overview of the general philosophy guiding the novel DARMA developments, followed by a brief reminder of the background of this project. Finally, we present the FY19 design requirements. As the Exascale era arrives, DARMA is uniquely positioned at the forefront of asynchronous many-task (AMT) research and development (R&D) to explore emerging programming-model paradigms for next-generation HPC applications at Sandia, across NNSA labs, and beyond. The DARMA project explores how to fundamentally shift the expression (PM) and execution (EM) of massively concurrent HPC scientific algorithms to be more asynchronous, resilient to executional aberrations in heterogeneous/unpredictable environments, and data-dependency conscious, thereby enabling an intelligent, dynamic, and self-aware runtime to guide execution.

More Details

Determination of ballistic limit of skin-stringer panels using nonlinear, strain-rate dependent peridynamics

AIAA Scitech 2019 Forum

Cuenca, Fernando; Weckner, Olaf; Silling, Stewart A.; Rassaian, Mostafa

Significant testing is required to design and certify primary aircraft structures subject to High Energy Dynamic Impact (HEDI) events; current work under the NASA Advanced Composites Consortium (ACC) HEDI Project seeks to determine the state of the art of dynamic fracture simulations for composite structures in these events. This paper discusses one of three Progressive Damage Analysis (PDA) methods selected for the second phase of the NASA ACC project: peridynamics, through its implementation in EMU. A brief discussion of peridynamic theory is provided, including the effects of nonlinearity and strain-rate dependence of the matrix, followed by a blind prediction and test-analysis correlation for ballistic impact testing performed on configured skin-stringer panels.
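
For reference, the bond-based peridynamic equation of motion in standard notation (not tied to the specific EMU material model used here) is

\rho(\mathbf{x}) \, \ddot{\mathbf{u}}(\mathbf{x},t) = \int_{\mathcal{H}_{\mathbf{x}}} \mathbf{f}\big(\mathbf{u}(\mathbf{x}',t) - \mathbf{u}(\mathbf{x},t), \, \mathbf{x}' - \mathbf{x}\big) \, dV_{\mathbf{x}'} + \mathbf{b}(\mathbf{x},t),

where \mathcal{H}_{\mathbf{x}} is the horizon of point \mathbf{x}, \mathbf{f} is the pairwise force function, and \mathbf{b} is the body force density; nonlinearity and strain-rate dependence enter through the constitutive choice of \mathbf{f}.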

More Details

Evaluation of chlorine booster station placement for water security

Computer Aided Chemical Engineering

Seth, Arpan; Hackebeil, Gabriel A.; Haxton, Terranna; Murray, Regan; Laird, Carl D.; Klise, Katherine A.

Drinking water utilities use booster stations to maintain chlorine residuals throughout water distribution systems. Booster stations could also be used as part of an emergency response plan to minimize health risks in the event of an unintentional or malicious contamination incident. The benefit of booster stations for emergency response depends on several factors, including the reaction between chlorine and an unknown contaminant species, the fate and transport of the contaminant in the water distribution system, and the time delay between detection and initiation of boosted levels of chlorine. This paper takes these aspects into account and proposes a mixed-integer linear program formulation for optimizing the placement of booster stations for emergency response. A case study is used to explore the ability of optimally placed booster stations to reduce the impact of contamination in water distribution systems.
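
The paper's formulation is not reproduced in the abstract; purely as a hypothetical sketch of the kind of placement MILP described (toy data, a simple assignment-style model, and a budget on the number of stations), one might write something like the following in Pyomo.

```python
import pyomo.environ as pyo

# Hypothetical data (not from the paper): candidate booster nodes, contamination
# scenarios, and impact[s, n] = expected impact of scenario s when the response
# is driven from a booster at node n.
nodes = ["J1", "J2", "J3"]
scenarios = ["S1", "S2"]
impact = {("S1", "J1"): 10.0, ("S1", "J2"): 4.0, ("S1", "J3"): 7.0,
          ("S2", "J1"): 6.0, ("S2", "J2"): 9.0, ("S2", "J3"): 3.0}
max_stations = 1

m = pyo.ConcreteModel()
m.place = pyo.Var(nodes, domain=pyo.Binary)               # booster placed at node n?
m.assign = pyo.Var(scenarios, nodes, domain=pyo.Binary)   # scenario s handled by node n?

# Minimize total (equivalently, average) impact across scenarios.
m.obj = pyo.Objective(expr=sum(impact[s, n] * m.assign[s, n]
                               for s in scenarios for n in nodes))

# Each scenario is assigned to exactly one placed booster; limit the station count.
m.cover = pyo.Constraint(scenarios, rule=lambda m, s:
                         sum(m.assign[s, n] for n in nodes) == 1)
m.link = pyo.Constraint(scenarios, nodes, rule=lambda m, s, n:
                        m.assign[s, n] <= m.place[n])
m.budget = pyo.Constraint(expr=sum(m.place[n] for n in nodes) <= max_stations)

# pyo.SolverFactory("cbc").solve(m)  # any MILP solver
```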

More Details

CAD Defeaturing Using Machine Learning

Proceedings of the 28th International Meshing Roundtable, IMR 2019

Owen, Steven J.; Shead, Timothy M.; Martin, Shawn

We describe new machine-learning-based methods to defeature CAD models for tetrahedral meshing. Using machine-learning predictions of mesh quality for geometric features of a CAD model prior to meshing, we can identify potential problem areas and improve meshing outcomes by presenting a prioritized list of suggested geometric operations to users. Our machine-learning models are trained using a combination of geometric and topological features from the CAD model and local quality metrics for ground truth. We demonstrate a proof-of-concept implementation of the resulting workflow using Sandia's Cubit Geometry and Meshing Toolkit.
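
The abstract does not enumerate the features or the model type; purely as a hypothetical illustration of the general pattern (train a classifier on per-feature descriptors, then rank features of a new model by predicted risk of poor mesh quality), a sketch might look like this.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in per-feature descriptors of a CAD model (e.g., curve length, adjacent-surface
# angle, local feature-size ratio, topology counts). Real training data would come from
# meshed models with measured local quality metrics as ground truth.
rng = np.random.default_rng(0)
X = rng.random((500, 4))                 # hypothetical geometric/topological descriptors
y = (X[:, 0] < 0.1).astype(int)          # hypothetical label: 1 = poor local mesh quality

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank candidate features of a new model by predicted risk of poor mesh quality,
# yielding a prioritized list of suggested defeaturing operations.
candidates = rng.random((10, 4))
risk = clf.predict_proba(candidates)[:, 1]
priority = np.argsort(risk)[::-1]
print(priority[:5])
```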

More Details

The upcoming storm: The implications of increasing core count on scalable system software

Advances in Parallel Computing

Dosanjh, Matthew D.; Grant, Ryan E.; Hjelm, Nathan; Levy, Scott L.; Schonbein, William W.

As clock speeds have stagnated, the number of cores in a node has been drastically increased to improve processor throughput. Most scalable system software was designed and developed for single-threaded environments. Multithreaded environments are becoming increasingly prominent as application developers optimize their codes to leverage the full performance of the processor; however, these environments are incompatible with a number of assumptions that have driven scalable system software development. This paper highlights a case study of this mismatch, focusing on MPI message matching. MPI message matching has been designed and optimized for traditional serial execution. The reduced determinism in the order of MPI calls can significantly reduce the performance of MPI message matching, potentially overtaking the time-per-iteration targets of many applications. Different proposed techniques attempt to address these issues and enable multithreaded MPI usage. These approaches highlight a number of tradeoffs that make adapting MPI message matching complex. This case study and its proposed solutions highlight a number of general concepts that need to be leveraged in the design of next-generation scalable system software.
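
As a highly simplified model of the serial matching semantics discussed above (illustrative Python, not MPI library source), posted receives are searched in order and the earliest entry whose source and tag match, wildcards included, wins; it is this ordering requirement that makes the structure awkward to parallelize across threads.

```python
from collections import deque

MPI_ANY_SOURCE = -1
MPI_ANY_TAG = -1

posted_receives = deque()  # in-order queue of (source, tag, buffer)

def match_incoming(src, tag):
    """Return the buffer of the first posted receive matching (src, tag), or None."""
    for entry in posted_receives:
        esrc, etag, buf = entry
        if esrc in (src, MPI_ANY_SOURCE) and etag in (tag, MPI_ANY_TAG):
            posted_receives.remove(entry)
            return buf
    return None  # no match: the message would go to the unexpected-message queue

posted_receives.append((MPI_ANY_SOURCE, 7, "buf0"))
posted_receives.append((3, MPI_ANY_TAG, "buf1"))
print(match_incoming(3, 7))  # "buf0": the earliest match wins even though both entries match
```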

More Details

Software for sparse tensor decomposition on emerging computing architectures

SIAM Journal on Scientific Computing

Phipps, Eric T.; Kolda, Tamara G.

In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized architecture-specific implementations. The key to a portable approach is to determine multiple levels of parallelism that can be mapped in different ways to different architectures, and we explain how to do this for the matricized tensor times Khatri-Rao product (MTTKRP), which is the key kernel in canonical polyadic tensor decomposition. Our implementation leverages the Kokkos framework, which enables a single code to achieve high performance across multiple architectures that differ in how they approach fine-grained parallelism. We also introduce a new construct for portable thread-local arrays, which we call compile-time polymorphic arrays. Not only are the specifics of our approaches and implementation interesting for tuning tensor computations, but they also provide a roadmap for developing other portable high-performance codes. As a last step in optimizing performance, we modify the MTTKRP algorithm itself to do a permuted traversal of tensor nonzeros to reduce atomic-write contention. We test the performance of our implementation on 16- and 68-core Intel CPUs and the K80 and P100 NVIDIA GPUs, showing that we are competitive with state-of-the-art architecture-specific codes while having the advantage of being able to run on a variety of architectures.
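
For context, a plain serial reference version of the mode-0 MTTKRP for a sparse third-order tensor in coordinate (COO) form, written here in Python rather than the paper's Kokkos-based C++, looks like the following.

```python
import numpy as np

def mttkrp_mode0(coords, vals, B, C):
    """MTTKRP along mode 0 for a sparse 3-way tensor X in COO format.

    coords : (nnz, 3) integer array of nonzero indices (i, j, k)
    vals   : (nnz,) array of nonzero values
    B, C   : factor matrices for modes 1 and 2, shapes (J, R) and (K, R)
    Returns M with M[i, :] = sum over nonzeros of x_ijk * (B[j, :] * C[k, :]).
    """
    I = coords[:, 0].max() + 1
    R = B.shape[1]
    M = np.zeros((I, R))
    for (i, j, k), x in zip(coords, vals):
        # Each nonzero scatters into row i; concurrent updates to the same row are
        # what force atomic writes (or a permuted traversal) in a parallel version.
        M[i, :] += x * B[j, :] * C[k, :]
    return M

# Tiny usage example with made-up data.
coords = np.array([[0, 1, 0], [2, 0, 1]])
vals = np.array([2.0, 3.0])
B, C = np.ones((2, 4)), np.ones((2, 4))
print(mttkrp_mode0(coords, vals, B, C))
```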

More Details

Gate-defined quantum dots in Ge/SiGe quantum wells as a platform for spin qubits

ECS Transactions

Hardy, Will H.; Su, Y.H.; Chuang, Y.; Maurer, Leon M.; Brickson, Mitchell I.; Baczewski, Andrew D.; Li, J.Y.; Lu, Tzu-Ming L.; Luhman, Dwight R.

In the field of semiconductor quantum dot spin qubits, there is growing interest in leveraging the unique properties of hole-carrier systems and their intrinsically strong spin-orbit coupling to engineer novel qubits. Recent advances in semiconductor heterostructure growth have made available high quality, undoped Ge/SiGe quantum wells, consisting of a pure strained Ge layer flanked by Ge-rich SiGe layers above and below. These quantum wells feature heavy hole carriers and a cubic Rashba-type spin-orbit interaction. Here, we describe progress toward realizing spin qubits in this platform, including development of multi-metal-layer gated device architectures, device tuning protocols, and charge-sensing capabilities. Iterative improvement of a three-layer metal gate architecture has significantly enhanced device performance over that achieved using an earlier single-layer gate design. We discuss ongoing, simulation-informed work to fine-tune the device geometry, as well as efforts toward a single-spin qubit demonstration.

More Details
Results 2401–2600 of 9,998