Optimization Verification and Engineered Reliability of Quantum Computers (OVER-QC)
Abstract not provided.
We begin by presenting an overview of the general philosophy guiding the novel DARMA developments, followed by a brief review of the project's background. We conclude by presenting the FY19 design requirements. As the Exascale era arrives, DARMA is uniquely positioned at the forefront of asynchronous many-task (AMT) research and development (R&D) to explore emerging programming model paradigms for next-generation HPC applications at Sandia, across NNSA labs, and beyond. The DARMA project explores how to fundamentally shift the expression (programming model, PM) and execution (execution model, EM) of massively concurrent HPC scientific algorithms to be more asynchronous, resilient to executional aberrations in heterogeneous/unpredictable environments, and data-dependency conscious, thereby enabling an intelligent, dynamic, and self-aware runtime to guide execution.
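To make the task-based, dependency-driven style concrete, the following is a minimal sketch in Python using concurrent.futures; it is not the DARMA API (which is C++), and the task names are purely illustrative. It only mirrors the idea that work is expressed as tasks whose ordering follows from data dependencies, leaving scheduling to a runtime.

```python
# Illustrative sketch of asynchronous, dependency-driven task execution.
# NOT the DARMA API; task names and structure are hypothetical stand-ins.
from concurrent.futures import ThreadPoolExecutor

def assemble(block):
    # stand-in for local assembly work on one mesh block
    return sum(range(block * 1000))

def combine(left, right):
    # stand-in for a reduction that depends on two earlier tasks
    return left + right

with ThreadPoolExecutor() as runtime:
    # express work as tasks; the "runtime" decides when and where they run
    a = runtime.submit(assemble, 1)
    b = runtime.submit(assemble, 2)
    # this task's inputs encode its data dependencies on tasks a and b
    total = runtime.submit(lambda: combine(a.result(), b.result()))
    print("reduced value:", total.result())
```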
Abstract not provided.
AIChE Journal
While peak shaving is commonly used to reduce power costs, chemical process facilities that can reduce power consumption on demand during emergencies (e.g., extreme weather events) bring additional value through improved resilience. For process facilities to effectively negotiate demand response (DR) contracts and make investment decisions regarding flexibility, they need to quantify their additional value to the grid. We present a grid-centric mixed-integer stochastic programming framework to determine the value of DR for improving grid resilience in place of capital investments that can be cost prohibitive for system operators. We formulate problems using both a linear approximation and a nonlinear alternating current power flow model. Our numerical results with both models demonstrate that DR can be used to reduce the capital investment necessary for resilience, increasing the value that chemical process facilities bring through DR. Furthermore, the linearized model often underestimates the amount of DR needed in our case studies.
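A hedged sketch of the kind of decision the abstract describes, reduced to a single-scenario linear program solved with scipy: DR curtailment substitutes for scarce generation capacity. The network, costs, and DR limits are hypothetical; the actual framework is mixed-integer, stochastic, and uses AC power flow.

```python
# Toy, single-scenario linear relaxation of the demand-response (DR) idea:
# choose generator output and DR curtailment to meet load at minimum cost.
# All numbers are hypothetical.
import numpy as np
from scipy.optimize import linprog

gen_cost = np.array([20.0, 50.0])   # $/MWh for two generators
dr_price = 80.0                     # $/MWh paid to the facility for curtailment
gen_cap  = np.array([60.0, 40.0])   # MW capacity (e.g., one unit derated by a storm)
load     = 120.0                    # MW demand
dr_max   = 30.0                     # MW of contracted demand response

# decision vector x = [g1, g2, dr]
c = np.concatenate([gen_cost, [dr_price]])
A_eq = np.array([[1.0, 1.0, 1.0]])  # served power plus curtailment covers the load
b_eq = np.array([load])
bounds = [(0, gen_cap[0]), (0, gen_cap[1]), (0, dr_max)]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
g1, g2, dr = res.x
print(f"dispatch: g1={g1:.1f} MW, g2={g2:.1f} MW, DR={dr:.1f} MW, cost=${res.fun:.0f}")
```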
Physical Review A
Photodetection plays a key role in basic science and technology, with exquisite performance having been achieved down to the single-photon level. Further improvements in photodetectors would open new possibilities across a broad range of scientific disciplines and enable new types of applications. However, it is still unclear what is possible in terms of ultimate performance and what properties are needed for a photodetector to achieve such performance. Here, we present a general modeling framework for photodetectors whereby the photon field, the absorption process, and the amplification process are all treated as one coupled quantum system. The formalism naturally handles field states with single or multiple photons as well as a variety of detector configurations and includes a mathematical definition of ideal photodetector performance. The framework reveals how specific photodetector architectures introduce limitations and tradeoffs for various performance metrics, providing guidance for optimization and design.
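A minimal numerical sketch of the field-plus-absorber stage only: a single-photon field mode and a two-level absorber treated as one coupled quantum system, evolved by direct matrix exponentiation. This is not the paper's full framework (which also models amplification and measurement); the coupling and basis truncation are illustrative.

```python
# Minimal sketch: a single-photon field mode and a two-level absorber treated
# as ONE coupled quantum system; watch the excitation transfer (absorption).
# Illustrates only the field+absorption stage; parameters are illustrative.
import numpy as np
from scipy.linalg import expm

a  = np.array([[0, 1], [0, 0]], dtype=complex)   # photon annihilation, truncated to {0,1} photons
sm = np.array([[0, 1], [0, 0]], dtype=complex)   # absorber lowering operator {|g>,|e>}

g = 1.0  # field-absorber coupling strength (arbitrary units)
# Resonant Jaynes-Cummings-type interaction: a sigma^+ + a^dagger sigma^-
H = g * (np.kron(a, sm.conj().T) + np.kron(a.conj().T, sm))

one_photon = np.array([0, 1], dtype=complex)     # |1> photon
ground     = np.array([1, 0], dtype=complex)     # |g> absorber
psi0 = np.kron(one_photon, ground)

for t in np.linspace(0, np.pi / (2 * g), 5):
    psi = expm(-1j * H * t) @ psi0
    # probability that the absorber is excited (photon absorbed)
    p_abs = sum(abs(psi[i]) ** 2 for i in range(4) if i % 2 == 1)
    print(f"t={t:.2f}  P(absorbed)={p_abs:.3f}")
```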
Proceedings - IEEE 14th International Conference on eScience, e-Science 2018
Large-scale collaborative scientific software projects require more knowledge than any one person typically possesses. This makes coordination and communication of knowledge and expertise a key factor in creating and safeguarding software quality, without which we cannot have sustainable software. However, as researchers attempt to scale up the production of software, they are confronted by problems of awareness and understanding. This presents an opportunity to develop better practices and tools that directly address these challenges. To that end, we conducted a case study of developers of the Trilinos project. We surveyed these developers about the software development challenges they face and show how those problems are connected with what they know and how they communicate. Based on these data, we provide a series of practicable recommendations, and outline a path forward for future research.
Proceedings - International Carnahan Conference on Security Technology
Physical security systems (PSS) and humans are inescapably tied in the current physical security paradigm. Yet, physical security system evaluations often end at the console that displays information to the human. That is, these evaluations do not account for human-in-the-loop factors that can greatly impact performance of the security system, even though methods for doing so are well-established. This paper highlights two examples of methods for evaluating the human component of the current physical security system. One of these methods is qualitative, focusing on the information the human needs to adequately monitor alarms on a physical site. The other objectively measures the impact of false alarm rates on threat detection. These types of human-centric evaluations are often treated as unnecessary or not cost effective under the belief that human cognition is straightforward and errors can be either trained away or mitigated with technology. These assumptions are not always correct; the resulting failures are often surprising and can often only be identified with objective assessments of human-system performance. Thus, taking the time to perform human element evaluations can identify unintuitive human-system weaknesses and can provide significant cost savings in the form of mitigating vulnerabilities and reducing costly system patches or retrofits to correct an issue after the system has been deployed.
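One common way to quantify the false-alarm/threat-detection tradeoff mentioned above is signal detection theory; the sketch below computes sensitivity (d') and response bias from hit and false-alarm rates. It illustrates the general class of objective human-performance measures, not necessarily the specific analysis used in the paper, and the rates are hypothetical.

```python
# Signal-detection-theory sketch: quantify an operator's threat-detection
# sensitivity (d') and response bias (criterion) from hit and false-alarm rates.
# Illustrative of objective human-in-the-loop measurement in general; the
# rates below are hypothetical.
from statistics import NormalDist

def dprime(hit_rate, fa_rate):
    z = NormalDist().inv_cdf
    d = z(hit_rate) - z(fa_rate)            # sensitivity
    c = -0.5 * (z(hit_rate) + z(fa_rate))   # response bias (criterion)
    return d, c

# Hypothetical operator performance under low vs. high nuisance-alarm load.
for label, hits, fas in [("low false-alarm load", 0.90, 0.05),
                         ("high false-alarm load", 0.75, 0.20)]:
    d, c = dprime(hits, fas)
    print(f"{label}: d'={d:.2f}, criterion={c:.2f}")
```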
The ECP/VTK-m project provides the core capabilities to perform scientific visualization on Exascale architectures. It fills the critical feature gap of performing visualization and analysis on processors such as GPUs and many-integrated-core devices. The results of this project will be delivered in tools like ParaView, VisIt, and Ascent, as well as in stand-alone form. Moreover, these tools depend on this ECP effort to make effective use of ECP architectures.
Journal of Computational and Applied Mathematics
This work explores the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. This study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of the original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.
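A small serial analogue of the solver strategy described above: a Poisson model problem solved with GMRES preconditioned by algebraic multigrid. It uses pyamg and scipy rather than Trilinos, so it mirrors only the algorithmic structure, not the production implementation; problem size and settings are illustrative.

```python
# Serial toy analogue of the AMG-preconditioned Krylov approach, using
# pyamg + scipy instead of Trilinos: 2-D Poisson solved with GMRES and a
# smoothed-aggregation AMG preconditioner.
import numpy as np
import pyamg
from scipy.sparse.linalg import gmres

A = pyamg.gallery.poisson((200, 200), format='csr')   # 40,000-unknown Poisson matrix
b = np.random.default_rng(0).standard_normal(A.shape[0])

ml = pyamg.smoothed_aggregation_solver(A)              # build the AMG hierarchy
M = ml.aspreconditioner(cycle='V')                     # expose one V-cycle as a preconditioner

x, info = gmres(A, b, M=M)
print("GMRES converged" if info == 0 else f"GMRES info={info}",
      "| final residual:", np.linalg.norm(b - A @ x))
```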
IEEE Access
Emerging memory devices, such as resistive crossbars, have the capacity to store large amounts of data in a single array. Acquiring the data stored in large-capacity crossbars in a sequential fashion can become a bottleneck. We present practical methods, based on sparse sampling, to quickly acquire sparse data stored on emerging memory devices that support the basic summation kernel, reducing the acquisition time from linear to sub-linear. The experimental results show that at least an order of magnitude improvement in acquisition time can be achieved when the data are sparse. Finally, we show that the energy cost associated with our approach is competitive with that of the sequential method.
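The acquisition idea can be illustrated with a compressed-sensing toy: each "measurement" is a summation over a random subset of cells (the summation kernel), and the sparse contents are recovered from far fewer measurements than cells. The sketch below uses a random binary sensing matrix and scikit-learn's orthogonal matching pursuit; it mirrors the sub-linear sampling idea, not the paper's exact scheme or hardware model.

```python
# Compressed-sensing toy of sub-linear acquisition: each measurement is a
# summation over a random subset of cells, and the sparse array contents are
# recovered from m << n measurements. Sizes and sparsity are illustrative.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
n, k, m = 1024, 10, 160                 # cells, nonzeros, measurements (m << n)

x = np.zeros(n)                         # sparse data stored in the crossbar
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

Phi = rng.integers(0, 2, size=(m, n)).astype(float)   # which cells each summation touches
y = Phi @ x                             # m summation-kernel readouts

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k)
omp.fit(Phi, y)
x_hat = omp.coef_
print("relative recovery error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```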
Proceedings of the Annual International Symposium on Microarchitecture, MICRO
With Non-Volatile Memories (NVMs) beginning to enter the mainstream computing market, it is time to consider how to secure NVM-equipped computing systems. Recent Meltdown and Spectre attacks are evidence that security must be intrinsic to computing systems and not added as an afterthought. Processor vendors are taking the first steps and are beginning to build security primitives into commodity processors. One security primitive that is associated with the use of emerging NVMs is memory encryption. Memory encryption, while necessary, is very challenging when used with NVMs because it exacerbates the write endurance problem. Secure architectures use cryptographic metadata that must be persisted and restored to allow secure recovery of data in the event of power loss. Specifically, encryption counters must be persistent to enable secure and functional recovery of an interrupted system. However, the cost of ensuring and maintaining persistence for these counters can be significant. In this paper, we propose a novel scheme to maintain encryption counters without the need for frequent updates. Our new memory controller design, Osiris, repurposes memory Error-Correction Codes (ECCs) to enable fast restoration and recovery of encryption counters. To evaluate our design, we use Gem5 to run eight memory-intensive workloads selected from SPEC2006 and U.S. Department of Energy (DoE) proxy applications. Compared to a write-through counter-cache scheme, on average, Osiris reduces memory writes by 48.7% (increasing lifetime by 1.95x) and reduces the performance overhead from 51.5% (for write-through) to only 5.8%. Furthermore, without the need for a backup battery or extra power-supply hold-up time, Osiris performs better than a battery-backed write-back scheme (5.8% vs. 6.6% overhead) and incurs less write traffic (2.6% vs. 5.9% overhead).
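The core recovery idea can be sketched in software: if the persisted encryption counter may be slightly stale, recovery can try a small window of candidate counters and use an error-detection check on the decrypted line to identify the correct one, avoiding a persist on every write. The sketch below is a purely conceptual model, with a hash-based keystream and a CRC standing in for counter-mode AES and memory ECC; it is not the hardware design.

```python
# Conceptual software model of counter recovery: the persisted counter may be
# stale by up to N writes, so recovery scans candidate counters and lets an
# error-detection check on the decrypted line pick the right one. A hash-based
# keystream and a CRC stand in for counter-mode AES and real ECC.
import hashlib, zlib, os

KEY, LINE = os.urandom(16), b"persistent data structure node #42".ljust(64)
N = 4   # counters are persisted only every N writes

def keystream(key, addr, counter):
    return hashlib.sha256(key + addr.to_bytes(8, 'little')
                          + counter.to_bytes(8, 'little')).digest() * 2

def encrypt(line, key, addr, counter):
    ct = bytes(a ^ b for a, b in zip(line, keystream(key, addr, counter)))
    return ct, zlib.crc32(line)          # ciphertext + "ECC" over the plaintext

# Normal operation: several writes occur after the last counter persist.
persisted_counter, actual_counter, addr = 100, 103, 0x1000
ciphertext, ecc = encrypt(LINE, KEY, addr, actual_counter)

# Power loss, then recovery: scan the small window [persisted, persisted+N).
for cand in range(persisted_counter, persisted_counter + N):
    plain = bytes(a ^ b for a, b in zip(ciphertext, keystream(KEY, addr, cand)))
    if zlib.crc32(plain) == ecc:
        print(f"recovered counter {cand}: plaintext ok ->", plain[:34])
        break
```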
Cyber-Physical Systems Security
Deep neural networks are often computationally expensive, during both the training stage and the inference stage. Training is always expensive because back-propagation requires high-precision floating-point multiplication and addition. However, various mathematical optimizations may be employed to reduce the computational cost of inference. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches for optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transferring from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply and accumulate operations. In this chapter, using a unified notation, we provide a mathematical and algorithmic description of the aforementioned deep neural network inference optimization methods.
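Two of the listed optimizations, magnitude pruning and uniform weight quantization, are easy to show concretely; the numpy sketch below applies them to a random weight matrix. The sparsity level and bit width are illustrative choices, not recommendations from the chapter.

```python
# Two of the inference optimizations above, shown concretely with numpy:
# (1) magnitude pruning: zero out the smallest-magnitude weights;
# (2) uniform quantization: map remaining weights to 8-bit integers plus a scale.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)   # a dense layer's weights

# --- pruning: keep only the largest 30% of weights by magnitude --------------
threshold = np.quantile(np.abs(W), 0.70)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# --- quantization: symmetric 8-bit with a per-tensor scale --------------------
scale = np.abs(W_pruned).max() / 127.0
W_int8 = np.clip(np.round(W_pruned / scale), -127, 127).astype(np.int8)
W_deq = W_int8.astype(np.float32) * scale                 # used at inference time

x = rng.standard_normal(256).astype(np.float32)
err = np.linalg.norm(W @ x - W_deq @ x) / np.linalg.norm(W @ x)
print(f"sparsity={np.mean(W_pruned == 0):.2f}, relative output error={err:.3f}")
```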
Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017
A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than the naïve solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution, but remains constant for the naïve solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.
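A simplified numeric stand-in for this dynamic (the paper's IOCs are regexes, so this is only an analogy): an adversary's beacon interval drifts each round; a naive detector keeps its original acceptance window, while an adaptive detector re-fits its window to its own most recent true positives. The drift model and adaptation rule are hypothetical, but the adapt-or-go-stale behavior, and the slight FPR growth for the adaptive detector, mirror the trends described above.

```python
# Toy illustration of the "Tracking the Known" dynamic with a numeric IOC.
# The naive detector keeps its original window; the adaptive detector re-fits
# its window to its own latest true positives as the adversary drifts.
import random
random.seed(0)

def window(samples, pad=3.0):
    return min(samples) - pad, max(samples) + pad

naive = adaptive = (55.0, 65.0)          # initial IOC: beacon interval near 60 s
mean = 60.0
for rnd in range(6):
    adversary = [random.gauss(mean, 1.5) for _ in range(200)]   # true positives
    benign = [random.uniform(10, 300) for _ in range(2000)]     # background traffic

    def rates(win):
        lo, hi = win
        tpr = sum(lo <= v <= hi for v in adversary) / len(adversary)
        fpr = sum(lo <= v <= hi for v in benign) / len(benign)
        return tpr, fpr

    tpr_n, fpr_n = rates(naive)
    tpr_a, fpr_a = rates(adaptive)
    print(f"round {rnd}: naive TPR={tpr_n:.2f} FPR={fpr_n:.3f} | "
          f"adaptive TPR={tpr_a:.2f} FPR={fpr_a:.3f}")

    # self-adaptation: refit the window to the adaptive detector's own hits
    lo, hi = adaptive
    hits = [v for v in adversary if lo <= v <= hi]
    if hits:
        adaptive = window(hits)
    mean += 4.0                           # the adversary's TTPs drift
```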
Journal of Computational Physics
Predictive analysis of complex computational models, such as uncertainty quantification (UQ), must often rely on using an existing database of simulation runs. In this paper, we consider the task of performing low-multilinear-rank regression on such a database. Specifically, we develop and analyze an efficient gradient computation that enables gradient-based optimization procedures, including stochastic gradient descent and quasi-Newton methods, for learning the parameters of a functional tensor-train (FT). We compare our algorithms with 22 other nonparametric and parametric regression methods on 10 real-world data sets and show that for many physical systems, exploiting low-rank structure facilitates efficient construction of surrogate models. We use a number of synthetic functions to build insight into the behavior of our algorithms, including the rank adaptation and group-sparsity regularization procedures that we developed to reduce overfitting. Finally, we conclude the paper by building a surrogate of a physical model of a propulsion plant on a naval vessel.
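A stripped-down analogue of gradient-based functional low-rank regression, in the two-input special case: approximate f(x1, x2) by a rank-r separated expansion phi(x1)^T C1 C2^T phi(x2) with polynomial features phi and cores C1, C2 fit by plain gradient descent. A functional tensor-train chains such cores over many inputs; the target function, rank, and learning rate below are illustrative, and this is not the paper's algorithm (which includes rank adaptation and regularization).

```python
# Gradient-descent fit of a rank-2 separated (two-core) model, a 2-D analogue
# of functional tensor-train regression. Settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, degree, rank, lr = 2000, 4, 2, 0.05

x = rng.uniform(-1, 1, size=(N, 2))
y = np.exp(x[:, 0]) * np.sin(x[:, 1]) + x[:, 0] ** 2 * x[:, 1]   # rank-2 target

def phi(t):                                   # monomial features 1, t, ..., t^degree
    return np.stack([t ** k for k in range(degree + 1)], axis=1)

P1, P2 = phi(x[:, 0]), phi(x[:, 1])
C1 = 0.1 * rng.standard_normal((degree + 1, rank))
C2 = 0.1 * rng.standard_normal((degree + 1, rank))

for it in range(5000):
    A, B = P1 @ C1, P2 @ C2                   # N x rank factors
    resid = np.sum(A * B, axis=1) - y
    g = (2.0 / N) * resid                     # d(loss)/d(prediction)
    C1 -= lr * P1.T @ (g[:, None] * B)        # chain rule through A = P1 C1
    C2 -= lr * P2.T @ (g[:, None] * A)        # chain rule through B = P2 C2
    if it % 1000 == 0:
        print(f"iter {it}: mse={np.mean(resid ** 2):.4f}")
```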
Journal of Computational Physics
High resolution simulation of viscous fingering can offer an accurate and detailed prediction for subsurface engineering processes involving fingering phenomena. The fully implicit discontinuous Galerkin (DG) method has been shown to be an accurate and stable method to model viscous fingering with high Peclet number and mobility ratio. In this paper, we present two techniques to speed up large-scale simulations of this kind. The first technique relies on a simple p-adaptive scheme in which high order basis functions are employed only in elements near the finger fronts where the concentration has a sharp change. As a result, the number of degrees of freedom is significantly reduced and the simulation yields almost identical results to the more expensive simulation with uniform high order elements throughout the mesh. The second technique improves solver efficiency. We present an algebraic multigrid (AMG) preconditioner which allows the DG matrix to leverage the robust AMG preconditioner designed for the continuous Galerkin (CG) finite element method. The resulting preconditioner works effectively for fixed order DG as well as p-adaptive DG problems. With the improvements provided by the p-adaptivity and AMG preconditioning, we can perform high resolution three-dimensional viscous fingering simulations required for miscible displacement with high Peclet number and mobility ratio in greater detail than before for well injection problems.
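The p-adaptive idea can be shown with a small sketch: compute an element-wise jump indicator on a concentration profile and assign high polynomial order only near the sharp front. The indicator, thresholds, and 1-D setting are illustrative stand-ins for the element-wise criterion used with the actual DG discretization.

```python
# Sketch of the p-adaptive idea on a 1-D toy concentration profile: flag
# elements where the concentration changes sharply (the finger front) and give
# them high polynomial order, leaving smooth regions low order.
import numpy as np

n_elem = 100
xc = (np.arange(n_elem) + 0.5) / n_elem                 # element centers
conc = 0.5 * (1.0 - np.tanh((xc - 0.6) / 0.02))         # sharp front near x = 0.6

# indicator: magnitude of the inter-element concentration jump
jump = np.abs(np.diff(conc, prepend=conc[0]))
order = np.where(jump > 0.05 * jump.max(), 4, 1)        # p=4 at the front, p=1 elsewhere

dofs_adaptive = np.sum(order + 1)                       # 1-D DG: p+1 modes per element
dofs_uniform = n_elem * (4 + 1)
print(f"high-order elements: {np.sum(order == 4)} of {n_elem}")
print(f"DOFs: adaptive={dofs_adaptive} vs uniform p=4 -> {dofs_uniform}")
```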
Abstract not provided.
Shor's groundbreaking quantum algorithm for integer factoring provides an exponential speedup over the best-known classical algorithms. In the 20 years since Shor's algorithm was conceived, only a handful of fundamental quantum algorithmic kernels, generally providing modest polynomial speedups over classical algorithms, have been invented. To better understand the potential advantage quantum resources provide over their classical counterparts, one may consider other resources than execution time of algorithms. Quantum Approximation Algorithms direct the power of quantum computing towards optimization problems where quantum resources provide higher-quality solutions instead of faster execution times. We provide a new rigorous analysis of the recent Quantum Approximate Optimization Algorithm, demonstrating that it provably outperforms the best known classical approximation algorithm for special hard cases of the fundamental Maximum Cut graph-partitioning problem. We also develop new types of classical approximation algorithms for finding near-optimal low-energy states of physical systems arising in condensed matter by extending seminal discrete optimization techniques. Our interdisciplinary work seeks to unearth new connections between discrete optimization and quantum information science.
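A minimal statevector simulation of depth-1 QAOA for MaxCut on a small graph, written with numpy; the graph, angle grid, and depth are illustrative, and scanning angles by brute force stands in for, rather than reproduces, the report's analysis.

```python
# Depth-1 QAOA for MaxCut on a 5-vertex ring, simulated as a statevector and
# scanned over a coarse (gamma, beta) grid. Illustrates the algorithm's
# structure only; the graph and grid are illustrative.
import numpy as np

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]             # 5-cycle
dim = 2 ** n

# Diagonal MaxCut cost: number of edges cut by each basis bitstring.
s = np.arange(dim)
bits = (s[:, None] >> np.arange(n)) & 1                  # bit i of each basis state
cost = sum((bits[:, i] ^ bits[:, j]) for i, j in edges).astype(float)

def apply_x_rotation(state, beta, qubit):
    """Apply exp(-i * beta * X) on one qubit of the statevector."""
    U = np.array([[np.cos(beta), -1j * np.sin(beta)],
                  [-1j * np.sin(beta), np.cos(beta)]])
    psi = np.moveaxis(state.reshape([2] * n), qubit, 0)
    psi = np.tensordot(U, psi, axes=(1, 0))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def qaoa_expectation(gamma, beta):
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)  # |+>^n
    psi = psi * np.exp(-1j * gamma * cost)               # phase-separation layer
    for q in range(n):                                   # transverse-field mixer layer
        psi = apply_x_rotation(psi, beta, q)
    return float(np.real(np.sum(cost * np.abs(psi) ** 2)))

best = max((qaoa_expectation(g, b), g, b)
           for g in np.linspace(0, np.pi, 40)
           for b in np.linspace(0, np.pi / 2, 40))
print(f"best depth-1 expected cut {best[0]:.3f} of optimum {int(cost.max())} "
      f"(gamma={best[1]:.2f}, beta={best[2]:.2f})")
```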
Abstract not provided.
Nanoscale
Gate-controllable spin-orbit coupling is often one requisite for spintronic devices. For practical spin field-effect transistors, another essential requirement is ballistic spin transport, where the spin precession length is shorter than the mean free path such that the gate-controlled spin precession is not randomized by disorder. In this letter, we report the observation of a gate-induced crossover from weak localization to weak anti-localization in the magneto-resistance of a high-mobility two-dimensional hole gas in a strained germanium quantum well. From the magneto-resistance, we extract the phase-coherence time, spin-orbit precession time, spin-orbit energy splitting, and cubic Rashba coefficient over a wide density range. The mobility and the mean free path increase with increasing hole density, while the spin precession length decreases due to increasingly stronger spin-orbit coupling. As the density becomes larger than ∼6 × 1011 cm-2, the spin precession length becomes shorter than the mean free path, and the system enters the ballistic spin transport regime. We also report here the numerical methods and code developed for calculating the magneto-resistance in the ballistic regime, where the commonly used HLN and ILP models for analyzing weak localization and anti-localization are not valid. These results pave the way toward silicon-compatible spintronic devices.