Deep neural networks are often computationally expensive, during both the training stage and the inference stage. Training is always expensive, because back-propagation requires high-precision floating-point multiplication and addition. However, various mathematical optimizations may be employed to reduce the computational cost of inference. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches for optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transferring them from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply-accumulate operations. In this chapter, using a unified notation, we provide a mathematical and algorithmic description of the aforementioned deep neural network inference optimization methods.
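As a concrete illustration of two of the listed approaches, the following minimal NumPy sketch applies magnitude pruning and symmetric per-tensor int8 quantization to a weight matrix. It is a generic sketch under assumed conventions (the function names, the 90% sparsity target, and the 127-level symmetric scale are illustrative), not the chapter's reference implementation.

```python
# Minimal sketch (not the chapter's reference implementation) of two inference
# optimizations: magnitude pruning and symmetric per-tensor int8 quantization.
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization to int8: returns integer codes and a scale."""
    scale = np.max(np.abs(w)) / 127.0
    codes = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)).astype(np.float32)
    w_sparse = prune_by_magnitude(w, sparsity=0.9)   # keep only the largest 10%
    codes, scale = quantize_int8(w_sparse)
    print("max dequantization error:",
          np.abs(dequantize(codes, scale) - w_sparse).max())
```

At inference time only the integer codes and one scale per tensor need to be stored and moved, which is the source of the memory and bandwidth savings that the compression and quantization approaches target.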
A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expression (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities.In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than the naïve solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution, but remains constant for the naïve solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.
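A minimal sketch of the self-updating cycle follows, assuming a toy regex-generalization step (wildcarding digit runs) and hypothetical names; it illustrates the adaptive idea only and does not reproduce the paper's actual adaptation procedure.

```python
# Hypothetical sketch of the adaptive cycle: each epoch, the regex IOCs are
# augmented with patterns generalized from the samples they still detect, so the
# model can follow the adversary's drifting TTPs.
import re

def detect(regexes, samples):
    """Return the samples matched by any current IOC regex."""
    compiled = [re.compile(r) for r in regexes]
    return [s for s in samples if any(p.search(s) for p in compiled)]

def generalize(hits):
    """Toy re-fit: keep the literal structure of each hit but wildcard digit runs."""
    return {re.sub(r"\d+", r"\\d+", re.escape(h)) for h in hits}

def tracking_cycle(regexes, epochs):
    """A naive model keeps `regexes` fixed; the adaptive model updates them each epoch."""
    for stream in epochs:                 # each epoch: a list of observed strings
        hits = detect(regexes, stream)
        if hits:                          # self-update only from current detections
            regexes = set(regexes) | generalize(hits)
    return regexes

iocs = {r"evil-\d+\.example\.com"}        # illustrative seed IOC
epochs = [["beacon to evil-41.example.com"], ["beacon to evil-7782.example.com"]]
print(tracking_cycle(iocs, epochs))
```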
Shor's groundbreaking quantum algorithm for integer factoring provides an exponential speedup over the best-known classical algorithms. In the 20 years since Shor's algorithm was conceived, only a handful of fundamental quantum algorithmic kernels, generally providing modest polynomial speedups over classical algorithms, have been invented. To better understand the potential advantage quantum resources provide over their classical counterparts, one may consider resources other than the execution time of algorithms. Quantum approximation algorithms direct the power of quantum computing toward optimization problems, where quantum resources provide higher-quality solutions instead of faster execution times. We provide a new rigorous analysis of the recent Quantum Approximate Optimization Algorithm, demonstrating that it provably outperforms the best known classical approximation algorithm for special hard cases of the fundamental Maximum Cut graph-partitioning problem. We also develop new types of classical approximation algorithms for finding near-optimal low-energy states of physical systems arising in condensed matter by extending seminal discrete optimization techniques. Our interdisciplinary work seeks to unearth new connections between discrete optimization and quantum information science.
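For a sense of what the algorithm computes, the following brute-force statevector sketch evaluates the depth-1 QAOA expected cut on a 5-vertex ring (whose maximum cut is 4) over a grid of the two variational angles. It is an illustrative simulation under arbitrary choices of graph and grid, not the rigorous analysis reported in the paper.

```python
# Brute-force statevector sketch of p = 1 QAOA for MaxCut on a 5-cycle.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # 5-vertex ring
n = 5

def cut_values(n, edges):
    """Cut size of every computational basis state (bit i = partition of vertex i)."""
    vals = np.zeros(2 ** n)
    for z in range(2 ** n):
        bits = [(z >> i) & 1 for i in range(n)]
        vals[z] = sum(bits[i] != bits[j] for i, j in edges)
    return vals

def qaoa_expectation(gamma, beta, n, edges):
    """<psi(gamma, beta)| C |psi(gamma, beta)> for depth-1 QAOA."""
    c = cut_values(n, edges)
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)   # uniform superposition
    state = np.exp(-1j * gamma * c) * state                  # phase separator (C is diagonal)
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])       # e^{-i beta X} on one qubit
    mixer = np.array([[1.0 + 0j]])
    for _ in range(n):                                        # mixer acts on every qubit
        mixer = np.kron(mixer, rx)
    state = mixer @ state
    return float(np.real(np.vdot(state, c * state)))

best = max((qaoa_expectation(g, b, n, edges), g, b)
           for g in np.linspace(0, np.pi, 40)
           for b in np.linspace(0, np.pi / 2, 40))
print(f"best p=1 expected cut ~ {best[0]:.3f} of a maximum 4 "
      f"(gamma={best[1]:.2f}, beta={best[2]:.2f})")
```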
High resolution simulation of viscous fingering can offer an accurate and detailed prediction for subsurface engineering processes involving fingering phenomena. The fully implicit discontinuous Galerkin (DG) method has been shown to be an accurate and stable method to model viscous fingering with high Péclet number and mobility ratio. In this paper, we present two techniques to speed up large-scale simulations of this kind. The first technique relies on a simple p-adaptive scheme in which high order basis functions are employed only in elements near the finger fronts where the concentration has a sharp change. As a result, the number of degrees of freedom is significantly reduced and the simulation yields almost identical results to the more expensive simulation with uniform high order elements throughout the mesh. The second technique for speedup involves improving the solver efficiency. We present an algebraic multigrid (AMG) preconditioner which allows the DG matrix to leverage the robust AMG preconditioner designed for the continuous Galerkin (CG) finite element method. The resulting preconditioner works effectively for fixed-order DG as well as p-adaptive DG problems. With the improvements provided by the p-adaptivity and AMG preconditioning, we can perform high resolution three-dimensional viscous fingering simulations required for miscible displacement with high Péclet number and mobility ratio in greater detail than before for well injection problems.
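A hypothetical sketch of the p-adaptive marking idea is given below: elements whose mean concentration differs sharply from a neighbor (a proxy for the finger front) are assigned the high polynomial order, and everything else stays low order. The threshold, the order values, and the data layout are assumptions made for illustration, not the paper's criterion.

```python
# Toy p-adaptive marking: raise the polynomial order only near sharp concentration
# changes (the finger front); all names and thresholds are illustrative.
import numpy as np

def mark_orders(c_mean, neighbors, p_low=1, p_high=3, tol=0.05):
    """c_mean: per-element mean concentration; neighbors[e]: element ids adjacent to e."""
    orders = np.full(len(c_mean), p_low, dtype=int)
    for e, nbrs in enumerate(neighbors):
        jump = max((abs(c_mean[e] - c_mean[k]) for k in nbrs), default=0.0)
        if jump > tol:                      # sharp front passes near element e
            orders[e] = p_high
    return orders

# 1-D chain of elements with a front between elements 4 and 5
c = np.array([1.0, 1.0, 1.0, 1.0, 0.95, 0.1, 0.0, 0.0, 0.0, 0.0])
nbrs = [[1]] + [[i - 1, i + 1] for i in range(1, 9)] + [[8]]
print(mark_orders(c, nbrs))   # high order only around the front
```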
Gate-controllable spin-orbit coupling is often one requisite for spintronic devices. For practical spin field-effect transistors, another essential requirement is ballistic spin transport, where the spin precession length is shorter than the mean free path such that the gate-controlled spin precession is not randomized by disorder. In this letter, we report the observation of a gate-induced crossover from weak localization to weak anti-localization in the magneto-resistance of a high-mobility two-dimensional hole gas in a strained germanium quantum well. From the magneto-resistance, we extract the phase-coherence time, spin-orbit precession time, spin-orbit energy splitting, and cubic Rashba coefficient over a wide density range. The mobility and the mean free path increase with increasing hole density, while the spin precession length decreases due to increasingly stronger spin-orbit coupling. As the density becomes larger than ∼6 × 10¹¹ cm⁻², the spin precession length becomes shorter than the mean free path, and the system enters the ballistic spin transport regime. We also report here the numerical methods and code developed for calculating the magneto-resistance in the ballistic regime, where the commonly used HLN and ILP models for analyzing weak localization and anti-localization are not valid. These results pave the way toward silicon-compatible spintronic devices.
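For background, a commonly used simplified single-channel form of the HLN magneto-conductance correction (quoted here as general reference, not necessarily the exact expression fitted by the authors) is

$$\Delta\sigma(B)\;=\;\alpha\,\frac{e^{2}}{2\pi^{2}\hbar}\left[\psi\!\left(\frac{1}{2}+\frac{B_{\phi}}{B}\right)-\ln\frac{B_{\phi}}{B}\right],\qquad B_{\phi}=\frac{\hbar}{4\,e\,L_{\phi}^{2}},$$

where $\psi$ is the digamma function, $L_{\phi}$ the phase-coherence length, and $\alpha \approx -1/2$ for weak anti-localization. Because such expressions are derived in the diffusive limit, they cease to apply once the spin precession length drops below the mean free path, which is why the ballistic-regime magneto-resistance reported here requires a dedicated numerical treatment.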
Here, the feasibility of Neumann series expansion of Maxwell’s equations in the electrostatic limit is investigated for potentially rapid and approximate subsurface imaging of geologic features proximal to metallic infrastructure in an oilfield environment. While generally useful for efficient modeling of mild conductivity perturbations in uncluttered settings, we raise the question of its suitability for situations, such as oilfields, where metallic artifacts are pervasive and, in some cases, in direct electrical contact with the conductivity perturbation on which the Neumann series is computed. Convergence of the Neumann series and its residual error are computed using the hierarchical finite element framework for a canonical oilfield model consisting of an “L”-shaped, steel-cased well, energized by a steady-state electrode, and penetrating a small set of mildly conducting fractures near the heel of the well. For a given node spacing h in the finite element mesh, we find that the Neumann series is ultimately convergent if the conductivity is small enough - a result consistent with previous presumptions on the necessity of small conductivity perturbations. However, we also demonstrate that the spectral radius of the Neumann series operator grows as ~ 1/h, thus suggesting that in the limit of the continuous problem h → 0, the Neumann series is intrinsically divergent for all conductivity perturbations, regardless of their smallness. The hierarchical finite element methodology itself is critically analyzed and shown to possess the h² error convergence of traditional linear finite elements, thereby supporting the conclusion of an inescapably divergent Neumann series for this benchmark example. Application of the Neumann series to oilfield problems with metallic clutter should therefore be done with careful consideration of the coupling between infrastructure and geology. The methods used here are demonstrably useful in such circumstances.
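For context, the generic operator identity behind the expansion, stated here in standard form and independent of the specific discretization used in this work: writing the perturbed problem as $(A_{0}+\delta A)\,u=f$, with $A_{0}$ the background operator and $\delta A$ the contribution of the conductivity perturbation, the Neumann series reads

$$u\;=\;(A_{0}+\delta A)^{-1}f\;=\;\sum_{n=0}^{\infty}\bigl(-A_{0}^{-1}\,\delta A\bigr)^{n}\,A_{0}^{-1}f,$$

which converges if and only if the spectral radius $\rho\!\left(A_{0}^{-1}\,\delta A\right)<1$. The finding above is that, for this benchmark, this radius scales like $1/h$, so mesh refinement eventually pushes it past unity no matter how small the perturbation.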
Accurate and efficient constitutive modeling remains a cornerstone issue for solid mechanics analysis. Over the years, the LAMÉ advanced material model library has grown to address this challenge by implementing models capable of describing material systems spanning soft polymers to stiff ceramics, including both isotropic and anisotropic responses. Inelastic behaviors including (visco)plasticity, damage, and fracture have all been incorporated for use in various analyses. This multitude of options and flexibility, however, comes at the cost of complexity in the resulting implementation, which must support many capabilities, features, and responses. Therefore, to enhance confidence and enable the utilization of the LAMÉ library in application, this effort seeks to document and verify the various models in the LAMÉ library. Specifically, the broader strategy, organization, and interface of the library itself are first presented. The physical theory, numerical implementation, and user guide for a large set of models are then discussed. Importantly, a number of verification tests are performed with each model, not only to build confidence in the model itself but also to highlight some important response characteristics and features that may be of interest to end-users. Finally, looking ahead, approaches to add material models to this library and further expand its capabilities are presented.
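In the spirit of the single-point verification tests described above, the following generic sketch checks an isotropic linear-elastic stress update against the analytic uniaxial-strain solution. The function, material constants, and test values are made up for illustration and do not reflect the LAMÉ interface.

```python
# Generic verification sketch: isotropic Hooke's law versus the analytic
# uniaxial-strain answer. Not the LAMÉ interface; constants are illustrative.
import numpy as np

def hooke_normal_stress(eps, E=200.0e9, nu=0.3):
    """Normal stresses [s_xx, s_yy, s_zz] from normal strains [e_xx, e_yy, e_zz]."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))
    return lam * np.sum(eps) + 2.0 * mu * np.asarray(eps)

E, nu, e = 200.0e9, 0.3, 1.0e-3
lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
mu = E / (2.0 * (1.0 + nu))
sig = hooke_normal_stress([e, 0.0, 0.0], E, nu)
assert np.isclose(sig[0], (lam + 2.0 * mu) * e)   # analytic axial stress
assert np.isclose(sig[1], lam * e)                # analytic lateral stress
print(sig)
```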