A hypothetical scenario is used to explore privacy and security considerations for intelligent systems, such as a Personal Assistant for Learning (PAL). Two categories of potential concerns are addressed: factors facilitated by user models and factors facilitated by systems. Among the strategies presented for risk mitigation is a call for ongoing, iterative dialogue among privacy, security, and personalization researchers during all stages of development, testing, and deployment.
International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
Sridharan, Vilas; Debardeleben, Nathan; Blanchard, Sean; Ferreira, Kurt B.; Gurumurthi, Sudhanva; Shalf, John
Several recent publications have shown that hardware faults in the memory subsystem are commonplace. These faults are predicted to become more frequent in future systems that contain orders of magnitude more DRAM and SRAM than found in current memory subsystems. These memory subsystems will need to provide resilience techniques to tolerate these faults when deployed in high-performance computing systems and data centers containing tens of thousands of nodes. Therefore, it is critical to understand the efficacy of current hardware resilience techniques to determine whether they will be suitable for future systems. In this paper, we present a study of DRAM and SRAM faults and errors from the field. We use data from two leadership-class high-performance computer systems to analyze the reliability impact of hardware resilience schemes that are deployed in current systems. Our study has several key findings about the efficacy of many currently-deployed reliability techniques such as DRAM ECC, DDR address/command parity, and SRAM ECC and parity. We also perform a methodological study, and find that counting errors instead of faults, a common practice among researchers and data center operators, can lead to incorrect conclusions about system reliability. Finally, we use our data to project the needs of future large-scale systems. We find that SRAM faults are unlikely to pose a significantly larger reliability threat in the future, while DRAM faults will be a major concern and stronger DRAM resilience schemes will be needed to maintain acceptable failure rates similar to those found on today's systems.
An intuitive realization of a qubit is an electron charge at two well-defined positions of a double quantum dot. This qubit is simple and has the potential for high-speed operation because of its strong coupling to electric fields. However, charge noise also couples strongly to this qubit, resulting in rapid dephasing at all but one special operating point called the 'sweet spot'. In previous studies d.c. voltage pulses have been used to manipulate semiconductor charge qubits but did not achieve high-fidelity control, because d.c. gating requires excursions away from the sweet spot. Here, by using resonant a.c. microwave driving we achieve fast (greater than gigahertz) and universal single qubit rotations of a semiconductor charge qubit. The Z-axis rotations of the qubit are well protected at the sweet spot, and we demonstrate the same protection for rotations about arbitrary axes in the X-Y plane of the qubit Bloch sphere. We characterize the qubit operation using two tomographic approaches: standard process tomography and gate set tomography. Both methods consistently yield process fidelities greater than 86% with respect to a universal set of unitary single-qubit operations.
We present a new interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected onto a basis of hyperspherical harmonics in four dimensions. The bispectrum components are the same bond-orientational order parameters employed by the GAP potential [1]. The SNAP potential, unlike GAP, assumes a linear relationship between atom energy and bispectrum components. The linear SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. We demonstrate that a previously unnoticed symmetry property can be exploited to reduce the computational cost of the force calculations by more than one order of magnitude. We present results for a SNAP potential for tantalum, showing that it accurately reproduces a range of commonly calculated properties of both the crystalline solid and the liquid phases. In addition, unlike simpler existing potentials, SNAP correctly predicts the energy barrier for screw dislocation migration in BCC tantalum.
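The regression step at the heart of SNAP can be pictured with a toy weighted least-squares fit. The sketch below is a plain-Python illustration with made-up two-component "descriptors" standing in for actual bispectrum components; it recovers linear coefficients from synthetic energies via the normal equations and is not the production fitting code.

```python
def fit_linear_potential(descriptors, energies, weights):
    """Weighted least-squares fit of per-configuration energies to a
    linear model E ~ beta1*B1 + beta2*B2, via the 2x2 normal equations.
    Toy stand-in for SNAP's regression step: the 'descriptors' here are
    synthetic numbers, not actual bispectrum components."""
    # Accumulate A = X^T W X and y = X^T W e for two coefficients.
    a11 = a12 = a22 = y1 = y2 = 0.0
    for (b1, b2), e, w in zip(descriptors, energies, weights):
        a11 += w * b1 * b1; a12 += w * b1 * b2; a22 += w * b2 * b2
        y1 += w * b1 * e;   y2 += w * b2 * e
    det = a11 * a22 - a12 * a12  # invert the 2x2 system by hand
    return ((a22 * y1 - a12 * y2) / det, (a11 * y2 - a12 * y1) / det)

# Synthetic training data generated from E = 2*B1 - 0.5*B2; a
# consistent linear system, so the fit recovers those coefficients.
data = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
energies = [2.0 * b1 - 0.5 * b2 for b1, b2 in data]
beta = fit_linear_potential(data, energies, [1.0] * 4)
```

In the real workflow the design matrix also includes force and stress rows and many more bispectrum columns, but the linear-algebra structure is the same.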
Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information - Sandia researchers developed novel methods and metrics for studying the computational function of neurogenesis, thus generating substantial impact to the neuroscience and neural computing communities. This work could benefit applications in machine learning and other analysis activities.
ASC Level 2 Milestone FY 2013 continuation. The L2 milestone revealed memory pressure from using in situ analysis; tools were developed to determine memory usage; the memory footprint was reduced by more than 50%.
The purpose of this work is to understand how defects control initiation in energetic materials used in stockpile components. Sequoia gives us the core count to run very large-scale simulations of up to 10 million atoms. Using an OpenMP-threaded implementation of the ReaxFF package in LAMMPS, we have been able to achieve good parallel efficiency running on 16k nodes of Sequoia, with one hardware thread per core.
This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a. Kokkos) project for 2015–2019. The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread-parallel programming model and standard C++ library-based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data-parallel patterns including parallel-for, parallel-reduce, and parallel-scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contains nested data-parallel algorithms. This five-year plan includes the R&D required to fully and performance-portably exploit thread parallelism across current and anticipated next-generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithms and code design for NGP. Some production libraries and applications, such as Trilinos and LAMMPS, have already committed to Kokkos as their foundation for manycore parallelism and performance portability. These five-year requirements include the support required for current and anticipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and the ASC projects relying upon it is a lack of staffing resources to support Kokkos to the degree needed by these projects. This support includes up-to-date tutorials, documentation, multi-platform (hardware and software stack) testing, minor feature enhancements, thread-scalable algorithm consulting, and managing collaborative R&D.
Caskurlu, Bugra; Mkrtchyan, Vahan; Parekh, Ojas D.; Subramani, K.
Graphs can be used to model risk management in various systems. In particular, Caskurlu et al. [7] have considered a system with threats, vulnerabilities, and assets, which essentially represents a tripartite graph. The goal in this model is to reduce the risk in the system below a predefined threshold level. One can either restrict the permissions of the users or encapsulate the system assets. These two strategies correspond to deleting a minimum number of elements corresponding to vulnerabilities and assets, such that the flow between threats and assets is reduced below the predefined threshold level. It can be shown that the main goal in this risk management system can be formulated as a Partial Vertex Cover problem on bipartite graphs. It is well known that the Vertex Cover problem is in P on bipartite graphs; however, the computational complexity of the Partial Vertex Cover problem on bipartite graphs has remained open. In this paper, we establish that the Partial Vertex Cover problem is NP-hard on bipartite graphs, a result also recently demonstrated independently [N. Apollonio and B. Simeone, Discrete Appl. Math., 165 (2014), pp. 37–48; G. Joret and A. Vetta, preprint, arXiv:1211.4853v1 [cs.DS], 2012]. We then identify interesting special cases of bipartite graphs for which the Partial Vertex Cover problem, the closely related Budgeted Maximum Coverage problem, and their weighted extensions can be solved in polynomial time. We also present an 8/9-approximation algorithm for the Budgeted Maximum Coverage problem in the class of bipartite graphs. We show that this matches and resolves the integrality gap of the natural LP relaxation of the problem and improves upon a recent 4/5-approximation.
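To make the Partial Vertex Cover objective concrete, here is a brute-force sketch on a tiny bipartite graph. This is illustrative only (the graph and names are ours, not the paper's); the exponential search is exactly what the NP-hardness result says cannot be avoided in general.

```python
from itertools import combinations

def partial_vertex_cover(edges, vertices, k):
    """Brute-force Partial Vertex Cover: among all vertex subsets of
    size at most k, return one covering the most edges. Exponential
    time -- for illustration only."""
    best, best_covered = set(), 0
    for r in range(k + 1):
        for subset in combinations(vertices, r):
            s = set(subset)
            covered = sum(1 for u, v in edges if u in s or v in s)
            if covered > best_covered:
                best, best_covered = s, covered
    return best, best_covered

# Tiny bipartite graph: left side {'a', 'b'}, right side {1, 2, 3}.
edges = [('a', 1), ('a', 2), ('b', 2), ('b', 3)]
cover, covered = partial_vertex_cover(edges, ['a', 'b', 1, 2, 3], 1)
# With a budget of one vertex, at most 2 of the 4 edges can be covered.
```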
This study began with a challenge from program area managers at Sandia National Laboratories to technical staff in the energy, climate, and infrastructure security areas: apply a systems-level perspective to existing science and technology program areas in order to determine technology gaps, identify new technical capabilities at Sandia that could be applied to these areas, and identify opportunities for innovation. The Arctic was selected as one of these areas for systems level analyses, and this report documents the results. In this study, an emphasis was placed on the arctic atmosphere since Sandia has been active in atmospheric research in the Arctic since 1997. This study begins with a discussion of the challenges and benefits of analyzing the Arctic as a system. It goes on to discuss current and future needs of the defense, scientific, energy, and intelligence communities for more comprehensive data products related to the Arctic; assess the current state of atmospheric measurement resources available for the Arctic; and explain how the capabilities at Sandia National Laboratories can be used to address the identified technological, data, and modeling needs of the defense, scientific, energy, and intelligence communities for Arctic support.
Current commercial software tools for transmission and generation investment planning have limited stochastic modeling capabilities. Because of this limitation, electric power utilities generally rely on scenario planning heuristics to identify potentially robust and cost effective investment plans for a broad range of system, economic, and policy conditions. Several research studies have shown that stochastic models perform significantly better than deterministic or heuristic approaches, in terms of overall costs. However, there is a lack of practical solution techniques to solve such models. In this paper we propose a scalable decomposition algorithm to solve stochastic transmission and generation planning problems, respectively considering discrete and continuous decision variables for transmission and generation investments. Given stochasticity restricted to loads and wind, solar, and hydro power output, we develop a simple scenario reduction framework based on a clustering algorithm, to yield a more tractable model. The resulting stochastic optimization model is decomposed on a scenario basis and solved using a variant of the Progressive Hedging (PH) algorithm. We perform numerical experiments using a 240-bus network representation of the Western Electricity Coordinating Council in the US. Although convergence of PH to an optimal solution is not guaranteed for mixed-integer linear optimization models, we find that it is possible to obtain solutions with acceptable optimality gaps for practical applications. Our numerical simulations are performed both on a commodity workstation and on a high-performance cluster. The results indicate that large-scale problems can be solved to a high degree of accuracy in at most 2 h of wall clock time.
Building the next-generation of extreme-scale distributed systems will require overcoming several challenges related to system resilience. As the number of processors in these systems grow, the failure rate increases proportionally. One of the most common sources of failure in large-scale systems is memory. In this paper, we propose a novel runtime for transparently exploiting memory content similarity to improve system resilience by reducing the rate at which memory errors lead to node failure. We evaluate the viability of this approach by examining memory snapshots collected from eight high-performance computing (HPC) applications and two important HPC operating systems. Based on the characteristics of the similarity uncovered, we conclude that our proposed approach shows promise for addressing system resilience in large-scale systems.
The Bulk Synchronous Parallel programming model is showing performance limitations at high processor counts. We propose over-decomposition of the domain, operated on as tasks, to smooth out utilization of the computing resource, in particular the node interconnect and processing cores, and hide intra- and inter-node data movement. Our approach maintains the existing coding style commonly employed in computational science and engineering applications. Although we show improved performance on existing computers, up to 131,072 processor cores, the effectiveness of this approach on expected future architectures will require the continued evolution of capabilities throughout the codesign stack. Success then will not only result in decreased time to solution, but would also make better use of the hardware capabilities and reduce power and energy requirements, while fundamentally maintaining the current code configuration strategy.
In cybersecurity forensics and incident response, the story of what has happened is the most important artifact yet the one least supported by tools and techniques. Existing tools focus on gathering and manipulating low-level data to allow an analyst to investigate exactly what happened on a host system or a network. Higher-level analysis is usually left to whatever ad hoc tools and techniques an individual may have developed. We discuss visual representations of narrative in the context of cybersecurity incidents with an eye toward multi-scale illustration of actions and actors. We envision that this representation could smoothly encompass individual packets on a wire at the lowest level and nation-state-level actors at the highest. We present progress to date, discuss the impact of technical risk on this project and highlight opportunities for future work.
The physical foundations and domain of applicability of the Kayenta constitutive model are presented along with descriptions of the source code and user instructions. Kayenta, which is an outgrowth of the Sandia GeoModel, includes features and fitting functions appropriate to a broad class of materials including rocks, rock-like engineered materials (such as concretes and ceramics), and metals. Fundamentally, Kayenta is a computational framework for generalized plasticity models. As such, it includes a yield surface, but the term "yield" is generalized to include any form of inelastic material response (including microcrack growth and pore collapse) that can result in non-recovered strain upon removal of loads on a material element. Kayenta supports optional anisotropic elasticity associated with joint sets, as well as optional deformation-induced anisotropy through kinematic hardening (in which the initially isotropic yield surface is permitted to translate in deviatoric stress space to model Bauschinger effects). The governing equations are otherwise isotropic. Because Kayenta is a unification and generalization of simpler models, it can be run using as few as 2 parameters (for linear elasticity) to as many as 40 material and control parameters in the exceptionally rare case when all features are used. For high-strain-rate applications, Kayenta supports rate dependence through an overstress model. Isotropic damage is modeled through loss of stiffness and strength.
This report evaluates several interpolants implemented in the Digital Image Correlation Engine (DICe), an image correlation software package developed by Sandia. By interpolants we refer to the basis functions used to represent discrete pixel intensity data as a continuous signal. Interpolation is used to determine intensity values in an image at non-pixel locations. It is also used, in some cases, to evaluate the x and y gradients of the image intensities. Intensity gradients subsequently guide the optimization process. The goal of this report is to inform analysts as to the characteristics of each interpolant and provide guidance towards the best interpolant for a given dataset. This work also serves as an initial verification of each of the interpolants implemented.
Titanium and the titanium alloy Ti64 (6% aluminum, 4% vanadium and the balance titanium) are materials used in many technologically important applications. To be able to computationally investigate and design these applications, accurate Equations of State (EOS) are needed and in many cases also additional constitutive relations. This report describes what data is available for constructing EOS for these two materials, and also describes some references giving data for stress-strain constitutive models. We also give some suggestions for projects to achieve improved EOS and constitutive models. In an appendix, we present a study of the 'cloud formation' issue observed in the ALEGRA code. This issue was one of the motivating factors for this literature search of available data for constructing improved EOS for Ti and Ti64. However, the study shows that the cloud formation issue is only marginally connected to the quality of the EOS, and, in fact, is a physical behavior of the system in question. We give some suggestions for settings in, and improvements of, the ALEGRA code to address this computational difficulty.
We describe a new approach to computing that moves towards the limits of nanotechnology using a newly formulated scaling rule. This is in contrast to the current computer industry scaling away from von Neumann's original computer at the rate of Moore's Law. We extend Moore's Law to 3D, which leads generally to architectures that integrate logic and memory. To keep power dissipation constant through a 2D surface of the 3D structure requires using adiabatic principles. We call our newly proposed architecture Processor In Memory and Storage (PIMS). We propose a new computational model that integrates processing and memory into "tiles" that comprise logic, memory/storage, and communications functions. Since the programming model will be relatively stable as a system scales, programs represented by tiles could be executed in a PIMS system built with today's technology or could become the "schematic diagram" for implementation in an ultimate 3D nanotechnology of the future. We build a systems software approach that offers advantages over and above the technological and architectural advantages. First, the algorithms may be more efficient in the conventional sense of having fewer steps. Second, the algorithms may run with higher power efficiency per operation by being a better match for the adiabatic scaling rule. The performance analysis based on demonstrated ideas in physical science suggests an 80,000x improvement in cost per operation for the (arguably) general-purpose function of emulating neurons in Deep Learning.
Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.
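The effect of a single bit flip depends heavily on where in the IEEE 754 encoding it lands, which is the starting point for the analytic model above. A minimal Python sketch of the fault model itself (an illustration, not the paper's analysis or error bounds):

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = lowest significand bit, 63 = sign bit) in the
    IEEE 754 binary64 representation of x."""
    (bits,) = struct.unpack('<Q', struct.pack('<d', x))
    (y,) = struct.unpack('<d', struct.pack('<Q', bits ^ (1 << bit)))
    return y

# The damage depends strongly on which field the flip lands in: a
# low-order significand flip barely perturbs the value, while a flip
# in the exponent field (bits 52-62) can change the magnitude by
# hundreds of orders of magnitude.
x = 3.0
small = abs(flip_bit(x, 0) - x)    # low significand bit: tiny error
large = abs(flip_bit(x, 61) - x)   # high exponent bit: enormous error
```

This bimodal behavior is what makes normalization useful: if inputs are scaled near unit magnitude, the resulting error is either negligible or conspicuously large, and the large case is detectable.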
Listing all triangles is a fundamental graph operation. Triangles can have important interpretations in real-world graphs, especially social and other interaction networks. Despite the lack of provably efficient (linear, or slightly superlinear) worst-case algorithms for this problem, practitioners run simple, efficient heuristics to find all triangles in graphs with millions of vertices. How are these heuristics exploiting the structure of these special graphs to provide major speedups in running time? We study one of the most prevalent algorithms used by practitioners. A trivial algorithm enumerates all paths of length 2, and checks if each such path is incident to a triangle. A good heuristic is to enumerate only those paths of length 2 in which the middle vertex has the lowest degree. It is easily implemented and is empirically known to give remarkable speedups over the trivial algorithm. We study the behavior of this algorithm over graphs with heavy-tailed degree distributions, a defining feature of real-world graphs. The erased configuration model (ECM) efficiently generates a graph with asymptotically (almost) any desired degree sequence. We show that the expected running time of this algorithm over the distribution of graphs created by the ECM is controlled by the ℓ4/3-norm of the degree sequence. Norms of the degree sequence are a measure of the heaviness of the tail, and it is precisely this feature that allows nontrivial speedups of simple triangle enumeration algorithms. As a corollary of our main theorem, we prove expected linear-time performance for degree sequences following a power law with exponent α ≥ 7/3, and nontrivial speedup whenever α ∈ (2, 3).
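The heuristic is easy to state in code. The sketch below is our own minimal rendering of the low-degree-ordering idea, not the authors' implementation: direct each edge from its lower-degree endpoint to its higher-degree endpoint (ties broken by vertex id) and enumerate wedges only at the lowest-ranked vertex of each triangle, so each triangle is found exactly once.

```python
from collections import defaultdict

def triangles_low_degree(adj):
    """Enumerate each triangle once by checking wedges only at the
    lowest-degree vertex of the triangle."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    rank = lambda v: (deg[v], v)  # degree order, vertex id as tiebreak
    # Keep only edges pointing from lower rank to higher rank.
    out = {v: {u for u in nbrs if rank(u) > rank(v)}
           for v, nbrs in adj.items()}
    found = set()
    for v in adj:                      # v: lowest-ranked vertex
        for u in out[v]:
            for w in out[v]:
                if rank(w) > rank(u) and w in out[u]:
                    found.add((v, u, w))
    return found

# 4-clique on {0,1,2,3} plus a pendant vertex 4: C(4,3) = 4 triangles.
adj = defaultdict(set)
for a, b in [(0,1), (0,2), (0,3), (1,2), (1,3), (2,3), (3,4)]:
    adj[a].add(b); adj[b].add(a)
tris = triangles_low_degree(adj)  # 4 triangles
```

Because high-degree hubs rarely serve as the low-ranked middle vertex, the wedge count, and hence the running time, is governed by norms of the degree sequence rather than by the worst-case wedge count.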
Computational science and engineering application programs are typically large, complex, and dynamic, and are often constrained by distribution limitations. As a means of making tractable rapid explorations of scientific and engineering application programs in the context of new, emerging, and future computing architectures, a suite of "miniapps" has been created to serve as proxies for full scale applications. Each miniapp is designed to represent a key performance characteristic that does or is expected to significantly impact the runtime performance of an application program. In this paper we introduce a methodology for assessing the ability of these miniapps to effectively represent these performance issues. We applied this methodology to three miniapps, examining the linkage between them and an application they are intended to represent. Herein we evaluate the fidelity of that linkage. This work represents the initial steps required to begin to answer the question, "Under what conditions does a miniapp represent a key performance characteristic in a full app?"
The computer-aided design (CAD) applications that are fundamental to the electronic design automation industry need to harness the available hardware resources to be able to perform full-chip simulation for modern technology nodes (45nm and below). We will present a hybrid (MPI+threads) approach for parallel transistor-level transient circuit simulation that achieves scalable performance for some challenging large-scale integrated circuits. This approach focuses on the computationally expensive part of the simulator: the linear system solve. Hybrid versions of two iterative linear solver strategies are presented, one takes advantage of block triangular form structure while the other uses a Schur complement technique. Results indicate up to a 27x improvement in total simulation time on 256 cores.
This work proposes a model-reduction methodology that preserves Lagrangian structure and achieves computational efficiency in the presence of high-order nonlinearities and arbitrary parameter dependence. As such, the resulting reduced-order model retains key properties such as energy conservation and symplectic time-evolution maps. We focus on parameterized simple mechanical systems subjected to Rayleigh damping and external forces, and consider an application to nonlinear structural dynamics. To preserve structure, the method first approximates the system's "Lagrangian ingredients"-the Riemannian metric, the potential-energy function, the dissipation function, and the external force-and subsequently derives reduced-order equations of motion by applying the (forced) Euler-Lagrange equation with these quantities. From the algebraic perspective, key contributions include two efficient techniques for approximating parameterized reduced matrices while preserving symmetry and positive definiteness: matrix gappy proper orthogonal decomposition and reduced-basis sparsification. Results for a parameterized truss-structure problem demonstrate the practical importance of preserving Lagrangian structure and illustrate the proposed method's merits: it reduces computation time while maintaining high accuracy and stability, in contrast to existing nonlinear model-reduction techniques that do not preserve structure.
The mathematically correct specification of a fractional differential equation on a bounded domain requires specification of appropriate boundary conditions, or their fractional analogue. This paper discusses the application of nonlocal diffusion theory to specify well-posed fractional diffusion equations on bounded domains.
In k-hypergraph matching, we are given a collection of sets of size at most k, each with an associated weight, and we seek a maximum-weight subcollection whose sets are pairwise disjoint. More generally, in k-hypergraph b-matching, instead of disjointness we require that every element appears in at most b sets of the subcollection. Our main result is a linear-programming based (k - 1 + 1/k)-approximation algorithm for k-hypergraph b-matching. This settles the integrality gap when k is one more than a prime power, since it matches a previously known lower bound. When the hypergraph is bipartite, we are able to improve the approximation ratio to k - 1, which is also best possible relative to the natural LP. These results are obtained using a more careful application of the iterated packing method. Using the bipartite algorithmic integrality gap upper bound, we show that for the family of combinatorial auctions in which anyone can win at most t items, there is a truthful-in-expectation polynomial-time auction that t-approximately maximizes social welfare. We also show that our results directly imply new approximations for a generalization of the recently introduced bounded-color matching problem. We also consider the generalization of b-matching to demand matching, where edges have nonuniform demand values. The best known approximation algorithm for this problem has ratio 2k on k-hypergraphs. We give a new algorithm, based on local ratio, that obtains the same approximation ratio in a much simpler way.
The following work presents an adjoint-based methodology for solving inverse problems in heterogeneous and fractured media using state-based peridynamics. We show that the inner product involving the peridynamic operators is self-adjoint. The proposed method is illustrated for several numerical examples with constant and spatially varying material parameters as well as in the context of fractures. We also present a framework for obtaining material parameters by integrating digital image correlation (DIC) with inverse analysis. This framework is demonstrated by evaluating the bulk and shear moduli for a sample of nuclear graphite using digital photographs taken during the experiment. The resulting measured values correspond well with other results reported in the literature. Lastly, we show that this framework can be used to determine the load state given observed measurements of a crack opening. This type of analysis has many applications in characterizing subsurface stress-state conditions given fracture patterns in cores of geologic material.
A notion of material homogeneity is proposed for peridynamic bodies with variable horizon but constant bulk properties. A relation is derived that scales the force state according to the position-dependent horizon while keeping the bulk properties unchanged. Using this scaling relation, if the horizon depends on position, artifacts called ghost forces may arise in a body under a homogeneous deformation. These artifacts depend on the second derivative of the horizon and can be reduced by employing a modified equilibrium equation using a new quantity called the partial stress. Bodies with piecewise constant horizon can be modeled without ghost forces by using a simpler technique called a splice. As a limiting case of zero horizon, both the partial stress and splice techniques can be used to achieve local-nonlocal coupling. Computational examples, including dynamic fracture in a one-dimensional model with local-nonlocal coupling, illustrate the methods.
In this paper, a series of algorithms are proposed to address the problems in the NASA Langley Research Center Multidisciplinary Uncertainty Quantification Challenge. A Bayesian approach is employed to characterize and calibrate the epistemic parameters based on the available data, whereas a variance-based global sensitivity analysis is used to rank the epistemic and aleatory model parameters. A nested sampling of the aleatory-epistemic space is proposed to propagate uncertainties from model parameters to output quantities of interest.
We develop a computationally less expensive alternative to the direct solution of a large sparse symmetric positive definite system arising from the numerical solution of elliptic partial differential equation models. Our method, substituted factorization, replaces the computationally expensive factorization of certain dense submatrices that arise in the course of direct solution with sparse Cholesky factorization with one or more solutions of triangular systems using substitution. These substitutions fit into the tree structure commonly used by parallel sparse Cholesky, and reduce the initial factorization cost at the expense of a slight increase in the cost of solving for a right-hand side vector. Our analysis shows that substituted factorization reduces the number of floating-point operations for the model k × k 5-point finite-difference problem by 10%, and empirical tests show an execution time reduction of 24.4% on average. On a test suite of three-dimensional problems we observe execution time reductions as high as 51.7%, and 43.1% on average.
This paper describes a method for incorporating a diffusion field that models oxygen usage and dispersion into a multi-scale model of Mycobacterium tuberculosis (Mtb) infection-mediated granuloma formation. We implemented this method over a floating-point field to model oxygen dynamics in host tissue during the chronic-phase response and Mtb persistence. The method avoids the requirement of satisfying the Courant-Friedrichs-Lewy (CFL) condition, which is necessary when implementing the explicit finite-difference method but imposes an impractically small bound on the time step. Instead, diffusion is modeled by a matrix-based, steady-state approximate solution to the diffusion equation. Figure 1 presents the evolution of the diffusion profiles of a containment granuloma over time.
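As a one-dimensional illustration of the matrix-based, steady-state idea (the model above operates on host tissue; the geometry, coefficients, and boundary oxygen level below are hypothetical), a steady diffusion-consumption balance D u'' = k u discretizes to a tridiagonal linear system that can be solved directly, with no CFL-limited time stepping:

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def steady_oxygen(n=101, D=1.0, k=50.0, h=0.01, boundary=1.0):
    """Steady diffusion-consumption: -(D/h^2)(u[i-1] - 2u[i] + u[i+1]) + k*u[i] = 0,
    with the oxygen level `boundary` held fixed at both ends of the domain."""
    w = D / h ** 2
    a = [-w] * n
    b = [2.0 * w + k] * n
    c = [-w] * n
    d = [0.0] * n
    d[0] += w * boundary          # fold the Dirichlet values into the RHS
    d[-1] += w * boundary
    return thomas_solve(a, b, c, d)

u = steady_oxygen()               # oxygen dips toward the domain interior
```

An explicit update of the same problem would require a time step bounded by h²/(2D); the direct solve sidesteps that restriction entirely, which is the motivation stated in the abstract.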
Quantum tomography is used to characterize quantum operations implemented in quantum information processing (QIP) hardware. Traditionally, state tomography has been used to characterize the quantum state prepared in an initialization procedure, while quantum process tomography is used to characterize dynamical operations on a QIP system. As such, tomography is critical to the development of QIP hardware (since it is necessary both for debugging and validating as-built devices, and its results are used to influence the next generation of devices). But tomography suffers from several critical drawbacks. In this report, we present new research that resolves several of these flaws. We describe a new form of tomography called gate set tomography (GST), which unifies state and process tomography, avoids prior methods' critical reliance on precalibrated operations that are not generally available, and can achieve unprecedented accuracies. We report on theory and experimental development of adaptive tomography protocols that achieve far higher fidelity in state reconstruction than non-adaptive methods. Finally, we present a new theoretical and experimental analysis of process tomography on multispin systems, and demonstrate how to more effectively detect and characterize quantum noise using carefully tailored ensembles of input states.
Aleph models continuum electrostatic and steady and transient thermal fields using a finite-element method. Much work has gone into expanding the core solver capability to support enriched modeling consisting of multiple interacting fields, special boundary conditions and two-way interfacial coupling with particles modeled using Aleph's complementary particle-in-cell capability. This report provides quantitative evidence for correct implementation of Aleph's field solver via order-of-convergence assessments on a collection of problems of increasing complexity. It is intended to provide Aleph with a pedigree and to establish a basis for confidence in results for more challenging problems important to Sandia's mission that Aleph was specifically designed to address.
Aleph is an electrostatic particle-in-cell code that uses the finite element method to solve for the electric potential and field based on external potentials and discrete charged particles. The field solver in Aleph was verified for two problems and matched the analytic theory for finite elements. The first problem demonstrated mesh-refinement convergence for a nonlinear field with no particles within the domain, matching the theoretical convergence rates of second order for the potential field and first order for the electric field. The solution for a single particle in an infinite domain was then compared to the analytic solution; this matched the theory of first-order convergence in both the potential and electric fields over a refinement factor of 16. These solutions give confidence that the field solver and charge-weighting schemes are implemented correctly.
Recent cyber security events have demonstrated the need for algorithms that adapt to the rapidly evolving threat landscape of complex network systems. In particular, human analysts often fail to identify data exfiltration when it is encrypted or disguised as innocuous data. Signature-based approaches for identifying data types are easily fooled and analysts can only investigate a small fraction of network events. However, neural networks can learn to identify subtle patterns in a suitably chosen input space. To this end, we have developed a signal processing approach for classifying data files which readily adapts to new data formats. We evaluate the performance for three input spaces consisting of the power spectral density, byte probability distribution and sliding-window entropy of the byte sequence in a file. By combining all three, we trained a deep neural network to discriminate amongst nine common data types found on the Internet with 97.4% accuracy.
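Two of the three input spaces named above, the byte probability distribution and the sliding-window entropy, can be sketched in a few lines; the window and step sizes here are illustrative choices, not the ones used in the study.

```python
import math
from collections import Counter

def byte_distribution(data: bytes):
    """Normalized 256-bin histogram: the byte probability distribution."""
    counts = Counter(data)
    n = len(data)
    return [counts.get(v, 0) / n for v in range(256)]

def sliding_entropy(data: bytes, window=256, step=128):
    """Shannon entropy (bits per byte) over overlapping windows of the file."""
    ent = []
    for start in range(0, max(len(data) - window + 1, 1), step):
        chunk = data[start:start + window]
        probs = [c / len(chunk) for c in Counter(chunk).values()]
        ent.append(-sum(p * math.log2(p) for p in probs))
    return ent

text = b"the quick brown fox jumps over the lazy dog " * 50   # low entropy
rand = bytes(range(256)) * 8       # stand-in for compressed/encrypted data
```

English text sits well below the 8-bits-per-byte ceiling that compressed or encrypted payloads approach, which is what makes features like these discriminative inputs for a classifier.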
Numerous domains, ranging from medical diagnostics to intelligence analysis, involve visual search tasks in which people must find and identify specific items within large sets of imagery. These tasks rely heavily on human judgment, making fully automated systems infeasible in many cases. Researchers have investigated methods for combining human judgment with computational processing to increase the speed at which humans can triage large image sets. One such method is rapid serial visual presentation (RSVP), in which images are presented in rapid succession to a human viewer. While viewing the images and looking for targets of interest, the participant’s brain activity is recorded using electroencephalography (EEG). The EEG signals can be time-locked to the presentation of each image, producing event-related potentials (ERPs) that provide information about the brain’s response to those stimuli. The participants’ judgments about whether or not each set of images contained a target and the ERPs elicited by target and non-target images are used to identify subsets of images that merit close expert scrutiny [1]. Although the RSVP/EEG paradigm holds promise for helping professional visual searchers to triage imagery rapidly, it may be limited by the nature of the target items. Targets that do not vary a great deal in appearance are likely to elicit useable ERPs, but more variable targets may not. In the present study, we sought to extend the RSVP/EEG paradigm to the domain of aviation security screening, and in doing so to explore the limitations of the technique for different types of targets. Professional Transportation Security Officers (TSOs) viewed bag X-rays that were presented using an RSVP paradigm. The TSOs viewed bursts of images containing 50 segments of bag X-rays that were presented for 100 ms each. Following each burst of images, the TSOs indicated whether or not they thought there was a threat item in any of the images in that set. 
EEG was recorded during each burst of images and ERPs were calculated by time-locking the EEG signal to the presentation of images containing threats and matched images that were identical except for the presence of the threat item. Half of the threat items had a prototypical appearance and half did not. We found that the bag images containing threat items with a prototypical appearance reliably elicited a P300 ERP component, while those without a prototypical appearance did not. These findings have implications for the application of the RSVP/EEG technique to real-world visual search domains.
Visual search data describe people’s performance on the common perceptual problem of identifying target objects in a complex scene. Technological advances in areas such as eye tracking now provide researchers with a wealth of data not previously available. The goal of this work is to support researchers in analyzing this complex and multimodal data and in developing new insights into visual search techniques. We discuss several methods drawn from the statistics and machine learning literature for integrating visual search data derived from multiple sources and performing exploratory data analysis. We ground our discussion in a specific task performed by officers at the Transportation Security Administration and consider the applicability, likely issues, and possible adaptations of several candidate analysis methods.
Modern high performance computers connect hundreds of thousands of endpoints and employ thousands of switches. This allows for a great deal of freedom in the design of the network topology. At the same time, due to the sheer numbers and complexity involved, it becomes more challenging to easily distinguish between promising and improper designs. With ever increasing line rates and advances in optical interconnects, there is a need for renewed design methodologies that comprehensively capture the requirements and expose tradeoffs expeditiously in this complex design space. We introduce a systematic approach, based on Generalized Moore Graphs, allowing one to quickly gauge the ideal level of connectivity required for a given number of end-points and traffic hypothesis, and to collect insight on the role of the switch radix in the topology cost. Based on this approach, we present a methodology for the identification of Pareto-optimal topologies. We apply our method to a practical case with 25,000 nodes and present the results.
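The idealized connectivity reference behind Generalized Moore Graphs is the classical Moore bound on how many vertices a graph of given degree (switch radix) and diameter can reach. A short sketch, using the abstract's 25,000-node case with an assumed radix of 32 purely for illustration:

```python
def moore_bound(radix: int, diameter: int) -> int:
    """Largest vertex count reachable by a graph of degree `radix` and the
    given diameter: 1 + d * sum_{i=0}^{k-1} (d - 1)^i."""
    d, k = radix, diameter
    return 1 + d * sum((d - 1) ** i for i in range(k))

def min_diameter(radix: int, n_nodes: int) -> int:
    """Smallest diameter whose Moore bound accommodates n_nodes."""
    k = 1
    while moore_bound(radix, k) < n_nodes:
        k += 1
    return k

# Degree-3, diameter-2 bound is 10, attained by the Petersen graph;
# a radix-32 fabric needs diameter 3 before its bound exceeds 25,000.
```

Comparing a candidate topology's node count against this bound at each diameter is one quick way to gauge how close a design is to the ideal level of connectivity for its radix.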
Performance at Transportation Security Administration (TSA) airport checkpoints must be consistently high to skillfully mitigate national security threats and incidents. To accomplish this, Transportation Security Officers (TSOs) must perform exceptionally well in threat detection, interaction with passengers, and efficiency. It is difficult to measure the human attributes that contribute to high-performing TSOs because attributes such as memory, personality, and competence are inherently latent variables. Cognitive scientists at Sandia National Laboratories have developed a methodology that links TSOs’ cognitive abilities to their performance. This paper discusses how the methodology was developed using a rigorous quantitative process, its strengths and weaknesses, and how it could be generalized to other, non-TSA contexts. The scope of this project is to identify attributes that distinguish high and low TSO performance for checkpoint duties involving direct interaction with people going through the checkpoint.
In this paper we present a localized polynomial chaos expansion for partial differential equations (PDEs) with random inputs. In particular, we focus on time-independent linear stochastic problems with high-dimensional random inputs, where the traditional polynomial chaos methods, and most existing methods, incur prohibitively high simulation cost. The local polynomial chaos method employs a domain decomposition technique to approximate the stochastic solution locally. In each subdomain, a subdomain problem is solved independently and, more importantly, in a much lower-dimensional random space. In a postprocessing stage, accurate samples of the original stochastic problem are obtained from the samples of the local solutions by enforcing the correct stochastic structure of the random inputs and the coupling conditions at the interfaces of the subdomains. Overall, the method is able to solve stochastic PDEs in very large dimensions by solving a collection of low-dimensional local problems and can be highly efficient. In this paper we present the general mathematical framework of the methodology and use numerical examples to demonstrate the properties of the method.
The overall conduct of verification, validation and uncertainty quantification (VVUQ) is discussed through the construction of a workflow relevant to computational modeling, including the turbulence problem in the coarse-grained simulation (CGS) approach. The workflow contained herein is defined at a high level and constitutes an overview of the activity. Nonetheless, the workflow represents an essential activity in predictive simulation and modeling. VVUQ is complex and necessarily hierarchical in nature. The particular characteristics of VVUQ elements depend upon where the VVUQ activity takes place in the overall hierarchy of physics and models. In this chapter, we focus on the differences between and interplay among validation, calibration and UQ, as well as the difference between UQ and sensitivity analysis. The discussion in this chapter is at a relatively high level and attempts to explain the key issues associated with the overall conduct of VVUQ. The intention is that computational physicists can refer to this chapter for guidance regarding how VVUQ analyses fit into their efforts toward conducting predictive calculations.
The impact of automation on human performance has been studied by human factors researchers for over 35 years. One unresolved facet of this research is measurement of the level of automation across and within engineered systems. Repeatable methods of observing, measuring and documenting the level of automation are critical to the creation and validation of generalized theories of automation's impact on the reliability and resilience of human-in-the-loop systems. Numerous qualitative scales for measuring automation have been proposed. However, these methods require subjective assessments based on the researcher's knowledge and experience, or through expert knowledge elicitation involving highly experienced individuals from each work domain. More recently, quantitative scales have been proposed, but have yet to be widely adopted, likely due to the difficulty associated with obtaining a sufficient number of empirical measurements from each system component. Our research suggests the need for a quantitative method that enables rapid measurement of a system's level of automation, is applicable across domains, and can be used by human factors practitioners in field studies or by system engineers as part of their technical planning processes. In this paper we present our research methodology and early research results from studies of electricity grid distribution control rooms. Using a system analysis approach based on quantitative measures of level of automation, we provide an illustrative analysis of select grid modernization efforts. This measure of the level of automation can either be displayed as a static, historical view of the system's automation dynamics (the dynamic interplay between human and automation required to maintain system performance) or be incorporated into real-time visualization systems already present in control rooms.
We describe the computational simulations and damage assessments that we provided in support of a tabletop exercise (TTX) at the request of NASA's Near-Earth Objects Program Office. The overall purpose of the exercise was to assess leadership reactions, information requirements, and emergency management responses to a hypothetical asteroid impact with Earth. The scripted exercise consisted of discovery, tracking, and characterization of a hypothetical asteroid; inclusive of mission planning, mitigation, response, impact to population, infrastructure and GDP, and explicit quantification of uncertainty. Participants at the meeting included representatives of NASA, Department of Defense, Department of State, Department of Homeland Security/Federal Emergency Management Agency (FEMA), and the White House. The exercise took place at FEMA headquarters. Sandia's role was to assist the Jet Propulsion Laboratory (JPL) in developing the impact scenario, to predict the physical effects of the impact, and to forecast the infrastructure and economic losses. We ran simulations using Sandia's CTH hydrocode to estimate physical effects on the ground, and to produce contour maps indicating damage assessments that could be used as input for the infrastructure and economic models. We used the FASTMap tool to provide estimates of infrastructure damage over the affected area, and the REAcct tool to estimate the potential economic severity expressed as changes to GDP (by nation, region, or sector) due to damage and short-term business interruptions.
We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems that arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multigrid (AMG) preconditioner, constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the AMG preconditioner results in faster linear solve times but the ILU preconditioner exhibits better scalability. A weak scalability study is performed on a realistic, moderate resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. Here, we show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the AMG preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the AMG preconditioner.
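The role a preconditioner plays in the iterative solve can be sketched with a small stand-alone example. ILU and AMG require sparse-matrix machinery, so the sketch below substitutes simple Jacobi (diagonal) preconditioning into a conjugate gradient iteration on a deliberately ill-scaled SPD system; the matrix is hypothetical and bears no relation to the ice sheet systems discussed above.

```python
def cg(A, b, precond=None, tol=1e-8, max_iter=2000):
    """Preconditioned conjugate gradient for a dense SPD matrix (lists of
    lists). `precond` maps a residual r to an approximate solution of A z = r."""
    n = len(b)
    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    x = [0.0] * n
    r = b[:]                                   # residual for x0 = 0
    z = precond(r) if precond else r[:]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / sum(pi * qi for pi, qi in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            return x, it
        z = precond(r) if precond else r[:]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Tridiagonal SPD system whose diagonal varies over three orders of
# magnitude -- a crude stand-in for an ill-conditioned discretization.
n = 50
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0 * 10.0 ** (3.0 * i / (n - 1))
    if i > 0:
        A[i][i - 1] = A[i - 1][i] = -1.0
b = [1.0] * n
jacobi = lambda r: [r[i] / A[i][i] for i in range(n)]  # diagonal preconditioner
x_plain, iters_plain = cg(A, b)
x_pc, iters_pc = cg(A, b, precond=jacobi)
```

On this system the diagonal scaling collapses the spectrum toward a narrow band, so the preconditioned solve typically converges in far fewer iterations; ILU and AMG play the same role, at greater setup cost, for the far harder systems arising from floating ice shelves.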
Despite technological advances making computing devices faster, smaller, and more prevalent in today's age, data generation and collection has outpaced data processing capabilities. Simply having more compute platforms does not provide a means of addressing challenging problems in the big data era. Rather, alternative processing approaches are needed, and the application of machine learning to big data is hugely important. The MapReduce programming paradigm is an alternative to conventional supercomputing approaches that imposes less stringent data-passing constraints on problem decompositions. Instead, MapReduce relies upon defining a means of partitioning the desired problem so that subsets may be computed independently and recombined to yield the net desired result. However, not all machine learning algorithms are amenable to such an approach. Game-theoretic algorithms are often innately distributed, consisting of local interactions between players without requiring a central authority, and are iterative by nature rather than requiring extensive retraining. Effectively, a game-theoretic approach to machine learning is well suited to the MapReduce paradigm and provides a novel, alternative perspective on addressing the big data problem. In this paper we present a variant of our Support Vector Machine (SVM) Game classifier which may be used in a distributed manner, and show an illustrative example of applying this algorithm.
Memory failures in future extreme-scale applications are a significant concern in the high-performance computing community and have attracted much research attention. We contend in this paper that using application checkpoint data to detect memory failures has potential benefits and is preferable to examining application memory. To support this contention, we describe the application of machine learning techniques to evaluate the veracity of checkpoint data. Our preliminary results indicate that supervised decision tree machine learning approaches can effectively detect corruption in restart files, suggesting that future extreme-scale applications and systems may benefit from incorporating such approaches in order to cope with memory failures.
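A toy version of the idea, not the paper's actual classifier or feature set: simulate restart-file contents, corrupt some with a single silent bit flip in a high exponent bit, and fit a depth-1 decision tree (a stump) on cheap per-checkpoint summary features. Every name and parameter below is a hypothetical stand-in.

```python
import math
import random
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip a single bit of a double's IEEE-754 representation."""
    (u,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", u ^ (1 << bit)))[0]

def features(values):
    """Cheap per-checkpoint summaries: largest finite magnitude, non-finite count."""
    finite = [v for v in values if math.isfinite(v)]
    return (max((abs(v) for v in finite), default=0.0),
            len(values) - len(finite))

def train_stump(samples):
    """Depth-1 decision tree: exhaustively pick the (feature, threshold)
    pair that best separates clean from corrupted on the training set."""
    best = None
    for f in range(2):
        for t in sorted({s[0][f] for s in samples}):
            acc = sum((s[0][f] > t) == s[1] for s in samples) / len(samples)
            if best is None or acc > best[2]:
                best = (f, t, acc)
    return best

rng = random.Random(1)
samples = []
for corrupted in (False, True) * 100:
    ckpt = [rng.gauss(0.0, 1.0) for _ in range(256)]     # mock restart data
    if corrupted:
        i = rng.randrange(len(ckpt))
        ckpt[i] = flip_bit(ckpt[i], 62)   # silent flip of a high exponent bit
    samples.append((features(ckpt), corrupted))
f, t, acc = train_stump(samples)
```

Even this single-split model catches most exponent-bit flips, since they push a value to an enormous or non-finite magnitude; flips that shrink a value remain invisible to these features, which is why richer features and deeper trees are needed in practice.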
Ultrasonic wave propagation slows as a material is heated. Two factors that characterize material properties are changes in wave speed and energy loss from interactions within the media. Relatively small variations in velocity and attenuation can reveal significant differences in microstructure. This paper presents an overview of experimental techniques that document the changes within a highly attenuative material as it is heated or cooled between 25°C and 90°C. The experimental setup utilizes ultrasonic probes in a through-transmission configuration. The waveforms are recorded and analyzed during thermal experiments. To complement the ultrasonic data, a Discontinuous Galerkin Model (DGM) was also created, which uses unstructured meshes and documents how waves travel in these anisotropic media. This numerical method solves the partial differential equations governing particle motion and outputs a wave trace per unit time. Both experimental and analytical data are compared and presented.