Multi-scale binary permeability field estimation from static and dynamic data is completed using Markov Chain Monte Carlo (MCMC) sampling. The binary permeability field is defined as high-permeability inclusions within a lower-permeability matrix. Static data are obtained as measurements of permeability with support consistent with the coarse-scale discretization. Dynamic data are advective travel times along streamlines calculated through a fine-scale field and averaged for each observation point at the coarse scale. Parameters estimated at the coarse scale (30 x 20 grid) are the spatially varying proportion of the high-permeability phase and the length and aspect ratio of the high-permeability inclusions. From the non-parametric posterior distributions estimated for these parameters, a recently developed sub-grid algorithm is employed to create an ensemble of realizations representing the fine-scale (3000 x 2000) binary permeability field. Each fine-scale ensemble member is instantiated by convolution of an uncorrelated multi-Gaussian random field with a Gaussian kernel defined by the estimated inclusion length and aspect ratio. Since the multi-Gaussian random field is itself a realization of a stochastic process, the procedure for generating fine-scale binary permeability field realizations is also stochastic. Two different methods for performing posterior predictive tests are proposed, and different mechanisms for combining multi-Gaussian random fields with kernels defined from the MCMC sampling are examined. Posterior predictive accuracy of the estimated parameters is assessed against a simulated ground truth for predictions at both the coarse scale (effective permeabilities) and the fine scale (advective travel time distributions). The two techniques for conducting posterior predictive tests are compared by their ability to recover the static and dynamic data. The skill of the inference and of the method for generating fine-scale binary permeability fields is evaluated through flow calculations on the resulting fine-scale realizations, with results compared against those obtained with the ground-truth fine-scale and coarse-scale permeability fields.
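The fine-scale realization step lends itself to a compact illustration. The following is a minimal sketch, assuming an anisotropic Gaussian kernel parameterized by the sampled inclusion length and aspect ratio and a thresholding rule that matches the estimated high-permeability proportion; the function and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def binary_field_realization(shape, length, aspect, proportion, seed=None):
    """Convolve uncorrelated Gaussian noise with an anisotropic Gaussian kernel,
    then threshold so the high-permeability phase occupies the given proportion."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)               # uncorrelated multi-Gaussian field
    sigma = (length, length / aspect)                # kernel widths from length and aspect ratio
    smooth = gaussian_filter(noise, sigma=sigma)     # correlated Gaussian field
    cutoff = np.quantile(smooth, 1.0 - proportion)   # threshold matching the target proportion
    return smooth > cutoff                           # True = high-permeability inclusion

# One member of the fine-scale ensemble (e.g., a 3000 x 2000 grid); values are illustrative.
field = binary_field_realization((3000, 2000), length=40.0, aspect=4.0,
                                 proportion=0.25, seed=0)
```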
While advances in manufacturing enable the fabrication of integrated circuits containing tens to hundreds of millions of devices, the time-sensitive modeling and simulation necessary to design these circuits poses a significant computational challenge. This is especially true for mixed-signal integrated circuits, where detailed performance analyses are necessary for the individual analog/digital circuit components as well as the full system. When the integrated circuit has millions of devices, performing a full system simulation is practically infeasible using currently available Electrical Design Automation (EDA) tools. The principal reason for this is the time required for the nonlinear solver to compute the solutions of large linearized systems during the simulation of these circuits. The research presented in this report aims to address the computational difficulties introduced by these large linearized systems by using Model Order Reduction (MOR) to (i) generate specialized preconditioners that accelerate the computation of the linear system solution and (ii) reduce the overall dynamical system size. MOR techniques attempt to produce macromodels that capture the desired input-output behavior of larger dynamical systems and enable substantial speedups in simulation time. Several MOR techniques that have been developed under the LDRD on 'Solution Methods for Very Highly Integrated Circuits' are presented in this report. Among them are techniques for linear time-invariant dynamical systems that either extend current approaches or improve the time-domain performance of the reduced model using novel error bounds, as well as a new approach for linear time-varying dynamical systems that guarantees dimension reduction, a property that had not been proven before. Progress on preconditioning power grid systems using multi-grid techniques is also presented, along with a framework for delivering MOR techniques to the user community using Trilinos and the Xyce circuit simulator, both prominent world-class software tools.
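For background, a minimal sketch of projection-based reduction of a linear time-invariant descriptor system is given below; the notation is generic and is not specific to the techniques developed under this LDRD.

```latex
% Full-order LTI descriptor system (state dimension n large):
%   E x'(t) = A x(t) + B u(t),   y(t) = C x(t).
% Galerkin projection onto a subspace spanned by V in R^{n x r}, r << n,
% with x(t) ~ V x_r(t), yields the reduced macromodel:
\begin{aligned}
  (V^{\top} E V)\,\dot{x}_r(t) &= (V^{\top} A V)\,x_r(t) + (V^{\top} B)\,u(t),\\
  y_r(t) &= (C V)\,x_r(t).
\end{aligned}
```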
Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement on a successful architecture previously available to many of our codes, thus providing a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, supplemented with micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.
Our ability to field useful, nano-enabled microsystems that capitalize on recent advances in sensor technology is severely limited by the energy density of available power sources. The catalytic nanodiode (reported by Somorjai's group at Berkeley in 2005) was proposed as a potentially revolutionary alternative source of micropower. Their first reports claimed that a sizable fraction of the chemical energy may be harvested via hot electrons (a 'chemicurrent') that are created by the catalytic chemical reaction. We fabricated and tested Pt/GaN nanodiodes, which eventually produced currents up to several microamps. Our best reaction yields (electrons/CO2) were on the order of 10^-3, well below the 75% values first reported by Somorjai (we note they have also been unable to reproduce their early results). Over the course of this project we have determined that the whole concept of 'chemicurrent' may, in fact, be an illusion. Our results conclusively demonstrate that the current measured from our nanodiodes derives from a thermoelectric voltage; we have found no credible evidence for true chemicurrent. Unfortunately, this means that the catalytic nanodiode has no future as a micropower source.
These slides describe different strategies for installing Python software. Although I am a big fan of Python software development, robust strategies for software installation remain a challenge. This talk describes several different installation scenarios. The Good: the user has administrative privileges - Installing on Windows with an installer executable, Installing with a Linux application utility, Installing a Python package from the PyPI repository, and Installing a Python package from source. The Bad: the user does not have administrative privileges - Using a virtual environment to isolate package installations, and Using an installer executable on Windows with a virtual environment. The Ugly: the user needs to install an extension package from source - Installing a Python extension package from source, and PyCoinInstall - Managing builds for Python extension packages. The last item, referring to PyCoinInstall, describes a utility being developed for the COIN-OR software, which is used within the operations research community. COIN-OR includes a variety of Python and C++ software packages, and this script uses a simple plug-in system to support the management of package builds and installation.
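For the 'Bad' scenario (no administrative privileges), a virtual environment can be created entirely from the standard library. The sketch below is illustrative only; it assumes Python 3's built-in venv module and a POSIX layout for the environment's pip executable.

```python
import subprocess
import venv
from pathlib import Path

# Create an isolated environment with pip available inside it.
env_dir = Path("myenv")
venv.create(env_dir, with_pip=True)

# Install a package into the environment only (no admin rights needed).
# On Windows the executable lives under Scripts\ instead of bin/.
pip = env_dir / "bin" / "pip"
subprocess.check_call([str(pip), "install", "requests"])
```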
It is well known that the continuous Galerkin method (in its standard form) is not locally conservative, yet many stabilized methods are constructed by augmenting the standard Galerkin weak form. In particular, the Variational Multiscale (VMS) method has achieved popularity for combating numerical instabilities that arise for mixed formulations that do not otherwise satisfy the LBB condition. Among alternative methods that satisfy local and global conservation, many employ Raviart-Thomas function spaces. The lowest order Raviart-Thomas finite element formulation (RT0) consists of evaluating fluxes over the midpoint of element edges and constant pressures within the element. Although the RT0 element poses many advantages, it has only been shown viable for triangular or tetrahedral elements (quadrilateral variants of this method do not pass the patch test). In the context of heterogeneous materials, both of these methods have been used to model the mixed form of the Darcy equation. This work aims, in a comparative fashion, to evaluate the strengths and weaknesses of either approach for modeling Darcy flow for problems with highly varying material permeabilities and predominantly open flow boundary conditions. Such problems include carbon sequestration and enhanced oil recovery simulations for which the far-field boundary is typically described with some type of pressure boundary condition. We intend to show the degree to which the VMS formulation violates local mass conservation for these types of problems and compare the performance of the VMS and RT0 methods at boundaries between disparate permeabilities.
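For reference, a minimal statement of the mixed (velocity-pressure) Darcy problem targeted by both discretizations is sketched below, in generic notation.

```latex
% Mixed form of the Darcy equation on a domain \Omega with permeability \kappa
% and viscosity \mu; open-flow (pressure) data prescribed on \Gamma_p:
\begin{aligned}
  \mathbf{v} + \frac{\kappa}{\mu}\,\nabla p &= \mathbf{0} && \text{in } \Omega
    && \text{(Darcy's law)},\\
  \nabla \cdot \mathbf{v} &= f && \text{in } \Omega && \text{(mass conservation)},\\
  p &= p_0 && \text{on } \Gamma_p && \text{(pressure boundary condition)}.
\end{aligned}
```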
Significant challenges exist for achieving peak, or even consistent, levels of performance when using IO systems at scale. They stem from sharing IO system resources across the processes of single large-scale applications and/or multiple simultaneous programs, causing internal and external interference, which, in turn, causes substantial reductions in IO performance. This paper presents measurements of interference effects for two different file systems at multiple supercomputing sites. These measurements motivate developing a 'managed' IO approach that uses adaptive algorithms to vary the IO system workload based on current levels and areas of use. An implementation of these methods deployed for the shared, general scratch storage system on Oak Ridge National Laboratory machines achieves higher overall performance and less variability in both a typical usage environment and with artificially introduced levels of 'noise', the latter serving to clearly delineate and illustrate potential problems arising from shared system usage and the advantages derived from actively managing it.
The peridynamic theory is an extension of traditional solid mechanics that treats discontinuous media, including the evolution of discontinuities due to fracture, on the same mathematical basis as classically smooth media. A recent advance in the linearized peridynamic theory permits the reduction of the number of degrees of freedom modeled within a body. Under equilibrium conditions, this coarse graining method exactly reproduces the internal forces on the coarsened degrees of freedom, including the effect of the omitted material that is no longer explicitly modeled. The method applies to heterogeneous as well as homogeneous media and accounts for defects in the material. The coarse graining procedure can be repeated over and over, resulting in a hierarchically coarsened description that, at each stage, continues to reproduce the exact internal forces present in the original, detailed model. Each coarsening step results in reduced computational cost. This talk will describe the new peridynamic coarsening method and show computational examples.
The main result we will present is a 2k-approximation algorithm for the following 'k-hypergraph demand matching' problem: given a set system with sets of size at most k, where sets have profits and demands and vertices have capacities, find a maximum-profit subsystem whose demands do not exceed the capacities. The main tool is an iterative way to explicitly build a decomposition of the fractional optimum as 2k times a convex combination of integral solutions. If time permits, we'll also show how the approach can be extended to a 3-approximation for 2-column-sparse packing. The second result is tight with respect to the integrality gap, and the first is near-tight, as a gap lower bound of 2(k-1+1/k) is known.
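In generic notation, the fractional relaxation underlying the decomposition can be sketched as the following linear program, where p_S and d_S denote the profit and demand of set S and c_v the capacity of vertex v; this is a sketch, not necessarily the exact LP used in the talk.

```latex
\begin{aligned}
  \max \ & \sum_{S} p_S\, x_S \\
  \text{s.t.}\ & \sum_{S \,:\, v \in S} d_S\, x_S \le c_v && \forall\, v, \\
  & 0 \le x_S \le 1 && \forall\, S, \quad |S| \le k .
\end{aligned}
```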
The Python Optimization Modeling Objects (Pyomo) package [1] is an open source tool for modeling optimization applications within Python. Pyomo provides an object-oriented approach to optimization modeling, and it can be used to define symbolic problems, create concrete problem instances, and solve these instances with standard solvers. While Pyomo provides a capability that is commonly associated with algebraic modeling languages such as AMPL, AIMMS, and GAMS, Pyomo's modeling objects are embedded within a full-featured high-level programming language with a rich set of supporting libraries. Pyomo leverages the capabilities of the Coopr software library [2], which integrates Python packages (including Pyomo) for defining optimizers, modeling optimization applications, and managing computational experiments. A central design principle within Pyomo is extensibility. Pyomo is built upon a flexible component architecture [3] that allows users and developers to readily extend the core Pyomo functionality. Through these interface points, extensions and applications can have direct access to an optimization model's expression objects. This facilitates the rapid development and implementation of new modeling constructs as well as high-level solution strategies (e.g. using decomposition- and reformulation-based techniques). In this presentation, we will give an overview of the Pyomo modeling environment and model syntax, and present several extensions to the core Pyomo environment, including support for Generalized Disjunctive Programming (Coopr GDP), Stochastic Programming (PySP), a generic Progressive Hedging solver [4], and a tailored implementation of Benders Decomposition.
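As a brief illustration of the modeling style, the following is a minimal Pyomo sketch of a two-variable linear program; the model and the solver choice (GLPK) are illustrative only and unrelated to the extensions discussed in the presentation.

```python
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, maximize, SolverFactory)

# Build a concrete problem instance directly in Python.
model = ConcreteModel()
model.x = Var(within=NonNegativeReals)
model.y = Var(within=NonNegativeReals)
model.profit = Objective(expr=3 * model.x + 2 * model.y, sense=maximize)
model.capacity = Constraint(expr=model.x + model.y <= 4)

# Solve with any installed LP solver; GLPK is used here as an example.
SolverFactory('glpk').solve(model)
print(model.x.value, model.y.value)
```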
This paper examines potential motivations for incorporating virtualization support in the system software stacks of high-end capability supercomputers. We advocate that this will increase the flexibility of these platforms significantly and enable new capabilities that are not possible with current fixed software stacks. Our results indicate that compute, virtual memory, and I/O virtualization overheads are low and can be further mitigated by utilizing well-known techniques such as large paging and VMM bypass. Furthermore, since the addition of virtualization support does not affect the performance of applications using the traditional native environment, there is essentially no disadvantage to its addition.
We present the results of the first stage of a two-stage evaluation of open source visual analytics packages. This stage is a broad feature comparison over a range of open source toolkits. Although we had originally intended to restrict ourselves to comparing visual analytics toolkits, we quickly found that very few were available. So we expanded our study to include information visualization, graph analysis, and statistical packages. We examine three aspects of each toolkit: visualization functions, analysis capabilities, and development environments. With respect to development environments, we look at platforms, language bindings, multi-threading/parallelism, user interface frameworks, ease of installation, documentation, and whether the package is still being actively developed.
Surfpack is a library of multidimensional function approximation methods useful for efficient surrogate-based sensitivity/uncertainty analysis or calibration/optimization. I will survey current Surfpack meta-modeling capabilities for continuous variables and describe recent progress generalizing to both continuous and categorical factors, including relevant test problems and analysis comparisons.
Recent work on eigenvalues and eigenvectors for tensors of order m >= 3 has been motivated by applications in blind source separation, magnetic resonance imaging, molecular conformation, and more. In this paper, we consider methods for computing real symmetric-tensor eigenpairs of the form Ax^{m-1} = λx subject to ||x|| = 1, which is closely related to optimal rank-1 approximation of a symmetric tensor. Our contribution is a novel shifted symmetric higher-order power method (SS-HOPM), which we show is guaranteed to converge to a tensor eigenpair. SS-HOPM can be viewed as a generalization of the power iteration method for matrices or of the symmetric higher-order power method. Additionally, using fixed point analysis, we can characterize exactly which eigenpairs can and cannot be found by the method. Numerical examples are presented, including examples from an extension of the method to finding complex eigenpairs.
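A minimal sketch of the shifted power iteration for a symmetric order-3 tensor is shown below; the fixed positive shift, random start, and stopping rule are illustrative and do not reproduce the paper's implementation details.

```python
import numpy as np

def ss_hopm(A, alpha=1.0, iters=200, tol=1e-10, seed=0):
    """Shifted symmetric higher-order power iteration (sketch) for a symmetric
    order-3 tensor A of shape (n, n, n): seeks Ax^2 = lambda * x with ||x|| = 1."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        Ax2 = np.einsum('ijk,j,k->i', A, x, x)   # A x^{m-1} for m = 3
        x_new = Ax2 + alpha * x                   # positive shift stabilizes the iteration
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    lam = x @ np.einsum('ijk,j,k->i', A, x, x)   # eigenvalue estimate A x^m
    return lam, x
```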
Laboratories that work with biological agents need to manage their safety risks to persons working in the laboratories and to the human and animal community in the surrounding areas. Biosafety guidance defines a wide variety of biosafety risk mitigation measures, which fall into the following categories: engineering controls, procedural and administrative controls, and the use of personal protective equipment; the determination of which mitigation measures should be used to address the specific laboratory risks is dependent upon a risk assessment. Ideally, a risk assessment should be conducted in a standardized and systematic manner that allows it to be repeatable and comparable. A risk assessment should clearly define the risk being assessed and avoid overcomplication.
This presentation included a discussion of challenges arising in parallel mesh management, as well as demonstrated solutions. It also described the broad range of software for mesh management and modification developed by the Interoperable Technologies for Advanced Petascale Simulations (ITAPS) team, and highlighted applications successfully using the ITAPS tool suite.
The Trilinos Project started approximately nine years ago as a small effort to enable research, development and ongoing support of small, related solver software efforts. The 'Tri' in Trilinos was intended to indicate the eventual three packages we planned to develop. In 2007 the project expanded its scope to include any package that was an enabling technology for technical computing. Presently the Trilinos repository contains over 55 packages covering a broad spectrum of reusable tools for constructing full-featured scalable scientific and engineering applications. Trilinos usage is now worldwide, and many applications have an explicit dependence on Trilinos for essential capabilities. Users come from other US laboratories, universities, industry and international research groups. Awareness and use of Trilinos is growing rapidly outside of Sandia. Members of the external research community are becoming more familiar with Trilinos, its design and collaborative nature. As a result, the Trilinos project is receiving an increasing number of requests from external community members who want to contribute to Trilinos as developers. To date we have worked with external developers in an ad hoc fashion. Going forward, we want to develop a set of policies, procedures, tools and infrastructure to simplify interactions with external developers. As we go forward with multi-laboratory efforts such as CASL and X-Stack, and international projects such as IESP, we will need a more streamlined and explicit process for making external developers 'first-class citizens' in the Trilinos development community. This document is intended to frame the discussion for expanding the Trilinos community to all strategically important external members, while at the same time preserving Sandia's primary leadership role in the project.
This talk highlights some multigrid challenges that arise from several application areas including structural dynamics, fluid flow, and electromagnetics. A general framework is presented to help introduce and understand algebraic multigrid methods based on energy minimization concepts. Connections between algebraic multigrid prolongators and finite element basis functions are explored. It is shown how the general algebraic multigrid framework allows one to adapt multigrid ideas to a number of different situations. Examples are given corresponding to linear elasticity and, specifically, to the solution of linear systems associated with extended finite elements for fracture problems.
The most feasible way to disperse particles in a bulk material or control their packing at a substrate is through fluidization in a carrier that can be processed with well-known techniques such as spin, drip and spray coating, fiber drawing, and casting. The next stage in the processing is often solidification involving drying by solvent evaporation. While there has been significant progress in the past few years in developing discrete element numerical methods to model dense nanoparticle dispersion/suspension rheology that properly treat the hydrodynamic interactions of the solvent, these methods cannot at present account for the volume reduction of the suspension due to solvent evaporation. As part of LDRD project FY-101285 we have developed and implemented methods in the current suite of discrete element methods to remove solvent particles and volume, and hence solvent mass, from the liquid/vapor interface of a suspension to account for volume reduction (solvent drying) effects. To validate the methods, large-scale molecular dynamics simulations have been carried out to follow the evaporation process at the microscopic scale.
The Predictive Capability Maturity Model (PCMM) is a communication tool that must include a discussion of the supporting evidence. PCMM is a tool for managing risk in the use of modeling and simulation. PCMM is in the service of organizing evidence to help tell the modeling and simulation (M&S) story. The PCMM table describes which activities within each element are undertaken at each level of maturity. Target levels of maturity can be established based on the intended application. The assessment is intended to inform what level has been achieved compared to the desired level, to help prioritize the VU activities, and to allocate resources.
In this work, we developed a self-organizing map (SOM) technique for using web-based text analysis to forecast when a group is undergoing a phase change. By 'phase change', we mean that an organization has fundamentally shifted attitudes or behaviors. For instance, when ice melts into water, the characteristics of the substance change. A formerly peaceful group may suddenly adopt violence, or a violent organization may unexpectedly agree to a ceasefire. SOM techniques were used to analyze text obtained from organization postings on the world-wide web. Results suggest it may be possible to forecast phase changes, and determine if an example of writing can be attributed to a group of interest.
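For readers unfamiliar with the technique, a generic self-organizing map training loop is sketched below; the map size, learning-rate schedule, and neighborhood function are illustrative and unrelated to the study's configuration or its text-processing pipeline.

```python
import numpy as np

def train_som(data, rows=10, cols=10, iters=1000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small 2-D self-organizing map on the row vectors in `data`."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.standard_normal((rows, cols, dim))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(iters):
        lr = lr0 * np.exp(-t / iters)          # decaying learning rate
        sigma = sigma0 * np.exp(-t / iters)    # shrinking neighborhood radius
        x = data[rng.integers(len(data))]      # pick a random training vector
        # Best-matching unit: the node whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighborhood around the BMU on the map grid.
        h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1) / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights
```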
Imported oil exacerbates our trade deficit and funds anti-American regimes. Nuclear Energy (NE) is a demonstrated technology with high efficiency. NE's two biggest political detriments are possible accidents and nuclear waste disposal. For NE policy, proliferation is the biggest obstacle. Nuclear waste can be reduced through reprocessing, where fuel rods are separated into various streams, some of which can be reused in reactors. The current process, developed in the 1950s, is dirty and expensive; U/Pu separation is the most critical step. Fuel rods are sheared and dissolved in acid to extract fissile material in a centrifugal contactor. Plants have many contactors in series, along with other separations. We have taken a science- and simulation-based approach to developing a modern reprocessing plant. Models of reprocessing plants are needed to support nuclear materials accountancy, nonproliferation, plant design, and plant scale-up.
This report provides documentation for the completion of the Sandia portion of the ASC Level II Visualization on the platform milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratories and Los Alamos National Laboratory. This milestone contains functionality required for performing visualization directly on a supercomputing platform, which is necessary for petascale visualization. Sandia's contribution concerns in-situ visualization, running a visualization in tandem with a solver. Visualization and analysis of petascale data is limited by several factors which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) performance of interactive rendering, which is the most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processors (GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk, we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next-generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the performance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate, and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. Scientific simulation on parallel supercomputers is traditionally performed in four sequential steps: meshing, partitioning, solver, and visualization. Not all of these components are necessarily run on the supercomputer. In particular, the meshing and visualization typically happen on smaller but more interactive computing resources. However, the previous decade has seen a growth in both the need and the ability to perform scalable parallel analysis, and this gives motivation for coupling the solver and visualization.
The peridynamic model of solid mechanics is a mathematical theory designed to provide consistent mathematical treatment of deformations involving discontinuities, especially cracks. Unlike the partial differential equations (PDEs) of the standard theory, the fundamental equations of the peridynamic theory remain applicable on singularities such as crack surfaces and tips. These basic relations are integro-differential equations that do not require the existence of spatial derivatives of the deformation, or even continuity of the deformation. In the peridynamic theory, material points in a continuous body separated from each other by finite distances can interact directly through force densities. The interaction between each pair of points is called a bond. The dependence of the force density in a bond on the deformation provides the constitutive model for a material. By allowing the force density in a bond to depend on the deformation of other nearby bonds, as well as its own deformation, a wide spectrum of material response can be modelled. Damage is included in the constitutive model through the irreversible breakage of bonds according to some criterion. This criterion determines the critical energy release rate for a peridynamic material. In this talk, we present a general discussion of the peridynamic method and recent progress in its application to penetration and fracture in nonlinearly elastic solids. Constitutive models are presented for rubbery materials, including damage evolution laws. The deformation near a crack tip is discussed and compared with results from the standard theory. Examples demonstrating the spontaneous nucleation and growth of cracks are presented. It is also shown how the method can be applied to anisotropic media, including fiber reinforced composites. Examples show prediction of impact damage in composites and comparison against experimental measurements of damage and delamination.
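For orientation, the basic peridynamic equation of motion, written here in its bond-based form with generic notation, is:

```latex
% Material point x interacts with all points x' within its horizon H_x
% through a pairwise force density f; b is the body force density.
\rho(\mathbf{x})\,\ddot{\mathbf{u}}(\mathbf{x},t)
  = \int_{H_{\mathbf{x}}} \mathbf{f}\big(\mathbf{u}(\mathbf{x}',t)-\mathbf{u}(\mathbf{x},t),\ \mathbf{x}'-\mathbf{x}\big)\, dV_{\mathbf{x}'}
  + \mathbf{b}(\mathbf{x},t)
```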
In a (future) quantum computer a single logical quantum bit (qubit) will be made of multiple physical qubits. These extra physical qubits implement mandatory extensive error checking. The efficiency of error correction will fundamentally influence the performance of a future quantum computer, both in latency/speed and in error threshold (the worst error tolerated for an individual gate). Executing this quantum error correction requires scheduling the individual operations subject to architectural constraints. Since our last talk on this subject, a team of researchers at Sandia National Laboratories has designed a logical qubit architecture that considers all relevant architectural issues including layout, the effects of supporting classical electronics, and the types of gates that the underlying physical qubit implementation supports most naturally. This is a two-dimensional system where 2-qubit operations occur locally, so there is no need to calculate more complex qubit/information transportation. Using integer programming, we found a schedule of qubit operations that obeys the hardware constraints, implements the local-check code in the native gate set, and minimizes qubit idle periods. Even with an optimal schedule, however, parallel Monte Carlo simulation shows that there is no finite error probability for the native gates such that the error-correction system would be beneficial. However, by adding dynamic decoupling, a series of timed pulses that can reverse some errors, we found that there may be a threshold. Thus finding optimal schedules for increasingly-refined scheduling problems has proven critical for the overall design of the logical qubit system. We describe the evolving scheduling problems and the ideas behind the integer programming-based solution methods. This talk assumes no prior knowledge of quantum computing.
Arctic sea ice is an important component of the global climate system, and due to feedback effects the Arctic ice cover is changing rapidly. Predictive mathematical models are of paramount importance for accurate estimates of the future ice trajectory. However, the sea ice components of Global Climate Models (GCMs) vary significantly in their prediction of the future state of Arctic sea ice and have generally underestimated the rate of decline in minimum sea ice extent seen over the past thirty years. One of the contributing factors to this variability is the sensitivity of the sea ice to model physical parameters. A new sea ice model that has the potential to improve sea ice predictions incorporates an anisotropic elastic-decohesive rheology and dynamics solved using the material-point method (MPM), which combines Lagrangian particles for advection with a background grid for gradient computations. We evaluate the variability of the Los Alamos National Laboratory CICE code and the MPM sea ice code for a single-year simulation of the Arctic basin using consistent ocean and atmospheric forcing. Sensitivities of ice volume, ice area, ice extent, root mean square (RMS) ice speed, central Arctic ice thickness, and central Arctic ice speed with respect to ten different dynamic and thermodynamic parameters are evaluated both individually and in combination using the Design Analysis Kit for Optimization and Terascale Applications (DAKOTA). We find similar responses for the two codes and some interesting seasonal variability in the influence of the parameters on the solution.
The two primary objectives of this LDRD project were to create a lightweight kernel (LWK) operating system (OS) designed to take maximum advantage of multi-core processors, and to leverage the virtualization capabilities in modern multi-core processors to create a more flexible and adaptable LWK environment. The most significant technical accomplishments of this project were the development of the Kitten lightweight kernel, the co-development of the SMARTMAP intra-node memory mapping technique, and the development and demonstration of a scalable virtualization environment for HPC. Each of these topics is presented in this report by the inclusion of a published or submitted research paper. The results of this project are being leveraged by several ongoing and new research projects.
MPI is the dominant programming model for distributed memory parallel computers, and is often used as the intra-node programming model on multi-core compute nodes. However, application developers are increasingly turning to hybrid models that use threading within a node and MPI between nodes. In contrast to MPI, most current threaded models do not require application developers to deal explicitly with data locality. With the increasing core counts and deeper NUMA hierarchies seen in the upcoming LANL/SNL 'Cielo' capability supercomputer, data distribution places an upper bound on intra-node scalability within threaded applications. Data locality therefore has to be identified at runtime using static memory allocation policies such as first-touch or next-touch, or specified by the application user at launch time. We evaluate several existing techniques for managing data distribution using micro-benchmarks on an AMD 'Magny-Cours' system with 24 cores among 4 NUMA domains, and argue for the adoption of a dynamic runtime system implemented at the kernel level, employing a novel page table replication scheme to gather per-NUMA-domain memory access traces.
This report is a summary of the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Towards that end, we designed, implemented, and demonstrated a scalable framework for text analysis - ParaText - as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved performance of heterogeneous ensemble models in data classification problems, and the advantages of information theoretic methods in user analysis and interpretation in cross language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently in proposal.
Statistical Latent Dirichlet Analysis produces mixture model data that are geometrically equivalent to points lying on a regular simplex in moderate to high dimensions. Numerous other statistical models and techniques also produce data in this geometric category, even though the meaning of the axes and coordinate values differs significantly. A distance function is used to further analyze these points, for example to cluster them. Several different distance functions are popular amongst statisticians; which distance function is chosen is usually driven by the historical preference of the application domain, information-theoretic considerations, or the desirability of the clustering results. Relatively little consideration is usually given to how distance functions geometrically transform data, or to the distances' algebraic properties. Here we take a look at these issues, in the hope of providing complementary insight and inspiring further geometric thought. Several popular distances - chi-squared, Jensen-Shannon divergence, and the square of the Hellinger distance - are shown to be nearly equivalent, both in terms of functional forms after transformations, factorizations, and series expansions, and in terms of the shape and proximity of constant-value contours. This is somewhat surprising given that their original functional forms look quite different. Cosine similarity is the square of the Euclidean distance, and a similar geometric relationship is shown with Hellinger and another cosine. We suggest a geodesic variation of Hellinger. The square-root projection that arises in Hellinger distance is briefly compared to standard normalization for Euclidean distance. We include detailed derivations of some ratio and difference bounds for illustrative purposes. We provide some constructions that nearly achieve the worst-case ratios, relevant for contours.
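For concreteness, minimal implementations of the three distances discussed above are sketched here; normalization conventions vary in the literature, so the constant factors below are one common choice rather than necessarily those used in this work.

```python
import numpy as np

def hellinger_sq(p, q):
    # Squared Hellinger distance between probability vectors p and q;
    # the 1/2 normalization is one common convention.
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence using the mixture m = (p + q) / 2.
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def chi2_sym(p, q, eps=1e-12):
    # Symmetric chi-squared distance; conventions differ by constant factors.
    return np.sum((p - q) ** 2 / (p + q + eps))
```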
While individual neurons function at relatively low firing rates, naturally-occurring nervous systems not only surpass manmade systems in computing power, but accomplish this feat using relatively little energy. It is asserted that the next major breakthrough in computing power will be achieved through application of neuromorphic approaches that mimic the mechanisms by which neural systems integrate and store massive quantities of data for real-time decision making. The proposed LDRD provides a conceptual foundation for SNL to make unique advances toward exascale computing. First, a team consisting of experts from the HPC, MESA, cognitive and biological sciences and nanotechnology domains will be coordinated to conduct an exercise with the outcome being a concept for applying neuromorphic computing to achieve exascale computing. It is anticipated that this concept will involve innovative extension and integration of SNL capabilities in MicroFab, material sciences, high-performance computing, and modeling and simulation of neural processes/systems.
This report summarizes activities undertaken during FY08-FY10 for the LDRD 'Peridynamics as a Rigorous Coarse-Graining of Atomistics for Multiscale Materials Design'. The goal of our project was to develop a coarse-graining of finite temperature molecular dynamics (MD) that successfully transitions from statistical mechanics to continuum mechanics. Our coarse-graining overcomes the intrinsic limitation of coupling atomistics with classical continuum mechanics via the FEM (finite element method), SPH (smoothed particle hydrodynamics), or MPM (material point method); namely, that classical continuum mechanics assumes a local force interaction that is incompatible with the nonlocal force model of atomistic methods. Therefore FEM, SPH, and MPM inherit this limitation. This seemingly innocuous dichotomy has far-reaching consequences; for example, classical continuum mechanics cannot resolve the short wavelength behavior associated with atomistics. Other consequences include spurious forces, invalid phonon dispersion relationships, and irreconcilable descriptions/treatments of temperature. We propose a statistically based coarse-graining of atomistics via peridynamics and so develop a first-of-a-kind mesoscopic capability to enable consistent, thermodynamically sound, atomistic-to-continuum (AtC) multiscale material simulation. Peridynamics (PD) is a microcontinuum theory that assumes nonlocal forces for describing long-range material interaction. The force interactions occurring at finite distances are naturally accounted for in PD. Moreover, PD's nonlocal force model is entirely consistent with those used by atomistic methods, in stark contrast to classical continuum mechanics. Hence, PD can be employed for mesoscopic phenomena that are beyond the realms of classical continuum mechanics and atomistic simulations, e.g., molecular dynamics and density functional theory (DFT). The latter two atomistic techniques are handicapped by the onerous length and time scales associated with simulating mesoscopic materials. Simulating such mesoscopic materials is likely to require, and greatly benefit from, multiscale simulations coupling DFT, MD, PD, and explicit transient dynamic finite element methods (e.g., Presto). The proposed work fills the gap needed to enable multiscale materials simulations.
This report is a summary of the accomplishments of the 'Leveraging Multi-way Linkages on Heterogeneous Data' LDRD, which ran from FY08 through FY10. The goal was to investigate scalable and robust methods for multi-way data analysis. We developed a new optimization-based method called CPOPT for fitting a particular type of tensor factorization to data; CPOPT was compared against existing methods and found to be more accurate than any faster method and faster than any equally accurate method. We extended this method to computing tensor factorizations for problems with incomplete data; our results show that one can recover scientifically meaningful factorizations with large amounts of missing data (50% or more). The project has involved 5 members of the technical staff, 2 postdocs, and 1 summer intern. It has resulted in a total of 13 publications, 2 software releases, and over 30 presentations. Several follow-on projects have already begun, with more potential projects in development.
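Assuming the factorization in question is of CANDECOMP/PARAFAC (CP) type, the least-squares fitting problem for a third-order data tensor can be sketched in generic notation as:

```latex
% Rank-R CP model fit to a third-order tensor X; a_r, b_r, c_r are factor
% vectors, \lambda_r are scaling weights, and \circ denotes the outer product.
\min_{\lambda, A, B, C}\ \Big\lVert \mathcal{X} \;-\; \sum_{r=1}^{R} \lambda_r\,
  \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \Big\rVert_F^2
```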
The fundamental ideas of the high order compact method are combined with the generalized finite difference method. The result is a finite difference method that works on unstructured, nonuniform grids, and is more accurate than one would classically expect from the number of grid points employed.
The neocortex is perhaps the highest region of the human brain, where auditory and visual perception take place along with many important cognitive functions. An important research goal is to describe the mechanisms implemented by the neocortex. There is an apparent regularity in the structure of the neocortex [Brodmann 1909, Mountcastle 1957] which may help simplify this task. The work reported here addresses the problem of how to describe the putative repeated units ('cortical circuits') in a manner that is easily understood and manipulated, with the long-term goal of developing a mathematical and algorithmic description of their function. The approach is to reduce each algorithm to an enhanced perceptron-like structure and describe its computation using difference equations. We organize this algorithmic processing into larger structures based on physiological observations, and implement key modeling concepts in software which runs on parallel computing hardware.
QMU stands for 'Quantification of Margins and Uncertainties'. QMU is a basic framework for consistency in integrating simulation, data, and/or subject matter expertise to provide input into a risk-informed decision-making process. QMU is being applied to a wide range of NNSA stockpile issues, from performance to safety. The implementation of QMU varies with lab and application focus. The Advanced Simulation and Computing (ASC) Program develops validated computational simulation tools to be applied in the context of QMU. QMU provides input into a risk-informed decision making process. The completeness aspect of QMU can benefit from the structured methodology and discipline of quantitative risk assessment (QRA)/probabilistic risk assessment (PRA). In characterizing uncertainties it is important to pay attention to the distinction between those arising from incomplete knowledge ('epistemic' or systematic), and those arising from device-to-device variation ('aleatory' or random). The national security labs should investigate the utility of a probability of frequency (PoF) approach in presenting uncertainties in the stockpile. A QMU methodology is connected if the interactions between failure modes are included. The design labs should continue to focus attention on quantifying uncertainties that arise from epistemic uncertainties such as poorly-modeled phenomena, numerical errors, coding errors, and systematic uncertainties in experiment. The NNSA and design labs should ensure that the certification plan for any RRW is supported by strong, timely peer review and by an ongoing, transparent QMU-based documentation and analysis in order to permit a confidence level necessary for eventual certification.
Training simulators have become increasingly popular tools for instructing humans on performance in complex environments. However, how to provide individualized and scenario-specific assessment and feedback to students remains largely an open question. In this work, we follow up on previous evaluations of the Automated Expert Modeling and Automated Student Evaluation (AEMASE) system, which automatically assesses student performance based on observed examples of good and bad performance in a given domain. The current study provides a rigorous empirical evaluation of the enhanced training effectiveness achievable with this technology. In particular, we found that students given feedback via the AEMASE-based debrief tool performed significantly better than students given only instructor feedback on two out of three domain-specific performance metrics.
Training simulators have become increasingly popular tools for instructing humans on performance in complex environments. However, how to provide individualized and scenario-specific assessment and feedback to students remains largely an open question. To maximize training efficiency, new technologies are required that assist instructors in providing individually relevant instruction. Sandia National Laboratories has shown the feasibility of automated performance assessment tools, such as the Sandia-developed Automated Expert Modeling and Student Evaluation (AEMASE) software, through proof-of-concept demonstrations, a pilot study, and an experiment. In the pilot study, the AEMASE system, which automatically assesses student performance based on observed examples of good and bad performance in a given domain, achieved a high degree of agreement with a human grader (89%) in assessing tactical air engagement scenarios. In more recent work, we found that AEMASE achieved a high degree of agreement with human graders (83-99%) for three Navy E-2 domain-relevant performance metrics. The current study provides a rigorous empirical evaluation of the enhanced training effectiveness achievable with this technology. In particular, we assessed whether giving students feedback based on automated metrics would enhance training effectiveness and improve student performance. We trained two groups of employees (differentiated by type of feedback) on a Navy E-2 simulator and assessed their performance on three domain-specific performance metrics. We found that students given feedback via the AEMASE-based debrief tool performed significantly better than students given only instructor feedback on two out of three metrics. Future work will focus on extending these developments for automated assessment of teamwork.
The peridynamic model of solid mechanics treats internal forces within a continuum through interactions across finite distances. These forces are determined through a constitutive model that, in the case of an elastic material, permits the strain energy density at a point to depend on the collective deformation of all the material within some finite distance of it. The forces between points are evaluated from the Frechet derivative of this strain energy density with respect to the deformation map. The resulting equation of motion is an integro-differential equation written in terms of these interparticle forces, rather than the traditional stress tensor field. Recent work on peridynamics has elucidated the energy balance in the presence of these long-range forces. We have derived the appropriate analogue of stress power, called absorbed power, that leads to a satisfactory definition of internal energy. This internal energy is additive, allowing us to meaningfully define an internal energy density field in the body. An expression for the local first law of thermodynamics within peridynamics combines this mechanical component, the absorbed power, with heat transport. The global statement of the energy balance over a subregion can be expressed in a form in which the mechanical and thermal terms contain only interactions between the interior of the subregion and the exterior, in a form anticipated by Noll in 1955. The local form of this first law within peridynamics, coupled with the second law as expressed in the Clausius-Duhem inequality, is amenable to the Coleman-Noll procedure for deriving restrictions on the constitutive model for thermomechanical response. Using an idea suggested by Fried in the context of systems of discrete particles, this procedure leads to a dissipation inequality for peridynamics that has a surprising form. It also leads to a thermodynamically consistent way to treat damage within the theory, shedding light on how damage, including the nucleation and advance of cracks, should be incorporated into a constitutive model.
Data-Intensive Computing is parallel computing where algorithms and software are designed around efficient access and traversal of a data set, and where hardware requirements are dictated by data size as much as by desired run times, usually distilling compact results from massive data.
This LDRD 149045 final report describes work that Sandians Scott A. Mitchell, Randall Laviolette, Shawn Martin, Warren Davis, Cindy Philips and Danny Dunlavy performed in 2010. Prof. Afra Zomorodian provided insight. This was a small late-start LDRD. Several other ongoing efforts were leveraged, including the Networks Grand Challenge LDRD and the Computational Topology CSRF project; some of the leveraged work is described here. We proposed a sentence mining technique that exploited both the distribution and the order of parts-of-speech (POS) in sentences in English language documents. The ultimate goal was to be able to discover 'call-to-action' framing documents hidden within a corpus of mostly expository documents, even if the documents were all on the same topic and used the same vocabulary. Using POS was novel. We also took a novel approach to analyzing POS. We used the hypothesis that English follows a dynamical system and the POS are trajectories from one state to another. We analyzed the sequences of POS using support vector machines and the cycles of POS using computational homology. We discovered that the POS were a very weak signal and did not support our hypothesis well. Our original goal appeared to be unobtainable with our original approach. We turned our attention to studying an aspect of a more traditional approach to distinguishing documents. Latent Dirichlet Allocation (LDA) turns documents into bags-of-words and then into mixture-model points. A distance function is used to cluster groups of points to discover relatedness between documents. We performed a geometric and algebraic analysis of the most popular distance functions and made some significant and surprising discoveries, described in a separate technical report.
We implemented two numerical simulation capabilities essential to reliably predicting the effect of non-ideal explosives (NXs). To begin to be able to treat the multiple, competing, multi-step reaction paths and slower kinetics of NXs, Sandia's CTH shock physics code was extended to include the TIGER thermochemical equilibrium solver as an in-line routine. To facilitate efficient exploration of reaction pathways that need to be identified for the CTH simulations, we implemented in Sandia's LAMMPS molecular dynamics code the MSST method, which is a reactive molecular dynamics technique for simulating steady shock wave response. Our preliminary demonstrations of these two capabilities serve several purposes: (i) they demonstrate proof-of-principle for our approach; (ii) they provide illustration of the applicability of the new functionality; and (iii) they begin to characterize the use of the new functionality and identify where improvements will be needed for the ultimate capability to meet national security needs. Next steps are discussed.
The simplest conceptual model of cybersecurity implicitly views attackers and defenders as acting in isolation from one another: an attacker seeks to penetrate or disrupt a system that has been protected to a given level, while a defender attempts to thwart particular attacks. Such a model also views all non-malicious parties as having the same goal of preventing all attacks. But in fact, attackers and defenders are interacting parts of the same system, and different defenders have their own individual interests: defenders may be willing to accept some risk of successful attack if the cost of defense is too high. We have used game theory to develop models of how non-cooperative but non-malicious players in a network interact when there is a substantial cost associated with effective defensive measures. Although game theory has been applied in this area before, we have introduced some novel aspects of player behavior in our work, including: (1) A model of how players attempt to avoid the costs of defense and force others to assume these costs; (2) A model of how players interact when the cost of defending one node can be shared by other nodes; and (3) A model of the incentives for a defender to choose less expensive, but less effective, defensive actions.
This report describes the specification of a challenge problem and associated challenge milestones for the Waste Integrated Performance and Safety Codes (IPSC) supporting the U.S. Department of Energy (DOE) Office of Nuclear Energy Advanced Modeling and Simulation (NEAMS) Campaign. The NEAMS challenge problems are designed to demonstrate proof of concept and progress towards IPSC goals. The goal of the Waste IPSC is to develop an integrated suite of modeling and simulation capabilities to quantitatively assess the long-term performance of waste forms in the engineered and geologic environments of a radioactive waste storage or disposal system. The Waste IPSC will provide this simulation capability (1) for a range of disposal concepts, waste form types, engineered repository designs, and geologic settings, (2) for a range of time scales and distances, (3) with appropriate consideration of the inherent uncertainties, and (4) in accordance with robust verification, validation, and software quality requirements. To demonstrate proof of concept and progress towards these goals and requirements, a Waste IPSC challenge problem is specified that includes coupled thermal-hydrologic-chemical-mechanical (THCM) processes that describe (1) the degradation of a borosilicate glass waste form and the corresponding mobilization of radionuclides (i.e., the processes that produce the radionuclide source term), (2) the associated near-field physical and chemical environment for waste emplacement within a salt formation, and (3) radionuclide transport in the near field (i.e., through the engineered components - waste form, waste package, and backfill - and the immediately adjacent salt). The initial details of a set of challenge milestones that collectively comprise the full challenge problem are also specified.
Scientific applications use highly specialized data structures that require complex, latency-sensitive graphs of integer instructions for memory address calculations. Working with the University of Wisconsin, we have demonstrated significant differences between Sandia's applications and the industry-standard SPEC-FP (Standard Performance Evaluation Corporation floating point) suite. Specifically, integer dataflow performance is critical to overall system performance. To improve this performance, we have developed a configurable functional unit design that is capable of accelerating integer dataflow.
Diffusion, lossy wave, and Klein-Gordon equations find numerous applications in practical problems across a range of diverse disciplines. The temporal dependence of all three Green's functions is characterized by an infinite tail. This implies that the cost complexity of the spatio-temporal convolutions, associated with evaluating the potentials, scales as O(N_s^2 N_t^2), where N_s and N_t are the number of spatial and temporal degrees of freedom, respectively. In this paper, we discuss two new methods to rapidly evaluate these spatio-temporal convolutions by exploiting their block-Toeplitz nature within the framework of accelerated Cartesian expansions (ACE). The first scheme identifies a convolution relation in time amongst ACE harmonics, and the fast Fourier transform (FFT) is used for efficient evaluation of these convolutions. The second method exploits the rank deficiency of the ACE translation operators with respect to time and develops a recursive numerical compression scheme for the efficient representation and evaluation of temporal convolutions. It is shown that the cost of both methods scales as O(N_s N_t log^2 N_t). Furthermore, several numerical results are presented for the diffusion equation to validate the accuracy and efficacy of the fast algorithms developed here.
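In generic notation, the potentials in question are space-time convolutions of the form below; direct evaluation over N_s spatial and N_t temporal samples costs O(N_s^2 N_t^2), which the ACE-based schemes reduce to O(N_s N_t log^2 N_t). The symbols G (Green's function) and q (source density) are illustrative.

```latex
u(\mathbf{r},t) \;=\; \int_0^{t}\!\!\int_{\Omega}
  G(\mathbf{r}-\mathbf{r}',\, t-t')\, q(\mathbf{r}',t')\, d\mathbf{r}'\, dt'
```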
The shallow water equations are used as a test for many atmospheric models because the solution mimics the horizontal aspects of atmospheric dynamics while the simplicity of the equations makes them useful for numerical experiments. This study describes a high-order element-based Galerkin method for the global shallow water equations using absolute vorticity, divergence, and fluid depth (atmospheric thickness) as the prognostic variables, while the wind field is a diagnostic variable that can be calculated from the stream function and velocity potential (the Laplacians of which are the vorticity and divergence, respectively). The numerical method employed to solve the shallow water system is based on the discontinuous Galerkin and spectral element methods. The discontinuous Galerkin method, which is inherently conservative, is used to solve the equations governing two conservative variables - absolute vorticity and atmospheric thickness (mass). The spectral element method is used to solve the divergence equation and the Poisson equations for the velocity potential and the stream function. Time integration is done with an explicit strong stability-preserving second-order Runge-Kutta scheme, with the wind field updated from the vorticity and divergence at each stage, and the computational domain is the cubed sphere. A stable steady-state test is run and convergence results are provided, showing that the method is high-order accurate. Additionally, two tests without analytic solutions are run with comparable results to previous high-resolution runs found in the literature.
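To make the time-integration step concrete, here is a hedged Python sketch of the two-stage, second-order strong stability-preserving Runge-Kutta (SSP-RK2, Heun-type) update for a generic semi-discrete system du/dt = R(u); the right-hand side below is a placeholder linear operator standing in for the DG/spectral-element operators, and the diagnostic wind update is only indicated in the comments.

```python
import numpy as np

def ssp_rk2_step(u, dt, rhs):
    """One SSP-RK2 (Heun) step for du/dt = rhs(u):
         u1    = u + dt * rhs(u)
         u_new = 0.5 * u + 0.5 * (u1 + dt * rhs(u1))
    In the shallow water setting, the diagnostic wind would be recomputed from
    the updated vorticity/divergence before each rhs evaluation."""
    u1 = u + dt * rhs(u)
    return 0.5 * u + 0.5 * (u1 + dt * rhs(u1))

# Placeholder right-hand side (a linear test operator), not the shallow water operator.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
rhs = lambda u: A @ u
u = np.array([1.0, 0.0])
for _ in range(100):
    u = ssp_rk2_step(u, 1.0e-2, rhs)
```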
The outline of this presentation is: (1) High-level view of Zoltan; (2) Requirements, data models, and interface; (3) Load Balancing and Partitioning; (4) Matrix Ordering, Graph Coloring; (5) Utilities; (6) Isorropia; and (7) Zoltan2.
The objectives of this presentation are: (1) Learn how to partition a problem using Zoltan; (2) Understand the following: (a) Basic process of partitioning with Zoltan, (b) Setting Zoltan parameters, (c) Registering query functions, (d) Writing query functions, (e) Zoltan_LB_Partition and its input/output; and (3) Be able to integrate Zoltan into your own applications.
We fabricated a split-gate defined point contact in a double gate enhancement mode Si-MOS device, and implanted Sb donor atoms using a self-aligned process. E-beam lithography in combination with a timed implant gives us excellent control over the placement of dopant atoms, and acts as a stepping stone to focused ion beam implantation of single donors. Our approach allows us considerable latitude in experimental design in-situ. We have identified two resonance conditions in the point contact conductance as a function of split gate voltage. Using tunneling spectroscopy, we probed their electronic structure as a function of temperature and magnetic field. We also determined the capacitive coupling between the resonant feature and several gates. Comparison between experimental values and extensive quasi-classical simulations constrains the location and energy of the resonant level. We discuss our results and how they may apply to resonant tunneling through a single donor.
The cubed sphere geometry, obtained by inscribing a cube in a sphere and mapping points between the two surfaces using a gnomonic (central) projection, is commonly used in atmospheric models because it is free of polar singularities and is well-suited for parallel computing. Global meshes on the cubed-sphere typically project uniform (square) grids from each face of the cube onto the sphere, and if refinement is desired then it is done with non-conforming meshes - overlaying the area of interest with a finer uniform mesh, which introduces so-called hanging nodes on edges along the boundary of the fine resolution area. An alternate technique is to tile each face of the cube with quadrilaterals without requiring the quads to be rectangular. These meshes allow for refinement in areas of interest with a conforming mesh, providing a smoother transition between high and low resolution portions of the grid than non-conforming refinement. The conforming meshes are demonstrated in HOMME, NCAR's High Order Method Modeling Environment, where two modifications have been made: the dependence on uniform meshes has been removed, and the ability to read arbitrary quadrilateral meshes from a previously-generated file has been added. Numerical results come from a conservative spectral element method modeling a selection of the standard shallow water test cases.
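For readers unfamiliar with the gnomonic (central) projection underlying the cubed sphere, the following Python sketch maps a point on one cube face to the unit sphere by radial normalization along the ray from the cube center. The face orientation convention chosen here (the +z face) is an assumption for illustration only, not the HOMME convention.

```python
import numpy as np

def gnomonic_to_sphere(x, y):
    """Map local coordinates (x, y) in [-1, 1] on the +z face of a unit cube to the
    unit sphere by projecting along the ray from the cube center (gnomonic map).
    Other faces are handled by analogous rotations of this construction."""
    p = np.array([x, y, 1.0])            # point on the cube face (assumed orientation)
    return p / np.linalg.norm(p)         # central projection onto the sphere

# A uniform grid on the face maps to a non-uniform, singularity-free grid on the sphere.
xs = np.linspace(-1.0, 1.0, 5)
pts = np.array([gnomonic_to_sphere(x, y) for x in xs for y in xs])
assert np.allclose(np.linalg.norm(pts, axis=1), 1.0)
```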
Periodic, coordinated checkpointing to disk is the most prevalent fault tolerance method used in modern large-scale, capability class, high-performance computing (HPC) systems. Previous work has shown that as the system grows in size, the inherent synchronization of coordinated checkpoint/restart (CR) limits application scalability; at large node counts the application spends most of its time checkpointing instead of executing useful work. Furthermore, a single component failure forces an application restart from the last correct checkpoint. Suggested alternatives to coordinated CR include uncoordinated CR with message logging, redundant computation, and RAID-inspired, in-memory distributed checkpointing schemes. Each of these alternatives has differing overheads that depend on both the scale and communication characteristics of the application. In this work, using the Structural Simulation Toolkit (SST) simulator, we compare the performance characteristics of each of these resilience methods for a number of HPC application patterns on a number of proposed exascale machines. The results of this work provide valuable guidance on the most efficient resilience methods for exascale systems.
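As a back-of-the-envelope companion to the simulation study (this analytic model is not part of the SST work described above), the classical Young/Daly estimate for the optimal coordinated checkpoint interval, t_opt ≈ sqrt(2 · delta · MTBF) with delta the checkpoint commit time, illustrates why coordinated checkpoint overhead grows as the system-level MTBF shrinks at scale. The numbers below are hypothetical.

```python
import math

def young_daly_interval(checkpoint_time_s, system_mtbf_s):
    """Young/Daly first-order estimate of the optimal time between coordinated
    checkpoints; assumes the checkpoint time is much smaller than the MTBF."""
    return math.sqrt(2.0 * checkpoint_time_s * system_mtbf_s)

# Hypothetical numbers: a 10-minute checkpoint on systems whose aggregate MTBF
# shrinks from one day to one hour as the node count grows.
for mtbf_hours in (24.0, 8.0, 1.0):
    t_opt = young_daly_interval(600.0, mtbf_hours * 3600.0)
    overhead = 600.0 / t_opt   # rough fraction of time spent writing checkpoints
    print(f"MTBF {mtbf_hours:5.1f} h -> checkpoint every {t_opt/60:6.1f} min, ~{overhead:.0%} overhead")
```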
Trilinos is an object-oriented software framework that enables the solution of large-scale, complex multiphysics engineering and scientific problems. Different Trilinos packages build on each other to create a stack providing the necessary capability: (1) Non-linear solver; (2) Linear solver/preconditioner; (3) Distributed linear algebra; and (4) Local linear algebra.
Enumerating triangles (3-cycles) in graphs is a kernel operation for social network analysis. For example, many community detection methods depend upon finding common neighbors of two related entities. We consider Cohen's simple and elegant solution for listing triangles: give each node a 'bucket.' Place each edge into the bucket of its endpoint of lowest degree, breaking ties consistently. Each node then checks each pair of edges in its bucket, testing for the adjacency that would complete that triangle. Cohen presents an informal argument that his algorithm should run well on real graphs. We formalize this argument by providing an analysis for the expected running time on a class of random graphs, including power law graphs. We consider a rigorously defined method for generating a random simple graph, the erased configuration model (ECM). In the ECM each node draws a degree independently from a marginal degree distribution, endpoints pair randomly, and we erase self loops and multiedges. If the marginal degree distribution has a finite second moment, it follows immediately that Cohen's algorithm runs in expected linear time. Furthermore, it can still run in expected linear time even when the degree distribution has such a heavy tail that the second moment is not finite. We prove that Cohen's algorithm runs in expected linear time when the marginal degree distribution has finite 4/3 moment and no vertex has degree larger than √n. In fact we give the precise asymptotic value of the expected number of edge pairs per bucket. A finite 4/3 moment is required; if it is unbounded, then so is the number of pairs. The marginal degree distribution of a power law graph has bounded 4/3 moment when its exponent α is more than 7/3. Thus for this class of power law graphs, with degree at most √n, Cohen's algorithm runs in expected linear time. This is precisely the value of α for which the clustering coefficient tends to zero asymptotically, and it is in the range that is relevant for the degree distribution of the World-Wide Web.
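A minimal Python sketch of the bucketing algorithm analyzed above, assuming an undirected simple graph given as an adjacency-set dictionary; tie-breaking on degree uses the node label, which is one consistent choice among several.

```python
def list_triangles(adj):
    """Cohen's triangle listing: each edge goes into the bucket of its lower-degree
    endpoint (ties broken by node label); each node then tests every pair of edges
    in its bucket for the closing edge."""
    rank = lambda v: (len(adj[v]), v)          # consistent tie-breaking
    bucket = {v: [] for v in adj}
    for u in adj:
        for v in adj[u]:
            if rank(u) < rank(v):              # place each edge exactly once
                bucket[u].append(v)
    triangles = []
    for u, nbrs in bucket.items():
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                v, w = nbrs[i], nbrs[j]
                if w in adj[v]:                # adjacency test closes the triangle
                    triangles.append((u, v, w))
    return triangles

# Small example: a 4-clique contains exactly 4 triangles.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
assert len(list_triangles(adj)) == 4
```

The expected work is governed by the number of edge pairs per bucket, which is exactly the quantity bounded in the analysis above.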
Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems in areas including decision making, risk assessment, social network analysis, intelligence analysis, scholarly research and others. However, as data sizes continue to grow in these areas, scalable processing, modeling, and semantic analysis of text collections becomes essential. In this paper, we present the ParaText text analysis engine, a distributed memory software framework for processing, modeling, and analyzing collections of unstructured text documents. Results on several document collections using hundreds of processors are presented to illustrate the flexibility, extensibility, and scalability of the entire process of text modeling from raw data ingestion to application analysis.
The aim of this project is to develop low-dimension parametric (deterministic) models of complex networks, using compressive sensing (CS) and multiscale analysis to do so, and to exploit the structure of complex networks (some are self-similar under coarsening). CS provides a new way of sampling and reconstructing networks. The approach is based on a multiresolution decomposition of the adjacency matrix and its efficient sampling. It requires preprocessing of the adjacency matrix to make it 'blocky', which is the biggest (combinatorial) algorithmic challenge. The current CS reconstruction algorithm makes no use of the structure of the graph; it is very general and therefore not especially efficient or customized. Other model-based CS techniques exist, but they have not yet been adapted to networks. An obvious starting point for future work is to increase the efficiency of reconstruction.
The present paper is the second in a series published at I/ITSEC that seeks to explain the efficacy of multi-role experiential learning employed to create engaging game-based training methods transitioned to the U.S. Army, U.S. Army Special Forces, Civil Affairs, and Psychological Operations teams. The first publication (I/ITSEC 2009) summarized findings from a quantitative study that investigated experiential learning in the multi-player, PC-based game module transitioned to PEO-STRI, DARWARS Ambush! NK (non-kinetic). The 2009 publication found that participants of multi-role (Player and Reflective Observer/Evaluator) game-based training reported statistically significant learning and engagement. Additionally, when the means of the two groups (Player and Reflective Observer/Evaluator) were compared, they were not statistically significantly different from each other. That is to say that both playing and observing/evaluating were engaging learning modalities. The Observer/Evaluator role was designed to provide an opportunity for real-time reflection and meta-cognitive learning during game play. Results indicated that this role was an engaging way to learn about communication, that participants learned something about cultural awareness, and that the skills they learned were helpful in problem solving and decision-making.
The present paper seeks to continue to understand what and how users of non-kinetic game-based missions learn by revisiting the 2009 quantitative study with further investigation such as stochastic player performance analysis using latent semantic analyses and graph visualizations. The results are applicable to First-Person game-based learning systems designed to enhance trainee intercultural communication, interpersonal skills, and adaptive thinking. In the full paper, we discuss results obtained from data collected from 78 research participants of diverse backgrounds who trained by engaging in tasks directly, as well as observing and evaluating peer performance in real-time. The goal is two-fold. One is to quantify and visualize detailed player performance data coming from game play transcription to give further understanding to the results in the 2009 I/ITSEC paper. The second is to develop a set of technologies from this quantification and visualization approach into a generalized application tool to be used to aid in future games’ development of player/learner models and game adaptation algorithms.
Specifically, this paper addresses questions such as, “Are there significant differences in one's experience when an experiential learning task is observed first, and then performed by the same individual?” “Are there significant differences among groups participating in different roles in non-kinetic engagement training, especially when one role requires more active participation than the other?” “What is the impact of behavior modeling on learning in games?” In answering these questions the present paper reinforces the 2009 empirical study conclusion that contrary to current trends in military game development, experiential learning is enhanced by innovative training approaches designed to facilitate trainee mastery of reflective observation and abstract conceptualization as much as performance-based skills.
This paper compares three approaches for model selection: classical least squares methods, information theoretic criteria, and Bayesian approaches. Least squares methods are not model selection methods per se, although one can select the model that yields the smallest sum of squared errors. Information theoretic approaches balance overfitting with model accuracy by incorporating terms that penalize more parameters with a log-likelihood term to reflect goodness of fit. Bayesian model selection involves calculating the posterior probability that each model is correct, given experimental data and prior probabilities that each model is correct. As part of this calculation, one often calibrates the parameters of each model and this is included in the Bayesian calculations. Our approach is demonstrated on a structural dynamics example with models for energy dissipation and peak force across a bolted joint. The three approaches are compared and the influence of the log-likelihood term in all approaches is discussed.
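A hedged, minimal Python illustration of the contrast between pure least squares fit quality and an information-theoretic criterion (AIC is used here; the structural dynamics models, priors, and data of the paper are not reproduced). Under an i.i.d. Gaussian error assumption, AIC = n·ln(SSE/n) + 2k penalizes the extra parameters of the richer model even as its SSE keeps shrinking.

```python
import numpy as np

def aic_gaussian(y, y_hat, k):
    """AIC for a least squares fit with k parameters, assuming i.i.d. Gaussian errors:
       AIC = n * ln(SSE / n) + 2 * k   (up to an additive constant)."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    return n * np.log(sse / n) + 2 * k

# Synthetic data from a quadratic trend plus noise; compare polynomial models of degree 1..5.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.standard_normal(x.size)

for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)          # ordinary least squares fit
    y_hat = np.polyval(coeffs, x)
    sse = np.sum((y - y_hat) ** 2)
    print(f"degree {degree}: SSE = {sse:.4f}, AIC = {aic_gaussian(y, y_hat, degree + 1):.1f}")
# SSE always decreases with degree, while AIC is typically minimized near the true degree (2).
```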
The problem of incomplete data - i.e., data with missing or unknown values - in multi-way arrays is ubiquitous in biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, communication networks, etc. We consider the problem of how to factorize data sets with missing values with the goal of capturing the underlying latent structure of the data and possibly reconstructing missing values (i.e., tensor completion). We focus on one of the most well-known tensor factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In the presence of missing data, CP can be formulated as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factorize tensors with noise and up to 99% missing data. A unique aspect of our approach is that it scales to sparse large-scale data, e.g., 1000 x 1000 x 1000 with five million known entries (0.5% dense). We further demonstrate the usefulness of CP-WOPT on two real-world applications: a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes and the problem of modeling computer network traffic where data may be absent due to the expense of the data collection process.
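The weighted least squares formulation described above can be written, for a three-way tensor X with a binary indicator tensor W (1 = known, 0 = missing) and factor matrices A, B, C, as f(A, B, C) = 1/2 ||W * (X - [[A, B, C]])||^2, where * is the elementwise product. The NumPy sketch below evaluates this objective for a dense toy tensor; it is not the CP-WOPT gradient code and ignores the sparse, large-scale machinery.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product of B (J x R) and C (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def cp_reconstruct(A, B, C):
    """Dense reconstruction [[A, B, C]] of a rank-R CP model, shape (I, J, K)."""
    I, _ = A.shape
    J, K = B.shape[0], C.shape[0]
    return (A @ khatri_rao(B, C).T).reshape(I, J, K)

def cp_wls_objective(X, W, A, B, C):
    """Weighted least squares objective: only entries with W == 1 contribute."""
    resid = W * (X - cp_reconstruct(A, B, C))
    return 0.5 * np.sum(resid ** 2)

# Tiny example: a rank-2 tensor with roughly 30% of its entries marked missing.
rng = np.random.default_rng(0)
I, J, K, R = 6, 5, 4, 2
A, B, C = rng.standard_normal((I, R)), rng.standard_normal((J, R)), rng.standard_normal((K, R))
X = cp_reconstruct(A, B, C)
W = (rng.random((I, J, K)) > 0.3).astype(float)
print(cp_wls_objective(X, W, A, B, C))   # exactly zero at the true factors
```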
Co-design has been identified as a key strategy for achieving Exascale computing in this decade. This paper describes the need for co-design in High Performance Computing, related research in embedded computing, and the development of hardware/software co-simulation methods.
The analysis of networked activities is dramatically more challenging than many traditional kinds of analysis. A network is defined by a set of entities (people, organizations, banks, computers, etc.) linked by various types of relationships. These entities and relationships are often uninteresting alone, and only become significant in aggregate. The analysis and visualization of these networks is one of the driving factors behind the creation of the Titan Toolkit. Given the broad set of problem domains and the wide-ranging databases in use by the information analysis community, the Titan Toolkit's flexible, component-based pipeline provides an excellent platform for constructing specific combinations of network algorithms and visualizations.
The observation and characterization of a single atom system in silicon is a significant landmark in half a century of device miniaturization, and presents an important new laboratory for fundamental quantum and atomic physics. We compare measurements of the spectrum of a single two-electron (2e) atom system in silicon - a negatively charged (D-), gated arsenic donor in a FinFET - with multi-million atom tight binding (TB) calculations. The TB method captures accurate single electron eigenstates of the device taking into account device geometry, donor potentials, applied fields, interfaces, and the full host bandstructure. In a previous work, the depths and fields of As donors in six device samples were established through excited state spectroscopy of the D0 electron and comparison with TB calculations. Using self-consistent field (SCF) TB, we computed the charging energies of the D- electron for the same six device samples, and found good agreement with the measurements. Although a bulk donor has only a bound singlet ground state and a charging energy of about 40 meV, calculations show that a gated donor near an interface can have a reduced charging energy and bound excited states in the D- spectrum. Measurements indeed reveal reduced charging energies and bound 2e excited states, at least one of which is a triplet. The calculations also show the influence of the host valley physics in the two-electron spectrum of the donor.
Automated processing, modeling, and analysis of unstructured text (news documents, web content, journal articles, etc.) is a key task in many data analysis and decision making applications. As data sizes grow, scalability is essential for deep analysis. In many cases, documents are modeled as term or feature vectors and latent semantic analysis (LSA) is used to model latent, or hidden, relationships between documents and terms appearing in those documents. LSA supplies conceptual organization and analysis of document collections by modeling high-dimension feature vectors in many fewer dimensions. While past work on the scalability of LSA modeling has focused on the SVD, the goal of our work is to investigate the use of distributed memory architectures for the entire text analysis process, from data ingestion to semantic modeling and analysis. ParaText is a set of software components for distributed processing, modeling, and analysis of unstructured text. The ParaText source code is available under a BSD license, as an integral part of the Titan toolkit. ParaText components are chained together into data-parallel pipelines that are replicated across processes on distributed-memory architectures. Individual components can be replaced or rewired to explore different computational strategies and implement new functionality. ParaText functionality can be embedded in applications on any platform using the native C++ API, Python, or Java. The ParaText MPI Process provides a 'generic' text analysis pipeline in a command-line executable that can be used for many serial and parallel analysis tasks. ParaText can also be deployed as a web service accessible via a RESTful (HTTP) API. In the web service configuration, any client can access the functionality provided by ParaText using commodity protocols, from standard web browsers to custom clients written in any language.
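A minimal serial illustration of the LSA modeling step (a term-document matrix reduced to a low-rank concept space via a truncated SVD); this is only the mathematical kernel, not ParaText's distributed implementation or API, and the tiny matrix and weighting are placeholders.

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents (raw counts here;
# in practice entries would be weighted, e.g. tf-idf, and the matrix would be sparse).
A = np.array([
    [2, 1, 0, 0],   # "solver"
    [1, 2, 0, 0],   # "matrix"
    [0, 0, 3, 1],   # "protein"
    [0, 0, 1, 2],   # "gene"
], dtype=float)

k = 2                                   # number of latent "concept" dimensions
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

doc_coords = (np.diag(sk) @ Vtk).T      # each document as a k-dimensional concept vector

def doc_similarity(i, j):
    """Cosine similarity between documents i and j in the reduced concept space."""
    a, b = doc_coords[i], doc_coords[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(doc_similarity(0, 1))   # documents about the same topic score near 1
print(doc_similarity(0, 2))   # documents about different topics score near 0
```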
While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than is offered by current implementations. To attain the needed performance, customers have often sought out hardware-based RAID solutions. This talk describes a RAID system that offloads erasure correction coding calculations to GPUs, allowing increased reliability by supporting new RAID levels while maintaining high performance.
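For intuition only (the GPU-offloaded codes described in the talk support more general, Reed-Solomon-style erasure correction than this), the sketch below shows the single-parity XOR scheme used by RAID-5, which can recover any one lost block from the survivors and the parity.

```python
def xor_parity(blocks):
    """Compute a parity block as the byte-wise XOR of equal-length data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(blocks_with_hole, parity):
    """Recover the single missing block (marked None) from the survivors and the parity."""
    survivors = [b for b in blocks_with_hole if b is not None]
    return xor_parity(survivors + [parity])

data = [b"block-A1", b"block-B2", b"block-C3"]
p = xor_parity(data)
damaged = [data[0], None, data[2]]          # pretend the second disk failed
assert recover(damaged, p) == data[1]
```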
A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team, reports on their approaches, the tools they used, and their results from mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.
After more than 50 years of molecular simulations, accurate empirical models remain the bottleneck to the wide adoption of simulation techniques, and addressing this issue calls for a fresh paradigm. In this study, we outline a new genetic-programming based method to develop empirical models for a system purely from its energies and/or forces. While the approach was initially developed to derive classical force fields from ab-initio calculations, we also discuss its application to the molecular coarse-graining of methanol. Two models, one representing methanol by a single site and the other via two sites, will be developed using this method. They will be validated against existing coarse-grained potentials for methanol by comparing thermophysical properties.
This brief paper explores the development of scalable, nonlinear, fully-implicit solution methods for a stabilized unstructured finite element (FE) discretization of the 2D incompressible (reduced) resistive MHD system. The discussion considers the stabilized FE formulation in the context of a fully-implicit time integration and direct-to-steady-state solution capability. The nonlinear solver strategy employs Newton-Krylov methods, which are preconditioned using fully-coupled algebraic multilevel (AMG) techniques and a new approximate block factorization (ABF) preconditioner. The intent of these preconditioners is to enable robust, scalable and efficient solution approaches for the large-scale sparse linear systems generated by the Newton linearization. We present results for the fully-coupled AMG preconditioner for two prototype problems: a low Lundquist number MHD Faraday conduction pump and a moderately high Lundquist number incompressible magnetic island coalescence problem. For the MHD pump results we explore the scaling of the fully-coupled AMG preconditioner for up to 4096 processors for problems with up to 64M unknowns on a Cray XT3/4. Using the island coalescence problem we explore the weak scaling of the AMG preconditioner and the influence of the Lundquist number on the iteration count. Finally, we present some very recent results for the algorithmic scaling of the ABF preconditioner.
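The solver strategy above pairs an (inexact) Newton outer loop with preconditioned Krylov inner solves. As a generic, heavily simplified illustration only (a dense toy nonlinear system with a Jacobi-style preconditioner, not the MHD discretization or the AMG/ABF preconditioners), here is a Newton loop whose linear steps are solved with SciPy's GMRES:

```python
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

def newton_krylov(residual, jacobian, u, tol=1e-10, max_newton=20):
    """Newton's method with a preconditioned GMRES inner solve for J du = -F(u).
    The 'preconditioner' here is just the inverse diagonal of J (Jacobi), standing
    in for the fully coupled AMG / block-factorization preconditioners in the paper."""
    for _ in range(max_newton):
        F = residual(u)
        if np.linalg.norm(F) < tol:
            break
        J = jacobian(u)
        M = LinearOperator(J.shape, matvec=lambda v: v / np.diag(J))  # Jacobi preconditioner
        du, info = gmres(J, -F, M=M)
        u = u + du
    return u

# Toy nonlinear system u_i^3 + (A u)_i = b_i with a diagonally dominant A (not MHD).
n = 50
A = np.eye(n) * 4.0 + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
residual = lambda u: u**3 + A @ u - b
jacobian = lambda u: np.diag(3.0 * u**2) + A
u = newton_krylov(residual, jacobian, np.zeros(n))
print(np.linalg.norm(residual(u)))   # converges to a small residual norm
```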
Dynamical systems theory provides a powerful framework for understanding the behavior of complex evolving systems. However, applying these ideas to large-scale dynamical systems such as discretizations of multi-dimensional PDEs is challenging. Such systems can easily give rise to problems with billions of dynamical variables, requiring specialized numerical algorithms implemented on high performance computing architectures with thousands of processors. This talk will describe LOCA, the Library of Continuation Algorithms, a suite of scalable continuation and bifurcation tools optimized for these types of systems that is part of the Trilinos software collection. In particular, we will describe continuation and bifurcation analysis techniques designed for large-scale dynamical systems that are based on specialized parallel linear algebra methods for solving augmented linear systems. We will also discuss several other Trilinos tools providing nonlinear solvers (NOX), eigensolvers (Anasazi), iterative linear solvers (AztecOO and Belos), preconditioners (Ifpack, ML, Amesos) and parallel linear algebra data structures (Epetra and Tpetra) that LOCA can leverage for efficient and scalable analysis of large-scale dynamical systems.
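To suggest what "continuation" means in this setting, here is a minimal natural-parameter continuation loop for a scalar toy problem, tracking a solution branch u(λ) of F(u, λ) = 0 with Newton corrections at each parameter value. This is illustrative only; it is not the LOCA API, and LOCA's pseudo-arclength methods additionally treat the parameter as an unknown so that folds and bifurcations can be followed.

```python
import numpy as np

def continue_branch(F, dFdu, u0, lambdas, newton_tol=1e-12, max_newton=20):
    """Natural-parameter continuation: step the parameter, reuse the previous
    solution as the initial guess, and correct with Newton at each value."""
    branch = []
    u = u0
    for lam in lambdas:
        for _ in range(max_newton):            # Newton corrector at fixed lambda
            f = F(u, lam)
            if abs(f) < newton_tol:
                break
            u = u - f / dFdu(u, lam)
        branch.append((lam, u))
    return branch

# Toy problem: F(u, lambda) = u**3 - u - lambda, following the branch starting at u = 1.
F = lambda u, lam: u**3 - u - lam
dFdu = lambda u, lam: 3.0 * u**2 - 1.0
branch = continue_branch(F, dFdu, u0=1.0, lambdas=np.linspace(0.0, 2.0, 21))
print(branch[-1])   # (2.0, u) with u**3 - u approximately equal to 2
```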
The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for time periods 1 through T, can we predict the links in time period T + 1? Specifically, we look at bipartite graphs changing over time and consider matrix- and tensor-based methods for predicting links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem.
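As an illustration of the truncated-SVD idea, note that for a bipartite graph only odd powers of the full adjacency matrix connect the two node sets, so with M = UΣV^T and β·σ_max < 1 the cross-set Katz scores Σ_{k≥0} β^{2k+1} (M M^T)^k M equal U diag(β σ_i / (1 − β² σ_i²)) V^T; a rank-r truncation then gives a scalable approximation. The sketch below is a small dense version of this construction under those standard assumptions, not the authors' code or their exact temporal weighting scheme.

```python
import numpy as np

def katz_bipartite_truncated(M, beta=0.05, rank=2):
    """Approximate bipartite Katz scores from a rank-r truncated SVD of the
    (possibly collapsed multi-year) matrix M. Requires beta * sigma_max < 1."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]
    assert beta * s[0] < 1.0, "beta too large for the Katz series to converge"
    weights = beta * s / (1.0 - (beta * s) ** 2)   # sum of odd powers of beta*sigma
    return U @ np.diag(weights) @ Vt

# Toy author-by-venue matrix; larger scores suggest more likely future links.
M = np.array([
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
    [0.0, 1.0, 4.0],
    [0.0, 0.0, 3.0],
])
scores = katz_bipartite_truncated(M, beta=0.05, rank=2)
print(np.round(scores, 3))
```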
Modeling the interaction of dislocations with internal boundaries and free surfaces is essential to understanding the effect of material microstructure on dislocation motion. However, discrete dislocation dynamics methods rely on infinite domain solutions of dislocation fields which makes modeling of heterogeneous materials difficult. A finite domain dislocation dynamics capability is under development that resolves both the dislocation array and polycrystalline structure in a compatible manner so that free surfaces and material interfaces are easily treated. In this approach the polycrystalline structure is accommodated using the generalized finite element method (GFEM), and the displacement due to the dislocation array is added to the displacement approximation. Shown in figure 1 are representative results from simulations of randomly placed and oriented dislocation sources in a cubic nickel polycrystal. Each grain has a randomly assigned (unique) material basis, and available glide planes are chosen accordingly. The change in basis between neighboring grains has an important effect on the motion of dislocations since the resolved shear on available glide planes can change dramatically. Dislocation transmission through high angle grain boundaries is assumed to occur by absorption into the boundary and subsequent nucleation in the neighboring grain. Such behavior is illustrated in figure 1d. Nucleation from the vertically oriented source in the bottom right grain is due to local stresses from dislocation pile-up in the neighboring grain. In this talk, the method and implementation are presented, as well as some representative results from large-scale (i.e., massively parallel) simulations of dislocation motion in a cubic nano-domain nickel alloy. Particular attention will be paid to the effect of grain size on polycrystalline strength.
As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.
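One common way to expose the parallelism that a multithreaded triangular solve exploits is level scheduling: group the unknowns into levels so that every unknown in a level depends only on unknowns from earlier levels, then process the rows within each level concurrently. The Python sketch below shows the level construction and a level-by-level solve on a small dense matrix; the actual threading, sparse storage, and the specific kernel analyzed in the paper are omitted.

```python
import numpy as np

def build_levels(L):
    """Level scheduling for a lower triangular matrix L (dense here for clarity):
    level[i] = 1 + max(level[j]) over the nonzeros L[i, j] with j < i."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = [level[j] for j in range(i) if L[i, j] != 0.0]
        level[i] = 1 + max(deps) if deps else 0
    return [np.where(level == k)[0] for k in range(level.max() + 1)]

def lower_solve_by_levels(L, b):
    """Solve L x = b level by level; rows within one level are independent
    and could be assigned to different threads."""
    x = np.zeros_like(b)
    for rows in build_levels(L):
        for i in rows:                       # independent within a level -> parallelizable
            x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.array([[2.0, 0.0, 0.0, 0.0],
              [1.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 4.0, 0.0],
              [0.0, 2.0, 1.0, 5.0]])
b = np.array([2.0, 5.0, 8.0, 12.0])
assert np.allclose(lower_solve_by_levels(L, b), np.linalg.solve(L, b))
```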
In this presentation we examine the accuracy and performance of a suite of discrete-element-modeling approaches to predicting equilibrium and dynamic rheological properties of polystyrene suspensions. What distinguishes each approach presented is the methodology of handling the solvent hydrodynamics. Specifically, we compare stochastic rotation dynamics (SRD), fast lubrication dynamics (FLD) and dissipative particle dynamics (DPD). Method-to-method comparisons are made as well as comparisons with experimental data. Quantities examined are equilibrium structure properties (e.g. pair-distribution function), equilibrium dynamic properties (e.g. short- and long-time diffusivities), and dynamic response (e.g. steady shear viscosity). In all approaches we employ the DLVO potential for colloid-colloid interactions. Comparisons are made over a range of volume fractions and salt concentrations. Our results reveal that the utility of such methods for long-time diffusivity prediction can be dubious in certain ranges of volume fraction, and they yield other findings regarding the best formulation to use in predicting rheological response.
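For reference, one commonly used simplified DLVO form for two equal spheres of radius a at surface separation h (Derjaguin approximation, weak double-layer overlap, non-retarded van der Waals) combines a screened electrostatic repulsion with a van der Waals attraction. The Python sketch below uses this textbook form with placeholder parameter values; it is not the specific parameterization used in the study.

```python
import numpy as np

def dlvo_potential(h, a, hamaker, eps, psi0, kappa):
    """Simplified DLVO pair potential (Derjaguin approximation) for equal spheres:
       repulsion:  2*pi*eps*a*psi0^2 * exp(-kappa*h)   (weak-overlap double layer)
       attraction: -hamaker*a / (12*h)                 (non-retarded van der Waals)
    h is the surface-to-surface separation."""
    v_rep = 2.0 * np.pi * eps * a * psi0**2 * np.exp(-kappa * h)
    v_att = -hamaker * a / (12.0 * h)
    return v_rep + v_att

# Placeholder parameters (SI units) purely to illustrate the shape of the curve:
# a repulsive barrier at moderate separation that decays toward zero at large h.
h = np.linspace(0.5e-9, 20e-9, 200)
v = dlvo_potential(h, a=300e-9, hamaker=1e-20, eps=80 * 8.854e-12, psi0=-0.03, kappa=1.0 / 3e-9)
print(v.max(), v[-1])
```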