Publications Search

Automated processing, modeling, and analysis of unstructured text (news documents, web content, journal articles, etc.) is a key task in many data analysis and decision-making applications. As data sizes grow, scalability is essential for deep analysis. In many cases, documents are modeled as term or feature vectors and latent semantic analysis (LSA) is used to model latent, or hidden, relationships between documents and terms appearing in those documents. LSA supplies conceptual organization and analysis of document collections by modeling high-dimensional feature vectors in many fewer dimensions. While past work on the scalability of LSA modeling has focused on the SVD, the goal of our work is to investigate the use of distributed memory architectures for the entire text analysis process, from data ingestion to semantic modeling and analysis. ParaText is a set of software components for distributed processing, modeling, and analysis of unstructured text. The ParaText source code is available under a BSD license, as an integral part of the Titan toolkit. ParaText components are chained together into data-parallel pipelines that are replicated across processes on distributed-memory architectures. Individual components can be replaced or rewired to explore different computational strategies and implement new functionality. ParaText functionality can be embedded in applications on any platform using the native C++ API, Python, or Java. The ParaText MPI Process provides a 'generic' text analysis pipeline in a command-line executable that can be used for many serial and parallel analysis tasks. ParaText can also be deployed as a web service accessible via a RESTful (HTTP) API. In the web service configuration, any client can access the functionality provided by ParaText using commodity protocols ... from standard web browsers to custom clients written in any language.

TYPE Conference YEAR 2010

A highly reliable RAID system based on GPUs

Curry, Matthew L.

While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than current implementations offer. To attain the performance they need, customers have often sought hardware-based RAID solutions. This talk describes a RAID system that offloads erasure correction coding calculations to GPUs, allowing increased reliability by supporting new RAID levels while maintaining high performance.
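As a rough illustration of what an erasure correction code does, the sketch below uses single XOR parity (the scheme behind RAID 5) to rebuild one lost block. This is a simplified stand-in, not the coding scheme or the GPU implementation described in the talk; the function names and sample data are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Parity block for a stripe: the XOR of all data blocks."""
    return xor_blocks(data_blocks)

def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Hypothetical stripe of three equal-length data blocks.
data = [b"disk", b"fail", b"safe"]
parity = make_parity(data)

# Pretend the second block was lost and rebuild it.
rebuilt = recover([data[0], data[2]], parity)
```

Single parity tolerates only one lost block per stripe. Tolerating more failures requires more general codes, whose arithmetic is the kind of workload the abstract proposes to offload.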

TYPE Conference YEAR 2010

Data mining for ontology development

Davidson, George S.; Schoenwald, David A.

A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here, the third team of the Ontology Project, the Data Analysis (DA) team, reports on its approaches, the tools it used, and its results for mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.

TYPE SAND Report YEAR 2010

Optical holography as an analogue for a neural re-use mechanism: Comment on Anderson (2010)

Behavioral and Brain Sciences

Verzi, Stephen J.; Wagner, John S.; Warrender, Christina E.

Abstract not provided.

TYPE Journal Article YEAR 2010

Evolutionary optimization of interatomic potentials using genetic programming

Jayaraman, Saivenkataraman J.

After more than 50 years of molecular simulations, accurate empirical models are still the bottleneck in the wide adoption of simulation techniques. Addressing this issue requires a fresh paradigm. In this study, we outline a new genetic-programming-based method to develop empirical models for a system purely from its energy and/or forces. While the approach was initially developed for building classical force fields from ab initio calculations, we also discuss its application to the molecular coarse-graining of methanol. Two models, one representing methanol by a single site and the other by two sites, will be developed using this method. They will be validated against existing coarse-grained potentials for methanol by comparing thermophysical properties.

TYPE Conference YEAR 2010

Initial performance of fully-coupled AMG and approximate block factorization preconditioners for solution of implicit FE resistive MHD

Shadid, John N.

This brief paper explores the development of scalable, nonlinear, fully-implicit solution methods for a stabilized unstructured finite element (FE) discretization of the 2D incompressible (reduced) resistive MHD system. The discussion considers the stabilized FE formulation in the context of a fully-implicit time integration and direct-to-steady-state solution capability. The nonlinear solver strategy employs Newton-Krylov methods, which are preconditioned using fully-coupled algebraic multilevel (AMG) techniques and a new approximate block factorization (ABF) preconditioner. The intent of these preconditioners is to enable robust, scalable, and efficient solution approaches for the large-scale sparse linear systems generated by the Newton linearization. We present results for the fully-coupled AMG preconditioner for two prototype problems: a low Lundquist number MHD Faraday conduction pump and a moderately high Lundquist number incompressible magnetic island coalescence problem. For the MHD pump, we explore the scaling of the fully-coupled AMG preconditioner on up to 4096 processors for problems with up to 64M unknowns on a Cray XT3/4. Using the island coalescence problem, we explore the weak scaling of the AMG preconditioner and the influence of the Lundquist number on the iteration count. Finally, we present some very recent results for the algorithmic scaling of the ABF preconditioner.

TYPE Conference YEAR 2010

Continuation and bifurcation analysis of large-scale dynamical systems with LOCA

Salinger, Andrew G.; Pawlowski, Roger P.

Dynamical systems theory provides a powerful framework for understanding the behavior of complex evolving systems. However, applying these ideas to large-scale dynamical systems such as discretizations of multi-dimensional PDEs is challenging. Such systems can easily give rise to problems with billions of dynamical variables, requiring specialized numerical algorithms implemented on high performance computing architectures with thousands of processors. This talk will describe LOCA, the Library of Continuation Algorithms, a suite of scalable continuation and bifurcation tools optimized for these types of systems that is part of the Trilinos software collection. In particular, we will describe continuation and bifurcation analysis techniques designed for large-scale dynamical systems that are based on specialized parallel linear algebra methods for solving augmented linear systems. We will also discuss several other Trilinos tools providing nonlinear solvers (NOX), eigensolvers (Anasazi), iterative linear solvers (AztecOO and Belos), preconditioners (Ifpack, ML, Amesos) and parallel linear algebra data structures (Epetra and Tpetra) that LOCA can leverage for efficient and scalable analysis of large-scale dynamical systems.
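The basic idea of continuation can be sketched on a scalar toy problem: follow a solution branch of f(x, p) = 0 as the parameter p changes, using each converged solution as the initial guess for the next Newton solve. The sketch below is natural-parameter continuation in plain Python. It does not use the LOCA API, and the test function, step count, and all names are assumptions chosen for illustration.

```python
def newton(f, dfdx, x, p, tol=1e-12, max_iter=50):
    """Solve f(x, p) = 0 for x at fixed p, starting from the guess x."""
    for _ in range(max_iter):
        step = f(x, p) / dfdx(x, p)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton did not converge")

def continuation(f, dfdx, x0, p_start, p_end, steps):
    """Natural-parameter continuation: march p and reuse the last solution."""
    branch = []
    x = x0
    for k in range(steps + 1):
        p = p_start + (p_end - p_start) * k / steps
        x = newton(f, dfdx, x, p)  # previous solution seeds this solve
        branch.append((p, x))
    return branch

# Hypothetical toy problem: x**3 + x - p = 0 has one real root for every p.
f = lambda x, p: x**3 + x - p
dfdx = lambda x, p: 3 * x**2 + 1

branch = continuation(f, dfdx, x0=0.0, p_start=0.0, p_end=2.0, steps=20)
```

This simple scheme breaks down at turning points, where the Jacobian with respect to x becomes singular. Methods such as pseudo-arclength continuation handle that case by solving an augmented system, which is the kind of augmented linear solve the abstract refers to.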

TYPE Conference YEAR 2010

UQ Algorithm Research and Advanced Deployment within the DAKOTA Project

Eldred, Michael S.

Abstract not provided.

TYPE Presentation YEAR 2010

Link prediction on evolving graphs using matrix and tensor factorizations

Kolda, Tamara G.; Dunlavy, Daniel D.

The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for time periods 1 through T, can we predict the links in time period T + 1? Specifically, we look at bipartite graphs changing over time and consider matrix- and tensor-based methods for predicting links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem.
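For readers unfamiliar with the Katz measure mentioned above, it scores a node pair by summing the walks between the two nodes, with walks of length l damped by beta^l. The sketch below evaluates a truncated version of that sum on a small undirected graph in plain Python. It is not the bipartite extension or the truncated-SVD approximation from the paper, and the graph, `beta`, and `max_len` values are illustrative assumptions.

```python
def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def katz_scores(adj, beta=0.1, max_len=4):
    """Truncated Katz scores: sum over l = 1..max_len of beta**l * adj**l."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # adj**1
    weight = beta
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)  # next power of adj
        weight *= beta
    return scores

# Hypothetical path graph 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = katz_scores(adj)
```

In this example the unlinked pair (0, 2) scores higher than (0, 3) because its shortest connecting walk is shorter, so a Katz-based predictor would rank it as the more likely future link.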

TYPE Conference YEAR 2010

Trilinos Pre-Checkin Test Script

Willenbring, James M.; Bartlett, Roscoe B.

Abstract not provided.

TYPE Presentation YEAR 2010

Modeling dislocations in a polycrystal using the generalized finite element method

Robbins, Joshua R.

Modeling the interaction of dislocations with internal boundaries and free surfaces is essential to understanding the effect of material microstructure on dislocation motion. However, discrete dislocation dynamics methods rely on infinite domain solutions of dislocation fields, which makes modeling of heterogeneous materials difficult. A finite domain dislocation dynamics capability is under development that resolves both the dislocation array and polycrystalline structure in a compatible manner so that free surfaces and material interfaces are easily treated. In this approach, the polycrystalline structure is accommodated using the generalized finite element method (GFEM), and the displacement due to the dislocation array is added to the displacement approximation. Shown in figure 1 are representative results from simulations of randomly placed and oriented dislocation sources in a cubic nickel polycrystal. Each grain has a randomly assigned (unique) material basis, and available glide planes are chosen accordingly. The change in basis between neighboring grains has an important effect on the motion of dislocations since the resolved shear on available glide planes can change dramatically. Dislocation transmission through high angle grain boundaries is assumed to occur by absorption into the boundary and subsequent nucleation in the neighboring grain. Such behavior is illustrated in figure 1d. Nucleation from the vertically oriented source in the bottom right grain is due to local stresses from dislocation pile-up in the neighboring grain. In this talk, the method and implementation are presented, as well as some representative results from large scale (i.e., massively parallel) simulations of dislocation motion in cubic nano-domain nickel alloy. Particular attention will be paid to the effect of grain size on polycrystalline strength.

TYPE Conference YEAR 2010

Root Cause Analysis of Networked Computer Alerts

Technometrics

Stearley, Jon S.; Mitchell, Scott A.

Abstract not provided.

TYPE Journal Article YEAR 2010

Modeling Prompt Neutron Effects for Stockpile Bipolar Junction Transistors

Lin, Paul L.

Abstract not provided.

TYPE Presentation YEAR 2010

Exotic atomic physics observed in a two electron analogue-atom in silicon

Nature Physics

Rahman, Rajib R.

Abstract not provided.

TYPE Journal Article YEAR 2010

Factors impacting performance of multithreaded triangular solve

Wolf, Michael W.; Heroux, Michael A.; Boman, Erik G.

As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.
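One common way to expose parallelism in a sparse triangular solve is level scheduling: rows are grouped into levels so that every row in a level depends only on rows in earlier levels, and each level can then be shared among threads. The sketch below shows the idea serially in plain Python. The abstract does not say which strategy the authors use, so this is an assumed technique, and the storage format and names are hypothetical.

```python
def level_sets(rows):
    """Group rows of a sparse lower-triangular matrix into dependency levels.

    rows[i] is a dict {column: value} for row i, including the diagonal.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j in row if j < i]
        level[i] = 1 + max(deps) if deps else 0
    groups = [[] for _ in range(max(level) + 1)]
    for i, lev in enumerate(level):
        groups[lev].append(i)
    return groups

def lower_solve(rows, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(rows)
    for group in level_sets(rows):
        # Rows within a group are independent; a threaded kernel
        # could assign them to different threads.
        for i in group:
            s = sum(v * x[j] for j, v in rows[i].items() if j < i)
            x[i] = (b[i] - s) / rows[i][i]
    return x

# Hypothetical 4x4 lower-triangular system.
rows = [
    {0: 2.0},
    {1: 1.0},
    {0: 1.0, 1: 1.0, 2: 1.0},
    {2: 2.0, 3: 4.0},
]
b = [2.0, 3.0, 5.0, 10.0]
levels = level_sets(rows)
x = lower_solve(rows, b)
```

The number and size of the levels depend on the sparsity pattern, so the available parallelism varies from matrix to matrix.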

TYPE Conference YEAR 2010

Combinatorial algorithms: the real power behind parallel computing

Hendrickson, Bruce A.

Abstract not provided.

TYPE Conference YEAR 2010

First principles investigation of low energy E' precursors in amorphous silica

Physical Review Letters

Schultz, Peter A.

Abstract not provided.

TYPE Journal Article YEAR 2010

A Cortical-Hippocampal Neural Architecture for Episodic Memory with Information Theoretic Model Analysis

Vineyard, Craig M.

Abstract not provided.

TYPE Conference YEAR 2010

Market Simulations for Evaluation of Regulatory Strategies

Watson, Jean-Paul W.; Siirola, John D.

Abstract not provided.

TYPE Presentation YEAR 2010

Performance of mesoscale modeling methods for predicting microstructure, mobility and rheology of charged suspensions

Plimpton, Steven J.; Schunk, Randy; Lechman, Jeremy B.; Grest, Gary S.; Pierce, Flint P.; Grillet, Anne M.

In this presentation we examine the accuracy and performance of a suite of discrete-element-modeling approaches to predicting equilibrium and dynamic rheological properties of polystyrene suspensions. What distinguishes each approach is its method of handling the solvent hydrodynamics. Specifically, we compare stochastic rotation dynamics (SRD), fast lubrication dynamics (FLD) and dissipative particle dynamics (DPD). Method-to-method comparisons are made as well as comparisons with experimental data. Quantities examined are equilibrium structure properties (e.g. pair-distribution function), equilibrium dynamic properties (e.g. short- and long-time diffusivities), and dynamic response (e.g. steady shear viscosity). In all approaches we use the DLVO potential for colloid-colloid interactions. Comparisons are made over a range of volume fractions and salt concentrations. Our results reveal that the utility of such methods for long-time diffusivity prediction can be dubious in certain ranges of volume fraction. We also report other findings regarding the best formulation to use in predicting rheological response.

TYPE Conference YEAR 2010