Publications Search

This LDRD 149045 final report describes work that Sandians Scott A. Mitchell, Randall Laviolette, Shawn Martin, Warren Davis, Cindy Philips and Danny Dunlavy performed in 2010. Prof. Afra Zomorodian provided insight. This was a small late-start LDRD. Several other ongoing efforts were leveraged, including the Networks Grand Challenge LDRD and the Computational Topology CSRF project, and some of the leveraged work is described here. We proposed a sentence-mining technique that exploited both the distribution and the order of parts of speech (POS) in sentences in English-language documents. The ultimate goal was to discover 'call-to-action' framing documents hidden within a corpus of mostly expository documents, even if all the documents were on the same topic and used the same vocabulary. Using POS was novel. We also took a novel approach to analyzing POS: we hypothesized that English follows a dynamical system, with POS sequences as trajectories from one state to another. We analyzed the sequences of POS using support vector machines and the cycles of POS using computational homology. We found that POS were a very weak signal and did not support our hypothesis well, so our original goal appeared to be unobtainable with our original approach. We therefore turned our attention to one aspect of a more traditional approach to distinguishing documents. Latent Dirichlet Allocation (LDA) turns documents into bags of words and then into mixture-model points, and a distance function is used to cluster groups of points to discover relatedness between documents. We performed a geometric and algebraic analysis of the most popular distance functions and made some significant and surprising discoveries, described in a separate technical report.
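
The dynamical-system hypothesis treats each POS tag as a state and a sentence as a trajectory through those states. One simple way to make that concrete is a first-order transition table estimated from tagged sentences. The sketch below is illustrative only and is not the report's feature construction: it assumes sentences are already tagged (Penn Treebank tag names are an assumption), and the function name `pos_transition_probs` and the boundary states `<S>`/`</S>` are hypothetical.

```python
from collections import Counter, defaultdict

def pos_transition_probs(tag_sequences):
    """Estimate P(next_tag | tag) from sentences already tagged with POS.

    Hypothetical helper: each sentence is treated as a trajectory through
    POS states, padded with assumed boundary states <S> and </S>.
    """
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        padded = ["<S>"] + list(tags) + ["</S>"]
        for current, nxt in zip(padded, padded[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts into conditional probabilities.
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {nxt: c / total for nxt, c in nexts.items()}
    return probs

sentences = [
    ["DT", "NN", "VBZ", "JJ"],  # "The model is accurate" (expository)
    ["VB", "DT", "NN", "RB"],   # "Sign the petition now" (call to action)
]
probs = pos_transition_probs(sentences)
print(probs["<S>"])  # {'DT': 0.5, 'VB': 0.5}
```

Flattening such a table (or raw transition counts) into a fixed-length vector per document is one plausible way to feed POS order into a classifier such as a support vector machine; an imperative opening like `<S> -> VB` is the kind of pattern one might hope separates call-to-action text.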

TYPE SAND Report YEAR 2010

Geometric comparison of popular mixture-model distances

Mitchell, Scott A.

The statistical technique Latent Dirichlet Allocation produces mixture-model data that are geometrically equivalent to points lying on a regular simplex in moderate to high dimensions. Numerous other statistical models and techniques also produce data in this geometric category, even though the meaning of the axes and coordinate values differs significantly. A distance function is used to further analyze these points, for example to cluster them. Several different distance functions are popular amongst statisticians; the choice is usually driven by the historical preference of the application domain, by information-theoretic considerations, or by the desirability of the clustering results. Relatively little consideration is usually given to how distance functions geometrically transform data, or to the distances' algebraic properties. Here we take a look at these issues, in the hope of providing complementary insight and inspiring further geometric thought. Several popular distances (χ², the Jensen-Shannon divergence, and the square of the Hellinger distance) are shown to be nearly equivalent, both in terms of functional forms after transformations, factorizations, and series expansions, and in terms of the shape and proximity of constant-value contours. This is somewhat surprising given that their original functional forms look quite different. Cosine similarity is closely tied to the squared Euclidean distance (for unit-length vectors, one minus the cosine similarity is half the squared Euclidean distance), and a similar geometric relationship is shown with Hellinger and another cosine. We suggest a geodesic variation of Hellinger. The square-root projection that arises in Hellinger distance is briefly compared to standard normalization for Euclidean distance. We include detailed derivations of some ratio and difference bounds for illustrative purposes. We provide some constructions that nearly achieve the worst-case ratios, relevant for contours.
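
A small numerical sketch can make the near-equivalence and the cosine relationships tangible. It assumes specific conventions that may differ from the report's: symmetric χ² is Σ(pᵢ−qᵢ)²/(pᵢ+qᵢ), Jensen-Shannon uses natural logarithms, and the squared Hellinger distance carries a factor of ½. Under these conventions, a second-order expansion gives JS ≈ H² ≈ χ²/4 for nearby distributions. Because √p and √q are unit vectors, H² = 1 − BC, where BC = Σ√(pᵢqᵢ) is the cosine of the angle between them. The `hellinger_geodesic` function, taken as the great-circle angle arccos(BC), is an assumed reading of "geodesic variation" and not necessarily the report's definition. All function names are hypothetical.

```python
import math

def chi2_sym(p, q):
    # Symmetric chi-squared; terms with p_i + q_i = 0 are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def kl(p, q):
    # Kullback-Leibler divergence in nats; terms with p_i = 0 contribute 0.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def jensen_shannon(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def bhattacharyya(p, q):
    # Cosine of the angle between the unit vectors sqrt(p) and sqrt(q).
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def hellinger_sq(p, q):
    # H^2 = 1/2 * sum (sqrt p_i - sqrt q_i)^2, which equals 1 - BC(p, q).
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def hellinger_geodesic(p, q):
    # Assumed geodesic variant: great-circle angle between sqrt(p) and sqrt(q).
    return math.acos(min(1.0, bhattacharyya(p, q)))

p = [0.5, 0.3, 0.2]
q = [0.45, 0.33, 0.22]
# Nearby points on the simplex: all three values are about 0.00125.
print(chi2_sym(p, q) / 4, jensen_shannon(p, q), hellinger_sq(p, q))
print(hellinger_sq(p, q), 1 - bhattacharyya(p, q))  # equal up to rounding
```

The agreement is a leading-order effect; for distributions far apart the three functions diverge, which is where the ratio and difference bounds mentioned above matter.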

TYPE Conference YEAR 2010

Root Cause Analysis of Networked Computer Alerts

Technometrics

Stearley, Jon S.; Mitchell, Scott A.

Abstract not provided.

TYPE Journal Article YEAR 2010

Summary of the CSRI Workshop on Combinatorial Algebraic Topology (CAT): Software, Applications, & Algorithms

Mitchell, Scott A.; Bennett, Janine C.; Day, David M.

This report summarizes the Combinatorial Algebraic Topology: software, applications & algorithms workshop (CAT Workshop). The workshop was sponsored by the Computer Science Research Institute of Sandia National Laboratories. It was organized by CSRI staff members Scott Mitchell and Shawn Martin, and held in Santa Fe, New Mexico, August 29-30, 2009. The CAT Workshop website, http://www.cs.sandia.gov/CSRI/Workshops/2009/CAT/index.html, has links to some of the talk slides and other information. The purpose of the report is to summarize the discussions and recap the sessions, with special emphasis on technical areas that are ripe for further exploration and on plans for follow-up among the workshop participants. The intended audiences are the workshop participants, other researchers in the area, and the workshop sponsors.

TYPE SAND Report YEAR 2009

CAT Workshop 2009 Poster

Mitchell, Scott A.

Abstract not provided.

TYPE Presentation YEAR 2009

Distance-avoiding sequences for extremely low-bandwidth authentication

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Collins, Michael J.; Mitchell, Scott A.

We develop a scheme that provides strong cryptographic authentication on a stream of messages, consumes very little bandwidth (as little as one bit per message), and is robust in the presence of dropped messages. Such a scheme should be useful for extremely low-power, low-bandwidth wireless sensor networks and "smart dust" applications. The tradeoffs among security, memory, bandwidth, and tolerance for missing messages give rise to several new optimization problems. We report on experimental results and derive bounds on the performance of the scheme. © 2008 Springer-Verlag Berlin Heidelberg.
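
To illustrate why even one authentication bit per message can add up to strong assurance, here is a generic truncated-MAC baseline. It is explicitly not the paper's distance-avoiding-sequence scheme: it assumes a pre-shared key, HMAC-SHA256, and an explicit sequence number carried with each message (extra bandwidth that a scheme like the paper's is presumably designed to avoid). The names `auth_bit` and `Receiver` and the `threshold` parameter are hypothetical. A forger who injects k messages passes all k one-bit checks with probability 2⁻ᵏ, and because each bit depends only on its own sequence number, dropped messages do not desynchronize the receiver.

```python
import hashlib
import hmac

def auth_bit(key: bytes, seq: int, message: bytes) -> int:
    # One bit of HMAC-SHA256 over (8-byte big-endian sequence number, message).
    tag = hmac.new(key, seq.to_bytes(8, "big") + message, hashlib.sha256).digest()
    return tag[0] & 1

class Receiver:
    """Hypothetical receiver that accumulates one-bit checks over a stream."""

    def __init__(self, key: bytes, threshold: int):
        self.key = key
        self.threshold = threshold  # verified bits required before trusting the stream
        self.verified = 0
        self.failed = 0

    def receive(self, seq: int, message: bytes, bit: int) -> None:
        expected = auth_bit(self.key, seq, message)
        if hmac.compare_digest(bytes([expected]), bytes([bit])):
            self.verified += 1
        else:
            self.failed += 1

    def trusted(self) -> bool:
        # Each injected message passes with probability 1/2, so k forged
        # messages all pass with probability 2**-k.
        return self.failed == 0 and self.verified >= self.threshold

key = b"shared-secret"  # hypothetical pre-shared key
rx = Receiver(key, threshold=20)
for seq in range(30):
    if seq % 7 == 3:
        continue  # simulate a dropped message; later messages still verify
    msg = f"reading {seq}".encode()
    rx.receive(seq, msg, auth_bit(key, seq, msg))
print(rx.trusted())  # True
```

The tradeoff the abstract mentions is visible even here: a higher `threshold` gives stronger assurance but delays trust, and tolerating some failed checks (for example, from corrupted rather than dropped messages) would weaken the 2⁻ᵏ bound.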

TYPE Conference YEAR 2008

R&D for computational cognitive and social models: foundations for model evaluation through verification and validation (final LDRD report)

McNamara, Laura A.; Trucano, Timothy G.; Backus, George A.; Mitchell, Scott A.

Sandia National Laboratories is investing in projects that aim to develop computational modeling and simulation applications that explore human cognitive and social phenomena. While some of these modeling and simulation projects are explicitly research oriented, others are intended to support or provide insight for people involved in high-consequence decision-making. This raises the issue of how to evaluate computational modeling and simulation applications in both research and applied settings where human behavior is the focus of the model: when is a simulation 'good enough' for the goals its designers want to achieve? In this report, we discuss two years' worth of review and assessment of the ASC program's approach to computational model verification and validation, uncertainty quantification, and decision making. We present a framework that extends the principles of the ASC approach into the area of computational social and cognitive modeling and simulation. In doing so, we argue that the potential for evaluation is a function of how the modeling and simulation software will be used in a particular setting. In making this argument, we move from strict, engineering- and physics-oriented approaches to V&V to a broader project of model evaluation, which asserts that the systematic, rigorous, and transparent accumulation of evidence about a model's performance under conditions of uncertainty is a reasonable and necessary goal for model evaluation, regardless of discipline. How to accumulate such evidence in areas outside physics and engineering is a significant research challenge, but one that must be addressed as modeling and simulation tools move out of research laboratories and into the hands of decision makers.
This report provides an assessment of our thinking on ASC Verification and Validation, and argues for further extending V&V research in the physical and engineering sciences toward a broader program of model evaluation in situations of high consequence decision-making.

TYPE SAND Report YEAR 2008

1411 Department Review 2006

Mitchell, Scott A.

Abstract not provided.

TYPE Presentation YEAR 2006

Methods for Multisweep Automation

Shepherd, Jason F.; Mitchell, Scott A.; Knupp, Patrick K.

Sweeping has become the workhorse algorithm for creating conforming hexahedral meshes of complex models. This paper describes progress on the automatic, robust generation of MultiSwept meshes in CUBIT. MultiSweeping extends the class of volumes that may be swept to include those with multiple source and multiple target surfaces. While not yet perfect, CUBIT's MultiSweeping has recently become more reliable and has been extended to assemblies of volumes. Sweep Forging automates the process of making a volume (multi-)sweepable: Sweep Verification takes the given source and target surfaces and automatically classifies curve and vertex types so that sweep layers are well formed and progress from sources to targets.

TYPE Conference YEAR 2000
