Publications

Results 51–69 of 69

Search results

Jump to search filters

Tracking topic birth and death in LDA

Wilson, Andrew T.; Robinson, David G.

Most topic modeling algorithms that address the evolution of documents over time use the same number of topics at all times. This obscures the common occurrence in the data where new subjects arise and old ones diminish or disappear entirely. We propose an algorithm to model the birth and death of topics within an LDA-like framework. The user selects an initial number of topics, after which new topics are created and retired without further supervision. Our approach also accommodates many of the acceleration and parallelization schemes developed in recent years for standard LDA. In recent years, topic modeling algorithms such as latent semantic analysis (LSA)[17], latent Dirichlet allocation (LDA)[10] and their descendants have offered a powerful way to explore and interrogate corpora far too large for any human to grasp without assistance. Using such algorithms we are able to search for similar documents, model and track the volume of topics over time, search for correlated topics or model them with a hierarchy. Most of these algorithms are intended for use with static corpora where the number of documents and the size of the vocabulary are known in advance. Moreover, almost all current topic modeling algorithms fix the number of topics as one of the input parameters and keep it fixed across the entire corpus. While this is appropriate for static corpora, it becomes a serious handicap when analyzing time-varying data sets where topics come and go as a matter of course. This is doubly true for online algorithms that may not have the option of revising earlier results in light of new data. To be sure, these algorithms will account for changing data one way or another, but without the ability to adapt to structural changes such as entirely new topics they may do so in counterintuitive ways.

More Details

Data intensive computing at Sandia

Wilson, Andrew T.

Data-Intensive Computing is parallel computing where you design your algorithms and your software around efficient access and traversal of a data set; where hardware requirements are dictated by data size as much as by desired run times usually distilling compact results from massive data.

More Details

Network algorithms for information analysis using the Titan Toolkit

Wylie, Brian N.; Wilson, Andrew T.

The analysis of networked activities is dramatically more challenging than many traditional kinds of analysis. A network is defined by a set of entities (people, organizations, banks, computers, etc.) linked by various types of relationships. These entities and relationships are often uninteresting alone, and only become significant in aggregate. The analysis and visualization of these networks is one of the driving factors behind the creation of the Titan Toolkit. Given the broad set of problem domains and the wide ranging databases in use by the information analysis community, the Titan Toolkit's flexible, component based pipeline provides an excellent platform for constructing specific combinations of network algorithms and visualizations.

More Details

Exploring 2D tensor fields using stress nets

Wilson, Andrew T.; Brannon, Rebecca M.

In this article we describe stress nets, a technique for exploring 2D tensor fields. Our method allows a user to examine simultaneously the tensors eigenvectors (both major and minor) as well as scalar-valued tensor invariants. By avoiding noise-advection techniques, we are able to display both principal directions of the tensor field as well as the derived scalars without cluttering the display. We present a CPU-only implementation of stress nets as well as a hybrid CPU/GPU approach and discuss the relative strengths and weaknesses of each. Stress nets have been used as part of an investigation into crack propagation. They were used to display the directions of maximum shear in a slab of material under tension as well as the magnitude of the shear forces acting on each point. Our methods allowed users to find new features in the data that were not visible on standard plots of tensor invariants. These features disagree with commonly accepted analytical crack propagation solutions and have sparked renewed investigation. Though developed for a materials mechanics problem, our method applies equally well to any 2D tensor field having unique characteristic directions.

More Details
Results 51–69 of 69
Results 51–69 of 69