Publications

Results 1–25 of 131

Tensor decompositions for count data that leverage stochastic and deterministic optimization

Optimization Methods and Software

Myers, Jeremy M.; Dunlavy, Daniel M.

There is growing interest in extending low-rank matrix decompositions to multi-way arrays, or tensors. One fundamental low-rank tensor decomposition is the canonical polyadic decomposition (CPD). Fitting a low-rank, nonnegative CPD model to Poisson-distributed count data is a challenge of particular interest. Several popular algorithms use local search methods to approximate the maximum likelihood estimator (MLE) of the Poisson CPD model. This work presents two new algorithms that extend state-of-the-art local methods for Poisson CPD. Hybrid GCP-CPAPR combines Generalized Canonical Polyadic (GCP) decomposition, fit with stochastic optimization, and CP Alternating Poisson Regression (CPAPR), a deterministic algorithm, to increase the probability of converging to the MLE over either method used alone. Restarted CPAPR with SVDrop uses a heuristic based on the singular values of the CPD model unfoldings to detect convergence toward optimizers that are not the MLE, and then restarts within the feasible domain of the optimization problem, thus reducing overall computational cost when using a multi-start strategy. We provide empirical evidence indicating that our approaches outperform existing methods with respect to converging to the Poisson CPD MLE.
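To make the objective concrete, here is a minimal NumPy sketch, assuming a dense 3-way tensor, of the Poisson CPD negative log-likelihood that these local methods minimize, together with the singular values of each mode-n unfolding of the model. The helper names (`cp_full`, `poisson_nll`, `unfolding_singular_values`) are hypothetical. The abstract does not state which singular values SVDrop monitors or what threshold triggers a restart, so the sketch only computes the quantities involved; it is not GCP, CPAPR, or SVDrop itself.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch, not the authors' code.

def cp_full(factors):
    """Assemble the 3-way tensor sum_r a_r o b_r o c_r from factor matrices A, B, C."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def poisson_nll(model, counts, eps=1e-12):
    """Poisson negative log-likelihood sum(m - x*log m), omitting the constant log(x!)."""
    m = np.maximum(model, eps)  # guard against log(0) for zero model entries
    return float(np.sum(m - counts * np.log(m)))

def unfolding_singular_values(T):
    """Singular values of each mode-n unfolding of T.

    The column order differs from the usual Kolda-Bader convention, which
    does not change the singular values.
    """
    svs = []
    for n in range(T.ndim):
        Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
        svs.append(np.linalg.svd(Tn, compute_uv=False))
    return svs

# Usage: a rank-2 nonnegative model and Poisson counts sampled from it
rng = np.random.default_rng(0)
factors = [rng.random((d, 2)) for d in (4, 5, 6)]
M = cp_full(factors)
X = rng.poisson(M)
print("Poisson NLL:", poisson_nll(M, X))
for n, s in enumerate(unfolding_singular_values(M)):
    # at most two values are nonzero (up to round-off) for a rank-2 model
    print(f"mode-{n} singular values:", np.round(s, 4))
```

Here every unfolding of the rank-2 model has at most two nonzero singular values, so the drop after the second value is visible directly.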

The Average Spectrum Norm and Near-Optimal Tensor Completion

Lopez, Oscar F.; Lehoucq, Rich; Llosa-Vite, Carlos; Prasadan, Arvind; Dunlavy, Daniel M.

We propose the average spectrum norm to study the minimum number of measurements required to approximate a multidimensional array (i.e., sample complexity) via low-rank tensor recovery. Our focus is on the tensor completion problem, where the aim is to estimate a multiway array using a subset of tensor entries corrupted by noise. Our average spectrum norm-based analysis provides near-optimal sample complexities whose dependence on the ambient dimensions and rank does not scale exponentially as the order increases.

Computing Sparse Tensor Decompositions via Chapel and C++/MPI Interoperability without Intermediate I/O

Geronimo Anderson, Sean I.; Dunlavy, Daniel M.

We extend an existing approach for efficiently sharing mapped memory between Chapel and C++, originally developed for graph data stored as 1-D arrays, to sparse tensor data stored using a combination of 2-D and 1-D arrays. We describe the specific extensions that allow GentenMPI, a C++ tensor decomposition tool, to use tensor data held in shared mapped memory. We then demonstrate our approach on several real-world datasets, providing timing results that show the approach incurs minimal overhead. Finally, we extend our work to improve memory usage and to provide convenient random access from Chapel to sparse tensor elements in shared mapped memory, while still leveraging high-performance implementations of tensor algorithms in C++.
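The sparse layout described above (a 2-D array of subscripts plus a 1-D array of values) can be illustrated with Python's standard `multiprocessing.shared_memory` module. This is a hypothetical stand-in, not the Chapel/C++ mechanism or the GentenMPI interface used in this work: one handle creates the shared blocks, a second handle attaches to them by name, and nonzero k is read by random access without writing the data to an intermediate file. The helper names are illustrative.

```python
import numpy as np
from multiprocessing import shared_memory

# Hypothetical helpers illustrating a COO layout in shared memory (not GentenMPI).

def create_coo_shm(indices, values):
    """Copy a COO sparse tensor into two new named shared-memory blocks.

    indices: (nnz, N) subscripts, stored as int64 (the 2-D array)
    values:  (nnz,) nonzero values, stored as float64 (the 1-D array)
    """
    idx = np.ascontiguousarray(indices, dtype=np.int64)
    val = np.ascontiguousarray(values, dtype=np.float64)
    shm_idx = shared_memory.SharedMemory(create=True, size=idx.nbytes)
    shm_val = shared_memory.SharedMemory(create=True, size=val.nbytes)
    # Write through temporary array views onto the shared buffers
    np.ndarray(idx.shape, dtype=np.int64, buffer=shm_idx.buf)[:] = idx
    np.ndarray(val.shape, dtype=np.float64, buffer=shm_val.buf)[:] = val
    return shm_idx, shm_val

def attach_coo_shm(idx_name, val_name, nnz, nmodes):
    """Attach to existing blocks by name and view them as arrays (no copy)."""
    shm_idx = shared_memory.SharedMemory(name=idx_name)
    shm_val = shared_memory.SharedMemory(name=val_name)
    idx = np.ndarray((nnz, nmodes), dtype=np.int64, buffer=shm_idx.buf)
    val = np.ndarray((nnz,), dtype=np.float64, buffer=shm_val.buf)
    return shm_idx, shm_val, idx, val

# Usage: a 3-way tensor with three nonzeros
owner_idx, owner_val = create_coo_shm([[0, 1, 2], [3, 0, 1], [2, 2, 0]], [1.0, 5.0, 2.0])
peer_idx, peer_val, idx, val = attach_coo_shm(owner_idx.name, owner_val.name, 3, 3)
k = 1
print(tuple(int(i) for i in idx[k]), val[k])  # random access to nonzero k
# Drop the array views before closing; the owner then unlinks the blocks
del idx, val
peer_idx.close(); peer_val.close()
owner_idx.close(); owner_val.close()
owner_idx.unlink(); owner_val.unlink()
```

The attaching side must learn the shape and dtype (nnz and the number of modes) separately, because a shared block carries no metadata and may be rounded up to the page size.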

Zero-truncated Poisson regression for sparse multiway count data corrupted by false zeros

Information and Inference

Dunlavy, Daniel M.; Lehoucq, Rich; Lopez, Oscar F.

We propose a novel statistical inference methodology for multiway count data that is corrupted by false zeros that are indistinguishable from true zero counts. Our approach consists of zero-truncating the Poisson distribution to neglect all zero values. This simple truncated approach dispenses with the need to distinguish between true and false zero counts and reduces the amount of data to be processed. Inference is accomplished via tensor completion that imposes low-rank tensor structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\mathcal{M} \in (0, \infty)^{I \times \cdots \times I}$ generating Poisson observations can be accurately estimated by zero-truncated Poisson regression from approximately $IR^2 \log_2^2(I)$ non-zero counts under the nonnegative canonical polyadic decomposition. Our result also quantifies the error made by zero-truncating the Poisson distribution when the parameter is uniformly bounded from below. Therefore, under a low-rank multiparameter model, we propose an implementable approach guaranteed to achieve accurate regression in under-determined scenarios with substantial corruption by false zeros. Several numerical experiments are presented to explore the theoretical results.
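For a single entry with Poisson rate $\lambda$, zero truncation conditions on $X > 0$, giving $P(X = k \mid X > 0) = \lambda^k e^{-\lambda} / \big(k!\,(1 - e^{-\lambda})\big)$ for $k \ge 1$. A minimal Python sketch of the resulting negative log-likelihood follows. The function names are hypothetical, and the rates are taken as given rather than produced by the low-rank CPD model that the paper fits.

```python
import math

# Hypothetical helpers; rates are supplied directly, not from a CPD model.

def ztp_logpmf(k, lam):
    """log P(X = k | X > 0) for X ~ Poisson(lam); defined only for k >= 1."""
    if k < 1:
        raise ValueError("zero-truncated Poisson requires k >= 1")
    if lam <= 0:
        raise ValueError("rate must be positive")
    # -expm1(-lam) = 1 - exp(-lam), computed accurately even for small lam
    return k * math.log(lam) - lam - math.lgamma(k + 1) - math.log(-math.expm1(-lam))

def ztp_nll(counts, rates):
    """Negative log-likelihood over the positive counts only; zeros are skipped."""
    return -sum(ztp_logpmf(k, lam) for k, lam in zip(counts, rates) if k > 0)

# Usage: the zero entry contributes nothing, so true and false zeros need not be told apart
print(ztp_nll([0, 2, 1], [0.5, 2.0, 1.5]))
```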

A Hybrid Method for Tensor Decompositions that Leverages Stochastic and Deterministic Optimization

Myers, Jeremy M.; Dunlavy, Daniel M.

In this paper, we propose a hybrid method that uses stochastic and deterministic search to compute the maximum likelihood estimator of a low-rank count tensor with Poisson loss via state-of-the-art local methods. Our approach is inspired by simulated annealing for global optimization and allows for fine-grained parameter tuning as well as adaptive updates to algorithm parameters. We present numerical results indicating that our hybrid approach can compute better approximations to the maximum likelihood estimator with less computation than the state-of-the-art methods alone.
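The abstract does not spell out how simulated annealing shapes the hybrid method, so the following is only a generic illustration of two standard simulated-annealing ingredients, not the authors' algorithm: the Metropolis acceptance rule, which always accepts an improvement and accepts a worse objective value with probability $\exp(-\Delta f / T)$, and a geometric cooling schedule $T_k = T_0 \alpha^k$. The function names and parameter values are hypothetical.

```python
import math
import random

# Generic simulated-annealing pieces; hypothetical names, not the paper's schedule.

def metropolis_accept(f_new, f_old, temperature, rng):
    """Accept improvements always; accept a worse value with prob. exp(-delta/T)."""
    if f_new <= f_old:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-(f_new - f_old) / temperature)

def geometric_schedule(t0, alpha, steps):
    """Temperatures T_k = t0 * alpha**k for k = 0, ..., steps - 1."""
    return [t0 * alpha**k for k in range(steps)]

# Usage: acceptance of a fixed uphill move becomes rarer as the temperature drops
rng = random.Random(0)
for T in geometric_schedule(1.0, 0.5, 4):
    rate = sum(metropolis_accept(1.5, 1.0, T, rng) for _ in range(10_000)) / 10_000
    print(f"T = {T:.3f}: accepted {rate:.1%} of worse moves")
```

Early, high-temperature phases let a search escape poor local optima; as $T$ falls the rule approaches pure descent.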

Zero-Truncated Poisson Tensor Decomposition for Sparse Count Data

Lopez, Oscar F.; Lehoucq, Rich; Dunlavy, Daniel M.

We propose a novel statistical inference paradigm for zero-inflated multiway count data that dispenses with the need to distinguish between true and false zero counts. Our approach ignores all zero entries and applies zero-truncated Poisson regression on the positive counts. Inference is accomplished via tensor completion that imposes low-rank structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\mathcal{M} \in (0, \infty)^{I \times \cdots \times I}$ generating Poisson observations can be accurately estimated from approximately $IR^2 \log_2^2(I)$ non-zero counts for a nonnegative canonical polyadic decomposition. Several numerical experiments are presented demonstrating that our zero-truncated paradigm is comparable to the ideal scenario where the locations of false zero counts are known a priori.
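To give a sense of scale, the bound can be evaluated for concrete sizes. The sketch below assumes $N = 3$, $I = 1000$, and $R = 10$ (illustrative values, not taken from the paper), drops the unspecified constants hidden in "approximately", and compares $IR^2 \log_2^2(I)$ with the $I^N$ entries of the full tensor. The function name is hypothetical.

```python
import math

# Hypothetical helper; the theorem's constants are omitted, so this is only an order of magnitude.

def approx_positive_counts(I, R):
    """Order of the bound I * R^2 * log2(I)^2."""
    return I * R**2 * math.log2(I) ** 2

N, I, R = 3, 1000, 10
needed = approx_positive_counts(I, R)
total = I**N
print(f"~{needed:.3g} positive counts vs {total:.3g} entries ({needed / total:.2%})")
```

With these sizes the bound is roughly 9.9 million positive counts against $10^9$ entries, about 1%. Since the ratio $R^2 \log_2^2(I) / I^{N-1}$ decreases as $I$ or $N$ grows, the observed fraction required becomes smaller still for larger tensors.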

Document Retrieval and Ranking using Similarity Graph Mean Hitting Times

Dunlavy, Daniel M.; Chew, Peter A.

We present a novel approach to information retrieval and document analysis based on graph analytic methods. Traditional information retrieval methods use a set of terms to define a query that is applied against a document corpus to identify the documents most related to those terms. In contrast, we define a query as a set of documents of interest and apply the query by computing mean hitting times between this set and all other documents on a document similarity graph that abstracts the semantic relationships between all pairs of documents. We present the steps of our approach along with a simple example application illustrating how it can be used to find documents related to two or more documents or topics of interest.
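A minimal NumPy sketch of the hitting-time computation follows, assuming a connected, symmetric, nonnegative similarity matrix `W` and a random walk that moves to a neighbor with probability proportional to edge weight. For every document it computes the expected number of steps to first reach the query set, and documents are then ranked by increasing hitting time. Hitting times are not symmetric, and the abstract does not say which direction or normalization the authors use, so this is one plausible reading; the function name is hypothetical.

```python
import numpy as np

# Hypothetical helper; one plausible reading of "mean hitting time to a query set".

def mean_hitting_times(W, query):
    """Expected steps for a random walk on weighted graph W to first reach `query`.

    W: symmetric (n, n) nonnegative similarity matrix of a connected graph.
    query: iterable of node indices forming the query set S.
    Returns h with h[i] = 0 for i in S and h[i] = 1 + sum_j P[i, j] h[j] otherwise.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    S = set(query)
    rest = [i for i in range(n) if i not in S]
    # h is zero on S, so solve (I - P[rest, rest]) h_rest = 1 on the other nodes
    A = np.eye(len(rest)) - P[np.ix_(rest, rest)]
    h = np.zeros(n)
    h[rest] = np.linalg.solve(A, np.ones(len(rest)))
    return h

# Usage: a 4-document chain 0 - 1 - 2 - 3 with query set {0}
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
h = mean_hitting_times(W, [0])
ranking = [int(i) for i in np.argsort(h) if i != 0]  # closest documents first
print(h, ranking)
```

Solving one linear system over the non-query documents gives all hitting times at once, which is why a set of documents can serve as the query.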
