Publications Details

Publications / LDRD Report

Chaconne: A Statistical Approach to Nonlocal Compression for Supervised Learning, Semi-Supervised Learning, and Anomaly Detection

Foss, Alexander; Field, Richard V.; Ting, Christina; Shuler, Kurtis; Bauer, Travis L.; Zhao, Sihai D.; Cardenas-Torres, Eduardo

This project developed a novel statistical understanding of compression analytics (CA), which has challenged and clarified some core assumptions about CA, and enabled the development of novel techniques that address vital challenges of national security. Specifically, this project has yielded the development of novel capabilities including 1. Principled metrics for model selection in CA, 2. Techniques for deriving/applying optimal classification rules and decision theory to supervised CA, including how to properly handle class imbalance and differing costs of misclassification, 3. Two techniques for handling nonlocal information in CA, 4. A novel technique for unsupervised CA that is agnostic with regard to the underlying compression algorithm, 5. A framework for semisupervised CA when a small number of labels are known in an otherwise large unlabeled dataset. 6. The academic alliance component of this project has focused on the development of a novel exemplar-based Bayesian technique for estimating variable length Markov models (closely related to PPM [prediction by partial matching] compression techniques). We have developed examples illustrating the application of our work to text, video, genetic sequences, and unstructured cybersecurity log files.