Publications

16 Results

Search results

Jump to search filters

Chaconne: A Statistical Approach to Nonlocal Compression for Supervised Learning, Semi-Supervised Learning, and Anomaly Detection

Foss, Alexander; Field, Richard V.; Ting, Christina; Shuler, Kurtis; Bauer, Travis L.; Zhao, Sihai D.; Cardenas-Torres, Eduardo

This project developed a novel statistical understanding of compression analytics (CA), which has challenged and clarified some core assumptions about CA, and enabled the development of novel techniques that address vital challenges of national security. Specifically, this project has yielded the development of novel capabilities including 1. Principled metrics for model selection in CA, 2. Techniques for deriving/applying optimal classification rules and decision theory to supervised CA, including how to properly handle class imbalance and differing costs of misclassification, 3. Two techniques for handling nonlocal information in CA, 4. A novel technique for unsupervised CA that is agnostic with regard to the underlying compression algorithm, 5. A framework for semisupervised CA when a small number of labels are known in an otherwise large unlabeled dataset. 6. The academic alliance component of this project has focused on the development of a novel exemplar-based Bayesian technique for estimating variable length Markov models (closely related to PPM [prediction by partial matching] compression techniques). We have developed examples illustrating the application of our work to text, video, genetic sequences, and unstructured cybersecurity log files.

More Details

Distance Metrics and Clustering Methods for Mixed-type Data

International Statistical Review

Foss, Alexander; Markatou, Marianthi; Ray, Bonnie

In spite of the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remain a challenging problem. In order to identify the most effective approaches for clustering mixed–type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Here, the guidelines on approaches to use under different scenarios are provided, along with potential directions for future research.

More Details
16 Results
16 Results