Publications Search

A wide range of machine learning problems, including astronomical inference about galaxy clusters, scene classification, parametric statistical inference, and predictions of public opinion, can be well-modeled as learning a function on (samples from) distributions. This project explores problems in learning such functions via kernel methods, particularly for large-scale problems. When learning from large numbers of distributions, the computation of typical methods scales between quadratically and cubically, and so they are not amenable to large datasets. We investigate the approach of approximate embeddings into Euclidean spaces such that inner products in the embedding space approximate kernel values between the source distributions. We first improve the understanding of the workhorse methods of random Fourier features: we show that of the two approaches in common usage, one is strictly superior. We then present a new embedding for a class of information-theoretic distribution distances, and evaluate it and existing embeddings on several real-world applications.

More Details

TYPE Other Report YEAR 2016

DOI OSTI

Counter-Adversarial Community Detection: Initial Investigations

Kegelmeyer, William P.; Wendt, Jeremy; Pinar, Ali P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

PANTHER. Trajectory Analysis

Foulk, James W.; Wilson, Andrew T.; Valicka, Christopher G.; Kegelmeyer, William P.; Shead, Timothy M.; Czuchlewski, Kristina R.; Newton, Benjamin D.

We want to organize a body of trajectories in order to identify, search for, classify and predict behavior among objects such as aircraft and ships. Existing compari- son functions such as the Fr'echet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as total distance traveled and distance be- tween start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans that are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to iden- tify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories, predict destination and apply unsupervised machine learning algorithms.

More Details

TYPE SAND Report YEAR 2015

DOI OSTI

Counter-Adversarial Data Analytics: Machine Learning (and Graph Analysis)

Kegelmeyer, William P.; Pinar, Ali P.; Zage, David J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Attacking DBSCAN for Fun and Profit

Crussell, Jonathan; Kegelmeyer, William P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Attacking DBSCAN for Fun and Profit

Crussell, Jonathan; Kegelmeyer, William P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Characterizing and Detecting Aircraft Identity and Diversion

Kegelmeyer, William P.; Shead, Timothy M.; Foulk, James W.; Wilson, Andrew T.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Attacking DBSCAN for Fun and Profit

Crussell, Jonathan; Kegelmeyer, William P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Attacking DBSCAN for Fun and Profit

Crussell, Jonathan; Kegelmeyer, William P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

DOI OSTI

Counter Adversarial Data Analytics (CADA)

Kegelmeyer, William P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Streaming Malware Classification in the Presence of Concept Drift and Class Imbalance

Kegelmeyer, William P.; Chiang, Ken C.; Ingram, Joe B.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI DOI

Streaming Malware Classification in the Presence of Concept Drift and Class Imbalance

Kegelmeyer, William P.; Chiang, Ken; Ingram, Joe B.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

DOI OSTI

Estimating Extrapolation Risk in Supervised Machine Learning: Should I trust this prediction?

Kegelmeyer, William P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Estimating Extrapolation Risk in Supervised Machine Learning: Should I trust this prediction?

Kegelmeyer, William P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Multilingual Sentiment Analysis Using Latent Semantic Indexing and Machine Learning

Kegelmeyer, William P.; Bader, Brett W.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

COMET: A recipe for learning and using large ensembles on massive data

Proceedings - IEEE International Conference on Data Mining, ICDM

Basilico, Justin D.; Munson, Miles A.; Dixon, Kevin R.; Kolda, Tamara G.; Kegelmeyer, William P.

COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more. © 2011 IEEE.

More Details

TYPE Conference YEAR 2011

OSTI Scopus

Publications

Search results