This paper presents a probabilistic model for various machine learning (ML) applications. While deep learning (DL) has produced state-of-the-art results in many domains, DL models are complex and over-parameterized, which leads to high uncertainty about what the model has learned and about its decision process. Further, DL models are not probabilistic, which makes reasoning about their output challenging. In contrast, the proposed model, referred to as Yet Another Discriminant Analysis (YADA), is less complex than other methods, rests on a mathematically rigorous foundation, and can be used for a wide variety of ML tasks, including classification, explainability, and uncertainty quantification. At the same time, YADA is competitive in most cases with many state-of-the-art DL models. Ideally, a probabilistic model would represent the full joint probability distribution of its features, but doing so is often computationally expensive or intractable. Hence, many probabilistic models assume that the features are normally distributed, mutually independent, or both, which can severely limit their performance. YADA is an intermediate model that (1) captures the marginal distribution of each variable and the pairwise correlations between variables and (2) explicitly maps features to the space of multivariate Gaussian variables. Numerous mathematical properties of the YADA model can be derived, thereby strengthening the theoretical underpinnings of ML. The model can also be statistically validated on new or held-out data using native properties of YADA. However, we enumerate several engineering and practical challenges that must be addressed to make YADA more useful.
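The two ingredients named in (1) and (2) can be illustrated with a Gaussian-copula-style construction. The sketch below is an assumption for illustration, not the YADA implementation: the class name `GaussianCopulaSketch` is hypothetical, each feature's marginal is modeled by its empirical CDF, features are mapped to standard-normal scores through the inverse normal CDF, and the pairwise structure is summarized by the correlation matrix of those scores. The returned log-density is that of a Gaussian copula, which scores how typical a point's dependence pattern is once the marginals are factored out.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for its inverse CDF


class GaussianCopulaSketch:
    """Hypothetical illustration: empirical marginals + pairwise correlations in Gaussian space."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Sorted training columns define each feature's empirical marginal CDF.
        self.sorted_cols_ = np.sort(X, axis=0)
        Z = self.transform(X)
        # Pairwise correlations between the Gaussian scores.
        self.corr_ = np.corrcoef(Z, rowvar=False)
        return self

    def transform(self, X):
        """Map each feature to a standard-normal score via its empirical CDF."""
        X = np.asarray(X, dtype=float)
        n = self.sorted_cols_.shape[0]
        Z = np.empty_like(X)
        for j in range(X.shape[1]):
            # Count of training values <= x, shrunk so u stays strictly inside (0, 1).
            counts = np.searchsorted(self.sorted_cols_[:, j], X[:, j], side="right")
            u = (counts + 0.5) / (n + 1)
            Z[:, j] = [_STD_NORMAL.inv_cdf(p) for p in u]
        return Z

    def log_copula_density(self, X):
        """Gaussian copula log-density: -0.5*log|R| - 0.5*z^T (R^-1 - I) z."""
        Z = self.transform(X)
        _, logdet = np.linalg.slogdet(self.corr_)
        R_inv = np.linalg.inv(self.corr_)
        quad = np.einsum("ij,jk,ik->i", Z, R_inv, Z) - np.sum(Z * Z, axis=1)
        return -0.5 * logdet - 0.5 * quad
```

Because the marginal step only uses ranks, any strictly increasing transform of a feature leaves the fitted correlation matrix unchanged, so non-Gaussian marginals do not distort the pairwise structure.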
This report details the findings from the research and investigation of Geometric Measures of Trustworthiness for Machine Learning Predictions. We explored the trustworthiness of machine learning (ML) models’ predictions using geometric measures that quantify the similarity of a query point to the training data. Predictive uncertainty in ML can originate from at least three sources: (1) model uncertainty, which represents the uncertainty in model form (e.g., decision tree vs. neural network) and in estimating the model parameters from the training data; (2) data uncertainty, which represents the natural complexities of the data, such as class overlap and inherent noise; and (3) distributional uncertainty, which represents the mismatch between the training and operational distributions. The proposed measures focus on quantifying and explaining the data and distributional uncertainties by measuring the relationships of operational data to the training data.
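The abstract does not specify the measures themselves, so the following is only a minimal sketch of the general idea under our own assumptions. The hypothetical function `knn_trust_measures` uses the mean Euclidean distance from a query to its k nearest training points as a proxy for distributional uncertainty (the query lies far from the training data), and the fraction of those neighbors outside the neighborhood's majority class as a proxy for data uncertainty (class overlap).

```python
import numpy as np


def knn_trust_measures(X_train, y_train, X_query, k=5):
    """Hypothetical geometric trust measures for each query point.

    Returns (mean_dist, disagreement):
      mean_dist: mean distance to the k nearest training points
                 (large values suggest the query is off-distribution)
      disagreement: fraction of those neighbors not in the majority class
                    (large values suggest class overlap near the query)
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    mean_dist = np.take_along_axis(dists, nn_idx, axis=1).mean(axis=1)
    disagreement = np.empty(len(X_query))
    for i, labels in enumerate(y_train[nn_idx]):
        _, counts = np.unique(labels, return_counts=True)
        disagreement[i] = 1.0 - counts.max() / len(labels)
    return mean_dist, disagreement
```

The brute-force distance matrix keeps the sketch short; for large training sets a spatial index would normally replace it.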
Deep neural networks (DNNs) achieve state-of-the-art performance in video anomaly detection. However, the use of DNNs is limited in practice by their computational overhead, as they generally require significant resources and specialized hardware. Further, despite recent progress, current evaluation criteria for video anomaly detection algorithms are flawed, preventing meaningful comparisons among algorithms. In response to these challenges, we propose (1) a compression-based technique referred to as Spatio-Temporal N-Gram Prediction by Partial Matching (STNG PPM) and (2) simple modifications to current evaluation criteria for improved interpretation and broader applicability across algorithms. STNG PPM does not require specialized hardware, has few parameters to tune, and is competitive with DNNs on multiple benchmark data sets in video anomaly detection.
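Compression-based detectors score data by how many bits a model of normal activity needs to encode it; symbols that are poorly predicted, and therefore expensive to encode, are candidate anomalies. The sketch below illustrates that principle on a one-dimensional symbol sequence only. It is a simplified PPM-style context model (escape method C, without the exclusion mechanism of full PPM) and is not STNG PPM, which additionally exploits spatial and temporal neighborhoods; the class `SimplePPM` and its parameters are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SimplePPM:
    """Hypothetical PPM-style context model (escape method C, no exclusions)."""

    def __init__(self, alphabet_size, max_order=2):
        self.alphabet_size = alphabet_size
        self.max_order = max_order
        # counts[k][context] -> Counter of symbols that followed that order-k context
        self.counts = [defaultdict(Counter) for _ in range(max_order + 1)]

    def _context(self, history, k):
        return tuple(history[len(history) - k:])

    def train(self, seq):
        """Accumulate context counts from a sequence of normal activity."""
        for i, sym in enumerate(seq):
            history = seq[max(0, i - self.max_order):i]
            for k in range(min(len(history), self.max_order) + 1):
                self.counts[k][self._context(history, k)][sym] += 1

    def prob(self, history, sym):
        """Blend contexts from the longest order down, escaping when sym is unseen."""
        p_escape = 1.0
        for k in range(min(len(history), self.max_order), -1, -1):
            table = self.counts[k].get(self._context(history, k))
            if not table:
                continue
            total, distinct = sum(table.values()), len(table)
            if sym in table:
                return p_escape * table[sym] / (total + distinct)
            p_escape *= distinct / (total + distinct)
        # Order -1 fallback: uniform over the alphabet.
        return p_escape / self.alphabet_size

    def surprisal(self, seq):
        """Bits needed to encode each symbol under the frozen model (higher = more anomalous)."""
        return [
            -math.log2(self.prob(seq[max(0, i - self.max_order):i], sym))
            for i, sym in enumerate(seq)
        ]
```

Adding exclusions would make the blended distribution sum to exactly one; they are omitted here to keep the sketch short, at the cost of slightly inflated code lengths.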
As machine learning (ML) continues to advance, it is being integrated into more systems. Often, the ML component represents a significant portion of the system, reducing the burden on the end user or significantly improving task performance. However, the ML component represents an unknown, complex phenomenon that is learned from collected data rather than explicitly programmed. Despite the improvement in task performance, the models are often black boxes. Methods for evaluating the credibility and the vulnerabilities of ML models remain a gap in current test and evaluation practice. For high-consequence applications, the lack of test and evaluation procedures represents a significant source of uncertainty and risk. To help reduce that risk, we present considerations for evaluating systems with an embedded ML component within a red-teaming-inspired methodology. We focus on (1) cyber vulnerabilities of an ML model, (2) evaluating performance gaps, and (3) adversarial ML vulnerabilities.
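To make item (3) concrete, the sketch below shows one well-known adversarial attack, the Fast Gradient Sign Method (FGSM), applied to a logistic-regression model. The attack, the model, and the hypothetical helpers `predict_proba` and `fgsm_logistic` are chosen purely for illustration; the methodology described here is not limited to this attack. For binary cross-entropy loss, the gradient with respect to the input is (p - y)·w, so FGSM moves every feature by ε in the direction that increases the loss.

```python
import numpy as np


def predict_proba(x, w, b):
    """P(y = 1 | x) for a logistic-regression model with weights w and bias b."""
    return 1.0 / (1.0 + np.exp(-(np.dot(x, w) + b)))


def fgsm_logistic(x, y, w, b, eps):
    """Fast Gradient Sign Method: one L-infinity step of size eps that increases the loss.

    For binary cross-entropy, d(loss)/dx = (p - y) * w, where p = predict_proba(x, w, b).
    """
    x = np.asarray(x, dtype=float)
    grad_x = (predict_proba(x, w, b) - y) * np.asarray(w, dtype=float)
    return x + eps * np.sign(grad_x)
```

A red team would run attacks like this against the fielded model, within a realistic perturbation budget ε, to check whether small input changes flip its decisions.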
The primary purpose of this document is to outline the progress made in FY22 and FY23 on the LDRD titled “Identifying and Explaining Anomalous Activity in Surveillance Video with Compression Algorithms.” In this LDRD, we explored the use of compression-based analytics to identify anomalous activity in video. We developed a novel algorithm, Spatio-Temporal N-Gram PPM (STNG PPM), that detects anomalies in video using both spatial and temporal context. We extracted features from video motion vectors and also operated directly on the raw features. STNG PPM is comparable to many deep learning approaches but does not require specialized hardware (GPUs) to run efficiently. We also examined the evaluation metrics and proposed novel measures that address faults in the current ones.
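Compression models such as PPM operate on discrete symbols, so continuous motion vectors must be discretized before they can be modeled. The document does not state how this was done; the following is only one plausible scheme, with a hypothetical function name and arbitrarily chosen bin edges. Each block's motion vector (dx, dy) is mapped to a "static" symbol if its magnitude is small, and otherwise to a symbol that combines a direction bin with a magnitude bin.

```python
import numpy as np


def quantize_motion_vectors(mv, n_dirs=8, mag_edges=(0.5, 2.0, 8.0)):
    """Map per-block motion vectors to integer symbols.

    mv: array of shape (H, W, 2) holding (dx, dy) per block.
    Returns an (H, W) int array: 0 means (near) static; moving blocks get
    dir_bin * len(mag_edges) + mag_bin, giving 1 + n_dirs * len(mag_edges) symbols.
    """
    mv = np.asarray(mv, dtype=float)
    mag = np.hypot(mv[..., 0], mv[..., 1])
    ang = np.arctan2(mv[..., 1], mv[..., 0])  # angle in [-pi, pi]
    # Direction bin in 0..n_dirs-1 (the modulo folds the +pi edge back to bin 0).
    dir_bin = np.floor((ang + np.pi) / (2 * np.pi) * n_dirs).astype(int) % n_dirs
    # Magnitude bin: 0 below the first edge (static), 1..len(mag_edges) otherwise.
    mag_bin = np.digitize(mag, mag_edges)
    return np.where(mag_bin == 0, 0, dir_bin * len(mag_edges) + mag_bin)
```

Each frame then becomes a small grid of symbols whose evolution over time, and whose neighborhood in space, can be modeled by a compression algorithm.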
In recent years, infections and damage caused by malware have increased at exponential rates. At the same time, machine learning (ML) techniques have shown tremendous promise in many domains, often outperforming human efforts by learning from large amounts of data. Results in the open literature suggest that ML can provide similar results for malware detection, achieving greater than 99% classification accuracy [49]. However, these detection rates have not been achieved in deployed settings. Malware differs from many other domains in which ML has shown success in that (1) it purposefully tries to hide, leading to noisy labels, and (2) its behavior is often similar to that of benign software, differing only in intent, among other complicating factors. This report details the reasons why novel malware is difficult to detect with ML methods and offers solutions to improve its detection.
This report details the results of a three-fold investigation of sensitivity analysis (SA) for machine learning (ML) explainability (MLE): (1) the mathematical assessment of the fidelity of an explanation with respect to a learned ML model, (2) quantifying the trustworthiness of a prediction, and (3) the impact of MLE on the efficiency of end users through multiple user studies. We focused on the cybersecurity domain because the data is inherently non-intuitive. As ML is used in an increasing number of domains, including domains where errors can have high consequences, MLE has been proposed as a means of building end-user trust in learned ML models. However, little analysis has been performed to determine whether explanations accurately represent the target model and whether they themselves should be trusted beyond subjective inspection. Current state-of-the-art MLE techniques only provide a list of important features based on heuristic measures and/or make assumptions about the data and the model that are not representative of real-world data and models. Further, most are designed without considering their usefulness to end users in a broader context. To address these issues, we present a notion of explanation fidelity based on Shapley values from cooperative game theory. We find that all of the investigated MLE methods produce explanations that are incongruent with the ML model being explained, because they make critical assumptions about feature independence and linear feature interactions for computational reasons. We also find that in deployed settings, explanations are rarely used, for a variety of reasons: end users trust several other tools more than the explanations, and they have little incentive to use the explanations.
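Shapley values assign each feature its weighted average marginal contribution over all coalitions S of the other features: φ_i = Σ_S |S|!(d − |S| − 1)!/d! · [v(S ∪ {i}) − v(S)]. The sketch below computes them exactly by enumeration for a small model; it is not the report's fidelity measure. The hypothetical function `exact_shapley` defines the value v(S) of a coalition by replacing the features outside S with a fixed baseline, which is itself a modeling assumption. This naive form needs d·2^d model evaluations, which is why practical MLE tools approximate it, often by assuming feature independence or additivity.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at point x, by enumerating all coalitions.

    Coalition value v(S) = f(z), where z takes x's values on S and the
    baseline's values elsewhere (an assumed value function).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        if idx:
            z[idx] = x[idx]
        return f(z)

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(others, r):
                # Shapley weight |S|! (d - |S| - 1)! / d!
                weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi
```

For f(z) = z₀z₁ + 2z₂ at x = (1, 1, 3) with a zero baseline, the interaction term is split equally (0.5 each to features 0 and 1), feature 2 receives 6, and the values sum to f(x) − f(baseline) = 7. Linear, independence-based approximations cannot in general reproduce such an interaction split, which is the kind of incongruence a Shapley-based fidelity check is meant to expose.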
When explanations are used, we found a danger that they persuade end users to wrongly accept false positives and false negatives. However, ML model developers and maintainers find explanations more useful for checking that the ML model does not have obvious biases. In light of these findings, we suggest a number of future directions, including developing MLE methods that directly model non-linear interactions within the model and incorporating design principles that account for the usefulness of explanations to end users. We also augment explanations with a set of trustworthiness measures that capture geometric aspects of the data to determine whether the model output should be trusted.