In this work, we introduce a family of novel activation functions for deep neural networks that approximate n-ary, or n-argument, probabilistic logic. Logic has long been used to encode complex relationships between claims that are either true or false, so these activation functions provide a step towards models that can encode information efficiently. Unfortunately, typical feedforward networks with elementwise activation functions cannot capture certain relationships succinctly, such as the exclusive disjunction (p xor q) and the conditioned disjunction (if c then p else q). Our n-ary activation functions address this challenge by approximating belief functions (probabilistic Boolean logic) with logit representations of probability, and our experiments demonstrate the ability to learn arbitrary logical ground truths in a single layer. Further, by representing belief tables in a basis that associates the number of nonzero parameters with the effective arity of each belief function, we forge a concrete relationship between logical complexity and sparsity, thus opening new optimization approaches to suppress logical complexity during training. We provide a computationally efficient PyTorch implementation and test our activation functions against other logic-approximating activation functions on both traditional machine learning tasks and the reproduction of known logical relationships.
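To make the idea concrete, the following minimal PyTorch sketch shows one way an n-ary logic-approximating activation could be expressed: logit inputs are mapped to probabilities, combined through a learned belief table over all 2^n truth assignments, and returned as a logit. The module name and parameterization here are illustrative assumptions, not the implementation described above (in particular, it does not use the sparsity-revealing basis).

import itertools
import torch
import torch.nn as nn

class NaryLogicActivation(nn.Module):
    """Hypothetical sketch of a learned n-ary probabilistic-logic gate on logits."""

    def __init__(self, arity: int):
        super().__init__()
        self.arity = arity
        # One belief-table entry (stored as a logit) per truth assignment.
        self.table_logits = nn.Parameter(torch.zeros(2 ** arity))
        # All truth assignments of the n arguments, shape (2**n, n).
        assignments = torch.tensor(
            list(itertools.product([0.0, 1.0], repeat=arity)))
        self.register_buffer("assignments", assignments)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., arity) logits of the n argument propositions.
        p = torch.sigmoid(x)
        # Probability of each joint truth assignment, assuming independent arguments.
        probs = torch.where(self.assignments.bool(), p.unsqueeze(-2),
                            1.0 - p.unsqueeze(-2))        # (..., 2**n, n)
        joint = probs.prod(dim=-1)                        # (..., 2**n)
        # Belief that the output proposition holds, returned as a logit.
        p_out = (joint * torch.sigmoid(self.table_logits)).sum(dim=-1)
        return torch.logit(p_out.clamp(1e-6, 1 - 1e-6))

With arity 2, driving the table entries toward 1 for the assignments (0,1) and (1,0) and toward 0 for (0,0) and (1,1) recovers the exclusive disjunction that elementwise activations struggle to express.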
In this paper, we address the problem of convergence of the sequential variational inference filter (VIF) through the application of a robust variational objective and an H∞-norm based correction for a linear Gaussian system. As the dimension of the state or parameter space grows, performing the full Kalman update with the dense covariance matrix of a large-scale system requires storage and computation that quickly become impractical. The VIF approach, based on mean-field Gaussian variational inference, reduces this burden by approximating the covariance, usually with a diagonal covariance matrix. The challenge is to retain convergence and correct for biases introduced by the sequential VIF steps. We desire a framework that improves feasibility while still maintaining reasonable proximity to the optimal Kalman filter as data is assimilated. To accomplish this goal, an H∞-norm based optimization perturbs the VIF covariance matrix to improve robustness. This yields a novel VIF-H∞ recursion that employs consecutive variational inference and H∞-based optimization steps. We explore the development of this method and investigate a numerical example to illustrate the effectiveness of the proposed filter.
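The flavor of a single diagonal-covariance update can be sketched in NumPy as below. The sketch assumes a linear Gaussian observation y = H x + noise with noise covariance R, keeps only the diagonal of the posterior precision as the mean-field variational variances, and uses a simple variance-inflation factor as a placeholder for the H∞-based correction, whose exact form is not reproduced here; the function name and the factor gamma are assumptions for illustration, and the mean is solved densely only for clarity.

import numpy as np

def vif_hinf_step(m, s, H, R, y, gamma=1.1):
    """Illustrative diagonal-covariance variational update (not the paper's recursion).

    m, s  : prior mean and prior diagonal variances of the state x.
    gamma : variance-inflation factor standing in for the H-infinity correction.
    """
    R_inv = np.linalg.inv(R)
    # Posterior precision of the exact Gaussian posterior.
    Lam = np.diag(1.0 / s) + H.T @ R_inv @ H
    # Posterior mean (dense solve shown only for clarity of the sketch).
    m_post = np.linalg.solve(Lam, m / s + H.T @ R_inv @ y)
    # Mean-field Gaussian VI keeps only the diagonal of the precision.
    s_post = 1.0 / np.diag(Lam)
    # Robustness step: inflate the variational variances before the next cycle.
    return m_post, gamma * s_post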
Our primary aim in this work is to understand how to efficiently obtain reliable uncertainty quantification in automatic learning algorithms with limited training datasets. Standard approaches rely on cross-validation to tune hyperparameters. Unfortunately, when our datasets are too small, holdout datasets become unreliable, albeit unbiased, measures of prediction quality due to the lack of adequate sample size. We should not place confidence in holdout estimators under conditions wherein the sample variance is both large and unknown. More poignantly, our training experiments on limited data (Duersch and Catanach, 2021) show that even if we could improve estimator quality under these conditions, the typical training trajectory may never even encounter generalizable models.
This work examines how we may cast machine learning within a complete Bayesian framework to quantify and suppress explanatory complexity from first principles. Our investigation into both the philosophy and mathematics of rational belief leads us to emphasize the critical role of Bayesian inference in learning well-justified predictions within a rigorous and complete extended logic. The Bayesian framework allows us to coherently account for evidence in the learned plausibility of potential explanations. As an extended logic, the Bayesian paradigm regards probability as a notion of degrees of truth. In order to satisfy critical properties of probability as a coherent measure, as well as maintain consistency with binary propositional logic, we arrive at Bayes' Theorem as the only justifiable mechanism to update our beliefs to account for empirical evidence. Yet, in the machine learning paradigm, where explanations are unconstrained algorithmic abstractions, we arrive at a critical challenge: Bayesian inference requires prior belief. Conventional approaches fail to yield a consistent framework in which we could compare prior plausibility among the infinities of potential choices in learning architectures. The difficulty of articulating well-justified prior belief over abstract models is the provenance of memorization in traditional machine learning training practices. This becomes exceptionally problematic in the context of limited datasets, when we wish to learn justifiable predictions from only a small amount of data.
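For reference, the update rule in question is Bayes' Theorem, written here with \theta denoting a candidate explanation and D the observed evidence:

p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}, \qquad p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta .

The difficulty identified above is that the prior p(\theta) must be articulated over an unconstrained space of algorithmic abstractions, which conventional approaches do not support in a consistent way.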
Rank-revealing matrix decompositions provide an essential tool in spectral analysis of matrices, including the Singular Value Decomposition (SVD) and related low-rank approximation techniques. QR with Column Pivoting (QRCP) is usually suitable for these purposes, but it can be much slower than the unpivoted QR algorithm. For large matrices, the difference in performance is due to increased communication between the processor and slow memory, which QRCP needs in order to choose pivots during decomposition. Our main algorithm, Randomized QR with Column Pivoting (RQRCP), uses randomized projection to make pivot decisions from a much smaller sample matrix, which we can construct to reside in a faster level of memory than the original matrix. This technique may be understood as trading vastly reduced communication for a controlled increase in uncertainty during the decision process. For rank-revealing purposes, the selection mechanism in RQRCP produces results of the same quality as the standard algorithm, but with performance near that of unpivoted QR (often an order of magnitude faster for large matrices). We also propose two formulas that facilitate further performance improvements. The first efficiently updates sample matrices to avoid computing new randomized projections. The second avoids large trailing updates during the decomposition in truncated low-rank approximations. Our truncated version of RQRCP also provides a key initial step in our truncated SVD approximation, TUXV. These advances open up a new performance domain for large matrix factorizations that will support efficient problem-solving techniques for challenging applications in science, engineering, and data analysis.
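As a rough illustration of the sampling idea (not the blocked algorithm or its sample-update and trailing-update formulas), the sketch below compresses the rows of A with a Gaussian projection and lets ordinary column-pivoted QR on the small sample choose the pivots; the function name and oversampling parameter are assumptions for the example.

import numpy as np
from scipy.linalg import qr

def rqrcp_pivots(A, k, oversample=8, rng=None):
    """Choose k column pivots for A from a small randomized sample (illustrative)."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    ell = min(m, k + oversample)
    Omega = rng.standard_normal((ell, m))   # Gaussian sketching matrix
    B = Omega @ A                           # small sample matrix, ell x n
    # Column-pivoted QR on the sample is cheap and supplies the pivot order.
    _, _, piv = qr(B, mode="economic", pivoting=True)
    return piv[:k]

# The selected columns can then seed a truncated factorization of A itself, e.g.
# Q, R = np.linalg.qr(A[:, rqrcp_pivots(A, k)])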
Information theory provides a mathematical foundation to measure uncertainty in belief. Belief is represented by a probability distribution that captures our understanding of an outcome's plausibility. Information measures based on Shannon's concept of entropy include realization information, Kullback-Leibler divergence, Lindley's measure of the information in an experiment, cross entropy, and mutual information. We derive a general theory of information from first principles that accounts for evolving belief and recovers all of these measures. Rather than simply gauging uncertainty, information is understood in this theory to measure change in belief. We may then regard entropy as the information we expect to gain upon realization of a discrete latent random variable. This theory of information is compatible with the Bayesian paradigm, in which rational belief is updated as evidence becomes available. Furthermore, this theory admits novel measures of information with well-defined properties, which we explore in both analysis and experiment. This view of information illuminates the study of machine learning by allowing us to quantify the information captured by a predictive model and distinguish it from the residual information contained in training data. We gain related insights regarding feature selection, anomaly detection, and novel Bayesian approaches.
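A small numerical illustration of the change-in-belief reading for two of these measures: when belief p over a discrete variable collapses to a single outcome i, the information realized is the Kullback-Leibler divergence from p to that point mass, -log p_i, and its expectation under p is the Shannon entropy. The helper names below are assumptions for the example.

import numpy as np

def kl_divergence(p, q):
    """D(p || q): information gained when belief moves from q to p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def entropy(p):
    """Expected information gained upon realization: E_p[-log p_i] = -sum p log p."""
    p = np.asarray(p, dtype=float)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(p[mask])))

# A fair coin carries log(2) ~ 0.693 nats of expected realization information.
print(entropy([0.5, 0.5]), kl_divergence([0.9, 0.1], [0.5, 0.5]))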
Stochastic optimization is a fundamental field of research for machine learning. Stochastic gradient descent (SGD) and related methods provide a feasible means to train complicated prediction models over large datasets. SGD, however, does not explicitly address the problem of overfitting, which can lead to predictions that perform poorly on new data. This difference between loss performance on unseen testing data versus that on training data defines the generalization gap of a model. We introduce a new computational kernel called Stochastic Hessian Projection (SHP) that uses a maximum likelihood framework to simultaneously estimate the gradient noise covariance and the local curvature of the loss function. Our analysis illustrates that these quantities affect the evolution of parameter uncertainty and therefore generalizability. We show how these computations allow us to predict the generalization gap without requiring holdout data. Explicitly assessing this metric for generalizability during training may improve machine learning predictions when data is scarce and understanding prediction variability is critical.
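The two ingredients named here can be illustrated with standard PyTorch autograd machinery. The sketch below estimates a diagonal gradient-noise covariance from per-minibatch gradients and a directional curvature via a Hessian-vector product; it is only a hedged illustration of those quantities, not the SHP kernel or its maximum likelihood estimator, and the function and argument names are assumptions.

import torch

def noise_and_curvature(loss_fn, params, batches, direction):
    """Illustrative estimates of gradient-noise covariance (diagonal) and curvature.

    loss_fn   : callable mapping one minibatch to a scalar loss.
    params    : list of parameter tensors with requires_grad=True.
    direction : flat unit vector of length total parameter count.
    """
    # Diagonal gradient-noise covariance from the spread of minibatch gradients.
    grads = []
    for batch in batches:
        g = torch.autograd.grad(loss_fn(batch), params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    G = torch.stack(grads)                         # (num_batches, num_params)
    noise_cov_diag = G.var(dim=0, unbiased=True)

    # Local curvature along `direction` via a Hessian-vector product on one batch.
    g = torch.autograd.grad(loss_fn(batches[0]), params, create_graph=True)
    flat_g = torch.cat([gi.reshape(-1) for gi in g])
    hv = torch.autograd.grad(flat_g @ direction, params)
    curvature = direction @ torch.cat([h.reshape(-1) for h in hv])
    return noise_cov_diag, curvature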