Efficient Generalizable Deep Learning
Catanach, Thomas A.; Duersch, Jed A.
Stochastic optimization is a fundamental field of research for machine learning. Stochastic gradient descent (SGD) and related methods provide a feasible means to train complicated prediction models over large datasets. SGD, however, does not explicitly address the problem of overfitting, which can lead to predictions that perform poorly on new data. The difference between a model's loss on unseen testing data and its loss on the training data defines its generalization gap. We introduce a new computational kernel called Stochastic Hessian Projection (SHP) that uses a maximum likelihood framework to simultaneously estimate the gradient noise covariance and the local curvature of the loss function. Our analysis illustrates that these quantities affect the evolution of parameter uncertainty and therefore generalizability. We show how these computations allow us to predict the generalization gap without requiring holdout data. Explicitly assessing this metric for generalizability during training may improve machine learning predictions when data is scarce and understanding prediction variability is critical.
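The abstract names two quantities that SHP estimates jointly: the gradient noise covariance and the local curvature of the loss. The paper's SHP kernel and its maximum likelihood estimator are not reproduced here; as a rough illustration only, the sketch below shows one conventional way to probe both quantities along a single direction using minibatch gradients and a Hessian-vector product in PyTorch. The names `model`, `loss_fn`, `batches`, and `v` are hypothetical placeholders, not objects from the paper.

```python
import torch


def gradient_noise_and_curvature(model, loss_fn, batches, v):
    """Estimate gradient-noise variance and curvature along a probe direction v.

    `model`, `loss_fn`, `batches` (a list of (inputs, targets) minibatches),
    and the flat probe vector `v` are illustrative placeholders.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    batches = list(batches)

    # Project each minibatch gradient onto v; the sample variance of these
    # projections approximates v^T C v, where C is the gradient noise covariance.
    projections = []
    for inputs, targets in batches:
        loss = loss_fn(model(inputs), targets)
        grads = torch.autograd.grad(loss, params)
        flat = torch.cat([g.reshape(-1) for g in grads])
        projections.append(torch.dot(flat, v))
    noise_var = torch.stack(projections).var(unbiased=True)

    # Local curvature v^T H v via a Hessian-vector product (double backward)
    # evaluated on a single minibatch.
    inputs, targets = batches[0]
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(torch.dot(flat, v), params)
    curvature = torch.dot(torch.cat([h.reshape(-1) for h in hv]), v)

    return noise_var.item(), curvature.item()
```

In this sketch the variance of the projected minibatch gradients stands in for the noise covariance restricted to the probe direction, and the Hessian-vector product gives the corresponding curvature; per the abstract, the SHP kernel instead couples these estimates through a maximum likelihood framework rather than computing them separately.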