Publications Details

Publications / SAND Report

Builtin vs. auxiliary detection of extrapolation risk

Munson, Miles A.

A key assumption in supervised machine learning is that future data will be similar to historical data. This assumption is often false in real world applications, and as a result, prediction models often return predictions that are extrapolations. We compare four approaches to estimating extrapolation risk for machine learning predictions. Two builtin methods use information available from the classification model to decide if the model would be extrapolating for an input data point. The other two build auxiliary models to supplement the classification model and explicitly model extrapolation risk. Experiments with synthetic and real data sets show that the auxiliary models are more reliable risk detectors. To best safeguard against extrapolating predictions, however, we recommend combining builtin and auxiliary diagnostics.