Evaluation of a Class of Simple and Effective Uncertainty Methods for Sparse Samples of Random Variables and Functions
When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Under-estimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid under-estimation with a strategy to conservatively estimate (bound) these types of quantities -- without being overly conservative -- when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a class of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics use d to assess method performance for conservative estimation of two representative quantities: central 95% of response; and 10-4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depends on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large data base and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.