We develop methods that could be used to qualify a training dataset and a data-driven turbulence closure trained on it. By qualify, we mean identifying the kind of turbulent physics that could be simulated by the data-driven closure. We limit ourselves to closures for the Reynolds-Averaged Navier-Stokes (RANS) equations. We build on our previous work on assembling feature spaces, and on clustering and characterizing the Direct Numerical Simulation datasets that are typically pooled to constitute training datasets. In this paper, we develop an alternative way to assemble feature spaces and thus check the correctness and completeness of our previous method. We then use the characterization of our training dataset to determine whether a data-driven turbulence closure learned on it would generalize to an unseen flow configuration, an impinging jet in our case. Finally, we train a RANS closure architected as a neural network, develop an explanation, i.e., an interpretable approximation, using generalized linear mixed-effects models, and check whether the explanation resembles a contemporary closure from turbulence modeling.
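As an illustration of the last step, the sketch below fits a linear mixed-effects surrogate (a simplification of the generalized mixed-effects form used in the paper) to a stand-in closure, with cluster membership as the random effect; the closure function, feature names, and data are hypothetical placeholders rather than the paper's actual model.

```python
# Sketch: linear mixed-effects surrogate of a stand-in closure model.
# Everything here (closure form, feature names, clusters) is hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

def closure(X):
    """Stand-in for a trained neural-net closure: features -> eddy viscosity."""
    return 0.09 * X[:, 0] ** 2 / (X[:, 1] + 1e-6)

n = 500
X = rng.uniform(0.1, 1.0, size=(n, 2))         # e.g., two turbulence invariants
cluster = rng.integers(0, 3, size=n)           # cluster labels from an earlier clustering step
df = pd.DataFrame({"k": X[:, 0], "omega": X[:, 1],
                   "nu_t": closure(X), "cluster": cluster})

# Fixed effects: the features; random intercept: the cluster (flow regime).
surrogate = smf.mixedlm("nu_t ~ k + omega", df, groups=df["cluster"]).fit()
print(surrogate.params)                        # interpretable coefficients
```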
We present a simple, near-real-time Bayesian method to infer and forecast a multiwave outbreak, and demonstrate it on the COVID-19 pandemic. The approach uses timely epidemiological data of the kind that has been widely available for COVID-19. It provides short-term forecasts of the outbreak's evolution, which can then be used for medical resource planning. The method postulates one- and multiwave infection models, which are convolved with the incubation-period distribution to yield competing disease models. The disease models' parameters are estimated via Markov chain Monte Carlo sampling, and information-theoretic criteria are used to select between the models for use in forecasting. The method is demonstrated on two- and three-wave COVID-19 outbreaks in California, New Mexico, and Florida, as observed during Summer-Winter 2020. We find that the method is robust to noise, provides useful forecasts along with uncertainty bounds, and reliably detects when the initial single-wave COVID-19 outbreaks transformed into successive surges as containment efforts in these states failed by the end of Spring 2020.
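The forward model underlying this approach can be sketched as follows; the wave shapes, amplitudes, and incubation-period parameters below are illustrative assumptions, not the calibrated values from the paper.

```python
# Sketch: two-wave infection rate convolved with a lognormal incubation-period
# density to predict daily symptomatic counts. Parameter values are illustrative.
import numpy as np
from scipy.stats import lognorm

def infection_rate(t, waves):
    """Sum of Gaussian-shaped waves; waves = [(amplitude, center, width), ...]."""
    return sum(a * np.exp(-0.5 * ((t - c) / w) ** 2) for a, c, w in waves)

def daily_symptomatic(t, waves, incubation_median=5.1, incubation_sigma=0.4):
    """Convolve the infection rate with the incubation-period density."""
    dt = t[1] - t[0]
    incubation_pdf = lognorm.pdf(t, s=incubation_sigma, scale=incubation_median)
    return np.convolve(infection_rate(t, waves), incubation_pdf)[: len(t)] * dt

t = np.arange(0.0, 200.0, 1.0)
waves = [(500.0, 40.0, 10.0), (900.0, 120.0, 20.0)]   # a hypothetical two-wave outbreak
print(daily_symptomatic(t, waves)[:5])
```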
Machine-learned models, specifically neural networks, are increasingly used as "closures" or "constitutive models" in engineering simulators to represent fine-scale physical phenomena that are too computationally expensive to resolve explicitly. However, these neural-net models of unresolved physical phenomena tend to fail unpredictably and are therefore not used in mission-critical simulations. In this report, we describe new methods to authenticate them, i.e., to determine the (physical) information content of their training datasets, to qualify the scenarios where they may be used, and to verify that the neural net, as trained, adheres to physics theory. We demonstrate these methods with a neural-net closure of turbulent phenomena used in the Reynolds-Averaged Navier-Stokes equations. We show the types of turbulent physics extant in our training datasets and, using a test flow of an impinging jet, identify the exact locations where the neural network would be extrapolating, i.e., where it would be used outside the feature space where it was trained. Using Generalized Linear Mixed Models, we also generate explanations of the neural net (à la Local Interpretable Model-agnostic Explanations) at prototypes placed in the training data and compare them with approximate analytical models from turbulence theory. Finally, we verify our findings by reproducing them using two different methods.
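One way such an extrapolation check could be realized (offered as an assumption, not the report's exact procedure) is to fit a density model to the training feature space and flag test points whose density falls below a low quantile of the training density:

```python
# Sketch: flag test points as extrapolation when their log-density under a
# Gaussian mixture fit to the training features falls below a low quantile.
# The training/test features here are synthetic placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X_train = rng.normal(size=(2000, 3))            # training-flow features (e.g., invariants)
X_test = rng.normal(loc=2.5, size=(200, 3))     # features from an unseen flow

gmm = GaussianMixture(n_components=4, random_state=0).fit(X_train)
threshold = np.quantile(gmm.score_samples(X_train), 0.01)   # 1st percentile of training density
extrapolating = gmm.score_samples(X_test) < threshold
print(f"{extrapolating.mean():.0%} of test points fall outside the trained feature space")
```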
In this paper we investigate the utility of one-dimensional convolutional neural network (CNN) models in epidemiological forecasting. Deep learning models, in particular variants of recurrent neural networks (RNNs), have been studied for ILI (Influenza-Like Illness) forecasting, and have achieved higher forecasting skill compared to conventional models such as ARIMA. In this study, we adapt two neural networks that employ one-dimensional temporal convolutional layers as a primary building block, namely temporal convolutional networks and simple neural attentive meta-learners, for epidemiological forecasting. We then test them with influenza data from the US collected over 2010-2019. We find that epidemiological forecasting with CNNs is feasible, and that their forecasting skill is comparable to, and at times superior to, plain RNNs. Thus CNNs and RNNs bring the power of nonlinear transformations to purely data-driven epidemiological models, a capability that heretofore has been limited to more elaborate mechanistic/compartmental disease models.
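A minimal sketch of the kind of causal one-dimensional convolutional forecaster described above is given below; the layer sizes, window length, and hyperparameters are illustrative placeholders, not the architectures evaluated in the study.

```python
# Sketch: a tiny causal 1-D CNN mapping a window of weekly case counts to a
# one-step-ahead forecast. Layer sizes and window length are placeholders.
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    """1-D convolution padded on the left so outputs never see future inputs."""
    def forward(self, x):
        pad = (self.kernel_size[0] - 1) * self.dilation[0]
        return super().forward(nn.functional.pad(x, (pad, 0)))

class TinyTCN(nn.Module):
    def __init__(self, channels=16, kernel_size=3):
        super().__init__()
        self.features = nn.Sequential(
            CausalConv1d(1, channels, kernel_size, dilation=1), nn.ReLU(),
            CausalConv1d(channels, channels, kernel_size, dilation=2), nn.ReLU(),
        )
        self.head = nn.Linear(channels, 1)

    def forward(self, x):                  # x: (batch, 1, window_length)
        h = self.features(x)
        return self.head(h[:, :, -1])      # forecast from the final time step

x = torch.randn(8, 1, 52)                  # 8 series, 52-week history each
print(TinyTCN()(x).shape)                  # torch.Size([8, 1])
```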
Lin, Yen T.; Neumann, Jacob; Miller, Ely F.; Posner, Richard G.; Mallela, Abhishek; Safta, Cosmin; Ray, Jaideep; Thakur, Gautam; Chinthavali, Supriya; Hlavacek, William S.
To increase situational awareness and support evidence-based policymaking, we formulated a mathematical model for coronavirus disease transmission within a regional population. This compartmental model accounts for quarantine, self-isolation, social distancing, a nonexponentially distributed incubation period, asymptomatic persons, and mild and severe forms of symptomatic disease. We used Bayesian inference to calibrate region-specific models for consistency with daily reports of confirmed cases in the 15 most populous metropolitan statistical areas in the United States. We also quantified uncertainty in parameter estimates and forecasts. This online learning approach enables early identification of new trends despite considerable variability in case reporting.
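A minimal sketch of a compartmental model in this spirit, with a non-exponential (Erlang) incubation period and a social-distancing factor, is shown below; the structure is simplified and the parameter values are placeholders, not the calibrated region-specific models.

```python
# Sketch: an SEIR-type model with two exposed stages (Erlang-distributed
# incubation period) and a social-distancing factor. Parameters are placeholders.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, beta=0.3, sigma=0.4, gamma=0.1, distancing=0.5):
    S, E1, E2, I, R = y
    N = y.sum()
    lam = (1.0 - distancing) * beta * I / N        # force of infection, reduced mixing
    return [-lam * S,
            lam * S - 2.0 * sigma * E1,            # two exposed stages -> Erlang(2) incubation
            2.0 * sigma * (E1 - E2),
            2.0 * sigma * E2 - gamma * I,
            gamma * I]

y0 = [1.0e6 - 10.0, 5.0, 5.0, 0.0, 0.0]            # S, E1, E2, I, R
sol = solve_ivp(rhs, (0.0, 180.0), y0, t_eval=np.arange(0.0, 180.0, 1.0))
print(sol.y[3, :5])                                 # infectious compartment, first five days
```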
For digital twins (DTs) to become a central fixture in mission-critical systems, a better understanding is required of potential modes of failure, of uncertainty quantification, and of how to explain a model's behavior. These aspects are particularly important because the performance of a digital twin will evolve during model development and deployment for real-world operations.
This paper explores unsupervised learning approaches for the analysis and categorization of turbulent flow data. Single-point statistics from several high-fidelity turbulent flow simulation data sets are classified using a Gaussian mixture model clustering algorithm. Candidate features are proposed, which include barycentric coordinates of the Reynolds stress anisotropy tensor, as well as scalar and angular invariants of the Reynolds stress and mean strain rate tensors. A feature selection algorithm is applied to the data in a sequential fashion, flow by flow, to identify a good feature set and an optimal number of clusters for each data set. The algorithm is first applied to Direct Numerical Simulation data for plane channel flow, and produces clusters that are consistent with turbulent flow theory and empirical results that divide the channel flow into a number of regions (viscous sub-layer, log layer, etc.). Clusters are then identified for flow over a wavy-walled channel, flow over a bump in a channel, and flow past a square cylinder. Some clusters are closely identified with the anisotropy state of the turbulence, as indicated by their location within the barycentric map of the Reynolds stress tensor. Other clusters can be connected to physical phenomena, such as boundary layer separation and free shear layers. Exemplar points from the clusters, or prototypes, are then identified using a prototype selection method. These exemplars summarize the dataset, reducing its size by a factor of 10 to 1000. The clustering and prototype selection algorithms provide a foundation for physics-based, semi-automated classification of turbulent flow states and extraction of a subset of data points that can serve as the basis for the development of explainable machine-learned turbulence models.
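The clustering and prototype-selection steps can be sketched as follows; the synthetic features and cluster counts below are illustrative stand-ins for the flow data and feature sets described above.

```python
# Sketch: Gaussian-mixture clustering of single-point statistics, model selection
# by BIC, and one prototype per cluster. The features here are synthetic blobs
# standing in for the turbulence features described above.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(1500, 4)) for m in (-2.0, 0.0, 2.0)])

fits = {k: GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(2, 8)}
best_k = min(fits, key=lambda k: fits[k].bic(X))    # lowest BIC wins
gmm = fits[best_k]
labels = gmm.predict(X)

prototypes = []                                     # exemplar nearest each cluster mean
for k in range(best_k):
    members = X[labels == k]
    if members.size:
        prototypes.append(members[np.argmin(np.linalg.norm(members - gmm.means_[k], axis=1))])
print(best_k, np.array(prototypes).shape)
```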
We demonstrate a Bayesian method for the "real-time" characterization and forecasting of a partially observed COVID-19 epidemic. Characterization is the estimation of infection-spread parameters using daily counts of symptomatic patients. The method is designed to help guide medical resource allocation in the early epoch of the outbreak. The estimation problem is posed as one of Bayesian inference and solved using a Markov chain Monte Carlo technique. The data used in this study were sourced before the arrival of the second wave of infection in July 2020. The proposed modeling approach generally provides accurate forecasts at the regional, state, and country levels. The epidemiological model detected the flattening of the epidemic curve in California after public health measures were instituted. The method also detected different disease dynamics when applied to specific regions of New Mexico.
In this report we investigate the utility of one-dimensional convolutional neural network (CNN) models in epidemiological forecasting. Deep learning models, especially variants of recurrent neural networks (RNNs), have been studied for influenza forecasting, and have achieved higher forecasting skill compared to conventional models such as ARIMA. In this study, we adapt two neural networks that employ one-dimensional temporal convolutional layers as a primary building block, namely temporal convolutional networks and simple neural attentive meta-learners, for epidemiological forecasting, and test them with influenza data from the US collected over 2010-2019. We find that epidemiological forecasting with CNNs is feasible, and that their forecasting skill is comparable to, and at times superior to, RNNs. Thus CNNs and RNNs bring the power of nonlinear transformations to purely data-driven epidemiological models, a capability that heretofore has been limited to more elaborate mechanistic/compartmental disease models.
In this report, we construct and test a framework for fusing the predictions of an ensemble of seismic wave detectors. The framework is drawn from multi-instance learning and is meant to improve the predictive skill of the ensemble beyond that of the individual detectors. We show how the framework allows the use of multiple features derived from the seismogram to detect seismic wave arrivals, and how it allows only the most informative features to be retained in the ensemble. The computational cost of the "ensembling" method is linear in the size of the ensemble, providing a scalable approach for monitoring multiple features/transformations of a seismogram. The framework is tested on teleseismic and regional P-wave arrivals at the IMS (International Monitoring System) station in Warramunga, NT, Australia and the PNSU station in the University of Utah's monitoring network.
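As an illustration of detector fusion in a multi-instance setting, the sketch below applies a noisy-OR rule to per-feature detector probabilities within a time window; this is one common fusion rule, offered as an assumption rather than the report's exact framework, and the detector outputs are hypothetical.

```python
# Sketch: noisy-OR fusion of per-feature detector probabilities within one
# time window ("bag"). The detector outputs below are hypothetical.
import numpy as np

def noisy_or(probs):
    """P(arrival in window) = 1 - prod(1 - p_i) over detector outputs p_i."""
    return 1.0 - np.prod(1.0 - np.asarray(probs))

window_probs = [0.15, 0.65, 0.40]    # e.g., STA/LTA, kurtosis, spectrogram detectors
print(f"fused detection probability: {noisy_or(window_probs):.2f}")
```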
This report documents a statistical method for the "real-time" characterization of partially observed epidemics. Observations consist of daily counts of symptomatic patients diagnosed with the disease. Characterization, in this context, refers to the estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as gross information about the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem, and is predicated on a model for the distribution of the incubation period. The model parameters are estimated as distributions using a Markov chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. The method is applied to the COVID-19 pandemic of 2020, using data at the country, provincial (e.g., state) and regional (e.g., county) levels. The epidemiological model includes a stochastic component due to uncertainties in the incubation period. This model-form uncertainty is accommodated by a pseudo-marginal Metropolis-Hastings MCMC sampler, which produces posterior distributions that reflect this uncertainty. We approximate the discrepancy between the data and the epidemiological model using Gaussian and negative binomial error models; the latter was motivated by the over-dispersed count data. For small daily counts, we find the performance of the calibrated models to be similar for the two error models. For large daily counts, the negative-binomial approximation is numerically unstable, unlike the Gaussian error model. Application of the model at the country level (for the United States, Germany, Italy, etc.) generally provided accurate forecasts, as the data consisted of large counts that suppressed the day-to-day variations in the observations. Further, the bulk of the data was sourced before the relaxation of the curbs on population mixing and is not confounded by any discernible country-wide second wave of infections. At the state level, where reporting was poor or infections were few (e.g., New Mexico), the variance in the data posed some, though not insurmountable, difficulties, and the forecasts captured the data, albeit with large uncertainty bounds. The method was found to be sufficiently sensitive to discern the flattening of the infection and epidemic curves due to shelter-in-place orders after roughly the 90% quantile of the incubation-period distribution (about 10 days for COVID-19). The proposed model was also used at a regional level to compare forecasts for the central and north-west regions of New Mexico. Modeling the data for these regions illustrated the different disease-spread dynamics captured by the model. While daily counts in the central region peaked in late April, the ramp-up in the north-west region continued for approximately three more weeks.
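The negative-binomial error model mentioned above can be sketched as a log-likelihood over daily counts, parameterized by the model-predicted mean and a dispersion parameter, as would be evaluated inside an MCMC sampler; the counts, predictions, and dispersion value below are illustrative, and this is not the report's implementation.

```python
# Sketch: negative-binomial log-likelihood for over-dispersed daily counts,
# parameterized by the model-predicted mean and a dispersion parameter.
# Counts, predictions, and dispersion are illustrative placeholders.
import numpy as np
from scipy.stats import nbinom

def neg_binom_loglike(counts, mean, dispersion):
    """log p(counts | mean, dispersion); variance = mean + mean**2 / dispersion."""
    p = dispersion / (dispersion + np.asarray(mean, dtype=float))
    return nbinom.logpmf(np.asarray(counts), dispersion, p).sum()

observed = np.array([12, 18, 25, 31, 47])               # hypothetical daily case counts
predicted = np.array([10.0, 17.0, 26.0, 33.0, 45.0])    # from the convolved epidemic model
print(neg_binom_loglike(observed, predicted, dispersion=8.0))
```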