Publications


Detecting outbreaks using a spatial latent field

PLOS ONE

Ray, Jaideep; Bridgman, Wyatt

In this paper, we present a method for estimating the infection-rate of a disease as a spatio-temporal field. Our data comprise time-series case-counts of symptomatic patients in various areal units of a region. We extend an epidemiological model, originally designed for a single areal unit, to accommodate multiple units. The field estimation is framed within a Bayesian context, utilizing a parameterized Gaussian random field as a spatial prior. We apply an adaptive Markov chain Monte Carlo method to sample the posterior distribution of the model parameters conditioned on COVID-19 case-count data from three adjacent counties in New Mexico, USA. Our results suggest that the correlation between epidemiological dynamics in neighboring regions helps regularize estimates in areas with high-variance (i.e., poor quality) data. Using the calibrated epidemic model, we forecast the infection-rate over each areal unit and develop a simple anomaly detector to signal new epidemic waves. Our findings show that an anomaly detector based on estimated infection-rates outperforms a conventional algorithm that relies solely on case-counts.
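A minimal sketch of the two ingredients described above, assuming an exponential covariance for the Gaussian random field spatial prior and a simple trailing-window threshold for the anomaly detector; the centroids, hyperparameters, and rate history are illustrative stand-ins, not the paper's configuration:

```python
# Illustrative sketch, not the paper's code: a Gaussian random field prior
# over areal units plus a trailing-window anomaly flag on infection rates.
import numpy as np
from scipy.stats import multivariate_normal

def grf_log_prior(log_rates, centroids, sigma=1.0, corr_len=50.0):
    """Log-density of log infection rates under a GRF with exponential covariance."""
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    cov = sigma**2 * np.exp(-d / corr_len)
    return multivariate_normal(np.zeros(len(log_rates)), cov).logpdf(log_rates)

def flag_new_wave(rates, window=14, n_sigma=3.0):
    """Signal an anomaly when the latest rate exceeds a trailing mean + n_sigma band."""
    recent = np.asarray(rates[-window - 1:-1])
    return rates[-1] > recent.mean() + n_sigma * recent.std()

centroids = np.array([[0.0, 0.0], [40.0, 10.0], [80.0, 5.0]])  # hypothetical county centroids (km)
print(grf_log_prior(np.array([0.1, -0.2, 0.05]), centroids))
print(flag_new_wave([1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1,
                     1.0, 0.9, 1.0, 1.05, 0.95, 1.0, 2.5]))
```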

Predictive dynamic wetting, fluid–structure interaction simulations for braze run-out

Computers and Fluids

Horner, Jeffrey S.; Kemmenoe, David J.; Bourdon, Gustav J.; Roberts, Scott A.; Arata, Edward R.; Ray, Jaideep; Grillet, Anne M.

Brazing and soldering are metallurgical joining techniques that use a wetting molten metal to create a joint between two faying surfaces. The quality of the brazing process depends strongly on the wetting properties of the molten filler metal, namely the surface tension and contact angle, and the resulting joint can be susceptible to various defects, such as run-out and underfill, if the material properties or joining conditions are not suitable. In this work, we implement a finite element simulation to predict the formation of such defects in braze processes. This model incorporates both fluid–structure interaction through an arbitrary Eulerian–Lagrangian technique and free surface wetting through conformal decomposition finite element modeling. Upon validating our numerical simulations against experimental run-out studies on a silver-Kovar system, we then use the model to predict run-out and underfill in systems with variable surface tension, contact angles, and applied pressure. Finally, we consider variable joint/surface geometries and show how different geometrical configurations can help to mitigate run-out. This work aims to understand how brazing defects arise and validate a coupled wetting and fluid–structure interaction simulation that can be used for other industrial problems.

An Assessment of the Laminar Hypersonic Double-Cone Experiments in the LENS-XX Tunnel

AIAA Journal

Ray, Jaideep; Blonigan, Patrick J.; Phipps, Eric T.; Maupin, Kathryn A.

This is an investigation of two experimental datasets of laminar hypersonic flows over a double-cone geometry, acquired in the Calspan-University at Buffalo Research Center’s Large Energy National Shock (LENS)-XX expansion tunnel. These datasets have yet to be modeled accurately. A previous paper suggested that this could partly be due to mis-specified inlet conditions. The authors of this paper solved a Bayesian inverse problem to infer the inlet conditions of the LENS-XX test section and found that in one case they lay outside the uncertainty bounds specified in the experimental dataset. However, the inference was performed using approximate surrogate models. In this paper, the experimental datasets are revisited and inversions for the tunnel test-section inlet conditions are performed with a Navier–Stokes simulator. The inversion is deterministic and can provide uncertainty bounds on the inlet conditions under a Gaussian assumption. It was found that deterministic inversion yields inlet conditions that do not agree with those stated in the experiments. An a posteriori method is also presented to check the validity of the Gaussian assumption for the posterior distribution. This paper contributes to ongoing work on the assessment of datasets from challenging experiments conducted in extreme environments, where the experimental apparatus is pushed to the margins of its design and performance envelopes.
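A schematic of deterministic inversion with a Gaussian (Laplace-type) uncertainty estimate: fit parameters by nonlinear least squares, then approximate the posterior covariance from the Jacobian at the optimum. A toy exponential model stands in for the expensive Navier–Stokes simulator; all names and values are illustrative:

```python
# Sketch under a Gaussian assumption: MAP fit via least squares, covariance
# from sigma^2 * (J^T J)^{-1}. The "simulator" is a toy stand-in.
import numpy as np
from scipy.optimize import least_squares

def simulator(theta, x):
    # Toy forward model standing in for the flow simulator.
    return theta[0] * np.exp(-theta[1] * x)

x_obs = np.linspace(0.0, 2.0, 20)
y_obs = simulator([2.0, 1.5], x_obs) + 0.01 * np.random.default_rng(0).standard_normal(20)

res = least_squares(lambda th: simulator(th, x_obs) - y_obs, x0=[1.0, 1.0])
dof = len(y_obs) - len(res.x)
sigma2 = 2.0 * res.cost / dof                      # residual variance estimate
cov = sigma2 * np.linalg.inv(res.jac.T @ res.jac)  # Gaussian approximation
print("MAP estimate:", res.x)
print("1-sigma bounds:", np.sqrt(np.diag(cov)))
```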

Calibrating hypersonic turbulence flow models with the HIFiRE-1 experiment using data-driven machine-learned models

Computer Methods in Applied Mechanics and Engineering

Chowdhary, Kenny; Hoang, Chi; Ray, Jaideep; Lee, Kookjin

In this paper we study the efficacy of combining machine-learning methods with projection-based model reduction techniques for creating data-driven surrogate models of computationally expensive, high-fidelity physics models. Such surrogate models are essential for many-query applications, e.g., engineering design optimization and parameter estimation, where it is necessary to invoke the high-fidelity model sequentially, many times. Surrogate models are usually constructed for individual scalar quantities. However, there are scenarios where a spatially varying field needs to be modeled as a function of the model's input parameters. We develop a method to do so, using projections to represent the spatial variability while a machine-learned model captures the dependence of the model's response on the inputs. The method is demonstrated on modeling the heat flux and pressure on the surface of the HIFiRE-1 geometry in a Mach 7.16 turbulent flow. The surrogate model is then used to perform Bayesian estimation of freestream conditions and parameters of the SST (Shear Stress Transport) turbulence model embedded in the high-fidelity (Reynolds-Averaged Navier–Stokes) flow simulator, using shock-tunnel data. The paper provides the first-ever Bayesian calibration of a turbulence model for complex hypersonic turbulent flows. We find that the primary issues in estimating the SST model parameters are the limited information content of the heat flux and pressure measurements and the large model-form error encountered in a certain part of the flow.
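A sketch of the projection-plus-regression idea: represent the spatial field by its leading POD (SVD) modes, then learn a map from input parameters to modal coefficients. The synthetic field and MLP regressor below are illustrative; the study itself used RANS heat-flux and pressure fields on the HIFiRE-1 surface:

```python
# Sketch: POD of a snapshot matrix + a machine-learned map from parameters
# to modal coefficients, giving a surrogate for a spatially varying field.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
params = rng.uniform(0.5, 2.0, size=(60, 2))   # training inputs (illustrative)
grid = np.linspace(0.0, 1.0, 200)
snapshots = np.array([p[0] * np.sin(np.pi * grid) + p[1] * grid for p in params])

# POD: truncated SVD of the centered snapshot matrix gives spatial modes.
mean = snapshots.mean(0)
U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
modes = Vt[:5]                                 # keep 5 modes
coeffs = (snapshots - mean) @ modes.T          # projection coefficients

# Machine-learned map: parameters -> modal coefficients.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                     random_state=0).fit(params, coeffs)

# Surrogate prediction of the full field at a new parameter point.
field = model.predict([[1.2, 0.8]]) @ modes + mean
print(field.shape)  # (1, 200)
```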

Detecting technological maturity from bibliometric patterns

Expert Systems with Applications

Cauthen, Katherine R.; Rai, Prashant; Hale, Nicholas; Freeman, Laura; Ray, Jaideep

The capability to identify emergent technologies based upon easily accessed open-source indicators, such as publications, is important for decision-makers in industry and government. The scientific contribution of this work is the proposition of a machine learning approach to the detection of the maturity of emerging technologies based on publication counts. Time-series of publication counts have universal features that distinguish emerging and growing technologies. We train an artificial neural network classifier, a supervised machine learning algorithm, upon these features to predict the maturity (emergent vs. growth) of an arbitrary technology. With a training set comprising 22 technologies, we obtain a classification accuracy ranging from 58.3% to 100%, with an average accuracy of 84.6%, for six test technologies. To enhance classifier performance, we augmented the training corpus with synthetic time-series technology life cycle curves, formed by calculating weighted averages of curves in the original training set. Training the classifier on the synthetic data set resulted in improved accuracy, ranging from 83.3% to 100% with an average accuracy of 90.4% for the test technologies. The performance of our classifier exceeds that of competing machine learning approaches in the literature, which report average classification accuracies of at most 85.7%. Moreover, in contrast to current methods, our approach does not require subject matter expertise to generate training labels, and it can be automated and scaled.
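A hedged sketch of the training pipeline: label curves as emergent vs. growth, augment the corpus with weighted averages of same-class life-cycle curves, and train a neural network classifier on simple time-series features. The curves, features, and layer sizes below are made up for illustration:

```python
# Sketch: synthetic life-cycle curves, mixture-based augmentation, MLP classifier.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
t = np.arange(20, dtype=float)
emergent = [np.exp(0.15 * t) + rng.normal(0, 0.5, 20) for _ in range(11)]
growth = [50.0 / (1 + np.exp(-0.4 * (t - 10))) + rng.normal(0, 1.0, 20)
          for _ in range(11)]
curves = emergent + growth
labels = [0] * 11 + [1] * 11

# Augmentation: weighted (convex) combinations of curves sharing a label.
for cls, pool in [(0, emergent), (1, growth)]:
    for _ in range(20):
        w = rng.dirichlet(np.ones(len(pool)))
        curves.append(np.tensordot(w, pool, axes=1))
        labels.append(cls)

def features(c):
    c = np.asarray(c) / max(np.abs(c).max(), 1e-9)   # scale-invariant
    return np.concatenate([np.diff(c), [c.mean(), c.std()]])

X = np.array([features(c) for c in curves])
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000,
                    random_state=0).fit(X, labels)
print(clf.score(X, labels))
```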

Validation of Calibrated k–ε Model Parameters for Jet-in-Crossflow

AIAA Journal

Miller, Nathan E.; Beresh, Steven J.; Ray, Jaideep

Previous efforts determined a set of calibrated, optimal model parameter values for Reynolds-averaged Navier–Stokes (RANS) simulations of a compressible jet in crossflow (JIC) using a k–ε turbulence model. These parameters were derived by comparing simulation results to particle image velocimetry (PIV) data of a complementary JIC experiment under a limited set of flow conditions. Here, a k–ε model using both nominal and calibrated parameters is validated against PIV data acquired from a much wider variety of JIC cases, including a realistic flight vehicle. The results from the simulations using the calibrated model parameters showed considerable improvements over those using the nominal values, even for cases that were not used in the calibration procedure that defined the optimal parameters. This improvement is demonstrated using a number of quality metrics that test the spatial alignment of the jet core, the magnitudes of multiple flow variables, and the location and strengths of vortices in the counter-rotating vortex cores on the PIV planes. These results suggest that the calibrated parameters have applicability well outside the specific flow case used in defining them and that, with the right model parameters, RANS solutions for the JIC can be improved significantly over those obtained from the nominal model.

Qualifying Training Datasets for Data-Driven Turbulence Closures

AIAA AVIATION 2022 Forum

Banerjee, Tania; Ray, Jaideep; Barone, Matthew F.; Domino, Stefan P.

We develop methods that could be used to qualify a training dataset and a data-driven turbulence closure trained on it. By qualify, we mean identifying the kind of turbulent physics that could be simulated by the data-driven closure. We limit ourselves to closures for the Reynolds-Averaged Navier–Stokes (RANS) equations. We build on our previous work on assembling feature-spaces, and clustering and characterizing Direct Numerical Simulation datasets that are typically pooled to constitute training datasets. In this paper, we develop an alternative way to assemble feature-spaces and thus check the correctness and completeness of our previous method. We then use the characterization of our training dataset to identify whether a data-driven turbulence closure learned on it would generalize to an unseen flow configuration, an impinging jet in our case. Finally, we train a RANS closure architected as a neural network, develop an explanation, i.e., an interpretable approximation, using generalized linear mixed-effects models, and check whether the explanation resembles a contemporary closure from turbulence modeling.

Forecasting Multi-Wave Epidemics Through Bayesian Inference

Archives of Computational Methods in Engineering

Safta, Cosmin; Ray, Jaideep; Blonigan, Patrick J.

We present a simple, near-real-time Bayesian method to infer and forecast a multi-wave outbreak, and demonstrate it on the COVID-19 pandemic. The approach uses timely epidemiological data that have been widely available for COVID-19. It provides short-term forecasts of the outbreak’s evolution, which can then be used for medical resource planning. The method postulates one- and multi-wave infection models, which are convolved with the incubation-period distribution to yield competing disease models. The disease models’ parameters are estimated via Markov chain Monte Carlo sampling, and information-theoretic criteria are used to select between them for use in forecasting. The method is demonstrated on two- and three-wave COVID-19 outbreaks in California, New Mexico and Florida, as observed during Summer-Winter 2020. We find that the method is robust to noise, provides useful forecasts (along with uncertainty bounds), and reliably detected when the initial single-wave COVID-19 outbreaks transformed into successive surges as containment efforts in these states failed by the end of Spring 2020.
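A sketch of the competing-model selection step: fit one- and two-wave models to a case-count series and pick the winner with an information criterion. Gaussian pulses stand in for the paper's convolved infection/incubation models, and AIC stands in for whatever criterion the paper uses; all numbers are illustrative:

```python
# Sketch: information-criterion selection between one- and two-wave fits.
import numpy as np
from scipy.optimize import curve_fit

def one_wave(t, a, m, s):
    return a * np.exp(-0.5 * ((t - m) / s) ** 2)

def two_wave(t, a1, m1, s1, a2, m2, s2):
    return one_wave(t, a1, m1, s1) + one_wave(t, a2, m2, s2)

t = np.arange(100, dtype=float)
rng = np.random.default_rng(3)
data = two_wave(t, 100, 30, 8, 150, 70, 10) + rng.normal(0, 5, 100)

def aic(model, p0):
    popt, _ = curve_fit(model, t, data, p0=p0, maxfev=20000)
    rss = np.sum((data - model(t, *popt)) ** 2)
    return 2 * len(popt) + len(t) * np.log(rss / len(t))

print("one-wave AIC:", aic(one_wave, [100, 50, 20]))
print("two-wave AIC:", aic(two_wave, [100, 30, 10, 100, 70, 10]))  # lower wins
```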

Verification of Data-Driven Models of Physical Phenomena using Interpretable Approximation

Ray, Jaideep; Barone, Matthew F.; Domino, Stefan P.; Banerjee, Tania; Ranka, Sanjay

Machine-learned models, specifically neural networks, are increasingly used as “closures” or “constitutive models” in engineering simulators to represent fine-scale physical phenomena that are too computationally expensive to resolve explicitly. However, these neural net models of unresolved physical phenomena tend to fail unpredictably and are therefore not used in mission-critical simulations. In this report, we describe new methods to authenticate them, i.e., to determine the (physical) information content of their training datasets, qualify the scenarios where they may be used, and verify that the neural net, as trained, adheres to physics theory. We demonstrate these methods with a neural net closure of turbulent phenomena used in the Reynolds-Averaged Navier–Stokes equations. We show the types of turbulent physics extant in our training datasets and, using a test flow of an impinging jet, identify the exact locations where the neural network would be extrapolating, i.e., where it would be used outside the feature-space where it was trained. Using Generalized Linear Mixed Models, we also generate explanations of the neural net (à la Local Interpretable Model-agnostic Explanations) at prototypes placed in the training data and compare them with approximate analytical models from turbulence theory. Finally, we verify our findings by reproducing them using two different methods.

The predictive skill of convolutional neural network models for disease forecasting

PLOS ONE

Lee, Kookjin; Ray, Jaideep; Safta, Cosmin

In this paper we investigate the utility of one-dimensional convolutional neural network (CNN) models in epidemiological forecasting. Deep learning models, in particular variants of recurrent neural networks (RNNs), have been studied for ILI (Influenza-Like Illness) forecasting, and have achieved a higher forecasting skill compared to conventional models such as ARIMA. In this study, we adapt two neural networks that employ one-dimensional temporal convolutional layers as a primary building block, namely temporal convolutional networks and simple neural attentive meta-learners, for epidemiological forecasting. We then test them with influenza data from the US collected over 2010-2019. We find that epidemiological forecasting with CNNs is feasible, and their forecasting skill is comparable to, and at times superior to, plain RNNs. Thus CNNs and RNNs bring the power of nonlinear transformations to purely data-driven epidemiological models, a capability that heretofore has been limited to more elaborate mechanistic/compartmental disease models.
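A minimal sketch of the primary building block named above, a causal, dilated one-dimensional convolution, written in PyTorch; the layer sizes, dilations, and input are illustrative, not the paper's architecture:

```python
# Sketch: causal dilated Conv1d, the core block of temporal convolutional
# networks. Left-padding keeps the convolution from seeing future time steps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left-pad so no future leaks in
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        return self.conv(F.pad(x, (self.pad, 0)))

tcn = nn.Sequential(
    CausalConv1d(1, 16, dilation=1), nn.ReLU(),
    CausalConv1d(16, 16, dilation=2), nn.ReLU(),
    CausalConv1d(16, 1, dilation=4),
)
weekly_ili = torch.randn(8, 1, 52)    # batch of year-long weekly series
print(tcn(weekly_ili).shape)          # torch.Size([8, 1, 52])
```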

Daily forecasting of regional epidemics of coronavirus disease with Bayesian uncertainty quantification, United States

Emerging Infectious Diseases

Lin, Yen T.; Neumann, Jacob; Miller, Ely F.; Posner, Richard G.; Mallela, Abhishek; Safta, Cosmin; Ray, Jaideep; Thakur, Gautam; Chinthavali, Supriya; Hlavacek, William S.

To increase situational awareness and support evidence-based policymaking, we formulated a mathematical model for coronavirus disease transmission within a regional population. This compartmental model accounts for quarantine, self-isolation, social distancing, a nonexponentially distributed incubation period, asymptomatic persons, and mild and severe forms of symptomatic disease. We used Bayesian inference to calibrate region-specific models for consistency with daily reports of confirmed cases in the 15 most populous metropolitan statistical areas in the United States. We also quantified uncertainty in parameter estimates and forecasts. This online learning approach enables early identification of new trends despite considerable variability in case reporting.
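A compartmental model of the kind calibrated here, reduced for brevity to susceptible/exposed/infectious/removed; the actual model adds quarantine, self-isolation, social distancing, asymptomatic and severity compartments, and a non-exponential incubation period. Parameter values are illustrative:

```python
# Sketch: a minimal SEIR stand-in for the richer compartmental model.
import numpy as np
from scipy.integrate import solve_ivp

def seir(t, y, beta, kappa, gamma):
    S, E, I, R = y
    N = S + E + I + R
    return [-beta * S * I / N,
            beta * S * I / N - kappa * E,   # exposure -> incubation
            kappa * E - gamma * I,          # incubation -> infectious
            gamma * I]                      # recovery/removal

sol = solve_ivp(seir, (0, 120), [1e6 - 10, 0, 10, 0],
                args=(0.3, 1 / 5.1, 1 / 7.0))
incidence_proxy = 0.3 * sol.y[0] * sol.y[2] / 1e6  # quantity compared to daily reports
print(sol.y[2].max())                              # peak infectious count
```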

Robustness and Validation of Model and Digital Twins Deployment

Volkova, Svitlana; Stracuzzi, David J.; Shafer, Jenifer; Ray, Jaideep; Pullum, Laura

For digital twins (DTs) to become a central fixture in mission critical systems, a better understanding is required of potential modes of failure, quantification of uncertainty, and the ability to explain a model’s behavior. These aspects are particularly important as the performance of a digital twin will evolve during model development and deployment for real-world operations.

Feature selection, clustering, and prototype placement for turbulence data sets

AIAA Scitech 2021 Forum

Barone, Matthew F.; Ray, Jaideep; Domino, Stefan P.

This paper explores unsupervised learning approaches for analysis and categorization of turbulent flow data. Single-point statistics from several high-fidelity turbulent flow simulation data sets are classified using a Gaussian mixture model clustering algorithm. Candidate features are proposed, which include barycentric coordinates of the Reynolds stress anisotropy tensor, as well as scalar and angular invariants of the Reynolds stress and mean strain rate tensors. A feature selection algorithm is applied to the data in a sequential fashion, flow by flow, to identify a good feature set and an optimal number of clusters for each data set. The algorithm is first applied to Direct Numerical Simulation data for plane channel flow, and produces clusters that are consistent with turbulent flow theory and empirical results that divide the channel flow into a number of regions (viscous sub-layer, log layer, etc.). Clusters are then identified for flow over a wavy-walled channel, flow over a bump in a channel, and flow past a square cylinder. Some clusters are closely identified with the anisotropy state of the turbulence, as indicated by the location within the barycentric map of the Reynolds stress tensor. Other clusters can be connected to physical phenomena, such as boundary layer separation and free shear layers. Exemplar points from the clusters, or prototypes, are then identified using a prototype selection method. These exemplars summarize the dataset, reducing its size by a factor of 10 to 1000. The clustering and prototype selection algorithms provide a foundation for physics-based, semi-automated classification of turbulent flow states and extraction of a subset of data points that can serve as the basis for the development of explainable machine-learned turbulence models.
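A sketch of the clustering step: Gaussian mixture models fitted over turbulence features, with the number of clusters chosen by an information criterion (BIC here, as one plausible choice). The feature matrix below is a random placeholder for the barycentric coordinates and tensor invariants:

```python
# Sketch: GMM clustering of turbulence features with BIC model selection.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Stand-in features: e.g., two barycentric coordinates plus an invariant.
features = np.vstack([rng.normal([0.2, 0.1, -1.0], 0.05, (300, 3)),
                      rng.normal([0.6, 0.3, 0.5], 0.08, (300, 3))])

models = [GaussianMixture(n_components=k, random_state=0).fit(features)
          for k in range(1, 8)]
best = min(models, key=lambda m: m.bic(features))
print("clusters chosen by BIC:", best.n_components)
labels = best.predict(features)   # cluster index per data point
```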

Characterization of partially observed epidemics through Bayesian inference: application to COVID-19

Computational Mechanics

Safta, Cosmin; Ray, Jaideep; Sargsyan, Khachik

We demonstrate a Bayesian method for the “real-time” characterization and forecasting of a partially observed COVID-19 epidemic. Characterization is the estimation of infection-spread parameters using daily counts of symptomatic patients. The method is designed to help guide medical resource allocation in the early epoch of the outbreak. The estimation problem is posed as one of Bayesian inference and solved using a Markov chain Monte Carlo technique. The data used in this study were sourced before the arrival of the second wave of infection in July 2020. The proposed modeling approach generally provides accurate forecasts at the regional, state, and country levels. The epidemiological model detected the flattening of the curve in California after public health measures were instituted. The method also detected different disease dynamics when applied to specific regions of New Mexico.

Predictive Skill of Deep Learning Models Trained on Limited Sequence Data

Safta, Cosmin; Lee, Kookjin L.; Ray, Jaideep

In this report we investigate the utility of one-dimensional convolutional neural network (CNN) models in epidemiological forecasting. Deep learning models, especially variants of recurrent neural networks (RNNs), have been studied for influenza forecasting, and have achieved higher forecasting skill compared to conventional models such as ARIMA. In this study, we adapt two neural networks that employ one-dimensional temporal convolutional layers as a primary building block, namely temporal convolutional networks and simple neural attentive meta-learners, for epidemiological forecasting, and test them with influenza data from the US collected over 2010-2019. We find that epidemiological forecasting with CNNs is feasible, and their forecasting skill is comparable to, and at times superior to, RNNs. Thus, CNNs and RNNs bring the power of nonlinear transformations to purely data-driven epidemiological models, a capability that heretofore has been limited to more elaborate mechanistic/compartmental disease models.

A Multi-Instance Learning Framework for Seismic Detectors

Ray, Jaideep; Wang, Fulton; Young, Christopher J.

In this report, we construct and test a framework for fusing the predictions of an ensemble of seismic wave detectors. The framework is drawn from multi-instance learning and is meant to improve the predictive skill of the ensemble beyond that of the individual detectors. We show how the framework allows the use of multiple features derived from the seismogram to detect seismic wave arrivals, as well as how it allows only the most informative features to be retained in the ensemble. The computational cost of the "ensembling" method is linear in the size of the ensemble, allowing a scalable method for monitoring multiple features/transformations of a seismogram. The framework is tested on teleseismic and regional P-wave arrivals at the IMS (International Monitoring System) station in Warramunga, NT, Australia, and the PNSU station in the University of Utah's monitoring network.

Characterization of Partially Observed Epidemics - Application to COVID-19

Safta, Cosmin; Ray, Jaideep; Foulk, James W.; Catanach, Thomas A.; Chowdhary, Kenny; Debusschere, Bert; Galvan, Edgar; Geraci, Gianluca; Khalil, Mohammad; Portone, Teresa

This report documents a statistical method for the "real-time" characterization of partially observed epidemics. Observations consist of daily counts of symptomatic patients diagnosed with the disease. Characterization, in this context, refers to estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as gross information on the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem and is predicated on a model for the distribution of the incubation period. The model parameters are estimated as distributions using a Markov Chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. The method is applied to the COVID-19 pandemic of 2020, using data at the country, provincial (e.g., state), and regional (e.g., county) levels. The epidemiological model includes a stochastic component due to uncertainties in the incubation period. This model-form uncertainty is accommodated by a pseudo-marginal Metropolis-Hastings MCMC sampler, which produces posterior distributions that reflect this uncertainty. We approximate the discrepancy between the data and the epidemiological model using Gaussian and negative binomial error models; the latter was motivated by the over-dispersed count data. For small daily counts we find the performance of the calibrated models to be similar for the two error models. For large daily counts, the negative-binomial approximation is numerically unstable, unlike the Gaussian error model. Application of the model at the country level (for the United States, Germany, Italy, etc.) generally provided accurate forecasts, as the data consisted of large counts that suppressed the day-to-day variations in the observations. Further, the bulk of the data was sourced before the relaxation of the curbs on population mixing and is not confounded by any discernible country-wide second wave of infections. At the state level, where reporting was poor or few infections were recorded (e.g., New Mexico), the variance in the data posed some, though not insurmountable, difficulties, and forecasts captured the data with large uncertainty bounds. The method was found to be sufficiently sensitive to discern the flattening of the infection and epidemic curves due to shelter-in-place orders after around the 90% quantile of the incubation distribution (about 10 days for COVID-19). The proposed model was also used at a regional level to compare forecasts for the central and north-west regions of New Mexico. Modeling the data for these regions illustrated the different disease-spread dynamics captured by the model: while in the central region the daily counts peaked in late April, in the north-west region the ramp-up continued for approximately three more weeks.
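A sketch of the two error models compared above, written as log-likelihoods for daily counts. The mean/dispersion parameterization of the negative binomial (variance = m + alpha*m^2) is one common convention, and all numbers are illustrative:

```python
# Sketch: Gaussian vs. negative binomial likelihoods for over-dispersed counts.
import numpy as np
from scipy.stats import norm, nbinom

def gaussian_loglik(counts, model_mean, sigma):
    return norm(model_mean, sigma).logpdf(counts).sum()

def negbin_loglik(counts, model_mean, alpha):
    n = 1.0 / alpha                  # dispersion -> scipy's shape parameter
    p = n / (n + model_mean)         # chosen so the mean equals model_mean
    return nbinom(n, p).logpmf(counts).sum()

counts = np.array([12, 18, 25, 40, 61, 80, 95])
mean = np.array([10.0, 17.0, 27.0, 42.0, 60.0, 82.0, 98.0])
print(gaussian_loglik(counts, mean, sigma=5.0))
print(negbin_loglik(counts, mean, alpha=0.05))   # over-dispersed count model
```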

Rigorous Data Fusion for Computationally Expensive Simulations

Winovich, Nick; Rushdi, Ahmad; Phipps, Eric T.; Ray, Jaideep; Lin, Guang; Ebeida, Mohamed

This manuscript comprises the final report for the 1-year, FY19 LDRD project "Rigorous Data Fusion for Computationally Expensive Simulations," wherein an alternative approach to Bayesian calibration was developed, based on a new sampling technique called VoroSpokes. VoroSpokes is a novel quadrature and sampling framework, developed within this project, defined with respect to Voronoi tessellations of bounded domains in $R^d$. In this work, we first establish local quadrature and sampling results on convex polytopes using randomly directed rays, or spokes, to approximate the quantities of interest for a specified target function. A theoretical justification for both procedures is provided along with empirical results demonstrating the unbiased convergence in the resulting estimates/samples. The local quadrature and sampling procedures are then extended to global procedures defined on more general domains by applying the local results to the cells of a Voronoi tessellation covering the domain in consideration. We then demonstrate how the proposed global sampling procedure can be used to define a natural framework for adaptively constructing Voronoi Piecewise Surrogate (VPS) approximations based on local error estimates. Finally, we show that the adaptive VPS procedure can be used to form a surrogate model approximation to a specified, potentially unnormalized, density function, and that the global sampling procedure can be used to efficiently draw independent samples from the surrogate density in parallel. The performance of the resulting VoroSpokes sampling framework is assessed on a collection of Bayesian inference problems and is shown to provide highly accurate posterior predictions which align with the results obtained using traditional methods such as Gibbs sampling and random-walk Markov Chain Monte Carlo (MCMC). Importantly, the proposed framework provides a foundation for performing Bayesian inference tasks which is entirely independent from the theory of Markov chains.

Validation of calibrated k-ɛ model parameters for jet-in-crossflow

AIAA Aviation 2019 Forum

Miller, Nathan E.; Beresh, Steven J.; Ray, Jaideep

Previous efforts determined a set of calibrated model parameters for Reynolds-Averaged Navier–Stokes (RANS) simulations of a compressible jet in crossflow (JIC) using a k-ɛ turbulence model. These coefficients were derived from Particle Image Velocimetry (PIV) data of a complementary experiment using a limited set of flow conditions. Here, k-ɛ models using conventional (nominal) and calibrated parameters are rigorously validated against PIV data acquired under a much wider variety of JIC cases, including a flight configuration. The results from the simulations using the calibrated model parameters showed considerable improvements over those using the nominal values, even for cases that were not used in defining the calibrated parameters. This improvement is demonstrated using quality metrics defined specifically to test the spatial alignment of the jet core as well as the magnitudes of flow variables on the PIV planes. These results suggest that the calibrated parameters have applicability well outside the specific flow case used in defining them and that, with the right model parameters, RANS results can be improved significantly over those obtained with the nominal model.

Estimation of inflow uncertainties in laminar hypersonic double-cone experiments

AIAA Scitech 2019 Forum

Ray, Jaideep; Kieweg, Sarah; Dinzl, Derek J.; Carnes, Brian R.; Weirs, Gregory; Freno, Brian A.; Howard, Micah; Smith, Thomas M.

We propose a probabilistic framework for assessing the consistency of an experimental dataset, i.e., whether the stated experimental conditions are consistent with the measurements provided. In case the dataset is inconsistent, our framework allows one to hypothesize and test sources of inconsistencies. This is crucial in model validation efforts. The framework relies on statistical inference to estimate experimental settings deemed untrustworthy, from measurements deemed accurate. The quality of the inferred variables is gauged by their ability to reproduce held-out experimental measurements; if the new predictions are closer to measurements than before, the cause of the discrepancy is deemed to have been found. The framework brings together recent advances in the use of Bayesian inference and statistical emulators in fluid dynamics with similarity measures for random variables to construct the hypothesis testing approach. We test the framework on two double-cone experiments executed in the LENS-XX wind tunnel and one in the LENS-I tunnel; all three have encountered difficulties when used in model validation exercises. However, the cause behind the difficulties with the LENS-I experiment is known, and our inferential framework recovers it. We also detect an inconsistency with one of the LENS-XX experiments, and hypothesize three causes for it. We check two of the hypotheses using our framework, and we find evidence that rejects them. We end by proposing that uncertainty quantification methods be used more widely to understand experiments and characterize facilities, and we cite three different methods to do so, the third of which we present in this paper.

Conditioning multi-model ensembles for disease forecasting

Ray, Jaideep; Cauthen, Katherine R.; Lefantzi, Sophia; Burks, Lynne

In this study we investigate how an ensemble of disease models can be conditioned on observational data, in a bid to improve its predictive skill. We use the ensemble of influenza forecasting models gathered by the US Centers for Disease Control and Prevention (CDC) as the exemplar. This ensemble is used every year to forecast the annual influenza outbreak in the United States. The models constituting this ensemble draw on very different modeling assumptions and approximations and are a diverse collection of methods to approximate epidemiological dynamics. Currently, each model's predictions are accorded the same importance, or weight, when compiling the ensemble's forecast. We consider this equally-weighted ensemble as the baseline case which has to be improved upon. In this study, we explore whether an ensemble forecast can be improved by "conditioning" the ensemble on whatever observational data are available from the ongoing outbreak. "Conditioning" can mean according the ensemble's members different weights which evolve over time, or simply performing the forecast using the top k (equally-weighted) models. In the latter case, the composition of the "top-k" set of models evolves over time. This is called "model averaging" in statistics. We explore four methods to perform model-averaging, three of which are new. We find that the CDC ensemble responds best to the "top-k-models" approach to model-averaging. All the new model-averaging methods perform better than the baseline equally-weighted ensemble. The four model-averaging methods treat the models as black boxes and simply use their forecasts as inputs, i.e., one does not need access to the models at all, but rather only their forecasts. The model-averaging approaches reviewed in this report thus form a general framework for model-averaging any model ensemble.
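A minimal sketch of the "top-k models" variant: rank ensemble members by their error on the observed portion of the outbreak, then average only the k best, equally weighted, for the next forecast. The arrays below are illustrative:

```python
# Sketch: black-box model averaging via top-k selection on rolling error.
import numpy as np

def top_k_forecast(past_forecasts, observations, new_forecasts, k=3):
    """past_forecasts: (n_models, n_obs); new_forecasts: (n_models,)."""
    errors = np.sqrt(((past_forecasts - observations) ** 2).mean(axis=1))
    best = np.argsort(errors)[:k]     # composition evolves as data accrue
    return new_forecasts[best].mean(), best

rng = np.random.default_rng(5)
obs = np.array([10.0, 14.0, 21.0, 30.0])
past = obs + rng.normal(0, [[1], [3], [0.5], [6], [2]], (5, 4))
nxt = np.array([42.0, 55.0, 40.0, 70.0, 45.0])
print(top_k_forecast(past, obs, nxt, k=3))
```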

Robust Bayesian calibration of a k-ϵ model for compressible jet-in-crossflow simulations

AIAA Journal

Ray, Jaideep; Dechant, Lawrence; Lefantzi, Sophia; Ling, Julia; Arunajatesan, Srinivasan

Compressible jet-in-crossflow interactions are difficult to simulate accurately using Reynolds-averaged Navier–Stokes (RANS) models. This could be due to simplifications inherent in RANS or the use of inappropriate RANS constants estimated by fitting to experiments of simple or canonical flows. Our previous work on Bayesian calibration of a k-ϵ model to experimental data had led to a weak hypothesis that inaccurate simulations could be due to inappropriate constants more than model-form inadequacies of RANS. In this work, Bayesian calibration of k-ϵ constants to a set of experiments that span a range of Mach numbers and jet strengths has been performed. The variation of the calibrated constants has been checked to assess the degree to which parametric estimates compensate for RANS's model-form errors. An analytical model of jet-in-crossflow interactions has also been developed, and estimates of k-ϵ constants that are free of any conflation of parametric and RANS's model-form uncertainties have been obtained. It has been found that the analytical k-ϵ constants provide mean-flow predictions that are similar to those provided by the calibrated constants. Further, both of them provide predictions that are far closer to experimental measurements than those computed using "nominal" values of these constants simply obtained from the literature. It can be concluded that the lack of predictive skill of RANS jet-in-crossflow simulations is mostly due to parametric inadequacies, and our analytical estimates may provide a simple way of obtaining predictive compressible jet-in-crossflow simulations.

Biologically inspired approaches for biosurveillance anomaly detection and data fusion

Finley, Patrick D.; Levin, Drew; Flanagan, Tatiana P.; Beyeler, Walter E.; Mitchell, Michael D.; Ray, Jaideep; Moses, Melanie; Forrest, Stephanie

This study developed and tested biologically inspired computational methods to detect anomalous signals in data streams that could indicate a pending outbreak or bio-weapon attack. Current large-scale biosurveillance systems are plagued by two principal deficiencies: (1) timely detection of disease-indicating signals in noisy data and (2) anomaly detection across multiple channels. Anomaly detectors and data fusion components modeled after human immune system processes were tested against a variety of natural and synthetic surveillance datasets. A pilot-scale immune-system-based biosurveillance system performed at least as well as traditional statistical anomaly detection and data fusion approaches. Machine learning approaches leveraging deep-learning recurrent neural networks were developed and applied to challenging unstructured and multimodal health surveillance data. Within the limits imposed by data availability, both immune-system and deep-learning methods were found to improve anomaly detection and data fusion performance for particularly challenging data subsets.

Soil moisture estimation using tomographic ground penetrating radar in a MCMC–Bayesian framework

Stochastic Environmental Research and Risk Assessment

Bao, Jie; Hou, Zhangshuan; Ray, Jaideep; Huang, Maoyi; Swiler, Laura P.; Ren, Huiying

In this study, we focus on a hydrogeological inverse problem, specifically the monitoring of soil moisture variations using tomographic ground penetrating radar (GPR) travel time data. Technical challenges exist in the inversion of GPR tomographic data, namely the non-uniqueness, nonlinearity and high-dimensionality of the unknowns. We have developed a new method for estimating soil moisture fields from crosshole GPR data. It uses a pilot-point method to provide a low-dimensional representation of the relative dielectric permittivity field of the soil, which is the primary object of inference: the field can be converted to soil moisture using a petrophysical model. We integrate a multi-chain Markov chain Monte Carlo (MCMC)–Bayesian inversion framework with the pilot point concept, a curved-ray GPR travel time model, and a sequential Gaussian simulation algorithm, for estimating the dielectric permittivity at pilot point locations distributed within the tomogram, as well as the corresponding geostatistical parameters (i.e., spatial correlation range). We infer the dielectric permittivity as a probability density function, thus capturing the uncertainty in the inference. The multi-chain MCMC enables addressing high-dimensional inverse problems as required in the inversion setup. The method is scalable in terms of the number of chains and processors, and is useful for computationally demanding Bayesian model calibration in scientific and engineering problems. The proposed inversion approach can successfully approximate the posterior density distributions of the pilot points and capture the true values. The computational efficiency, accuracy, and convergence behaviors of the inversion approach were also systematically evaluated by comparing the inversion results obtained with different levels of noise in the observations, increased observational data, and an increased number of pilot points.

Learning an eddy viscosity model using shrinkage and Bayesian calibration: A jet-in-crossflow case study

ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part B: Mechanical Engineering

Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan; Dechant, Lawrence

We demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier–Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscosity model is implemented in a RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The prohibitive cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models ("curve-fits"). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and cubic EVMs. We find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which it was not calibrated.
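A sketch of the shrinkage step, using a lasso penalty as one standard shrinkage regression so that uninformative candidate terms are driven to zero while the intercept survives. The candidate basis and data below are synthetic stand-ins for the tensor invariants in the cubic EVM:

```python
# Sketch: lasso shrinkage over candidate eddy-viscosity terms.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
strain = rng.normal(0, 1, 500)     # stand-ins for strain/vorticity invariants
vort = rng.normal(0, 1, 500)
X = np.column_stack([strain, vort, strain**2, vort**2, strain * vort,
                     strain**3, vort**3])
y = 0.8 + 0.5 * strain + 0.3 * vort**2 + rng.normal(0, 0.05, 500)

fit = Lasso(alpha=0.05).fit(X, y)  # intercept kept, weak terms shrunk to zero
print("intercept:", fit.intercept_)
print("surviving coefficients:", dict(zip(
    ["S", "W", "S2", "W2", "SW", "S3", "W3"], fit.coef_.round(3))))
```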

Final Documentation: Incident Management and Probabilities Courses of Action Tool (IMPACT)

Edwards, Donna M.; Ray, Jaideep; Tucker, Mark D.; Whetzel, Jonathan H.; Cauthen, Katherine R.

This report pulls together the documentation produced for the IMPACT tool, a software-based decision support tool that provides situational awareness, incident characterization, and guidance on public health and environmental response strategies for an unfolding bio-terrorism incident.

Bayesian inversion of seismic and electromagnetic data for marine gas reservoir characterization using multi-chain Markov chain Monte Carlo sampling

Journal of Applied Geophysics

Ren, Huiying; Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi; Bao, Jie; Swiler, Laura P.

In this study we developed an efficient Bayesian inversion framework for interpreting marine seismic Amplitude Versus Angle and Controlled-Source Electromagnetic data for marine reservoir characterization. The framework uses a multi-chain Markov-chain Monte Carlo sampler, which is a hybrid of DiffeRential Evolution Adaptive Metropolis and Adaptive Metropolis samplers. The inversion framework is tested by estimating reservoir-fluid saturations and porosity based on marine seismic and Controlled-Source Electromagnetic data. The multi-chain Markov-chain Monte Carlo is scalable in terms of the number of chains, and is useful for computationally demanding Bayesian model calibration in scientific and engineering problems. As a demonstration, the approach is used to efficiently and accurately estimate the porosity and saturations in a representative layered synthetic reservoir. The results indicate that the seismic Amplitude Versus Angle and Controlled-Source Electromagnetic joint inversion provides better estimation of reservoir saturations than the seismic Amplitude Versus Angle only inversion, especially for the parameters in deep layers. The performance of the inversion approach for various levels of noise in observational data was evaluated — reasonable estimates can be obtained with noise levels up to 25%. Sampling efficiency due to the use of multiple chains was also checked and was found to have almost linear scalability.

SAChES: Scalable Adaptive Chain-Ensemble Sampling

Swiler, Laura P.; Ray, Jaideep; Ebeida, Mohamed; Huang, Maoyi; Hou, Zhangshuan; Bao, Jie; Ren, Huiying

We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted at Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces the per-chain sampling burden and enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently homes in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space and (2) ensure robustness to silent errors, which may be unavoidable in the extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.
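A schematic of the two proposal mechanisms that SAChES hybridizes, in minimal form: a differential-evolution move built from the states of two other chains, and an adaptive-Metropolis move drawn from the running sample covariance. This sketches the proposals only, not the full parallel, fault-tolerant sampler:

```python
# Sketch: differential-evolution and adaptive-Metropolis proposals.
import numpy as np

rng = np.random.default_rng(7)

def de_proposal(chains, i, gamma=0.8):
    """Propose for chain i using the difference of two other chains' states."""
    j, k = rng.choice([c for c in range(len(chains)) if c != i], 2, replace=False)
    return chains[i] + gamma * (chains[j] - chains[k]) + rng.normal(0, 1e-4, chains[i].shape)

def am_proposal(x, history, eps=1e-6):
    """Adaptive Metropolis: Gaussian proposal scaled by the history's covariance."""
    cov = np.cov(np.asarray(history).T) + eps * np.eye(len(x))
    return rng.multivariate_normal(x, 2.38**2 / len(x) * cov)

chains = [rng.normal(0, 1, 3) for _ in range(8)]   # 8 chains, 3 parameters
print(de_proposal(chains, 0))
print(am_proposal(chains[0], chains))
```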

k-ε turbulence model parameter estimates using an approximate self-similar jet-in-crossflow solution

8th AIAA Theoretical Fluid Mechanics Conference, 2017

Dechant, Lawrence; Ray, Jaideep; Lefantzi, Sophia; Ling, Julia; Arunajatesan, Srinivasan

The k-ε turbulence model has been described as perhaps “the most widely used complete turbulence model.” This family of heuristic Reynolds-Averaged Navier–Stokes (RANS) turbulence closures is supported by a suite of model parameters that have been estimated by demanding agreement with well-established canonical flows, such as homogeneous shear flow, log-law behavior, etc. While this procedure does yield a set of so-called nominal parameters, it is abundantly clear that they do not provide a universally satisfactory turbulence model that is capable of simulating complex flows. Recent work on the Bayesian calibration of the k-ε model using jet-in-crossflow wind tunnel data has yielded parameter estimates that are far more predictive than nominal parameter values. Here we develop a self-similar asymptotic solution for axisymmetric jet-in-crossflow interactions and derive analytical estimates of the parameters that were inferred using Bayesian calibration. The self-similar method utilizes a near-field approach to estimate the turbulence model parameters while retaining the classical far-field scaling to model flow field quantities. Our parameter values are seen to be far more predictive than the nominal values, as checked using RANS simulations and experimental measurements. They are also closer to the Bayesian estimates than the nominal parameters. A traditional simplified jet trajectory model is explicitly related to the turbulence model parameters and is shown to yield good agreement with measurements when utilizing the analytically derived turbulence model coefficients. The close agreement between the turbulence model coefficients obtained via Bayesian calibration and the analytically estimated coefficients derived in this paper is consistent with the contention that the Bayesian calibration approach is firmly rooted in the underlying physical description.

Robust Bayesian calibration of a RANS model for jet-in-crossflow simulations

8th AIAA Theoretical Fluid Mechanics Conference, 2017

Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan; Dechant, Lawrence

Compressible jet-in-crossflow interactions are poorly simulated using Reynolds-Averaged Navier–Stokes (RANS) equations. This is due to model-form errors (physical approximations) in RANS as well as the use of parameter values simply picked from the literature (henceforth, the nominal values of the parameters). Previous work on the Bayesian calibration of RANS models has yielded joint probability densities of C = (Cµ, Cϵ2, Cϵ1), the most influential parameters of the RANS equations. The calibrated values were far more predictive than the nominal parameter values, and the advantage held across a range of freestream Mach numbers and jet strengths. In this work we perform Bayesian calibration across a range of Mach numbers and jet strengths and compare the joint densities, with a view to determining whether compressible jet-in-crossflow interactions could be simulated with either a single joint probability density or a point estimate for C. We find that the probability densities for Cϵ2 agree and also indicate that the range typically used in aerodynamic simulations should be extended. The densities for Cϵ1 agree, approximately, with the nominal value. The densities for Cµ do not show any clear trend, indicating that they are not strongly constrained by the calibration observables and, in turn, do not affect them much. We also compare the calibrated results to a recently developed analytical model of a jet-in-crossflow interaction. We find that the values of C estimated by the analytical model deliver prediction accuracies comparable to the calibrated joint densities of the parameters across a range of Mach numbers and jet strengths.

A statistical approach for isolating fossil fuel emissions in atmospheric inverse problems

Journal of Geophysical Research

Yadav, Vineet; Michalak, Anna M.; Ray, Jaideep; Shiga, Yoichi P.

Independent verification and quantification of fossil fuel (FF) emissions constitutes a considerable scientific challenge. By coupling atmospheric observations of CO2 with models of atmospheric transport, inverse models offer the possibility of overcoming this challenge. However, disaggregating the biospheric and FF flux components of terrestrial fluxes from CO2 concentration measurements has proven to be difficult, due to observational and modeling limitations. In this study, we propose a statistical inverse modeling scheme for disaggregating wintertime fluxes on the basis of their unique error covariances and covariates, where these covariances and covariates are representative of the underlying processes affecting FF and biospheric fluxes. The application of the method is demonstrated with one synthetic and two real-data prototypical inversions using in situ CO2 measurements over North America. Inversions are performed only for the month of January, as the predominance of the biospheric CO2 signal relative to the FF CO2 signal and observational limitations preclude disaggregation of the fluxes in other months. The quality of disaggregation is assessed primarily through examination of the a posteriori covariance between disaggregated FF and biospheric fluxes at regional scales. Findings indicate that the proposed method is able to robustly disaggregate fluxes regionally at monthly temporal resolution, with a posteriori cross covariance lower than 0.15 µmol m⁻² s⁻¹ between FF and biospheric fluxes. Error covariance models and covariates based on temporally varying FF inventory data provide a more robust disaggregation than static proxies (e.g., nightlight intensity and population density). However, the synthetic data case study shows that disaggregation is possible even in the absence of detailed temporally varying FF inventory data.

Online mapping and forecasting of epidemics using open-source indicators

Ray, Jaideep; Lefantzi, Sophia; Bauer, Joshua B.; Khalil, Mohammad; Rothfuss, Andrew J.; Cauthen, Katherine R.; Finley, Patrick D.; Smith, Halley

Open-source indicators have been proposed as a way of tracking and forecasting disease outbreaks. Some, such as meteorological data, are readily available as reanalysis products. Others, such as those derived from our online behavior (web searches, media articles, etc.), are gathered easily and are more timely than public health reporting. In this study we investigate how these datastreams may be combined to provide useful epidemiological information. The investigation is performed by building data assimilation systems to track influenza in California and dengue in India. The first does not suffer from incomplete data and was chosen to explore disease modeling needs. The second explores the case where observational data are sparse and disease modeling complexities are beside the point. The two test cases are at opposite ends of the disease tracking spectrum. We find that data assimilation systems that produce disease activity maps can be constructed. Further, being able to combine multiple open-source datastreams is a necessity, as any one individually is not very informative. The data assimilation systems have very little in common except that they contain disease models, calibration algorithms and some ability to impute missing data. Thus, while the data assimilation systems share the goal of accurate forecasting, they are practically designed to compensate for the shortcomings of the datastreams. We thus expect them to be disease- and location-specific.

Imputing data that are missing at high rates using a boosting algorithm

JSM Proceedings

Cauthen, Katherine R.; Lambert, Gregory; Ray, Jaideep; Lefantzi, Sophia

Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless a large number of imputations, m, is used. This paper implements an alternative machine-learning-based approach to imputing data that are missing at high rates. Here, we use boosting to create a strong learner from a weak learner fitted to a dataset missing many observations. This approach may be applied to a variety of types of learners (models). The approach is demonstrated by application to a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal CAR model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.
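A sketch of the boosting-based imputation idea: fit a weak learner to the observed entries, repeatedly fit further learners to the residuals, and predict the missing targets with the boosted sum. sklearn's gradient boosting packages this loop and stands in for the boosted CAR model; the covariates and missingness pattern are toy stand-ins:

```python
# Sketch: impute a highly missing target by boosting on observed entries.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(8)
covariates = rng.uniform(0, 1, (400, 3))        # e.g., rainfall, temperature
target = 3 * covariates[:, 0] + np.sin(6 * covariates[:, 1]) + rng.normal(0, 0.1, 400)
missing = rng.uniform(size=400) < 0.6           # 60% missingness

booster = GradientBoostingRegressor(n_estimators=300, max_depth=2,
                                    learning_rate=0.05, random_state=0)
booster.fit(covariates[~missing], target[~missing])
imputed = booster.predict(covariates[missing])

rmse = np.sqrt(np.mean((imputed - target[missing]) ** 2))  # cf. k-fold CV check
print("imputation RMSE:", rmse)
```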

More Details

Bayesian parameter estimation of a κ-ϵ Model for accurate jet-in-crossflow simulations

Journal of Aircraft

Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan; Dechant, Lawrence

Reynolds-Averaged Navier-Stokes models are not very accurate for high-Reynolds-number compressible jet-in-crossflow interactions. The inaccuracy arises from the use of inappropriate model parameters and model-form errors in the Reynolds-Averaged Navier-Stokes model. In this work, the hypothesis is pursued that Reynolds-Averaged Navier-Stokes predictions can be significantly improved by using parameters inferred from experimental measurements of a supersonic jet interacting with a transonic crossflow. A Bayesian inverse problem is formulated to estimate three Reynolds-Averaged Navier-Stokes parameters (Cμ, Cϵ2, Cϵ1), and a Markov chain Monte Carlo method is used to develop a probability density function for them. The cost of the Markov chain Monte Carlo is addressed by developing statistical surrogates for the Reynolds-Averaged Navier-Stokes model. It is found that only a subset R of the (Cμ, Cϵ2, Cϵ1) space supports realistic flow simulations. R is used as a prior belief when formulating the inverse problem and is enforced with a classifier in the Markov chain Monte Carlo solution. It is found that the calibrated parameters improve predictions of the entire flowfield substantially when compared to the nominal/literature values of (Cμ, Cϵ2, Cϵ1); furthermore, this improvement is seen to hold for interactions at other Mach numbers and jet strengths for which experimental data are available to provide a comparison. The residual error, which is an approximation of the model-form error, is also quantified; it is most easily measured in terms of turbulent stresses.
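
The sketch below caricatures the calibration machinery: a hypothetical polynomial stands in for the statistical surrogate of the RANS model, a hand-coded box stands in for the classifier enforcing the feasible region R, and a random-walk Metropolis sampler generates the posterior. Only the nominal starting values (Cμ, Cϵ2, Cϵ1) = (0.09, 1.92, 1.44) are the standard k-ϵ values; everything else is illustrative.

    import numpy as np

    rng = np.random.default_rng(2)

    def surrogate(c):
        """Hypothetical quick-running surrogate of the RANS observable as a
        polynomial in c = (Cmu, Ce2, Ce1); a real study would fit this to an
        ensemble of RANS runs."""
        return 1.5 * c[0] + 0.8 * c[1] ** 2 - 0.5 * c[0] * c[2]

    def in_R(c):
        """Stand-in classifier for the region R of realistic flow simulations."""
        return (0.05 < c[0] < 0.15) and (1.5 < c[1] < 2.5) and (1.2 < c[2] < 1.7)

    y_obs, sigma = 1.9, 0.05          # synthetic 'experimental' datum and noise

    def log_post(c):
        if not in_R(c):               # the classifier acts as a 0/1 prior
            return -np.inf
        return -0.5 * ((surrogate(c) - y_obs) / sigma) ** 2

    # random-walk Metropolis sampling of the posterior
    c = np.array([0.09, 1.92, 1.44])  # nominal k-eps values as the starting point
    lp, chain = log_post(c), []
    for _ in range(20000):
        prop = c + 0.01 * rng.normal(size=3)
        lp_p = log_post(prop)
        if np.log(rng.uniform()) < lp_p - lp:
            c, lp = prop, lp_p
        chain.append(c.copy())
    print("posterior mean:", np.mean(chain[5000:], axis=0))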

More Details

A robust technique to make a 2D advection solver tolerant to soft faults

Procedia Computer Science

Strazdins, Peter; Harding, Brendan; Lee, Chung; Mayo, Jackson R.; Ray, Jaideep; Armstrong, Robert C.

We present a general technique, called robust stencils, for making partial differential equation solvers tolerant to soft faults, i.e., bit flips arising in memory or in CPU calculations. We show how it can be applied to a two-dimensional Lax-Wendroff solver. The resulting 2D robust stencils are derived using an orthogonal application of their 1D counterparts, from which combinations of 3 to 5 base stencils can be created. We describe how these are implemented in a parallel advection solver. Various robust stencil combinations are explored, representing tradeoffs between performance and robustness. The results indicate that the 3-stencil robust combinations are slightly faster on large parallel workloads than Triple Modular Redundancy (TMR), with one third of the memory footprint; we expect the improvement to become significant once suitable optimizations are performed. Because faults are rejected each time new points are computed, the proposed stencils are as robust to faults as TMR over a large range of error rates. The technique can be generalized to 3D (or higher dimensions) with similar benefits.
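
A caricature of the voting idea behind robust stencils, under the assumption that redundant stencil evaluations of differing widths can be compared pointwise; the paper's actual stencil combinations differ. A soft fault injected into one candidate evaluation is voted out by the pointwise median, so the deviation from the fault-free step stays at truncation level rather than at the magnitude of the flipped bit.

    import numpy as np

    def lw(u, c, s):
        """Second-order Lax-Wendroff-type update built on the stencil
        {i-s, i, i+s}; wider variants use the reduced Courant number c/s."""
        ce = c / s
        up, dn = np.roll(u, -s), np.roll(u, s)
        return u - 0.5 * ce * (up - dn) + 0.5 * ce**2 * (up - 2 * u + dn)

    def robust_step(u, c, fault_at=None):
        """Evaluate redundant stencils and take the pointwise median;
        a fault corrupting any one candidate evaluation is voted out."""
        cands = np.stack([lw(u, c, 1), lw(u, c, 2), lw(u, c, 3)])
        if fault_at is not None:
            cands[0, fault_at] += 1e6      # simulated bit flip in one stencil
        return np.median(cands, axis=0)

    # periodic advection of a Gaussian pulse
    n, c = 200, 0.4
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    u = np.exp(-200.0 * (x - 0.3) ** 2)
    clean  = robust_step(u, c)
    faulty = robust_step(u, c, fault_at=50)
    print("max deviation after fault:", np.abs(faulty - clean).max())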

More Details

Decreasing the temporal complexity for nonlinear, implicit reduced-order models by forecasting

Computer Methods in Applied Mechanics and Engineering

Carlberg, Kevin T.; Ray, Jaideep; Van Bloemen Waanders, Bart

Implicit numerical integration of nonlinear ODEs requires solving a system of nonlinear algebraic equations at each time step. Each of these systems is often solved by a Newton-like method, which incurs a sequence of linear-system solves. Most model-reduction techniques for nonlinear ODEs exploit knowledge of a system's spatial behavior to reduce the computational complexity of each linear-system solve. However, the number of linear-system solves for the reduced-order simulation often remains roughly the same as that for the full-order simulation. We propose exploiting knowledge of the model's temporal behavior to (1) forecast the unknown variable of the reduced-order system of nonlinear equations at future time steps, and (2) use this forecast as an initial guess for the Newton-like solver during the reduced-order-model simulation. To compute the forecast, we propose using the Gappy POD technique. The goal is to generate an accurate initial guess so that the Newton solver requires many fewer iterations to converge, thereby decreasing the number of linear-system solves in the reduced-order-model simulation.
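
A minimal sketch of a Gappy-POD-style forecast, with synthetic stand-ins for the training time histories: a temporal basis is built from past windows, its coefficients are fitted to the few known entries at the start of the current window, and the basis is then evaluated at future steps to supply the Newton initial guess.

    import numpy as np

    rng = np.random.default_rng(3)
    T = 20                                   # time steps per window
    # training: time histories of a scalar unknown over many windows (synthetic)
    train = np.stack([np.sin(np.linspace(0, 2, T) * w)
                      for w in rng.uniform(1, 4, 50)])
    mean = train.mean(axis=0)
    U, S, Vt = np.linalg.svd(train - mean, full_matrices=False)
    Phi = Vt[:3].T                           # temporal POD basis, shape (T, 3)

    # online: only the first k steps of a new window are known ("gappy" samples)
    truth = np.sin(np.linspace(0, 2, T) * 2.7)
    k = 6
    coeff, *_ = np.linalg.lstsq(Phi[:k], truth[:k] - mean[:k], rcond=None)
    forecast = mean + Phi @ coeff            # extrapolated initial guess for Newton
    print("forecast error at final step:", abs(forecast[-1] - truth[-1]))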

More Details

A sparse reconstruction method for the estimation of multi-resolution emission fields via atmospheric inversion

Geoscientific Model Development

Ray, Jaideep; Lee, Jina; Yadav, V.; Lefantzi, Sophia; Michalak, A.M.; Van Bloemen Waanders, Bart

Atmospheric inversions are frequently used to estimate fluxes of atmospheric greenhouse gases (e.g., biospheric CO2 flux fields) at Earth's surface. These inversions typically assume that flux departures from a prior model are spatially smoothly varying, and model them using a multivariate Gaussian. When the field being estimated is spatially rough, multivariate Gaussian models are difficult to construct and a wavelet-based field model may be more suitable. Unfortunately, such models are very high dimensional and are most conveniently used when the estimation method can simultaneously perform data-driven model simplification (removal of model parameters that cannot be reliably estimated) and fitting. Such sparse reconstruction methods are typically not used in atmospheric inversions. In this work, we devise a sparse reconstruction method, and illustrate it in an idealized atmospheric inversion problem for the estimation of fossil fuel CO2 (ffCO2) emissions in the lower 48 states of the USA. Our new method is based on stagewise orthogonal matching pursuit (StOMP), a method used to reconstruct compressively sensed images. Our adaptations bestow three properties on the sparse reconstruction procedure which are useful in atmospheric inversions. We have modified StOMP to incorporate prior information on the emission field being estimated and to enforce non-negativity on the estimated field. Finally, though based on wavelets, our method allows for the estimation of fields in non-rectangular geometries, e.g., emission fields inside geographical and political boundaries. Our idealized inversions use a recently developed multi-resolution (i.e., wavelet-based) random field model developed for ffCO2 emissions and synthetic observations of ffCO2 concentrations from a limited set of measurement sites. We find that our method for limiting the estimated field within an irregularly shaped region is about a factor of 10 faster than conventional approaches. It also reduces the overall computational cost by a factor of 2. Further, the sparse reconstruction scheme imposes non-negativity without introducing strong nonlinearities, such as those introduced by employing log-transformed fields, and thus reaps the benefits of simplicity and computational speed that are characteristic of linear inverse problems.
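
A simplified caricature of a StOMP-type iteration with a crude non-negativity projection (the paper's adaptation is more careful); the random matrix below is a hypothetical stand-in for the transport operator composed with the wavelet basis.

    import numpy as np

    def stomp_nonneg(A, y, n_stages=10, t=2.0):
        """Stagewise orthogonal matching pursuit: threshold residual
        correlations, grow the active set, least-squares solve, and
        (crudely) project onto the non-negative orthant."""
        m, n = A.shape
        active = np.zeros(n, dtype=bool)
        x = np.zeros(n)
        for _ in range(n_stages):
            r = y - A @ x
            corr = A.T @ r
            sigma = np.linalg.norm(r) / np.sqrt(m)   # formal noise level
            new = np.abs(corr) > t * sigma           # stagewise threshold
            if not new.any():
                break
            active |= new
            sol, *_ = np.linalg.lstsq(A[:, active], y, rcond=None)
            x[:] = 0.0
            x[active] = np.clip(sol, 0.0, None)      # enforce non-negativity
        return x

    rng = np.random.default_rng(4)
    A = rng.normal(size=(80, 256)) / np.sqrt(80)     # stand-in transport x wavelets
    x_true = np.zeros(256); x_true[[10, 40, 200]] = [3.0, 1.5, 2.2]
    y = A @ x_true + 0.01 * rng.normal(size=80)
    x_hat = stomp_nonneg(A, y)
    print("largest entries at:", np.sort(np.argsort(x_hat)[-3:]))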

More Details

Bayesian calibration of a RANS model with a complex response surface - a case study with jet-in-crossflow configuration

45th AIAA Fluid Dynamics Conference

Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan; Dechant, Lawrence

We demonstrate a Bayesian method that can be used to calibrate computationally expensive 3D RANS models with complex response surfaces. Such calibrations, conditioned on experimental data, can yield turbulence model parameters as probability density functions (PDF), concisely capturing the uncertainty in the estimation. Methods such as Markov chain Monte Carlo construct the PDF by sampling, and consequently a quick-running surrogate is used instead of the RANS simulator. The surrogate can be very difficult to design if the model’s response, i.e., the dependence of the calibration variable (the observable) on the parameters being estimated, is complex. We show how the training data used to construct the surrogate models can also be employed to isolate a promising and physically realistic part of the parameter space, within which the response is well-behaved and easily modeled. We design a classifier, based on treed linear models, to model the “well-behaved region”. This classifier serves as a prior in a Bayesian calibration study aimed at estimating three k-ε parameters C = (Cμ, Cε2, Cε1) from experimental data of a transonic jet-in-crossflow interaction. The robustness of the calibration is investigated by checking its predictions of variables not included in the calibration data. We also check the limit of applicability of the calibration by testing at an off-calibration point.
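
A sketch of the classifier-as-prior construction; a decision tree stands in for the treed linear models, and the labeling rule below is synthetic rather than derived from actual RANS runs.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(5)
    # training samples of (Cmu, Ce2, Ce1) used to build the surrogate (hypothetical)
    C = rng.uniform([0.03, 1.2, 1.0], [0.2, 2.8, 2.0], size=(400, 3))

    # label each sample: did the corresponding RANS run produce a realistic,
    # well-behaved response?  A synthetic rule stands in for that check here.
    ok = (C[:, 1] - C[:, 2] > 0.3) & (C[:, 0] < 0.15)

    clf = DecisionTreeClassifier(max_depth=4).fit(C, ok)  # stand-in for treed models

    def log_prior(c):
        """0/1 prior from the classifier: -inf outside the well-behaved region."""
        return 0.0 if clf.predict(c.reshape(1, -1))[0] else -np.inf

    print(log_prior(np.array([0.09, 1.92, 1.44])))        # nominal point: inside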

More Details

Estimation of k-ε parameters using surrogate models and jet-in-crossflow data

Lefantzi, Sophia; Ray, Jaideep; Arunajatesan, Srinivasan; Dechant, Lawrence

We demonstrate a Bayesian method that can be used to calibrate computationally expensive 3D RANS (Reynolds-Averaged Navier-Stokes) models with complex response surfaces. Such calibrations, conditioned on experimental data, can yield turbulence model parameters as probability density functions (PDF), concisely capturing the uncertainty in the parameter estimates. Methods such as Markov chain Monte Carlo (MCMC) estimate the PDF by sampling, with each sample requiring a run of the RANS model. Consequently a quick-running surrogate is used instead of the RANS simulator. The surrogate can be very difficult to design if the model's response, i.e., the dependence of the calibration variable (the observable) on the parameter being estimated, is complex. We show how the training data used to construct the surrogate can be employed to isolate a promising and physically realistic part of the parameter space, within which the response is well-behaved and easily modeled. We design a classifier, based on treed linear models, to model the "well-behaved region". This classifier serves as a prior in a Bayesian calibration study aimed at estimating three k-ε parameters (Cμ, Cε2, Cε1) from experimental data of a transonic jet-in-crossflow interaction. The robustness of the calibration is investigated by checking its predictions of variables not included in the calibration data. We also check the limit of applicability of the calibration by testing at off-calibration flow regimes. We find that calibration yields turbulence model parameters which predict the flowfield far better than when the nominal values of the parameters are used. Substantial improvements are still obtained when we use the calibrated RANS model to predict jet-in-crossflow at Mach numbers and jet strengths quite different from those used to generate the experimental (calibration) data. Thus the primary reason for the poor predictive skill of RANS, when using nominal values of the turbulence model parameters, was parametric uncertainty, which was rectified by calibration. Post-calibration, the dominant contribution to model inaccuracies is due to the structural errors in RANS.

More Details

Breaking Computational Barriers: Real-time Analysis and Optimization with Large-scale Nonlinear Models via Model Reduction

Drohmann, Martin; Tuminaro, Raymond S.; Boggs, Paul T.; Ray, Jaideep; Van Bloemen Waanders, Bart; Carlberg, Kevin T.

Model reduction for dynamical systems is a promising approach for reducing the computational cost of large-scale physics-based simulations to enable high-fidelity models to be used in many-query (e.g., Bayesian inference) and near-real-time (e.g., fast-turnaround simulation) contexts. While model reduction works well for specialized problems such as linear time-invariant systems, it is much more difficult to obtain accurate, stable, and efficient reduced-order models (ROMs) for systems with general nonlinearities. This report describes several advances that enable nonlinear ROMs to be deployed in a variety of time-critical settings. First, we present an error bound for the Gauss-Newton with Approximated Tensors (GNAT) nonlinear model reduction technique. This bound allows the state-space error for the GNAT method to be quantified when applied with the backward Euler time-integration scheme. Second, we present a methodology for preserving classical Lagrangian structure in nonlinear model reduction. This technique guarantees that important properties--such as energy conservation and symplectic time-evolution maps--are preserved when performing model reduction for models described by a Lagrangian formalism (e.g., molecular dynamics, structural dynamics). Third, we present a novel technique for decreasing the temporal complexity--defined as the number of Newton-like iterations performed over the course of the simulation--by exploiting time-domain data. Fourth, we describe a novel method for refining projection-based reduced-order models a posteriori using a goal-oriented framework similar to mesh-adaptive h-refinement in finite elements. The technique allows the ROM to generate arbitrarily accurate solutions, thereby providing the ROM with a 'failsafe' mechanism in the event of insufficient training data. Finally, we present the reduced-order model error surrogate (ROMES) method for statistically quantifying reduced-order-model errors. This enables ROMs to be rigorously incorporated in uncertainty-quantification settings, as the error model can be treated as a source of epistemic uncertainty. This work was completed as part of a Truman Fellowship appointment. We note that much additional work was performed as part of the Fellowship. One salient project is the development of the Trilinos-based model-reduction software module Razor, which is currently bundled with the Albany PDE code and allows nonlinear reduced-order models to be constructed for any application supported in Albany. Other important projects include the following: 1. ROMES-equipped ROMs for Bayesian inference: K. Carlberg, M. Drohmann, F. Lu (Lawrence Berkeley National Laboratory), M. Morzfeld (Lawrence Berkeley National Laboratory). 2. ROM-enabled Krylov-subspace recycling: K. Carlberg, V. Forstall (University of Maryland), P. Tsuji, R. Tuminaro. 3. A pseudo balanced POD method using only dual snapshots: K. Carlberg, M. Sarovar. 4. An analysis of discrete vs. continuous optimality in nonlinear model reduction: K. Carlberg, M. Barone, H. Antil (George Mason University). Journal articles for these projects are in progress at the time of this writing.

More Details

A second-order coupled immersed boundary-SAMR construction for chemically reacting flow over a heat-conducting Cartesian grid-conforming solid

Journal of Computational Physics

Kedia, Kushal S.; Safta, Cosmin; Ray, Jaideep; Najm, Habib N.; Ghoniem, Ahmed F.

In this paper, we present a second-order numerical method for simulations of reacting flow around heat-conducting immersed solid objects. The method is coupled with a block-structured adaptive mesh refinement (SAMR) framework and a low-Mach number operator-split projection algorithm. A "buffer zone" methodology is introduced to impose the solid-fluid boundary conditions such that the solver uses symmetric derivatives and interpolation stencils throughout the interior of the numerical domain, irrespective of whether it describes fluid or solid cells. Solid cells are tracked using a binary marker function. The no-slip velocity boundary condition at the immersed wall is imposed using the staggered mesh. Near the immersed solid boundary, single-sided buffer zones (inside the solid) are created to resolve the species discontinuities, and dual buffer zones (inside and outside the solid) are created to capture the temperature gradient discontinuities. The development discussed in this paper is limited to a two-dimensional Cartesian grid-conforming solid. We validate the code using benchmark simulations documented in the literature. We also demonstrate the overall second-order convergence of our numerical method. To demonstrate its capability, a reacting flow simulation of a methane/air premixed flame stabilized on a channel-confined bluff-body using a detailed chemical kinetics model is discussed.
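
A toy illustration of the single-sided buffer-zone idea, for a Dirichlet condition on a single field (the paper's dual-buffer treatment of species and temperature is more elaborate): solid cells adjacent to fluid are populated by linear reflection so that symmetric stencils can be applied uniformly. All grids and values below are synthetic.

    import numpy as np

    n = 64
    marker = np.zeros((n, n), dtype=bool)       # binary marker function: True = solid
    marker[24:40, 24:40] = True                 # Cartesian grid-conforming solid

    u = np.random.default_rng(6).normal(size=(n, n))  # some fluid field
    u_wall = 0.0                                      # Dirichlet value at the wall

    def fill_buffer(u, marker, u_wall):
        """Populate the first layer of solid cells by linear reflection of the
        adjacent fluid value, so interior fluid cells can use symmetric stencils
        without special-casing the wall (corner cells keep the last direction's
        value; acceptable for this illustration)."""
        buf = u.copy()
        for ax, sh in [(0, 1), (0, -1), (1, 1), (1, -1)]:
            nbr_fluid = ~np.roll(marker, sh, axis=ax)  # neighbor is a fluid cell
            layer = marker & nbr_fluid                 # solid cells at the interface
            buf[layer] = 2.0 * u_wall - np.roll(u, sh, axis=ax)[layer]
        return buf

    ub = fill_buffer(u, marker, u_wall)
    # a symmetric 5-point Laplacian can now be applied uniformly over fluid cells
    lap = (np.roll(ub, 1, 0) + np.roll(ub, -1, 0) +
           np.roll(ub, 1, 1) + np.roll(ub, -1, 1) - 4 * ub)
    lap_fluid = np.where(~marker, lap, 0.0)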

More Details

Bayesian calibration of the Community Land Model using surrogates

Ray, Jaideep; Swiler, Laura P.

We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.

More Details

Kalman-filtered compressive sensing for high resolution estimation of anthropogenic greenhouse gas emissions from sparse measurements

Ray, Jaideep; Lee, Jina; Lefantzi, Sophia; Van Bloemen Waanders, Bart

The estimation of fossil-fuel CO2 emissions (ffCO2) from limited ground-based and satellite measurements of CO2 concentrations will form a key component of the monitoring of treaties aimed at the abatement of greenhouse gas emissions. The limited nature of the measured data leads to a severely underdetermined estimation problem. If the estimation is performed at fine spatial resolutions, it can also be computationally expensive. In order to enable such estimations, advances are needed in the spatial representation of ffCO2 emissions, in scalable inversion algorithms and in the identification of observables to measure. To that end, we investigate parsimonious spatial parameterizations of ffCO2 emissions which can be used in atmospheric inversions. We devise and test three random field models, based on wavelets, Gaussian kernels and covariance structures derived from easily observed proxies of human activity. In doing so, we construct a novel inversion algorithm, based on compressive sensing and sparse reconstruction, to perform the estimation. We also address scalable ensemble Kalman filters as an inversion mechanism and quantify the impact of the Gaussian assumptions inherent in them. We find that the assumption does not impact the estimates of mean ffCO2 source strengths appreciably, but a comparison with Markov chain Monte Carlo estimates shows significant differences in the variance of the source strengths. Finally, we study whether the very different spatial natures of biogenic and ffCO2 emissions can be used to estimate them, in a disaggregated fashion, solely from CO2 concentration measurements, without extra information from products of incomplete combustion, e.g., CO. We find that this is possible during the winter months, though the errors can be as large as 50%.
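
A minimal stochastic ensemble-Kalman-filter analysis step, the kind of scalable update alluded to above; all dimensions, operators and ensembles below are synthetic stand-ins.

    import numpy as np

    rng = np.random.default_rng(7)
    n, m, N = 50, 10, 100                    # state dim, obs dim, ensemble size
    H = rng.normal(size=(m, n))              # stand-in observation (transport) operator
    R = 0.1 * np.eye(m)

    x_true = rng.normal(loc=1.0, size=n)     # hypothetical ffCO2 source strengths
    y = H @ x_true + rng.multivariate_normal(np.zeros(m), R)

    X = rng.normal(loc=1.0, size=(n, N))     # prior ensemble (the Gaussian assumption)

    # stochastic EnKF analysis step: Kalman gain from sample covariances
    A  = X - X.mean(axis=1, keepdims=True)
    HX = H @ X
    HA = HX - HX.mean(axis=1, keepdims=True)
    P_yy = HA @ HA.T / (N - 1) + R
    K = (A @ HA.T / (N - 1)) @ np.linalg.inv(P_yy)
    Y = y[:, None] + rng.multivariate_normal(np.zeros(m), R, size=N).T  # perturbed obs
    X_post = X + K @ (Y - HX)
    print("posterior mean error:", np.linalg.norm(X_post.mean(axis=1) - x_true))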

More Details

Tuning a RANS k-ε model for jet-in-crossflow simulations

Ray, Jaideep; Arunajatesan, Srinivasan; Dechant, Lawrence

We develop a novel calibration approach to address the problem of predictive k-ε RANS simulations of jet-in-crossflow. Our approach is based on the hypothesis that predictive k-ε parameters can be obtained by estimating them from a strongly vortical flow, specifically, flow over a square cylinder. In this study, we estimate three k-ε parameters, Cμ, Cε2 and Cε1, by fitting 2D RANS simulations to experimental data. We use polynomial surrogates of 2D RANS for this purpose. We conduct an ensemble of 2D RANS runs using samples of (Cμ, Cε2, Cε1) and regress Reynolds stresses to the samples using a simple polynomial. We then use this surrogate of the 2D RANS model to infer a joint distribution for the k-ε parameters by solving a Bayesian inverse problem, conditioned on the experimental data. The calibrated (Cμ, Cε2, Cε1) distribution is used to seed an ensemble of 3D jet-in-crossflow simulations. We compare the ensemble's predictions of the flowfield, at two planes, to PIV measurements and estimate the predictive skill of the calibrated 3D RANS model. We also compare it against 3D RANS predictions using the nominal (uncalibrated) values of (Cμ, Cε2, Cε1), and find that calibration delivers a significant improvement to the predictive skill of the 3D RANS model. We repeat the calibration using surrogate models based on kriging and find that the calibration, based on these more accurate models, is not much better than that obtained with simple polynomial surrogates. We discuss the reasons for this rather surprising outcome.
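
A sketch of the surrogate-construction step, assuming a full quadratic polynomial in the three parameters; the "Reynolds stress" response below is a synthetic stand-in for the 2D RANS ensemble outputs.

    import numpy as np

    rng = np.random.default_rng(8)
    # samples of (Cmu, Ce2, Ce1) and a stand-in RANS output (e.g. a Reynolds stress)
    C = rng.uniform([0.06, 1.7, 1.2], [0.12, 2.1, 1.6], size=(100, 3))
    tau = 2.0 * C[:, 0] + 0.3 * C[:, 1] * C[:, 2] + 0.02 * rng.normal(size=100)

    def features(C):
        """Full quadratic basis in the three parameters."""
        c0, c1, c2 = C.T
        return np.column_stack([np.ones(len(C)), c0, c1, c2,
                                c0 * c1, c0 * c2, c1 * c2,
                                c0 ** 2, c1 ** 2, c2 ** 2])

    # least-squares regression of the ensemble outputs onto the basis
    w, *_ = np.linalg.lstsq(features(C), tau, rcond=None)
    surrogate = lambda Cq: features(np.atleast_2d(Cq)) @ w
    print(surrogate(np.array([0.09, 1.92, 1.44])))   # evaluate at the nominal values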

More Details

A multiresolution spatial parametrization for the estimation of fossil-fuel carbon dioxide emissions via atmospheric inversions

Ray, Jaideep; Lee, Jina; Lefantzi, Sophia; Van Bloemen Waanders, Bart

The estimation of fossil-fuel CO2 emissions (ffCO2) from limited ground-based and satellite measurements of CO2 concentrations will form a key component of the monitoring of treaties aimed at the abatement of greenhouse gas emissions. To that end, we construct a multiresolution spatial parametrization for fossil-fuel CO2 emissions (ffCO2), to be used in atmospheric inversions. Such a parametrization does not currently exist. The parametrization uses wavelets to accurately capture the multiscale, nonstationary nature of ffCO2 emissions and employs proxies of human habitation, e.g., images of lights at night and maps of built-up areas, to reduce the dimensionality of the multiresolution parametrization. The parametrization is used in a synthetic data inversion to test its suitability for use in atmospheric inverse problems. This linear inverse problem is predicated on observations of ffCO2 concentrations collected at measurement towers. We adapt a convex optimization technique, commonly used in the reconstruction of compressively sensed images, to perform sparse reconstruction of the time-variant ffCO2 emission field. We also borrow concepts from compressive sensing to impose boundary conditions, i.e., to limit ffCO2 emissions within an irregularly shaped region (the United States, in our case). We find that the optimization algorithm performs a data-driven sparsification of the spatial parametrization and retains only those wavelets whose weights could be estimated from the observations. Further, our method for the imposition of boundary conditions leads to a factor-of-10 computational saving over conventional means of doing so. We conclude with a discussion of the accuracy of the estimated emissions and the suitability of the spatial parametrization for use in inverse problems with a significant degree of regularization.
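
A toy version of the wavelet sparsification idea, using PyWavelets and a Haar basis (the paper's multiresolution model and data-driven selection are more involved); the field below is a synthetic stand-in for an ffCO2 emission map.

    import numpy as np
    import pywt   # PyWavelets

    rng = np.random.default_rng(9)
    # stand-in emission field: mostly zero, with a few concentrated "urban" patches
    field = np.zeros((64, 64))
    field[10:14, 20:24] = 5.0
    field[40:44, 40:48] = 3.0

    coeffs = pywt.wavedec2(field, 'haar', level=3)
    arr, slices = pywt.coeffs_to_array(coeffs)

    # data-driven sparsification: keep only large-magnitude wavelets, mimicking
    # the removal of weights that cannot be estimated from the observations
    thresh = 0.1 * np.abs(arr).max()
    arr[np.abs(arr) < thresh] = 0.0
    kept = np.count_nonzero(arr)
    recon = pywt.waverec2(
        pywt.array_to_coeffs(arr, slices, output_format='wavedec2'), 'haar')
    print(f"kept {kept}/{arr.size} wavelets, max error {np.abs(recon - field).max():.3g}")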

More Details

Nowcasting influenza outbreaks using open-source media reports

Ray, Jaideep

We construct and verify a statistical method to nowcast influenza activity from a time-series of the frequency of reports concerning influenza-related topics. Such reports are published electronically by both public health organizations and newspapers/media sources, and thus can be harvested easily via web crawlers. Since media reports are timely, whereas reports from public health organizations are delayed by at least two weeks, using timely, open-source data to compensate for the lag in “official” reports can be useful. We use morbidity data from networks of sentinel physicians (both the Centers for Disease Control's ILINet and France's Sentinelles network) as the gold standard of influenza-like illness (ILI) activity. The time-series of media reports is obtained from HealthMap (http://healthmap.org). We find that the time-series of media reports shows some correlation (≈0.5) with ILI activity; further, this can be leveraged into an autoregressive moving average model with exogenous inputs (an ARMAX model) to nowcast ILI activity. We find that the ARMAX models have more predictive skill compared to autoregressive (AR) models fitted to ILI data, i.e., it is possible to exploit the information content in the open-source data. We also find that when the open-source data are non-informative, the ARMAX models reproduce the performance of AR models. The statistical models are tested on data from the 2009 swine-flu outbreak as well as the mild 2011–2012 influenza season in the U.S.A.
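
A minimal ARMAX nowcast, assuming the statsmodels ARIMA interface with an exogenous regressor; both time-series below are synthetic stand-ins for the HealthMap and ILI data.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(10)
    T = 150
    media = np.clip(np.sin(np.linspace(0, 6, T)) + 0.3 * rng.normal(size=T), 0, None)
    ili = 2.0 * media + 0.5 * rng.normal(size=T)       # synthetic ILI "gold standard"

    # ARMAX: autoregressive moving-average model with the media time-series as
    # the exogenous input; media is timely while ILI reporting lags by 4 steps
    model = ARIMA(ili[:-4], exog=media[:-4], order=(2, 0, 1)).fit()
    nowcast = model.forecast(steps=4, exog=media[-4:])
    print("nowcast of the last 4 (lagged) ILI values:", nowcast)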

More Details

Are we there yet? When to stop a Markov chain while generating random graphs

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Ray, Jaideep; Pinar, Ali; Comandur, Seshadhri

Markov chains are convenient means of generating realizations of networks with a given (joint or otherwise) degree distribution, since they simply require a procedure for rewiring edges. The major challenge is to find the right number of steps to run such a chain, so that we generate truly independent samples. Theoretical bounds for the mixing times of these Markov chains are too large to be practically useful. Practitioners have no useful guide for choosing the length, and tend to pick numbers fairly arbitrarily. We give a principled mathematical argument showing that it suffices for the length to be proportional to the desired number of edges. We also prescribe a method for choosing this proportionality constant. We run a series of experiments showing that the distributions of common graph properties converge in this time, providing empirical evidence for our claims.
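
The prescription translates into practice roughly as follows, here with networkx's degree-preserving edge-rewiring chain and an illustrative (not the paper's prescribed) proportionality constant.

    import networkx as nx

    # generate a realization with a fixed degree sequence, then randomize it by
    # rewiring; run the chain for a number of swaps proportional to the edge count
    G = nx.barabasi_albert_graph(1000, 3, seed=0)
    m = G.number_of_edges()
    factor = 10                     # proportionality constant (illustrative choice)
    nx.double_edge_swap(G, nswap=factor * m, max_tries=100 * factor * m, seed=0)
    print("degree sequence preserved, edges:", G.number_of_edges() == m)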

More Details

Efficient uncertainty quantification methodologies for high-dimensional climate land models

Sargsyan, Khachik; Safta, Cosmin; Berry, Robert D.; Ray, Jaideep; Debusschere, Bert; Najm, Habib N.

In this report, we proposed, examined and implemented approaches for performing efficient uncertainty quantification (UQ) in climate land models. Specifically, we applied a Bayesian compressive sensing framework to polynomial chaos spectral expansions, enhanced it with an iterative algorithm of basis reduction, and investigated the results on test models as well as on the community land model (CLM). Furthermore, we discussed the construction of efficient quadrature rules for the forward propagation of uncertainties from a high-dimensional, constrained input space to output quantities of interest. The work lays the grounds for efficient forward UQ for high-dimensional, strongly non-linear and computationally costly climate models. Moreover, to investigate parameter inference approaches, we applied two variants of the Markov chain Monte Carlo (MCMC) method to a soil moisture dynamics submodel of the CLM. The evaluation of these algorithms gave us a good foundation for further building out the Bayesian calibration framework towards the goal of robust component-wise calibration.

More Details

Bayesian data assimilation for stochastic multiscale models of transport in porous media

Lefantzi, Sophia; Klise, Katherine A.; Salazar, Luke; Mckenna, Sean A.; Van Bloemen Waanders, Bart; Ray, Jaideep

We investigate Bayesian techniques that can be used to reconstruct field variables from partial observations. In particular, we target fields that exhibit spatial structures with a large spectrum of lengthscales. Contemporary methods typically describe the field on a grid and estimate structures which can be resolved by it. In contrast, we address the reconstruction of grid-resolved structures as well as the estimation of statistical summaries of subgrid structures, which are smaller than the grid resolution. We perform this in two different ways: (a) via a physical (phenomenological), parameterized subgrid model that summarizes the impact of the unresolved scales at the coarse level, and (b) via multiscale finite elements, where specially designed prolongation and restriction operators establish the interscale link between the same problem defined on a coarse and fine mesh. The estimation problem is posed as a Bayesian inverse problem. Dimensionality reduction is performed by projecting the field to be inferred on a suitable orthogonal basis set, viz. the Karhunen-Loeve expansion of a multi-Gaussian. We first demonstrate our techniques on the reconstruction of a binary medium consisting of a matrix with embedded inclusions, which are too small to be grid-resolved. The reconstruction is performed using an adaptive Markov chain Monte Carlo method. We find that the posterior distributions of the inferred parameters are approximately Gaussian. We exploit this finding to reconstruct a permeability field with long, but narrow embedded fractures (which are too fine to be grid-resolved) using scalable ensemble Kalman filters; this also allows us to address larger grids. Ensemble Kalman filtering is then used to estimate the values of hydraulic conductivity and specific yield in a model of the High Plains Aquifer in Kansas. Strong conditioning of the spatial structure of the parameters and the non-linear aspects of the water table aquifer create difficulty for the ensemble Kalman filter. We conclude with a demonstration of the use of multiscale stochastic finite elements to reconstruct permeability fields. This method, though computationally intensive, is general and can be used for multiscale inference in cases where a subgrid model cannot be constructed.
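
A minimal Karhunen-Loeve dimensionality reduction of a multi-Gaussian field, of the kind used to parameterize the inverse problem; the covariance kernel, lengthscale and variance tolerance below are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(11)
    n = 100
    x = np.linspace(0, 1, n)
    # squared-exponential covariance of a multi-Gaussian field (stand-in kernel)
    C = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)

    # Karhunen-Loeve expansion: eigendecomposition of the covariance matrix
    lam, phi = np.linalg.eigh(C)
    lam, phi = lam[::-1], phi[:, ::-1]           # sort eigenpairs descending
    k = np.searchsorted(np.cumsum(lam) / lam.sum(), 0.99) + 1  # 99% of variance

    # the field is parameterized by k << n standard-normal weights theta;
    # these weights are what the Bayesian inverse problem actually infers
    theta = rng.normal(size=k)
    field = phi[:, :k] @ (np.sqrt(lam[:k]) * theta)
    print(f"dimension reduced from {n} to {k}")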

More Details

Deriving a model for influenza epidemics from historical data

Ray, Jaideep

In this report we describe how we create a model for influenza epidemics from historical data collected from both civilian and military societies. We derive the model when the population of the society is unknown but the size of the epidemic is known. Our interest lies in estimating a time-dependent infection rate to within a multiplicative constant. The model form fitted is chosen for its similarity to published models for HIV and plague, enabling application of Bayesian techniques to discriminate among infectious agents during an emerging epidemic. We have developed models for the progression of influenza in human populations. The model is framed as an integral, and predicts the number of people who exhibit symptoms and seek care over a given time-period. The start and end of the time period form the limits of integration. The disease progression model, in turn, contains parameterized models for the incubation period and a time-dependent infection rate. The incubation period model is obtained from the literature, and the parameters of the infection rate are fitted from historical data including both military and civilian populations. The calibrated infection-rate models display a marked difference between the 1918 Spanish Influenza pandemic and both the influenza seasons in the US between 2001 and 2008 and the progression of H1N1 in Catalunya, Spain. The data for the 1918 pandemic were obtained from military populations, while the rest are country-wide or province-wide data from the twenty-first century. We see that the initial growth of infection was about the same in all cases; however, military populations were able to control the epidemic much faster, i.e., their infection-rate curves decay much more quickly. It is not clear whether this was because of the much higher level of organization present in a military society or the seriousness with which the 1918 pandemic was addressed. Each outbreak to which the influenza model was fitted yields a separate set of parameter values. We suggest 'consensus' parameter values for military and civilian populations in the form of normal distributions so that they may be further used in other applications. Representing the parameter values as distributions, instead of point values, allows us to capture the uncertainty and scatter in the parameters. Quantifying the uncertainty allows us to use these models further in inverse problems, predictions under uncertainty and various other studies involving risk.
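
As a rough illustration of the model form described, and not the report's exact equation, the expected number of patients presenting in a window [t1, t2] can be written as a time-dependent infection rate λ(τ), known only up to a multiplicative constant, convolved with an incubation-period density f_inc:

    N(t_1, t_2) = \int_{t_1}^{t_2} \int_{0}^{t} \lambda(\tau)\, f_{\mathrm{inc}}(t - \tau)\, \mathrm{d}\tau\, \mathrm{d}t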

More Details

Real-time characterization of partially observed epidemics using surrogate models

Safta, Cosmin; Ray, Jaideep; Sargsyan, Khachik; Lefantzi, Sophia

We present a statistical method, predicated on the use of surrogate models, for the 'real-time' characterization of partially observed epidemics. Observations consist of counts of symptomatic patients, diagnosed with the disease, that may be available in the early epoch of an ongoing outbreak. Characterization, in this context, refers to the estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as to provide gross information on the dynamics of the etiologic agent in the affected population, e.g., the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem, and epidemiological parameters are estimated as distributions using a Markov chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. In some cases, the inverse problem can be computationally expensive, primarily due to the epidemic simulator used inside the inversion algorithm. We present a method, based on replacing the epidemiological model with computationally inexpensive surrogates, that can reduce the computational time to minutes, without a significant loss of accuracy. The surrogates are created by projecting the output of an epidemiological model on a set of polynomial chaos bases; thereafter, computations involving the surrogate model reduce to evaluations of a polynomial. We find that the epidemic characterizations obtained with the surrogate models are very close to those obtained with the original model. We also find that the number of projections required to construct a surrogate model is O(10)–O(10²) less than the number of samples required by the MCMC to construct a stationary posterior distribution; thus, depending upon the epidemiological models in question, it may be possible to omit the offline creation and caching of surrogate models prior to their use in an inverse problem. The technique is demonstrated on synthetic data as well as observations from the 1918 influenza pandemic collected at Camp Custer, Michigan.
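
A sketch of the surrogate construction for a single parameter, assuming a Gaussian parameter and a probabilists'-Hermite polynomial chaos basis fitted by regression (one of several ways to obtain the projections); the "simulator" below is a synthetic stand-in.

    import numpy as np
    from numpy.polynomial import hermite_e as He

    rng = np.random.default_rng(12)

    def epi_model(theta):
        """Stand-in for an expensive epidemic simulator: the observable
        (e.g. peak case count) as a function of one standardized parameter."""
        return 100.0 * np.exp(0.3 * theta + 0.05 * theta ** 2)

    # non-intrusive construction: regress model outputs at sampled parameter
    # values onto a degree-4 Hermite (polynomial chaos) basis
    theta_s = rng.normal(size=200)
    coeffs = He.hermefit(theta_s, epi_model(theta_s), deg=4)

    surrogate = lambda th: He.hermeval(th, coeffs)  # evaluation is just a polynomial
    print(surrogate(0.5), epi_model(0.5))           # cheap vs. expensive evaluation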

More Details