Publications

Results 1–25 of 34

Detecting technological maturity from bibliometric patterns

Expert Systems with Applications

Cauthen, Katherine R.; Rai, Prashant; Hale, Nicholas; Freeman, Laura; Ray, Jaideep R.

The capability to identify emergent technologies from easily accessed open-source indicators, such as publications, is important for decision-makers in industry and government. This work proposes a machine learning approach for detecting the maturity of emerging technologies from publication counts. Time series of publication counts have universal features that distinguish emerging from growing technologies. We train an artificial neural network classifier, a supervised machine learning algorithm, on these features to predict the maturity (emergent vs. growth) of an arbitrary technology. With a training set of 22 technologies, we obtain classification accuracies ranging from 58.3% to 100%, with an average of 84.6%, for six test technologies. To improve classifier performance, we augmented the training corpus with synthetic technology life cycle curves, formed by calculating weighted averages of curves in the original training set. Training the classifier on the synthetic data set improved accuracy, which then ranged from 83.3% to 100% with an average of 90.4% for the test technologies. Our classifier outperforms competing machine learning approaches in the literature, whose best reported average classification accuracy is 85.7%. Moreover, unlike current methods, our approach does not require subject matter expertise to generate training labels, and it can be automated and scaled.
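The augmentation step described above (synthetic life cycle curves built as weighted averages of training curves) can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes all curves have equal length, normalizes each curve by its peak, mixes only curves that share one maturity label, and draws the mixing weights from a Dirichlet distribution. The normalization, the class-wise mixing, `n_mix`, and the function names are all hypothetical.

```python
import numpy as np


def normalize(curve):
    """Scale a publication-count series to [0, 1] by its peak (assumed preprocessing)."""
    curve = np.asarray(curve, dtype=float)
    peak = curve.max()
    return curve / peak if peak > 0 else curve


def synthesize(curves, n_new, n_mix=3, seed=0):
    """Create n_new synthetic curves as random weighted averages of n_mix curves.

    `curves` should all carry the same maturity label (emergent or growth) and
    the same length; each synthetic curve inherits that label (hypothetical scheme).
    """
    rng = np.random.default_rng(seed)
    base = np.stack([normalize(c) for c in curves])  # shape (n_curves, n_years)
    out = []
    for _ in range(n_new):
        idx = rng.choice(len(base), size=n_mix, replace=False)
        w = rng.dirichlet(np.ones(n_mix))  # nonnegative weights that sum to 1
        out.append(w @ base[idx])          # weighted average of the chosen curves
    return np.array(out)


# Usage: three toy "emergent" curves (counts per year) yield ten new emergent curves.
emergent = [[1, 2, 4, 8, 16], [0, 1, 3, 9, 20], [2, 3, 5, 10, 18]]
synthetic = synthesize(emergent, n_new=10)
```

Because the weights are nonnegative and sum to one, every synthetic curve stays inside the envelope of the curves it was built from, so the augmented set adds variety without inventing growth shapes that never occurred in the training data.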

Automated EWMA Anomaly Detection Pipeline

Proceedings of the American Control Conference

Gilletly, Samuel G.; Cauthen, Katherine R.; Mott, Joshua; Brown, Nathanael J.

There is a need to perform offline anomaly detection in count data streams that simultaneously identifies both systemic changes and outliers. We propose a new algorithmic method, called the Anomaly Detection Pipeline, which leverages common statistical process control procedures in a novel way to accomplish this. The method does not require user-defined control or Phase I training data; instead, it automatically identifies regions of stability to improve parameter estimation in support of change point detection. The method does not require data to be normally distributed, and it detects outliers relative to the regimes in which they occur. Our proposed method performs comparably to state-of-the-art change point detection methods, provides additional capabilities, and can be extended to a larger set of possible data streams than known methods.
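The abstract does not describe the pipeline's internals. As background, the basic exponentially weighted moving average (EWMA) control chart that the title refers to can be sketched as below. This is not the Anomaly Detection Pipeline itself: the in-control mean and standard deviation are estimated from a fixed baseline window (the Phase I step the pipeline is said to automate), and the limits are the usual normal-theory ones (an assumption the pipeline avoids). The parameter names `lam`, `L`, and `baseline` are illustrative.

```python
import numpy as np


def ewma_chart(x, lam=0.2, L=3.0, baseline=20):
    """Return the EWMA statistic and a flag for each point outside the control limits.

    mu and sigma come from the first `baseline` points (an assumed Phase I window).
    """
    x = np.asarray(x, dtype=float)
    mu = x[:baseline].mean()
    sigma = x[:baseline].std(ddof=1)

    # z_t = lam * x_t + (1 - lam) * z_{t-1}, starting from the in-control mean
    z = np.empty_like(x)
    prev = mu
    for t, xt in enumerate(x):
        prev = lam * xt + (1 - lam) * prev
        z[t] = prev

    # Time-varying limit half-width: L * sigma * sqrt(lam/(2-lam) * (1 - (1-lam)^(2t)))
    t_idx = np.arange(1, len(x) + 1)
    width = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t_idx)))
    flags = np.abs(z - mu) > width
    return z, flags


# Usage: a stable count stream around 5 that shifts to 15 after 20 periods.
x = [5, 4, 6, 5, 5, 4, 6, 5, 6, 4] * 2 + [15] * 10
z, flags = ewma_chart(x)
```

With this toy stream no point is flagged during the stable stretch, and every point after the shift is flagged. A fixed baseline window works here only because the first 20 points are known to be stable, which is the assumption the pipeline removes by finding stable regions automatically.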

Comparison of distribution selection methods

Communications in Statistics: Simulation and Computation

Chiew, Esther; Cauthen, Katherine R.; Brown, Nathanael J.; Nozick, Linda K.

Many methods have been proposed for choosing among candidate distributions, but relatively little work has examined whether these methods accurately recover the distribution that generated the data. Hence, this research compares several popular distribution selection methods through a Monte Carlo simulation study and identifies which are robust for several types of discrete probability distributions. In addition, we study whether it matters when a selection method fails to pick the correct probability distribution by calculating the expected distance, i.e., the amount of information lost by each distribution selection method's choice relative to the generating probability distribution.
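A minimal Monte Carlo sketch of this kind of study follows. It assumes details the abstract does not give: the selection method is AIC, the candidates are a Poisson and a geometric distribution (support 0, 1, 2, ...) fitted by maximum likelihood, and the "expected distance" is taken to be the Kullback-Leibler divergence from the generating distribution to the selected fit, averaged over replications. All function names, candidates, and settings are illustrative only.

```python
import math

import numpy as np


def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def geom_logpmf(k, p):
    # Geometric on {0, 1, 2, ...}: P(K = k) = p * (1 - p)**k
    return math.log(p) + k * math.log(1 - p)


def fit_candidates(sample):
    """Fit each candidate by maximum likelihood; return {name: (logpmf, AIC)}."""
    m = float(np.mean(sample))
    fits = {
        "poisson": lambda k, lam=m: poisson_logpmf(k, lam),       # MLE: lam = mean
        "geometric": lambda k, p=1 / (1 + m): geom_logpmf(k, p),  # MLE: p = 1/(1+mean)
    }
    out = {}
    for name, logpmf in fits.items():
        loglik = sum(logpmf(int(k)) for k in sample)
        out[name] = (logpmf, 2 * 1 - 2 * loglik)  # AIC = 2*(#params) - 2*loglik
    return out


def kl_divergence(true_logpmf, fit_logpmf, kmax=200):
    """KL(true || fit), truncated at kmax: information lost by using the fit."""
    total = 0.0
    for k in range(kmax + 1):
        lp = true_logpmf(k)
        total += math.exp(lp) * (lp - fit_logpmf(k))
    return total


def simulate(true_lam=4.0, n=100, reps=200, seed=0):
    """Return (rate of correct selection, mean KL of the selected fit)."""
    rng = np.random.default_rng(seed)

    def true_logpmf(k):
        return poisson_logpmf(k, true_lam)

    hits, dist = 0, 0.0
    for _ in range(reps):
        fits = fit_candidates(rng.poisson(true_lam, size=n))
        best = min(fits, key=lambda name: fits[name][1])  # lowest AIC wins
        hits += int(best == "poisson")
        dist += kl_divergence(true_logpmf, fits[best][0])
    return hits / reps, dist / reps
```

The divergence separates two questions the abstract distinguishes: a method can pick the right family yet still lose information through estimation error in its parameters, and a wrong pick need not cost much if the chosen distribution is close to the generating one.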

Predictive Data-driven Platform for Subsurface Energy Production

Yoon, Hongkyu Y.; Verzi, Stephen J.; Cauthen, Katherine R.; Musuvathy, Srideep M.; Melander, Darryl J.; Norland, Kyle; Morales, Adriana M.; Lee, Jonghyun; Sun, Alexander

Subsurface energy activities such as unconventional resource recovery, enhanced geothermal energy systems, and geologic carbon storage require fast and reliable methods to account for complex, multiphysical processes in heterogeneous fractured and porous media. Although reservoir simulation is considered the industry standard for simulating these subsurface systems with injection and/or extraction operations, it requires incorporating spatio-temporal “Big Data” into the simulation model, which is typically a major challenge during both model development and the computational phase. In this work, we developed and applied various deep neural network-based approaches to (1) perform multiscale image segmentation, (2) generate ensemble members of drainage networks, flow channels, and porous media using a deep convolutional generative adversarial network, (3) construct multiple hybrid neural networks, such as convolutional LSTM and convolutional neural network-LSTM, to develop fast and accurate reduced order models for shale gas extraction, and (4) apply physics-informed neural networks and deep Q-learning to flow and energy production. We hypothesized that physics-based machine learning/deep learning can overcome the shortcomings of traditional machine learning methods, whose data-driven models have faltered beyond the data and physical conditions used for training and validation. We developed and improved novel approaches demonstrating that physics-based ML allows us to incorporate physical constraints (e.g., scientific domain knowledge) into the ML framework. Outcomes of this project will be readily applicable to many energy and national security problems that are particularly defined by multiscale features and network systems.

Detecting Communities and Attributing Purpose to Human Mobility Data

Proceedings - Winter Simulation Conference

John, Esther W.; Cauthen, Katherine R.; Brown, Nathanael J.; Nozick, Linda K.

Many individuals' mobility can be characterized by strong patterns of regular movements and is influenced by social relationships. Social networks are also often organized into overlapping communities that are associated in time or space. We develop a model that can generate the structure of a social network and attribute purpose to individuals' movements, based solely on records of individuals' locations over time. This model distinguishes the attributed purpose of check-ins based on temporal and spatial patterns in check-in data. Because no location-based social network dataset with authoritative ground truth exists to test our entire model, we generate large-scale datasets containing social networks and individual check-in data to test it. We find that our model reliably assigns community purpose to social check-in data and is robust across a variety of situations.

Implications of Power Plant Idling and Cycling on Water Use Intensity

Environmental Science and Technology

Tidwell, Vincent C.; Shaneyfelt, Calvin; Cauthen, Katherine R.; Klise, Geoffrey T.; Fields, Fletcher; Clement, Zachary; Bauer, Diana

Survey data from the Energy Information Administration (EIA) was combined with data from the Environmental Protection Agency (EPA) to explore ways in which operations might impact water use intensity (both withdrawals and consumption) at thermoelectric power plants. Two disparities in cooling and power systems operations were identified that could impact water use intensity: (1) Idling Gap - where cooling systems continue to operate when their boilers and generators are completely idled; and (2) Cycling Gap - where cooling systems operate at full capacity, while their associated boiler and generator systems cycle over a range of loads. Analysis of the EIA and EPA data indicated that cooling systems operated on average 13% more than their corresponding power system (Idling Gap), while power systems operated on average 30% below full load when the boiler was reported as operating (Cycling Gap). Regression analysis was then performed to explore whether the degree of power plant idling/cycling could be related to the physical characteristics of the plant, its environment or time of year. While results suggested that individual power plants' operations were unique, weak trends consistently pointed to a plant's place on the dispatch curve as influencing patterns of cooling system, boiler, and generator operation. This insight better positions us to interpret reported power plant water use data as well as improve future water use projections.
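As a worked illustration of the two gap definitions, the sketch below assumes hourly operating records for a single plant: on/off indicators for the cooling system and the power system, plus gross load and nameplate capacity. The record layout, hourly granularity, and function names are hypothetical; the study's own calculation from EIA and EPA data may aggregate differently.

```python
import numpy as np


def idling_gap(cooling_on, power_on):
    """Relative excess of cooling-system operating hours over power-system hours."""
    cooling_hours = np.count_nonzero(cooling_on)
    power_hours = np.count_nonzero(power_on)
    return cooling_hours / power_hours - 1.0


def cycling_gap(gross_load_mw, capacity_mw, boiler_on):
    """Average shortfall from full load over the hours when the boiler operates."""
    mask = np.asarray(boiler_on, dtype=bool)
    load = np.asarray(gross_load_mw, dtype=float)[mask]
    return 1.0 - load.mean() / capacity_mw


# Usage: cooling runs 9 of 10 hours while the generator runs 8 -> idling gap 9/8 - 1 = 0.125.
gap_idle = idling_gap([1] * 9 + [0], [1] * 8 + [0] * 2)
# Boiler on for two hours at 700 MW and 350 MW on a 700 MW unit -> cycling gap 1 - 525/700 = 0.25.
gap_cycle = cycling_gap([700, 350, 0], 700, [True, True, False])
```

Read this way, the reported averages correspond to an idling gap of about 0.13 and a cycling gap of about 0.30; both matter for water use intensity because a cooling system running at full capacity is charged against less electricity generated.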

Conditioning multi-model ensembles for disease forecasting

Ray, Jaideep R.; Cauthen, Katherine R.; Lefantzi, Sophia L.; Burks, Lynne

In this study we investigate how an ensemble of disease models can be conditioned to observational data, in a bid to improve its predictive skill. We use the ensemble of influenza forecasting models gathered by the US Centers for Disease Control and Prevention (CDC) as the exemplar. This ensemble is used every year to forecast the annual influenza outbreak in the United States. The models constituting this ensemble draw on very different modeling assumptions and approximations and are a diverse collection of methods to approximate epidemiological dynamics. Currently, each model's predictions are accorded the same importance, or weight, when compiling the ensemble's forecast. We consider this equally weighted ensemble as the baseline case to be improved upon. In this study, we explore whether an ensemble forecast can be improved by "conditioning" the ensemble to whatever observational data are available from the ongoing outbreak. "Conditioning" can mean assigning the ensemble's members different weights that evolve over time, or simply performing the forecast with the top k (equally weighted) models. In the latter case, the composition of the "top-k" set of models evolves over time. In statistics, this is called "model averaging" (MA). We explore four methods to perform model averaging, three of which are new. We find that the CDC ensemble responds best to the "top-k-models" approach to model averaging. All the new MA methods perform better than the baseline equally weighted ensemble. The four model-averaging methods treat the models as black boxes and simply use their forecasts as inputs, i.e., one needs access only to the models' forecasts, not to the models themselves. The model-averaging approaches reviewed in this report thus form a general framework for model-averaging any model ensemble.
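A minimal sketch of the "top-k-models" idea, treating each model as a black box that only supplies forecasts. The scoring rule (mean absolute error over a trailing window of verified weeks), the window length, and the function names are assumptions for illustration; the report's four MA methods may rank models differently.

```python
import numpy as np


def top_k_forecast(past_forecasts, observed, next_forecasts, k=3, window=4):
    """Equally weight the k models with the lowest recent mean absolute error.

    past_forecasts: (n_models, n_weeks) forecasts already checked against data
    observed:       (n_weeks,) observed values for those weeks
    next_forecasts: (n_models,) each model's forecast for the coming week
    """
    past = np.asarray(past_forecasts, dtype=float)[:, -window:]
    obs = np.asarray(observed, dtype=float)[-window:]
    mae = np.abs(past - obs).mean(axis=1)  # recent error of each model
    top = np.argsort(mae)[:k]              # indices of the k most accurate models
    return float(np.asarray(next_forecasts, dtype=float)[top].mean()), top


def equal_weight_forecast(next_forecasts):
    """Baseline: every ensemble member receives the same weight."""
    return float(np.mean(next_forecasts))


# Usage: four models scored on four observed weeks, then combined for week five.
observed = [1, 2, 3, 4]
past = [[1, 2, 3, 4],   # exact
        [1, 2, 3, 5],   # close
        [5, 5, 5, 5],   # poor
        [0, 0, 0, 0]]   # poor
nxt = [5, 6, 10, 0]
forecast, chosen = top_k_forecast(past, observed, nxt, k=2)  # averages models 0 and 1
baseline = equal_weight_forecast(nxt)
```

Calling the function again each week with the newly observed value re-ranks the models, so the membership of the top-k set changes as the outbreak unfolds, which is the time evolution the abstract describes.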

Integrated Cyber/Physical Grid Resiliency Modeling

Dawson, Lon A.; Verzi, Stephen J.; Levin, Drew L.; Melander, Darryl J.; Sorensen, Asael H.; Cauthen, Katherine R.; Wilches-Bernal, Felipe; Berg, Timothy M.; Lavrova, Olga A.; Guttromson, Ross G.

This project explored coupling modeling and analysis methods from multiple domains to address complex hybrid (cyber and physical) attacks on mission-critical infrastructure. Robust methods to integrate these complex systems are necessary to enable large trade-space exploration, including dynamic and evolving cyber threats and mitigations. Reinforcement learning employing deep neural networks, as in the AlphaGo Zero solution, was used to identify "best" (or approximately optimal) resilience strategies for operating a cyber/physical grid model. A prototype platform was developed, and the machine learning (ML) algorithm was made to play itself in a game of 'Hurt the Grid'. This proof of concept shows that machine learning optimization can help us understand and control a complex, multi-dimensional grid space. A simple but high-fidelity model showed that the data exhibit spatial correlation, which is necessary for any optimization or control. Our prototype analysis showed that reinforcement learning successfully improved adversary and defender knowledge for manipulating the grid. When expanded to more representative models, this type of machine learning will inform grid operations and defense, supporting the development of mitigations that defend the grid from complex cyber attacks. The same research can be extended to similarly complex domains.

Final Documentation: Incident Management And Probabilities Courses of action Tool (IMPACT)

Edwards, Donna M.; Ray, Jaideep R.; Tucker, Mark D.; Whetzel, Jonathan H.; Cauthen, Katherine R.

This report pulls together the documentation produced for the IMPACT tool, a software-based decision support tool that provides situational awareness, incident characterization, and guidance on public health and environmental response strategies for an unfolding bio-terrorism incident.
