Publications Search

The capability to identify emergent technologies based upon easily accessed open-source indicators, such as publications, is important for decision-makers in industry and government. The scientific contribution of this work is the proposition of a machine learning approach to the detection of the maturity of emerging technologies based on publication counts. Time-series of publication counts have universal features that distinguish emerging and growing technologies. We train an artificial neural network classifier, a supervised machine learning algorithm, upon these features to predict the maturity (emergent vs. growth) of an arbitrary technology. With a training set consisting of 22 technologies, we obtain a classification accuracy ranging from 58.3% to 100%, with an average accuracy of 84.6% for six test technologies. To enhance classifier performance, we augmented the training corpus with synthetic time-series technology life cycle curves, formed by calculating weighted averages of curves in the original training set. Training the classifier on the synthetic data set improved accuracy, which ranged from 83.3% to 100% with an average of 90.4% for the test technologies. The performance of our classifier exceeds that of competing machine learning approaches in the literature, whose reported average classification accuracy is at most 85.7%. Moreover, in contrast to current methods, our approach does not require subject matter expertise to generate training labels, and it can be automated and scaled.
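The augmentation step above forms synthetic life cycle curves as weighted averages of training curves. A minimal sketch of that idea follows, assuming that only curves sharing one label (e.g. all "emergent") are mixed so the label carries over, that all curves have the same length, and that weights are random and normalized to sum to 1. The helper name `synthesize_curves` and the weighting scheme are hypothetical, not the authors' procedure.

```python
import random
from typing import List, Sequence


def synthesize_curves(
    curves: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 2,
    seed: int = 0,
) -> List[List[float]]:
    """Build synthetic curves as random convex combinations of k input curves.

    Hypothetical helper: all curves are assumed to share one maturity label
    and to have the same length.
    """
    rng = random.Random(seed)
    length = len(curves[0])
    if any(len(c) != length for c in curves):
        raise ValueError("all curves must have the same length")
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        members = rng.sample(range(len(curves)), k)  # pick k distinct curves
        raw = [1.0 - rng.random() for _ in members]  # each in (0, 1]
        total = sum(raw)
        weights = [w / total for w in raw]  # non-negative, sum to 1
        curve = [
            sum(w * curves[m][t] for w, m in zip(weights, members))
            for t in range(length)
        ]
        synthetic.append(curve)
    return synthetic
```

Because every synthetic point is a convex combination, it always lies between the smallest and largest training values at that time step, which keeps the synthetic curves within the envelope of the observed ones.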

TYPE Journal Article YEAR 2022

DOI OSTI Scopus

Anomaly Detection Pipeline

Gilletly, Samuel D.; Cauthen, Katherine R.; Mott, Joshua R.; Brown, Nathanael J.K.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Automated EWMA Anomaly Detection Pipeline

Proceedings of the American Control Conference

Gilletly, Samuel D.; Cauthen, Katherine R.; Mott, Joshua R.; Brown, Nathanael J.K.

There is a need to perform offline anomaly detection in count data streams that simultaneously identifies both systemic changes and outliers. We propose a new algorithmic method, called the Anomaly Detection Pipeline, which leverages common statistical process control procedures in a novel way to accomplish this. The proposed method does not require user-defined control or Phase I training data; instead, it automatically identifies regions of stability for improved parameter estimation to support change point detection. The method does not require data to be normally distributed, and it detects outliers relative to the regimes in which they occur. Our proposed method performs comparably to state-of-the-art change point detection methods, provides additional capabilities, and is extendable to a larger set of possible data streams than known methods.
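For readers unfamiliar with the EWMA control chart named in the paper title, here is a minimal sketch of the standard textbook chart, not the Anomaly Detection Pipeline itself. It assumes the caller supplies a stable baseline segment for estimating the mean and standard deviation (the pipeline described above finds such regions automatically), and its control limits assume roughly normal data, which the pipeline does not require. The function name `ewma_flags` and the defaults `lam=0.2`, `L=3.0` are illustrative choices.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def ewma_flags(
    x: Sequence[float],
    baseline: Sequence[float],
    lam: float = 0.2,
    L: float = 3.0,
) -> List[bool]:
    """Flag points whose EWMA statistic falls outside the control limits.

    mu and sigma come from a user-supplied stable baseline, which is an
    assumption of this sketch.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    z = mu  # the EWMA statistic starts at the in-control mean
    flags: List[bool] = []
    for t, xt in enumerate(x, start=1):
        z = lam * xt + (1 - lam) * z  # z_t = lam * x_t + (1 - lam) * z_{t-1}
        # Exact time-varying EWMA standard deviation, scaled by L
        width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        flags.append(abs(z - mu) > width)
    return flags
```

A sustained shift, for example counts jumping from about 10 to 20, pushes the smoothed statistic past the limits within a step or two, whereas a small `lam` makes the chart slower to react but less sensitive to single noisy points.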

TYPE Conference Paper YEAR 2022

DOI OSTI Scopus

Comparison of distribution selection methods

Communications in Statistics: Simulation and Computation

Chiew, Esther; Cauthen, Katherine R.; Brown, Nathanael J.K.; Nozick, Linda

Many methods have been suggested for choosing between distributions, but relatively little work has examined whether these methods accurately recover the distributions being studied. Hence, this research compares several popular distribution selection methods through a Monte Carlo simulation study and identifies which are robust for several types of discrete probability distributions. In addition, we study whether it matters when a distribution selection method fails to pick the correct probability distribution by calculating the expected distance, which is the amount of information lost by each distribution selection method relative to the generating probability distribution.
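One way to picture the simulation design described above is a small Monte Carlo experiment: draw samples from a known discrete distribution, fit candidate families, select one, and average the information lost. The sketch below assumes a Poisson generating distribution, two candidate families (Poisson and geometric) fitted by maximum likelihood, selection by AIC, and Kullback-Leibler divergence as the information-loss measure. These are illustrative stand-ins; the paper compares several selection methods and its exact distance definition may differ. All function names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Pmf = Callable[[int], float]


def poisson_pmf(k: int, lam: float) -> float:
    # Computed in log space to avoid overflow in k!
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def geometric_pmf(k: int, p: float) -> float:
    # Number of failures before the first success, support k = 0, 1, 2, ...
    return p * (1.0 - p) ** k


def sample_poisson(lam: float, rng: random.Random) -> int:
    # Knuth's multiplication method; adequate for small lam
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def log_likelihood(data: List[int], pmf: Pmf) -> float:
    return sum(math.log(pmf(x)) for x in data)


def kl_divergence(p: Pmf, q: Pmf, k_max: int) -> float:
    # KL(p || q) on the truncated support 0..k_max; 0 * log 0 is taken as 0
    total = 0.0
    for k in range(k_max + 1):
        pk = p(k)
        if pk > 0.0:
            total += pk * math.log(pk / q(k))
    return total


def expected_distance(
    lam_true: float = 3.0, n: int = 50, reps: int = 200, seed: int = 1, k_max: int = 60
) -> Tuple[float, Dict[str, int]]:
    rng = random.Random(seed)
    true_pmf: Pmf = lambda k: poisson_pmf(k, lam_true)
    picks = {"poisson": 0, "geometric": 0}
    total_kl = 0.0
    for _ in range(reps):
        data = [sample_poisson(lam_true, rng) for _ in range(n)]
        m = max(sum(data) / n, 1e-9)  # guard against an all-zero sample
        # Maximum-likelihood fits for each candidate family
        candidates: Dict[str, Pmf] = {
            "poisson": lambda k, m=m: poisson_pmf(k, m),
            "geometric": lambda k, p=1.0 / (1.0 + m): geometric_pmf(k, p),
        }
        # AIC = 2 * (number of parameters) - 2 * log-likelihood; both have one parameter
        aic = {name: 2.0 - 2.0 * log_likelihood(data, f) for name, f in candidates.items()}
        best = min(aic, key=aic.get)
        picks[best] += 1
        total_kl += kl_divergence(true_pmf, candidates[best], k_max)
    return total_kl / reps, picks
```

The returned pair separates the two questions the abstract raises: how often the selection method picks the right family, and how much information is lost on average when it does or does not.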

TYPE Journal Article YEAR 2022

DOI OSTI Scopus

Predictive Data-driven Platform for Subsurface Energy Production

Yoon, Hongkyu; Verzi, Stephen J.; Cauthen, Katherine R.; Musuvathy, Srideep S.; Melander, Darryl J.; Norland, Kyle; Morales, Adriana M.; Lee, Jonghyun; Sun, Alexander

Subsurface energy activities such as unconventional resource recovery, enhanced geothermal energy systems, and geologic carbon storage require fast and reliable methods to account for complex, multiphysical processes in heterogeneous fractured and porous media. Although reservoir simulation is considered the industry standard for simulating these subsurface systems with injection and/or extraction operations, it requires incorporating spatio-temporal “Big Data” into the simulation model, which is typically a major challenge during the model development and computational phases. In this work, we developed and applied various deep neural network-based approaches to (1) perform multiscale image segmentation, (2) generate ensemble members of drainage networks, flow channels, and porous media using deep convolutional generative adversarial networks, (3) construct multiple hybrid neural networks, such as convolutional LSTM and convolutional neural network-LSTM, to develop fast and accurate reduced order models for shale gas extraction, and (4) apply physics-informed neural networks and deep Q-learning to flow and energy production. We hypothesized that physics-based machine learning/deep learning can overcome the shortcomings of traditional machine learning methods, whose data-driven models have faltered beyond the data and physical conditions used for training and validation. We improved existing approaches and developed novel ones to demonstrate that physics-based ML allows us to incorporate physical constraints (e.g., scientific domain knowledge) into the ML framework. Outcomes of this project will be readily applicable to many energy and national security problems that are particularly defined by multiscale features and network systems.

TYPE SAND Report YEAR 2021

DOI OSTI

Detecting Communities and Attributing Purpose to Human Mobility Data

John, Esther W.L.; Cauthen, Katherine R.; Brown, Nathanael J.K.; Nozick, Linda

Abstract not provided.

TYPE Conference Paper YEAR 2021

DOI OSTI

Multi-Int Data Fusion for Proliferation Detection (NSARD 2021 presentation [final])

Brown, Nathanael J.K.; Hoffman, Matthew J.; Bussell, Sammy J.; Gilletly, Samuel D.; Cauthen, Katherine R.; John, Esther W.L.; Nozick, Linda

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Detecting Communities and Attributing Purpose to Human Mobility Data

Proceedings - Winter Simulation Conference

John, Esther W.L.; Cauthen, Katherine R.; Brown, Nathanael J.K.; Nozick, Linda

The mobility of many individuals is characterized by strong patterns of regular movement and is influenced by social relationships. Social networks are also often organized into overlapping communities that are associated in time or space. We develop a model that can generate the structure of a social network and attribute purpose to individuals' movements, based solely on records of individuals' locations over time. This model distinguishes the attributed purpose of check-ins based on temporal and spatial patterns in check-in data. Because no location-based social network dataset with authoritative ground truth exists to test our entire model, we generate large-scale datasets containing social networks and individual check-in data to test it. We find that our model reliably assigns community purpose to social check-in data and is robust over a variety of different situations.

TYPE Conference Presentation YEAR 2021

DOI OSTI Scopus

Hybrid CNN-LSTM Framework for Predicting Subsurface Energy Production

Kaikaus, Jamshed; Cauthen, Katherine R.; Yoon, Hongkyu

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

ANN classifier for technological maturity based on bibliometric patterns

Cauthen, Katherine R.; Rai, Prashant; Ray, Jaideep

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

Detecting Communities and Attributing Purpose to Human Mobility Data

Chiew, Esther; Cauthen, Katherine R.; Brown, Nathanael J.K.; Nozick, Linda

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Modeling Mobility in Social Networks with Overlapping Communities (NSARD slide)

Brown, Nathanael J.K.; Cauthen, Katherine R.; Nozick, Linda; Chiew, Esther

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

Modeling Mobility in Social Networks with Overlapping Communities (NSARD poster)

Brown, Nathanael J.K.; Cauthen, Katherine R.; Nozick, Linda; Chiew, Esther

TYPE Presentation YEAR 2020

OSTI

A Unique Similarity Metric for Anomaly Detection in Temporal Networks

Brown, Nathanael J.K.; Cauthen, Katherine R.; Durfee, Justin D.; Frazier, Christopher R.; Nozick, Linda

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

A Unique Similarity Metric for Anomaly Detection in Social Networks

Brown, Nathanael J.K.; Frazier, Christopher R.; Cauthen, Katherine R.; Nozick, Linda

TYPE Conference Poster YEAR 2019

OSTI

Implications of Power Plant Idling and Cycling on Water Use Intensity

Environmental Science and Technology

Tidwell, Vincent C.; Shaneyfelt, Calvin; Cauthen, Katherine R.; Klise, Geoffrey T.; Fields, Fletcher; Clement, Zachary; Bauer, Diana

Survey data from the Energy Information Administration (EIA) were combined with data from the Environmental Protection Agency (EPA) to explore ways in which operations might impact water use intensity (both withdrawals and consumption) at thermoelectric power plants. Two disparities between cooling and power system operations were identified that could impact water use intensity: (1) Idling Gap, where cooling systems continue to operate while their boilers and generators are completely idled; and (2) Cycling Gap, where cooling systems operate at full capacity while their associated boiler and generator systems cycle over a range of loads. Analysis of the EIA and EPA data indicated that cooling systems operated on average 13% more than their corresponding power systems (Idling Gap), while power systems operated on average 30% below full load when the boiler was reported as operating (Cycling Gap). Regression analysis was then performed to explore whether the degree of power plant idling/cycling could be related to the physical characteristics of the plant, its environment, or the time of year. While results suggested that individual power plants' operations were unique, weak trends consistently pointed to a plant's place on the dispatch curve as influencing patterns of cooling system, boiler, and generator operation. This insight better positions us to interpret reported power plant water use data and to improve future water use projections.
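The two gap statistics above reduce to simple ratios over operating records. The sketch below assumes hourly records with on/off flags for the cooling and power systems and a load fraction (0 to 1) for each hour; it defines the Idling Gap as the fractional excess of cooling-system hours over power-system hours and the Cycling Gap as the average shortfall from full load while the boiler operates. These definitions are an interpretation for illustration, and the function names are hypothetical; the study's exact calculation from EIA and EPA data may differ.

```python
from typing import Sequence


def idling_gap(cooling_on: Sequence[bool], power_on: Sequence[bool]) -> float:
    """Fractional excess of cooling-system operating hours over power-system hours."""
    cooling_hours = sum(cooling_on)
    power_hours = sum(power_on)
    if power_hours == 0:
        raise ValueError("power system never operated")
    return (cooling_hours - power_hours) / power_hours


def cycling_gap(load_fraction: Sequence[float], boiler_on: Sequence[bool]) -> float:
    """Average shortfall from full load over the hours the boiler is operating."""
    loads = [f for f, on in zip(load_fraction, boiler_on) if on]
    if not loads:
        raise ValueError("boiler never operated")
    return 1.0 - sum(loads) / len(loads)
```

Under these definitions, the reported averages correspond to an `idling_gap` of about 0.13 and a `cycling_gap` of about 0.30.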

TYPE Journal Article YEAR 2019

DOI OSTI Scopus

A Unique Graph Similarity Metric for Anomaly Detection

Brown, Nathanael J.K.; Frazier, Christopher R.; Cauthen, Katherine R.; Nozick, Linda

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Conditioning multi-model ensembles for disease forecasting

Ray, Jaideep; Cauthen, Katherine R.; Lefantzi, Sophia; Burks, Lynne

In this study we investigate how an ensemble of disease models can be conditioned to observational data, in a bid to improve its predictive skill. We use the ensemble of influenza forecasting models gathered by the US Centers for Disease Control and Prevention (CDC) as the exemplar. This ensemble is used every year to forecast the annual influenza outbreak in the United States. The models constituting this ensemble draw on very different modeling assumptions and approximations and form a diverse collection of methods for approximating epidemiological dynamics. Currently, each model's predictions are accorded the same importance, or weight, when compiling the ensemble's forecast. We consider this equally-weighted ensemble as the baseline case to be improved upon. In this study, we explore whether an ensemble forecast can be improved by "conditioning" the ensemble to whatever observational data are available from the ongoing outbreak. "Conditioning" can mean according the ensemble's members different weights that evolve over time, or simply performing the forecast using the top k (equally-weighted) models. In the latter case, the composition of the "top-k" set of models evolves over time. This is called "model averaging" in statistics. We explore four methods to perform model averaging, three of which are new. We find that the CDC ensemble responds best to the "top-k-models" approach to model averaging. All the new model-averaging (MA) methods perform better than the baseline equally-weighted ensemble. The four model-averaging methods treat the models as black boxes and simply use their forecasts as inputs; i.e., one does not need access to the models at all, only to their forecasts. The model-averaging approaches reviewed in this report thus form a general framework for model-averaging any model ensemble.
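The "top-k-models" approach above can be sketched in a few lines because it only needs each model's forecasts, not the models themselves. This sketch assumes models are ranked by mean absolute error of their past forecasts against the observations seen so far; that scoring rule and the function names are illustrative assumptions, not the report's method. The equally-weighted baseline is included for comparison.

```python
from typing import Dict, List, Sequence, Tuple


def equal_weight_forecast(next_forecasts: Dict[str, float]) -> float:
    """Baseline: every model's forecast gets the same weight."""
    return sum(next_forecasts.values()) / len(next_forecasts)


def top_k_forecast(
    past_forecasts: Dict[str, Sequence[float]],
    observed: Sequence[float],
    next_forecasts: Dict[str, float],
    k: int,
) -> Tuple[float, List[str]]:
    """Equally weight the next-step forecasts of the k models with lowest MAE so far."""

    def mae(name: str) -> float:
        preds = past_forecasts[name]
        return sum(abs(p - o) for p, o in zip(preds, observed)) / len(observed)

    ranked = sorted(past_forecasts, key=mae)  # best (lowest error) first
    top = ranked[:k]
    return sum(next_forecasts[name] for name in top) / len(top), top
```

Re-running `top_k_forecast` each week as new observations arrive lets the selected set change over time, which is the evolving composition the abstract describes; only black-box forecasts enter the calculation.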

TYPE SAND Report YEAR 2018

DOI OSTI