Subsurface energy activities such as unconventional resource recovery, enhanced geothermal energy systems, and geologic carbon storage require fast and reliable methods to account for complex, multiphysics processes in heterogeneous fractured and porous media. Although reservoir simulation is considered the industry standard for modeling these subsurface systems under injection and/or extraction operations, it requires feeding spatio-temporal “Big Data” into the simulation model, which is typically a major challenge during both model development and computation. In this work, we developed and applied various deep neural network-based approaches to (1) perform multiscale image segmentation, (2) generate ensemble members of drainage networks, flow channels, and porous media using a deep convolutional generative adversarial network, (3) construct hybrid neural networks such as convolutional LSTMs and CNN-LSTMs as fast and accurate reduced-order models for shale gas extraction, and (4) apply physics-informed neural networks and deep Q-learning to flow and energy production. We hypothesized that physics-based machine learning/deep learning can overcome the shortcomings of traditional machine learning methods, whose data-driven models have faltered beyond the data and physical conditions used for training and validation. We improved and developed novel approaches to demonstrate that physics-based ML allows physical constraints (e.g., scientific domain knowledge) to be incorporated into the ML framework. Outcomes of this project will be readily applicable to many energy and national security problems, particularly those defined by multiscale features and network systems.
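The physics-informed idea above — penalizing predictions that violate governing equations alongside the usual data misfit — can be sketched minimally. The example below assumes a steady 1D Darcy-type flow, d/dx(k du/dx) = 0, discretized with finite differences; the function name, the specific PDE, and the weighting are illustrative assumptions, not the project's actual formulation.

```python
import numpy as np

def physics_informed_loss(u_pred, u_obs, k, dx, lam=1.0):
    """Hypothetical composite loss: data misfit plus a PDE-residual
    penalty for steady 1D Darcy flow, d/dx(k du/dx) = 0."""
    data_term = np.mean((u_pred - u_obs) ** 2)
    flux = k * np.diff(u_pred) / dx       # k du/dx at cell faces
    residual = np.diff(flux) / dx         # divergence of the flux
    physics_term = np.mean(residual ** 2)
    return data_term + lam * physics_term
```

For a linear pressure profile with uniform permeability, the PDE residual vanishes, so the loss reduces to the data misfit alone — the physics term only activates when predictions are unphysical.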
The main goal of this project was to create a state-of-the-art predictive capability that screens and identifies wellbores at the highest risk of catastrophic failure. This capability is critical to a host of subsurface applications, including gas storage, hydrocarbon extraction and storage, geothermal energy development, and waste disposal, which depend on seal integrity to meet U.S. energy demands in a safe and secure manner. In addition to the screening tool, this project also developed several other supporting capabilities to help understand fundamental processes involved in wellbore failure. These included novel experimental methods to characterize permeability and porosity evolution during compressive failure of cement, methods and capabilities for understanding two-phase flow in damaged wellbore systems, and novel fracture-resistant cements made from recycled fibers.
Concerns about cyber threats to space systems are increasing. Researchers are developing intrusion detection and protection systems to mitigate these threats, but the sparsity of cyber threat data poses a significant challenge to these efforts. Development of credible threat data sets is needed to overcome this challenge. This paper describes the extension and development of three data generation algorithms (generative adversarial networks, variational auto-encoders, and a generative algorithm for multivariate timeseries) to generate cyber threat data for space systems. The algorithms are applied to a use case that leverages the NASA Operational Simulation for Small Satellites (NOS$^{3}$) platform. Qualitative and quantitative measures are applied to evaluate the generated data. Strengths and weaknesses of each algorithm are presented, and suggested improvements are provided. For this use case, the generative algorithm for multivariate timeseries performed best according to both qualitative and quantitative measures.
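One plausible quantitative measure for evaluating generated multivariate timeseries is a marginal-statistics gap between real and synthetic channels. The sketch below is a hypothetical fidelity check of that kind, not the specific metric used in the study; array shapes and the aggregation rule are assumptions.

```python
import numpy as np

def marginal_discrepancy(real, synth):
    """Compare per-channel means and standard deviations of real vs.
    synthetic multivariate timeseries (shape: samples x time x channels).
    A hypothetical quantitative fidelity check: 0 means the marginal
    first- and second-order statistics match exactly."""
    r = real.reshape(-1, real.shape[-1])
    s = synth.reshape(-1, synth.shape[-1])
    mean_gap = np.abs(r.mean(axis=0) - s.mean(axis=0))
    std_gap = np.abs(r.std(axis=0) - s.std(axis=0))
    return float(mean_gap.max() + std_gap.max())
```

Such a scalar score makes it easy to rank the three generators on the same footing, complementing qualitative inspection of the generated traces.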
Approximately 93% of the US total energy supply depends on wellbores in some form. The industry will drill more wells in the next ten years than in the last 100 years (King, 2014). The global well population is around 1.8 million, of which approximately 35% show some signs of leakage (i.e., sustained casing pressure). Around 5% of offshore oil and gas wells “fail” early, more with age and most with maturity. In the Marcellus play, 8.9% of “shale gas” wells have experienced failure (120 out of 1,346 wells drilled in 2012) (Ingraffea et al., 2014). Current methods for identifying wells at highest priority for increased monitoring and/or at highest risk of failure consist of “hand” analysis of multi-arm caliper (MAC) well logging data and geomechanical models. Machine learning (ML) methods are of interest for increasing analysis efficiency and/or enhancing detection of failure precursors (e.g., deformations). MAC datasets were used to train ML algorithms, and preliminary tests run for “predicting” casing collar locations achieved above 90% accuracy in classifying and identifying casing collar locations.
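Casing collars show up in MAC logs as short excursions in the finger radii, so even a simple statistical rule can flag candidate depths. The sketch below is a hypothetical z-score baseline for comparison only — the project trained ML classifiers, not this rule — and the signal shape and threshold are assumptions.

```python
import numpy as np

def flag_collar_depths(radius, z_thresh=3.0):
    """Flag candidate casing-collar depth indices in a multi-arm caliper
    log. Collars appear as short excursions in the mean finger radius,
    so a z-score threshold serves as a crude baseline detector."""
    mu, sigma = radius.mean(), radius.std()
    z = (radius - mu) / sigma
    return np.flatnonzero(np.abs(z) > z_thresh)
```

A trained classifier should beat this baseline by using multi-finger patterns and depth context rather than a single-channel amplitude threshold.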
Estimation of permeability in porous media is fundamental to understanding coupled multi-physics processes critical to various geoscience and environmental applications. Emerging machine learning methods that incorporate physics-based constraints and/or physical properties offer a new means to improve both computational efficiency and predictive accuracy by accounting for physical information during training. Here we first used three-dimensional (3D) real rock images to estimate the permeability of fractured and porous media using 3D convolutional neural networks (CNNs) coupled with physics-informed pore topology characteristics (e.g., porosity, surface area, connectivity) during the training stage. Training data for permeability were generated using lattice Boltzmann simulations of segmented real rock 3D images. Our preliminary results show that the neural network architecture and the usage of physical properties strongly impact the accuracy of permeability predictions. In the future, the methodology can be adapted to other rock types by choosing an appropriate architecture, selecting proper physical properties, and optimizing the hyperparameters.
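The pore-topology characteristics fed to the network can be computed directly from the segmented 3D image. The sketch below shows two such descriptors — porosity and a face-count approximation of specific surface area — as one plausible implementation; the exact descriptors and normalizations used in the study may differ.

```python
import numpy as np

def pore_features(img):
    """Simple pore-topology descriptors from a binary 3D image
    (1 = pore voxel, 0 = solid voxel): porosity, plus an approximate
    specific surface area from counting pore/solid voxel-face
    interfaces along each axis, normalized by total volume."""
    porosity = img.mean()
    faces = 0
    for ax in range(3):
        faces += np.count_nonzero(np.diff(img, axis=ax))
    surface_area = faces / img.size
    return porosity, surface_area
```

Descriptors like these are concatenated with the CNN's learned image features so the regressor sees both raw morphology and physically meaningful summaries.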
We describe efforts in generating synthetic malware samples that exhibit specified behaviors and can then be used to train a machine learning (ML) algorithm to detect behaviors in malware. The idea behind detecting behaviors is that a set of core behaviors exists that is often shared across many malware variants, and that being able to detect behaviors will improve the detection of novel malware. However, empirically the multi-label task of detecting behaviors is significantly more difficult than malware classification, achieving only 84% accuracy on average across all behaviors, as opposed to the greater than 95% multi-class or binary accuracy reported in many malware detection studies. One of the difficulties in identifying behaviors is that while there are ample malware samples, most data sources do not include behavioral labels, which means that generally there is insufficient training data for behavior identification. Inspired by the success of generative models in improving image processing techniques, we examine and extend (1) a conditional variational auto-encoder and (2) a flow-based generative model for malware generation with behavior labels. Initial experiments indicate that synthetic data is able to capture behavioral information and increases the recall of behaviors in novel malware from 32% to 45% without increasing false positives, and to 52% with increased false positives.
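The conditional variational auto-encoder's core sampling step can be sketched compactly: draw a latent code via the reparameterization trick and condition the decoder on the requested behavior labels. The concatenation-based conditioning, shapes, and function name below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def cvae_sample(mu, log_var, label_vec):
    """Reparameterization step of a conditional VAE: sample a latent
    code z = mu + sigma * eps (eps ~ N(0, I)), then concatenate the
    multi-hot behavior-label vector so the decoder generates a sample
    conditioned on the requested behaviors."""
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    return np.concatenate([z, label_vec])
```

At generation time, varying `label_vec` while resampling `eps` yields diverse synthetic samples that all carry the same target behavior labels — exactly the augmentation needed for the under-labeled multi-label task.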
Machine learning (ML) techniques are being used to detect increasing amounts of malware and variants. Despite successful applications of ML, we hypothesize that the full potential of ML is not realized in malware analysis (MA) due to a semantic gap between the ML and MA communities, as demonstrated in the data that is used. Due in part to the available data, ML has primarily focused on detection, whereas MA is also interested in identifying behaviors. We review existing open-source malware datasets used in ML and find a lack of behavioral information that could facilitate stronger impact by ML in MA. As a first step in bridging this gap, we label existing data with behavioral information using open-source MA reports: (1) altering the analysis from identifying malware to identifying behaviors, (2) aligning ML better with MA, and (3) allowing ML models to generalize to novel malware in a zero/few-shot learning manner. We classify the behavior of a malware family not seen during training using transfer learning from a state-of-the-art model for malware family classification and achieve 57%-84% accuracy on behavioral identification, but fail to outperform the baseline set by a majority-class predictor. This highlights opportunities for improvement on this task related to the data representation, the need for malware-specific ML techniques, and a larger training set of malware samples labeled with behaviors.
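The transfer-learning setup described above amounts to freezing a backbone trained for family classification and fitting only a new head on its features for behavior labels. The sketch below illustrates that pattern with a logistic head trained by plain gradient descent; the study's actual backbone, head, and optimizer are not specified here, so all names and hyperparameters are assumptions.

```python
import numpy as np

def fit_behavior_head(feats, labels, lr=0.1, epochs=200):
    """Transfer-learning sketch: `feats` are embeddings from a frozen
    malware-family backbone; only a new logistic head (w, b) is fit
    for a binary behavior label using full-batch gradient descent."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid scores
        grad = p - labels                            # dLoss/dlogits
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b
```

Because only the head is trained, the approach needs far fewer behavior-labeled samples than full fine-tuning — but, as the results above show, it can still fail to beat a majority-class baseline when the backbone's features do not separate behaviors well.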
Permeability prediction of porous media systems is important in many engineering and science domains, including earth materials, biomaterials, solid materials, and energy applications. In this work we evaluated how machine learning can be used to predict the permeability of porous media from physical properties. An emerging challenge for machine learning/deep learning in engineering and scientific research is the ability to incorporate physics into the learning process. We used convolutional neural networks (CNNs) trained on a set of images of bead packings, and additional physical properties such as porosity and surface area of the porous media were used as training data, either fed directly to the fully connected network or passed through a multilayer perceptron network. Our results clearly show that the optimal neural network architecture and the implementation of physics-informed constraints are important for improving the model's prediction of permeability. A comprehensive analysis of hyperparameters with different CNN architectures and data-implementation schemes for the physical properties needs to be performed to optimize our learning system for various porous media systems.
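The two data-implementation schemes described above — feeding physical properties directly to the fully connected network versus routing them through a multilayer perceptron first — differ only in how the auxiliary vector joins the CNN features. A minimal sketch of both fusion points, with all names, shapes, and the single ReLU layer as illustrative assumptions:

```python
import numpy as np

def fuse_features(cnn_feat, phys, w_mlp=None):
    """Two schemes for injecting physical properties (e.g., porosity,
    surface area) into the permeability regressor's input:
      scheme 1 (w_mlp is None): concatenate `phys` directly with the
        flattened CNN features;
      scheme 2: pass `phys` through a small ReLU MLP branch first,
        then concatenate the learned embedding."""
    if w_mlp is None:                       # scheme 1: direct feed
        return np.concatenate([cnn_feat, phys])
    hidden = np.maximum(phys @ w_mlp, 0.0)  # scheme 2: MLP branch
    return np.concatenate([cnn_feat, hidden])
```

Scheme 2 lets the network learn a nonlinear re-encoding of the physical properties before fusion, at the cost of extra parameters — one of the architectural choices whose impact the hyperparameter analysis above would quantify.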