Publications

Accelerating phase-field predictions via recurrent neural networks learning the microstructure evolution in latent space

Computer Methods in Applied Mechanics and Engineering

Hu, Chongze; Martin, Shawn; Dingreville, Remi

The phase-field method is a popular modeling technique used to describe the dynamics of microstructures and their physical properties at the mesoscale. However, because in these simulations the microstructure is described by a system of continuous variables evolving both in space and time, phase-field models are computationally expensive. They require refined spatio-temporal discretization and a parallel computing approach to achieve a useful degree of accuracy. As an alternative, we present and discuss an accelerated phase-field approach which uses a recurrent neural network (RNN) to learn the microstructure evolution in latent space. We perform a comprehensive analysis of different dimensionality-reduction methods and types of recurrent units in RNNs. Specifically, we compare statistical functions combined with linear and nonlinear embedding techniques to represent the microstructure evolution in latent space. We also evaluate several RNN models that implement a gating mechanism, including the long short-term memory (LSTM) unit and the gated recurrent unit (GRU) as the microstructure-learning engine. We analyze the different combinations of these methods on the spinodal decomposition of a two-phase system. Our comparison reveals that describing the microstructure evolution in latent space using an autocorrelation-based principal component analysis (PCA) method is the most efficient. We find that the LSTM and GRU RNN implementations provide comparable accuracy with respect to the high-fidelity phase-field predictions, but with a considerable computational speedup relative to the full simulation. This study not only enhances our understanding of the performance of dimensionality reduction on the microstructure evolution, but it also provides insights on strategies for accelerating phase-field modeling via machine learning techniques.
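
As a rough illustration of the surrogate pipeline described above, the sketch below compresses phase-field snapshots with plain PCA and advances the latent state with a single-layer LSTM. The file name, latent dimension, window length, and training settings are all placeholders, and plain PCA on raw snapshots stands in for the paper's autocorrelation-based PCA; a GRU could be substituted for the LSTM layer.

```python
# Hedged sketch of a latent-space surrogate: PCA compresses microstructure
# snapshots, an LSTM advances the latent state. Names are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from tensorflow import keras

# frames: (n_steps, height, width) phase-field snapshots (assumed input file)
frames = np.load("spinodal_frames.npy")
X = frames.reshape(len(frames), -1)              # flatten each snapshot

pca = PCA(n_components=10)                       # latent dimension is a guess
Z = pca.fit_transform(X)                         # (n_steps, 10) latent path

# Build (window -> next step) training pairs in latent space.
window = 8
inputs = np.stack([Z[i:i + window] for i in range(len(Z) - window)])
targets = Z[window:]

model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(window, Z.shape[1])),
    keras.layers.Dense(Z.shape[1]),
])
model.compile(optimizer="adam", loss="mse")
model.fit(inputs, targets, epochs=50, verbose=0)

# Roll the surrogate forward and decode back to microstructure space.
state = Z[:window].copy()
for _ in range(100):
    nxt = model.predict(state[None, -window:], verbose=0)[0]
    state = np.vstack([state, nxt])
predicted_frames = pca.inverse_transform(state).reshape(-1, *frames.shape[1:])
```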

Dial-A-Cluster User Manual

Crossno, Patricia J.; Gittinger, Jaxon M.; Hunt, Warren L.; Letter, Matthew; Martin, Shawn; Sielicki, Milosz

The Dial-A-Cluster (DAC) model allows interactive visualization of multivariate time series data. A multivariate time series dataset consists of an ensemble of data points, where each data point comprises a set of time series curves. The example DAC dataset used in this guide is a collection of 100 cities in the United States, where each city is represented by a year's worth of weather data, including daily temperature, humidity, and wind speed measurements.

Rapid Response Data Science for COVID-19

Bandlow, Alisa; Bauer, Travis L.; Crossno, Patricia J.; Garcia, Rudy J.; Astuto Gribble, Lisa A.; Hernandez, Patricia M.; Martin, Shawn; Mcclain, Jonathan T.; Patrizi, Laura P.

This report describes the results of a seven-day effort to assist subject matter experts in addressing a problem related to COVID-19. In the course of this effort, we analyzed the 29K documents provided as part of the White House's call to action. This involved applying a variety of natural language processing techniques and compression-based analytics, in combination with visualization techniques and assessment with subject matter experts, to pursue answers to a specific question. In this report, we describe the algorithms, the software, the study performed, and the availability of the software developed during the effort.
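
The report does not spell out its compression-based analytics, but a minimal, self-contained example of that family of techniques is the normalized compression distance (NCD), sketched below with zlib; the document file names are hypothetical.

```python
# Normalized compression distance: an illustrative stand-in for the
# compression-based analytics mentioned above, not the report's exact method.
import zlib

def ncd(a: bytes, b: bytes) -> float:
    """NCD via zlib; smaller values mean more similar documents."""
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

doc1 = open("paper1.txt", "rb").read()   # hypothetical document files
doc2 = open("paper2.txt", "rb").read()
print(f"NCD = {ncd(doc1, doc2):.3f}")
```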

VideoSwarm: Analyzing video ensembles

IS&T International Symposium on Electronic Imaging Science and Technology

Martin, Shawn; Sielicki, Milosz; Gittinger, Jaxon M.; Letter, Matthew; Hunt, Warren L.; Crossno, Patricia J.

We present VideoSwarm, a system for visualizing video ensembles generated by numerical simulations. VideoSwarm is a web application in which linked views of the ensemble each represent the data at a different level of abstraction. VideoSwarm uses multidimensional scaling to reveal relationships between a set of simulations relative to a single moment in time, and to show the evolution of video similarities over a span of time. VideoSwarm is a plug-in for Slycat, a web-based visualization framework which provides a web server, database, and Python infrastructure. The Slycat framework provides support for managing multiple users, maintains access control, and requires only a Slycat-supported commodity browser (such as Firefox, Chrome, or Safari).
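
A minimal sketch of the core idea, under our own assumptions about data layout: embed the ensemble with multidimensional scaling once per frame, so each video traces a 2-D trajectory over time. A real implementation such as VideoSwarm must also align consecutive layouts (MDS is only defined up to rotation and reflection), which this sketch omits.

```python
# Per-frame MDS embedding of a video ensemble; variable names are ours.
import numpy as np
from sklearn.manifold import MDS

# videos: (n_videos, n_frames, n_pixels) grayscale frame data (assumed layout)
videos = np.load("ensemble_videos.npy")
n_videos, n_frames, _ = videos.shape

coords = []                               # one 2-D layout per frame
for t in range(n_frames):
    frame = videos[:, t, :]
    # pairwise Euclidean distances between simulations at time t
    d = np.linalg.norm(frame[:, None, :] - frame[None, :, :], axis=-1)
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords.append(mds.fit_transform(d))
coords = np.stack(coords)                 # (n_frames, n_videos, 2)
```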

CAD Defeaturing Using Machine Learning

Proceedings of the 28th International Meshing Roundtable, IMR 2019

Owen, Steven J.; Shead, Timothy M.; Martin, Shawn

We describe new machine-learning-based methods to defeature CAD models for tetrahedral meshing. Using machine learning predictions of mesh quality for geometric features of a CAD model prior to meshing, we can identify potential problem areas and improve meshing outcomes by presenting a prioritized list of suggested geometric operations to users. Our machine learning models are trained using a combination of geometric and topological features from the CAD model, with local quality metrics as ground truth. We demonstrate a proof-of-concept implementation of the resulting workflow using Sandia's Cubit Geometry and Meshing Toolkit.
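
A hedged sketch of the training-and-ranking loop implied above, with an invented feature table and a random forest standing in for whatever model the paper actually uses:

```python
# Train a classifier to flag CAD features likely to produce poor tetrahedral
# mesh quality; the CSV file and feature names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("cad_features.csv")      # hypothetical training table
X = df[["surface_area", "min_edge_length", "num_adjacent_curves"]]
y = df["poor_mesh_quality"]               # 1 if local quality fell below cutoff

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))

# Rank candidate defeaturing operations by predicted problem probability.
df["risk"] = clf.predict_proba(X)[:, 1]
print(df.sort_values("risk", ascending=False).head())
```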

Slycat™ User Manual

Crossno, Patricia J.; Gittinger, Jaxon M.; Hunt, Warren L.; Letter, Matthew; Martin, Shawn; Sielicki, Milosz

Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multivariate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group.

The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing the time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy.

Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations. Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.

Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors

Molecular Informatics

Martin, Shawn; Foulk, James W.; Anderson, Travis M.

We seek to optimize ionic liquids (ILs) for application to redox flow batteries. As part of this effort, we have developed a computational method for suggesting ILs with high conductivity and low viscosity. Since ILs consist of cation-anion pairs, we consider a method for treating ILs as pairs using product descriptors in quantitative structure-property relationships (QSPRs), a concept borrowed from the prediction of protein-protein interactions in bioinformatics. We demonstrate the method by predicting electrical conductivity, viscosity, and melting point on a dataset taken from the ILThermo database on June 18th, 2014. The dataset consists of 4,329 measurements taken from 165 ILs made up of 72 cations and 34 anions. We benchmark our QSPRs on the known values in the dataset, then extend our predictions to screen all 2,448 possible cation-anion pairs in the dataset.
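
The product-descriptor idea can be sketched concisely: represent each cation-anion pair by the flattened outer product of the two ions' descriptor vectors, then fit a regression on that representation. Descriptor values below are random placeholders rather than ILThermo data, and ridge regression is an illustrative stand-in for the paper's QSPR model.

```python
# Product descriptors for cation-anion pairs; all data here is synthetic.
import numpy as np
from sklearn.linear_model import Ridge

def product_descriptor(cation: np.ndarray, anion: np.ndarray) -> np.ndarray:
    """Pairwise (outer) products of the two descriptor vectors, flattened."""
    return np.outer(cation, anion).ravel()

rng = np.random.default_rng(0)
cations = rng.normal(size=(72, 5))        # 72 cations x 5 toy descriptors
anions = rng.normal(size=(34, 5))         # 34 anions x 5 toy descriptors

# Train on "measured" pairs, then screen all 72 x 34 = 2,448 combinations.
X_all = np.array([product_descriptor(c, a) for c in cations for a in anions])
y_toy = rng.normal(size=len(X_all))       # stand-in for conductivity values
model = Ridge().fit(X_all[:165], y_toy[:165])   # 165 measured ILs, as in text
screen = model.predict(X_all)             # predictions for every pair
best = np.argsort(screen)[::-1][:10]      # top-10 candidate ILs
```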

Machine learning models of errors in large eddy simulation predictions of surface pressure fluctuations

47th AIAA Fluid Dynamics Conference, 2017

Barone, Matthew F.; Fike, Jeffrey; Chowdhary, Kenny; Davis, Warren L.; Ling, Julia; Martin, Shawn

We investigate a novel application of deep neural networks to the modeling of errors in predictions of surface pressure fluctuations beneath a compressible, turbulent flow. In this context, the truth solution is given by Direct Numerical Simulation (DNS) data, while the predictive model is a wall-modeled Large Eddy Simulation (LES). The neural network provides a means to map relevant statistical flow features within the LES solution to errors in the predicted wall pressure spectra. We simulate a number of flat-plate turbulent boundary layers using both DNS and wall-modeled LES to build a database with which to train the neural network. We then apply machine learning techniques to develop an optimized neural network model for the error in terms of relevant flow features.
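
A minimal sketch of such an error model, assuming precomputed feature and error arrays; the architecture, file names, and training settings are illustrative, not the authors':

```python
# Small fully connected network mapping LES flow features to wall-pressure
# spectral errors; inputs are assumed to be precomputed offline.
import numpy as np
from tensorflow import keras

features = np.load("les_flow_features.npy")   # (n_samples, n_features), assumed
errors = np.load("spectra_errors.npy")        # DNS-vs-LES error targets, assumed

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(features.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(errors.shape[1]),
])
model.compile(optimizer="adam", loss="mse")
model.fit(features, errors, epochs=100, validation_split=0.2, verbose=0)
```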

Visualizing Wind Farm Wakes Using SCADA Data

Martin, Shawn; Westergaard, Carsten H.; White, Jonathan R.; Karlson, Benjamin

As wind farms scale to include more and more turbines, questions about turbine wake interactions become increasingly important. Turbine wakes reduce wind speed, and downwind turbines suffer decreased performance. The cumulative effect of the wakes throughout a wind farm therefore decreases the performance of the entire farm. These interactions are dynamic and complicated, and it is difficult to quantify the overall effect of the wakes. This problem has attracted some attention in terms of computational modeling for siting turbines on new farms, but less attention in terms of empirical studies and performance validation of existing farms. In this report, Supervisory Control and Data Acquisition (SCADA) data from an existing wind farm is analyzed in order to explore methods for documenting wake interactions. Visualization techniques are proposed and used to analyze wakes in a 67-turbine farm. The visualizations are based on directional analysis using power measurements, and can be considered normalized capacity factors below rated power. Wind speed measurements are not used in the analysis except for data pre-processing. Four wake effects are observed: wake deficit, channel speed-up, and two potentially new effects, single and multiple shear-point speed-up. In addition, an attempt is made to quantify wake losses using the same SCADA data. Power losses for the specific wind farm investigated are relatively low, estimated to be in the range of 3-5%. Finally, a simple model based on the wind farm's geometrical layout is proposed. Key parameters for the model have been estimated by comparing wake profiles at different ranges and making some ad hoc assumptions. A preliminary comparison of six selected profiles shows excellent agreement with the model. Where discrepancies are observed, reasonable explanations can be found in multi-turbine speed-up effects and landscape features, which are yet to be modeled.
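
The directional analysis lends itself to a short sketch: keep only below-rated-power records, bin power by wind direction, and normalize by rated power to obtain per-turbine directional profiles. Column names, the bin width, and the turbine rating are assumptions.

```python
# Per-direction "normalized capacity factor" profiles from SCADA records.
import pandas as pd

scada = pd.read_csv("scada.csv")          # hypothetical columns used below
rated_power = 1500.0                      # kW; placeholder rating

below_rated = scada[scada["power_kw"] < rated_power].copy()
below_rated["dir_bin"] = (below_rated["wind_dir_deg"] // 5) * 5  # 5-deg bins

profile = (below_rated
           .groupby(["turbine_id", "dir_bin"])["power_kw"]
           .mean()
           .div(rated_power)              # normalize by rated power
           .unstack("dir_bin"))
# Dips in a turbine's profile at the bearing of an upwind neighbor indicate
# wake deficit; local bumps suggest channel or shear-point speed-up.
```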

Continuous Reliability Enhancement for Wind (CREW). Program Update

Karlson, Benjamin; Carter, Charles; Martin, Shawn; Westergaard, Carsten

Sandia's Continuous Reliability Enhancement for Wind (CREW) Program is a follow-on project to the Wind Plant Reliability Database and Analysis Program. The goal of CREW is to characterize the reliability performance of the US fleet to serve as a basis for improved reliability and increased availability of turbines. This document states the objectives of CREW and describes how data collected for CREW will be used in analysis. A critical aspect of the success of the CREW project is data input from participating owner/operators. The level of detail and the quality of the input data provided dictate the type of analysis that can be accomplished. Options for analysis range from high-level availability summaries to detailed analysis of failure modes for individual equipment items. Specific types of input data are identified, followed by samples of the type of output that can be expected, along with a discussion of benefits to the user community.

Interactive visualization of multivariate time series data

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Martin, Shawn; Quach, Tu-Toan

Organizing multivariate time series data for presentation to an analyst is a challenging task. Typically, a dataset contains hundreds or thousands of data points, and each data point consists of dozens of time series measurements. Analysts are interested in how the data points are related, which measurements drive trends and/or produce clusters, and how the clusters are related to available metadata. In addition, interest in particular time series measurements will change depending on what the analyst is trying to understand about the dataset. Rather than providing a monolithic, single-use machine learning solution, we have developed a system that encourages analyst interaction. This system, Dial-A-Cluster (DAC), uses multidimensional scaling to provide a visualization of the data points depending on distance measures provided for each time series. The analyst can interactively adjust (dial) the relative influence of each time series to change the visualization (and the resulting clusters). Additional computations are provided which optimize the visualization according to metadata of interest and rank time series measurements according to their influence on analyst-selected clusters. The DAC system is a plug-in for Slycat (slycat.readthedocs.org), a framework which provides a web server, database, and Python infrastructure. The DAC web application allows an analyst to keep track of multiple datasets and interact with each as described above. It requires no installation, runs on any platform, and enables analyst collaboration. We anticipate an open source release in the near future.
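
A minimal sketch of the dialing computation, assuming DAC combines one precomputed distance matrix per time series measurement as a weighted sum before re-running multidimensional scaling; this is our reading of the abstract, not a documented API.

```python
# Weighted combination of per-measurement distance matrices, embedded in 2-D.
import numpy as np
from sklearn.manifold import MDS

def dac_layout(distances: list, weights: list) -> np.ndarray:
    """Combine per-measurement distance matrices and embed in 2-D."""
    combined = sum(w * d for w, d in zip(weights, distances))
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(combined)

# Example: three measurements (temperature, humidity, wind) over 100 cities,
# with synthetic symmetric distance matrices standing in for real data.
rng = np.random.default_rng(0)
dists = []
for _ in range(3):
    a = rng.random((100, 100))
    d = (a + a.T) / 2                      # symmetrize
    np.fill_diagonal(d, 0.0)
    dists.append(d)
coords = dac_layout(dists, weights=[1.0, 0.5, 0.0])  # "dial" wind out entirely
```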

Encoding and Analyzing Aerial Imagery Using Geospatial Semantic Graphs

Rintoul, Mark D.; Watson, Jean-Paul; Mclendon, William; Parekh, Ojas D.; Martin, Shawn

While collection capabilities have yielded an ever-increasing volume of aerial imagery, analytic techniques for identifying patterns in and extracting relevant information from this data have seriously lagged. The vast majority of imagery is never examined, due to a combination of the limited bandwidth of human analysts and the limitations of existing analysis tools. In this report, we describe an alternative, novel approach to both encoding and analyzing aerial imagery, using the concept of a geospatial semantic graph. The advantages of our approach are twofold. First, intuitive templates can be easily specified in terms of the domain language in which an analyst converses. These templates can be used to automatically and efficiently search large graph databases for specific patterns of interest. Second, unsupervised machine learning techniques can be applied to automatically identify patterns in the graph databases, exposing recurring motifs in imagery. We illustrate our approach using real-world data for Anne Arundel County, Maryland, and compare the performance of our approach to that of an expert human analyst.
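
A toy illustration of template search over a semantic graph, using networkx subgraph isomorphism; the node labels and the template are invented for illustration and are not taken from the report.

```python
# Template matching on a labeled adjacency graph of land-cover regions.
import networkx as nx
from networkx.algorithms import isomorphism

# Scene graph: nodes are labeled regions, edges mean "adjacent to".
scene = nx.Graph()
scene.add_node("r1", kind="building")
scene.add_node("r2", kind="parking_lot")
scene.add_node("r3", kind="road")
scene.add_edges_from([("r1", "r2"), ("r2", "r3")])

# Analyst template: a building adjacent to a parking lot adjacent to a road.
template = nx.Graph()
template.add_node("a", kind="building")
template.add_node("b", kind="parking_lot")
template.add_node("c", kind="road")
template.add_edges_from([("a", "b"), ("b", "c")])

matcher = isomorphism.GraphMatcher(
    scene, template, node_match=lambda n1, n2: n1["kind"] == n2["kind"])
print(list(matcher.subgraph_isomorphisms_iter()))  # matches in domain language
```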

Predicting building contamination using machine learning

Proceedings - 6th International Conference on Machine Learning and Applications, ICMLA 2007

Martin, Shawn; Mckenna, Sean A.

Potential events involving biological or chemical contamination of buildings are of major concern in the area of homeland security. Tools are needed to provide rapid, onsite predictions of contaminant levels given only approximate measurements in limited locations throughout a building. In principle, such tools could use calculations based on physical process models to provide accurate predictions. In practice, however, physical process models are too complex and computationally costly to be used in a real-time scenario. In this paper, we investigate the feasibility of using machine learning to provide easily computed but approximate models that would be applicable in the field. We develop a machine learning method based on Support Vector Machine regression and classification. We apply our method to problems of estimating contamination levels and contaminant source location.
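
A hedged sketch of the two SVM tasks named above, on synthetic stand-in data: regression for contaminant level and classification for source location.

```python
# SVR for contaminant concentration, SVC for source room; all data is toy.
import numpy as np
from sklearn.svm import SVR, SVC

rng = np.random.default_rng(0)
# Each row: approximate sensor readings at a few fixed locations (assumed).
sensors = rng.random((200, 6))
concentration = sensors @ rng.random(6)          # toy regression target
source_room = rng.integers(0, 4, size=200)      # toy source label

level_model = SVR(kernel="rbf").fit(sensors, concentration)
source_model = SVC(kernel="rbf").fit(sensors, source_room)

new_reading = rng.random((1, 6))
print("estimated level:", level_model.predict(new_reading)[0])
print("likely source room:", source_model.predict(new_reading)[0])
```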

Innovative high-pressure gas MEMS-based neutron detector for ICF and active SNM detection

Chandler, Gordon A.; Renzi, Ronald F.; Derzon, Mark S.; Martin, Shawn

An innovative helium-3 high-pressure gas detection system, made possible by Sandia's expertise in microelectromechanical (MEMS) fluidic systems, is proposed. The system appears to have many beneficial performance characteristics for making neutron measurements in the high-bremsstrahlung and high-electrical-noise environments found in High Energy Density Physics experiments, and especially in the very high noise environment generated by the fast pulsed-power experiments performed at Sandia. The same system may also dramatically improve active WMD and contraband detection when employed with ultrafast (10-50 ns) pulsed neutron sources.

Boolean dynamics of genetic regulatory networks inferred from microarray time series data

Bioinformatics

Martin, Shawn; Zhang, Zhaoduo Z.; Martino, Anthony; Faulon, Jean-Loup M.

Motivation: Methods available for the inference of genetic regulatory networks strive to produce a single network, usually by optimizing some quantity to fit the experimental observations. In this article we investigate the possibility that multiple networks can be inferred, all resulting in similar dynamics. This idea is motivated by theoretical work which suggests that biological networks are robust and adaptable to change, and that the overall behavior of a genetic regulatory network might be captured in terms of dynamical basins of attraction. Results: We have developed and implemented a method for inferring genetic regulatory networks from time series microarray data. Our method first clusters and discretizes the gene expression data using k-means and support vector regression. We then enumerate Boolean activation-inhibition networks that match the discretized data. Finally, the dynamics of the Boolean networks are examined. We have tested our method on two immunology microarray datasets: an IL-2-stimulated T cell response dataset and an LPS-stimulated macrophage response dataset. In both cases, we discovered that many networks matched the data, and that most of these networks had similar dynamics.
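
The enumeration step can be illustrated at toy scale: list every Boolean rule for one target gene over two candidate regulators and keep the rules that reproduce a discretized expression trace. The encoding below is ours, not the paper's.

```python
# Enumerate 2-input Boolean rules consistent with a discretized trace.
import itertools

# Discretized states per time step: (regulator g1, regulator g2, target gene).
trace = [(0, 1, 0), (1, 1, 1), (1, 0, 1), (0, 0, 0)]

def consistent(rule):
    """True if rule[(g1, g2)] at step t predicts the target at step t + 1."""
    return all(rule[(g1, g2)] == nxt
               for (g1, g2, _), (_, _, nxt) in zip(trace, trace[1:]))

inputs = list(itertools.product([0, 1], repeat=2))           # 4 input states
matching = [dict(zip(inputs, outs))
            for outs in itertools.product([0, 1], repeat=4)  # 16 truth tables
            if consistent(dict(zip(inputs, outs)))]
print(f"{len(matching)} of 16 candidate Boolean rules match the trace")
```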

Developing algorithms for predicting protein-protein interactions of homology modeled proteins

Roe, Diana C.; Sale, Kenneth L.; Faulon, Jean-Loup M.; Martin, Shawn

The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to improve these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the side chains. Two different and complementary approaches are taken to help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, using bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, using an improved scoring function for evaluating proposed docked poses that incorporates a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies of both approaches are promising and argue for full development of these algorithms.

Reverse engineering biological networks: applications in immune responses to bio-toxins

Faulon, Jean-Loup M.; Zhang, Zhaoduo Z.; Martino, Anthony; Timlin, Jerilyn A.; Haaland, David M.; Martin, Shawn; Davidson, George S.; May, Elebeoba; Slepoy, Alexander S.

Our aim is to determine the network of events, or the regulatory network, that defines an immune response to a bio-toxin. As a model system, we are studying the T cell regulatory network triggered through tyrosine kinase receptor activation, using a combination of pathway stimulation and time-series microarray experiments. Our approach is composed of five steps: (1) microarray experiments and data error analysis, (2) data clustering, (3) data smoothing and discretization, (4) network reverse engineering, and (5) network dynamics analysis and fingerprint identification. The technological outcome of this study is a suite of experimental protocols and computational tools that reverse engineer regulatory networks given gene expression data. The practical biological outcome of this work is an immune response fingerprint in terms of gene expression levels. Inferring regulatory networks from microarray data is a new field of investigation that is no more than five years old. To the best of our knowledge, this work is the first attempt to integrate experiments, error analyses, data clustering, inference, and network analysis to solve a practical problem. Our systematic approach of counting, enumerating, and sampling networks matching experimental data is new to the field of network reverse engineering. The resulting mathematical analyses and computational tools lead to new results on their own and should be useful to others who analyze and infer networks.
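
Steps (2) and (3) of the pipeline can be sketched as follows, with k-means clustering, support-vector-regression smoothing, and a median threshold for discretization; all parameters and the input file are placeholders.

```python
# Cluster expression trajectories, smooth cluster means, threshold to on/off.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

expression = np.load("timeseries.npy")        # (n_genes, n_timepoints), assumed
labels = KMeans(n_clusters=8, random_state=0, n_init=10).fit_predict(expression)

t = np.arange(expression.shape[1], dtype=float)
discretized = []
for k in range(8):
    mean_curve = expression[labels == k].mean(axis=0)
    smooth = SVR(kernel="rbf").fit(t[:, None], mean_curve).predict(t[:, None])
    discretized.append((smooth > np.median(smooth)).astype(int))  # on/off trace
```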

Application of multidisciplinary analysis to gene expression

Davidson, George S.; Haaland, David M.; Martin, Shawn

Molecular analysis of cancer, at the genomic level, could lead to individualized patient diagnostics and treatments. The developments to follow will signal a significant paradigm shift in the clinical management of human cancer. Despite our initial hopes, however, it seems that simple analysis of microarray data cannot elucidate clinically significant gene functions and mechanisms. Extracting biological information from microarray data requires a complicated path involving multidisciplinary teams of biomedical researchers, computer scientists, mathematicians, statisticians, and computational linguists. The integration of the diverse outputs of each team is the limiting factor in the progress to discover candidate genes and pathways associated with the molecular biology of cancer. Specifically, one must deal with sets of significant genes identified by each method and extract whatever useful information may be found by comparing these different gene lists. Here we present our experience with such comparisons, and share methods developed in the analysis of an infant leukemia cohort studied on Affymetrix HG-U95A arrays. In particular, spatial gene clustering, hyper-dimensional projections, and computational linguistics were used to compare different gene lists. In spatial gene clustering, different gene lists are grouped together and visualized on a three-dimensional expression map, where genes with similar expressions are co-located. In another approach, projections from gene expression space onto a sphere clarify how groups of genes can jointly have more predictive power than groups of individually selected genes. Finally, online literature is automatically rearranged to present information about genes common to multiple groups, or to contrast the differences between the lists. The combination of these methods has improved our understanding of infant leukemia. While the complicated reality of the biology dashed our initial, optimistic hopes for simple answers from microarrays, we have made progress by combining very different analytic approaches.

High throughput instruments, methods, and informatics for systems biology

Davidson, George S.; Sinclair, Michael B.; Thomas, Edward V.; Werner-Washburne, Margaret C.; Martin, Shawn; Boyack, Kevin W.; Wylie, Brian N.; Haaland, David M.; Timlin, Jerilyn A.; Keenan, Michael R.

High throughput instruments and analysis techniques are required in order to make good use of the genomic sequences that have recently become available for many species, including humans. These instruments and methods must work with tens of thousands of genes simultaneously, and must be able to identify the small subsets of those genes that are implicated in the observed phenotypes, or, for instance, in responses to therapies. Microarrays represent one such high throughput method, which continue to find increasingly broad application. This project has improved microarray technology in several important areas. First, we developed the hyperspectral scanner, which has discovered and diagnosed numerous flaws in techniques broadly employed by microarray researchers. Second, we used a series of statistically designed experiments to identify and correct errors in our microarray data to dramatically improve the accuracy, precision, and repeatability of the microarray gene expression data. Third, our research developed new informatics techniques to identify genes with significantly different expression levels. Finally, natural language processing techniques were applied to improve our ability to make use of online literature annotating the important genes. In combination, this research has improved the reliability and precision of laboratory methods and instruments, while also enabling substantially faster analysis and discovery.

Kernel Near Principal Component Analysis

Martin, Shawn

We propose a novel algorithm based on Principal Component Analysis (PCA). First, we present an interesting approximation of PCA using Gram-Schmidt orthonormalization. Next, we combine our approximation with the kernel functions from Support Vector Machines (SVMs) to provide a nonlinear generalization of PCA. After benchmarking our algorithm in the linear case, we explore its use in both the linear and nonlinear cases. We include applications to face data analysis, handwritten digit recognition, and fluid flow.
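
The paper's exact algorithm is not reproduced here, but one plausible reading of a Gram-Schmidt approximation to PCA is sketched below: greedily take the sample with the largest residual norm as the next basis direction and deflate the data against it. The kernelized version would replace these inner products with kernel evaluations.

```python
# Greedy Gram-Schmidt approximation to the principal subspace (a sketch of
# one plausible interpretation, not necessarily the paper's algorithm).
import numpy as np

def gs_pca(X: np.ndarray, k: int) -> np.ndarray:
    """Return k orthonormal directions chosen greedily from centered data X."""
    R = X - X.mean(axis=0)                 # residuals, initially centered data
    basis = []
    for _ in range(k):
        i = np.argmax(np.linalg.norm(R, axis=1))   # most energetic sample
        v = R[i] / np.linalg.norm(R[i])
        basis.append(v)
        R = R - np.outer(R @ v, v)         # deflate: remove that direction
    return np.array(basis)

X = np.random.default_rng(0).normal(size=(100, 20))
W = gs_pca(X, 3)
print(W @ W.T)                             # approximately the identity matrix
```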
