Publications

63 Results
TAFI/Kebab End of Project Report

Rintoul, Mark D.; Wisniewski, Kyra L.; Ward, Katrina J.; Khanna, Kanad K.

This report focuses on the two primary goals set forth in Sandia's TAFI effort, referred to here under the name Kebab. The first goal is to overlay a trajectory onto a large database of historical trajectories, each potentially sampled at a very different rate than the original track. We demonstrate a fast method to accomplish this, even for databases holding over a million tracks. The second goal is to demonstrate that these matched historical trajectories can be used to make predictions about unknown qualities associated with the original trajectory. As part of this work, we also examine the problem of defining the qualities of a trajectory in a reproducible way.
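
The abstract does not spell out the matching method, so the snippet below gives a minimal sketch of one standard way to handle the core difficulty of comparing tracks with mismatched sampling rates: resample each trajectory by arc length onto a fixed number of points, after which any two tracks become directly comparable vectors. The function name and parameters are illustrative assumptions, not taken from the report.

import numpy as np

def resample_by_arclength(points, n_samples=64):
    """Resample an (N, 2) array of (x, y) points to n_samples points
    spaced evenly along the cumulative arc length of the path."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    target = np.linspace(0.0, s[-1], n_samples)
    x = np.interp(target, s, points[:, 0])
    y = np.interp(target, s, points[:, 1])
    return np.stack([x, y], axis=1)

# Two samplings of the same path now yield essentially identical vectors:
coarse = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
fine = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [1.5, 0.5], [2.0, 0.0]])
print(np.allclose(resample_by_arclength(coarse), resample_by_arclength(fine)))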

Large-Scale Trajectory Analysis via Feature Vectors

Rintoul, Mark D.; Jones, Jessica L.; Newton, Benjamin D.; Wisniewski, Kyra L.; Wilson, Andrew T.; Ginaldi, Melissa J.; Waddell, Cleveland A.; Goss, Kenneth G.; Ward, Katrina J.

The explosion of both sensors and GPS-enabled devices has made position/time data the next big frontier for data analytics. However, many of the problems associated with large numbers of trajectories do not have direct analogs in historic big-data applications such as text and image analysis. Modern trajectory analytics exploits cutting-edge research in machine learning, statistics, computational geometry, and other disciplines. We show that trajectory analytics at scale requires fundamentally changing the way the information is represented, through a feature-vector approach. We then demonstrate the ability to solve large trajectory analytics problems using this representation.
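
To make the feature-vector representation concrete, the sketch below computes a few geometric features of the kind this line of work describes (total distance traveled, end-to-end displacement, convex hull area and perimeter). The particular feature set and scaling here are assumptions for illustration, not the published feature definitions.

import numpy as np
from scipy.spatial import ConvexHull

def trajectory_features(points):
    """Map an (N, 2) trajectory to a fixed-length geometric feature vector.
    Assumes at least three non-collinear points so the hull is well defined."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    total_dist = seg.sum()
    end_to_end = np.linalg.norm(points[-1] - points[0])
    hull = ConvexHull(points)  # in 2-D, .volume is area and .area is perimeter
    straightness = end_to_end / total_dist if total_dist > 0 else 0.0
    return np.array([total_dist, end_to_end, straightness, hull.volume, hull.area])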

Dynamic Multi-Sensor Multi-Mission Optimal Planning Tool

Valicka, Christopher G.; Rowe, Stephen R.; Zou, Simon Z.; Mitchell, Scott A.; Irelan, William R.; Pollard, Eric L.; Garcia, Deanna G.; Hackebeil, Gabriel A.; Staid, Andrea S.; Rintoul, Mark D.; Watson, Jean-Paul W.; Hart, William E.; Rathinam, Sivakumar R.; Ntaimo, Lewis N.

Remote sensing systems have firmly established a role in providing immense value to commercial industry, scientific exploration, and national security. Continued maturation of sensing technology has reduced the cost of deploying highly-capable sensors while at the same time increased reliance on the information these sensors can provide. The demand for time on these sensors is unlikely to diminish. Coordination of next-generation sensor systems, larger constellations of satellites, unmanned aerial vehicles, ground telescopes, etc. is prohibitively complex for existing heuristics-based scheduling techniques. The project was a two-year collaboration spanning multiple Sandia centers and included a partnership with Texas A&M University. We have developed algorithms and software for collection scheduling, remote sensor field-of-view pointing models, and bandwidth-constrained prioritization of sensor data. Our approach followed best practices from the operations research and computational geometry communities. These models provide several advantages over state-of-the-art techniques. In particular, our approach is more flexible compared to heuristics that tightly couple models and solution techniques. First, our mixed-integer linear models afford a rigorous analysis so that sensor planners can quantitatively describe a schedule relative to the best possible. Optimal or near-optimal schedules can be produced with commercial solvers in operational run-times. These models can be modified and extended to incorporate different scheduling and resource constraints and objective function definitions. Further, we have extended these models to proactively schedule sensors under weather and ad hoc collection uncertainty. This approach stands in contrast to existing deterministic schedulers, which assume a single future weather or ad hoc collection scenario. The field-of-view pointing algorithm produces a mosaic with the fewest number of images required to fully cover a region of interest. The bandwidth-constrained algorithms find the highest priority information that can be transmitted. All of these are based on mixed-integer linear programs so that, in the future, collection scheduling, field-of-view, and bandwidth prioritization can be combined into a single problem. Experiments conducted using the developed models, commercial solvers, and benchmark datasets have demonstrated that proactively scheduling against uncertainty regularly and significantly outperforms deterministic schedulers.

Acknowledgement: We would like to acknowledge John T. Feddema, Brian N. Post, John H. Ganter, and Swaroop Darbha for providing critical project stewardship and fruitful remote sensing utilization discussions. A special thanks to Mohamed S. Ebeida for his contributions to the development of the Maximal Poisson Sampling technique. We would also like to thank Kaarthik Sundar and Jianglei Qin for their significant scheduling algorithm and model development contributions to the project. The authors would like to acknowledge the Sandia LDRD program for their support of this work. Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
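
To make the mixed-integer linear modeling concrete, here is a minimal collection-scheduling sketch in that spirit, written with the open-source PuLP modeler. The sensors, visibility pairs, capacities, and priorities are invented for illustration and are far simpler than the project's models.

import pulp

sensors = {"sat1": 2, "sat2": 1}              # sensor -> collects available this pass
tasks = {"A": 5, "B": 3, "C": 4}              # task -> priority
visible = {("sat1", "A"), ("sat1", "B"), ("sat2", "B"), ("sat2", "C")}

prob = pulp.LpProblem("collection_scheduling", pulp.LpMaximize)
x = {(s, t): pulp.LpVariable(f"x_{s}_{t}", cat="Binary") for (s, t) in visible}

# Objective: total priority of the collects actually scheduled.
prob += pulp.lpSum(tasks[t] * x[s, t] for (s, t) in visible)
# Each task is collected at most once.
for t in tasks:
    prob += pulp.lpSum(x[s, u] for (s, u) in visible if u == t) <= 1
# Each sensor is limited by its capacity.
for s, cap in sensors.items():
    prob += pulp.lpSum(x[r, t] for (r, t) in visible if r == s) <= cap

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({k: int(v.value()) for k, v in x.items()})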

Time series discord detection in medical data using a parallel relational database

Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015

Woodbridge, Diane W.; Wilson, Andrew T.; Rintoul, Mark D.; Goldstein, Richard H.

Recent advances in sensor technology have made continuous real-time health monitoring available in both hospital and non-hospital settings. Because high-frequency medical sensors produce enormous volumes of data, storing and processing continuous medical data is an emerging big-data area. Detecting anomalies in real time is especially important for identifying and preventing patient emergencies. A time series discord is the subsequence that differs most from the rest of the subsequences in the series, indicating abnormal or unusual data trends. In this study, we implemented two versions of a time series discord detection algorithm on a high-performance parallel database management system (DBMS) and applied them to 240 Hz waveform data collected from 9,723 patients. The initial brute-force version of the discord detection algorithm takes each possible subsequence and calculates the distance to its nearest non-self match to find the biggest discords in the time series. The heuristic version of the algorithm applies a combination of an array and a trie structure to order the time series data for better time efficiency. The study results showed efficient data loading, decoding, and discord searches in a large amount of data, benefiting from the time series discord detection algorithm and the architectural characteristics of the parallel DBMS, including data compression, data pipelining, and task scheduling.
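
The brute-force discord definition above translates directly into code: score every subsequence by the distance to its nearest non-self match and return the subsequence with the largest score. Below is a minimal in-memory sketch of that algorithm; the paper's implementations run inside a parallel DBMS, and the trie-based heuristic is not shown.

import numpy as np

def find_discord(series, window):
    """Return (index, distance) of the top discord of length `window`."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window + 1
    subs = np.stack([series[i:i + window] for i in range(n)])
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        # Exclude trivial matches: subsequences overlapping subsequence i.
        others = [j for j in range(n) if abs(i - j) >= window]
        if not others:
            continue
        nearest = min(np.linalg.norm(subs[i] - subs[j]) for j in others)
        if nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist

# Example: a smooth signal with one injected anomaly near index 50.
rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 20, 200)) + 0.05 * rng.standard_normal(200)
sig[50:60] += 2.0
print(find_discord(sig, window=10))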

Time Series Discord Detection in Medical Data using a Parallel Relational Database

Woodbridge, Diane W.; Rintoul, Mark D.; Wilson, Andrew T.; Goldstein, Richard H.

Recent advances in sensor technology have made continuous real-time health monitoring available in both hospital and non-hospital settings. Because high-frequency medical sensors produce enormous volumes of data, storing and processing continuous medical data is an emerging big-data area. Detecting anomalies in real time is especially important for identifying and preventing patient emergencies. A time series discord is the subsequence that differs most from the rest of the subsequences in the series, indicating abnormal or unusual data trends. In this study, we implemented two versions of a time series discord detection algorithm on a high-performance parallel database management system (DBMS) and applied them to 240 Hz waveform data collected from 9,723 patients. The initial brute-force version of the discord detection algorithm takes each possible subsequence and calculates the distance to its nearest non-self match to find the biggest discords in the time series. The heuristic version of the algorithm applies a combination of an array and a trie structure to order the time series data for better time efficiency. The study results showed efficient data loading, decoding, and discord searches in a large amount of data, benefiting from the time series discord detection algorithm and the architectural characteristics of the parallel DBMS, including data compression, data pipelining, and task scheduling.

Trajectory analysis via a geometric feature space approach

Statistical Analysis and Data Mining

Rintoul, Mark D.; Wilson, Andrew T.

This study aimed to organize a body of trajectories in order to identify, search for and classify both common and uncommon behaviors among objects such as aircraft and ships. Existing comparison functions such as the Fréchet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as the total distance traveled and the distance between start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans who are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to identify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories and identify outliers.
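
As one concrete downstream use of this feature space, the sketch below clusters a feature matrix (one row per trajectory) and flags outliers, then finds the trajectories most similar to an exemplar. DBSCAN and the standardization step are plausible choices assumed here, not necessarily the ones used in the paper.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def cluster_trajectories(features, eps=0.8, min_samples=5):
    """features: (num_trajectories, num_features). Returns cluster labels;
    label -1 marks outliers, i.e. trajectories unlike anything else."""
    X = StandardScaler().fit_transform(features)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)

def most_similar(features, exemplar_idx, k=10):
    """Indices of the k trajectories nearest the exemplar in feature space."""
    X = StandardScaler().fit_transform(features)
    d = np.linalg.norm(X - X[exemplar_idx], axis=1)
    return np.argsort(d)[1:k + 1]  # position 0 is the exemplar itself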

PANTHER. Trajectory Analysis

Rintoul, Mark D.; Wilson, Andrew T.; Valicka, Christopher G.; Kegelmeyer, William P.; Shead, Timothy M.; Czuchlewski, Kristina R.; Newton, Benjamin D.

We want to organize a body of trajectories in order to identify, search for, classify and predict behavior among objects such as aircraft and ships. Existing comparison functions such as the Fréchet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as total distance traveled and distance between start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans who are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to identify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories, predict destination and apply unsupervised machine learning algorithms.

A computational framework for ontologically storing and analyzing very large overhead image sets

Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2014

Brost, Randolph B.; Rintoul, Mark D.; McLendon, William C.; Strip, David R.; Parekh, Ojas D.; Woodbridge, Diane W.

We describe a computational approach to remote sensing image analysis that addresses many of the classic problems associated with storage, search, and query. This process starts by automatically annotating the fundamental objects in the image data set that will be used as a basis for an ontology, including both the objects (such as building, road, water, etc.) and their spatial and temporal relationships (is within 100 m of, is surrounded by, has changed in the past year, etc.). Data sets that can include multiple time slices of the same area are then processed using automated tools that reduce the images to the objects and relationships defined in an ontology based on the primitive objects, and this representation is stored in a geospatial-temporal semantic graph. Image searches are then defined in terms of the ontology (e.g. find a building greater than 10³ m² that borders a body of water), and the graph is searched for such relationships. This approach also enables the incorporation of non-image data that is related to the ontology. We demonstrate through an initial implementation of the entire system on large data sets (10⁹-10¹¹ pixels) that this system is robust against variations in different image collection parameters, provides a way for analysts to query data sets in a more natural way, and can greatly reduce the memory footprint of the search.
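
A toy version of the graph search described above, using networkx. The node and edge attribute names here ("type", "area_m2", "borders") are invented stand-ins for the actual ontology schema.

import networkx as nx

G = nx.Graph()
G.add_node("b1", type="building", area_m2=1500)
G.add_node("b2", type="building", area_m2=400)
G.add_node("w1", type="water")
G.add_edge("b1", "w1", relation="borders")

def buildings_bordering_water(graph, min_area=1000):
    """Find buildings larger than min_area with a 'borders' edge to water."""
    hits = []
    for node, attrs in graph.nodes(data=True):
        if attrs.get("type") != "building" or attrs.get("area_m2", 0) <= min_area:
            continue
        for nbr in graph.neighbors(node):
            if (graph.nodes[nbr].get("type") == "water"
                    and graph.edges[node, nbr].get("relation") == "borders"):
                hits.append(node)
                break
    return hits

print(buildings_bordering_water(G))  # -> ['b1']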

Encoding and analyzing aerial imagery using geospatial semantic graphs

Rintoul, Mark D.; Watson, Jean-Paul W.; McLendon, William C.; Parekh, Ojas D.

While collection capabilities have yielded an ever-increasing volume of aerial imagery, analytic techniques for identifying patterns in and extracting relevant information from this data have seriously lagged. The vast majority of imagery is never examined, due to a combination of the limited bandwidth of human analysts and limitations of existing analysis tools. In this report, we describe an alternative, novel approach to both encoding and analyzing aerial imagery, using the concept of a geospatial semantic graph. The advantages of our approach are twofold. First, intuitive templates can be easily specified in terms of the domain language in which an analyst converses. These templates can be used to automatically and efficiently search large graph databases for specific patterns of interest. Second, unsupervised machine learning techniques can be applied to automatically identify patterns in the graph databases, exposing recurring motifs in imagery. We illustrate our approach using real-world data for Anne Arundel County, Maryland, and compare the performance of our approach to that of an expert human analyst.

Evaluating parallel relational databases for medical data analysis

Wilson, Andrew T.; Rintoul, Mark D.

Hospitals have always generated and consumed large amounts of data concerning patients, treatment, and outcomes. As computers and networks have permeated the hospital environment, it has become feasible to collect and organize all of this data. This naturally raises the question of how to deal with the resulting mountain of information. In this report we detail a proof-of-concept test using two commercially available parallel database systems to analyze a set of real, de-identified medical records. We examine database scalability as data sizes increase, as well as responsiveness under load from multiple users.

Genomes to Life Project Quarterly Report April 2005

Heffelfinger, Grant S.; Martino, Anthony M.; Rintoul, Mark D.

This SAND report provides the technical progress through April 2005 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling," funded by the DOE Office of Science Genomics:GTL Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacterium known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. In addition, we will develop a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. The ultimate goal of this effort is to develop and apply new experimental and computational methods needed to generate a new level of understanding of how the Synechococcus genome affects carbon fixation at the global scale. Anticipated experimental and computational methods will provide ever-increasing insight about the individual elements and steps in the carbon fixation process; however, relating an organism's genome to its cellular response in the presence of varying environments will require systems biology approaches. Thus a primary goal for this effort is to integrate the genomic data generated from experiments and lower-level simulations with data from the existing body of literature into a whole-cell model. We plan to accomplish this by developing and applying a set of tools for capturing the complex carbon fixation behavior of Synechococcus at different levels of resolution.
Finally, the explosion of data being produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and require coupling to ever-increasing amounts of experimentally obtained data in varying formats. These challenges are unprecedented in high-performance scientific computing and necessitate the development of a companion computational infrastructure to support this effort. More information about this project can be found at www.genomes-to-life.org.

Acknowledgment: We want to gratefully acknowledge the contributions of Grant Heffelfinger1*, Anthony Martino2, Brian Palenik6, Andrey Gorin3, Ying Xu10,3, Mark Daniel Rintoul1, Al Geist3, Matthew Ennis1, with Pratul Agrawal3, Hashim Al-Hashimi8, Andrea Belgrano12, Mike Brown1, Xin Chen9, Paul Crozier1, PhuongAn Dam10, Jean-Loup Faulon2, Damian Gessler12, David Haaland1, Victor Havin4, C.F. Huang5, Tao Jiang9, Howland Jones1, David Jung3, Katherine Kang14, Michael Langston15, Shawn Martin1, Shawn Means1, Vijaya Natarajan4, Roy Nielson5, Frank Olken4, Victor Olman10, Ian Paulsen14, Steve Plimpton1, Andreas Reichsteiner5, Nagiza Samatova3, Arie Shoshani4, Michael Sinclair1, Alex Slepoy1, Shawn Stevens8, Charlie Strauss5, Zhengchang Su10, Ed Thomas1, Jerilyn Timlin1, Wim Vermaas13, Xiufeng Wan11, HongWei Wu10, Dong Xu11, Grover Yip8, Erik Zuiderweg8. *Author to whom correspondence should be addressed (gsheffe@sandia.gov). Affiliations: 1. Sandia National Laboratories, Albuquerque, NM; 2. Sandia National Laboratories, Livermore, CA; 3. Oak Ridge National Laboratory, Oak Ridge, TN; 4. Lawrence Berkeley National Laboratory, Berkeley, CA; 5. Los Alamos National Laboratory, Los Alamos, NM; 6. University of California, San Diego; 7. University of Illinois, Urbana/Champaign; 8. University of Michigan, Ann Arbor; 9. University of California, Riverside; 10. University of Georgia, Athens; 11. University of Missouri, Columbia; 12. National Center for Genome Resources, Santa Fe, NM; 13. Arizona State University; 14. The Institute for Genomic Research; 15. University of Tennessee.

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Genomics:GTL project quarterly report April 2005

Heffelfinger, Grant S.; Martino, Anthony M.; Rintoul, Mark D.

This SAND report provides the technical progress through April 2005 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling," funded by the DOE Office of Science Genomics:GTL Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacterium known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. In addition, we will develop a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. The ultimate goal of this effort is to develop and apply new experimental and computational methods needed to generate a new level of understanding of how the Synechococcus genome affects carbon fixation at the global scale. Anticipated experimental and computational methods will provide ever-increasing insight about the individual elements and steps in the carbon fixation process; however, relating an organism's genome to its cellular response in the presence of varying environments will require systems biology approaches. Thus a primary goal for this effort is to integrate the genomic data generated from experiments and lower-level simulations with data from the existing body of literature into a whole-cell model. We plan to accomplish this by developing and applying a set of tools for capturing the complex carbon fixation behavior of Synechococcus at different levels of resolution.
Finally, the explosion of data being produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and require coupling to ever-increasing amounts of experimentally obtained data in varying formats. These challenges are unprecedented in high-performance scientific computing and necessitate the development of a companion computational infrastructure to support this effort.

Deciphering the genetic regulatory code using an inverse error control coding framework

May, Elebeoba E.; Johnston, Anna M.; Watson, Jean-Paul W.; Hart, William E.; Rintoul, Mark D.

We have found that developing a computational framework for reconstructing error control codes for engineered data, and ultimately for deciphering genetic regulatory coding sequences, is a challenging and uncharted area that will require advances in computational technology for exact solutions. Although exact solutions are desired, computational approaches that yield plausible solutions would be considered sufficient as a proof of concept for the feasibility of reverse engineering error control codes and the possibility of developing a quantitative model for understanding and engineering genetic regulation. Such evidence would help move the idea of reconstructing error control codes for engineered and biological systems from the high-risk, high-payoff realm into the highly probable, high-payoff domain. Additionally, this work will impact biological sensor development and the ability to model and ultimately develop defense mechanisms against bioagents that can be engineered to cause catastrophic damage. Understanding how biological organisms are able to communicate their genetic message efficiently in the presence of noise can improve our current communication protocols, a continuing research interest. Towards this end, project goals include: (1) Develop parameter estimation methods for n for block codes and for n, k, and m for convolutional codes; use these methods to determine error control (EC) code parameters for gene regulatory sequences. (2) Develop an evolutionary-computing computational framework for near-optimal solutions to the algebraic code reconstruction problem; the method will be tested on engineered and biological sequences.
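
For goal (1), one standard approach to estimating the block length n of a linear code from a noise-free stream is rank analysis over GF(2): when the stream is segmented at the true n (or a multiple of it), the resulting words are rank deficient, and the rank gives k. The sketch below illustrates that idea on a (7,4) Hamming code; it is an assumed textbook technique, not necessarily the project's method.

import numpy as np

def gf2_rank(matrix):
    """Rank over GF(2) of a 2-D 0/1 array, by Gaussian elimination."""
    m = matrix.copy() % 2
    rank = 0
    rows, cols = m.shape
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r, c]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]
        for r in range(rows):
            if r != rank and m[r, c]:
                m[r] ^= m[rank]
        rank += 1
    return rank

def estimate_block_length(bits, candidates=range(2, 33)):
    """For each candidate n, segment the stream and report the GF(2) rank;
    rank < n flags a plausible (n, rank) linear block code."""
    return {n: gf2_rank(bits[: len(bits) // n * n].reshape(-1, n))
            for n in candidates}

# Noise-free stream of systematic (7,4) Hamming codewords:
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
msgs = np.random.default_rng(1).integers(0, 2, size=(200, 4))
stream = (msgs @ G % 2).reshape(-1)
ranks = estimate_block_length(stream)
print(ranks[7], ranks[8])  # rank 4 at the true n = 7; typically full rank at n = 8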

Genomes to Life Project Quarterly Report October 2004

Heffelfinger, Grant S.; Martino, Anthony M.; Rintoul, Mark D.

This SAND report provides the technical progress through October 2004 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling," funded by the DOE Office of Science Genomes to Life Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacterium known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. In addition, we will develop a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. The ultimate goal of this effort is to develop and apply new experimental and computational methods needed to generate a new level of understanding of how the Synechococcus genome affects carbon fixation at the global scale. Anticipated experimental and computational methods will provide ever-increasing insight about the individual elements and steps in the carbon fixation process; however, relating an organism's genome to its cellular response in the presence of varying environments will require systems biology approaches. Thus a primary goal for this effort is to integrate the genomic data generated from experiments and lower-level simulations with data from the existing body of literature into a whole-cell model. We plan to accomplish this by developing and applying a set of tools for capturing the complex carbon fixation behavior of Synechococcus at different levels of resolution.
Finally, the explosion of data being produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and require coupling to ever-increasing amounts of experimentally obtained data in varying formats. These challenges are unprecedented in high-performance scientific computing and necessitate the development of a companion computational infrastructure to support this effort. More information about this project, including a copy of the original proposal, can be found at www.genomes-to-life.org.

Acknowledgment: We want to gratefully acknowledge the contributions of the GTL Project Team as follows: Grant S. Heffelfinger1*, Anthony Martino2, Andrey Gorin3, Ying Xu10,3, Mark D. Rintoul1, Al Geist3, Matthew Ennis1, Hashim Al-Hashimi8, Nikita Arnold3, Andrei Borziak3, Bianca Brahamsha6, Andrea Belgrano12, Praveen Chandramohan3, Xin Chen9, Pan Chongle3, Paul Crozier1, PhuongAn Dam10, George S. Davidson1, Robert Day3, Jean-Loup Faulon2, Damian Gessler12, Arlene Gonzalez2, David Haaland1, William Hart1, Victor Havin3, Tao Jiang9, Howland Jones1, David Jung3, Ramya Krishnamurthy3, Yooli Light2, Shawn Martin1, Rajesh Munavalli3, Vijaya Natarajan3, Victor Olman10, Frank Olken4, Brian Palenik6, Byung Park3, Steven Plimpton1, Diana Roe2, Nagiza Samatova3, Arie Shoshani4, Michael Sinclair1, Alex Slepoy1, Shawn Stevens8, Chris Stork1, Charlie Strauss5, Zhengchang Su10, Edward Thomas1, Jerilyn A. Timlin1, Xiufeng Wan11, HongWei Wu10, Dong Xu11, Gong-Xin Yu3, Grover Yip8, Zhaoduo Zhang2, Erik Zuiderweg8. *Author to whom correspondence should be addressed (gsheffe@sandia.gov). Affiliations: 1. Sandia National Laboratories, Albuquerque, NM; 2. Sandia National Laboratories, Livermore, CA; 3. Oak Ridge National Laboratory, Oak Ridge, TN; 4. Lawrence Berkeley National Laboratory, Berkeley, CA; 5. Los Alamos National Laboratory, Los Alamos, NM; 6. University of California, San Diego; 7. University of Illinois, Urbana/Champaign; 8. University of Michigan, Ann Arbor; 9. University of California, Riverside; 10. University of Georgia, Athens; 11. University of Missouri, Columbia; 12. National Center for Genome Resources, Santa Fe, NM.

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Genomes to life project quarterly report June 2004

Heffelfinger, Grant S.; Martino, Anthony M.; Rintoul, Mark D.

This SAND report provides the technical progress through June 2004 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling," funded by the DOE Office of Science Genomes to Life Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacterium known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. In addition, we will develop a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. The ultimate goal of this effort is to develop and apply new experimental and computational methods needed to generate a new level of understanding of how the Synechococcus genome affects carbon fixation at the global scale. Anticipated experimental and computational methods will provide ever-increasing insight about the individual elements and steps in the carbon fixation process; however, relating an organism's genome to its cellular response in the presence of varying environments will require systems biology approaches. Thus a primary goal for this effort is to integrate the genomic data generated from experiments and lower-level simulations with data from the existing body of literature into a whole-cell model. We plan to accomplish this by developing and applying a set of tools for capturing the complex carbon fixation behavior of Synechococcus at different levels of resolution.
Finally, the explosion of data being produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and require coupling to ever-increasing amounts of experimentally obtained data in varying formats. These challenges are unprecedented in high-performance scientific computing and necessitate the development of a companion computational infrastructure to support this effort.

The signature molecular descriptor: 3. Inverse-quantitative structure-activity relationship of ICAM-1 inhibitory peptides

Journal of Molecular Graphics and Modelling

Churchwell, Carla J.; Rintoul, Mark D.; Martin, Shawn; Visco, Donald P.; Kotu, Archana; Larson, Richard S.; Sillerud, Laurel O.; Brown, David C.; Faulon, Jean L.

We present a methodology for solving the inverse-quantitative structure-activity relationship (QSAR) problem using the molecular descriptor called signature. This methodology is detailed in four parts. First, we create a QSAR equation that correlates the occurrence of a signature to the activity values using a stepwise multilinear regression technique. Second, we construct constraint equations, specifically the graphicality and consistency equations, which facilitate the reconstruction of the solution compounds directly from the signatures. Third, we solve the set of constraint equations, which are both linear and Diophantine in nature. Last, we reconstruct and enumerate the solution molecules and calculate their activity values from the QSAR equation. We apply this inverse-QSAR method to a small set of LFA-1/ICAM-1 peptide inhibitors to assist in the search for and design of more-potent inhibitory compounds. Many novel inhibitors were generated, a number of which are predicted to be more potent than the strongest inhibitor in the training set. Two of the more potent inhibitors were synthesized and tested in vivo, confirming them to be the strongest inhibiting peptides to date. Some of these compounds can be recycled to train a new QSAR and develop a more focused library of lead compounds.
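
The forward step described above amounts to regressing activity on counts of signature occurrences. The sketch below fits such a QSAR equation by least squares on synthetic counts; the actual work uses stepwise multilinear regression on real peptide data, and every number here is synthetic and for illustration only.

import numpy as np

rng = np.random.default_rng(42)
n_compounds, n_signatures = 30, 6
X = rng.integers(0, 5, size=(n_compounds, n_signatures)).astype(float)
true_w = np.array([2.0, -1.0, 0.0, 0.5, 0.0, 0.0])
y = X @ true_w + 0.1 * rng.standard_normal(n_compounds)  # synthetic activities

# Fit activity ~ w . signature_counts + b by least squares.
A = np.hstack([X, np.ones((n_compounds, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_hat, b_hat = coef[:-1], coef[-1]

def predict_activity(signature_counts):
    """QSAR prediction for a candidate compound's signature counts."""
    return signature_counts @ w_hat + b_hat

print(predict_activity(X[0]), y[0])  # fitted vs. synthetic 'measured' activity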

Parallel tempering Monte Carlo in LAMMPS

Rintoul, Mark D.; Sears, Mark P.; Plimpton, Steven J.

We present here the details of the implementation of the parallel tempering Monte Carlo technique in LAMMPS, a heavily used massively parallel molecular dynamics code at Sandia. This technique allows many replicas of a system to be run at different simulation temperatures. At various points in the simulation, configurations can be swapped between different temperature environments and then continued. This allows large regions of energy space to be sampled very quickly and allows minimum-energy configurations to emerge in very complex systems, such as large biomolecular systems. By including this algorithm in an existing code, we immediately gain all of the previous work that has been put into LAMMPS and make this technique quickly available to the entire Sandia and international LAMMPS community. Finally, we present an example of this code applied to folding a small protein.
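
The swap move at the heart of parallel tempering accepts an exchange of configurations between replicas at inverse temperatures beta_i and beta_j with probability min(1, exp((beta_i - beta_j) * (E_i - E_j))). A minimal sketch of that acceptance rule follows; the LAMMPS implementation applies it across MPI-parallel replicas rather than in serial Python, and the ladder below is illustrative.

import math
import random

def attempt_swap(E_i, E_j, T_i, T_j, kB=1.0):
    """Metropolis criterion for swapping replica configurations between
    temperatures T_i and T_j, given their current energies E_i and E_j."""
    delta = (1.0 / (kB * T_i) - 1.0 / (kB * T_j)) * (E_i - E_j)
    return delta >= 0.0 or random.random() < math.exp(delta)

# Typical use: sweep over neighboring rungs of a temperature ladder.
temps = [300.0, 345.0, 397.0, 456.0]      # illustrative temperature ladder
energies = [-105.2, -98.7, -91.3, -84.9]  # illustrative replica energies
for k in range(len(temps) - 1):
    if attempt_swap(energies[k], energies[k + 1], temps[k], temps[k + 1]):
        energies[k], energies[k + 1] = energies[k + 1], energies[k]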

Detection and reconstruction of error control codes for engineered and biological regulatory systems

May, Elebeoba E.; Johnston, Anna M.; Hart, William E.; Watson, Jean-Paul W.; Pryor, Richard J.; Rintoul, Mark D.

A fundamental challenge for all communication systems, engineered or living, is the problem of achieving efficient, secure, and error-free communication over noisy channels. Information-theoretic principles have been used to develop effective coding theory algorithms to successfully transmit information in engineering systems. Living systems also successfully transmit biological information through genetic processes such as replication, transcription, and translation, where the genome of an organism is the contents of the transmission. Decoding of received bit streams is fairly straightforward when the channel encoding algorithms are efficient and known. If the encoding scheme is unknown, or part of the data is missing or intercepted, how would one design a viable decoder for the received transmission? For such systems, blind reconstruction of the encoding/decoding system would be a vital step in recovering the original message. Communication engineers may not frequently encounter this situation, but for computational biologists and biotechnologists this is an immediate challenge. The goal of this work is to develop methods for detecting and reconstructing the encoder/decoder system for engineered and biological data. Building on Sandia's strengths in discrete mathematics, algorithms, and communication theory, we use linear programming and will use evolutionary computing techniques to construct efficient algorithms for modeling the coding system for minimally errored engineered data streams and genomic regulatory DNA and RNA sequences. The objective for the initial phase of this project is to construct solid parallels between the biological literature and fundamental elements of communication theory. In this light, the milestones for FY2003 were focused on defining genetic channel characteristics and providing an initial approximation for key parameters, including coding rate, memory length, and minimum distance values. A secondary objective addressed the question of determining similar parameters for a received, noisy, error-control-encoded data set. In addition to these goals, we initiated exploration of algorithmic approaches to determine whether a data set could be approximated with an error-control code, and performed initial investigations into optimization-based methodologies for extracting the encoding algorithm given the coding rate of an encoded noise-free or noisy data stream.

Carbon sequestration in Synechococcus Sp.: from molecular machines to hierarchical modeling

Proposed for publication in OMICS: A Journal of Integrative Biology, Vol. 6, No. 4, 2002.

Heffelfinger, Grant S.; Faulon, Jean-Loup M.; Frink, Laura J.; Haaland, David M.; Hart, William E.; Lane, Todd L.; Plimpton, Steven J.; Roe, Diana C.; Timlin, Jerilyn A.; Martino, Anthony M.; Rintoul, Mark D.; Davidson, George S.

The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to "achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life." While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five GTL projects funded, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling." This project is a combined experimental and computational effort emphasizing developing, prototyping, and applying new computational tools and methods to elucidate the biochemical mechanisms of the carbon sequestration of Synechococcus Sp., an abundant marine cyanobacterium known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth that addresses computational infrastructure challenges of relevance to this project and the Genomes to Life program as a whole. Our experimental effort is designed to provide biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes and developing a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the complex carbon fixation behavior of Synechococcus at different levels of resolution.
Finally, because the explosion of data being produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and require coupling to ever-increasing amounts of experimentally obtained data in varying formats, we have also established a companion computational infrastructure to support this effort as well as the Genomes to Life program as a whole.

Applications of Transport/Reaction Codes to Problems in Cell Modeling

Means, Shawn A.; Rintoul, Mark D.; Shadid, John N.

We demonstrate two specific examples that show how our existing capabilities in solving large systems of partial differential equations associated with transport/reaction systems can be easily applied to outstanding problems in computational biology. First, we examine a three-dimensional model for calcium wave propagation in a Xenopus laevis frog egg and verify that a proposed model for the distribution of calcium release sites agrees with experimental results as a function of both space and time. Next, we create a model of the neuron's terminus based on experimental observations and show that the sodium-calcium exchanger is not the route of sodium's modulation of neurotransmitter release. These state-of-the-art simulations were performed on massively parallel platforms and required almost no modification of existing Sandia codes.
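
As a toy analog of these transport/reaction computations, the sketch below advances a one-dimensional reaction-diffusion equation with an explicit finite-difference step. The bistable kinetics term is a generic choice that produces a traveling wave, not a calcium-release model, and the project's solvers were implicit, three-dimensional, and massively parallel.

import numpy as np

D, dx, dt = 1.0, 0.1, 0.004      # explicit stability needs dt < dx**2 / (2 * D)
x = np.arange(0.0, 10.0, dx)
u = np.where(x < 1.0, 1.0, 0.0)  # initial 'release' at the left end

def step(u):
    """One forward-Euler step of u_t = D * u_xx + f(u), no-flux boundaries."""
    lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2
    react = u * (1.0 - u) * (u - 0.3)  # toy bistable kinetics
    un = u + dt * (D * lap + react)
    un[0], un[-1] = un[1], un[-2]      # overwrite the wrapped boundary cells
    return un

for _ in range(5000):
    u = step(u)
print(float(u[len(u) // 2]))  # near 1.0: the wave front has passed the midpoint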

Computational methods for coupling microstructural and micromechanical materials response simulations

Holm, Elizabeth A.; Wellman, Gerald W.; Battaile, Corbett C.; Buchheit, Thomas E.; Fang, H.E.; Rintoul, Mark D.; Glass, Sarah J.; Knorovsky, Gerald A.; Neilsen, Michael K.

Computational materials simulations have traditionally focused on individual phenomena: grain growth, crack propagation, plastic flow, etc. However, real materials behavior results from a complex interplay between phenomena. In this project, the authors explored methods for coupling mesoscale simulations of microstructural evolution and micromechanical response. In one case, massively parallel (MP) simulations for grain evolution and microcracking in alumina stronglink materials were dynamically coupled. In the other, codes for domain coarsening and plastic deformation in CuSi braze alloys were iteratively linked. This program provided the first comparison of two promising ways to integrate mesoscale computer codes. Coupled microstructural/micromechanical codes were applied to experimentally observed microstructures for the first time. In addition to the coupled codes, this project developed a suite of new computational capabilities (PARGRAIN, GLAD, OOF, MPM, polycrystal plasticity, front tracking). The problem of plasticity length scale in continuum calculations was recognized and a solution strategy was developed. The simulations were experimentally validated on stockpile materials.

A precise determination of the void percolation threshold for two distributions of overlapping spheres

Physical Review Letters

Rintoul, Mark D.

The void percolation threshold is calculated for a distribution of overlapping spheres with equal radii, and for a binary-sized distribution of overlapping spheres in which half of the spheres have radii twice as large as the other half. Using systems much larger than in previous work, the authors determine much more precise values for the percolation thresholds and the correlation length exponent. The two percolation thresholds are shown to be significantly different, in contrast with previous, less precise works that speculated that the threshold might be universal with respect to the sphere size distribution.
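
A back-of-the-envelope version of the measurement: digitize the unit box, mark grid cells not covered by any sphere as void, and test whether a single void cluster touches opposite faces. This grid approximation (which ignores periodic images of the spheres) only illustrates the setup; the paper's continuum method is far more precise.

import numpy as np
from scipy.ndimage import label

def void_percolates(n_spheres=400, radius=0.08, grid=96, seed=0):
    """Monte Carlo check: does the void phase span the x-direction of a unit
    box containing n_spheres randomly placed overlapping spheres?"""
    rng = np.random.default_rng(seed)
    centers = rng.random((n_spheres, 3))
    axis = (np.arange(grid) + 0.5) / grid
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    covered = np.zeros((grid, grid, grid), dtype=bool)
    for c in centers:
        covered |= (X - c[0])**2 + (Y - c[1])**2 + (Z - c[2])**2 < radius**2
    voids, _ = label(~covered)  # connected components of the void phase
    shared = (set(voids[0].ravel()) & set(voids[-1].ravel())) - {0}
    return bool(shared)         # some void label present on both x-faces?

print(void_percolates())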
