In theory, it should be possible to infer realistic genetic networks from time series microarray data. In practice, however, network discovery has proved problematic. The three major challenges are: (1) inferring the network; (2) estimating the stability of the inferred network; and (3) making the network visually accessible to the user. Here we describe a method, tested on publicly available time series microarray data, which addresses these concerns. The inference of genetic networks from genome-wide experimental data is an important biological problem which has received much attention. Approaches to this problem have typically included application of clustering algorithms [6]; the use of Boolean networks [12, 1, 10]; the use of Bayesian networks [8, 11]; and the use of continuous models [21, 14, 19]. Overviews of the problem and general approaches to network inference can be found in [4, 3]. Our approach to network inference is similar to earlier methods in that we use both clustering and Boolean network inference. However, we have attempted to extend the process to better serve the end-user, the biologist. In particular, we have incorporated a system to assess the reliability of our network, and we have developed tools which allow interactive visualization of the proposed network.
Molecular analysis of cancer, at the genomic level, could lead to individualized patient diagnostics and treatments. The developments to follow will signal a significant paradigm shift in the clinical management of human cancer. Despite our initial hopes, however, it seems that simple analysis of microarray data cannot elucidate clinically significant gene functions and mechanisms. Extracting biological information from microarray data requires a complicated path involving multidisciplinary teams of biomedical researchers, computer scientists, mathematicians, statisticians, and computational linguists. The integration of the diverse outputs of each team is the limiting factor in the progress to discover candidate genes and pathways associated with the molecular biology of cancer. Specifically, one must deal with sets of significant genes identified by each method and extract whatever useful information may be found by comparing these different gene lists. Here we present our experience with such comparisons, and share methods developed in the analysis of an infant leukemia cohort studied on Affymetrix HG-U95A arrays. In particular, spatial gene clustering, hyper-dimensional projections, and computational linguistics were used to compare different gene lists. In spatial gene clustering, different gene lists are grouped together and visualized on a three-dimensional expression map, where genes with similar expressions are co-located. In another approach, projections from gene expression space onto a sphere clarify how groups of genes can jointly have more predictive power than groups of individually selected genes. Finally, online literature is automatically rearranged to present information about genes common to multiple groups, or to contrast the differences between the lists. The combination of these methods has improved our understanding of infant leukemia. While the complicated reality of the biology dashed our initial, optimistic hopes for simple answers from microarrays, we have made progress by combining very different analytic approaches.
High throughput instruments and analysis techniques are required in order to make good use of the genomic sequences that have recently become available for many species, including humans. These instruments and methods must work with tens of thousands of genes simultaneously, and must be able to identify the small subsets of those genes that are implicated in the observed phenotypes, or, for instance, in responses to therapies. Microarrays represent one such high throughput method, which continue to find increasingly broad application. This project has improved microarray technology in several important areas. First, we developed the hyperspectral scanner, which has discovered and diagnosed numerous flaws in techniques broadly employed by microarray researchers. Second, we used a series of statistically designed experiments to identify and correct errors in our microarray data to dramatically improve the accuracy, precision, and repeatability of the microarray gene expression data. Third, our research developed new informatics techniques to identify genes with significantly different expression levels. Finally, natural language processing techniques were applied to improve our ability to make use of online literature annotating the important genes. In combination, this research has improved the reliability and precision of laboratory methods and instruments, while also enabling substantially faster analysis and discovery.
The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to ''achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life.'' While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five GTL projects funded, ''Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling.'' This project is a combined experimental and computational effort emphasizing developing, prototyping, and applying new computational tools and methods to elucidate the biochemical mechanisms of the carbon sequestration of Synechococcus Sp., an abundant marine cyanobacteria known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO(2) are important terms in the global environmental response to anthropogenic atmospheric inputs of CO(2) and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth which deals with addressing computational infrastructure challenges of relevance to this project and the Genomes to Life program as a whole. Our experimental effort is designed to provide biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes and developing a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level of understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the carbon fixation behavior of complex of Synechococcus at different levels of resolution. Finally, because the explosion of data being produced by high-throughput experiments requires data analysis and models which are more computationally complex, more heterogeneous, and require coupling to ever increasing amounts of experimentally obtained data in varying formats, we have also established a companion computational infrastructure to support this effort as well as the Genomes to Life program as a whole.
Collaboration between Sandia National Laboratories and the University of New Mexico Biology Department resulted in the capability to train students in microarray techniques and the interpretation of data from microarray experiments. These studies provide for a better understanding of the role of stationary phase and the gene regulation involved in exit from stationary phase, which may eventually have important clinical implications. Importantly, this research trained numerous students and is the basis for three new Ph.D. projects.
We present the application of a new knowledge visualization tool, VxInsight, to the mapping and analysis of patent databases. Patent data are mined and placed in a database, relationships between the patents are identified, primarily using the citation and classification structures, then the patents are clustered using a proprietary force-directed placement algorithm. Related patents cluster together to produce a 3-D landscape view of the tens of thousands of patents. The user can navigate the landscape by zooming into or out of regions of interest. Querying the underlying database places a colored marker on each patent matching the query. Automatically generated labels, showing landscape content, update continually upon zooming. Optionally, citation links between patents may be shown on the landscape. The combination of these features enables powerful analyses of patent databases.
VxInsight provides a visual mechanism for browsing, exploring and retrieving information from a database. The graphical display conveys information about the relationship between objects in several ways and on multiple scales. In this way, individual objects are always observed within a larger context. For example, consider a database consisting of a set of scientific papers. Imagine that the papers have been organized in a two dimensional geometry so that related papers are located close to each other. Now construct a landscape where the altitude reflects the local density of papers. Papers on physics will form a mountain range, and a different range will stand over the biological papers. In between will be research reports from biophysics and other bridging disciplines. Now, imagine exploring these mountains. If we zoom in closer, the physics mountains will resolve into a set of sub-disciplines. Eventually, by zooming in far enough, the individual papers become visible. By pointing and clicking you can learn more about papers of interest or retrieve their full text. Although physical proximity conveys a great deal of information about the relationship between documents, you can also see which papers reference which others, by drawing lines between the citing and cited papers. For even more information, you can choose to highlight papers by a particular researcher or a particular institution, or show the accumulation of papers through time, watching some disciplines explode and other stagnate. VxInsight is a general purpose tool, which enables this kind of interaction with wide variety of relational data: documents, patents, web pages, and financial transactions are just a few examples. The tool allows users to interactively browse, explore and retrieve information from the database in an intuitive way.
In the last decade there has been interest and research in the area of designing circuits with genetic algorithms, evolutionary algorithms, and genetic programming. However, the ability to design circuits of the size and complexity required by modern engineering design problems, simply by specifying required outputs for given inputs has as yet eluded researchers. This paper describes current research in the area of designing logic circuits using an evolutionary algorithm. The goal of the research is to improve the effectiveness of this method and make it a practical aid for design engineers. A novel method of implementing the algorithm is introduced, and results are presented for various multiprocessing systems. In addition to evolving standard arithmetic circuits, work in the area of evolving circuits that perform digital signal processing tasks is described.
The emerging field of haptics represents a fundamental change in human-computer interaction (HCI), and presents solutions to problems that are difficult or impossible to solve with a two-dimensional, mouse-based interface. To take advantage of the potential of haptics, however, innovative interaction techniques and programming environments are needed. This paper describes FGB (FLIGHT GHUI Builder), a programming tool that can be used to create an application specific graphical and haptic user interface (GHUI). FGB is itself a graphical and haptic user interface with which a programmer can intuitively create and manipulate components of a GHUI in real time in a graphical environment through the use of a haptic device. The programmer can create a GHUI without writing any programming code. After a user interface is created, FGB writes the appropriate programming code to a file, using the FLIGHT API, to recreate what the programmer created in the FGB interface. FGB saves programming time and increases productivity, because a programmer can see the end result as it is created, and FGB does much of the programming itself. Interestingly, as FGB was created, it was used to help build itself. The further FGB was in its development, the more easily and quickly it could be used to create additional functionality and improve its own design. As a finished product, FGB can be used to recreate itself in much less time than it originally required, and with much less programming. This paper describes FGB's GHUI components, the techniques used in the interface, how the output code is created, where programming additions and modifications should be placed, and how it can be compared to and integrated with existing API's such as MFC and Visual C++, OpenGL, and GHOST.
Tera Computer and Sandia National Laboratories have completed a CRADA, which examined the Tera Multi-Threaded Architecture (MTA) for use with large codes of importance to industry and DOE. The MTA is an innovative architecture that uses parallelism to mask latency between memories and processors. The physical implementation is a parallel computer with high cross-section bandwidth and GaAs processors designed by Tera, which support many small computation threads and fast, lightweight context switches between them. When any thread blocks while waiting for memory accesses to complete, another thread immediately begins execution so that high CPU utilization is maintained. The Tera MTA parallel computer has a single, global address space, which is appealing when porting existing applications to a parallel computer. This ease of porting is further enabled by compiler technology that helps break computations into parallel threads. DOE and Sandia National Laboratories were interested in working with Tera to further develop this computing concept. While Tera Computer would continue the hardware development and compiler research, Sandia National Laboratories would work with Tera to ensure that their compilers worked well with important Sandia codes, most particularly CTH, a shock physics code used for weapon safety computations. In addition to that important code, Sandia National Laboratories would complete research on a robotic path planning code, SANDROS, which is important in manufacturing applications, and would evaluate the MTA performance on this code. Finally, Sandia would work directly with Tera to develop 3D visualization codes, which would be appropriate for use with the MTA. Each of these tasks has been completed to the extent possible, given that Tera has just completed the MTA hardware. All of the CRADA work had to be done on simulators.
This research project used Pad++, a user interface originally developed by Jim Holland. The objective was to explore the utility of that user interface to large databases of information such as those found on the World Wide Web. A web browser based on Pad++ was developed in the first year of this project The first year results, including the human factors were documented in a video and were presented at a SNL-wide seminar. The second year of this research project focused on applying the results of the first year research. The work in the second year involves using Pad++ as a basis for tools to manage large complicated web sites. Pad++ is ideally suited to this complex activity. A prototype was developed, which presents Web relationships in 3D hyperspace, following research from the Geometry Center at the University of Minnesota. Various human factors studies were completed, which indicate Pad++ web browsers allow users to comprehend 23% faster than when using Netscape.