|
| |||||
|---|---|---|---|---|---|
Brett W. Bader, W. Philip Kegelmeyer and Peter A. Chew. Multilingual Sentiment Analysis using Latent Semantic Indexing and Machine Learning. The 2011 International Conference on Data Mining Workshop on Sentiment Elicitation from Natural Text for Information Retrieval and Extraction (ICDM SENTIRE), December, 2011. [BibTeX] Abstract: We present a novel approach to predicting the sentiment of documents in multiple languages, without translation. The only prerequisite is a multilingual parallel corpus wherein a training sample of the documents, in a single language only, have been tagged with their overall sentiment. Latent Semantic Indexing (LSI) converts that multilingual corpus into a multilingual "concept space". New documents are then projected into that space, allowing cross-lingual semantic comparisons between the documents without the need for translation. Meanwhile, the training documents with known sentiment are used to build a machine learning model which can, because of the multilingual nature of the document projections, be used to predict sentiment in the other languages. We explain and evaluate the accuracy of this approach. We also design and conduct experiments to investigate the extent to which topic and sentiment separately contribute to that classification accuracy, and thereby shed some initial light on the question of whether topic and sentiment can be sensibly teased apart. |
|||||
BibTeX:
@inproceedings{ICDM11,
author = {Brett W. Bader and W. Philip Kegelmeyer and Peter A. Chew},
title = {Multilingual Sentiment Analysis using Latent Semantic Indexing and Machine Learning},
booktitle = {The 2011 International Conference on Data Mining Workshop on Sentiment Elicitation from Natural Text for Information Retrieval and Extraction (ICDM SENTIRE)},
month = {December},
year = {2011},
note = {To appear}
}
|
|||||
Peter A. Chew, Brett W. Bader, Stephen Helmreich, Ahmed Abdelali and Stephen J. Verzi. An information-theoretic, vector-space-model approach to cross-language information retrieval. Natural Language Engineering, 17(1):37-70, January, 2011. (doi:10.1017/S1351324910000185) [BibTeX] Abstract: In this article, we demonstrate several novel ways in which insights from information theory (IT) and computational linguistics (CL) can be woven into a vector-space-model (VSM) approach to information retrieval (IR). Our proposals focus, essentially, on three areas: pre-processing (morphological analysis), term weighting, and alternative geometrical models to the widely used term-by-document matrix. The latter include (1) PARAFAC2 decomposition of a term-by-document-by-language tensor, and (2) eigenvalue decomposition of a term-by-term matrix (inspired by Statistical Machine Translation). We evaluate all proposals, comparing them to a 'standard' approach based on Latent Semantic Analysis, on a multilingual document clustering task. The evidence suggests that proper consideration of IT within IR is indeed called for: in all cases, our best results are achieved using the information-theoretic variations upon the standard approach. Furthermore, we show that different information-theoretic options can be combined for still better results. A key function of language is to encode and convey information, and contributions of IT to the field of CL can be traced back a number of decades. We think that our proposals help bring IR and CL more into line with one another. In our conclusion, we suggest that the fact that our proposals yield empirical improvements is not coincidental given that they increase the theoretical transparency of VSM approaches to IR; on the contrary, they help shed light on why aspects of these approaches work as they do. © Cambridge University Press 2011 |
|||||
BibTeX:
@article{NLE11,
author = {Peter A. Chew and Brett W. Bader and Stephen Helmreich and Ahmed Abdelali and Stephen J. Verzi},
title = {An information-theoretic, vector-space-model approach to cross-language information retrieval},
month = {January},
journal = {Natural Language Engineering},
year = {2011},
volume = {17},
number = {1},
pages = {37--70},
doi = {10.1017/S1351324910000185}
}
|
|||||
Bernard Zak, Brett Bader, Ray Bambha, Hope Michelsen, Mark Boslough and Andy R. Jacobson. Reduction of Uncertainties in Remote Measurement of Greenhouse Gas Fluxes. IEEE 2010 Aerospace Conference, 2010. [BibTeX] Abstract: As the U.S. and the International Community come to grips with anthropogenic climate change, it will be necessary to develop accurate techniques with global span for remote measurement of emissions and uptake of greenhouse gases (GHGs), with special emphasis on carbon dioxide. Presently, techniques exist for in situ and local remote measurements. The first steps towards expansion of these techniques to span the world are only now being taken with the launch of satellites with the capability to accurately measure column abundances of selected GHGs, including carbon dioxide. These satellite sensors do not directly measure emissions and uptake. The satellite data, appropriately filtered and processed, provide only one necessary, but not sufficient, input for the determination of emission and uptake rates. Optimal filtering and processing is a challenge in itself. But these data must be further combined with output from data-assimilation models of atmospheric structure and flows in order to infer emission and uptake rates for relevant points and regions. In addition, it is likely that substantially more accurate determinations would be possible given the addition of data from a sparse network of in situ and/or upward-looking remote GHG sensors. We will present the most promising approaches we’ve found for combining satellite, in situ, fixed remote sensing, and other potentially available data with atmospheric data-assimilation and backward-dispersion models for the purpose of determination of point and regional GHG emission and uptake rates. We anticipate that the first application of these techniques will be to GHG management for the U.S. 
Subsequent application may be to confirmation of compliance of other nations with future international GHG agreements. |
|||||
BibTeX:
@inproceedings{IEEE-AC2010,
author = {Bernard Zak and Brett Bader and Ray Bambha and Hope Michelsen and Mark Boslough and Andy R. Jacobson},
title = {Reduction of Uncertainties in Remote Measurement of Greenhouse Gas Fluxes},
booktitle = {IEEE 2010 Aerospace Conference},
year = {2010}
}
|
|||||
Bernard D. Zak, Brett W. Bader, Ray Bambha, Mark B. E. Boslough, Hope A. Michelsen & Matthew W. Moorman. Reduction of Uncertainties in Remote Measurement of Emissions and Uptake of Greenhouse Gases. Technical Report SAND2009-8244, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, January, 2010. [BibTeX] |
|||||
BibTeX:
@techreport{SAND2009-8244,
author = {Bernard D. Zak and Brett W. Bader and Ray Bambha and Mark B. E. Boslough and Hope A. Michelsen and Matthew W. Moorman},
title = {Reduction of Uncertainties in Remote Measurement of Emissions and Uptake of Greenhouse Gases},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {January},
year = {2010},
number = {SAND2009-8244}
}
|
|||||
Philip Kegelmeyer (editor). Network Discovery, Characterization, and Prediction: A Grand Challenge LDRD Final Report. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2010-8715, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, December, 2010. [BibTeX] Abstract: This report is the final summation of Sandia's Grand Challenge LDRD project #119351, "Network Discovery, Characterization and Prediction" (the "NGC") which ran from FY08 to FY10. The aim of the NGC, in a nutshell, was to research, develop, and evaluate relevant analysis capabilities that address adversarial networks. Unlike some Grand Challenge efforts, that ambition created cultural subgoals, as well as technical and programmatic ones, as the insistence on "relevancy" required that the Sandia informatics research communities and the analyst user communities come to appreciate each other's needs and capabilities in a very deep and concrete way. The NGC generated a number of technical, programmatic, and cultural advances, detailed in this report. There were new algorithmic insights and research that resulted in fifty-three refereed publications and presentations; this report concludes with an abstract-annotated bibliography pointing to them all. The NGC generated three substantial prototypes that not only achieved their intended goals of testing our algorithmic integration, but which also served as vehicles for customer education and program development. The NGC, as intended, has catalyzed future work in this domain; by the end it had already brought in, in new funding, as much funding as had been invested in it. Finally, the NGC knit together previously disparate research staff and user expertise in a fashion that not only addressed our immediate research goals, but which promises to have created an enduring cultural legacy of mutual understanding, in service of Sandia's national security responsibilities in cybersecurity and counter proliferation. |
|||||
BibTeX:
@techreport{SAND2010-8715,
author = {Philip Kegelmeyer (editor)},
title = {Network Discovery, Characterization, and Prediction: A Grand Challenge LDRD Final Report},
address = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {December},
year = {2010},
number = {SAND2010-8715}
}
|
|||||
Brett W. Bader and Peter A. Chew. Algebraic Techniques for Multilingual Document Clustering. Text Mining: Applications and Theory, United Kingdom, Wiley, 2010. [BibTeX] © 2010, John Wiley and Sons, Ltd |
|||||
BibTeX:
@incollection{Wiley10,
author = {Brett W. Bader and Peter A. Chew},
title = {Algebraic Techniques for Multilingual Document Clustering},
booktitle = {Text Mining: Applications and Theory},
address = {United Kingdom},
publisher = {Wiley},
year = {2010}
}
|
|||||
Brett W. Bader. Constrained and unconstrained optimization. Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Volume 1, Elsevier, 2009. [BibTeX] Abstract: This chapter covers the basic concepts, theory, and algorithms of numerical optimization as well as its application to problems in chemometrics. Optimization is a very useful tool for data analysis. Indeed, quite frequently the methods used in standard chemometric data fitting problems are, at the heart, optimization techniques, though they may be disguised or described as something else. Hence, while optimization is at the core of many standard chemometric analysis techniques, its presence may not be immediately recognized. For instance, many data fitting techniques use algorithms for solving an optimization problem of minimizing the residual squared error. The goal of this chapter is to explain optimization theory and methods so that the practitioner has enough knowledge to understand the field, to know how to apply various optimization techniques, and to know where to go for further information on very difficult problems. This chapter starts with a general survey of the field of optimization and describes some motivating examples from chemometric data analysis. Then we delve into methods for solving unconstrained and constrained optimization problems, along with two popular strategies for global convergence. Finally, we end with a brief discussion of global optimization techniques and so-called direct search methods that do not require derivatives. © 2009 Elsevier |
|||||
BibTeX:
@incollection{Elsevier09,
author = {Brett W. Bader},
title = {Constrained and unconstrained optimization},
booktitle = {Comprehensive Chemometrics: Chemical and Biochemical Data Analysis},
publisher = {Elsevier},
year = {2009},
volume = {1}
}
|
|||||
Peter Chew, Brett Bader and Alla Rozovskaya. Using DEDICOM for Completely Unsupervised Part-of-speech Tagging. Proceedings of North American Association of Computational Linguistics, Human Language Technologies (NAACL-HLT) Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, 2009. [URL] [BibTeX] [older version] Abstract: A standard and widespread approach to part-of-speech tagging is based on Hidden Markov Models (HMMs). An alternative approach, pioneered by Schütze (1993), induces parts of speech from scratch using singular value decomposition (SVD). We introduce DEDICOM as an alternative to SVD for part-of-speech induction. DEDICOM retains the advantages of SVD in that it is completely unsupervised: no prior knowledge is required to induce either the tagset or the associations of types with tags. However, unlike SVD, it is also fully compatible with the HMM framework, in that it can be used to estimate emission- and transition-probability matrices which can then be used as the input for an HMM. We apply the DEDICOM method to the CONLL corpus (CONLL 2000) and compare the output of DEDICOM to the part-of-speech tags given in the corpus, and find that the correlation (almost 0.5) is quite high. Using DEDICOM, we also estimate part-of-speech ambiguity for each type, and find that these estimates correlate highly with part-of-speech ambiguity as measured in the original corpus (around 0.88). Finally, we show how the output of DEDICOM can be evaluated and compared against the more familiar output of supervised HMM-based tagging. |
|||||
BibTeX:
@inproceedings{NAACL09,
author = {Peter Chew and Brett Bader and Alla Rozovskaya},
title = {Using {DEDICOM} for Completely Unsupervised Part-of-speech Tagging},
booktitle = {Proceedings of North American Association of Computational Linguistics, Human Language Technologies (NAACL-HLT) Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics},
year = {2009},
url = {http://ngc.sandia.gov/assets/documents/naacl2009_final.pdf}
}
|
|||||
Charles Van Loan (editor). Future Directions in Tensor-Based Computation and Modeling. NSF Workshop Report, May, 2009. [URL] [BibTeX] Abstract: High-dimensional modeling is becoming ubiquitous across the sciences and engineering because of advances in sensor technology and storage technology. Computationally-oriented researchers no longer have to avoid what were once intractably large, tensor-structured data sets. The current NSF promotion of “computational thinking” is timely: we need a focused international effort to oversee the transition from matrix-based to tensor-based computational thinking. The successful problem-solving tools provided by the numerical linear algebra community need to be broadened and generalized. However, tensor-based research is not just matrix-based research with additional subscripts. Tensors are data objects in their own right and there is much to learn about their geometry and their connections to statistics and operator theory. This requires full participation of researchers from engineering, the natural sciences, and the information sciences, together with statisticians, mathematicians, numerical analysts, and software/language designers. Representatives from these disciplines participated in the Workshop. We believe that the NSF can help ensure the vitality of “big N” engineering and science by systematically supporting research in tensor-based computation and modeling. |
|||||
BibTeX:
@techreport{NSF09,
author = {Charles Van Loan (editor)},
title = {Future Directions in Tensor-Based Computation and Modeling},
institution = {NSF Workshop Report},
month = {May},
year = {2009},
url = {http://www.cs.cornell.edu/cv/TenWork/FinalReport.pdf}
}
|
|||||
Peter Chew, Brett Bader & Alla Rozovskaya. Using DEDICOM for Completely Unsupervised Part-of-speech Tagging. Technical Report SAND2009-0842, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, February, 2009. [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2009-0842,
author = {Peter Chew and Brett Bader and Alla Rozovskaya},
title = {Using {DEDICOM} for Completely Unsupervised Part-of-speech Tagging},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {February},
year = {2009},
number = {SAND2009-0842}
}
|
|||||
Brett W. Bader, Michael W. Berry and Amy N. Langville. Text analysis using nonnegative matrix/tensor factorizations. Text Mining: Classification, Clustering, and Applications, Boca Raton, FL, Taylor-Francis, 2009. [BibTeX] © 2009 by Taylor and Francis Group, LLC |
|||||
BibTeX:
@incollection{TaylorFrancis09,
author = {Brett W. Bader and Michael W. Berry and Amy N. Langville},
title = {Text analysis using nonnegative matrix/tensor factorizations},
booktitle = {Text Mining: Classification, Clustering, and Applications},
address = {Boca Raton, FL},
publisher = {Taylor-Francis},
year = {2009}
}
|
|||||
Tamara G. Kolda and Brett W. Bader. Tensor Decompositions and Applications. SIAM Review, 51(3):455-500, August, 2009. (doi:10.1137/07070111X) [BibTeX] [older version] Abstract: This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N >= 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors. Keywords: tensor decompositions, multiway arrays, multilinear algebra, parallel factors (PARAFAC), canonical decomposition (CANDECOMP), higher-order principal components analysis (Tucker), higher-order singular value decomposition (HOSVD) © 2009 Society for Industrial and Applied Mathematics |
|||||
BibTeX:
@article{TensorReview,
author = {Tamara G. Kolda and Brett W. Bader},
title = {Tensor Decompositions and Applications},
month = {August},
journal = {SIAM Review},
year = {2009},
volume = {51},
number = {3},
pages = {455--500},
doi = {10.1137/07070111X}
}
|
|||||
Brett W. Bader, Andrey Puretskiy and Michael W. Berry. Scenario detection using nonnegative tensor factorization. The 13th Iberoamerican Congress on Pattern Recognition (CIARP), 2008. [BibTeX] Abstract: In the relatively new field of visual analytics there is a great need for automated approaches to both verify and discover the intentions and schemes of primary actors through time. Data mining and knowledge discovery play critical roles in facilitating the ability to extract meaningful information from large and complex textual-based (digital) collections. In this study, we develop a mathematical strategy based on nonnegative tensor factorization (NTF) to extract and sequence important activities and specific events from sources such as news articles. The ability to automatically reconstruct a plot or confirm involvement in a questionable activity is greatly facilitated by our approach. As a variant of the PARAFAC multidimensional data model, we apply our NTF algorithm to the terrorism-based scenarios of the VAST 2007 Contest data set to demonstrate how term-by-entity associations can be used for scenario/plot discovery and evaluation. Keywords: nonnegative tensor factorization, PARAFAC, scenario discovery, VAST 2007, visual analytics |
|||||
BibTeX:
@inproceedings{CIARP08,
author = {Brett W. Bader and Andrey Puretskiy and Michael W. Berry},
title = {Scenario detection using nonnegative tensor factorization},
booktitle = {The 13th Iberoamerican Congress on Pattern Recognition (CIARP)},
year = {2008}
}
|
|||||
Peter Chew, Philip Kegelmeyer, Brett Bader and Ahmed Abdelali. The Knowledge of Good and Evil: Multilingual Ideology Classification with PARAFAC2 and Machine Learning. The 9th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) 2008, 2008. [BibTeX] |
|||||
BibTeX:
@inproceedings{CICLing08,
author = {Peter Chew and Philip Kegelmeyer and Brett Bader and Ahmed Abdelali},
title = {The Knowledge of Good and Evil: Multilingual Ideology Classification with PARAFAC2 and Machine Learning},
booktitle = {The 9th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) 2008},
year = {2008},
note = {also published in J. Language Forum, 2008}
}
|
|||||
Brett Bader and Peter Chew. Enhancing multilingual latent semantic analysis with term alignment information. The 22nd International Conference on Computational Linguistics (COLING), 2008. [BibTeX] Abstract: Latent Semantic Analysis (LSA) is based on the Singular Value Decomposition (SVD) of a term-by-document matrix for identifying relationships among terms and documents from co-occurrence patterns. Among the multiple ways of computing the SVD of a rectangular matrix X, one approach is to compute the eigenvalue decomposition (EVD) of a square 2 × 2 composite matrix consisting of four blocks with X and X^T in the off-diagonal blocks and zero matrices in the diagonal blocks. We point out that significant value can be added to LSA by filling in some of the values in the diagonal blocks (corresponding to explicit term-to-term or document-to-document associations) and computing a term-by-concept matrix from the EVD. For the case of multilingual LSA, we incorporate information on cross-language term alignments of the same sort used in Statistical Machine Translation (SMT). Since all elements of the proposed EVD-based approach can rely entirely on lexical statistics, hardly any price is paid for the improved empirical results. In particular, the approach, like LSA or SMT, can still be generalized to virtually any language(s); computation of the EVD takes similar resources to that of the SVD since all the blocks are sparse; and the results of EVD are just as economical as those of SVD. |
|||||
BibTeX:
@inproceedings{COLING08,
author = {Brett Bader and Peter Chew},
title = {Enhancing multilingual latent semantic analysis with term alignment information},
booktitle = {The 22nd International Conference on Computational Linguistics (COLING)},
year = {2008}
}
|
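The block construction in the COLING 2008 abstract above can be written out explicitly. This is a sketch under assumed notation: X = U Σ V^T is the SVD of the term-by-document matrix, (σ_i, u_i, v_i) is one singular triplet, and the names B, B', D_1 and D_2 are introduced here for illustration only and are not taken from the paper.

```latex
% Composite matrix with X and X^T off the diagonal and zero diagonal blocks
B = \begin{bmatrix} 0 & X \\ X^{T} & 0 \end{bmatrix},
\qquad
X v_i = \sigma_i u_i, \quad X^{T} u_i = \sigma_i v_i
\;\Longrightarrow\;
B \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix}
= \pm \sigma_i \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix}

% Enriched variant described in the abstract: the zero blocks are partly filled
% D_1: term-to-term associations, D_2: document-to-document associations (assumed names)
B' = \begin{bmatrix} D_1 & X \\ X^{T} & D_2 \end{bmatrix}
```

The nonzero eigenvalues of B are therefore ±σ_i, and each eigenvector stacks a left singular vector over a right singular vector, which is why the EVD of B recovers the SVD of X. Once D_1 or D_2 is nonzero this correspondence no longer holds exactly, so the term-by-concept matrix taken from the EVD of B' differs from the one plain LSA would give.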
|||||
Peter A. Chew, Brett W. Bader and Ahmed Abdelali. Latent morpho-semantic analysis: multilingual information retrieval with character n-grams and mutual information. The 22nd International Conference on Computational Linguistics (COLING), 2008. [BibTeX] Abstract: We describe an entirely statistics-based, unsupervised, and language-independent approach to multilingual information retrieval, which we call Latent Morpho-Semantic Analysis (LMSA). LMSA overcomes some of the shortcomings of related previous approaches such as Latent Semantic Analysis (LSA). LMSA has an important theoretical advantage over LSA: it combines well-known techniques in a novel way to break the terms of LSA down into units which correspond more closely to morphemes. Thus, it has a particular appeal for use with morphologically complex languages such as Arabic. We show through empirical results that the theoretical advantages of LMSA can translate into significant gains in precision in multilingual information retrieval tests. These gains are not matched either when a standard stemmer is used with LSA, or when terms are indiscriminately broken down into n-grams. |
|||||
BibTeX:
@inproceedings{COLING08b,
author = {Peter A. Chew and Brett W. Bader and Ahmed Abdelali},
title = {Latent morpho-semantic analysis: multilingual information retrieval with character n-grams and mutual information},
booktitle = {The 22nd International Conference on Computational Linguistics (COLING)},
year = {2008}
}
|
|||||
J. Dan Morrow, Brett Bader, Peter Chew and Ann Speed. Ideological determination using small amounts of text. The annual meeting of the ISA's 49th Annual Convention, Bridging Multiple Divides, San Francisco, CA, 2008. [URL] [BibTeX] Abstract: Sandia National Laboratories has been working on text-based methods for building cognitive models since 2003. This capability, called STANLEY (Sandia Text Analysis Extensible Library) provides mechanisms for quantitatively analyzing complex semantic relationships in large bodies of text. This basic capability then enables various analyses including comparisons of similarities between the semantic relationships in two or more bodies of text and assessments of how those relationships change over time and in relation to external events. The use of a tool such as STANLEY can enable a human to identify trends and changes in large corpora of text that they would not have been able to do otherwise. This, in turn, can help a human analyst make better sense of documents being produced by people who are remote in time or space and are therefore not available for one-on-one interactions. It can also provide guidance or focus for other less quantitative methods of study. |
|||||
BibTeX:
@inproceedings{ISA08,
author = {J. Dan Morrow and Brett Bader and Peter Chew and Ann Speed},
title = {Ideological determination using small amounts of text},
booktitle = {The annual meeting of the ISA's 49th Annual Convention, Bridging Multiple Divides},
address = {San Francisco, CA},
year = {2008},
url = {http://www.allacademic.com/meta/p254519_index.html}
}
|
|||||
Tamara G. Kolda and Brett W. Bader. Multi-way Data Analysis and Applications (extended abstract). Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis, Number (SAND2008-6109), pp. 42-45, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, September, 2008. [BibTeX] |
|||||
BibTeX:
@incollection{SAND2008-6109,
author = {Tamara G. Kolda and Brett W. Bader},
title = {Multi-way Data Analysis and Applications (extended abstract)},
booktitle = {Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis},
month = {September},
publisher = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
year = {2008},
number = {SAND2008-6109},
pages = {42--45}
}
|
|||||
Brett W. Bader, Travis L. Bauer, Chris Beers, Peter Chew, J. Dan Morrow, Nicholas Pattengale, Ann Speed, Brian Titus & Christina Warrender. Determining ideology shifts from adversary text. Technical Report SAND2008-6186, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, September, 2008. [BibTeX] |
|||||
BibTeX:
@techreport{SAND2008-6186,
author = {Brett W. Bader and Travis L. Bauer and Chris Beers and Peter Chew and J. Dan Morrow and Nicholas Pattengale and Ann Speed and Brian Titus and Christina Warrender},
title = {Determining ideology shifts from adversary text},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {September},
year = {2008},
number = {SAND2008-6186}
}
|
|||||
Brett W. Bader, Michael W. Berry and Murray Browne. Discussion tracking in Enron email using PARAFAC. Survey of Text Mining II: Clustering, Classification, and Retrieval, London, Springer, 2008. (doi:10.1007/978-1-84800-046-9) [BibTeX] Abstract: In this chapter, we apply a non-negative tensor factorization algorithm to extract and detect meaningful discussions from electronic mail messages for a period of one year. For the publicly released Enron electronic mail collection, we encode a sparse term-author-month array for subsequent three-way factorization using the PARAllel FACtors (or PARAFAC) three-way decomposition first proposed by Harshman. Using non-negative tensors, we preserve natural data non-negativity and avoid subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Results in thread detection and interpretation are discussed in the context of published Enron business practices and activities, and benchmarks addressing the computational complexity of our approach are provided. The resulting tensor factorizations can be used to produce Gantt-like charts that can be used to assess the duration, order, and dependencies of focused discussions against the progression of time. © Springer-Verlag London Limited 2008 |
|||||
BibTeX:
@incollection{Springer08,
author = {Brett W. Bader and Michael W. Berry and Murray Browne},
title = {Discussion tracking in {Enron} email using {PARAFAC}},
booktitle = {Survey of Text Mining II: Clustering, Classification, and Retrieval},
address = {London},
publisher = {Springer},
year = {2008},
doi = {10.1007/978-1-84800-046-9}
}
|
|||||
Brett W. Bader, Richard A. Harshman and Tamara G. Kolda. Temporal Analysis of Semantic Graphs using ASALSAN. ICDM 2007: Proceedings of the 7th IEEE International Conference on Data Mining, pp. 33-42, October, 2007. (doi:10.1109/ICDM.2007.54) [BibTeX] [older version] Abstract: ASALSAN is a new algorithm for computing three-way DEDICOM, which is a linear algebra model for analyzing intrinsically asymmetric relationships, such as trade among nations or the exchange of emails among individuals, that incorporates a third mode of the data, such as time. ASALSAN is unique because it enables computing the three-way DEDICOM model on large, sparse data. A nonnegative version of ASALSAN is described as well. When we apply these techniques to adjacency arrays arising from directed graphs with edges labeled by time, we obtain a smaller graph on latent semantic dimensions and gain additional information about their changing relationships over time. We demonstrate these techniques on international trade data and the Enron email corpus to uncover latent components and their transient behavior. The mixture of roles assigned to individuals by ASALSAN showed strong correspondence with known job classifications and revealed the patterns of communication between these roles. Changes in the communication pattern over time, e.g., between top executives and the legal department, were also apparent in the solutions. |
|||||
BibTeX:
@inproceedings{ICDM07,
author = {Brett W. Bader and Richard A. Harshman and Tamara G. Kolda},
title = {Temporal Analysis of Semantic Graphs using {ASALSAN}},
booktitle = {ICDM 2007: Proceedings of the 7th IEEE International Conference on Data Mining},
month = {October},
year = {2007},
pages = {33--42},
doi = {10.1109/ICDM.2007.54}
}
|
|||||
Peter A. Chew, Brett W. Bader, Tamara G. Kolda and Ahmed Abdelali. Cross-language information retrieval using PARAFAC2. KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 143-152, ACM Press, 2007. (doi:10.1145/1281192.1281211) [BibTeX] [older version] Abstract: A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to be successful in identifying similar documents across languages - or more precisely, retrieving the most similar document in one language to a query in another language. However, the approach has severe drawbacks when applied to a related task, that of clustering documents 'language independently', so that documents about similar topics end up closest to one another in the semantic space regardless of their language. The problem is that documents are generally more similar to other documents in the same language than they are to documents in a different language, but on the same topic. As a result, when using multilingual LSA, documents will in practice cluster by language, not by topic. We propose a novel application of PARAFAC2 (which is a variant of PARAFAC, a multi-way generalization of the singular value decomposition [SVD]) to overcome this problem. Instead of forming a single multilingual term-by-document matrix which, under LSA, is subjected to SVD, we form an irregular three-way array, each slice of which is a separate term-by-document matrix for a single language in the parallel corpus. The goal is to compute an SVD for each language such that V (the matrix of right singular vectors) is the same across all languages. Effectively, PARAFAC2 imposes the constraint, not present in standard LSA, that the 'concepts' in all documents in the parallel corpus are the same regardless of language. 
Intuitively, this constraint makes sense, since the whole purpose of using a parallel corpus is that exactly the same concepts are expressed in the translations. We tested this approach by comparing the performance of PARAFAC2 with standard LSA in solving a particular CLIR problem. From our results, we conclude that PARAFAC2 offers a very promising alternative to LSA not only for multilingual document clustering, but also for solving other problems in cross-language information retrieval. Keywords: Latent Semantic Analysis (LSA), information retrieval, multilingual, clustering, PARAFAC2 |
|||||
BibTeX:
@inproceedings{KDD2007,
author = {Peter A. Chew and Brett W. Bader and Tamara G. Kolda and Ahmed Abdelali},
title = {Cross-language information retrieval using {PARAFAC2}},
booktitle = {KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
publisher = {ACM Press},
year = {2007},
pages = {143--152},
doi = {10.1145/1281192.1281211}
}
|
|||||
Peter A. Chew, Brett W. Bader, Tamara G. Kolda & Ahmed Abdelali. Cross-language information retrieval using PARAFAC2. Technical Report SAND2007-2706, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, May, 2007. [URL] [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2007-2706,
author = {Peter A. Chew and Brett W. Bader and Tamara G. Kolda and Ahmed Abdelali},
title = {Cross-language information retrieval using {PARAFAC2}},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {May},
year = {2007},
number = {SAND2007-2706},
url = {http://www.prod.sandia.gov/cgi-bin/techlib/access-control.pl/2007/072706.pdf}
}
|
|||||
Tamara G. Kolda & Brett W. Bader. Tensor Decompositions and Applications. Technical Report SAND2007-6702, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, November, 2007. [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2007-6702,
author = {Tamara G. Kolda and Brett W. Bader},
title = {Tensor Decompositions and Applications},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {November},
year = {2007},
number = {SAND2007-6702}
}
|
|||||
Brett W. Bader & Tamara G. Kolda. Final Report: Data Mining on Attributed Relationship Graphs. Technical Report SAND2007-8018, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, December, 2007. [BibTeX] |
|||||
BibTeX:
@techreport{SAND2007-8018,
author = {Brett W. Bader and Tamara G. Kolda},
title = {Final Report: Data Mining on Attributed Relationship Graphs},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {December},
year = {2007},
number = {SAND2007-8018}
}
|
|||||
Brett W. Bader and Tamara G. Kolda. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing, 30(1):205-231, December, 2007. (doi:10.1137/060676489) [BibTeX] [older version] Abstract: In this paper, the term tensor refers simply to a multidimensional or $N$-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: A Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB. Keywords: sparse multidimensional arrays; multilinear algebraic computations; tensor decompositions; Tucker model; parallel factors (PARAFAC) model; MATLAB classes; canonical decomposition (CANDECOMP) © 2007 Society for Industrial and Applied Mathematics |
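The two storage ideas named in the abstract above can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB and does not reproduce the Tensor Toolbox API; the function names `kruskal_entry` and `kruskal_to_coo` and the example factor matrices are hypothetical, chosen only for illustration. A Kruskal tensor is held as weights plus one factor matrix per mode, and coordinate format is modeled as a dictionary from index tuples to nonzero values.

```python
from itertools import product


def kruskal_entry(weights, factors, index):
    """One entry of a Kruskal tensor: sum over r of w_r * prod_n factors[n][i_n][r]."""
    total = 0.0
    for r, w in enumerate(weights):
        term = w
        for mode, i in enumerate(index):
            term *= factors[mode][i][r]
        total += term
    return total


def kruskal_to_coo(weights, factors):
    """Expand a Kruskal tensor into coordinate format, storing only nonzero entries."""
    shape = [len(f) for f in factors]
    coo = {}
    for index in product(*(range(s) for s in shape)):
        value = kruskal_entry(weights, factors, index)
        if value != 0.0:
            coo[index] = value
    return coo


# Hypothetical 2 x 3 x 2 tensor built from two rank-1 terms (R = 2).
A = [[1.0, 0.0], [0.0, 1.0]]              # mode-1 factor, 2 x R
B = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # mode-2 factor, 3 x R
C = [[1.0, 0.0], [0.0, 3.0]]              # mode-3 factor, 2 x R
weights = [2.0, 1.0]

coo = kruskal_to_coo(weights, [A, B, C])
print(coo)  # only the nonzero coordinates are kept
```

Expanding the factored form is done here only to show the coordinate layout; the point made in the abstract is that many operations can work on the components directly without forming the full tensor.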
|||||
BibTeX:
@article{SIAM-67648,
author = {Brett W. Bader and Tamara G. Kolda},
title = {Efficient {MATLAB} computations with sparse and factored tensors},
month = {December},
journal = {SIAM Journal on Scientific Computing},
year = {2007},
volume = {30},
number = {1},
pages = {205--231},
doi = {10.1137/060676489}
}
|
|||||
Brett W. Bader and Robert B. Schnabel. On the performance of tensor methods for solving ill-conditioned problems. SIAM J. Scientific Computing, 29(6):2329-2351, October, 2007. (doi:10.1137/040607745) [URL] [BibTeX] Abstract: This paper investigates the performance of tensor methods for solving small- to large-scale systems of nonlinear equations where the Jacobian matrix at the root is ill-conditioned or singular. This condition occurs on many classes of problems, such as identifying or approaching turning points in path-following problems. The singular case has been studied more than the highly ill-conditioned case, for both Newton and tensor methods. It is known that Newton-based methods do not work well with singular problems because they converge linearly to the solution and, in some cases, with poor accuracy. On the other hand, direct tensor methods have performed well on singular problems and have superlinear convergence on such problems under certain conditions. This behavior originates from the use of a special, restricted form of the second-order term included in the local tensor model that provides information lacking in a (nearly) singular Jacobian. With several implementations available for large-scale problems, tensor methods now are capable of solving larger problems. We compare the performance of tensor methods and Newton-based methods for small- to large-scale problems over a range of conditionings, from well-conditioned to ill-conditioned to singular. Previous studies with tensor methods concerned only the ends of this spectrum. Our results show that tensor methods are increasingly superior to Newton-based methods as the problem grows more ill-conditioned. Keywords: nonlinear equations, tensor methods, Newton's method, ill-conditioned problems, singular problems © 2007 Society for Industrial and Applied Mathematics |
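The linear convergence of Newton's method on singular problems, mentioned in the abstract above, can be seen on a scalar toy problem. This is only an illustrative sketch with hypothetical test functions, not one of the paper's experiments, and it does not implement a tensor method: for f(x) = x^2 the derivative vanishes at the root, and each Newton step merely halves the error, whereas for f(x) = x^2 - 1 the root is regular and convergence is quadratic.

```python
def newton(f, df, x0, steps):
    """Plain Newton iteration; returns the list of iterates including x0."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
        xs.append(x)
    return xs


# Singular case: f(x) = x^2 has f'(0) = 0 at the root x = 0.
singular = newton(lambda x: x * x, lambda x: 2 * x, 1.0, 10)

# Regular case: f(x) = x^2 - 1 has f'(1) = 2 at the root x = 1.
regular = newton(lambda x: x * x - 1, lambda x: 2 * x, 2.0, 6)

print("singular errors:", singular)
print("regular errors: ", [abs(x - 1.0) for x in regular])
```

In the singular case the error after ten steps is still about 1e-3, while the regular case reaches roughly machine precision in six steps.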
|||||
BibTeX:
@article{SISC07,
author = {Brett W. Bader and Robert B. Schnabel},
title = {On the performance of tensor methods for solving ill-conditioned problems},
month = {October},
journal = {SIAM J. Scientific Computing},
year = {2007},
volume = {29},
number = {6},
pages = {2329--2351},
url = {http://siamdl.aip.org/dbt/dbt.jsp?KEY=SJOCE3&Volume=29&Issue=6},
doi = {10.1137/040607745}
}
|
|||||
Brett W. Bader, Michael W. Berry and Murray Browne. Discussion tracking in Enron email using PARAFAC. Text Mining 2007, Workshop at the SIAM International Conference on Data Mining, April, 2007. [BibTeX] [newer version] |
|||||
BibTeX:
@inproceedings{TMW07,
author = {Brett W. Bader and Michael W. Berry and Murray Browne},
title = {Discussion tracking in {Enron} email using {PARAFAC}},
booktitle = {Text Mining 2007, Workshop at the SIAM International Conference on Data Mining},
month = {April},
year = {2007}
}
|
|||||
Brett W. Bader and Tamara G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software, 32(4):635-653, December, 2006. (doi:10.1145/1186785.1186794) [URL] [BibTeX] [older version] Abstract: Tensors (also known as multidimensional arrays or N-way arrays) are used in a variety of applications ranging from chemometrics to psychometrics. We describe four MATLAB classes for tensor manipulations that can be used for fast algorithm prototyping. The tensor class extends the functionality of MATLAB's multidimensional arrays by supporting additional operations such as tensor multiplication. The tensor_as_matrix class supports the "matricization" of a tensor, i.e., the conversion of a tensor to a matrix (and vice versa), a commonly used operation in many algorithms. Two additional classes represent tensors stored in decomposed formats: cp_tensor and tucker_tensor. We describe all of these classes and then demonstrate their use by showing how to implement several tensor algorithms that have appeared in the literature. Keywords: Higher-Order Tensors, Multilinear Algebra, N-Way Arrays, MATLAB |
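The "matricization" operation described in the abstract above can be sketched as follows. This is plain Python, not the MATLAB `tensor_as_matrix` class, and the helper names `unfold` and `fold` are hypothetical. The column ordering used here (earlier remaining modes vary fastest) is an assumption; other orderings are equally valid as long as folding and unfolding agree.

```python
def unfold(entries, shape, mode):
    """Mode-`mode` unfolding of a tensor given as {index tuple: value}.

    Rows are indexed by the chosen mode; columns enumerate the remaining
    modes, with earlier modes varying fastest (an assumed convention).
    """
    other = [m for m in range(len(shape)) if m != mode]
    strides, stride = {}, 1
    for m in other:
        strides[m] = stride
        stride *= shape[m]
    matrix = [[0] * stride for _ in range(shape[mode])]
    for index, value in entries.items():
        col = sum(index[m] * strides[m] for m in other)
        matrix[index[mode]][col] = value
    return matrix


def fold(matrix, shape, mode):
    """Inverse of `unfold`: rebuild the {index tuple: value} mapping."""
    other = [m for m in range(len(shape)) if m != mode]
    entries = {}
    for row, cols in enumerate(matrix):
        for col, value in enumerate(cols):
            index = [0] * len(shape)
            index[mode] = row
            rem = col
            for m in other:
                index[m] = rem % shape[m]
                rem //= shape[m]
            entries[tuple(index)] = value
    return entries


# Hypothetical 2 x 2 x 2 tensor holding 1..8 in column-major order.
shape = (2, 2, 2)
X = {(i, j, k): 1 + i + 2 * j + 4 * k
     for i in range(2) for j in range(2) for k in range(2)}

print(unfold(X, shape, 0))  # [[1, 3, 5, 7], [2, 4, 6, 8]]
print(unfold(X, shape, 1))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```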
|||||
BibTeX:
@article{ACM-TOMS-TENSORTOOLBOX,
author = {Brett W. Bader and Tamara G. Kolda},
title = {Algorithm 862: {MATLAB} tensor classes for fast algorithm prototyping},
month = {December},
journal = {ACM Transactions on Mathematical Software},
year = {2006},
volume = {32},
number = {4},
pages = {635--653},
url = {http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/ACM-TOMS-TensorToolbox.pdf},
doi = {10.1145/1186785.1186794}
}
|
|||||
Brett W. Bader, Richard Harshman & Tamara G. Kolda. Temporal analysis of social networks using three-way DEDICOM. Technical Report SAND2006-2161, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, April, 2006. [URL] [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2006-2161,
author = {Brett W. Bader and Richard Harshman and Tamara G. Kolda},
title = {Temporal analysis of social networks using three-way {DEDICOM}},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {April},
year = {2006},
number = {SAND2006-2161},
url = {http://www.prod.sandia.gov/cgi-bin/techlib/access-control.pl/2006/062161.pdf}
}
|
|||||
Brett W. Bader & Tamara G. Kolda. Efficient MATLAB computations with sparse and factored tensors. Technical Report SAND2006-7592, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, December, 2006. [URL] [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2006-7592,
author = {Brett W. Bader and Tamara G. Kolda},
title = {Efficient {MATLAB} computations with sparse and factored tensors},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {December},
year = {2006},
number = {SAND2006-7592},
url = {http://www.prod.sandia.gov/cgi-bin/techlib/access-control.pl/2006/067592.pdf}
}
|
|||||
Brett W. Bader, Richard Harshman & Tamara G. Kolda. Pattern Analysis of Directed Graphs Using DEDICOM: An Application to Enron Email. Technical Report SAND2006-7744, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, December, 2006. [URL] [BibTeX] [newer version] [older version] |
|||||
BibTeX:
@techreport{SAND2006-7744,
author = {Brett W. Bader and Richard Harshman and Tamara G. Kolda},
title = {Pattern Analysis of Directed Graphs Using {DEDICOM}: An Application to Enron Email},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {December},
year = {2006},
number = {SAND2006-7744},
url = {http://www.prod.sandia.gov/cgi-bin/techlib/access-control.pl/2006/067744.pdf}
}
|
|||||
Tamara Kolda and Brett Bader. The TOPHITS model for higher-order web link analysis. Proceedings of the SIAM Data Mining Conference Workshop on Link Analysis, Counterterrorism and Security, 2006. [URL] [BibTeX] Abstract: As the size of the web increases, it becomes more and more important to analyze link structure while also considering context. Multilinear algebra provides a novel tool for incorporating anchor text and other information into the authority computation used by link analysis methods such as HITS. Our recently proposed TOPHITS method uses a higher-order analogue of the matrix singular value decomposition called the PARAFAC model to analyze a three-way representation of web data. We compute hubs and authorities together with the terms that are used in the anchor text of the links between them. Adding a third dimension to the data greatly extends the applicability of HITS because the TOPHITS analysis can be performed in advance and offline. Like HITS, the TOPHITS model reveals latent groupings of pages, but TOPHITS also includes latent term information. In this paper, we describe a faster mathematical algorithm for computing the TOPHITS model on sparse data, and Web data is used to compare HITS and TOPHITS. We also discuss how the TOPHITS model can be used in queries, such as computing context-sensitive authorities and hubs. We describe different query response methodologies and present experimental results. Keywords: PARAFAC, multilinear algebra, link analysis, higher-order SVD |
|||||
BibTeX:
@inproceedings{SDM06-LACS,
author = {Tamara Kolda and Brett Bader},
title = {The {TOPHITS} model for higher-order web link analysis},
booktitle = {Proceedings of the SIAM Data Mining Conference Workshop on Link Analysis, Counterterrorism and Security},
year = {2006},
url = {http://www.siam.org/meetings/sdm06/workproceed/Link%20Analysis/21Tamara_Kolda_SIAMLACS.pdf}
}
|
|||||
K. Willcox, O. Ghattas, B. van Bloemen Waanders and B. Bader. An Optimization framework for goal-oriented, model-based reduction of large-scale systems. Proceedings of the 44th IEEE Conference on Decision and Control and European Control Conference, Seville, Spain, December, 2005. (doi:10.1109/CDC.2005.1582499) [URL] [BibTeX] Abstract: Optimization-ready reduced-order models should target a particular output functional, span an applicable range of dynamic and parametric inputs, and respect the underlying governing equations of the system. To achieve this goal, we present an approach for determining a projection basis that uses a goal-oriented, model-based optimization framework. The mathematical framework permits consideration of general dynamical systems with general parametric variations. The methodology is applicable to both linear and nonlinear systems and to systems with many input parameters. This paper focuses on an initial presentation and demonstration of the methodology on a simple linear model problem of the two-dimensional, time-dependent heat equation with a small number of inputs. For this example, the reduced models determined by the new approach provide considerable improvement over those derived using the proper orthogonal decomposition. |
|||||
BibTeX:
@inproceedings{CDC05,
author = {K. Willcox and O. Ghattas and B. van Bloemen Waanders and B. Bader},
title = {An Optimization framework for goal-oriented, model-based reduction of large-scale systems},
booktitle = {Proceedings of the 44th IEEE Conference on Decision and Control and European Control Conference},
address = {Seville, Spain},
month = {December},
year = {2005},
url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1582499},
doi = {10.1109/CDC.2005.1582499}
}
|
|||||
Tamara G. Kolda, Brett W. Bader and Joseph P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. ICDM 2005: Proceedings of the 5th IEEE International Conference on Data Mining, pp. 242-249, November, 2005. (doi:10.1109/ICDM.2005.77) [BibTeX] [older version] Abstract: Linear algebra is a powerful and proven tool in web search. Techniques, such as the PageRank algorithm of Brin and Page and the HITS algorithm of Kleinberg, score web pages based on the principal eigenvector (or singular vector) of a particular non-negative matrix that captures the hyperlink structure of the web graph. We propose and test a new methodology that uses multilinear algebra to elicit more information from a higher-order representation of the hyperlink graph. We start by labeling the edges in our graph with the anchor text of the hyperlinks so that the associated linear algebra representation is a sparse, three-way tensor. The first two dimensions of the tensor represent the web pages while the third dimension adds the anchor text. We then use the rank-1 factors of a multilinear PARAFAC tensor decomposition, which are akin to singular vectors of the SVD, to automatically identify topics in the collection along with the associated authoritative web pages. |
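As background for the abstract above, the matrix-based HITS scoring it refers to can be sketched with a short power iteration. This is the ordinary two-way method only, not the three-way PARAFAC approach proposed in the paper; the function name `hits` and the four-page link graph are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def hits(edges, n, iterations=50):
    """Power iteration for hub and authority scores on a directed graph.

    `edges` is a list of (source, target) page indices in range(n).
    """
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # A page is a good authority if good hubs link to it.
        auths = [0.0] * n
        for src, dst in edges:
            auths[dst] += hubs[src]
        auths = normalize(auths)
        # A page is a good hub if it links to good authorities.
        new_hubs = [0.0] * n
        for src, dst in edges:
            new_hubs[src] += auths[dst]
        hubs = normalize(new_hubs)
    return hubs, auths


# Hypothetical graph: pages 0 and 1 link to page 2; page 0 also links to page 3.
edges = [(0, 2), (1, 2), (0, 3)]
hubs, auths = hits(edges, 4)
print("hubs: ", hubs)
print("auths:", auths)
```

The authority vector converges to the principal eigenvector of A^T A, where A is the adjacency matrix, which is the "principal singular vector" view mentioned in the abstract. Labeling each edge with anchor text turns A into the three-way tensor the paper analyzes.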
|||||
BibTeX:
@inproceedings{ICDM05,
author = {Tamara G. Kolda and Brett W. Bader and Joseph P. Kenny},
title = {Higher-Order Web Link Analysis Using Multilinear Algebra},
booktitle = {ICDM 2005: Proceedings of the 5th IEEE International Conference on Data Mining},
month = {November},
year = {2005},
pages = {242--249},
doi = {10.1109/ICDM.2005.77}
}
|
|||||
Tamara G. Kolda, Brett W. Bader & Joseph P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. Technical Report SAND2005-4548, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, July, 2005. [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2005-4548,
author = {Tamara G. Kolda and Brett W. Bader and Joseph P. Kenny},
title = {Higher-Order Web Link Analysis Using Multilinear Algebra},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {July},
year = {2005},
number = {SAND2005-4548}
}
|
|||||
Brett W. Bader, Roger P. Pawlowski & Tamara G. Kolda. Robust large-scale parallel nonlinear solvers for simulations. Technical Report SAND2005-6864, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, November, 2005. [URL] [BibTeX] Abstract: This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. 
Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence. |
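The idea of replacing the Jacobian with an approximation that is updated from function values alone can be sketched on a small system. This is a dense, two-variable illustration of Broyden's rank-one update, not the limited-memory large-scale version developed in the report; the test system, the starting point, and the helper names are assumptions made for the example. For simplicity the initial matrix is the Jacobian at the starting point; a finite-difference estimate could be substituted where no Jacobian is available.

```python
def solve2(B, rhs):
    """Solve a 2 x 2 linear system B s = rhs by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(rhs[0] * B[1][1] - B[0][1] * rhs[1]) / det,
            (B[0][0] * rhs[1] - rhs[0] * B[1][0]) / det]


def broyden(F, x, B, tol=1e-10, max_iter=50):
    """Broyden's method: B approximates the Jacobian and is never recomputed."""
    Fx = F(x)
    for _ in range(max_iter):
        if max(abs(v) for v in Fx) < tol:
            break
        s = solve2(B, [-Fx[0], -Fx[1]])        # quasi-Newton step
        x = [x[0] + s[0], x[1] + s[1]]
        F_new = F(x)
        y = [F_new[0] - Fx[0], F_new[1] - Fx[1]]
        Bs = [B[i][0] * s[0] + B[i][1] * s[1] for i in range(2)]
        ss = s[0] * s[0] + s[1] * s[1]
        # Rank-one update: B <- B + (y - B s) s^T / (s^T s)
        B = [[B[i][j] + (y[i] - Bs[i]) * s[j] / ss for j in range(2)]
             for i in range(2)]
        Fx = F_new
    return x, Fx


def F(x):
    """Hypothetical test system with a root at (0, 3)."""
    return [x[0] + x[1] - 3.0, x[0] ** 2 + x[1] ** 2 - 9.0]


x0 = [1.0, 5.0]
B0 = [[1.0, 1.0], [2.0, 10.0]]  # Jacobian of F at x0, used only to start
root, residual = broyden(F, x0, B0)
print(root, residual)
```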
|||||
BibTeX:
@techreport{SAND2005-6864,
author = {Brett W. Bader and Roger P. Pawlowski and Tamara G. Kolda},
title = {Robust large-scale parallel nonlinear solvers for simulations},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {November},
year = {2005},
number = {SAND2005-6864},
url = {http://www.prod.sandia.gov/cgi-bin/techlib/access-control.pl/2005/056864.pdf}
}
|
|||||
Brett W. Bader. Tensor-Krylov methods for solving large-scale systems of nonlinear equations. SIAM J. Numerical Analysis, 43(3):1321-1347, 2005. (doi:10.1137/040607095) [URL] [BibTeX] Abstract: This paper develops and investigates iterative tensor methods for solving large-scale systems of nonlinear equations. Direct tensor methods for nonlinear equations have performed especially well on small, dense problems where the Jacobian matrix at the solution is singular or ill-conditioned, which may occur when approaching turning points, for example. This research extends direct tensor methods to large-scale problems by developing three tensor-Krylov methods that base each iteration upon a linear model augmented with a limited second-order term, which provides information lacking in a (nearly) singular Jacobian. The advantage of the new tensor-Krylov methods over existing large-scale tensor methods is their ability to solve the local tensor model to a specified accuracy, which produces a more accurate tensor step. The performance of these methods in comparison to Newton-GMRES and tensor-GMRES is explored on three Navier-Stokes fluid flow problems. The numerical results provide evidence that tensor-Krylov methods are generally more robust and more efficient than Newton-GMRES on some important and difficult problems. In addition, the results show that the new tensor-Krylov methods and tensor-GMRES each perform better in certain situations. Keywords: nonlinear systems, tensor methods, Newton's method, linesearch, curvilinear linesearch, Krylov subspace methods |
|||||
BibTeX:
@article{TensorKrylov,
author = {Brett W. Bader},
title = {Tensor-{K}rylov methods for solving large-scale systems of nonlinear equations},
journal = {SIAM J. Numerical Analysis},
year = {2005},
volume = {43},
number = {3},
pages = {1321--1347},
note = {Appeared as Technical Report SAND2004-1837 (August 2004)},
url = {http://epubs.siam.org/SINUM/volume-43/art_60709.html},
doi = {10.1137/040607095}
}
|
|||||
Elizabeth Eskow, Brett Bader, Richard Byrd, Vincent Lamberti, Silvia Crivelli, Teresa Head-Gordon and Robert Schnabel. An Optimization approach to the problem of protein structure prediction. Mathematical Programming, 101(3):497-514, 2004. [BibTeX] Abstract: We describe a large-scale, stochastic-perturbation global optimization algorithm used for determining the structure of proteins. The method incorporates secondary structure predictions (which describe the more basic elements of the protein structure) into the starting structures, and thereafter minimizes using a purely physics-based energy model. Results show this method to be particularly successful on protein targets where structural information from similar proteins is unavailable, i.e., the most difficult targets for most protein structure prediction methods. Our best result to date is on a protein target containing over 4000 atoms and 12,000 cartesian coordinates. |
|||||
BibTeX:
@article{MP04,
author = {Elizabeth Eskow and Brett Bader and Richard Byrd and Vincent Lamberti and Silvia Crivelli and Teresa Head-Gordon and Robert Schnabel},
title = {An Optimization approach to the problem of protein structure prediction},
journal = {Mathematical Programming},
year = {2004},
volume = {101},
number = {3},
pages = {497--514}
}
|
|||||
Brett W. Bader. Tensor-Krylov methods for solving large-scale systems of nonlinear equations. Technical Report SAND2004-1837, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, August, 2004. [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2004-1837,
author = {Brett W. Bader},
title = {Tensor-{K}rylov methods for solving large-scale systems of nonlinear equations},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {August},
year = {2004},
number = {SAND2004-1837}
}
|
|||||
Brett W. Bader & Robert B. Schnabel. On the performance of tensor methods for solving ill-conditioned problems. Technical Report SAND2004-1944, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, September, 2004. [URL] [BibTeX] [newer version] Abstract: This paper investigates the performance of tensor methods for solving small- and large-scale systems of nonlinear equations where the Jacobian matrix at the root is ill-conditioned or singular. This condition occurs on many classes of problems, such as identifying or approaching turning points in path following problems. The singular case has been studied more than the highly ill-conditioned case, for both Newton and tensor methods. It is known that Newton-based methods do not work well with singular problems because they converge linearly to the solution and, in some cases, with poor accuracy. On the other hand, direct tensor methods have performed well on singular problems and have superlinear convergence on such problems under certain conditions. This behavior originates from the use of a special, restricted form of the second-order term included in the local tensor model that provides information lacking in a (nearly) singular Jacobian. With several implementations available for large-scale problems, tensor methods now are capable of solving larger problems. We compare the performance of tensor methods and Newton-based methods for both small- and large-scale problems over a range of conditionings, from well-conditioned to ill-conditioned to singular. Previous studies with tensor methods only concerned the ends of this spectrum. Our results show that tensor methods are increasingly superior to Newton-based methods as the problem grows more ill-conditioned. |
|||||
BibTeX:
@techreport{SAND2004-1944,
author = {Brett W. Bader and Robert B. Schnabel},
title = {On the performance of tensor methods for solving ill-conditioned problems},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {September},
year = {2004},
number = {SAND2004-1944},
url = {http://www.sandia.gov/news-center/resources/tech-library/index.html}
}
|
|||||
Brett W. Bader & Tamara G. Kolda. A preliminary report on the development of MATLAB tensor classes for fast algorithm prototyping. Technical Report SAND2004-3487, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, July, 2004. [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2004-3487,
author = {Brett W. Bader and Tamara G. Kolda},
title = {A preliminary report on the development of {MATLAB} tensor classes for fast algorithm prototyping},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {July},
year = {2004},
number = {SAND2004-3487}
}
|
|||||
Brett W. Bader & Tamara G. Kolda. MATLAB tensor classes for fast algorithm prototyping. Technical Report SAND2004-5187, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, October, 2004. [BibTeX] [newer version] |
|||||
BibTeX:
@techreport{SAND2004-5187,
author = {Brett W. Bader and Tamara G. Kolda},
title = {{MATLAB} tensor classes for fast algorithm prototyping},
institution = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
month = {October},
year = {2004},
number = {SAND2004-5187}
}
|
|||||
John G. Aunins, Brett Bader, Anthony Caola, Janet Griffiths, Maayan Katz, Peter Licari, Kripa Ram, Colette S. Ranucci and Weichang Zhou. Fluid mechanics, cell distribution, and environment in CellCube bioreactors. Biotechnology Progress, 19(1):2-8, 2003. (doi:10.1021/bp0256521) [URL] [BibTeX] Abstract: Cultivation of MRC-5 cells and attenuated hepatitis A virus (HAV) for the production of VAQTA, an inactivated HAV vaccine (1), is performed in the CellCube reactor, a laminar flow fixed-bed bioreactor with an unusual diamond-shaped, diverging-converging flow geometry. These disposable bioreactors have found some popularity for the production of cells and gene therapy vectors at intermediate scales of operation (2, 3). Early testing of the CellCube revealed that the fluid mechanical environment played a significant role in nonuniform cell distribution patterns generated during the cell growth phase. Specifically, the reactor geometry and manufacturing artifacts, in combination with certain inoculum practices and circulation flow rates, can create cell growth behavior that is not simply explained. Via experimentation and computational fluid dynamics simulations we can account for practically all of the observed cell growth behavior, which appears to be due to a complex mixture of flow distribution, particle deposition under gravity, fluid shear, and possibly nutritional microenvironment. |
|||||
BibTeX:
@article{BioTechProgress03,
author = {John G. Aunins and Brett Bader and Anthony Caola and Janet Griffiths and Maayan Katz and Peter Licari and Kripa Ram and Colette S. Ranucci and Weichang Zhou},
title = {Fluid mechanics, cell distribution, and environment in CellCube bioreactors},
journal = {Biotechnology Progress},
year = {2003},
volume = {19},
number = {1},
pages = {2--8},
url = {http://onlinelibrary.wiley.com/doi/10.1021/btpr.v19:1/issuetoc},
doi = {10.1021/bp0256521}
}
|
|||||
Brett W. Bader and Robert B. Schnabel. Curvilinear linesearch for tensor methods. SIAM J. Scientific Computing, 25(2):604-622, 2003. (doi:10.1137/S1064827502406658) [URL] [BibTeX] Abstract: This paper presents a curvilinear linesearch for use when solving nonlinear equations with tensor methods. Standard tensor methods use a combination of the tensor step and Newton step in a linesearch, but they are handled separately and in an ad hoc manner. Our curvilinear linesearch combines the two directions in a single parametric step, which guarantees a monotonic decrease on the tensor model and also asymptotically approaches the Newton direction as the step length shrinks to zero, thus guaranteeing descent on the nonlinear equations. Numerical experiments on a set of 35 small-scale problems, drawn primarily from the CUTE collection, show an 18-23% improvement (in terms of function evaluations) over previous tensor linesearches when using quadratic backtracking and a 41-83% improvement when halving the step length before each trial. Our results also suggest that the curvilinear linesearch is more robust than linesearch-based Newton's method or the standard tensor linesearch, producing fewer failures than either method. Experiments on two large-scale fluid flow problems complement the small-scale results and give preliminary indication of the applicability of the tensor method and the efficiency of the curvilinear linesearch. The theoretical properties, coupled with the better performance, make this a desirable improvement over previous tensor method linesearches. |
|||||
BibTeX:
@article{SISC03,
author = {Brett W. Bader and Robert B. Schnabel},
title = {Curvilinear linesearch for tensor methods},
journal = {SIAM J. Scientific Computing},
year = {2003},
volume = {25},
number = {2},
pages = {604--622},
url = {http://siamdl.aip.org/dbt/dbt.jsp?KEY=SJOCE3&Volume=25&Issue=2},
doi = {10.1137/S1064827502406658}
}
|
|||||
Silvia Crivelli, Elizabeth Eskow, Brett Bader, Vincent Lamberti, Richard Byrd, Robert Schnabel and Teresa Head-Gordon. A Physical approach to protein structure prediction. Biophysical J., 82(1):36-49, January, 2002. (doi:10.1016/S0006-3495(02)75372-1) [URL] [BibTeX] Abstract: We describe our global optimization method called Stochastic Perturbation with Soft Constraints (SPSC), which uses information from known proteins to predict secondary structure, but not in the tertiary structure predictions or in generating the terms of the physics-based energy function. Our approach is also characterized by the use of an all atom energy function that includes a novel hydrophobic solvation function derived from experiments that shows promising ability for energy discrimination against misfolded structures. We present the results obtained using our SPSC method and energy function for blind prediction in the 4th Critical Assessment of Techniques for Protein Structure Prediction competition, and show that our approach is more effective on targets for which less information from known proteins is available. In fact our SPSC method produced the best prediction for one of the most difficult targets of the competition, a new fold protein of 240 amino acids. |
|||||
BibTeX:
@article{BiophysJ02,
author = {Silvia Crivelli and Elizabeth Eskow and Brett Bader and Vincent Lamberti and Richard Byrd and Robert Schnabel and Teresa Head-Gordon},
title = {A Physical approach to protein structure prediction},
month = {January},
journal = {Biophysical J.},
year = {2002},
volume = {82},
number = {1},
pages = {36-49},
url = {http://www.biophysj.org/cgi/content/abstract/82/1/36},
doi = {10.1016/S0006-3495(02)75372-1}
}
|
|||||
Margaret M. Whalen, Rashmi N. Doshi, Brett W. Bader and Arthur D. Bankhurst. Lysophosphatidylcholine and arachidonic acid are required in the cytotoxic response of human natural killer cells to tumor target cells. Cellular Physiology and Biochemistry, 9(6):297-309, 1999. (doi:10.1159/000016324) [URL] [BibTeX] Abstract: Treatment of human natural killer (NK) cells with phospholipase A2 (PLA2) inhibitors, mepacrine and 4-bromophenacyl bromide (BPB), diminished their ability to lyse K562 target cells by as much as 100%. The ability of NK cells to bind to K562 cells was significantly affected by BPB above 2 µM, but not by mepacrine at any concentration tested. This indicates that BPB is having effects on NK cells unrelated to its inhibition of PLA2 activity at concentrations above 2 µM. The activation of phospholipase C in response to K562 cell binding (as measured by inositol phosphate turnover) was unaffected by inhibition of the PLA2 activity. The products of PLA2 catabolism are a fatty acid (often arachidonic acid) and a lysophospholipid. Inhibition of NK cytotoxicity by mepacrine or BPB is reversed significantly when lysophosphatidylcholine, but no other lysolipid, is added back to the NK cells before assaying for cytotoxicity. Arachidonic acid, but not linoleic acid, also significantly reverses inhibition of NK cytotoxicity. Finally, the 15-lipoxygenase product, 15S-hydroperoxyeicosatetraenoic acid (15S-HPETE), is also able to reverse mepacrine-induced inhibition of NK cytotoxicity. The 5-lipoxygenase product 5S-HPETE was not effective. These data indicate that PLA2 activation is a necessary signal in human NK cytotoxicity and that it is not involved in protein tyrosine kinase and subsequent phospholipase C activation; these latter two enzymes are also required in the cytotoxic response. 
Thus PLA2 activation is either a more distal signal, dependent on activation of some earlier signal, or an independent cosignal stimulated by tumor-target binding which generates lysophosphatidylcholine, arachidonic acid, and/or a lipoxygenase product(s). Keywords: Lipids, Natural cytotoxicity, Arachidonic acid, Lysophosphatidylcholine, Natural killer cell |
|||||
BibTeX:
@article{CellPhysBio99,
author = {Margaret M. Whalen and Rashmi N. Doshi and Brett W. Bader and Arthur D. Bankhurst},
title = {Lysophosphatidylcholine and arachidonic acid are required in the cytotoxic response of human natural killer cells to tumor target cells},
journal = {Cellular Physiology and Biochemistry},
year = {1999},
volume = {9},
number = {6},
pages = {297-309},
url = {http://www.karger.com/journals/cpb/cpb_jh.htm},
doi = {10.1159/000016324}
}
|
|||||
Created by JabRef on 02/10/2011.