Factor analysis has proven an effective approach for distilling high dimensional spectral-image data into a limited number of components that describe the spatial and spectral characteristics of the imaged sample. Principal Component Analysis (PCA) is the most commonly used factor analysis tool; however, PCA constrains both the spectral and abundance factors to be orthogonal, and forces the components to serially maximize the variance that each accounts for. Neither constraint has any basis in physical reality; thus, principal components are abstract and not easily interpreted. The mathematical properties of PCA scores and loadings also differ subtly, which has implications for how they can be used in abstract factor 'rotation' procedures such as Varimax. The Singular Value Decomposition (SVD) is a mathematical technique that is frequently used to compute PCA. In this talk, we will argue that SVD itself provides a more flexible framework for spectral image analysis since spatial-domain and spectral-domain singular vectors are treated in a symmetrical fashion. We will also show that applying an abstract rotation in our choice of either the spatial or spectral domain relaxes the orthogonality requirement in the complementary domain. For instance, samples are often approximately orthogonal in a spatial sense, that is, they consist of relatively discrete chemical phases. In such cases, rotating the singular vectors in a way designed to maximize the simplicity of the spatial representation yields physically acceptable and readily interpretable estimates of the pure-component spectra. This talk will demonstrate that this approach can achieve excellent results for difficult-to-analyze data sets obtained by a variety of spectroscopic imaging techniques.
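The rotation idea above can be illustrated with a minimal sketch, assuming NumPy and a noise-free synthetic image made of three discrete phases. The function name `varimax_rotation`, the array sizes, and the synthetic data are all assumptions made for illustration; they are not taken from the talk. The sketch rotates the spatial singular vectors with a standard SVD-based Varimax iteration and applies the same rotation to the scaled spectral singular vectors, which are then free to become non-orthogonal.

```python
import numpy as np

def varimax_rotation(loadings, max_iter=100, tol=1e-8):
    """Return an orthogonal matrix R chosen to increase the Varimax
    simplicity of loadings @ R (standard SVD-based iteration)."""
    n_rows, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Mean squared loading of each column, used in the Varimax gradient.
        col_scale = np.sum(rotated**2, axis=0) / n_rows
        gradient = loadings.T @ (rotated**3 - rotated * col_scale)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        previous, objective = objective, s.sum()
        if previous != 0.0 and objective / previous < 1.0 + tol:
            break
    return rotation

# Hypothetical synthetic image: every pixel belongs to exactly one phase,
# so the abundance maps are spatially orthogonal; the spectra overlap.
rng = np.random.default_rng(0)
n_pixels, n_channels, n_phases = 300, 50, 3
labels = np.arange(n_pixels) % n_phases
abundances = np.eye(n_phases)[labels]                 # pixels x phases
spectra = rng.random((n_channels, n_phases)) + 0.1    # channels x phases
data = abundances @ spectra.T                         # pixels x channels

# Truncated SVD: data = U diag(s) V^T.
u, s, vt = np.linalg.svd(data, full_matrices=False)
u, s, vt = u[:, :n_phases], s[:n_phases], vt[:n_phases]

# Rotate in the spatial domain; the spectral factors absorb diag(s) and R.
rotation = varimax_rotation(u)
spatial = u @ rotation                # still orthonormal columns
spectral = (vt.T * s) @ rotation      # V diag(s) R, not orthogonal in general
```

Because the rotation is orthogonal, `spatial @ spectral.T` reproduces the rank-3 data exactly. Only the spatial factors are held orthonormal, so the Gram matrix `spectral.T @ spectral` need not be diagonal.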
This short-term, late-start LDRD examined the effects of nutritional deprivation on the energy harvesting complex in microalgae. The original experimental plan called for a much more detailed study of the effects of temperature and nutrition on the antenna system of a variety of TAG-producing algae, and of the concomitant effects on oil production, but time and fiscal constraints limited the scope of the study. This work was a joint effort between research teams at Sandia National Laboratories, New Mexico and California. Preliminary results indicate there is a photosystem response to silica starvation in diatoms that could impact the mechanisms for lipid accumulation.
We describe a method of performing trilinear analysis on large data sets using a modification of the PARAFAC-ALS algorithm. Our method iteratively decomposes the data matrix into a core matrix and three loading matrices based on the Tucker1 model. The algorithm is particularly useful for data sets that are too large to load into a computer's main memory. While the performance advantage of our algorithm depends on the number of data elements and the dimensions of the data array, we have seen a significant performance improvement over operating PARAFAC-ALS on the full data set. In one case, with data comprising hyperspectral images from a confocal microscope, our method of analysis was approximately 60 times faster than operating on the full data set, while obtaining essentially equivalent results. Published in 2008 by John Wiley & Sons, Ltd.
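The compression step behind a Tucker1-style approach can be sketched as follows, assuming NumPy and a small synthetic trilinear array. This is not the authors' algorithm: it shows only a one-mode compression by truncated SVD, performed fully in memory, and the names `tucker1_compress` and `tucker1_expand` are hypothetical. The iterative, out-of-memory scheme described in the abstract is not reproduced here.

```python
import numpy as np

def tucker1_compress(array, rank):
    """Compress mode 0 of a 3-way array so that
    array[i, j, k] ~= sum_r basis[i, r] * core[r, j, k]."""
    n0, n1, n2 = array.shape
    unfolded = array.reshape(n0, n1 * n2)          # mode-0 unfolding
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    basis = u[:, :rank]                            # orthonormal, n0 x rank
    core = (s[:rank, None] * vt[:rank]).reshape(rank, n1, n2)
    return basis, core

def tucker1_expand(basis, core):
    """Map a compressed core back to the full-size array."""
    return np.einsum("ir,rjk->ijk", basis, core)

# Hypothetical trilinear data: 2000 "pixels" x 30 x 8, three factors.
rng = np.random.default_rng(1)
n_factors = 3
a = rng.random((2000, n_factors))
b = rng.random((30, n_factors))
c = rng.random((8, n_factors))
array = np.einsum("if,jf,kf->ijk", a, b, c)

basis, core = tucker1_compress(array, n_factors)
restored = tucker1_expand(basis, core)
```

In this sketch a trilinear model would be fitted to the small `core` rather than to `array`. If that fit produced mode-0 loadings `a_core` of shape `(rank, n_factors)`, the full-size loadings would be `basis @ a_core`.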
In this study, the (CFx)n cathode reaction during discharge was investigated using in situ X-ray diffraction (XRD). The in situ XRD data set was treated mathematically using multivariate curve resolution with alternating least squares (MCR–ALS), a multivariate analysis technique. MCR–ALS analysis successfully separated the relatively weak XRD signal intensity due to the chemical reaction from the signals of the other, inert cell components. The resulting dynamic reaction component revealed the loss of (CFx)n cathode signal together with the simultaneous appearance of LiF by-product intensity. Careful examination of the XRD data set revealed an additional dynamic component which may be associated with the formation of an intermediate compound during the discharge process.
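The alternating least squares idea can be sketched in a few lines, assuming NumPy. This is a deliberately crude variant in which non-negativity is imposed by clipping each least squares solution at zero; it is not the implementation used in the study. The function name `mcr_als`, the peak positions, and the synthetic series of patterns are hypothetical and do not represent real reflections.

```python
import numpy as np

def mcr_als(data, n_components, n_iter=200, seed=0):
    """Factor data (scans x angles) as conc @ spectra.T with both factors
    kept non-negative by clipping (an approximation to constrained ALS)."""
    rng = np.random.default_rng(seed)
    conc = rng.random((data.shape[0], n_components))
    for _ in range(n_iter):
        # Solve for the patterns given the contributions, then clip.
        spectra = np.linalg.lstsq(conc, data, rcond=None)[0].T
        spectra = np.clip(spectra, 0.0, None)
        # Solve for the contributions given the patterns, then clip.
        conc = np.linalg.lstsq(spectra, data.T, rcond=None)[0].T
        conc = np.clip(conc, 0.0, None)
    return conc, spectra

# Hypothetical series of 40 patterns: a strong static "cell" signal plus a
# weak product signal that grows linearly as the discharge proceeds.
angles = np.linspace(20.0, 60.0, 400)

def peak(center, width=0.5):
    return np.exp(-0.5 * ((angles - center) / width) ** 2)

static_pattern = peak(26.0) + peak(44.0) + 0.05
product_pattern = peak(38.7) + peak(45.0)
progress = np.linspace(0.0, 1.0, 40)
true_conc = np.column_stack([np.ones_like(progress), progress])
true_spectra = np.column_stack([static_pattern, 0.1 * product_pattern])
data = true_conc @ true_spectra.T

conc, spectra = mcr_als(data, n_components=2)
relative_error = np.linalg.norm(data - conc @ spectra.T) / np.linalg.norm(data)
```

Without further constraints the resolved factors are subject to scaling and rotational ambiguity, so in a sketch like this `conc` and `spectra` should be compared with the true factors only up to such transformations.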
Microanalysis is typically performed to analyze the near surface of materials. There are many instances where chemical information about the third spatial dimension is essential to the solution of materials analyses. The majority of 3D analyses, however, focus on limited spectral acquisition and/or analysis. For truly comprehensive 3D chemical characterization, 4D spectral images (a complete spectrum from each volume element of a region of a specimen) are needed. Furthermore, a robust statistical method is needed to extract the maximum amount of chemical information from that extremely large amount of data. In this paper, an example of the acquisition and multivariate statistical analysis of 4D (3-spatial and 1-spectral dimension) x-ray spectral images is described. The method of utilizing a single- or dual-beam FIB (without or with SEM) to get at 3D chemistry has been described by others with respect to secondary-ion mass spectrometry. The basic methodology described in those works has been modified for comprehensive x-ray microanalysis in a dual-beam FIB/SEM (FEI Co. DB-235). In brief, the FIB is used to serially section a site-specific region of a sample and then the electron beam is rastered over the exposed surfaces with x-ray spectral images being acquired at each section. All this is performed without rotating or tilting the specimen between FIB cutting and SEM imaging/x-ray spectral image acquisition. The resultant 4D spectral image is then unfolded (number of volume elements by number of channels) and subjected to the same multivariate curve resolution (MCR) approach that has proven successful for the analysis of lower-dimension x-ray spectral images. The TSI data sets can be in excess of 4 Gbytes. This problem has been overcome (for now) and images up to 6 Gbytes have been analyzed in this work. The method for analyzing such large spectral images will be described in this presentation.
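The unfolding step can be illustrated with a short sketch, assuming NumPy and a small synthetic array ordered as (section, row, column, channel). The dimensions and the stand-in abundance values are hypothetical. The point is only that a 4D spectral image reshapes into a two-way matrix of volume elements by channels, and that per-voxel results fold back into volumes.

```python
import numpy as np

# Hypothetical 4D spectral image: 4 sections of 16 x 16 pixels, 128 channels.
n_sections, n_rows, n_cols, n_channels = 4, 16, 16, 128
rng = np.random.default_rng(2)
image_4d = rng.poisson(1.0, size=(n_sections, n_rows, n_cols, n_channels))

# Unfold: every volume element becomes one row holding its full spectrum.
unfolded = image_4d.reshape(-1, n_channels)        # (voxels, channels)

# A factor analysis of `unfolded` would return one abundance value per voxel
# and component; random numbers stand in for that result here.
n_components = 3
abundances = rng.random((unfolded.shape[0], n_components))

# Refold the abundances into one 3D volume per component for rendering.
abundance_volumes = abundances.reshape(n_sections, n_rows, n_cols, n_components)
```

With NumPy's default row-major ordering, the voxel at section `s`, row `r`, column `c` lands in row `(s * n_rows + r) * n_cols + c` of the unfolded matrix.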
A comprehensive 3D chemical analysis was performed on several corrosion specimens of Cu electroplated with various metals. Figure 1A shows the top view of the localized corrosion region prepared for FIB sectioning. The TSI region has been coated with Pt and a trench has been milled along the bottom edge of the region, exposing it to the electron beam as seen in Figure 1B. The TSI consisted of 25 sections and was approximately 6 Gbytes. Figure 1C shows several of the components rendered in 3D: green is Cu; blue is Pb; cyan represents one of the corrosion products that contains Cu, Zn, O, S, and C; and orange represents the other corrosion product with Zn, O, S and C. Figure 1D shows all of the component spectral shapes from the analysis. There is severe pathological overlap of the spectra from Ni, Cu and Zn as well as Pb and S. In spite of this, clean spectral shapes have been extracted from the TSI. This powerful TSI technique could be applied to other sectioning methods as well.
Spectral imaging, where a complete spectrum is collected from each of a series of spatial locations (1D lines, 2D images or 3D volumes), is now available on a wide range of analytical tools, from electron and x-ray to ion beam instruments. With this capability to collect extremely large spectral images comes the need for automated data analysis tools that can rapidly and without bias reduce a large number of raw spectra to a compact, chemically relevant, and easily interpreted representation. It is clear that manual interrogation of individual spectra is impractical even for very small spectral images (< 5000 spectra). More typical spectral images can contain tens of thousands to millions of spectra, which, given the constraint of acquisition time, may contain between 5 and 300 counts per 1000-channel spectrum. Conventional manual approaches to spectral image analysis, such as summing spectra from regions or constructing x-ray maps, are prone to bias and possibly error. One way to comprehensively analyze spectral image data, which has been automated, is to utilize an unsupervised self-modeling multivariate statistical analysis method such as multivariate curve resolution (MCR). This approach has proven capable of solving a wide range of analytical problems based upon the counting of x-rays (SEM/STEM-EDX, XRF, PIXE), electrons (EELS, XPS) and ions (TOF-SIMS). As an example of the MCR approach, a STEM x-ray spectral image from a ZrB2-SiC composite was acquired and analyzed. The data were generated in an FEI Tecnai F30-ST TEM/STEM operated at 300 kV, equipped with an EDAX SUTW x-ray detector. The spectral image was acquired with the TIA software on the STEM at 128 by 128 pixels (12 nm/pixel) for 100 msec dwell per pixel (total acquisition time was 30 minutes) with a probe of approximately the same size as each pixel. Each spectrum in the image had, on average, 500 counts.
The calculation took 5 seconds on a PC workstation with dual 2.4 GHz Pentium IV Xeon processors and 2 Gbytes of RAM and resulted in four chemically relevant components, which are shown in Figure 1. The analysis region was at a triple junction of three ZrB2 grains that contained zirconium oxide, aluminum oxide and a glass phase. The power of unbiased statistical methods, such as MCR as applied here, is that no a priori knowledge of the material's chemistry is required. The algorithms effectively reduced over 16,000 2000-channel spectra (64 Mbytes) to four images and four spectral shapes (72 kbytes), which in this case represent chemical phases. This three-order-of-magnitude compression is achieved rapidly with no loss of chemical information. There is also the potential to correlate multiple analytical techniques, for example EELS and EDS in the STEM, adding the light-element sensitivity and bonding information of EELS to the more comprehensive spectral coverage of EDS.
While hyperspectral imaging systems are increasingly used in remote sensing and offer enhanced scene characterization relative to univariate and multispectral technologies, it has proven difficult in practice to extract all of the useful information from these systems due to overwhelming data volume, confounding atmospheric effects, and the limited a priori knowledge regarding the scene. The need exists for the ability to perform rapid and comprehensive data exploitation of remotely sensed hyperspectral imagery. To address this need, this paper describes the application of a fast and rigorous multivariate curve resolution (MCR) algorithm to remotely sensed thermal infrared hyperspectral images. Employing minimal a priori knowledge, notably non-negativity constraints on the extracted endmember profiles and a constant abundance constraint for the atmospheric upwelling component, it is demonstrated that MCR can successfully compensate thermal infrared hyperspectral images for atmospheric upwelling and, thereby, transmittance effects. We take a semi-synthetic approach to obtaining image data containing gas plumes by adding emission gas signals onto real hyperspectral images. MCR can accurately estimate the relative spectral absorption coefficients and thermal contrast distribution of an ammonia gas plume component added near the minimum detectable quantity.
High-throughput instruments and analysis techniques are required in order to make good use of the genomic sequences that have recently become available for many species, including humans. These instruments and methods must work with tens of thousands of genes simultaneously, and must be able to identify the small subsets of those genes that are implicated in the observed phenotypes, or, for instance, in responses to therapies. Microarrays represent one such high-throughput method and continue to find increasingly broad application. This project has improved microarray technology in several important areas. First, we developed the hyperspectral scanner, which has discovered and diagnosed numerous flaws in techniques broadly employed by microarray researchers. Second, we used a series of statistically designed experiments to identify and correct errors in our microarray data, dramatically improving the accuracy, precision, and repeatability of the microarray gene expression data. Third, our research developed new informatics techniques to identify genes with significantly different expression levels. Finally, natural language processing techniques were applied to improve our ability to make use of online literature annotating the important genes. In combination, this research has improved the reliability and precision of laboratory methods and instruments, while also enabling substantially faster analysis and discovery.
Analytical instrumentation such as time-of-flight secondary ion mass spectrometry (ToF-SIMS) provides a tremendous quantity of data since an entire mass spectrum is saved at each pixel in an ion image. The analyst often selects only a few species for detailed analysis; the majority of the data are not utilized. Researchers at Sandia National Laboratories (SNL) have developed a powerful multivariate statistical analysis (MVSA) toolkit named AXSIA (Automated eXpert Spectrum Image Analysis) that looks for trends in complete datasets (e.g., analyzes the entire mass spectrum at each pixel). A unique feature of the AXSIA toolkit is the generation of intuitive results (e.g., negative peaks are not allowed in the spectral response). The robust statistical process is able to unambiguously identify all of the spectral features uniquely associated with each distinct component throughout the dataset. General Electric and Sandia used AXSIA to analyze raw data files generated on an Ion Tof IV ToF-SIMS instrument. Here, we will show that the MVSA toolkit identified metallic contaminants within a defect in a polymer sample. These metallic contaminants were not identifiable using standard data analysis protocols.
Time-of-flight secondary ion mass spectrometry (TOF-SIMS), by its parallel nature, generates complex and very large datasets quickly and easily. An example of such a large dataset is a spectral image where a complete spectrum is collected for each pixel. Unfortunately, the large size of the data matrix involved makes it difficult to extract the chemical information from the data using traditional techniques. Because time constraints prevent an analysis of every peak, prior knowledge is used to select the most probable and significant peaks for evaluation. However, this approach may lead to a misinterpretation of the system under analysis. Ideally, the complete spectral image would be used to provide a comprehensive, unbiased materials characterization based on full spectral signatures. Automated eXpert spectral image analysis (AXSIA) software developed at Sandia National Laboratories implements a multivariate curve resolution technique that was originally developed for energy dispersive X-ray spectroscopy (EDS) [Microsc. Microanal. 9 (2003) 1]. This paper will demonstrate the application of the method to TOF-SIMS. AXSIA distills complex and very large spectral image datasets into a limited number of physically realizable and easily interpretable chemical components, including both spectra and concentrations. The number of components derived during the analysis represents the minimum number of components needed to completely describe the chemical information in the original dataset. Since full spectral signatures are used to determine each component, an enhanced signal-to-noise ratio is realized. The efficient statistical aggregation of chemical information enables small and unexpected features to be automatically found without user intervention.
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) is capable of generating huge volumes of data. TOF-SIMS spectrum-images, comprising complete mass spectra at each point in a spatial array, are easily acquired with modern instrumentation. With the addition of depth profiling, spectra can be collected from up to three spatial dimensions leading to data sets that are seemingly unlimited in size. Multivariate statistical techniques such as principal component analysis, multivariate curve resolution and other factor analysis methods are being used to meet the challenge of turning that mountain of data into analytically useful knowledge. These methods work by extracting the essential chemical information embedded in the high dimensional data into a limited number of factors that describe the spectrally active pure components present in the sample. A review of the recent literature shows that the mass spectral data are often scaled prior to multivariate analysis. Common preprocessing steps include normalization of the pixel intensities, and auto- or variance-scaling of the mass spectra. In this paper, we will demonstrate that these pretreatments can lead to less than satisfactory results and, in fact, can be counterproductive. By taking the Poisson nature of the data into consideration, however, a scaling method can be devised that is optimal in a maximum likelihood sense. Using a simple and intuitive example, we will demonstrate the superiority of the optimal scaling approach for estimating the number of pure components, for segregating the chemical information into as few components as possible, and for discriminating small features from noise.
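The abstract does not state the scaling formula. The sketch below, assuming NumPy, shows one form commonly described for Poisson-distributed count data: each pixel is weighted by the inverse square root of its mean count (the mean image), and each mass channel by the inverse square root of its mean count (the mean spectrum). The function name `poisson_scale`, the `eps` floor, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def poisson_scale(counts, eps=1e-12):
    """Scale a pixels x channels count matrix by the inverse square roots
    of the mean image and the mean spectrum.
    Returns the scaled matrix and both weight vectors."""
    row_mean = counts.mean(axis=1)     # mean image: one value per pixel
    col_mean = counts.mean(axis=0)     # mean spectrum: one value per channel
    # eps guards against division by zero for empty pixels or channels.
    g = 1.0 / np.sqrt(np.maximum(row_mean, eps))
    h = 1.0 / np.sqrt(np.maximum(col_mean, eps))
    scaled = counts * g[:, None] * h[None, :]
    return scaled, g, h

# Hypothetical two-component spectrum image with Poisson counting noise.
rng = np.random.default_rng(3)
n_pixels, n_channels = 400, 60
abundances = rng.random((n_pixels, 2))
spectra = rng.random((n_channels, 2)) * 5.0
counts = rng.poisson(abundances @ spectra.T).astype(float)

scaled, g, h = poisson_scale(counts)

# Factor analysis would be run on `scaled`; here only the singular values.
singular_values = np.linalg.svd(scaled, compute_uv=False)

# The weights are kept so that results can be mapped back to count units.
restored = scaled / g[:, None] / h[None, :]
```

In a workflow like this, factors estimated from `scaled` would be divided by `g` (spatial) and `h` (spectral) to return them to the original physical units before interpretation.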