As the size and complexity of high performance computing (HPC) systems grow in line with advancements in hardware and software technology, HPC systems increasingly suffer from performance variations due to shared resource contention as well as software- and hardware-related problems. Such performance variations can lead to failures and inefficiencies, which impact the cost and resilience of HPC systems. To minimize the impact of performance variations, one must quickly and accurately detect and diagnose the anomalies that cause them and take mitigating actions. However, it is difficult to identify anomalies based on the voluminous, high-dimensional, and noisy data collected by system monitoring infrastructures. This paper presents a novel machine-learning-based framework to automatically diagnose performance anomalies at runtime. Our framework leverages historical resource usage data to extract signatures of previously observed anomalies. We first convert the collected time series data into easy-to-compute statistical features. We then identify the features that are required to detect anomalies and extract the signatures of those anomalies. At runtime, we use these signatures to diagnose anomalies with negligible overhead. We evaluate our framework through experiments on a real-world HPC supercomputer and demonstrate that our approach successfully identifies 98 percent of injected anomalies and consistently outperforms existing anomaly diagnosis techniques.
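The pipeline described above reduces to a feature-extraction pass over monitoring windows followed by a supervised classifier. The following is a minimal sketch of that pattern; the specific statistical features, the random-forest classifier, and the synthetic data are illustrative assumptions rather than the paper's actual implementation.

```python
# Sketch of the detect-from-statistical-features pattern: reduce each
# time-series window to cheap statistics, then classify. The feature set
# and classifier choice are assumptions, not the paper's pipeline.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier

def extract_features(window: np.ndarray) -> np.ndarray:
    """Reduce one window of resource-usage samples to statistical features."""
    return np.array([
        window.mean(),
        window.std(),
        window.min(),
        window.max(),
        stats.skew(window),
        stats.kurtosis(window),
        np.percentile(window, 95),
    ])

# Offline: learn anomaly signatures from labeled historical windows
# (synthetic stand-ins here; real data would come from system telemetry).
rng = np.random.default_rng(0)
healthy = [rng.normal(0.5, 0.05, 256) for _ in range(200)]
anomalous = [rng.normal(0.5, 0.05, 256) + rng.uniform(0.2, 0.4) for _ in range(200)]
X = np.array([extract_features(w) for w in healthy + anomalous])
y = np.array([0] * len(healthy) + [1] * len(anomalous))
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Online: diagnosing a new window costs one feature pass plus one forest
# lookup per monitoring interval, i.e., negligible overhead.
new_window = rng.normal(0.8, 0.05, 256)
label = clf.predict(extract_features(new_window).reshape(1, -1))[0]
print("anomalous" if label else "healthy")
```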
Vineyard, Craig M.; Dellana, Ryan; Aimone, James B.; Rothganger, Fredrick R.; Severa, William M.
With the successes deep neural networks have achieved across a range of applications, researchers have been exploring computational architectures to execute their operation more efficiently. In addition to the prevalent role of graphics processing units (GPUs), many accelerator architectures have emerged. Neuromorphic computing is one such approach, which takes inspiration from the brain to guide the computational principles of the architecture, with varying levels of biological realism. In this paper we present results on using the SpiNNaker neuromorphic platform (48-chip model) for deep learning neural network inference. We use the Whetstone spiking deep learning library, developed at Sandia National Laboratories, to train deep multi-layer perceptrons and convolutional neural networks suitable for the spiking substrate on the neural hardware architecture. By exploiting the massively parallel nature of SpiNNaker, we are able to achieve, under certain network topologies, substantial network tiling and consequently impressive inference throughput. Such high-throughput systems may eventually find use in remote sensing, where large images need to be chipped, scanned, and processed quickly. Additionally, we explore complex topologies that push the limits of the SpiNNaker routing hardware and investigate how that impacts the mapping of software-implemented networks to on-hardware instantiations.
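Whetstone's central idea is to train with conventional backpropagation while gradually "sharpening" bounded activations toward binary step functions, so that the trained weights transfer to a spiking substrate. The following is a generic Keras re-expression of that sharpening idea; the activation form, sharpening schedule, and layer sizes are assumptions for illustration and do not reflect Whetstone's actual API.

```python
# Generic sketch of activation sharpening for spiking deployment: a bounded
# ramp is steepened over training until it approximates a 0/1 step, i.e., a
# spike/no-spike rule. Not Whetstone's API; an illustration of the concept.
import tensorflow as tf
from tensorflow import keras

sharpness = tf.Variable(1.0, trainable=False)  # 1.0 = soft ramp; large = step

def sharpening_activation(x):
    # Bounded ramp centered at 0.5 that narrows as `sharpness` grows.
    return tf.clip_by_value(sharpness * (x - 0.5) + 0.5, 0.0, 1.0)

class Sharpen(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        sharpness.assign(sharpness * 1.5)  # schedule is an assumption

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation=sharpening_activation),
    keras.layers.Dense(10, activation=sharpening_activation),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train_onehot, epochs=10, callbacks=[Sharpen()])
```

Once the activations have converged to near-binary behavior, each unit communicates a single bit per evaluation, which is what makes the network mappable onto event-driven hardware such as SpiNNaker.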
New manufacturing technologies such as additive manufacturing require research and development to minimize the uncertainties in the produced parts. The research involves experimental measurements and large simulations, which result in huge quantities of data to store and analyze. We address this challenge by alleviating the data storage requirements using lossy data compression. We select wavelet bases as the mathematical tool for compression. Unlike images, additive manufacturing data is often represented on irregular geometries and unstructured meshes. Thus, we use Alpert tree-wavelets as bases for our data compression method. We first analyze different basis functions for the wavelets and find the one that results in maximal compression and minimal error in the reconstructed data. We then devise a new adaptive thresholding method that is data-agnostic and allows a priori estimation of the reconstruction error. Finally, we propose metrics to quantify the global and local errors in the reconstructed data. One of the error metrics addresses the preservation of physical constraints in reconstructed data fields, such as a divergence-free stress field in structural simulations. While our compression and decompression method is general, we apply it to both experimental and computational data obtained from measurements and thermal/structural modeling of the sintering of a hollow cylinder from metal powders using a Laser Engineered Net Shape process. The results show that monomials achieve optimal compression performance when used as wavelet bases. The new thresholding method results in compression ratios that are two to seven times larger than the ones obtained with commonly used thresholds. Overall, adaptive Alpert tree-wavelets can achieve compression ratios between one and three orders of magnitude, depending on which features in the data must be preserved. These results show that Alpert tree-wavelet compression is a viable and promising technique to reduce the size of large data structures found in both experiments and simulations.
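The transform-threshold-reconstruct workflow at the core of this approach can be illustrated on a regular 1-D signal with standard wavelets. The sketch below uses PyWavelets with a hard threshold; Alpert tree-wavelets on unstructured meshes are not available in standard libraries, so the basis, threshold rule, and surrogate signal here are illustrative assumptions, not the paper's method.

```python
# Threshold-based wavelet compression on a regular 1-D signal: transform,
# zero out small coefficients, reconstruct, and report the ratio/error
# trade-off. Illustrates the workflow, not Alpert tree-wavelets.
import numpy as np
import pywt

signal = np.cumsum(np.random.default_rng(0).normal(size=4096))  # surrogate field

coeffs = pywt.wavedec(signal, "db4", level=6)       # forward transform
flat = np.concatenate([c.ravel() for c in coeffs])
tau = 0.05 * np.abs(flat).max()                     # illustrative threshold

kept = 0
thresholded = []
for c in coeffs:
    tc = pywt.threshold(c, tau, mode="hard")        # zero small coefficients
    kept += np.count_nonzero(tc)
    thresholded.append(tc)

recon = pywt.waverec(thresholded, "db4")[: len(signal)]
ratio = flat.size / max(kept, 1)
err = np.linalg.norm(signal - recon) / np.linalg.norm(signal)
print(f"compression ratio ~{ratio:.1f}x, relative L2 error {err:.2e}")
```

Raising the threshold zeros more coefficients and increases the compression ratio at the cost of reconstruction error; the paper's adaptive thresholding chooses this trade-off so that the error can be estimated a priori.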
In a previous study, we described a new abstract circuit model for reversible computation called Asynchronous Ballistic Reversible Computing (ABRC), in which localized information-bearing pulses propagate ballistically along signal paths between stateful abstract devices and elastically scatter off those devices serially, while updating the device state in a logically reversible and deterministic fashion. The ABRC model has been shown to be capable of universal computation. In the research reported here, we begin exploring how the ABRC model might be realized in practice using single flux quantum (SFQ) solitons (fluxons) in superconducting Josephson junction (JJ) circuits. One natural family of realizations could utilize fluxon polarity to represent binary data in individual pulses propagating near-ballistically along discrete or continuous long Josephson junctions (LJJs) or microstrip passive transmission lines (PTLs), and utilize the flux charge (−1, 0, or +1) of a JJ-containing superconducting loop with Φ0 < IcL < 2Φ0 to encode a ternary state variable internal to a device. A natural question then arises as to which of the definable abstract ABRC device functionalities using this data representation might be implementable using a JJ circuit that dissipates only a small fraction of the input fluxon energy. We discuss conservation rules and symmetries considered as constraints to be obeyed in these circuits, and begin the process of classifying the possible ABRC devices in this family having up to 3 bidirectional I/O terminals and up to 3 internal states.
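To make the classification task concrete, consider a drastically simplified instance: a single bidirectional terminal, binary pulse polarity, and a ternary internal state. Determinism plus logical reversibility make each candidate device a bijection on the (polarity, state) configurations, and conservation rules prune the list. The toy enumeration below is a sketch under those assumptions, not the paper's actual classification; the flux-conservation rule it imposes is one illustrative constraint.

```python
# Toy enumeration for a simplified ABRC-style device: one terminal, fluxon
# polarity p in {-1,+1}, ternary internal flux state s in {-1,0,+1}. A
# deterministic, logically reversible device is a bijection on (p, s)
# configurations; here we also count maps conserving total flux p + s.
from itertools import permutations

configs = [(p, s) for p in (-1, +1) for s in (-1, 0, +1)]

def conserves_flux(mapping):
    return all(p + s == q + t for (p, s), (q, t) in mapping.items())

reversible = 0
flux_conserving = 0
for image in permutations(configs):       # bijections = reversible maps
    mapping = dict(zip(configs, image))
    reversible += 1
    if conserves_flux(mapping):
        flux_conserving += 1

print(f"{reversible} reversible maps, {flux_conserving} conserve total flux")
```

Even in this stripped-down setting the constraint is severe (2 of 720 bijections survive), which suggests why conservation rules and symmetries are useful organizing principles for the full classification over multiple terminals and states.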
Computational fluid dynamics (CFD)-based wear predictions are computationally expensive to evaluate, even with a high-performance computing infrastructure. Thus, it is difficult to provide accurate local wear predictions in a timely manner. Data-driven approaches provide a more computationally efficient way to approximate the CFD wear predictions without running the actual CFD wear models. In this paper, a machine learning (ML) approach, termed WearGP, is presented to approximate 3D local wear predictions, using numerical wear predictions from steady-state CFD simulations as training and testing datasets. The proposed framework is built on Gaussian processes (GPs) and used to predict wear in a much shorter time. The WearGP framework can be segmented into three stages. In the first stage, the training dataset is built using a number of CFD simulations on the order of 10². In the second stage, the data cleansing and data mining processes are performed, where the nodal wear solutions are extracted from the solution database to build a training dataset. In the third stage, wear predictions are made using the trained GP models. Two CFD case studies, a 3D slurry pump impeller and a casing, are used to demonstrate the WearGP framework, in which 144 training and 40 testing data points are used to train and test the proposed method, respectively. The numerical accuracy, computational efficiency, and effectiveness of the WearGP framework are compared against the CFD wear model for both slurry pump impellers and casings. It is shown that the WearGP framework can achieve highly accurate results comparable with the CFD results using a relatively small training dataset, while reducing computational time by a factor on the order of 10⁵ to 10⁶.
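The surrogate pattern behind WearGP (train a Gaussian process on a modest number of expensive CFD evaluations, then predict in milliseconds) can be sketched as follows. The input dimensionality, the RBF kernel, and the synthetic stand-in for the CFD solver are illustrative assumptions, not the paper's setup.

```python
# GP surrogate sketch: fit on ~O(10^2) expensive evaluations, then predict
# cheaply with uncertainty. The kernel and toy "CFD" function are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

def expensive_cfd_wear(x):
    # Stand-in for a steady-state CFD wear solve (hours per evaluation).
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1]) + 0.01 * rng.normal(size=len(x))

X_train = rng.uniform(0, 1, size=(144, 2))   # 144 training points, as in the paper
y_train = expensive_cfd_wear(X_train)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

X_test = rng.uniform(0, 1, size=(40, 2))     # 40 held-out points
mean, std = gp.predict(X_test, return_std=True)  # prediction + uncertainty
print(f"max predictive std: {std.max():.3f}")
```

A practical benefit of the GP choice is the predictive standard deviation, which indicates where additional CFD training runs would most improve the surrogate.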
Even as today's most prominent spin-based qubit technologies are maturing in terms of capability and sophistication, there is growing interest in exploring alternate material platforms that may provide advantages such as enhanced qubit control, longer coherence times, and improved extensibility. Recent advances in heterostructure material growth have opened new possibilities for employing hole spins in semiconductors for qubit applications. Undoped, strained Ge/SiGe quantum wells are promising candidate hosts for hole spin-based qubits due to their low disorder, large intrinsic spin-orbit coupling strength, and absence of valley states. Here, we use a simple one-layer gated device structure to demonstrate both a single quantum dot and coupling between two adjacent quantum dots. The hole effective mass in these undoped structures, m∗ ∼ 0.08 m0, is significantly lower than that of electrons in Si/SiGe, pointing to the possibility of enhanced tunnel couplings in quantum dots and favorable qubit-qubit interactions in an industry-compatible semiconductor platform.
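The link between a lighter effective mass and stronger tunnel coupling can be made quantitative with a simple WKB estimate, in which the coupling is suppressed as exp(−√(2m∗V)·d/ħ) for a barrier of height V and width d. The numbers below are assumed purely for illustration (a 5 meV, 50 nm interdot barrier, and the commonly quoted transverse effective mass of 0.19 m0 for electrons in Si/SiGe); they indicate the direction and rough scale of the effect, not a device-specific prediction.

```python
# WKB back-of-the-envelope: lighter effective mass -> weaker exponential
# suppression of interdot tunneling. Barrier parameters are assumed values.
import numpy as np

hbar = 1.054571817e-34          # J s
m0 = 9.1093837015e-31           # kg
eV = 1.602176634e-19            # J

V = 5e-3 * eV                   # assumed 5 meV interdot barrier height
d = 50e-9                       # assumed 50 nm barrier width

def wkb_suppression(m_eff):
    kappa = np.sqrt(2 * m_eff * V) / hbar   # decay constant under the barrier
    return np.exp(-kappa * d)

holes_ge = wkb_suppression(0.08 * m0)       # holes in strained Ge/SiGe
electrons_si = wkb_suppression(0.19 * m0)   # electrons in Si/SiGe
print(f"tunneling factor ratio (Ge holes / Si electrons): "
      f"{holes_ge / electrons_si:.1f}")
```

With these assumed barrier parameters the ratio is roughly an order of magnitude, consistent with the abstract's expectation of enhanced tunnel couplings in the lighter-mass hole system.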