Publications

Results 2301–2400 of 9,998


Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning

IEEE Transactions on Parallel and Distributed Systems

Tuncer, Ozan; Ates, Emre; Zhang, Yijia; Turk, Ata; Brandt, James M.; Leung, Vitus J.; Egele, Manuel; Coskun, Ayse K.

As the size and complexity of high performance computing (HPC) systems grow in line with advancements in hardware and software technology, HPC systems increasingly suffer from performance variations due to shared resource contention as well as software- and hardware-related problems. Such performance variations can lead to failures and inefficiencies, which impact the cost and resilience of HPC systems. To minimize the impact of performance variations, one must quickly and accurately detect and diagnose the anomalies that cause the variations and take mitigating actions. However, it is difficult to identify anomalies based on the voluminous, high-dimensional, and noisy data collected by system monitoring infrastructures. This paper presents a novel machine learning-based framework to automatically diagnose performance anomalies at runtime. Our framework leverages historical resource usage data to extract signatures of previously observed anomalies. We first convert collected time series data into easy-to-compute statistical features. We then identify the features that are required to detect anomalies, and extract the signatures of these anomalies. At runtime, we use these signatures to diagnose anomalies with negligible overhead. We evaluate our framework using experiments on a real-world HPC supercomputer and demonstrate that our approach successfully identifies 98 percent of injected anomalies and consistently outperforms existing anomaly diagnosis techniques.
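The pipeline the abstract describes, windowed telemetry reduced to cheap statistical features and then classified against signatures of known anomalies, can be sketched roughly as below. The specific feature set, the random-forest classifier, and the synthetic telemetry are illustrative assumptions, not the authors' actual design:

```python
# Sketch of feature-based anomaly diagnosis on monitoring time series.
# Assumed/illustrative: the six statistics, the random forest, and the
# synthetic "CPU-contention-like drift" anomaly.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(window: np.ndarray) -> np.ndarray:
    """Collapse one metric's time-series window into cheap statistics."""
    return np.array([
        window.mean(), window.std(), window.min(), window.max(),
        np.percentile(window, 5), np.percentile(window, 95),
    ])

rng = np.random.default_rng(0)
# Synthetic telemetry: healthy windows vs. windows with an injected drift.
healthy = [rng.normal(0.3, 0.05, 256) for _ in range(200)]
anomalous = [rng.normal(0.3, 0.05, 256) + np.linspace(0, 0.4, 256)
             for _ in range(200)]

X = np.array([window_features(w) for w in healthy + anomalous])
y = np.array([0] * 200 + [1] * 200)   # 0 = healthy, 1 = anomaly signature

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# At "runtime", diagnosing a fresh window costs one feature pass + predict.
probe = window_features(rng.normal(0.3, 0.05, 256) + np.linspace(0, 0.4, 256))
print(clf.predict(probe.reshape(1, -1))[0])
```

Feature extraction is the key to the negligible runtime overhead the abstract claims: classification happens on a handful of statistics per metric rather than on the raw high-dimensional series.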

Low-Power Deep Learning Inference using the SpiNNaker Neuromorphic Platform

ACM International Conference Proceeding Series

Vineyard, Craig M.; Dellana, Ryan; Aimone, James B.; Rothganger, Fredrick R.; Severa, William M.

With the successes deep neural networks have achieved across a range of applications, researchers have been exploring computational architectures to more efficiently execute their operation. In addition to the prevalent role of graphics processing units (GPUs), many accelerator architectures have emerged. Neuromorphic computing is one such approach, taking inspiration from the brain to guide the computational principles of the architecture, with varying levels of biological realism. In this paper we present results on using the SpiNNaker neuromorphic platform (48-chip model) for deep learning neural network inference. We use the Whetstone spiking deep learning library, developed at Sandia National Laboratories, to train deep multi-layer perceptrons and convolutional neural networks suitable for the spiking substrate on the neural hardware architecture. By using the massively parallel nature of SpiNNaker, we are able to achieve, under certain network topologies, substantial network tiling and consequently impressive inference throughput. Such high-throughput systems may have eventual application in remote sensing applications where large images need to be chipped, scanned, and processed quickly. Additionally, we explore complex topologies that push the limits of the SpiNNaker routing hardware and investigate how that impacts mapping software-implemented networks to on-hardware instantiations.

Adaptive wavelet compression of large additive manufacturing experimental and simulation datasets

Computational Mechanics

Salloum, Maher S.; Johnson, Kyle J.; Bishop, Joseph E.; Aytac, Jon M.; Dagel, Daryl D.; van Bloemen Waanders, Bart G.

New manufacturing technologies such as additive manufacturing require research and development to minimize the uncertainties in the produced parts. The research involves experimental measurements and large simulations, which result in huge quantities of data to store and analyze. We address this challenge by alleviating the data storage requirements using lossy data compression. We select wavelet bases as the mathematical tool for compression. Unlike images, additive manufacturing data is often represented on irregular geometries and unstructured meshes. Thus, we use Alpert tree-wavelets as bases for our data compression method. We first analyze different basis functions for the wavelets and find the one that results in maximal compression and minimal error in the reconstructed data. We then devise a new adaptive thresholding method that is data-agnostic and allows a priori estimation of the reconstruction error. Finally, we propose metrics to quantify the global and local errors in the reconstructed data. One of the error metrics addresses the preservation of physical constraints in reconstructed data fields, such as a divergence-free stress field in structural simulations. While our compression and decompression method is general, we apply it to both experimental and computational data obtained from measurements and thermal/structural modeling of the sintering of a hollow cylinder from metal powders using a Laser Engineered Net Shape process. The results show that monomials achieve optimal compression performance when used as wavelet bases. The new thresholding method results in compression ratios that are two to seven times larger than the ones obtained with commonly used thresholds. Overall, adaptive Alpert tree-wavelets can achieve compression ratios between one and three orders of magnitude depending on which features in the data must be preserved.
These results show that Alpert tree-wavelet compression is a viable and promising technique to reduce the size of large data structures found in both experiments and simulations.
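The compress-by-thresholding idea at the heart of this abstract can be illustrated on a 1-D signal with a plain orthonormal Haar wavelet; this is a simplified stand-in, since the paper uses Alpert tree-wavelets on unstructured meshes and an adaptive, data-agnostic threshold rather than the hand-picked one below:

```python
# Toy wavelet compression: transform, zero out small coefficients,
# reconstruct, and measure compression ratio and error.
# Assumed/illustrative: Haar basis, 1-D sine "field", fixed threshold.
import numpy as np

def haar_forward(x):
    """Full orthonormal Haar decomposition of a length-2^k signal."""
    x = x.astype(float).copy()
    coeffs = []
    while len(x) > 1:
        coeffs.append((x[0::2] - x[1::2]) / np.sqrt(2))  # details
        x = (x[0::2] + x[1::2]) / np.sqrt(2)             # averages
    coeffs.append(x)                                     # coarsest average
    return coeffs

def haar_inverse(coeffs):
    x = coeffs[-1].copy()
    for diff in reversed(coeffs[:-1]):
        out = np.empty(2 * len(x))
        out[0::2] = (x + diff) / np.sqrt(2)
        out[1::2] = (x - diff) / np.sqrt(2)
        x = out
    return x

signal = np.sin(np.linspace(0, 4 * np.pi, 1024))  # smooth field surrogate
coeffs = haar_forward(signal)
threshold = 0.02                                  # hand-picked for this toy
kept = [np.where(np.abs(c) >= threshold, c, 0.0) for c in coeffs]
recon = haar_inverse(kept)

kept_count = sum(int(np.count_nonzero(c)) for c in kept)
ratio = signal.size / kept_count
err = np.abs(recon - signal).max()
print(f"compression ratio ~{ratio:.1f}x, max reconstruction error {err:.2e}")
```

Smooth fields concentrate their energy in a few coarse-scale coefficients, which is why thresholding the fine-scale details buys large compression ratios at small reconstruction error; the paper's contribution is choosing that threshold adaptively with an a priori error bound.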

Asynchronous Ballistic Reversible Fluxon Logic

IEEE Transactions on Applied Superconductivity

Frank, Michael P.; Lewis, Rupert; Missert, Nancy A.; Wolak, Matthaeus W.; Henry, Michael D.

In a previous study, we described a new abstract circuit model for reversible computation called Asynchronous Ballistic Reversible Computing (ABRC), in which localized information-bearing pulses propagate ballistically along signal paths between stateful abstract devices, and elastically scatter off those devices serially, while updating the device state in a logically reversible and deterministic fashion. The ABRC model has been shown to be capable of universal computation. In the research reported here, we begin exploring how the ABRC model might be realized in practice using single flux quantum (SFQ) solitons (fluxons) in superconducting Josephson junction (JJ) circuits. One natural family of realizations could utilize fluxon polarity to represent binary data in individual pulses propagating near-ballistically along discrete or continuous long Josephson junctions (LJJs) or microstrip passive transmission lines (PTLs), and utilize the flux charge (-1, 0, +1) of a JJ-containing superconducting loop with Φ0 < IcL < 2Φ0 to encode a ternary state variable internal to a device. A natural question then arises as to which of the definable abstract ABRC device functionalities using this data representation might be implementable using a JJ circuit that dissipates only a small fraction of the input fluxon energy. We discuss conservation rules and symmetries considered as constraints to be obeyed in these circuits, and begin the process of classifying the possible ABRC devices in this family having up to 3 bidirectional I/O terminals and up to 3 internal states.

WearGP: A computationally efficient machine learning framework for local erosive wear predictions via nodal Gaussian processes

Wear

Laros, James H.; Furlan, John M.; Pagalthivarthi, Krishnan V.; Visintainer, Robert J.; Wildey, Timothy M.; Wang, Yan

Computational fluid dynamics (CFD)-based wear predictions are computationally expensive to evaluate, even with a high-performance computing infrastructure. Thus, it is difficult to provide accurate local wear predictions in a timely manner. Data-driven approaches provide a more computationally efficient way to approximate the CFD wear predictions without running the actual CFD wear models. In this paper, a machine learning (ML) approach, termed WearGP, is presented to approximate the 3D local wear predictions, using numerical wear predictions from steady-state CFD simulations as training and testing datasets. The proposed framework is built on Gaussian processes (GPs) and utilized to predict wear in a much shorter time. The WearGP framework can be segmented into three stages. At the first stage, the training dataset is built by using a number of CFD simulations on the order of O(10²). At the second stage, the data cleansing and data mining processes are performed, where the nodal wear solutions are extracted from the solution database to build a training dataset. At the third stage, the wear predictions are made, using trained GP models. Two CFD case studies, a 3D slurry pump impeller and a casing, are used to demonstrate the WearGP framework, in which 144 training and 40 testing data points are used to train and test the proposed method, respectively. The numerical accuracy, computational efficiency, and effectiveness of the WearGP framework and the CFD wear model are compared for both slurry pump impellers and casings. It is shown that the WearGP framework can achieve highly accurate results that are comparable with the CFD results, with a relatively small training dataset, at a computational time reduction on the order of 10⁵ to 10⁶.
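The surrogate idea in this abstract, fitting Gaussian processes to nodal wear values from a modest number of expensive simulations and then predicting new operating points cheaply, can be sketched as follows. The toy "wear" function, the input parameterization, and the kernel settings are illustrative assumptions standing in for the CFD training data:

```python
# Sketch of a GP surrogate for nodal wear fields.
# Assumed/illustrative: 2 operating-condition inputs, a synthetic
# smooth wear field, an RBF kernel with hand-set length scale.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
n_train, n_nodes = 144, 50                      # mirrors the paper's 144 runs
X_train = rng.uniform(0.0, 1.0, (n_train, 2))   # e.g. flow rate, concentration

def fake_cfd_wear(x):
    """Stand-in for a steady-state CFD wear solve (one value per node)."""
    nodes = np.linspace(0, 1, n_nodes)
    return np.outer(x[:, 0], nodes) + np.sin(3 * x[:, [1]]) * (1 - nodes)

Y_train = fake_cfd_wear(X_train)                # shape (n_train, n_nodes)

# sklearn's GP handles multi-output targets, i.e. all nodes at once.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-6)
gp.fit(X_train, Y_train)

X_test = rng.uniform(0.0, 1.0, (40, 2))         # mirrors the 40 test points
Y_pred = gp.predict(X_test)                     # milliseconds, not CFD hours
rel_err = np.abs(Y_pred - fake_cfd_wear(X_test)).max() / np.abs(Y_train).max()
print(f"max relative error on held-out conditions: {rel_err:.3f}")
```

The speedup the abstract reports comes from exactly this substitution: once trained, each prediction is a small kernel-matrix computation instead of a full steady-state CFD solve.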

Single and double hole quantum dots in strained Ge/SiGe quantum wells

Nanotechnology

Hardy, Will H.; Harris, Charles T.; Su, Yi H.; Chuang, Yen; Moussa, Jonathan; Maurer, Leon M.; Li, Jiun Y.; Lu, Tzu-Ming L.; Luhman, Dwight R.

Even as today's most prominent spin-based qubit technologies are maturing in terms of capability and sophistication, there is growing interest in exploring alternate material platforms that may provide advantages, such as enhanced qubit control, longer coherence times, and improved extensibility. Recent advances in heterostructure material growth have opened new possibilities for employing hole spins in semiconductors for qubit applications. Undoped, strained Ge/SiGe quantum wells are promising candidate hosts for hole spin-based qubits due to their low disorder, large intrinsic spin-orbit coupling strength, and absence of valley states. Here, we use a simple one-layer gated device structure to demonstrate both a single quantum dot as well as coupling between two adjacent quantum dots. The hole effective mass in these undoped structures, m* ∼ 0.08 m₀, is significantly lower than for electrons in Si/SiGe, pointing to the possibility of enhanced tunnel couplings in quantum dots and favorable qubit-qubit interactions in an industry-compatible semiconductor platform.

Global Solution Strategies for the Network-Constrained Unit Commitment Problem with AC Transmission Constraints

IEEE Transactions on Power Systems

Castillo, Anya; Watson, Jean-Paul W.; Laird, Carl D.

We propose a novel global solution algorithm for the network-constrained unit commitment problem that incorporates a nonlinear alternating current (ac) model of the transmission network, which is a nonconvex mixed-integer nonlinear programming problem. Our algorithm is based on the multi-tree global optimization methodology, which iterates between a mixed-integer lower-bounding problem and a nonlinear upper-bounding problem. We exploit the mathematical structure of the unit commitment problem with ac power flow constraints and leverage second-order cone relaxations, piecewise outer approximations, and optimization-based bounds tightening to provide a globally optimal solution at convergence. Numerical results on four benchmark problems illustrate the effectiveness of our algorithm, both in terms of convergence rate and solution quality.

Small scale to extreme: Methods for characterizing energy efficiency in supercomputing applications

Sustainable Computing: Informatics and Systems

Younge, Andrew J.

Power measurement capabilities are becoming commonplace on large scale HPC system deployments. There exist several different approaches to providing power measurements that are used today, primarily in-band and out-of-band measurements. Both of these fundamental techniques can be augmented with application-level profiling, and the combination of different techniques is also possible. However, it can be difficult to assess the type and detail of measurement needed to obtain insights and knowledge of the power profile of an application. In addition, the heterogeneity of modern hybrid supercomputing platforms requires that different CPU architectures must be examined as well. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three relevant HPC mini-applications are analyzed across systems of multicore and manycore nodes to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that a combination of out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. Furthermore, we confirm the energy and power profile of these mini-applications at extreme scale on the Trinity supercomputer. This finding validates the extrapolation of the power profiling techniques from a testbed scale of just several dozen nodes to extreme-scale petaflops supercomputing systems, and we provide a set of recommendations on how to best profile future HPC workloads.
