MFiX, a general-purpose Fortran-based suite, simulates the complex flow in fluidized bed applications using BiCGStab and GMRES methods along with plane relaxation preconditioners. Trilinos, an object-oriented framework, contains various first- and second-generation Krylov subspace solvers and preconditioners. Because MFiX lacks such advanced linear methods, we developed a framework that integrates MFiX with Trilinos, giving MFiX access to the advanced linear solvers and preconditioners in Trilinos; the integrated solver is hereafter called MFiX–Trilinos. In the present work, we study the performance of variants of the GMRES and CGS methods in MFiX–Trilinos and of the BiCGStab and GMRES solvers in MFiX for a 3D gas–solid fluidized bed problem. Two right preconditioners, Jacobi and smoothed aggregation, are employed with the various solvers in MFiX–Trilinos. The flow predicted by MFiX–Trilinos is validated against that from MFiX for the BiCGStab and GMRES methods. The effect of preconditioning on the iterative solvers in MFiX–Trilinos is also analyzed, as is the effect of left versus right smoothed aggregation preconditioning. Finally, the performance of the first- and second-generation solver stacks in MFiX–Trilinos is studied for two different problem sizes.
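For reference, the left/right distinction studied here follows the standard definitions (generic notation, not specific to MFiX–Trilinos): right preconditioning applies $M^{-1}$ on the solution side and leaves the residual monitored by the Krylov iteration unchanged, while left preconditioning transforms the residual itself:

$$A M^{-1} y = b, \quad x = M^{-1} y \quad \text{(right)}, \qquad M^{-1} A x = M^{-1} b \quad \text{(left)}.$$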
The search for multivariate quadrature rules of minimal size with a specified polynomial accuracy has been the topic of many years of research. Finding such a rule allows accurate integration of moments, which play a central role in many aspects of scientific computing with complex models. The contribution of this paper is twofold. First, we provide novel mathematical analysis of the polynomial quadrature problem, yielding a lower bound on the minimal possible number of nodes in a polynomial rule with specified accuracy. We give concrete but simple multivariate examples where a minimal quadrature rule can be designed that achieves this lower bound, along with situations that show when achieving it is not possible. Our second contribution is the formulation of an algorithm that efficiently generates multivariate quadrature rules with positive weights on non-tensorial domains. Our tests show the success of this procedure in up to 20 dimensions. We test our method on applications to dimension reduction and chemical kinetics problems, including comparisons against popular alternatives such as sparse grids, Monte Carlo and quasi-Monte Carlo sequences, and Stroud rules. The quadrature rules computed in this paper outperform these alternatives in almost all scenarios.
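In generic notation (the domain $\Gamma$, weight $w$, and polynomial space $\Pi$ here are placeholders, not the paper's specific choices), a positive quadrature rule of size $M$ exact on $\Pi$ satisfies

$$\int_{\Gamma} p(x)\, w(x)\, dx \;=\; \sum_{i=1}^{M} w_i\, p(x_i), \qquad w_i > 0, \quad \text{for all } p \in \Pi,$$

and the question addressed above is how small $M$ can be for a given $\Pi$.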
Efficiently performing predictive studies of irradiated particle-laden turbulent flows has the potential to provide significant contributions toward better understanding and optimizing, for example, concentrated solar power systems. As there are many uncertainties inherent in such flows, conducting uncertainty quantification analyses is fundamental to improving the predictive capabilities of the numerical simulations. For large-scale, multi-physics problems exhibiting high-dimensional uncertainty, characterizing the stochastic solution presents a significant computational challenge, as many methods require a large number of high-fidelity forward model solves. This can demand an infeasible number of simulations when a typical converged high-fidelity simulation requires intensive computational resources. To reduce the cost of quantifying high-dimensional uncertainties, we investigate the application of a non-intrusive, bi-fidelity approximation to estimate statistics of quantities of interest (QoIs) associated with an irradiated particle-laden turbulent flow. This method exploits the low-rank structure of the solution to accelerate the stochastic sampling and approximation processes by means of cheaper-to-run, lower-fidelity representations. The bi-fidelity approximation yields accurate estimates of the QoI statistics while requiring a small number of high-fidelity model evaluations. It also enables efficient sensitivity analyses, which highlight that epistemic uncertainty plays an important role in the solution of irradiated, particle-laden turbulent flow.
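As a sketch of how a typical bi-fidelity approximation operates (the precise construction may differ from the one used above): inexpensive low-fidelity solves at many parameter samples $\xi$ identify a small set of $r$ informative samples $\xi_{i_1},\dots,\xi_{i_r}$ and interpolation coefficients $c_k(\xi)$; the high-fidelity solution is then approximated using high-fidelity solves only at those samples:

$$u_H(\xi) \;\approx\; \sum_{k=1}^{r} c_k(\xi)\, u_H(\xi_{i_k}).$$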
The classical problem of calculating the volume of the union of d-dimensional balls is known as "Union Volume." We present line-sampling approximation algorithms for Union Volume. Our methods may be extended to other Boolean operations, such as set difference, or to other shapes, such as hyper-rectangles. The deterministic, exact approaches for Union Volume do not scale well to high dimensions, but we adapt several of them into approximation algorithms based on sampling. We perform local sampling within each ball using lines. We have several variations, depending on how the overlapping volume is partitioned and on whether radial, axis-aligned, or other line patterns are used. Our variations fall within the family of Monte Carlo sampling and hence have about the same theoretical convergence rate, $1/\sqrt{M}$, where M is the number of samples. In our limited experiments, line sampling proved more accurate per unit work than point sampling, because a line sample provides more information, and the analytic equation for a sphere makes the calculation almost as fast. We performed a limited empirical study of the efficiency of these variations and suggest a more extensive study for future work. We speculate that different ball arrangements, differentiated by the distribution of overlaps in terms of volume and degree, will benefit the most from patterns of line samples that preferentially capture those overlaps.
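The analytic sphere calculation mentioned above can be made concrete with a minimal sketch (our own illustration, not code from the paper): the length of a line's chord through a ball follows in closed form from the quadratic equation of the sphere, so each line sample yields exact in-ball lengths at roughly the cost of a point sample.

    import numpy as np

    def chord_length(p, d, center, radius):
        # Length of the intersection of the line p + t*d (with |d| = 1)
        # and the ball of the given center and radius; 0 if they miss.
        v = center - p
        t0 = np.dot(v, d)                  # parameter of the closest point
        dist2 = np.dot(v, v) - t0 * t0     # squared distance, center to line
        if dist2 >= radius * radius:
            return 0.0
        return 2.0 * np.sqrt(radius * radius - dist2)

Averaging such closed-form lengths over M random lines gives a Monte Carlo estimate with the $1/\sqrt{M}$ rate noted above, while extracting more information per sample than a single point provides.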
Will quantum computation become an important milestone in human progress? Passionate advocates and equally passionate skeptics abound. IEEE already provides useful, neutral forums for state-of-the-art science and engineering knowledge as well as practical benchmarks for evaluating quantum computation. But could the organization do more?
This report documents the completion of milestone STPM12-4, Kokkos Training Bootcamp. The goal of this milestone was to hold a combined tutorial and hackathon bootcamp event for the Kokkos community and prospective users. The Kokkos Bootcamp was held on-site at Oak Ridge National Lab from July 24 to July 27, 2018. There were over 40 registered participants from 12 institutions, including 7 Kokkos project staff from SNL, LANL, and ORNL. The event consisted of roughly a two-day tutorial session with hands-on exercises, followed by 1.5 days of intensive work in which participants, assisted by Kokkos project experts, explored, ported, and optimized their own codes using Kokkos.
We demonstrate that gas-kinetic methods incorporating molecular chaos can simulate the sustained turbulence that occurs in wall-bounded turbulent shear flows. The direct simulation Monte Carlo method, a gas-kinetic molecular method that enforces molecular chaos for gas-molecule collisions, is used to simulate the minimal Couette flow at Re = 500. The resulting law of the wall, the average wall shear stress, the average kinetic energy, and the continually regenerating coherent structures all agree closely with corresponding results from direct numerical simulation of the Navier-Stokes equations. These results indicate that molecular chaos for collisions in gas-kinetic methods does not prevent the development of the molecular-scale long-range correlations required to form hydrodynamic-scale turbulent coherent structures.
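The law of the wall referenced above is the classical logarithmic profile, quoted here in its textbook form with the commonly used constants $\kappa \approx 0.41$ and $B \approx 5$ (which are not taken from the paper):

$$u^+ = \frac{1}{\kappa} \ln y^+ + B, \qquad u^+ = \frac{\bar{u}}{u_\tau}, \quad y^+ = \frac{y\, u_\tau}{\nu}, \quad u_\tau = \sqrt{\tau_w/\rho}.$$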
The retina plays an important role in animal vision, namely preprocessing visual information before sending it to the brain through the optic nerve. Understanding how the retina does this is of particular relevance for the development and design of neuromorphic sensors, especially those focused on image processing. Our research examines mechanisms of motion processing in the retina. We are specifically interested in the detection of moving targets under challenging conditions: small or low-contrast (dim) targets amidst high quantities of clutter or distractor signals. In this paper we compare a classic motion-sensitive cell model, the Hassenstein-Reichardt model, to a model of the object motion-sensitive (OMS) cell, which relies primarily on change detection, and describe scenarios for which each model is better suited. We also examine mechanisms, inspired by features of retinal circuitry, by which performance may be enhanced. For example, lateral inhibition (mediated by amacrine cells) conveys selectivity for small targets to the W3 ganglion cell; we demonstrate that a similar mechanism can be combined with the aforementioned motion-processing cell models to select small moving targets for further processing.
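A minimal sketch of the Hassenstein-Reichardt detector in its textbook correlation form may help fix ideas (this is our illustration; the parameterization in the paper may differ). Two neighboring photoreceptor signals are each low-pass filtered to create a delayed copy, and the opponent product makes the sign of the output indicate motion direction:

    import numpy as np

    def hassenstein_reichardt(a, b, tau=5.0, dt=1.0):
        # a, b: 1-D arrays of two adjacent photoreceptor signals over time.
        # A first-order low-pass filter serves as the delay stage.
        alpha = dt / (tau + dt)
        a_d = np.zeros_like(a, dtype=float)
        b_d = np.zeros_like(b, dtype=float)
        for t in range(1, len(a)):
            a_d[t] = a_d[t-1] + alpha * (a[t] - a_d[t-1])
            b_d[t] = b_d[t-1] + alpha * (b[t] - b_d[t-1])
        # Opponent correlation: positive for one direction, negative for the other.
        return a_d * b - b_d * a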
Malware detection and remediation is an ongoing task for computer security and IT professionals. Here, we examine the use of neural algorithms to detect malware using the system calls generated by executables, alleviating attempts at obfuscation since the behavior itself is monitored. We examine several deep learning techniques and liquid state machines, baselined against a random forest. The experiments examine the effects of concept drift to understand how well the algorithms generalize to novel malware samples, testing them on data collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution for detecting malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the performance evaluation on an operational network may not match the performance achieved in training: the CAA may be about the same, but the values for precision and recall over the malware can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to better understand what differentiates malware samples from goodware, which can further serve as a forensics tool to provide directions for investigation and remediation.
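As a simplified illustration of this kind of baseline pipeline (the feature choice and toy data below are our assumptions, not the paper's setup), system-call traces can be turned into n-gram count features and scored with a random forest; class-averaged accuracy is the mean of per-class recalls:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import recall_score

    # Hypothetical traces: space-separated system-call sequences.
    traces = ["open read read write close", "open mmap execve write close"]
    labels = np.array([0, 1])            # 0 = goodware, 1 = malware

    X = CountVectorizer(ngram_range=(1, 3)).fit_transform(traces)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

    # Class-averaged accuracy (CAA) = macro-averaged recall.
    caa = recall_score(labels, clf.predict(X), average="macro")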
This paper formulates general computation as a feedback-control problem, which allows the agent to autonomously overcome some limitations of standard procedural-language programming: lack of resilience to errors and to early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. Computing then becomes a sequential decision-making problem, solved with reinforcement learning (RL) and analyzed with Lyapunov stability theory to assess the agent's resilience and progression toward the goal. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress to an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble sort.
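A sketch of one such formulation (our illustration of the general idea, not the paper's exact environment): the state is the array, an action swaps two adjacent entries, and the reward is the decrease in the inversion count, which reaches zero exactly at the sorted goal and so serves as a natural Lyapunov-style progress measure:

    def inversions(a):
        # Number of out-of-order pairs; zero if and only if a is sorted.
        n = len(a)
        return sum(a[i] > a[j] for i in range(n) for j in range(i + 1, n))

    def step(state, action):
        # Action k swaps positions k and k+1; reward is the progress made.
        before = inversions(state)
        nxt = list(state)
        nxt[action], nxt[action + 1] = nxt[action + 1], nxt[action]
        return nxt, before - inversions(nxt)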
Proceedings of PMBS 2018: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis
Proxy applications, or proxies, are simple applications meant to exercise systems in a way that mimics real applications (their parents). However, characterizing the relationship between the behavior of parent and proxy applications is not an easy task. In prior work [1], we presented a data-driven methodology that characterizes this relationship by collecting runtime data from both applications and then using data analytics to find their correspondence or divergence. We showed that it worked well for hardware counter data, but our initial attempt using MPI function data was less satisfactory. In this paper, we present an exploratory effort to better quantify the correspondence of communication behavior between proxies and their respective parent applications. We present experimental evidence of positive results using four proxy applications from the current ECP Proxy Application Suite and their corresponding parent applications (in the ECP application portfolio). Results show that each proxy analyzed is representative of its parent with respect to communication data. In conjunction with the method presented in [1] (correspondence of computation and memory behavior), this gives a strong understanding of how well a proxy predicts the comprehensive performance of its parent.
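One simple way to quantify such correspondence (our illustration; the paper's analytics are more extensive) is to compare the per-function communication profiles of proxy and parent directly, for example via cosine similarity of their normalized time fractions:

    import numpy as np

    # Hypothetical per-MPI-function time fractions for a proxy and its parent
    # (columns: MPI_Allreduce, MPI_Isend, MPI_Irecv, MPI_Waitall).
    proxy  = np.array([0.45, 0.20, 0.20, 0.15])
    parent = np.array([0.50, 0.18, 0.18, 0.14])

    # Cosine similarity of the profiles: 1.0 means an identical mix.
    sim = proxy @ parent / (np.linalg.norm(proxy) * np.linalg.norm(parent))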
Proceedings of ScalA 2018: 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis
Sparse matrix-matrix multiplication is a critical kernel for several scientific computing applications, especially the setup phase of algebraic multigrid. The MPI+X programming model, which is growing in popularity, requires that such kernels be implemented in a way that exploits on-node parallelism. We present a single-pass OpenMP variant of Gustavson's sparse matrix-matrix multiplication algorithm designed for architectures (e.g., CPU or Intel Xeon Phi) with reasonably large memory and modest thread counts (tens of threads, not thousands). These assumptions allow us to exploit perfect hashing and dynamic memory allocation to achieve performance improvements of up to 2x over third-party kernels for matrices derived from algebraic multigrid setup.
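The row-wise structure of Gustavson's algorithm is easy to see in a serial sketch (ours, in Python for brevity; the paper's version is a single-pass OpenMP implementation with perfect hashing rather than a generic hash map). Each output row gets its own sparse accumulator, which is what makes the outer row loop safe to parallelize across threads:

    from collections import defaultdict

    def spgemm_gustavson(A, B):
        # A, B: dict-of-dicts sparse matrices, row -> {col: value}.
        C = {}
        for i, a_row in A.items():
            acc = defaultdict(float)       # hash-map accumulator for row i
            for k, a_ik in a_row.items():
                for j, b_kj in B.get(k, {}).items():
                    acc[j] += a_ik * b_kj
            C[i] = dict(acc)
        return C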
2018 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2018
Baseman, Elisabeth; Debardeleben, Nathan; Blanchard, Sean; Moore, Juston; Tkachenko, Olena; Ferreira, Kurt B.; Siddiqua, Taniya; Sridharan, Vilas
As the scale of high performance computing facilities approaches the exascale era, gaining a detailed understanding of hardware failures becomes important. In particular, the extreme memory capacity of modern supercomputers means that data corruption errors which were statistically negligible at smaller scales will become more prevalent. In order to understand hardware faults and mitigate their adverse effects on exascale workloads, we must learn from the behavior of current hardware. In this work, we investigate the predictability of DRAM errors using field data from two recently decommissioned supercomputers: Cielo, at Los Alamos National Laboratory, and Hopper, at Lawrence Berkeley National Laboratory. Due to the volume and complexity of the field data, we apply statistical machine learning to predict the probability of DRAM errors at previously unaccessed locations. We compare the predictive performance of six machine learning algorithms and find that a model incorporating physical knowledge of DRAM spatial structure outperforms purely statistical methods. Our findings both support the expected physical behavior of DRAM hardware and provide a mechanism for real-time error prediction. We demonstrate real-world feasibility by training an error model on one supercomputer and effectively predicting errors on another. Our methods demonstrate the importance of spatial locality over temporal locality in DRAM errors and show that relatively simple statistical models can effectively predict future errors from historical data, enabling proactive error mitigation.
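A heavily simplified sketch of spatially informed error prediction (the features and data below are hypothetical illustrations, not the study's model): counts of previously observed errors sharing a location's bank, row, and column encode DRAM spatial structure, and an off-the-shelf classifier can turn them into an error probability:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical features per location: prior errors seen in the same
    # bank / row / column; label: whether an error later occurred there.
    X = np.array([[3, 1, 0], [0, 0, 0], [5, 4, 1], [1, 0, 0]])
    y = np.array([1, 0, 1, 0])

    model = LogisticRegression().fit(X, y)
    p_error = model.predict_proba([[2, 2, 0]])[0, 1]   # probability estimate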
Logic-memory integration helps mitigate the von Neumann bottleneck and has enabled a new class of architectures that accelerate graph analytics and operations on sparse data streams. These architectures use merge networks as a key unit of computation. Such networks are highly parallel, and their performance increases with tighter coupling between logic and memory when a bitonic algorithm is used. This paper presents energy-efficient on-chip network architectures for merging key-value pairs using both word-parallel and bit-serial paradigms. The proposed architectures can merge two rows' worth of high bandwidth memory (HBM) data in a manner that is completely overlapped with the reading from and writing back to such a row. Furthermore, their energy consumption is about an order of magnitude lower than that of a naive crossbar-based design.
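The bitonic approach referenced above merges two sorted runs by reversing one of them, which makes the concatenation bitonic, and then compare-exchanging at halving distances; every stage is data-independent and fully parallel, which is what suits it to tight logic-memory coupling. A serial sketch of the network's data movement (our illustration, assuming a power-of-two total length):

    def bitonic_merge(xs, ys):
        # xs, ys: sorted lists of (key, value) pairs; len(xs) + len(ys)
        # must be a power of two. Reversing ys makes seq bitonic.
        seq = list(xs) + list(ys)[::-1]
        n, k = len(seq), len(seq) // 2
        while k >= 1:
            for i in range(n):
                # Compare-exchange partners i and i+k within each 2k block.
                if i % (2 * k) < k and seq[i][0] > seq[i + k][0]:
                    seq[i], seq[i + k] = seq[i + k], seq[i]
            k //= 2
        return seq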