Memory Reliability and Performance Degradation
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Understanding how resources of High Performance Compute platforms are utilized by applications both individually and as a composite is key to application and platform performance. Typical system monitoring tools do not provide sufficient fidelity while application profiling tools do not capture the complex interplay between applications competing for shared resources. To gain new insights, monitoring tools must run continuously, system wide, at frequencies appropriate to the metrics of interest while having minimal impact on application performance. We introduce the Lightweight Distributed Metric Service for scalable, lightweight monitoring of large scale computing systems and applications. We describe issues and constraints guiding deployment in Sandia National Laboratories' capacity computing environment and on the National Center for Supercomputing Applications' Blue Waters platform including motivations, metrics of choice, and requirements relating to the scale and specialized nature of Blue Waters. We address monitoring overhead and impact on application performance and provide illustrative profiling results.
Abstract not provided.
We report on the use and design of a portable, extensible performance data collection tool motivated by modeling needs of the high performance computing systems co-design com- munity. The lightweight performance data collectors with Eiger support is intended to be a tailorable tool, not a shrink-wrapped library product, as pro ling needs vary widely. A single code markup scheme is reported which, based on compilation ags, can send perfor- mance data from parallel applications to CSV les, to an Eiger mysql database, or (in a non-database environment) to at les for later merging and loading on a host with mysql available. The tool supports C, C++, and Fortran applications.
Abstract not provided.
Abstract not provided.
Abstract not provided.
I report the progress to date of my work on scaling the CPAPR algorithm and necessary supporting code to enable processing large (gigabyte to 100 gigabyte) data sets and benchmarking the same. Where possible, I also report background information possibly of relevance in future modifications of the code. The results include: minor repairs and additions to the TTB library for portability, algorithmic improvements relevant to both serial and multithreaded implementations, algorithmic improvements taking advantage of multithreading hardware, support library additions (binary IO routines) needed for efficiently and reproducibly benchmarking the algorithms. For this optimization work, no large scale data sets are available. Therefore, scalability of data synthesis algorithms is addressed as well.
Abstract not provided.
The goal of this research was to explore first principles associated with mixing of diverse implementations in a redundant fashion to increase the security and/or reliability of information systems. Inspired by basic results in computer science on the undecidable behavior of programs and by previous work on fault tolerance in hardware and software, we have investigated the problem and solution space for addressing potentially unknown and unknowable vulnerabilities via ensembles of implementations. We have obtained theoretical results on the degree of security and reliability benefits from particular diverse system designs, and mapped promising approaches for generating and measuring diversity. We have also empirically studied some vulnerabilities in common implementations of the Linux operating system and demonstrated the potential for diversity to mitigate these vulnerabilities. Our results provide foundational insights for further research on diversity and redundancy approaches for information systems.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Scientific Programming
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.