Topology Mapping for Unstructured Communication Patterns
Abstract not provided.
Abstract not provided.
As computer systems grow in both size and complexity, the need for applications and run-time systems to adjust to their dynamic environment also grows. The goal of the RAAMP LDRD was to combine static architecture information and real-time system state with algorithms to conserve power, reduce communication costs, and avoid network contention. We devel- oped new data collection and aggregation tools to extract static hardware information (e.g., node/core hierarchy, network routing) as well as real-time performance data (e.g., CPU uti- lization, power consumption, memory bandwidth saturation, percentage of used bandwidth, number of network stalls). We created application interfaces that allowed this data to be used easily by algorithms. Finally, we demonstrated the benefit of integrating system and application information for two use cases. The first used real-time power consumption and memory bandwidth saturation data to throttle concurrency to save power without increasing application execution time. The second used static or real-time network traffic information to reduce or avoid network congestion by remapping MPI tasks to allocated processors. Results from our work are summarized in this report; more details are available in our publications [2, 6, 14, 16, 22, 29, 38, 44, 51, 54].
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
We examine task mapping algorithms for systems that allocate jobs non-contiguously. Several studies have shown that task placement affects job running time. We focus on jobs with a stencil communication pattern and use experiments on a Cray XE to evaluate novel task mapping algorithms as well as some adapted to this setting. This is done with the miniGhost miniApp which mimics the behavior of CTH, a shock physics application. Our strategies improve average and single-run times by as much as 28% and 36% over a baseline strategy, respectively.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This report summarizes the work performed under the project (3z(BStatitically significant relational data mining.(3y (BThe goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concetrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Sustainable Computing
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This paper examines task mapping algorithms for non-contiguously allocated parallel jobs. Several studies have shown that task placement affects job running time for both contiguously and non-contiguously allocated jobs. Traditionally, work on task mapping either uses a very general model where the job has an arbitrary communication pattern or assumes that jobs are allocated contiguously, making them completely isolated from each other. A middle ground between these two cases is the mapping problem for non-contiguous jobs having a specific communication pattern. We propose several task mapping algorithms for jobs with a stencil communication pattern and evaluate them using experiments and simulations. Our strategies improve the running time of a MiniApp by as much as 30% over a baseline strategy. Furthermore, this improvement increases markedly with the job size, demonstrating the importance of task mapping as systems grow toward exascale.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Annual ACM Symposium on Parallelism in Algorithms and Architectures
Proposed for publication in Concurrency and Computation: Practice and Experience.
Abstract not provided.