Publications

Results 101–113 of 113

Search results

Using a complementary emulation-simulation co-design approach to assess application readiness for Processing-in-Memory systems

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stelle, George W.; Olivier, Stephen L.; Stark, Dylan T.; Rodrigues, Arun; Hemmert, Karl S.

Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope but thrive with these changes. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications

Proceedings of ExaMPI 2014: Exascale MPI 2014 - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stark, Dylan T.; Barrett, Richard F.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.; Vaughan, Courtenay T.

Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard scaling and the subsequent increases in CPU core counts with each successive generation of general-purpose processors have made the ability to leverage parallelism for communication an increasingly critical aspect of future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be infeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massive multithreading runtime system supporting dynamic parallelism to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.

Design issues in the semantics and scheduling of asynchronous tasks

Olivier, Stephen L.

The asynchronous task model serves as a useful vehicle for shared memory parallel programming, particularly on multicore and manycore processors. As adoption of the model among programmers has increased, support has emerged for the integration of task parallel language constructs into mainstream programming languages, e.g., C and C++. This paper examines some of the design decisions in Cilk and OpenMP concerning semantics and scheduling of asynchronous tasks with the aim of informing the efforts of committees considering language integration, as well as developers of new task parallel languages and libraries.
