Results 26–40 of 40

RMA-MT: A Benchmark Suite for Assessing MPI Multi-threaded RMA Performance

Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016

Dosanjh, Matthew D.; Groves, Taylor G.; Grant, Ryan E.; Brightwell, Ronald B.; Bridges, Patrick G.

Reaching Exascale will require leveraging massive parallelism while potentially using asynchronous communication to help achieve scalability at such large levels of concurrency. MPI is a good candidate for providing the mechanisms to support communication at such large scales. Two existing MPI mechanisms are particularly relevant to Exascale: multi-threading, to support massive concurrency, and Remote Memory Access (RMA), to support asynchronous communication. Unfortunately, multi-threaded MPI RMA code has not been extensively studied, partly because no public benchmarks or proxy applications exist to assess its performance. The contributions of this paper are the design and demonstration of the first available proxy applications and micro-benchmark suite for multi-threaded RMA in MPI, a study of the multi-threaded RMA performance of different MPI implementations, and an evaluation of how these benchmarks can be used to test development for both performance and correctness.


Preparing for exascale: Modeling MPI for Many-core systems using fine-grain queues

Proceedings of the 3rd ExaMPI Workshop at the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2015

Bridges, Patrick G.; Dosanjh, Matthew D.; Grant, Ryan E.; Skjellum, Anthony; Farmer, Shane; Brightwell, Ronald B.

This paper presents a fine-grain queueing model of MPI point-to-point messaging performance for use in the design and analysis of current and future large-scale computing systems. In particular, the model seeks to capture key performance behavior of MPI communication on many-core systems. We demonstrate that this model encompasses key MPI performance characteristics, such as short/long protocol and offload/onload protocol tradeoffs, and demonstrate its use in predicting the potential impact of architectural and software changes for many-core systems on communication performance. We also discuss the limitations of this model and potential directions for enhancing its fidelity.


Re-evaluating network Onload vs. Offload for the many-core era

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Dosanjh, Matthew D.; Grant, Ryan E.; Bridges, Patrick G.; Brightwell, Ronald B.

This paper explores the trade-offs between onloaded and offloaded network stack processing for systems with varying CPU frequencies. The study compares onload and offload using experiments run at different dynamic voltage and frequency scaling (DVFS) settings to vary the CPU frequency while measuring performance and power. This allows a quantitative comparison of the performance and power trade-offs between onload and offload cards across a wide range of CPU performance levels. The results show that offloaded cards often yield a significant performance increase, especially at lower CPU frequencies, with only a small increase in power usage. The study also uses MPI profiling to analyze why some applications benefit more than others. This paper's contribution is an analytical, quantitative evaluation of the trade-offs between onload and offload. While this question has been debated, this is, to the authors' knowledge, the first analytical evaluation of the performance difference. The range of frequencies analyzed gives insight into how MPI might perform on different architectures, such as low-frequency, many-core CPUs. Finally, the power measurements add further depth to the analysis.
