Conference Proceeding

Evaluating MPI Message Size Summary Statistics

Ferreira, Kurt B.; Levy, Scott

The Message Passing Interface (MPI) remains the dominant programming model for scientific applications running on today's high-performance computing (HPC) systems. This dominance stems from MPI's powerful semantics for inter-process communication, which have enabled scientists to write applications that simulate important physical phenomena. MPI does not, however, specify how messaging and synchronization should be carried out; those details typically depend on the low-level architecture and on the message characteristics of the application. Analyzing an application's MPI usage is therefore critical to tuning MPI's performance on a particular platform. The result of this analysis is typically a discussion of average message sizes for a workload or set of workloads. While the average may be the most intuitive summary statistic, it may not be the most useful for representing an application's entire message size dataset. Using a previously developed MPI trace collector, we analyze the MPI message traces of a number of key MPI workloads. Through this analysis, we demonstrate that the average, while easy and efficient to calculate, may not be a good representation of all subsets of application message sizes; the median and mode of message sizes are superior choices in most cases. We show that the shortcomings of the average stem from the multi-modal nature of the distribution of point-to-point message sizes. Finally, we show that while scaling a workload has little discernible impact on which measures of central tendency are representative of the underlying data, different input descriptions can significantly affect which metric is most effective. The results and analysis in this paper have the potential to provide valuable guidance on how we as a community should discuss and analyze MPI message data for scientific applications.
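To make the central point concrete, here is a minimal Python sketch (not taken from the paper) of how a mean can misrepresent a multi-modal message size distribution. The message sizes are hypothetical, chosen only to mimic a bimodal workload dominated by small control messages with a minority of large bulk transfers.

```python
# Minimal sketch (hypothetical data, not from the paper): why the mean can
# misrepresent a multi-modal distribution of MPI message sizes.
import statistics

# Hypothetical point-to-point message sizes in bytes: many 64-byte control
# messages plus a smaller number of 1 MiB bulk transfers (a bimodal mix).
message_sizes = [64] * 900 + [1 << 20] * 100

mean = statistics.mean(message_sizes)
median = statistics.median(message_sizes)
mode = statistics.mode(message_sizes)

print(f"mean:   {mean:>10.1f} bytes")  # ~104915.2 bytes: matches no actual message
print(f"median: {median:>10} bytes")   # 64 bytes: a size that actually occurs
print(f"mode:   {mode:>10} bytes")     # 64 bytes: the most common size
```

In this toy example the mean falls between the two modes and corresponds to no message the application ever sends, while the median and mode both identify the dominant small-message traffic, illustrating why the paper finds them more representative for multi-modal point-to-point distributions.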