In many applications the resolution of small-scale heterogeneities remains a significant hurdle to robust and reliable predictive simulations. In particular, while material variability at the mesoscale plays a fundamental role in processes such as material failure, the resolution required to capture mechanisms at this scale is often computationally intractable. Multiscale methods aim to overcome this difficulty through judicious choice of a subscale problem and a robust manner of passing information between scales. One promising approach is the multiscale finite element method, which increases the fidelity of macroscale simulations by solving lower-scale problems that produce enriched multiscale basis functions. In this study, we present the first work toward application of the multiscale finite element method to the nonlocal peridynamic theory of solid mechanics. This is achieved within the context of a discontinuous Galerkin framework that facilitates the description of material discontinuities and does not assume the existence of spatial derivatives. Analysis of the resulting nonlocal multiscale finite element method is achieved using the ambulant Galerkin method, developed here with sufficient generality to allow for application to multiscale finite element methods for both local and nonlocal models that satisfy minimal assumptions. We conclude with preliminary results on a mixed-locality multiscale finite element method in which a nonlocal model is applied at the fine scale and a local model at the coarse scale.
Evaluating the effectiveness of data visualizations is a challenging undertaking and often relies on one-off studies that test a visualization in the context of one specific task. Researchers across the fields of data science, visualization, and human-computer interaction are calling for foundational tools and principles that could be applied to assessing the effectiveness of data visualizations in a more rapid and generalizable manner. One possibility for such a tool is a model of visual saliency for data visualizations. Visual saliency models are typically based on the properties of the human visual cortex and predict which areas of a scene have visual features (e.g. color, luminance, edges) that are likely to draw a viewer's attention. While these models can accurately predict where viewers will look in a natural scene, they typically do not perform well for abstract data visualizations. In this paper, we discuss the reasons for the poor performance of existing saliency models when applied to data visualizations. We introduce the Data Visualization Saliency (DVS) model, a saliency model tailored to address some of these weaknesses, and we test the performance of the DVS model and existing saliency models by comparing the saliency maps produced by the models to eye tracking data obtained from human viewers. Finally, we describe how modified saliency models could be used as general tools for assessing the effectiveness of visualizations, including the strengths and weaknesses of this approach.
When very few samples of a random quantity are available from a source distribution or probability density function (PDF) of unknown shape, it is usually not possible to accurately infer the PDF from which the data samples come. Then a significant component of epistemic uncertainty exists concerning the source distribution of random or aleatory variability. For many engineering purposes, including design and risk analysis, one would normally want to avoid inference related under-estimation of important quantities such as response variance, and failure probabilities. Recent research has established the practicality and effectiveness of a class of simple and inexpensive UQ Methods for reasonable conservative estimation of such quantities when only sparse samples of a random quantity are available. This class of UQ methods is explained, demonstrated, and analyzed in this paper within the context of the Sandia Cantilever Beam End-to-End UQ Problem, Part A.1. Several sets of sparse replicate data are involved and several representative uncertainty quantities are to be estimated: A) beam deflection variability, in particular the 2.5 to 97.5 percentile “central 95%” range of the sparsely sampled PDF of deflection; and B) a small exceedance probability associated with a tail of the PDF integrated beyond a specified deflection tolerance.
Park, Michael A.; Barral, Nicolas; Ibanez-Granados, Daniel A.; Kamenetskiy, Dmitry S.; Krakos, Joshua A.; Michal, Todd; Loseille, Adrien
Unstructured grid adaptation is a tool to control Computational Fluid Dynamics (CFD) discretization error. However, adaptive grid techniques have made limited impact on production analysis workflows where the control of discretization error is critical to obtaining reliable simulation results. Issues that prevent the use of adaptive grid methods are identified by applying unstructured grid adaptation methods to a series of benchmark cases. Once identified, these challenges to existing adaptive workflows can be addressed. Unstructured grid adaptation is evaluated for test cases described on the Turbulence Modeling Resource (TMR) web site, which documents uniform grid refinement of multiple schemes. The cases are turbulent flow over a Hemisphere Cylinder and an ONERA M6 Wing. Adaptive grid force and moment trajectories are shown for three integrated grid adaptation processes with Mach interpolation control and output error based metrics. The integrated grid adaptation process with a finite element (FE) discretization produced results consistent with uniform grid refinement of fixed grids. The integrated grid adaptation processes with finite volume schemes were slower to converge to the reference solution than the FE method. Metric conformity is documented on grid/metric snapshots for five grid adaptation mechanics implementations. These tools produce anisotropic boundary conforming grids requested by the adaptation process.
When very few samples of a random quantity are available from a source distribution or probability density function (PDF) of unknown shape, it is usually not possible to accurately infer the PDF from which the data samples come. Then a significant component of epistemic uncertainty exists concerning the source distribution of random or aleatory variability. For many engineering purposes, including design and risk analysis, one would normally want to avoid inference related under-estimation of important quantities such as response variance, and failure probabilities. Recent research has established the practicality and effectiveness of a class of simple and inexpensive UQ Methods for reasonable conservative estimation of such quantities when only sparse samples of a random quantity are available. This class of UQ methods is explained, demonstrated, and analyzed in this paper within the context of the Sandia Cantilever Beam End-to-End UQ Problem, Part A.1. Several sets of sparse replicate data are involved and several representative uncertainty quantities are to be estimated: A) beam deflection variability, in particular the 2.5 to 97.5 percentile “central 95%” range of the sparsely sampled PDF of deflection; and B) a small exceedance probability associated with a tail of the PDF integrated beyond a specified deflection tolerance.
In this work we present a computational capability featuring a hierarchy of models with different fidelities for the solution of electrokinetics problems at the micro-/nano-scale. A multifidelity approach allows the selection of the most appropriate model, in terms of accuracy and computational cost, for the particular application at hand. We demonstrate the proposed multifidelity approach by studying the mobility of a colloid in a micro-channel as a function of the colloid charge and of the size of the ions dissolved in the fluid.
Independent meshing of subdomains separated by an interface can lead to spatially non-coincident discrete interfaces. We present an optimization-based coupling method for such problems, which does not require a common mesh refinement of the interface, has optimal H1 convergence rates, and passes a patch test. The method minimizes the mismatch of the state and normal stress extensions on discrete interfaces subject to the subdomain equations, while interface “fluxes” provide virtual Neumann controls.
Maintaining the performance of high-performance computing (HPC) applications with the expected increase in failures is a major challenge for next-generation extreme-scale systems. With increasing scale, resilience activities (e.g. checkpointing) are expected to become more diverse, less tightly synchronized, and more computationally intensive. Few existing studies, however, have examined how decisions about scheduling resilience activities impact application performance. In this work, we examine the relationship between the duration and frequency of resilience activities and application performance. Our study reveals several key findings: (i) the aggregate amount of time consumed by resilience activities is not an effective metric for predicting application performance; (ii) the duration of the interruptions due to resilience activities has the greatest influence on application performance; shorter, but more frequent, interruptions are correlated with better application performance; and (iii) the differential impact of resilience activities across applications is related to the applications’ inter-collective frequencies; the performance of applications that perform infrequent collective operations scales better in the presence of resilience activities than the performance of applications that perform more frequent collective operations. This initial study demonstrates the importance of considering how resilience activities are scheduled. We provide critical analysis and direct guidance on how the resilience challenges of future systems can be met while minimizing the impact on application performance.
We review the physical foundations of Landauer’s Principle, which relates the loss of information from a computational process to an increase in thermodynamic entropy. Despite the long history of the Principle, its fundamental rationale and proper interpretation remain frequently misunderstood. Contrary to some misinterpretations of the Principle, the mere transfer of entropy between computational and non-computational subsystems can occur in a thermodynamically reversible way without increasing total entropy. However, Landauer’s Principle is not about general entropy transfers; rather, it more specifically concerns the ejection of (all or part of) some correlated information from a controlled, digital form (e.g., a computed bit) to an uncontrolled, non-computational form, i.e., as part of a thermal environment. Any uncontrolled thermal system will, by definition, continually re-randomize the physical information in its thermal state, from our perspective as observers who cannot predict the exact dynamical evolution of the microstates of such environments. Thus, any correlations involving information that is ejected into and subsequently thermalized by the environment will be lost from our perspective, resulting directly in an irreversible increase in thermodynamic entropy. Avoiding the ejection and thermalization of correlated computational information motivates the reversible computing paradigm, although the requirements for computations to be thermodynamically reversible are less restrictive than frequently described, particularly in the case of stochastic computational operations. There are interesting possibilities for the design of computational processes that utilize stochastic, many-to-one computational operations while nevertheless avoiding net entropy increase that remain to be fully explored.
In this work we present a computational capability featuring a hierarchy of models with different fidelities for the solution of electrokinetics problems at the micro-/nano-scale. A multifidelity approach allows the selection of the most appropriate model, in terms of accuracy and computational cost, for the particular application at hand. We demonstrate the proposed multifidelity approach by studying the mobility of a colloid in a micro-channel as a function of the colloid charge and of the size of the ions dissolved in the fluid.
The development of scramjet engines is an important research area for advancing hypersonic and orbital flights. Progress toward optimal engine designs requires accurate flow simulations together with uncertainty quantification. However, performing uncertainty quantification for scramjet simulations is challenging due to the large number of uncertainparameters involvedandthe high computational costofflow simulations. These difficulties are addressedin this paper by developing practical uncertainty quantification algorithms and computational methods, and deploying themin the current studyto large-eddy simulations ofajet incrossflow inside a simplified HIFiRE Direct Connect Rig scramjet combustor. First, global sensitivity analysis is conducted to identify influential uncertain input parameters, which can help reduce the system's stochastic dimension. Second, because models of different fidelity are used in the overall uncertainty quantification assessment, a framework for quantifying and propagating the uncertainty due to model error is presented. These methods are demonstrated on a nonreacting jet-in-crossflow test problem in a simplified scramjet geometry, with parameter space up to 24 dimensions, using static and dynamic treatments of the turbulence subgrid model, and with two-dimensional and three-dimensional geometries.
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162-C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this work we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen-Loève expansions. We demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix- matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
The development of scramjet engines is an important research area for advancing hypersonic and orbital flights. Progress towards optimal engine designs requires accurate and computationally affordable flow simulations, as well as uncertainty quantification (UQ). While traditional UQ techniques can become prohibitive under expensive simulations and high-dimensional parameter spaces, polynomial chaos (PC) surrogate modeling is a useful tool for alleviating some of the computational burden. However, non-intrusive quadrature-based constructions of PC expansions relying on a single high-fidelity model can still be quite expensive. We thus introduce a two-stage numerical procedure for constructing PC surrogates while making use of multiple models of different fidelity. The first stage involves an initial dimension reduction through global sensitivity analysis using compressive sensing. The second stage utilizes adaptive sparse quadrature on a multifidelity expansion to compute PC surrogate coefficients in the reduced parameter space where quadrature methods can be more effective. The overall method is used to produce accurate surrogates and to propagate uncertainty induced by uncertain boundary conditions and turbulence model parameters, for performance quantities of interest from large eddy simulations of supersonic reactive flows inside a scramjet engine.
Within the SEQUOIA project, funded by the DARPA EQUiPS program, we pursue algorithmic approaches that enable comprehensive design under uncertainty, through inclusion of aleatory/parametric and epistemic/model form uncertainties within scalable forward/inverse UQ approaches. These statistical methods are embedded within design processes that manage computational expense through active subspace, multilevel-multifidelity, and reduced-order modeling approximations. To demonstrate these methods, we focus on the design of devices that involve multi-physics interactions in advanced aerospace vehicles. A particular problem of interest is the shape design of nozzles for advanced vehicles such as the Northrop Grumman UCAS X-47B, involving coupled aero-structural-thermal simulations for nozzle performance. In this paper, we explore a combination of multilevel and multifidelity forward and inverse UQ algorithms to reduce the overall computational cost of the analysis by leveraging hierarchies of model form (i.e., multifidelity hierarchies) and solution discretization (i.e., multilevel hierarchies) in order of exploit trade offs between solution accuracy and cost. In particular, we seek the most cost effective fusion of information across complex multi-dimensional modeling hierarchies. Results to date indicate the utility of multiple approaches, including methods that optimally allocate resources when estimator variance varies smoothly across levels, methods that allocate sufficient sampling density based on sparsity estimates, and methods that employ greedy multilevel refinement.
Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multi-variate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign-in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations. Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.
The design of satellites usually includes the objective of minimizing mass due to high launch costs, which is complicated by the need to protect sensitive electronics from the space radiation environment. There is growing interest in automated design optimization techniques to help achieve that objective. Traditional optimization approaches that rely exclusively on response functions (e.g. dose calculations) can be quite expensive when applied to transport problems. Previously we showed how adjoint-based transport sensitivities used in conjunction with gradient-based optimization algorithms can be quite effective in designing mass-efficient electron/proton shields in one-dimensional slab geometries. In this paper we extend that work to two-dimensional Cartesian geometries. This consists primarily of deriving the sensitivities to geometric changes, given a particular prescription for parametrizing the shield geometry. We incorporate these sensitivities into our optimization process and demonstrate their effectiveness in such design calculations.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Klinkenberg, Jannis; Samfass, Philipp; Terboven, Christian; Duran, Alejandro; Klemm, Michael; Teruel, Xavier; Mateo, Sergi; Olivier, Stephen L.; Muller, Matthias S.
In modern shared-memory NUMA systems which typically consist of two or more multi-core processor packages with local memory, affinity of data to computation is crucial for achieving high performance with an OpenMP program. OpenMP* 3.0 introduced support for task-parallel programs in 2008 and has continued to extend its applicability and expressiveness. However, the ability to support data affinity of tasks is missing. In this paper, we investigate several approaches for task-to-data affinity that combine locality-aware task distribution and task stealing. We introduce the task affinity clause that will be part of OpenMP 5.0 and provide the reasoning behind its design. Evaluation with our experimental implementation in the LLVM OpenMP runtime shows that task affinity improves execution performance up to 4.5x on an 8-socket NUMA machine and significantly reduces runtime variability of OpenMP tasks. Our results demonstrate that a variety of applications can benefit from task affinity and that the presented clause is closing the gap of task-to-data affinity in OpenMP 5.0.
Modern supercomputers are shared among thousands of users running a variety of applications. Knowing which applications are running in the system can bring substantial benefits: knowledge of applications that intensively use shared resources can aid scheduling; unwanted applications such as cryptocurrency mining or password cracking can be blocked; system architects can make design decisions based on system usage. However, identifying applications on supercomputers is challenging because applications are executed using esoteric scripts along with binaries that are compiled and named by users. This paper introduces a novel technique to identify applications running on supercomputers. Our technique, Taxonomist, is based on the empirical evidence that applications have different and characteristic resource utilization patterns. Taxonomist uses machine learning to classify known applications and also detect unknown applications. We test our technique with a variety of benchmarks and cryptocurrency miners, and also with applications that users of a production supercomputer ran during a 6 month period. We show that our technique achieves nearly perfect classification for this challenging data set.
Supercomputing hardware is undergoing a period of significant change. In order to cope with the rapid pace of hardware and, in many cases, programming model innovation, we have developed the Kokkos Programming Model – a C++-based abstraction that permits performance portability across diverse architectures. Our experience has shown that the abstractions developed can significantly frustrate debugging and profiling activities because they break expected code proximity and layout assumptions. In this paper we present the Kokkos Profiling interface, a lightweight, suite of hooks to which debugging and profiling tools can attach to gain deep insights into the execution and data structure behaviors of parallel programs written to the Kokkos interface.
We discuss uncertainty quantification in multisensor data integration and analysis, including estimation methods and the role of uncertainty in decision making and trust in automated analytics. The challenges associated with automatically aggregating information across multiple images, identifying subtle contextual cues, and detecting small changes in noisy activity patterns are well-established in the intelligence, surveillance, and reconnaissance (ISR) community. In practice, such questions cannot be adequately addressed with discrete counting, hard classifications, or yes/no answers. For a variety of reasons ranging from data quality to modeling assumptions to inadequate definitions of what constitutes "interesting" activity, variability is inherent in the output of automated analytics, yet it is rarely reported. Consideration of these uncertainties can provide nuance to automated analyses and engender trust in their results. In this work, we assert the importance of uncertainty quantification for automated data analytics and outline a research agenda. We begin by defining uncertainty in the context of machine learning and statistical data analysis, identify its sources, and motivate the importance and impact of its quantification. We then illustrate these issues and discuss methods for data-driven uncertainty quantification in the context of a multi-source image analysis example. We conclude by identifying several specific research issues and by discussing the potential long-term implications of uncertainty quantification for data analytics, including sensor tasking and analyst trust in automated analytics.
Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, workflow needs and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
Real-time energy pricing has caused a paradigm shift for process operations with flexibility becoming a critical driver of economics. As such, incorporating real-time pricing into planning and scheduling optimization formulations has received much attention over the past two decades (Zhang and Grossman, 2016). These formulations, however, focus on 1-hour or longer time discretizations and neglect process dynamics. Recent analysis of historical price data from the California electricity market (CAISO) reveals that a majority of economic opportunities come from fast market layers, i.e., real-time energy market and ancillary services (Dowling et al., 2017). We present a dynamic optimization framework to quantify the revenue opportunities of chemical manufacturing systems providing frequency regulation (FR). Recent analysis of first order systems finds that slow process dynamics naturally dampen high frequency harmonics in FR signals (Dowling and Zavala, 2017). As a consequence, traditional chemical processes with long time constants may be able to provide fast flexibility without disrupting product quality, performance of downstream unit operations, etc. This study quantifies the ability of a distillation system to provide sufficient dynamic flexibility to adjust energy demands every 4 seconds in response to market signals. Using a detailed differential algebraic equation (DAE) model (Hahn and Edgar, 2002) and historic data from the Texas electricity market (ECROT), we estimate revenue opportunities for different column designs. We implement our model using the algebraic modeling language Pyomo (Hart et al., 2011) and its dynamic optimization extension Pyomo.DAE (Nicholson et al., 2017). These software packages enable rapid development of complex optimization models using high-level modelling constructs and provide flexible tools for initializing and discretizing DAE models.
Multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure preserving (also termed physics compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.
The solution of the Optimal Power Flow (OPF) and Unit Commitment (UC) problems (i.e., determining generator schedules and set points that satisfy demands) is critical for efficient and reliable operation of the electricity grid. For computational efficiency, the alternating current OPF (ACOPF) problem is usually formulated with a linearized transmission model, often referred to as the DCOPF problem. However, these linear approximations do not guarantee global optimality or even feasibility for the true nonlinear alternating current (AC) system. Nonlinear AC power flow models can and should be used to improve model fidelity, but successful global solution of problems with these models requires the availability of strong relaxations of the AC optimal power flow constraints. In this paper, we use McCormick envelopes to strengthen the well-known second-order cone (SOC) relaxation of the ACOPF problem. With this improved relaxation, we can further include tight bounds on the voltages at the reference bus, and this paper demonstrates the effectiveness of this for improved bounds tightening. We present results on the optimality gap of both the base SOC relaxation and our Strengthened SOC (SSOC) relaxation for the National Information and Communications Technology Australia (NICTA) Energy System Test Case Archive (NESTA). For the cases where the SOC relaxation yields an optimality gap more than 0.1 %, the SSOC relaxation with bounds tightening further reduces the optimality gap by an average of 67 % and ultimately reduces the optimality gap to less than 0.1 % for 58 % of all the NESTA cases considered. Stronger relaxations enable more efficient global solution of the ACOPF problem and can improve computational efficiency of MINLP problems with AC power flow constraints, e.g., unit commitment.
MPI usage patterns are changing as applications move towards fully-multithreaded runtimes. However, the impact of these patterns on MPI message matching is not well-studied. In particular, MPI’s mechanic for receiver-side data placement, message matching, can be impacted by increased message volume and nondeterminism incurred by multithreading. While there has been significant developer interest and work to provide an efficient MPI interface for multithreaded access, there has not been a study showing how these patterns affect messaging patterns and matching behavior. In this paper, we present a framework for studying the effects of multithreading on MPI message matching. This framework allows us to explore the implications of different common communication patterns and thread-level decompositions. We present a study of these impacts on the architecture of two of the Top 10 supercomputers (NERSC’s Cori and LANL’s Trinity). This data provides a baseline to evaluate reasonable matching engine queue lengths, search depths, and queue drain times under the multithreaded model. Furthermore, the study highlights surprising results on the challenge posed by message matching for multithreaded application performance.