Publications

125 Results
Skip to search filters

Performance of fully-coupled algebraic multigrid preconditioners for large-scale VMS resistive MHD

Journal of Computational and Applied Mathematics

Lin, Paul L.; Shadid, John N.; Hu, J.J.; Pawlowski, Roger P.; Cyr, E.C.

This work explores the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. This study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of the original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.

More Details

ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE

Bettencourt, Matthew T.; Kramer, Richard M.; Cartwright, Keith C.; Phillips, Edward G.; Ober, Curtis C.; Pawlowski, Roger P.; Swan, Matthew S.; Kalashnikova, Irina; Phipps, Eric T.; Conde, Sidafa C.; Cyr, Eric C.; Ulmer, Craig D.; Kordenbrock, Todd H.; Levy, Scott L.; Templet, Gary J.; Hu, Jonathan J.; Lin, Paul L.; Glusa, Christian A.; Siefert, Christopher S.; Glass, Micheal W.

This report documents the outcome from the ASC ATDM Level 2 Milestone 6358: Assess Status of Next Generation Components and Physics Models in EMPIRE. This Milestone is an assessment of the EMPIRE (ElectroMagnetic Plasma In Realistic Environments) application and three software components. The assessment focuses on the electromagnetic and electrostatic particle-in-cell solu- tions for EMPIRE and its associated solver, time integration, and checkpoint-restart components. This information provides a clear understanding of the current status of the EMPIRE application and will help to guide future work in FY19 in order to ready the application for the ASC ATDM L 1 Milestone in FY20. It is clear from this assessment that performance of the linear solver will have to be a focus in FY19.

More Details

ASC ATDM Level 2 Milestone #5325: Asynchronous Many-Task Runtime System Analysis and Assessment for Next Generation Platforms

Baker, Gavin M.; Bettencourt, Matthew T.; Bova, S.W.; franko, ken f.; Gamell, Marc G.; Grant, Ryan E.; Hammond, Simon D.; Hollman, David S.; Knight, Samuel K.; Kolla, Hemanth K.; Lin, Paul L.; Olivier, Stephen O.; Sjaardema, Gregory D.; Slattengren, Nicole L.; Teranishi, Keita T.; Wilke, Jeremiah J.; Bennett, Janine C.; Clay, Robert L.; kale, laxkimant k.; Jain, Nikhil J.; Mikida, Eric M.; Aiken, Alex A.; Bauer, Michael B.; Lee, Wonchan L.; Slaughter, Elliott S.; Treichler, Sean T.; Berzins, Martin B.; Harman, Todd H.; humphreys, alan h.; schmidt, john s.; sunderland, dan s.; Mccormick, Pat M.; gutierrez, samuel g.; shulz, martin s.; Gamblin, Todd G.; Bremer, Peer-Timo B.

Abstract not provided.

ASC ATDM Level 2 Milestone #5325: Asynchronous Many-Task Runtime System Analysis and Assessment for Next Generation Platforms

Baker, Gavin M.; Bettencourt, Matthew T.; Bova, S.W.; franko, ken f.; Gamell, Marc G.; Grant, Ryan E.; Hammond, Simon D.; Hollman, David S.; Knight, Samuel K.; Kolla, Hemanth K.; Lin, Paul L.; Olivier, Stephen O.; Sjaardema, Gregory D.; Slattengren, Nicole L.; Teranishi, Keita T.; Wilke, Jeremiah J.; Bennett, Janine C.; Clay, Robert L.; kale, laxkimant k.; Jain, Nikhil J.; Mikida, Eric M.; Aiken, Alex A.; Bauer, Michael B.; Lee, Wonchan L.; Slaughter, Elliott S.; Treichler, Sean T.; Berzins, Martin B.; Harman, Todd H.; humphreys, alan h.; schmidt, john s.; sunderland, dan s.; Mccormick, Pat M.; gutierrez, samuel g.; shulz, martin s.; Gamblin, Todd G.; Bremer, Peer-Timo B.

This report provides in-depth information and analysis to help create a technical road map for developing next-generation programming models and runtime systems that support Advanced Simulation and Computing (ASC) work- load requirements. The focus herein is on asynchronous many-task (AMT) model and runtime systems, which are of great interest in the context of "Oriascale7 computing, as they hold the promise to address key issues associated with future extreme-scale computer architectures. This report includes a thorough qualitative and quantitative examination of three best-of-class AIM] runtime systems – Charm-++, Legion, and Uintah, all of which are in use as part of the Centers. The studies focus on each of the runtimes' programmability, performance, and mutability. Through the experiments and analysis presented, several overarching Predictive Science Academic Alliance Program II (PSAAP-II) Asc findings emerge. From a performance perspective, AIV runtimes show tremendous potential for addressing extreme- scale challenges. Empirical studies show an AM runtime can mitigate performance heterogeneity inherent to the machine itself and that Message Passing Interface (MP1) and AM11runtimes perform comparably under balanced conditions. From a programmability and mutability perspective however, none of the runtimes in this study are currently ready for use in developing production-ready Sandia ASC applications. The report concludes by recommending a co- design path forward, wherein application, programming model, and runtime system developers work together to define requirements and solutions. Such a requirements-driven co-design approach benefits the community as a whole, with widespread community engagement mitigating risk for both application developers developers. and high-performance computing runtime systein

More Details

ASC Trilab L2 Codesign Milestone 2015

Trott, Christian R.; Hammond, Simon D.; Dinge, Dennis D.; Lin, Paul L.; Vaughan, Courtenay T.; Cook, Jeanine C.; Rajan, Mahesh R.; Edwards, Harold C.; Hoekstra, Robert J.

For the FY15 ASC L2 Trilab Codesign milestone Sandia National Laboratories performed two main studies. The first study investigated three topics (performance, cross-platform portability and programmer productivity) when using OpenMP directives and the RAJA and Kokkos programming models available from LLNL and SNL respectively. The focus of this first study was the LULESH mini-application developed and maintained by LLNL. In the coming sections of the report the reader will find performance comparisons (and a demonstration of portability) for a variety of mini-application implementations produced during this study with varying levels of optimization. Of note is that the implementations utilized including optimizations across a number of programming models to help ensure claims that Kokkos can provide native-class application performance are valid. The second study performed during FY15 is a performance assessment of the MiniAero mini-application developed by Sandia. This mini-application was developed by the SIERRA Thermal-Fluid team at Sandia for the purposes of learning the Kokkos programming model and so is available in only a single implementation. For this report we studied its performance and scaling on a number of machines with the intent of providing insight into potential performance issues that may be experienced when similar algorithms are deployed on the forthcoming Trinity ASC ATS platform.

More Details

Assessing a mini-application as a performance proxy for a finite element method engineering application

Concurrency and Computation. Practice and Experience

Lin, Paul L.; Heroux, Michael A.; Williams, Alan B.; Barrett, Richard F.

The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used to both improve the performance of the app as well as provide a tool for co-design for the high-performance computing community. However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Finally, single compute node performance will be compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.

More Details

Lessons Learned from Porting the MiniAero Application to Charm++

Hollman, David S.; Hollman, David S.; Bennett, Janine C.; Bennett, Janine C.; Wilke, Jeremiah J.; Wilke, Jeremiah J.; Kolla, Hemanth K.; Kolla, Hemanth K.; Lin, Paul L.; Lin, Paul L.; Slattengren, Nicole S.; Slattengren, Nicole S.; Teranishi, Keita T.; Teranishi, Keita T.; franko, ken f.; franko, ken f.; Jain, Nikhil J.; Jain, Nikhil J.; Mikida, Eric M.; Mikida, Eric M.

Abstract not provided.

Assessing the role of mini-applications in predicting key performance characteristics of scientific and engineering applications

Journal of Parallel and Distributed Computing

Barrett, R.F.; Crozier, Paul C.; Doerfler, Douglas W.; Heroux, Michael A.; Lin, Paul L.; Thornquist, Heidi K.; Trucano, Timothy G.; Vaughan, Courtenay T.

Computational science and engineering application programs are typically large, complex, and dynamic, and are often constrained by distribution limitations. As a means of making tractable rapid explorations of scientific and engineering application programs in the context of new, emerging, and future computing architectures, a suite of "miniapps" has been created to serve as proxies for full scale applications. Each miniapp is designed to represent a key performance characteristic that does or is expected to significantly impact the runtime performance of an application program. In this paper we introduce a methodology for assessing the ability of these miniapps to effectively represent these performance issues. We applied this methodology to three miniapps, examining the linkage between them and an application they are intended to represent. Herein we evaluate the fidelity of that linkage. This work represents the initial steps required to begin to answer the question, "Under what conditions does a miniapp represent a key performance characteristic in a full app?"

More Details

Towards extreme-scale simulations for low mach fluids with second-generation trilinos

Parallel Processing Letters

Lin, Paul L.; Bettencourt, Matthew T.; Domino, Stefan P.; Fisher, Travis C.; Hoemmen, Mark F.; Hu, Jonathan J.; Phipps, Eric T.; Prokopenko, Andrey V.; Rajamanickam, Sivasankaran R.; Siefert, Christopher S.; Kennon, Stephen

Trilinos is an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. While Trilinos was originally designed for scalable solutions of large problems, the fidelity needed by many simulations is significantly greater than what one could have envisioned two decades ago. When problem sizes exceed a billion elements even scalable applications and solver stacks require a complete revision. The second-generation Trilinos employs C++ templates in order to solve arbitrarily large problems. We present a case study of the integration of Trilinos with a low Mach fluids engineering application (SIERRA low Mach module/Nalu). Through the use of improved algorithms and better software engineering practices, we demonstrate good weak scaling for up to a nine billion element large eddy simulation (LES) problem on unstructured meshes with a 27 billion row matrix on 524,288 cores of an IBM Blue Gene/Q platform.

More Details

Simulation information regarding Sandia National Laboratories trinity capability improvement metric

Agelastos, Anthony M.; Lin, Paul L.

Sandia National Laboratories, Los Alamos National Laboratory, and Lawrence Livermore National Laboratory each selected a representative simulation code to be used as a performance benchmark for the Trinity Capability Improvement Metric. Sandia selected SIERRA Low Mach Module: Nalu, which is a uid dynamics code that solves many variable-density, acoustically incompressible problems of interest spanning from laminar to turbulent ow regimes, since it is fairly representative of implicit codes that have been developed under ASC. The simulations for this metric were performed on the Cielo Cray XE6 platform during dedicated application time and the chosen case utilized 131,072 Cielo cores to perform a canonical turbulent open jet simulation within an approximately 9-billion-elementunstructured- hexahedral computational mesh. This report will document some of the results from these simulations as well as provide instructions to perform these simulations for comparison.

More Details

Algebraic multilevel preconditioners for nonsymmetric PDEs on stretched grids

Lecture Notes in Computational Science and Engineering

Sala, Marzio; Lin, Paul L.; Shadid, John N.; Tuminaro, Raymond S.

We report on algebraic multilevel preconditioners for the parallel solution of linear systems arising from a Newton procedure applied to the finite-element (FE) discretization of the incompressible Navier-Stokes equations. We focus on the issue of how to coarsen FE operators produced from high aspect ratio elements.

More Details

Large-scale stabilized FE computational analysis of nonlinear steady state transport/reaction systems

Proposed for publication in Computer Methods in Applied Mechanics and Engineering.

Shadid, John N.; Salinger, Andrew G.; Pawlowski, Roger P.; Lin, Paul L.; Hennigan, Gary L.; Tuminaro, Raymond S.; Lehoucq, Richard B.

The solution of the governing steady transport equations for momentum, heat and mass transfer in fluids undergoing non-equilibrium chemical reactions can be extremely challenging. The difficulties arise from both the complexity of the nonlinear solution behavior as well as the nonlinear, coupled, non-symmetric nature of the system of algebraic equations that results from spatial discretization of the PDEs. In this paper, we briefly review progress on developing a stabilized finite element (FE) capability for numerical solution of these challenging problems. The discussion considers the stabilized FE formulation for the low Mach number Navier-Stokes equations with heat and mass transport with non-equilibrium chemical reactions, and the solution methods necessary for detailed analysis of these complex systems. The solution algorithms include robust nonlinear and linear solution schemes, parameter continuation methods, and linear stability analysis techniques. Our discussion considers computational efficiency, scalability, and some implementation issues of the solution methods. Computational results are presented for a CFD benchmark problem as well as for a number of large-scale, 2D and 3D, engineering transport/reaction applications.

More Details

Large-scale stabilized FE computational analysis of nonlinear steady state transport/reaction systems

Proposed for publication in Computation Methods in Applied Mechanics and Engineering.

Shadid, John N.; Salinger, Andrew G.; Pawlowski, Roger P.; Lin, Paul L.; Hennigan, Gary L.; Tuminaro, Raymond S.; Lehoucq, Richard B.

The solution of the governing steady transport equations for momentum, heat and mass transfer in fluids undergoing non-equilibrium chemical reactions can be extremely challenging. The difficulties arise from both the complexity of the nonlinear solution behavior as well as the nonlinear, coupled, non-symmetric nature of the system of algebraic equations that results from spatial discretization of the PDEs. In this paper, we briefly review progress on developing a stabilized finite element ( FE) capability for numerical solution of these challenging problems. The discussion considers the stabilized FE formulation for the low Mach number Navier-Stokes equations with heat and mass transport with non-equilibrium chemical reactions, and the solution methods necessary for detailed analysis of these complex systems. The solution algorithms include robust nonlinear and linear solution schemes, parameter continuation methods, and linear stability analysis techniques. Our discussion considers computational efficiency, scalability, and some implementation issues of the solution methods. Computational results are presented for a CFD benchmark problem as well as for a number of large-scale, 2D and 3D, engineering transport/reaction applications.

More Details
125 Results
125 Results