Publications

26 Results

Demonstrate multi-turbine simulation with hybrid-structured / unstructured-moving-grid software stack running primarily on GPUs and propose improvements for successful KPP-2

Bidadi, Shreyas B.; Brazell, Michael B.; Brunhart-Lupo, Nicholas B.; Henry de Frahan, Marc T.; Lee, Dong H.; Hu, Jonathan J.; Melvin, Jeremy M.; Mullowney, Paul M.; Vijayakumar, Ganesh V.; Moser, Robert D.; Rood, Jon R.; Sakievich, Philip S.; Sharma, Ashesh S.; Williams, Alan B.; Sprague, Michael A.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and their thin boundary layers, and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.


Harnessing exascale for whole wind farm high-fidelity simulations to improve wind farm efficiency

Crozier, Paul C.; Adcock, Christiane A.; Ananthan, Shreyas A.; Berger-Vergiat, Luc B.; Brazell, Michael B.; Brunhart-Lupo, Nicholas B.; Henry de Frahan, Marc T.; Hu, Jonathan J.; Knaus, Robert C.; Melvin, Jeremy M.; Moser, Bob M.; Mullowney, Paul M.; Rood, Jon R.; Sharma, Ashesh S.; Thomas, Stephen T.; Vijayakumar, Ganesh V.; Williams, Alan B.; Wilson, Robert V.; Yamazaki, Ichitaro Y.; Sprague, Michael S.

Abstract not provided.

FY2021 Q4: Demonstrate moving-grid multi-turbine simulations primarily run on GPUs and propose improvements for successful KPP-2 [Slides]

Adcock, Christiane A.; Ananthan, Shreyas A.; Berger-Vergiat, Luc B.; Brazell, Michael B.; Brunhart-Lupo, Nicholas B.; Hu, Jonathan J.; Knaus, Robert C.; Melvin, Jeremy M.; Moser, Bob M.; Mullowney, Paul M.; Rood, Jon R.; Sharma, Ashesh S.; Thomas, Stephen T.; Vijayakumar, Ganesh V.; Williams, Alan B.; Wilson, Robert V.; Yamazaki, Ichitaro Y.; Sprague, Michael S.

Isocontours of Q-criterion, colored by velocity, in the wake of two NREL 5-MW turbines operating under a uniform inflow wind speed of 8 m/s. Simulation performed with the hybrid Nalu-Wind/AMR-Wind solver.


Demonstrate moving-grid multi-turbine simulations primarily run on GPUs and propose improvements for successful KPP-2

Adcock, Christiane A.; Ananthan, Shreyas A.; Berger-Vergiat, Luc B.; Brazell, Michael B.; Brunhart-Lupo, Nicholas B.; Hu, Jonathan J.; Knaus, Robert C.; Melvin, Jeremy M.; Moser, Bob M.; Mullowney, Paul M.; Rood, Jon R.; Sharma, Ashesh S.; Thomas, Stephen T.; Vijayakumar, Ganesh V.; Williams, Alan B.; Wilson, Robert V.; Yamazaki, Ichitaro Y.; Sprague, Michael S.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and their thin boundary layers, and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.


ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, Michael S.; Ananthan, Shreyas A.; Binyahib, Roba B.; Brazell, Michael B.; de Frahan, Marc H.; King, Ryan N.; Mullowney, Paul M.; Rood, Jon R.; Sharma, Ashesh S.; Thomas, Stephen T.; Vijayakumar, Ganesh V.; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; Dement, David C.; deVelder, Nathaniel d.; Glaze, D.J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Matula, Neil M.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran R.; Sakievich, Philip S.; Smith, Timothy A.; Vo, Johnathan V.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, William J.; Prokopenko, Andrey P.; Wilson, Robert V.; Moser, &.; Melvin, Jeremy M.; Sitaraman, &.

Abstract not provided.

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, M.S.; Ananthan, S.A.; Brazell, M.B.; Glaws, A.G.; De Frahan, M.D.; King, R.K.; Natarajan, M.N.; Rood, J.R.; Sharma, A.L.; Swirydowicz, K.S.; Thomas, S.T.; Vijayakumar, G.V.; Yellapantula, S.Y.; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; Glaze, D.J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran R.; Sakievich, Philip S.; Smith, Timothy A.; Vo, Johnathan V.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, J.H.; Prokopenko, A.P.; Wilson, R.W.; Moser, R.M.; Melvin, J.M.; Sitaraman, J.S.

Abstract not provided.

Deploy threading in Nalu solver stack

Prokopenko, Andrey P.; Thomas, Stephen T.; Swirydowicz, Kasia S.; Ananthan, Shreyas A.; Hu, Jonathan J.; Williams, Alan B.; Sprague, Michael S.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary code in the ExaWind project is Nalu, an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations, in which mass continuity is maintained through pressure projection. The model consists of the mass-continuity Poisson-type equation for pressure and a momentum equation for the velocity. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs, as matrices must be re-initialized and preconditioners recomputed at every time step. In this milestone, we examine the effect of threading on solver-stack performance against flat-MPI results obtained in previous milestones, using Haswell performance data from full-turbine simulations. Whereas the momentum equations are solved only with the Trilinos solvers, we investigate two algebraic-multigrid (AMG) preconditioners for the continuity equations: Trilinos/MueLu and HYPRE/BoomerAMG. These two packages embody smoothed-aggregation and classical Ruge-Stüben AMG methods, respectively. In our FY18 Q2 report, we described our efforts to improve setup and solve of the continuity equations under flat-MPI parallelism. While significant improvement was demonstrated in the solve phase, setup times remained larger than expected.
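The pressure-projection idea named above can be summarized by the textbook, constant-density form below. This is a generic sketch, not Nalu's actual discretization: it assumes a single time step $\Delta t$, a constant density $\rho$, and an intermediate velocity $\mathbf{u}^{*}$ obtained from the momentum solve. The first line is the Poisson-type continuity system; the third line checks that the corrected velocity is divergence-free.

```latex
\begin{aligned}
\nabla^{2} p^{n+1} &= \frac{\rho}{\Delta t}\,\nabla\cdot\mathbf{u}^{*},\\
\mathbf{u}^{n+1} &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\,\nabla p^{n+1},\\
\nabla\cdot\mathbf{u}^{n+1} &= \nabla\cdot\mathbf{u}^{*} - \frac{\Delta t}{\rho}\,\nabla^{2} p^{n+1} = 0.
\end{aligned}
```

On a moving mesh, the discrete operator behind the first line changes with the mesh, so its matrix must be reassembled and its AMG preconditioner rebuilt every step, which is why setup time weighs so heavily in the results above.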
Starting with the optimized settings described in the Q2 report, we explore here simulation performance when OpenMP threading is employed in the solver stack. For Trilinos, threading is achieved through the Kokkos abstraction, whereas HYPRE/BoomerAMG uses OpenMP directly. We examined results for our mid-resolution baseline turbine simulation configuration (229M DOF). Simulations on 2048 Haswell cores explored the effect of decreasing the number of MPI ranks while increasing the number of threads. Both HYPRE and Trilinos exhibited similar overall solution times, and both showed dramatic increases in simulation time in the shift from MPI ranks to OpenMP threads. This increase is attributed to the large amount of work per MPI rank already present in the single-thread configuration. Decreasing MPI ranks while increasing threads may increase simulation time because thread synchronization and start-up overhead add to the latency and serial time in the model. These results showed that an MPI+OpenMP parallel decomposition will be more effective as the computation per MPI rank decreases and the communication latency increases. This idea was demonstrated in a strong-scaling study of our low-resolution baseline model (29M DOF) with the Trilinos-HYPRE configuration. While MPI-only results showed scaling improvement out to about 1536 cores, engaging threading carried scaling improvements out to 4128 cores (roughly 7000 DOF per core). This is an important result, as improved strong scaling is needed for simulations to be executed over sufficiently long simulated durations (i.e., for many time steps). In addition to the threading work described above, the team examined solver-performance improvements by reducing communication overhead in the HYPRE-GMRES implementation through a communication-optimal GMRES algorithm (CO-GMRES), and by offloading compute-intensive solver actions to GPUs.
To those ends, a HYPRE mini-app was developed that allows us to easily test different solver approaches and HYPRE parameter settings without running the entire Nalu code. With GPU acceleration on the Summitdev supercomputer, a 20x speedup was achieved in the overall preconditioner and solver execution time for the mini-app. A study on Haswell processors showed that CO-GMRES provides benefits as the number of MPI ranks increases.
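The per-core work figures behind the threading and strong-scaling discussion can be checked with simple arithmetic. This sketch assumes the degrees of freedom are spread evenly across cores (real mesh partitions are not perfectly balanced) and uses the nominal 29M and 229M DOF counts quoted in the abstract; the helper name `dof_per_core` is hypothetical.

```python
def dof_per_core(total_dof: float, cores: int) -> float:
    """Average degrees of freedom per core, assuming a perfectly even split."""
    return total_dof / cores


# Nominal problem sizes and core counts quoted in the abstract.
low_res_mpi_only = dof_per_core(29e6, 1536)   # MPI-only scaling limit
low_res_threaded = dof_per_core(29e6, 4128)   # scaling limit with threading
mid_res_study = dof_per_core(229e6, 2048)     # mid-resolution threading study

print(f"{low_res_mpi_only:,.0f} DOF/core")  # 18,880 DOF/core
print(f"{low_res_threaded:,.0f} DOF/core")  # 7,025 DOF/core
print(f"{mid_res_study:,.0f} DOF/core")     # 111,816 DOF/core
```

The mid-resolution runs carry roughly 16 times more work per core than the threaded low-resolution limit, consistent with the abstract's point that threading pays off only once per-rank work becomes small.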


Decrease time-to-solution through improved linear-system setup and solve

Hu, Jonathan J.; Thomas, Stephen T.; Dohrmann, Clark R.; Ananthan, Shreyas A.; Domino, Stefan P.; Williams, Alan B.; Sprague, Michael S.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary code in the ExaWind project is Nalu, an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations, in which mass continuity is maintained through pressure projection. The model consists of the mass-continuity Poisson-type equation for pressure and a momentum equation for the velocity. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs, as matrices must be re-initialized and preconditioners recomputed at every time step. We describe in this report our efforts to decrease the setup and solution time for the mass-continuity Poisson system relative to the benchmark timing results reported in FY18 Q1. In particular, we improve and evaluate two types of algebraic multigrid (AMG) preconditioners: classical Ruge-Stüben AMG (C-AMG) and smoothed-aggregation AMG (SA-AMG), which are implemented in the Hypre and Trilinos/MueLu software stacks, respectively. Preconditioner performance was optimized through existing capabilities and settings.
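Classical Ruge-Stüben AMG, the C-AMG variant named above, begins its setup by deciding which matrix couplings count as "strong"; only strong couplings drive coarsening and interpolation. Below is a minimal sketch of the textbook criterion, in which row i strongly depends on column j when -a_ij >= θ · max over k≠i of (-a_ik). The function name and the dense NumPy representation are illustrative assumptions; Hypre's BoomerAMG works on distributed sparse matrices and exposes its own strength-threshold setting rather than this function.

```python
import numpy as np


def strong_connections(A, theta=0.25):
    """Classical (Ruge-Stuben) strength of connection for a dense matrix.

    Row i strongly depends on column j (j != i) when
        -A[i, j] >= theta * max_{k != i} (-A[i, k]).
    Returns a list whose i-th entry is the set of such column indices j.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    strong = []
    for i in range(n):
        neg_off = -A[i]               # negated row (a new array)
        neg_off[i] = -np.inf          # exclude the diagonal from the max
        max_neg = neg_off.max()
        if max_neg <= 0.0:            # no negative off-diagonals: nothing is strong
            strong.append(set())
            continue
        strong.append({j for j in range(n)
                       if j != i and neg_off[j] >= theta * max_neg})
    return strong


# The weak coupling (-0.01) is filtered out; the -1.0 couplings are kept.
A = np.array([[ 1.01, -1.00, -0.01],
              [-1.00,  2.00, -1.00],
              [-0.01, -1.00,  1.01]])
print(strong_connections(A))  # [{1}, {0, 2}, {1}]
```

The threshold θ is a tuning parameter: raising it discards more weak couplings, which changes the coarse grids and therefore both setup cost and convergence. A value of 0.25 is a common textbook default.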


Decrease time-to-solution through improved linear-system setup and solve

Hu, Jonathan J.; Thomas, Stephen T.; Dohrmann, Clark R.; Ananthan, Shreyas A.; Domino, Stefan P.; Williams, Alan B.; Sprague, Michael S.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.


Assessing a mini-application as a performance proxy for a finite element method engineering application

Concurrency and Computation: Practice and Experience

Lin, Paul L.; Heroux, Michael A.; Williams, Alan B.; Barrett, Richard F.

The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that captures these primary performance characteristics, then this miniapp can be used both to improve the performance of the app and to provide a co-design tool for the high-performance computing community. However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study compares an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Single compute node performance is compared, as is scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.


Improving performance via mini-applications

Doerfler, Douglas W.; Crozier, Paul C.; Edwards, Harold C.; Williams, Alan B.; Rajan, Mahesh R.; Keiter, Eric R.; Thornquist, Heidi K.

Application performance is determined by a combination of many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, we find that the use of mini-applications (small, self-contained proxies for real applications) is an excellent approach for rapidly exploring the parameter space of all these choices. Furthermore, mini-applications enrich the interaction between application, library, and computer system developers by providing explicit functioning software and concrete performance results that lead to detailed, focused discussions of design trade-offs, algorithm choices, and runtime performance issues. In this paper we discuss a collection of mini-applications and demonstrate how we use them to analyze and improve application performance on new and future computer platforms.


An overview of Trilinos

Heroux, Michael A.; Kolda, Tamara G.; Long, Kevin R.; Hoekstra, Robert J.; Pawlowski, Roger P.; Phipps, Eric T.; Salinger, Andrew G.; Williams, Alan B.; Hu, Jonathan J.; Lehoucq, Richard B.; Thornquist, Heidi K.; Tuminaro, Raymond S.; Willenbring, James M.; Bartlett, Roscoe B.; Howle, Victoria E.

The Trilinos Project is an effort to facilitate the design, development, integration, and ongoing support of mathematical software libraries. In particular, our goal is to develop parallel solver algorithms and libraries within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific applications. Our emphasis is on developing robust, scalable algorithms in a software framework, using abstract interfaces for flexible interoperability of components while providing a full-featured set of concrete classes that implement all abstract interfaces. Trilinos uses a two-level software structure designed around collections of packages. A Trilinos package is an integral unit usually developed by a small team of experts in a particular algorithms area, such as algebraic preconditioners or nonlinear solvers. Packages exist underneath the Trilinos top level, which provides a common look-and-feel, including configuration, documentation, licensing, and bug-tracking. Trilinos packages are primarily written in C++, but provide some C and Fortran user interface support. We provide an open architecture that allows easy integration with other solver packages, and we deliver our software to the outside community via the GNU Lesser General Public License (LGPL). This report provides an overview of Trilinos, discussing the objectives, history, current development, and future plans of the project.
