Publications

Results 1–25 of 54

ExaWind: Then and now

Crozier, Paul C.; Berger-Vergiat, Luc B.; Dement, David C.; deVelder, Nathaniel d.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Matula, Neil M.; Overfelt, James R.; Sakievich, Philip S.; Smith, Timothy A.; Williams, Alan B.; Prokopenko, Andrey; Moser, Robert; Melvin, Jeremy; Sprague, Michael; Bidadi, Shreyas; Brazell, Michael; Brunhart-Lupo, Nicholas; Henry De Frahan, Marc; Rood, Jon; Sharma, Ashesh; Topcuoglu, Ilker; Vijayakumar, Ganesh

Abstract not provided.

Demonstrate multi-turbine simulation with hybrid-structured / unstructured-moving-grid software stack running primarily on GPUs and propose improvements for successful KPP-2

Bidadi, Shreyas; Brazell, Michael; Brunhart-Lupo, Nicholas; Henry De Frahan, Marc T.; Lee, Dong H.; Hu, Jonathan J.; Melvin, Jeremy; Mullowney, Paul; Vijayakumar, Ganesh; Moser, Robert D.; Rood, Jon; Sakievich, Philip S.; Sharma, Ashesh; Williams, Alan B.; Sprague, Michael A.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, captures the thin boundary layers, and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, Michael A.; Brazell, Michael; Brunhart-Lupo, Nicholas; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen; Vijayakumar, Ganesh; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; deVelder, Nathaniel d.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Matula, Neil M.; Overfelt, James R.; Sakievich, Philip S.; Smith, Timothy A.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, John A.; Prokopenko, Andrey; Wilson, Robert; Moser, Robert; Melvin, Jeremy

Abstract not provided.

Preparing an incompressible-flow fluid dynamics code for exascale-class wind energy simulations

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Mullowney, Paul; Li, Ruipeng; Thomas, Stephen; Ananthan, Shreyas; Sharma, Ashesh; Rood, Jon S.; Williams, Alan B.; Sprague, Michael A.

The U.S. Department of Energy has identified exascale-class wind farm simulation as critical to wind energy scientific discovery. A primary objective of the ExaWind project is to build high-performance, predictive computational fluid dynamics (CFD) tools that satisfy these modeling needs. GPU accelerators will serve as the computational thoroughbreds of next-generation, exascale-class supercomputers. Here, we report on our efforts in preparing the ExaWind unstructured mesh solver, Nalu-Wind, for exascale-class machines. For computing at this scale, a simple port of the incompressible-flow algorithms to GPUs is insufficient. To achieve high performance, one needs novel algorithms that are application aware, memory efficient, and optimized for the latest-generation GPU devices. The result of our efforts is a set of unstructured-mesh simulations of wind turbines that can effectively leverage thousands of GPUs. In particular, we demonstrate a first-of-its-kind, incompressible-flow simulation using Algebraic Multigrid solvers that strong-scales to more than 4000 GPUs on the Summit supercomputer.
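
The solvers referenced above are algebraic multigrid (AMG) preconditioners applied within Krylov iterations. As orientation for readers new to the Trilinos solver stack, here is a minimal, hypothetical sketch of how an AMG-preconditioned GMRES solve might be set up with Belos and MueLu on Tpetra objects; the function name and parameter values are invented for illustration, and this is not the actual Nalu-Wind solver configuration.

```cpp
// Hypothetical AMG-preconditioned GMRES setup with Trilinos (Belos + MueLu) on
// Tpetra objects. Illustrative only; not the actual Nalu-Wind solver configuration.
#include <BelosLinearProblem.hpp>
#include <BelosSolverFactory.hpp>
#include <BelosTpetraAdapter.hpp>
#include <MueLu_CreateTpetraPreconditioner.hpp>
#include <Teuchos_ParameterList.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Tpetra_MultiVector.hpp>

using SC = double;
using MV = Tpetra::MultiVector<SC>;
using OP = Tpetra::Operator<SC>;

// A: assembled pressure-Poisson matrix; x, b: solution and right-hand side
// (all assumed to have been built elsewhere).
Teuchos::RCP<Belos::SolverManager<SC, MV, OP>>
makePressureSolver(const Teuchos::RCP<Tpetra::CrsMatrix<SC>>& A,
                   const Teuchos::RCP<MV>& x,
                   const Teuchos::RCP<MV>& b)
{
  // Smoothed-aggregation AMG preconditioner built from the assembled matrix.
  Teuchos::RCP<OP> Aop = A;
  Teuchos::ParameterList mueluParams;  // defaults; production runs tune these heavily
  auto prec = MueLu::CreateTpetraPreconditioner(Aop, mueluParams);

  // Wrap A, x, b in a Belos linear problem and attach the preconditioner.
  auto problem = Teuchos::rcp(new Belos::LinearProblem<SC, MV, OP>(A, x, b));
  problem->setRightPrec(prec);
  problem->setProblem();

  // GMRES solver manager from the Belos factory.
  auto solverParams = Teuchos::rcp(new Teuchos::ParameterList());
  solverParams->set("Convergence Tolerance", 1.0e-8);
  solverParams->set("Maximum Iterations", 200);
  Belos::SolverFactory<SC, MV, OP> factory;
  auto solver = factory.create("GMRES", solverParams);
  solver->setProblem(problem);
  return solver;  // caller invokes solver->solve()
}
```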

Harnessing exascale for whole wind farm high-fidelity simulations to improve wind farm efficiency

Crozier, Paul C.; Adcock, Christiane; Ananthan, Shreyas; Berger-Vergiat, Luc B.; Brazell, Michael; Brunhart-Lupo, Nicholas; Henry De Frahan, Marc T.; Hu, Jonathan J.; Knaus, Robert C.; Melvin, Jeremy; Moser, Bob; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen; Vijayakumar, Ganesh; Williams, Alan B.; Wilson, Robert; Yamazaki, Ichitaro Y.; Sprague, Michael A.

Abstract not provided.

Demonstrate moving-grid multi-turbine simulations primarily run on GPUs and propose improvements for successful KPP-2

Adcock, Christiane; Ananthan, Shreyas; Berger-Vergiat, Luc; Brazell, Michael; Brunhart-Lupo, Nicholas; Hu, Jonathan J.; Knaus, Robert C.; Melvin, Jeremy; Moser, Bob; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen; Vijayakumar, Ganesh; Williams, Alan B.; Wilson, Robert; Yamazaki, Ichitaro Y.; Sprague, Michael

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, captures the thin boundary layers, and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.

FY2021 Q4: Demonstrate moving-grid multi-turbine simulations primarily run on GPUs and propose improvements for successful KPP-2 [Slides]

Adcock, Christiane; Ananthan, Shreyas; Berger-Vergiat, Luc B.; Brazell, Michael; Brunhart-Lupo, Nicholas; Hu, Jonathan J.; Knaus, Robert C.; Melvin, Jeremy; Moser, Bob; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen; Vijayakumar, Ganesh; Williams, Alan B.; Wilson, Robert; Yamazaki, Ichitaro Y.; Sprague, Michael

Isocontours of Q-criterion with velocity visualized in the wake for two NREL 5-MW turbines operating under a uniform-inflow wind speed of 8 m/s. Simulation performed with the hybrid Nalu-Wind/AMR-Wind solver.

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, Michael; Ananthan, Shreyas; Binyahib, Roba; Brazell, Michael; De Frahan, Marc H.; King, Ryan A.; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen A.; Vijayakumar, Ganesh; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; Dement, David C.; deVelder, Nathaniel d.; Glaze, D.J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Matula, Neil M.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran R.; Sakievich, Philip S.; Smith, Timothy A.; Vo, Johnathan V.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, William J.; Prokopenko, Andrey; Wilson, Robert V.; Moser, Robert; Melvin, Jeremy; Sitaraman, Jay

Abstract not provided.

Demonstration and performance testing of extreme-resolution simulations with static meshes on Summit (CPU & GPU) for a parked-turbine configuration and an actuator-line (mid-fidelity model) wind farm configuration (ECP-Q4 FY2020 Milestone Report)

Ananthan, Shreyas; Williams, Alan B.; Overfelt, James R.; Vo, Johnathan V.; Sakievich, Philip S.; Smith, Timothy A.; Hu, Jonathan J.; Berger-Vergiat, Luc B.; Mullowney, Paul; Thomas, Stephen; Henry De Frahan, Marc; Melvin, Jeremy; Moser, Robert; Brazell, Michael; Sprague, Michael A.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary physics codes in the ExaWind simulation environment are Nalu-Wind, an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations; AMR-Wind, a block-structured-grid solver with adaptive mesh refinement capabilities; and OpenFAST, a wind-turbine structural dynamics solver. The Nalu-Wind model consists of the mass-continuity Poisson-type equation for pressure and Helmholtz-type equations for transport of momentum and other scalars. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs as reinitialization of matrices and recomputation of preconditioners are required at every time step. The choice of overset-mesh methodology to model the moving and non-moving parts of the computational domain introduces constraint equations in the elliptic pressure-Poisson solver. The presence of constraints greatly affects the performance of algebraic multigrid preconditioners.
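
For readers unfamiliar with the terminology above, the split into a Helmholtz-type momentum solve and a Poisson-type continuity (pressure) solve corresponds to a pressure-projection scheme. A generic incremental form is sketched below in textbook notation; it illustrates the structure of the method and is not necessarily the exact Nalu-Wind discretization.

```latex
% Generic incremental pressure-projection step (textbook form; not necessarily
% the exact Nalu-Wind discretization).
% 1. Momentum predictor: a Helmholtz-type system for the provisional velocity u*.
\frac{\rho}{\Delta t}\left(\mathbf{u}^{*}-\mathbf{u}^{n}\right)
  - \nabla\cdot\left(\mu\,\nabla\mathbf{u}^{*}\right)
  = -\nabla p^{n} + \mathbf{f}^{n}
% 2. Mass-continuity Poisson-type equation for the pressure correction \phi.
\nabla\cdot\!\left(\frac{1}{\rho}\,\nabla\phi\right)
  = \frac{1}{\Delta t}\,\nabla\cdot\mathbf{u}^{*}
% 3. Projection: update velocity and pressure to enforce the divergence constraint.
\mathbf{u}^{n+1} = \mathbf{u}^{*} - \frac{\Delta t}{\rho}\,\nabla\phi,
\qquad p^{n+1} = p^{n} + \phi
```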

FY20 ASC IC L2 Milestone 7180: Performance Portability of SIERRA Mechanics Applications to ATS-1 and ATS-2. Executive Summary

Mosby, Matthew D.; Clausen, Jonathan C.; Crane, Nathan K.; Drake, Richard R.; Thomas, Jesse D.; Williams, Alan B.; Pierson, Kendall H.

The overall goal of this work was to accelerate simulations supporting the nuclear deterrence (ND) mission through improved performance of key algorithms in the ASC IC Sierra multi-physics application suite. This work focused on porting and optimizing algorithms for the graphics processing units (GPU) on the second ASC advanced technology system (ATS-2), while maintaining or improving performance on commodity technology systems (CTS) and ATS-1. Furthermore, these algorithmic developments used the ASC developed Kokkos performance portability abstraction library to maintain high performance across platforms using identical code, and enable sustainable reduced-cost migration and performance optimization to emerging hardware.

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, M.; Ananthan, S.; Brazell, M.; Glaws, A.; De Frahan, M.; King, R.; Natarajan, M.; Rood, J.; Sharma, A.; Sirydowicz, K.; Thomas, S.; Vijaykumar, G.; Yellapantula, S.; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; Glaze, D.J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran R.; Sakievich, Philip S.; Smith, Timothy A.; Vo, Johnathan V.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, J.; Prokopenko, A.; Wilson, R.; Moser, R.; Melvin, J.; Sitaraman, J.

Abstract not provided.

Nalu-Wind and OpenFAST: A high-fidelity modeling and simulation environment for wind energy. Milestone ECP-Q2-FY19

Ananthan, Shreyas; Capone, Luigi; Henry De Frahan, Marc; Hu, Jonathan J.; Melvin, Jeremy; Overfelt, James R.; Sharma, Ashesh; Sitaraman, Jay; Swirydowicz, Katarzyna; Thomas, Stephen; Vijayakumar, Ganesh; Williams, Alan B.; Yellapantula, Shashank; Sprague, Michael

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary physics codes in the ExaWind project are Nalu-Wind, which is an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations, and OpenFAST, which is a whole-turbine simulation code. The Nalu-Wind model consists of the mass-continuity Poisson-type equation for pressure and a momentum equation for the velocity. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs as reinitialization of matrices and recomputation of preconditioners are required at every time step. This milestone represents the culmination of several parallel development activities towards the goal of establishing a full-physics simulation capability for modeling wind turbines operating in turbulent atmospheric inflow conditions. The demonstration simulation performed in this milestone is the first step towards the "ground truth" simulation and includes the following components: neutral atmospheric boundary layer inflow conditions generated using a precursor simulation, a hybrid RANS/LES simulation of the wall-resolved turbine geometry, hybridization of the turbulence equations using a blending function approach to transition from the atmospheric scales to the blade boundary layer scales near the turbine, and fluid-structure interaction (FSI) that accounts for the complete set of blade deformations (bending, twisting and pitch motion, yaw and tower displacements) by coupling to a comprehensive turbine dynamics code (OpenFAST). The use of overset mesh methodology for the simulations in this milestone presents a significant deviation from the previous efforts where a sliding mesh approach was employed to model the rotation of the turbine blades. The choice of overset meshes was motivated by the need to handle arbitrarily large deformations of the blade, the need to allow for blade pitching in the presence of a controller, and the ease of mesh generation compared to the sliding mesh approach. FSI and the new timestep algorithm used in the simulations were developed in partnership with the A2e High-Fidelity Modeling project. The individual physics components were verified and validated (V&V) through extensive code-to-code comparisons and with experiments where possible. The detailed V&V efforts provide confidence in the final simulation where these physics models were combined, even though no detailed experimental data is available to perform validation of the final configuration. Taken together, this milestone successfully demonstrated the most advanced simulation to date that has been performed with Nalu-Wind.
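
To make the partitioned fluid-structure interaction (FSI) coupling described above more concrete, the sketch below shows a deliberately simplified per-timestep exchange between a CFD solver and a turbine structural-dynamics code. All type and function names are invented for illustration; they do not correspond to the actual Nalu-Wind/OpenFAST interfaces or to the time-stepping algorithm developed with the A2e High-Fidelity Modeling project.

```cpp
// Hypothetical partitioned FSI exchange loop (illustration only; the real
// Nalu-Wind/OpenFAST coupling and its time-stepping algorithm differ).
#include <vector>

struct SurfaceLoads { std::vector<double> pressure, shear; };
struct BladeMotion  { std::vector<double> displacement, velocity; };

struct CfdSolver {                     // stand-in for the CFD code
  void deformMesh(const BladeMotion&) {}
  void advanceFlow(double /*dt*/) {}
  SurfaceLoads sampleBladeLoads() const { return {}; }
};

struct StructuralSolver {              // stand-in for the turbine dynamics code
  BladeMotion advanceStructure(const SurfaceLoads&, double /*dt*/) { return {}; }
};

int main() {
  CfdSolver cfd;
  StructuralSolver fast;
  BladeMotion motion;                  // blade deflection, pitch, yaw, tower motion
  const double dt = 0.01;

  for (int step = 0; step < 100; ++step) {
    cfd.deformMesh(motion);            // apply latest structural state to the moving/overset mesh
    cfd.advanceFlow(dt);               // momentum and pressure-Poisson solves on the deformed mesh
    SurfaceLoads loads = cfd.sampleBladeLoads();  // aerodynamic loads on the blade surfaces
    motion = fast.advanceStructure(loads, dt);    // structural response used in the next step
  }
  return 0;
}
```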

Nalu's Linear System Assembly using Tpetra

Domino, Stefan P.; Williams, Alan B.

The Nalu Exascale Wind application assembles linear systems using data structures provided by the Tpetra package in Trilinos. This note describes the initialization and assembly process. The purpose of this note is to help Nalu developers and maintainers to understand the code surrounding linear system assembly, in order to facilitate debugging, optimizations, and maintenance.
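
As context for readers new to Tpetra, a stripped-down assembly sequence of the kind the note documents might look like the following; the sparsity pattern and values are placeholders, and the real Nalu assembly path (mesh connectivity, ghosting, overset constraints) is considerably more involved.

```cpp
// Minimal Tpetra graph-initialization and matrix-assembly sketch (placeholder
// sparsity and values; not the actual Nalu assembly path).
#include <Tpetra_Core.hpp>
#include <Tpetra_Map.hpp>
#include <Tpetra_CrsGraph.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Teuchos_Tuple.hpp>

int main(int argc, char** argv) {
  Tpetra::ScopeGuard tpetraScope(&argc, &argv);
  {
    using LO = Tpetra::Map<>::local_ordinal_type;
    using GO = Tpetra::Map<>::global_ordinal_type;
    auto comm = Tpetra::getDefaultComm();

    // 1. Row map: distribute 100 global rows (degrees of freedom) across MPI ranks.
    auto rowMap = Teuchos::rcp(new Tpetra::Map<>(100, 0, comm));

    // 2. Graph initialization: declare the sparsity pattern once per mesh topology.
    auto graph = Teuchos::rcp(new Tpetra::CrsGraph<>(rowMap, 3 /*max entries per row*/));
    const LO numLocalRows = static_cast<LO>(rowMap->getLocalNumElements());
    for (LO i = 0; i < numLocalRows; ++i) {
      const GO row = rowMap->getGlobalElement(i);
      graph->insertGlobalIndices(row, Teuchos::tuple<GO>(row));  // diagonal-only placeholder stencil
    }
    graph->fillComplete();

    // 3. Matrix assembly: sum contributions into the fixed graph, then fillComplete
    //    so the matrix is ready for the solver stack.
    Tpetra::CrsMatrix<double> A(graph);
    for (LO i = 0; i < numLocalRows; ++i) {
      const GO row = rowMap->getGlobalElement(i);
      A.sumIntoGlobalValues(row, Teuchos::tuple<GO>(row), Teuchos::tuple<double>(1.0));
    }
    A.fillComplete();
  }
  return 0;
}
```

Keeping the graph fixed while re-summing values is cheap; when meshes move, however, the sparsity itself can change, which is consistent with the milestone reports elsewhere in this list that cite matrix and preconditioner reinitialization as a dominant per-timestep cost.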

Deploy threading in Nalu solver stack

Prokopenko, Andrey; Thomas, Stephen; Swirydowicz, Kasia; Ananthan, Shreyas; Hu, Jonathan J.; Williams, Alan B.; Sprague, Michael

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary code in the ExaWind project is Nalu, which is an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations, and mass continuity is maintained through pressure projection. The model consists of the mass-continuity Poisson-type equation for pressure and a momentum equation for the velocity. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs as re-initialization of matrices and re-computation of preconditioners are required at every time step. In this milestone, we examine the effect of threading on solver-stack performance against flat-MPI results obtained from previous milestones, using Haswell performance data from full-turbine simulations. Whereas the momentum equations are solved only with the Trilinos solvers, we investigate two algebraic-multigrid preconditioners for the continuity equations: Trilinos/MueLu and HYPRE/BoomerAMG. These two packages embody smoothed-aggregation and classical Ruge-Stüben AMG methods, respectively. In our FY18 Q2 report, we described our efforts to improve setup and solve of the continuity equations under flat-MPI parallelism. While significant improvement was demonstrated in the solve phase, setup times remained larger than expected. Starting with the optimized settings described in the Q2 report, we explore here simulation performance where OpenMP threading is employed in the solver stack. For Trilinos, threading is achieved through the Kokkos abstraction, whereas HYPRE/BoomerAMG employs straight OpenMP. We examined results for our mid-resolution baseline turbine simulation configuration (229M DOF). Simulations on 2048 Haswell cores explored the effect of decreasing the number of MPI ranks while increasing the number of threads. Both HYPRE and Trilinos exhibited similar overall solution times, and both showed dramatic increases in simulation time in the shift from MPI ranks to OpenMP threads. This increase is attributed to the large amount of work per MPI rank starting at the single-thread configuration. Decreasing MPI ranks, while increasing threads, may be increasing simulation time due to thread synchronization and start-up overhead contributing to the latency and serial time in the model. These results showed that an MPI+OpenMP parallel decomposition will be more effective as the amount of computation per MPI rank decreases and the communication latency increases. This idea was demonstrated in a strong scaling study of our low-resolution baseline model (29M DOF) with the Trilinos-HYPRE configuration. While MPI-only results showed scaling improvement out to about 1536 cores, engaging threading carried scaling improvements out to 4128 cores (roughly 7000 DOF per core). This is an important result as improved strong scaling is needed for simulations to be executed over sufficiently long simulated durations (i.e., for many timesteps).
In addition to the threading work described above, the team examined solver-performance improvements by exploring communication overhead in the HYPRE-GMRES implementation through a communication-optimal GMRES algorithm (CO-GMRES), and by offloading compute-intensive solver actions to GPUs. To those ends, a HYPRE mini-app was developed to allow us to easily test different solver approaches and HYPRE parameter settings without running the entire Nalu code. With GPU acceleration on the Summitdev supercomputer, a 20x speedup was achieved for the overall preconditioner and solver execution time for the mini-app. A study on Haswell processors showed that CO-GMRES provides benefits as one increases MPI ranks.
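
For orientation, the two threading models contrasted above, straight OpenMP (as in HYPRE/BoomerAMG) versus the Kokkos abstraction (as in the Trilinos stack), can be illustrated with a toy vector update; the sketch below is illustrative only and is not code from either solver library.

```cpp
// Toy illustration of the two threading models mentioned above: straight OpenMP
// (as used by HYPRE/BoomerAMG) vs. the Kokkos abstraction (as used by the
// Trilinos solver stack). Not code from either library.
#include <Kokkos_Core.hpp>
#include <vector>

int main(int argc, char** argv) {
  const int n = 1 << 20;

  // Straight OpenMP: loop-level threading over a plain array.
  std::vector<double> x(n, 1.0), y(n, 2.0);
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    y[i] += 0.5 * x[i];
  }

  // Kokkos: the same update expressed against the Kokkos abstraction, which can
  // target OpenMP threads on CPUs or GPU backends without source changes.
  Kokkos::initialize(argc, argv);
  {
    Kokkos::View<double*> xk("x", n), yk("y", n);
    Kokkos::deep_copy(xk, 1.0);
    Kokkos::deep_copy(yk, 2.0);
    Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
      yk(i) += 0.5 * xk(i);
    });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}
```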

Deploy threading in Nalu solver stack

Prokopenko, Andrey; Thomas, Stephen; Swirydowicz, Kasia; Ananthan, Shreyas; Hu, Jonathan J.; Williams, Alan B.; Sprague, Michael

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary code in the ExaWind project is Nalu, which is an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations, and mass continuity is maintained through pressure projection. The model consists of the mass-continuity Poisson-type equation for pressure and a momentum equation for the velocity. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs as re-initialization of matrices and re-computation of preconditioners are required at every time step.

Decrease time-to-solution through improved linear-system setup and solve (Milestone Report)

Hu, Jonathan J.; Thomas, Stephen; Dohrmann, Clark R.; Ananthan, Shreyas; Domino, Stefan P.; Williams, Alan B.; Sprague, Michael

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. We describe in this report our efforts to decrease the setup and solution time for the mass-continuity Poisson system with respect to the benchmark timing results reported in FY18 Q1. In particular, we investigate improving and evaluating two types of algebraic multigrid (AMG) preconditioners: Classical Ruge-Stüben AMG (C-AMG) and smoothed-aggregation AMG (SA-AMG), which are implemented in the Hypre and Trilinos/MueLu software stacks, respectively.

Decrease time-to-solution through improved linear-system setup and solve

Hu, Jonathan J.; Thomas, Stephen; Dohrmann, Clark R.; Ananthan, Shreyas; Domino, Stefan P.; Williams, Alan B.; Sprague, Michael

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.

Deploy production sliding mesh capability with linear solver benchmarking

Domino, Stefan P.; Barone, Matthew F.; Williams, Alan B.; Knaus, Robert C.

Wind applications require the ability to simulate rotating blades. To support this use-case, a novel design-order sliding mesh algorithm has been developed and deployed. The hybrid method combines the control volume finite element methodology (CVFEM) with concepts found within a discontinuous Galerkin (DG) finite element method (FEM) to manage a sliding mesh. The method has been demonstrated to be design-order for the tested polynomial basis (P=1 and P=2) and has been deployed to provide production simulation capability for a Vestas V27 (225 kW) wind turbine. Other stationary and canonical rotating flow simulations are also presented. As the majority of wind-energy applications are driving extensive usage of hybrid meshes, a foundational study that outlines near-wall numerical behavior for a variety of element topologies is presented. Results indicate that the proposed nonlinear stabilization operator (NSO) is an effective stabilization methodology to control Gibbs phenomena at large cell Peclet numbers. The study also provides practical mesh resolution guidelines for future analysis efforts. Application-driven performance and algorithmic improvements have been carried out to increase robustness of the scheme on hybrid production wind energy meshes. Specifically, the Kokkos-based Nalu Kernel construct outlined in the FY17/Q4 ExaWind milestone has been transitioned to the hybrid mesh regime. This code base is exercised within a full V27 production run. Simulation timings for parallel search and custom ghosting are presented. As the low-Mach application space requires implicit matrix solves, the cost of matrix reinitialization has been evaluated on a variety of production meshes. Results indicate that at low element counts, i.e., fewer than 100 million elements, matrix graph initialization and preconditioner setup times are small. However, as mesh sizes increase, e.g., 500 million elements, simulation time associated with "setup" costs can increase to nearly 50% of overall simulation time when using the full Tpetra solver stack and nearly 35% when using a mixed Tpetra-Hypre-based solver stack. The report also highlights the project achievement of surpassing the 1 billion element mesh scale for a production V27 hybrid mesh. A detailed timing breakdown is presented that again suggests work to be done in the setup events associated with the linear system. In order to mitigate these initialization costs, several application paths have been explored, all of which are designed to reduce the frequency of matrix reinitialization. Methods such as removing Jacobian entries on the dynamic matrix columns (in concert with increased inner equation iterations), and lagging of Jacobian entries have reduced setup times at the cost of numerical stability. Artificially increasing, or bloating, the matrix stencil to ensure that full Jacobians are included is developed with results suggesting that this methodology is useful in decreasing reinitialization events without loss of matrix contributions. With the above foundational advances in computational capability, the project is well positioned to begin scientific inquiry on a variety of wind-farm physics such as turbine/turbine wake interactions.

Deploy production sliding mesh capability with linear solver benchmarking (ECP Milestone Report, Ver. 1.0)

Domino, Stefan P.; Barone, Matthew F.; Williams, Alan B.; Knaus, Robert C.; Overfelt, James R.

Wind applications require the ability to simulate rotating blades. To support this use-case, a novel design-order sliding mesh algorithm has been developed and deployed. The hybrid method combines the control volume finite element methodology (CVFEM) with concepts found within a discontinuous Galerkin (DG) finite element method (FEM) to manage a sliding mesh. The method has been demonstrated to be design-order for the tested polynomial basis (P=1 and P=2) and has been deployed to provide production simulation capability for a Vestas V27 (225 kW) wind turbine. Other stationary and canonical rotating flow simulations are also presented. As the majority of wind-energy applications are driving extensive usage of hybrid meshes, a foundational study that outlines near-wall numerical behavior for a variety of element topologies is presented. Results indicate that the proposed nonlinear stabilization operator (NSO) is an effective stabilization methodology to control Gibbs phenomena at large cell Peclet numbers. The study also provides practical mesh resolution guidelines for future analysis efforts. Application-driven performance and algorithmic improvements have been carried out to increase robustness of the scheme on hybrid production wind energy meshes. Specifically, the Kokkos-based Nalu Kernel construct outlined in the FY17/Q4 ExaWind milestone has been transitioned to the hybrid mesh regime. This code base is exercised within a full V27 production run. Simulation timings for parallel search and custom ghosting are presented. As the low-Mach application space requires implicit matrix solves, the cost of matrix reinitialization has been evaluated on a variety of production meshes. Results indicate that at low element counts, i.e., fewer than 100 million elements, matrix graph initialization and preconditioner setup times are small. However, as mesh sizes increase, e.g., 500 million elements, simulation time associated with "setup" costs can increase to nearly 50% of overall simulation time when using the full Tpetra solver stack and nearly 35% when using a mixed Tpetra-Hypre-based solver stack. The report also highlights the project achievement of surpassing the 1 billion element mesh scale for a production V27 hybrid mesh. A detailed timing breakdown is presented that again suggests work to be done in the setup events associated with the linear system. In order to mitigate these initialization costs, several application paths have been explored, all of which are designed to reduce the frequency of matrix reinitialization. Methods such as removing Jacobian entries on the dynamic matrix columns (in concert with increased inner equation iterations), and lagging of Jacobian entries have reduced setup times at the cost of numerical stability. Artificially increasing, or bloating, the matrix stencil to ensure that full Jacobians are included is developed with results suggesting that this methodology is useful in decreasing reinitialization events without loss of matrix contributions. With the above foundational advances in computational capability, the project is well positioned to begin scientific inquiry on a variety of wind-farm physics such as turbine/turbine wake interactions.

Deploy Nalu/Kokkos algorithmic infrastructure with performance benchmarking

Domino, Stefan P.; Williams, Alan B.; Knaus, Robert C.

The former Nalu interior heterogeneous algorithm design, which was originally designed to manage matrix assembly operations over all elemental topology types, has been modified to operate over homogeneous collections of mesh entities. This newly templated kernel design allows for removal of workset variable resize operations that were formerly required at each loop over a Sierra ToolKit (STK) bucket (nominally, 512 entities in size). Extensive usage of the Standard Template Library (STL) std::vector has been removed in favor of intrinsic Kokkos memory views. In this milestone effort, the transition to Kokkos as the underlying infrastructure to support performance and portability on many-core architectures has been deployed for key matrix algorithmic kernels. A unit-test driven design effort has developed a homogeneous entity algorithm that employs a team-based thread parallelism construct. The STK Single Instruction Multiple Data (SIMD) infrastructure is used to interleave data for improved vectorization. The collective algorithm design, which allows for concurrent threading and SIMD management, has been deployed for the core low-Mach element-based algorithm. Several tests to ascertain SIMD performance on Intel KNL and Haswell architectures have been carried out. The performance test matrix includes evaluation of both low- and higher-order methods. The higher-order low-Mach methodology builds on polynomial promotion of the core low-order control volume finite element method (CVFEM). Performance testing of the Kokkos-view/SIMD design indicates low-order matrix assembly kernel speed-up ranging between two and four times depending on mesh loading and node count. Better speedups are observed for higher-order meshes (currently only P=2 has been tested) especially on KNL. The increased workload per element on higher-order meshes benefits from the wide SIMD width on KNL machines. Combining multiple threads with SIMD on KNL achieves a 4.6x speedup over the baseline, with assembly timings faster than that observed on Haswell architecture. The computational workload of higher-order meshes, therefore, seems ideally suited for the many-core architecture and justifies further exploration of higher-order on NGP platforms. A Trilinos/Tpetra-based multi-threaded GMRES preconditioned by symmetric Gauss Seidel (SGS) represents the core solver infrastructure for the low-Mach advection/diffusion implicit solves. The threaded solver stack has been tested on small problems on NREL's Peregrine system using the newly developed and deployed Kokkos-view/SIMD kernels. Efforts are underway to deploy the Tpetra-based solver stack on NERSC Cori system to benchmark its performance at scale on KNL machines.
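
The team-based threading pattern described above can be sketched generically with Kokkos hierarchical parallelism. The example below uses a Kokkos team policy with a nested entity loop as a stand-in for the actual Nalu kernel design and STK SIMD interleaving; the per-entity work is a placeholder.

```cpp
// Generic sketch of team-based threading over homogeneous buckets of mesh
// entities with an inner per-entity loop. Uses Kokkos hierarchical parallelism
// as a stand-in for the STK SIMD interleaving described above.
#include <Kokkos_Core.hpp>

int main(int argc, char** argv) {
  Kokkos::initialize(argc, argv);
  {
    const int numBuckets        = 128;  // homogeneous collections of mesh entities
    const int entitiesPerBucket = 512;  // nominal STK bucket size noted above

    // Per-entity result, e.g. an assembled elemental quantity.
    Kokkos::View<double**> result("result", numBuckets, entitiesPerBucket);

    using team_policy = Kokkos::TeamPolicy<>;
    using team_member = team_policy::member_type;

    // One team per bucket; threads within the team share the bucket's entities.
    Kokkos::parallel_for("bucket_loop",
        team_policy(numBuckets, Kokkos::AUTO),
        KOKKOS_LAMBDA(const team_member& team) {
          const int b = team.league_rank();
          Kokkos::parallel_for(Kokkos::TeamThreadRange(team, entitiesPerBucket),
              [&](const int e) {
                // Placeholder per-entity kernel (e.g. elemental matrix assembly).
                result(b, e) = static_cast<double>(b) + 0.001 * e;
              });
        });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}
```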

Assessing a mini-application as a performance proxy for a finite element method engineering application

Concurrency and Computation: Practice and Experience

Lin, Paul L.; Heroux, Michael A.; Williams, Alan B.; Barrett, Richard F.

The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used to both improve the performance of the app as well as provide a tool for co-design for the high-performance computing community. However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Finally, single compute node performance will be compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.
