The extent to which a realistic inflow turbulent boundary layer (TBL) influences transient and mean large-scale pool fire quantities of interest (QoIs) is numerically investigated. High-fidelity, low-Mach large-eddy simulations employing low-dissipation, unstructured numerics are conducted using an unsteady flamelet combustion modeling approach with multiphysics coupling to soot and participating media radiation transport. Three inlet profile configurations are exercised for a large-scale, high-aspect-ratio rectangular pool oriented perpendicular to the flow direction: a time-varying TBL inflow profile obtained from a periodic precursor simulation, the time-mean of the transient TBL, and a steady power-law inflow profile that replicates the mean TBL crosswind velocity of 10.0 m/s at a vertical height of 10 m. Results include both qualitative transient flame evolution and quantitative flame shape with ground-level temperature and convective/radiative heat flux profiles. While transient fire events driven by burst-sweep TBL coupling, such as blow-off and reattachment, differ markedly in the TBL case (contributing to increased root-mean-square QoI fluctuation predictions and disparate flame lengths), mean surface QoI magnitudes are similar. Quadrant analysis demonstrates that the TBL configuration modifies burst-sweep phenomena at windward pool locations, while recovery is observed at leeward locations. Positive fluctuations of convective heat flux correlate with fast-moving fluid directed away from the pool surface during intermittent combustion events.
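A minimal sketch of the quadrant classification underlying such a burst-sweep analysis, assuming synthetic streamwise and wall-normal fluctuation time series (all names and values here are illustrative, not data from the study):

```python
import numpy as np

# Quadrant analysis of streamwise (u') and wall-normal (w') velocity
# fluctuations: Q2 "ejections" (u'<0, w'>0) and Q4 "sweeps" (u'>0, w'<0)
# are the burst-sweep events referenced above.
rng = np.random.default_rng(0)
u = rng.normal(size=100_000)             # stand-in for a probe time series
w = -0.4 * u + rng.normal(size=u.size)   # impose a negative u'w' correlation

quadrants = {
    "Q1 (outward interaction)": (u > 0) & (w > 0),
    "Q2 (ejection)":            (u < 0) & (w > 0),
    "Q3 (inward interaction)":  (u < 0) & (w < 0),
    "Q4 (sweep)":               (u > 0) & (w < 0),
}

uw_total = np.mean(u * w)
for name, mask in quadrants.items():
    # fractional contribution of each quadrant to the Reynolds shear stress
    contribution = np.sum(u[mask] * w[mask]) / (u.size * uw_total)
    print(f"{name}: occupancy {mask.mean():.2f}, u'w' fraction {contribution:.2f}")
```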
Carbon capture is essential to meeting climate change mitigation goals. One approach currently being commercialized utilizes liquid-based solvents to capture CO2 directly from the atmosphere but is limited by slow absorption of CO2 into the liquid. Improved air/solvent mixing increases the CO2 absorption rate, and this increased absorption efficiency allows for smaller carbon capture systems with lower capital costs and better economic viability. In this project, we studied the use of passive micromixers fabricated by metal additive manufacturing, hypothesizing that the micromixer's small-scale surface geometric features perturb and mix the liquid film to enhance mass transfer and CO2 absorption. We evaluated this hypothesis through computational and experimental studies. Computational investigations focused on developing capabilities to simulate thin-film (~100 μm) fluid flow on rough surfaces. Such thin films are in a surface-tension-dominated regime, and simulations in this regime are prone to instabilities. Improvements to the Nalu code completed in this project resulted in a 10x timestep stability improvement for these problems.
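To illustrate why surface-tension-dominated thin films force very small time steps, a sketch of the classical capillary time-step restriction (Brackbill-type estimate); the property values and mesh spacing below are illustrative, not those used in the project:

```python
import math

# Classical capillary time-step restriction:
#   dt_sigma <= sqrt((rho_l + rho_g) * dx**3 / (4 * pi * sigma))
# Illustrative values for a ~100 um water film resolved with a 10 um mesh.
rho_l, rho_g = 1000.0, 1.2      # liquid / gas density [kg/m^3]
sigma = 0.072                   # surface tension of water [N/m]
dx = 10e-6                      # mesh spacing resolving the thin film [m]

dt_sigma = math.sqrt((rho_l + rho_g) * dx**3 / (4.0 * math.pi * sigma))
print(f"capillary time-step limit ~ {dt_sigma:.2e} s")   # on the order of 1e-6 s
# A 10x improvement in the stable time step directly reduces wall-clock time
# for simulations bound by this surface-tension constraint.
```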
A direct numerical simulation (DNS) campaign is deployed for a series of confined, downward-oriented, non-isothermal turbulent impinging jet configurations. A baseline Reynolds number of 9960 is obtained through a precursor DNS pipe flow simulation (Reτ = 505). Three jet temperature configurations (confinement-height-to-nozzle-diameter ratio of three) enter a cylindrical domain and share ambient and impingement-plate temperatures (298.15 K). The range of jet temperatures is crafted such that the ratio of inlet to ambient density varies from unity to 0.52, showcasing the effect of density disparity on flow characteristics such as core collapse, radial mixing of momentum and energy, near-wall stagnation behavior, wall-jet profiles, and large-scale vortical structures. Surface quantities provided include mean radial heat flux and wall-shear stress profiles, and heat flux histograms at select radial stations. Results show increased radial normal stresses for higher-temperature jets that support increased mixing, resulting in smaller large-scale recirculation structures while retaining similar normalized radial wall profiles for shear stress, heat flux, and pressure. Radial plots of wall shear stress and Nusselt number show strong radial decay compared to previous configurations with similar jet and ambient temperatures. For the 373.15 K case, a Gaussian-like histogram of heat fluxes at the impingement plate transitions to a log-normal profile as radial distance increases. In contrast, the 573.15 K configuration displays a bi-modal heat flux characteristic at the impingement plate and, in a similar manner to its moderate-temperature counterpart, transitions to a log-normal profile at larger radial distances.
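A minimal sketch of converting a mean impingement-plate heat-flux profile into the Nusselt number profile referenced above; the diameter, conductivity, and heat-flux curve are illustrative placeholders, not DNS data:

```python
import numpy as np

# Nusselt number from a wall heat-flux profile:
#   Nu(r) = q_w(r) * D / (k_f * (T_jet - T_wall))
D = 0.05                        # nozzle diameter [m], illustrative
k_f = 0.030                     # fluid thermal conductivity [W/(m K)], illustrative
T_jet, T_wall = 373.15, 298.15  # jet and plate temperatures [K]

r_over_D = np.linspace(0.0, 5.0, 50)
q_wall = 8.0e3 * np.exp(-0.5 * r_over_D)   # synthetic decaying heat flux [W/m^2]

Nu = q_wall * D / (k_f * (T_jet - T_wall))
for r, nu in zip(r_over_D[::10], Nu[::10]):
    print(f"r/D = {r:3.1f}  Nu = {nu:6.1f}")
```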
We examine the application of neural network-based methods to improve the accuracy of large eddy simulations of incompressible turbulent flows. The networks are trained to learn a mapping between flow features and the subgrid scales, and are applied locally and instantaneously, in the same way as traditional physics-based subgrid closures. Models that use only the local resolved strain rate are poorly correlated with the actual subgrid forces obtained by filtering direct numerical simulation data. We find that models that are highly accurate in a priori testing are inaccurate in forward calculations, owing to the preponderance of numerical errors in implicitly filtered large eddy simulations. A network that accounts for the discretization errors is trained and found to be unstable in a posteriori testing. We identify a number of challenges that the approach faces, including a distribution shift that affects networks that fail to account for numerical errors.
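A minimal sketch of the a priori correlation metric implied above: the Pearson correlation between a closure's predicted subgrid force and the "true" force from filtered DNS. The arrays are synthetic stand-ins, not actual DNS or network output:

```python
import numpy as np

# A priori test metric: correlation between the subgrid force predicted by a
# closure and the force obtained by filtering DNS, computed point by point.
rng = np.random.default_rng(1)
tau_dns = rng.normal(size=50_000)                    # filtered-DNS subgrid force (stand-in)
tau_model = 0.3 * tau_dns + rng.normal(size=50_000)  # imperfect model prediction (stand-in)

corr = np.corrcoef(tau_dns, tau_model)[0, 1]
print(f"a priori correlation coefficient: {corr:.2f}")
# A high a priori correlation does not guarantee a stable or accurate
# a posteriori (forward LES) result, which is the central point of the study.
```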
Medium-scale (30 cm diameter) methanol pool fires were simulated using the latest fire modeling suite implemented in Sierra/Fuego, a low-Mach-number multiphysics reacting flow code. The sensitivity of model outputs to various model parameters was studied with the objective of providing model validation. This work also assesses model performance relative to other recently published large eddy simulations (LES) of the same validation case. Two pool surface boundary conditions were simulated: the first prescribed the fuel mass flux, and the second used an algorithm to predict the mass flux based on a mass and energy balance at the fuel surface. Gray gas radiation model parameters (absorption coefficients and gas radiation sources) were varied to assess radiant heat losses to the surroundings and pool surface. The radiation model was calibrated by comparing the simulated radiant fraction of the plume to experimental data. The effects of mesh resolution were also quantified, starting with a grid resolution representative of engineering-type fire calculations and then uniformly refining that mesh in the plume region. Simulation data were compared to experimental data collected at the University of Waterloo and the National Institute of Standards and Technology (NIST). Validation data included plume temperature, radial and axial velocities, velocity-temperature and velocity-velocity turbulent correlations, radiant and convective heat fluxes to the pool surface, and plume radiant fraction. Additional analyses were performed in the pool boundary layer to assess simulated flame anchoring and its effect on convective heat fluxes. This work assesses the capability of the latest Fuego physics and chemistry model suite and provides additional insight into pool fire modeling for nonluminous, nonsooting flames.
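A minimal sketch of the radiant-fraction quantity used for the radiation-model calibration; the mass flux, heating value, and radiative loss below are illustrative values for a 30 cm methanol pool, not the experimental or simulated data:

```python
import math

# Radiant fraction: chi_R = Q_rad / (mdot_fuel * dH_c)
d_pool = 0.30        # pool diameter [m]
m_flux = 0.015       # fuel mass flux [kg/(m^2 s)], illustrative
dH_c = 19.9e6        # methanol lower heating value [J/kg]
Q_rad = 4.0e3        # radiative loss integrated over a far-field sphere [W], illustrative

area = math.pi * (d_pool / 2.0) ** 2
Q_total = m_flux * area * dH_c                     # total heat release rate [W]
print(f"total HRR ~ {Q_total/1e3:.1f} kW, radiant fraction ~ {Q_rad/Q_total:.2f}")
```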
We develop methods that could be used to qualify a training dataset and a data-driven turbulence closure trained on it. By qualify, we mean identify the kind of turbulent physics that could be simulated by the data-driven closure. We limit ourselves to closures for the Reynolds-Averaged Navier-Stokes (RANS) equations. We build on our previous work on assembling feature-spaces, clustering, and characterizing Direct Numerical Simulation datasets that are typically pooled to constitute training datasets. In this paper, we develop an alternative way to assemble feature-spaces and thus check the correctness and completeness of our previous method. We then use the characterization of our training dataset to identify whether a data-driven turbulence closure learned on it would generalize to an unseen flow configuration – an impinging jet in our case. Finally, we train a RANS closure architected as a neural network, and develop an explanation, i.e., an interpretable approximation, using generalized linear mixed-effects models and check whether the explanation resembles a contemporary closure from turbulence modeling.
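A minimal sketch of building an interpretable local approximation of a trained closure around a prototype feature vector: perturb the features, query the closure, and fit a linear model. The "closure" below is a stand-in function, and ordinary least squares stands in for the generalized linear mixed-effects models used in the paper:

```python
import numpy as np

def closure(features):                       # placeholder for the trained NN closure
    s, w = features[..., 0], features[..., 1]
    return 0.09 * s**2 / (1.0 + 0.5 * w)

rng = np.random.default_rng(2)
prototype = np.array([1.0, 0.3])             # e.g., normalized strain/rotation invariants
perturbed = prototype + 0.05 * rng.normal(size=(500, 2))
targets = closure(perturbed)

# Fit a local linear explanation: intercept plus sensitivity to each feature.
X = np.column_stack([np.ones(len(perturbed)), perturbed - prototype])
coeffs, *_ = np.linalg.lstsq(X, targets, rcond=None)
print("local explanation (intercept, feature sensitivities):", coeffs)
```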
Machine-learned models, specifically neural networks, are increasingly used as “closures” or “constitutive models” in engineering simulators to represent fine-scale physical phenomena that are too computationally expensive to resolve explicitly. However, these neural net models of unresolved physical phenomena tend to fail unpredictably and are therefore not used in mission-critical simulations. In this report, we describe new methods to authenticate them, i.e., to determine the (physical) information content of their training datasets, qualify the scenarios where they may be used, and verify that the neural net, as trained, adheres to physics theory. We demonstrate these methods with a neural net closure of turbulent phenomena used in the Reynolds-Averaged Navier-Stokes equations. We show the types of turbulent physics extant in our training datasets and, using a test flow of an impinging jet, identify the exact locations where the neural network would be extrapolating, i.e., where it would be used outside the feature-space where it was trained. Using Generalized Linear Mixed Models, we also generate explanations of the neural net (à la Local Interpretable Model-agnostic Explanations) at prototypes placed in the training data and compare them with approximate analytical models from turbulence theory. Finally, we verify our findings by reproducing them using two different methods.
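A minimal sketch of flagging where a closure would extrapolate, by measuring the nearest-neighbor distance from test-flow feature vectors to the training feature-space; the data, feature dimension, and threshold are illustrative stand-ins:

```python
import numpy as np

# Flag feature-space extrapolation: test points whose nearest training
# neighbor is farther than a chosen cutoff would require the closure to
# operate outside the region where it was trained.
rng = np.random.default_rng(3)
train_features = rng.normal(size=(2000, 3))           # pooled training feature vectors
test_features = rng.normal(loc=0.5, size=(300, 3))    # e.g., impinging-jet features

# pairwise distances (test x train), then nearest-neighbor distance per test point
d = np.linalg.norm(test_features[:, None, :] - train_features[None, :, :], axis=-1)
nearest = d.min(axis=1)

threshold = 0.5   # illustrative cutoff in normalized feature units
flagged = nearest > threshold
print(f"fraction of test points flagged as extrapolation: {flagged.mean():.2f}")
```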
The NVBL Viral Fate and Transport Team includes researchers from eleven DOE national laboratories and utilizes unique experimental facilities combined with physics-based and data-driven modeling and simulation to study the transmission, transport, and fate of SARS-CoV-2. The team is focused on understanding and ultimately predicting SARS-CoV-2 viability in varied environments, with the goal of rapidly informing strategies that guide the nation’s resumption of normal activities. The primary goals of this project include prioritizing administrative and engineering controls that reduce the risk of SARS-CoV-2 transmission within an enclosed environment; identifying the chemical and physical properties that influence binding of SARS-CoV-2 to common surfaces; and understanding the contribution of environmental reservoirs and conditions to transmission and resurgence of SARS-CoV-2.
A low-Mach, unstructured, large-eddy-simulation-based, unsteady flamelet approach with a generalized heat loss combustion methodology (including soot generation and consumption mechanisms) is deployed to support a large-scale, quiescent, 5-m JP-8 pool fire validation study. The quiescent pool fire validation study deploys solution sensitivity procedures, i.e., the effect of mesh and time step refinement on capturing key fire dynamics such as fingering and puffing, as mesh resolutions approach O(1) cm. A novel design-order, discrete-ordinate-method discretization methodology is established by use of an analytical thermal/participating media radiation solution on both low-order hexahedral and tetrahedral mesh topologies in addition to quadratic hexahedral elements. The coupling between heat losses and the flamelet thermochemical state is achieved by augmenting the unsteady flamelet equation set with a heat loss source term. Soot and radiation source terms are determined using flamelet approaches for the full range of heat losses experienced in fire applications, including radiative extinction. The proposed modeling and simulation paradigm is validated using pool surface radiative heat flux, maximum centerline temperature location, and puffing frequency data, all of which are predicted within 10% accuracy. Simulations demonstrate that under-resolved meshes predict an overly conservative radiative heat flux magnitude, while still providing improved agreement relative to a previously deployed hybrid Reynolds-averaged Navier-Stokes/eddy dissipation concept-based methodology.
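A minimal sketch of the design-order check implied by the analytical radiation solution: compute the observed convergence order from error norms on successively refined meshes. The mesh spacings and error values below are illustrative, not results from the discrete-ordinate-method verification:

```python
import math

# Observed order of accuracy from L2 error norms on uniformly refined meshes:
#   p = log(e_coarse / e_fine) / log(h_coarse / h_fine)
h = [0.04, 0.02, 0.01]            # characteristic mesh spacings (illustrative)
e = [3.2e-3, 8.1e-4, 2.0e-4]      # L2 errors vs. the analytical solution (illustrative)

for (h0, e0), (h1, e1) in zip(zip(h, e), zip(h[1:], e[1:])):
    p = math.log(e0 / e1) / math.log(h0 / h1)
    print(f"h: {h0} -> {h1}, observed order ~ {p:.2f}")
```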
This paper explores unsupervised learning approaches for the analysis and categorization of turbulent flow data. Single-point statistics from several high-fidelity turbulent flow simulation data sets are classified using a Gaussian mixture model clustering algorithm. Candidate features are proposed, which include barycentric coordinates of the Reynolds stress anisotropy tensor, as well as scalar and angular invariants of the Reynolds stress and mean strain rate tensors. A feature selection algorithm is applied to the data in a sequential fashion, flow by flow, to identify a good feature set and an optimal number of clusters for each data set. The algorithm is first applied to Direct Numerical Simulation data for plane channel flow, and produces clusters that are consistent with turbulent flow theory and empirical results that divide the channel flow into a number of regions (viscous sub-layer, log layer, etc.). Clusters are then identified for flow over a wavy-walled channel, flow over a bump in a channel, and flow past a square cylinder. Some clusters are closely identified with the anisotropy state of the turbulence, as indicated by their location within the barycentric map of the Reynolds stress tensor. Other clusters can be connected to physical phenomena, such as boundary layer separation and free shear layers. Exemplar points from the clusters, or prototypes, are then identified using a prototype selection method. These exemplars summarize the dataset, reducing it by a factor of 10 to 1000. The clustering and prototype selection algorithms provide a foundation for physics-based, semi-automated classification of turbulent flow states and the extraction of a subset of data points that can serve as the basis for the development of explainable machine-learned turbulence models.
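A minimal sketch of the clustering step, using scikit-learn's Gaussian mixture model on synthetic two-dimensional features (stand-ins for, e.g., barycentric coordinates of the anisotropy tensor); the BIC-based cluster selection shown here is a simple stand-in for the paper's sequential feature/cluster selection procedure:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Cluster single-point turbulence statistics with a Gaussian mixture model.
rng = np.random.default_rng(4)
features = np.vstack([
    rng.normal(loc=[0.1, 0.1], scale=0.05, size=(500, 2)),   # near-isotropic points
    rng.normal(loc=[0.8, 0.1], scale=0.05, size=(500, 2)),   # one-component-like points
])

# Pick the number of clusters by minimizing the Bayesian information criterion.
best = min(
    (GaussianMixture(n_components=k, random_state=0).fit(features) for k in range(1, 6)),
    key=lambda gm: gm.bic(features),
)
labels = best.predict(features)
print("selected clusters:", best.n_components, "cluster sizes:", np.bincount(labels))
```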
A high-fidelity, low-Mach computational fluid dynamics simulation tool that includes evaporating droplets and variable-density turbulent flow coupling is well-suited to ascertain transmission probability and supports risk mitigation methods development for airborne infectious diseases such as COVID-19. A multi-physics large-eddy simulation-based paradigm is used to explore droplet and aerosol pathogen transport from a synthetic cough emanating from a kneeling humanoid. For an outdoor configuration that mimics the recent open-space social distancing strategy of San Francisco, maximum primary droplet deposition distances are shown to approach 8.1 m in a moderate wind configuration, with the aerosol plume transported in excess of 15 m. In quiescent conditions, the aerosol plume extends to approximately 4 m before the emanating pulsed jet becomes neutrally buoyant. A dose–response model, which is based on previous SARS coronavirus (SARS-CoV) data, is exercised on the high-fidelity aerosol transport database to establish relative risk at eighteen virtual receptor probe locations.
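A minimal sketch of an exponential dose–response evaluation of the kind referenced above; the parameter k is a value often quoted for SARS-CoV, and the inhaled-dose values are placeholders rather than quantities extracted from the simulation database:

```python
import math

# Exponential dose-response model: P_infection = 1 - exp(-dose / k).
k = 410.0                                  # often-quoted SARS-CoV parameter (illustrative)
doses = [1.0, 10.0, 50.0, 200.0, 1000.0]   # illustrative inhaled doses at receptor probes

for d in doses:
    p = 1.0 - math.exp(-d / k)
    print(f"dose {d:7.1f} -> infection probability {p:.3f}")
```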
In response to the global SARS-CoV-2 transmission pandemic, Sandia National Laboratories' Rapid Lab-Directed Research and Development (LDRD) COVID-19 initiative has deployed a multi-physics, droplet-laden, turbulent low-Mach simulation tool to model pathogen-containing water droplets that emanate from synthetic human coughing and breathing. The low-Mach turbulent Eulerian/point-particle Lagrangian methodology directly couples mass, momentum, energy, and species to capture droplet evaporation physics, supporting the ability to distinguish between droplets that deposit and those that persist in the environment. Additionally, the cough mechanism is modeled as a pulsed spray with a prescribed log-normal droplet size distribution. Simulations demonstrate direct droplet deposition lengths in excess of three meters, while the persistence of droplet nuclei entrained within a buoyant plume is noted. Including the effect of protective barriers demonstrates effective mitigation of large-droplet transport. For coughs into a protective barrier, jet impingement and large-scale recirculation can drive droplets vertically and back towards the subject while supporting persistence of droplet nuclei. Simulations in quiescent conditions demonstrate preferential droplet concentration due to the coupling between vortex ring shedding and the subsequent advection of a series of three-dimensional rings that tilt and rise vertically due to a misalignment between the initial principal vortex trajectory and gravity. These resolved coughing simulations note vortex ring formation, roll-up, and breakdown, while entraining droplet nuclei over large distances and time scales.
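A minimal sketch of sampling a log-normal droplet size distribution for a pulsed-spray injection; the median diameter and geometric standard deviation below are illustrative, not the values prescribed in the simulations:

```python
import numpy as np

# Sample droplet diameters for a pulsed-spray cough from a log-normal
# distribution parameterized by a median diameter and a geometric
# standard deviation (GSD).
median_d = 12e-6          # median diameter [m], illustrative
gsd = 2.5                 # geometric standard deviation, illustrative

rng = np.random.default_rng(5)
diameters = rng.lognormal(mean=np.log(median_d), sigma=np.log(gsd), size=10_000)

# Large droplets deposit ballistically; small ones evaporate to nuclei and persist.
print(f"median ~ {np.median(diameters)*1e6:.1f} um, "
      f"fraction > 100 um: {(diameters > 100e-6).mean():.3f}")
```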
Sandia National Laboratories currently has 27 COVID-related Laboratory Directed Research & Development (LDRD) projects focused on helping the nation during the pandemic. These LDRD projects cross many disciplines including bioscience, computing & information sciences, engineering science, materials science, nanodevices & microsystems, and radiation effects & high energy density science.
The Sandia National Laboratories (SNL) Large-Scale Computing Initiative (LSCI) milestone required running two parallel simulation codes at scale on the Trinity supercomputer at Los Alamos National Laboratory (LANL) to obtain presentation-quality visualization results via in-situ methods. The two simulation codes used were the Sandia Parallel Aerosciences Research Code (SPARC) and Nalu, both fluid dynamics codes developed at SNL. The codes were integrated with the ParaView Catalyst in-situ visualization library via the SNL-developed Input Output SubSystem (IOSS). The LSCI milestone had a relatively short completion timescale of two months. During setup and execution of in-situ visualization for the milestone, there were several challenging issues in the areas of software builds, parallel startup times, and the a priori specification of visualizations. This paper discusses the milestone activities and the technical challenges encountered in its completion.
The Nalu Exascale Wind application assembles linear systems using data structures provided by the Tpetra package in Trilinos. This note describes the initialization and assembly process. The purpose of this note is to help Nalu developers and maintainers understand the code surrounding linear system assembly, in order to facilitate debugging, optimization, and maintenance.
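A minimal sketch of the general two-phase pattern (define the sparsity structure once, then repeatedly sum element contributions into it) that graph-based assembly follows; scipy's sparse containers and the 1-D two-node elements below are illustrative stand-ins, not the Tpetra API or Nalu's actual connectivity handling:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two-phase assembly pattern: (1) enumerate the nonzero structure from mesh
# connectivity; (2) sum element contributions into that fixed structure each
# nonlinear/time-step iteration.
n_nodes = 6
elements = [(i, i + 1) for i in range(n_nodes - 1)]     # 1-D two-node elements

# Phase 1: structure/graph construction (rows/cols of all possible nonzeros).
rows, cols = [], []
for a, b in elements:
    for r in (a, b):
        for c in (a, b):
            rows.append(r)
            cols.append(c)

# Phase 2: numeric assembly; duplicate (row, col) entries are summed.
k_elem = np.array([[1.0, -1.0], [-1.0, 1.0]])            # element "stiffness"
vals = np.tile(k_elem.ravel(), len(elements))
A = csr_matrix((vals, (rows, cols)), shape=(n_nodes, n_nodes))
print(A.toarray())
```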
An implicit, low-dissipation, low-Mach, variable-density control volume finite element formulation is used to explore foundational understanding of numerical accuracy for large-eddy simulation applications on hybrid meshes. Detailed simulation comparisons are made between low-order hexahedral, tetrahedral, pyramid, and wedge/prism topologies against a third-order, unstructured hexahedral topology. Using smooth analytical and manufactured low-Mach solutions, design-order convergence is established for the hexahedral, tetrahedral, pyramid, and wedge element topologies using a new open boundary condition based on energy-stable methodologies previously deployed within a finite-difference context. A wide range of simulations demonstrate that low-order hexahedral- and wedge-based element topologies behave nearly identically in both computed numerical errors and overall simulation timings. Moreover, low-order tetrahedral and pyramid element topologies also display nearly the same numerical characteristics. Although the superiority of the hexahedral-based topology is clearly demonstrated for trivial laminar, principally-aligned flows, e.g., a 1x2x10 channel flow with a specified pressure drop, this advantage is reduced for non-aligned, turbulent flows including the Taylor–Green Vortex, turbulent plane channel flow (Reτ = 395), and buoyant flow past a heated cylinder. With the order of accuracy demonstrated for both homogeneous and hybrid meshes, it is shown that solution verification for the selected complex flows can be established for all topology types. Although the number of elements in a mesh of like spacing comprised of tetrahedral, wedge, or pyramid elements increases as compared to the hexahedral counterpart, for wall-resolved large-eddy simulation, the increased assembly and residual evaluation computational time for non-hexahedral topologies is offset by more efficient linear solver times. Lastly, most simulation results indicate that modest polynomial promotion provides a significant increase in solution accuracy.
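For reference, a minimal sketch of the standard Taylor–Green vortex initial condition, one of the non-aligned verification flows named above; the grid size and reference velocity are illustrative:

```python
import numpy as np

# Standard incompressible Taylor-Green vortex initialization on a periodic cube.
N, L, V0 = 32, 1.0, 1.0
x = np.linspace(0.0, 2.0 * np.pi * L, N, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")

u = V0 * np.sin(X / L) * np.cos(Y / L) * np.cos(Z / L)
v = -V0 * np.cos(X / L) * np.sin(Y / L) * np.cos(Z / L)
w = np.zeros_like(u)

# The analytical field is divergence-free; print simple diagnostics.
print("max |u|:", np.abs(u).max(),
      " mean kinetic energy:", 0.5 * np.mean(u**2 + v**2 + w**2))
```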
The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. We describe in this report our efforts to decrease the setup and solution time for the mass-continuity Poisson system with respect to the benchmark timing results reported in FY18 Q1. In particular, we investigate improving and evaluating two types of algebraic multigrid (AMG) preconditioners: Classical Ruge-Stüben AMG (C-AMG) and smoothed-aggregation AMG (SA-AMG), which are implemented in the Hypre and Trilinos/MueLu software stacks, respectively.
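A minimal sketch contrasting the two AMG flavors on a model Poisson system, using PyAMG's classical Ruge-Stüben and smoothed-aggregation solvers as small stand-ins for the Hypre (C-AMG) and MueLu (SA-AMG) preconditioners; the problem size is illustrative:

```python
import numpy as np
import pyamg
from pyamg.gallery import poisson

# Compare C-AMG and SA-AMG on a 2-D Poisson system, a toy stand-in for the
# mass-continuity pressure solve.
A = poisson((64, 64), format="csr")
b = np.random.default_rng(6).normal(size=A.shape[0])

for name, builder in (("C-AMG", pyamg.ruge_stuben_solver),
                      ("SA-AMG", pyamg.smoothed_aggregation_solver)):
    ml = builder(A)                              # setup phase (hierarchy construction)
    residuals = []
    ml.solve(b, tol=1e-8, residuals=residuals)   # solve phase
    print(f"{name}: {len(residuals) - 1} cycles, "
          f"final relative residual {residuals[-1] / residuals[0]:.1e}")
```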
A hybrid, design-order sliding mesh algorithm, which uses a control volume finite element method (CVFEM) in conjunction with a discontinuous Galerkin (DG) approach at non-conformal interfaces, is outlined in the context of a low-Mach fluid dynamics equation set. This novel hybrid DG approach is also demonstrated to be compatible with a classic edge-based vertex-centered (EBVC) scheme. For the CVFEM, element polynomial (P) promotion is used to extend the low-order P=1 CVFEM method to higher order, i.e., P=2. An equal-order low-Mach pressure-stabilized methodology, with emphasis on the non-conformal interface boundary condition, is presented. A fully implicit matrix solver approach that accounts for the full stencil connectivity across the non-conformal interface is employed. A complete suite of formal verification studies using the method of manufactured solutions (MMS) is performed to verify the order of accuracy of the underlying methodology. The chosen suite of analytical verification cases ranges from a simple steady diffusion system to a traveling viscous vortex across mixed-order non-conformal interfaces. Results from all verification studies demonstrate either second- or third-order spatial accuracy and, for transient solutions, second-order temporal accuracy. Significant accuracy gains in manufactured solution error norms are noted even with modest promotion of the underlying polynomial order. The paper also demonstrates the CVFEM/DG methodology on two production-like simulation cases that include an inner block subjected to solid rotation, i.e., each simulation includes a sliding mesh, non-conformal interface. The first production case is a turbulent flow past a high-rate-of-rotation cube (Re = 4,000; 3,600 RPM) on like and mixed-order polynomial interfaces. The final simulation case is a full-scale Vestas V27 225 kW wind turbine (tower and nacelle omitted) in which a hybrid-topology, low-order mesh is used. Both production simulations provide confidence in the underlying capability and demonstrate the viability of this hybrid method for deployment towards high-fidelity wind energy validation and analysis.
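A minimal sketch of the MMS workflow for a steady diffusion operator, the simplest case in the verification suite mentioned above: choose a manufactured solution, derive the forcing symbolically, and use it as the source term so the chosen field is the exact solution. The manufactured solution below is illustrative, not the one used in the paper:

```python
import sympy as sp

# Method of manufactured solutions for -div(k grad(phi)) = S:
# pick phi_m, apply the operator symbolically, and use the result as S.
x, y = sp.symbols("x y")
k = sp.Symbol("k", positive=True)                        # constant diffusivity
phi_m = sp.cos(2 * sp.pi * x) * sp.sin(2 * sp.pi * y)    # illustrative manufactured solution

source = -sp.diff(k * sp.diff(phi_m, x), x) - sp.diff(k * sp.diff(phi_m, y), y)
print("MMS source term:", sp.simplify(source))
# Discrete solutions forced with this source are compared against phi_m on
# refined meshes to verify the spatial order of accuracy.
```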
The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.
Wind applications require the ability to simulate rotating blades. To support this use-case, a novel design-order sliding mesh algorithm has been developed and deployed. The hybrid method combines the control volume finite element methodology (CVFEM) with concepts found within a discontinuous Galerkin (DG) finite element method (FEM) to manage a sliding mesh. The method has been demonstrated to be design-order for the tested polynomial basis (P=1 and P=2) and has been deployed to provide production simulation capability for a Vestas V27 (225 kW) wind turbine. Other stationary and canonical rotating flow simulations are also presented. As the majority of wind-energy applications are driving extensive usage of hybrid meshes, a foundational study that outlines near-wall numerical behavior for a variety of element topologies is presented. Results indicate that the proposed nonlinear stabilization operator (NSO) is an effective stabilization methodology to control Gibbs phenomena at large cell Peclet numbers. The study also provides practical mesh resolution guidelines for future analysis efforts. Application-driven performance and algorithmic improvements have been carried out to increase robustness of the scheme on hybrid production wind energy meshes. Specifically, the Kokkos-based Nalu Kernel construct outlined in the FY17/Q4 ExaWind milestone has been transitioned to the hybrid mesh regime. This code base is exercised within a full V27 production run. Simulation timings for parallel search and custom ghosting are presented. As the low-Mach application space requires implicit matrix solves, the cost of matrix reinitialization has been evaluated on a variety of production meshes. Results indicate that at low element counts, i.e., fewer than 100 million elements, matrix graph initialization and preconditioner setup times are small. However, as mesh sizes increase, e.g., 500 million elements, simulation time associated with setup costs can increase to nearly 50% of overall simulation time when using the full Tpetra solver stack and nearly 35% when using a mixed Tpetra/Hypre-based solver stack. The report also highlights the project achievement of surpassing the 1 billion element mesh scale for a production V27 hybrid mesh. A detailed timing breakdown is presented that again suggests work to be done in the setup events associated with the linear system. In order to mitigate these initialization costs, several application paths have been explored, all of which are designed to reduce the frequency of matrix reinitialization. Methods such as removing Jacobian entries on the dynamic matrix columns (in concert with increased inner equation iterations) and lagging of Jacobian entries have reduced setup times at the cost of numerical stability. Artificially increasing, or bloating, the matrix stencil to ensure that full Jacobians are included has been developed, with results suggesting that this methodology is useful in decreasing reinitialization events without loss of matrix contributions. With the above foundational advances in computational capability, the project is well positioned to begin scientific inquiry on a variety of wind-farm physics such as turbine/turbine wake interactions.
This milestone was focused on deploying and verifying a “sliding-mesh interface,” and establishing baseline timings for blade-resolved simulations of a sub-MW-scale turbine. In the ExaWind project, we are developing both sliding-mesh and overset-mesh approaches for handling the rotating blades in an operating wind turbine. In the sliding-mesh approach, the turbine rotor and its immediate surrounding fluid are captured in a “disk” that is embedded in the larger fluid domain. The embedded fluid is simulated in a coordinate system that rotates with the rotor. It is important that the coupling algorithm (and its implementation) between the rotating and inertial discrete models maintains the accuracy of the numerical methods on either side of the interface, i.e., the interface is “design order.”
The former Nalu interior heterogeneous algorithm design, which was originally designed to manage matrix assembly operations over all elemental topology types, has been modified to operate over homogeneous collections of mesh entities. This newly templated kernel design allows for removal of workset variable resize operations that were formerly required at each loop over a Sierra ToolKit (STK) bucket (nominally, 512 entities in size). Extensive usage of the Standard Template Library (STL) std::vector has been removed in favor of intrinsic Kokkos memory views. In this milestone effort, the transition to Kokkos as the underlying infrastructure to support performance and portability on many-core architectures has been deployed for key matrix algorithmic kernels. A unit-test driven design effort has developed a homogeneous entity algorithm that employs a team-based thread parallelism construct. The STK Single Instruction Multiple Data (SIMD) infrastructure is used to interleave data for improved vectorization. The collective algorithm design, which allows for concurrent threading and SIMD management, has been deployed for the core low-Mach element-based algorithm. Several tests to ascertain SIMD performance on Intel KNL and Haswell architectures have been carried out. The performance test matrix includes evaluation of both low- and higher-order methods. The higher-order low-Mach methodology builds on polynomial promotion of the core low-order control volume finite element method (CVFEM). Performance testing of the Kokkos-view/SIMD design indicates low-order matrix assembly kernel speed-up ranging between two and four times depending on mesh loading and node count. Better speedups are observed for higher-order meshes (currently only P=2 has been tested), especially on KNL. The increased workload per element on higher-order meshes benefits from the wide SIMD width on KNL machines. Combining multiple threads with SIMD on KNL achieves a 4.6x speedup over the baseline, with assembly timings faster than those observed on the Haswell architecture. The computational workload of higher-order meshes, therefore, seems ideally suited for the many-core architecture and justifies further exploration of higher-order methods on NGP platforms. A Trilinos/Tpetra-based multi-threaded GMRES preconditioned by symmetric Gauss-Seidel (SGS) represents the core solver infrastructure for the low-Mach advection/diffusion implicit solves. The threaded solver stack has been tested on small problems on NREL's Peregrine system using the newly developed and deployed Kokkos-view/SIMD kernels. Efforts are underway to deploy the Tpetra-based solver stack on the NERSC Cori system to benchmark its performance at scale on KNL machines.
This report documents work performed using ALCC computing resources granted under a proposal submitted in February 2016, with the resource allocation period spanning July 2016 through June 2017. The award allocation was 10.7 million processor-hours at the National Energy Research Scientific Computing Center. The simulations performed were in support of two projects: the Atmosphere to Electrons (A2e) project, supported by the DOE EERE office; and the Exascale Computing Project (ECP), supported by the DOE Office of Science. The project team for both efforts consists of staff scientists and postdocs from Sandia National Laboratories and the National Renewable Energy Laboratory. At the heart of these projects is the open-source computational-fluid-dynamics (CFD) code, Nalu. Nalu solves the low-Mach-number Navier-Stokes equations using an unstructured-grid discretization. Nalu leverages the open-source Trilinos solver library and the Sierra Toolkit (STK) for parallelization and I/O. This report documents baseline computational performance of the Nalu code on problems of direct relevance to the wind plant physics application, namely Large Eddy Simulation (LES) of an atmospheric boundary layer (ABL) flow and wall-modeled LES of a flow past a static wind turbine rotor blade. Parallel performance of Nalu and its constituent solver routines residing in the Trilinos library has been assessed previously under various campaigns. However, both Nalu and Trilinos have been, and remain, in active development, and resources have not been available previously to rigorously track code performance over time. With the initiation of the ECP, it is important to establish and document baseline code performance on the problems of interest. This will allow the project team to identify and target any deficiencies in performance, as well as highlight any performance bottlenecks as we exercise the code on a greater variety of platforms and at larger scales. The current study is rather modest in scale, examining performance on problem sizes of O(100 million) elements and core counts up to 8k. This will be expanded as more computational resources become available to the projects.
When faced with a restrictive evaluation budget that is typical of today's high-fidelity simulation models, the effective exploitation of lower-fidelity alternatives within the uncertainty quantification (UQ) process becomes critically important. Herein, we explore the use of multifidelity modeling within UQ, for which we rigorously combine information from multiple simulation-based models within a hierarchy of fidelity, in seeking accurate high-fidelity statistics at lower computational cost. Motivated by correction functions that enable the provable convergence of a multifidelity optimization approach to an optimal high-fidelity point solution, we extend these ideas to discrepancy modeling within a stochastic domain and seek convergence of a multifidelity uncertainty quantification process to globally integrated high-fidelity statistics. For constructing stochastic models of both the low-fidelity model and the model discrepancy, we employ stochastic expansion methods (non-intrusive polynomial chaos and stochastic collocation) computed by integration/interpolation on structured sparse grids or regularized regression on unstructured grids. We seek to employ a coarsely resolved grid for the discrepancy in combination with a more finely resolved grid for the low-fidelity model. The resolutions of these grids may be defined statically or determined through uniform and adaptive refinement processes. Adaptive refinement is particularly attractive, as it has the ability to preferentially target stochastic regions where the model discrepancy becomes more complex, i.e., where the predictive capabilities of the low-fidelity model start to break down and greater reliance on the high-fidelity model (via the discrepancy) is necessary. These adaptive refinement processes can either be performed separately for the different grids or within a coordinated multifidelity algorithm. In particular, we present an adaptive greedy multifidelity approach in which we extend the generalized sparse grid concept to consider candidate index set refinements drawn from multiple sparse grids, as governed by induced changes in the statistical quantities of interest and normalized by relative computational cost. Through a series of numerical experiments using statically defined sparse grids, adaptive multifidelity sparse grids, and multifidelity compressed sensing, we demonstrate that the multifidelity UQ process converges more rapidly than a single-fidelity UQ in cases where the variance of the discrepancy is reduced relative to the variance of the high-fidelity model (resulting in reductions in initial stochastic error), where the spectrum of the expansion coefficients of the model discrepancy decays more rapidly than that of the high-fidelity model (resulting in accelerated convergence rates), and/or where the discrepancy is more sparse than the high-fidelity model (requiring the recovery of fewer significant terms).
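A minimal sketch of the additive-discrepancy idea underlying the approach: estimate high-fidelity statistics as statistics of the low-fidelity model (many cheap evaluations) plus statistics of the discrepancy (few expensive evaluations). Plain Monte Carlo stands in for the sparse-grid and polynomial-chaos machinery, and the model functions and sample counts are illustrative:

```python
import numpy as np

# E[f_hi] is approximated by E[f_lo] from many cheap samples plus
# E[f_hi - f_lo] from a few expensive samples.
rng = np.random.default_rng(7)

def f_lo(x):                      # cheap low-fidelity model (illustrative)
    return np.sin(x) + 0.1 * x

def f_hi(x):                      # expensive high-fidelity model (illustrative)
    return np.sin(x) + 0.1 * x + 0.05 * np.cos(3 * x)

x_lo = rng.uniform(-1.0, 1.0, size=20_000)   # fine resolution of the low-fidelity model
x_hi = rng.uniform(-1.0, 1.0, size=50)       # coarse resolution of the discrepancy

mean_estimate = f_lo(x_lo).mean() + (f_hi(x_hi) - f_lo(x_hi)).mean()
print(f"multifidelity mean estimate: {mean_estimate:.4f}")
print(f"reference high-fidelity mean (dense sampling): {f_hi(x_lo).mean():.4f}")
```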