Computational Science Careers at US DOE National Laboratories
Abstract not provided.
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
We present an effort to port the nonhydrostatic atmosphere dynamical core of the Energy Exascale Earth System Model (E3SM) so that it runs efficiently on a variety of architectures, including conventional CPUs, many-core CPUs, and GPUs. We specifically target cloud-resolving resolutions of 3 km and 1 km. To express on-node parallelism we use the C++ library Kokkos, which allows us to achieve performance-portable code in a largely architecture-independent way. Our C++ implementation is at least as fast as the original Fortran implementation on IBM Power9 and Intel Knights Landing processors, showing that the code refactor did not compromise efficiency on CPU architectures. On GPUs, our implementation achieves 0.97 simulated years per day (SYPD) running on the full Summit supercomputer. To the best of our knowledge, this is the highest throughput achieved to date by any global atmosphere dynamical core running at such resolutions.
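Throughput in SYPD is simulated time divided by wall-clock time, with simulated time measured in years and wall-clock time in days. A minimal sketch of that arithmetic, assuming a 365-day model calendar; the function names and the 300 s time step below are illustrative assumptions, not values from the paper:

```python
SECONDS_PER_DAY = 86_400.0
# Assumption: 365-day (no-leap) model calendar.
SECONDS_PER_YEAR = 365.0 * SECONDS_PER_DAY


def sypd(dt_seconds: float, n_steps: int, wallclock_seconds: float) -> float:
    """Simulated years per wall-clock day for n_steps model steps of length dt_seconds."""
    simulated_years = dt_seconds * n_steps / SECONDS_PER_YEAR
    wallclock_days = wallclock_seconds / SECONDS_PER_DAY
    return simulated_years / wallclock_days


def wallclock_per_simulated_day(rate_sypd: float) -> float:
    """Wall-clock seconds needed to advance the model by one simulated day."""
    return SECONDS_PER_DAY / (rate_sypd * 365.0)


# Hypothetical run: 300 s steps (288 per day), one simulated year in one wall-clock day.
print(sypd(300.0, 365 * 288, 86_400.0))  # 1.0
```

At 0.97 SYPD, `wallclock_per_simulated_day(0.97)` gives about 244 s of wall-clock time per simulated day.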
Geoscientific Model Development
We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPUs (e.g., Intel Xeon), many-core CPUs (e.g., Intel Xeon Phi Knights Landing), and the Nvidia V100 GPU.
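In Kokkos, a View's memory layout is a template parameter: `Kokkos::LayoutRight` makes the rightmost index contiguous (row-major, as in C) and `Kokkos::LayoutLeft` makes the leftmost index contiguous (column-major, as in Fortran). The Python toy below sketches only that idea, one indexing code path with layout-dependent strides; the `View` class and the `strides` and `fill` helpers are hypothetical and are not the Kokkos API.

```python
from math import prod


def strides(extents, layout):
    """Element strides: 'right' = last index contiguous, 'left' = first index contiguous."""
    n = len(extents)
    s = [1] * n
    if layout == "right":
        for d in range(n - 2, -1, -1):
            s[d] = s[d + 1] * extents[d + 1]
    elif layout == "left":
        for d in range(1, n):
            s[d] = s[d - 1] * extents[d - 1]
    else:
        raise ValueError(f"unknown layout {layout!r}")
    return s


class View:
    """Toy layout-polymorphic array: indexing code is identical for both layouts."""

    def __init__(self, extents, layout="right"):
        self.extents = tuple(extents)
        self.strides = strides(self.extents, layout)
        self.data = [0.0] * prod(self.extents)

    def offset(self, *idx):
        return sum(i * s for i, s in zip(idx, self.strides))

    def __getitem__(self, idx):
        return self.data[self.offset(*idx)]

    def __setitem__(self, idx, value):
        self.data[self.offset(*idx)] = value


def fill(view):
    """Kernel written once; only the underlying offsets differ between layouts."""
    rows, cols = view.extents
    for i in range(rows):
        for j in range(cols):
            view[i, j] = 10 * i + j
```

Because only the strides change, a kernel such as `fill` can be written once and the layout chosen per architecture; both views below hold the same logical values in different memory orders.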
International Journal of High Performance Computing Applications
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This article presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA graphics processing units (GPUs), Intel Xeon Phis, and multicore CPUs.
Geoscientific Model Development
We describe and evaluate version 2.1 of the Community Ice Sheet Model (CISM). CISM is a parallel, 3-D thermomechanical model, written mainly in Fortran, that solves equations for the momentum balance and the thickness and temperature evolution of ice sheets. CISM's velocity solver incorporates a hierarchy of Stokes flow approximations, including shallow-shelf, depth-integrated higher order, and 3-D higher order. CISM also includes a suite of test cases, links to third-party solver libraries, and parameterizations of physical processes such as basal sliding, iceberg calving, and sub-ice-shelf melting. The model has been verified for standard test problems, including the Ice Sheet Model Intercomparison Project for Higher-Order Models (ISMIP-HOM) experiments, and has participated in the initMIP-Greenland initialization experiment. In multimillennial simulations with modern climate forcing on a 4 km grid, CISM reaches a steady state that is broadly consistent with observed flow patterns of the Greenland ice sheet. CISM has been integrated into version 2.0 of the Community Earth System Model, where it is being used for Greenland simulations under past, present, and future climates. The code is open-source with extensive documentation and remains under active development.
Journal of Advances in Modeling Earth Systems
This work documents the first version of the U.S. Department of Energy's (DOE) new Energy Exascale Earth System Model (E3SMv1). We focus on the standard resolution of the fully coupled physical model designed to address DOE mission-relevant water cycle questions. Its components include atmosphere and land (110-km grid spacing), ocean and sea ice (60 km in the midlatitudes and 30 km at the equator and poles), and river transport (55 km) models. This base configuration will also serve as a foundation for additional configurations exploring higher horizontal resolution as well as augmented capabilities in the form of biogeochemistry and cryosphere configurations. The performance of E3SMv1 is evaluated by means of a standard set of Coupled Model Intercomparison Project Phase 6 (CMIP6) Diagnostic, Evaluation, and Characterization of Klima (DECK) simulations consisting of a long preindustrial control, historical simulations (ensembles of fully coupled and prescribed sea surface temperature runs), and idealized CO₂ forcing simulations. The model performs well overall, with biases typical of other CMIP-class models, although its simulated Atlantic Meridional Overturning Circulation is weaker than that of many CMIP-class models. While the E3SMv1 historical ensemble captures the bulk of the observed warming between preindustrial (1850) and present day, the trajectory of the warming diverges from observations in the second half of the twentieth century, with a period of delayed warming followed by an excessive warming trend. Using a two-layer energy balance model, we attribute this divergence to the model's strong aerosol-related effective radiative forcing (ERF_ari+aci = −1.65 W/m²) and high equilibrium climate sensitivity (ECS = 5.3 K).
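The abstract attributes the warming bias using a two-layer energy balance model. A minimal sketch of the common textbook form, assuming forward-Euler time stepping, no ocean heat-uptake efficacy factor, and hypothetical parameter values; the abstract does not state the exact formulation or fitted parameters used for E3SMv1:

```python
def two_layer_ebm(forcing, lam, gamma, c_upper, c_deep, dt=0.1):
    """Forward-Euler integration of the two-layer energy balance model
        C   dT/dt   = F - lam*T - gamma*(T - T_D)
        C_D dT_D/dt = gamma*(T - T_D)
    forcing: effective radiative forcing for each step (W m^-2)
    lam: climate feedback parameter (W m^-2 K^-1)
    gamma: upper/deep heat-exchange coefficient (W m^-2 K^-1)
    c_upper, c_deep: heat capacities C and C_D (W yr m^-2 K^-1); dt in years.
    Returns upper-layer and deep-layer temperature anomalies (K) after each step."""
    t, t_deep = 0.0, 0.0
    upper, deep = [], []
    for f in forcing:
        exchange = gamma * (t - t_deep)
        # Both tendencies are evaluated from the old state (explicit Euler).
        t, t_deep = (t + dt * (f - lam * t - exchange) / c_upper,
                     t_deep + dt * exchange / c_deep)
        upper.append(t)
        deep.append(t_deep)
    return upper, deep


def ecs(f_2x, lam):
    """Equilibrium climate sensitivity: warming at which feedback balances 2xCO2 forcing."""
    return f_2x / lam
```

With constant forcing F both layers relax to F/λ, but the deep layer responds on a much longer time scale, which is what lets such a model separate fast and slow components of the warming. ECS then follows as F_2x/λ for a doubled-CO₂ forcing F_2x.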
Geoscientific Model Development
We introduce MPAS-Albany Land Ice (MALI) v6.0, a new variable-resolution land ice model that uses unstructured Voronoi grids on a plane or sphere. MALI is built using the Model for Prediction Across Scales (MPAS) framework for developing variable-resolution Earth system model components and the Albany multi-physics code base for the solution of coupled systems of partial differential equations, which itself makes use of Trilinos solver libraries. MALI includes a three-dimensional first-order momentum balance solver (Blatter-Pattyn) by linking to the Albany-LI ice sheet velocity solver and an explicit shallow ice velocity solver. The evolution of ice geometry and tracers is handled through an explicit first-order horizontal advection scheme with vertical remapping. The evolution of ice temperature is treated using operator splitting of vertical diffusion and horizontal advection and can be configured to use either a temperature or enthalpy formulation. MALI includes a mass-conserving subglacial hydrology model that supports distributed and/or channelized drainage and can optionally be coupled to ice dynamics. Options for calving include eigencalving, which assumes that the calving rate is proportional to extensional strain rates. MALI is evaluated against commonly used exact solutions and community benchmark experiments and shows the expected accuracy. Results for the MISMIP3d benchmark experiments with MALI's Blatter-Pattyn solver fall between published results from Stokes and L1L2 models as expected. We use the model to simulate a semi-realistic Antarctic ice sheet problem following the initMIP protocol and using 2 km resolution in marine ice sheet regions. MALI is the glacier component of the Energy Exascale Earth System Model (E3SM) version 1, and we describe current and planned coupling to other E3SM components.
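The abstract summarizes eigencalving only as a calving rate proportional to extensional strain rates. A minimal sketch, assuming the product form of Levermann et al. (2012), in which the rate is K times the product of the two horizontal principal strain rates when both are extensional; the proportionality constant `k` and the function names are hypothetical, and MALI's actual implementation may differ in detail:

```python
import math


def principal_strain_rates(exx, eyy, exy):
    """Eigenvalues (largest first) of the symmetric 2-D strain-rate tensor [[exx, exy], [exy, eyy]]."""
    mean = 0.5 * (exx + eyy)
    radius = math.sqrt((0.5 * (exx - eyy)) ** 2 + exy ** 2)
    return mean + radius, mean - radius


def eigencalving_rate(exx, eyy, exy, k):
    """Assumed product form: k * e1 * e2 where both principal strain rates are
    extensional (> 0); zero wherever either direction is compressive."""
    e1, e2 = principal_strain_rates(exx, eyy, exy)
    if e1 > 0.0 and e2 > 0.0:
        return k * e1 * e2
    return 0.0
```

Under this form, pure shear (one extensional and one compressive principal direction) produces no calving, while spreading in both directions, as near an unconfined ice-shelf front, does.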
Springer Series in Materials Science, Computational Materials, Chemistry, and Biochemistry: From Bold Initiatives to the Last Mile
This paper describes our work over the past few years to use tools from quantum chemistry to describe the electronic structure of nanoelectronic devices. These devices, dubbed "artificial atoms", comprise a few electrons confined by semiconductor heterostructures, impurities, and patterned electrodes, and are of intense interest due to potential applications in quantum information processing, quantum sensing, and extreme-scale classical logic. We detail the two approaches we have employed, finite-element and Gaussian basis sets, exploring the interesting complications that arise when techniques intended for atomic systems are instead applied to artificial, solid-state devices.
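Using a Gaussian basis for a confined electron requires, among other integrals, the overlap matrix S that enters the generalized eigenproblem Hc = ESc. A minimal sketch for normalized s-type primitives in any number of dimensions (two for a planar quantum dot, three for an atom); the exponents, centers, and function names are hypothetical, and the paper's actual basis choices are not described here:

```python
import math


def s_overlap(alpha, a, beta, b):
    """Overlap of normalized s-type Gaussians (2*alpha/pi)^(d/4) * exp(-alpha*|r - a|^2)
    and the analogous function with (beta, b), in d = len(a) dimensions."""
    d = len(a)
    r2 = sum((x - y) ** 2 for x, y in zip(a, b))
    mu = alpha * beta / (alpha + beta)
    return (4.0 * alpha * beta / (alpha + beta) ** 2) ** (d / 4.0) * math.exp(-mu * r2)


def overlap_matrix(basis):
    """basis: list of (exponent, center) pairs; returns the symmetric overlap matrix S."""
    return [[s_overlap(al, a, be, b) for (be, b) in basis] for (al, a) in basis]
```

Each normalized primitive has unit self-overlap, and S is symmetric; off-diagonal entries decay with both the exponent mismatch and the distance between centers.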
SIAM Journal on Scientific Computing
A multigrid method is proposed that combines ideas from matrix-dependent multigrid for structured grids and algebraic multigrid for unstructured grids. It targets problems where a three-dimensional mesh can be viewed as an extrusion of a two-dimensional, unstructured mesh in a third dimension. Our motivation comes from the modeling of thin structures via finite elements and, more specifically, the modeling of ice sheets. Extruded meshes are relatively common for thin structures and often give rise to anisotropic problems when the thin-direction mesh spacing is much smaller than the broad-direction mesh spacing. Within our approach, the first few multigrid hierarchy levels are obtained by applying matrix-dependent multigrid to semicoarsen in a structured fashion along the thin direction. After sufficient structured coarsening, the resulting mesh contains only a single layer corresponding to a two-dimensional, unstructured mesh. Algebraic multigrid can then be employed in a standard manner to create further coarse levels, as the anisotropy is no longer present in the single-layer problem. The overall approach remains fully algebraic, with the minor exception that some additional information is needed to determine the extruded direction. This also facilitates integration of the solver with a variety of extruded-mesh applications.
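The structured first stage can be illustrated on a single vertical column. A minimal sketch, assuming a tridiagonal column matrix and keeping every other vertical point: each dropped point is interpolated from its vertical neighbours with matrix-dependent weights `-a[i][j] / a[i][i]`, and the coarse operator is the Galerkin product PᵀAP. The method described above applies this to all columns of the 3-D extruded matrix at once and then hands the single-layer 2-D problem to algebraic multigrid; this toy column simply coarsens down to one point. All function names are hypothetical.

```python
def matrix_dependent_prolongator(a):
    """Prolongator for one column with tridiagonal matrix a (list of lists).
    Even-indexed points are kept; each odd point is interpolated from its
    kept vertical neighbours with weights -a[i][j] / a[i][i]."""
    n = len(a)
    coarse = list(range(0, n, 2))
    col = {f: c for c, f in enumerate(coarse)}  # fine index -> coarse index
    p = [[0.0] * len(coarse) for _ in range(n)]
    for i in range(n):
        if i in col:
            p[i][col[i]] = 1.0
        else:
            for j in (i - 1, i + 1):
                if j in col:
                    p[i][col[j]] = -a[i][j] / a[i][i]
    return p


def galerkin(a, p):
    """Coarse operator P^T A P (dense, for illustration only)."""
    n, m = len(p), len(p[0])
    ap = [[sum(a[i][k] * p[k][j] for k in range(n)) for j in range(m)] for i in range(n)]
    return [[sum(p[k][i] * ap[k][j] for k in range(n)) for j in range(m)] for i in range(m)]


def semicoarsen_column(a):
    """Semicoarsen until one vertical point remains; return level sizes and coarsest matrix."""
    sizes = [len(a)]
    while len(a) > 1:
        a = galerkin(a, matrix_dependent_prolongator(a))
        sizes.append(len(a))
    return sizes, a
```

For a tridiagonal column this interpolation makes PᵀAP the exact Schur complement onto the kept points, so for the 9-point matrix tridiag(−1, 2, −1) the levels have 9, 5, 3, 2, and 1 points and the final 1×1 operator equals 10/9.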
The Next Generation Global Atmosphere Model LDRD project developed a suite of atmosphere models (a shallow water model, an x-z hydrostatic model, and a 3D hydrostatic model) using Albany, a finite element code. Albany provides access to a large suite of leading-edge Sandia high-performance computing technologies enabled by Trilinos, Dakota, and Sierra. The next-generation capabilities most relevant to a global atmosphere model are performance portability and embedded uncertainty quantification (UQ). Performance portability is the capability of a single code base to run efficiently on a diverse set of advanced computing architectures, such as multicore threading or GPUs. Embedded UQ refers to simulation algorithms that have been modified to aid in quantifying uncertainties. In our case, this means running multiple samples of an ensemble concurrently, which yields certain performance benefits. We demonstrate the effectiveness of these approaches here as a prelude to introducing them into ACME.
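Embedded ensemble propagation can be illustrated with an ensemble scalar type: each arithmetic operation acts on all samples at once, so the physics code is written once and runs every sample concurrently. A minimal Python sketch; the `Ensemble` class and the decay model are hypothetical stand-ins, not Albany's or Trilinos' actual types. In compiled C++ the per-sample loop inside each operation can vectorize and shared data such as the mesh is traversed once per ensemble rather than once per sample, which is one commonly cited source of the performance benefit.

```python
class Ensemble:
    """Hypothetical ensemble scalar: one value per sample, arithmetic applied to all samples."""

    def __init__(self, values):
        self.values = tuple(float(v) for v in values)

    def _other(self, other):
        # Broadcast a plain number to every sample.
        if isinstance(other, Ensemble):
            return other.values
        return (float(other),) * len(self.values)

    def __add__(self, other):
        return Ensemble(a + b for a, b in zip(self.values, self._other(other)))

    __radd__ = __add__

    def __sub__(self, other):
        return Ensemble(a - b for a, b in zip(self.values, self._other(other)))

    def __rsub__(self, other):
        return Ensemble(b - a for a, b in zip(self.values, self._other(other)))

    def __mul__(self, other):
        return Ensemble(a * b for a, b in zip(self.values, self._other(other)))

    __rmul__ = __mul__


def advance(u, k, dt, n_steps):
    """Forward Euler for du/dt = -k*u; the same code runs on floats or Ensembles."""
    for _ in range(n_steps):
        u = u - dt * k * u
    return u
```

Calling `advance(1.0, Ensemble([0.0, 1.0, 2.0]), 0.5, 2)` propagates three uncertain decay rates through one call, giving per-sample results 1.0, 0.25, and 0.0.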
International Journal of HPC Applications
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to execute correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The Finite Element Method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industry applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library of Trilinos. We present performance results for two physics modules in Albany: the Aeras global atmosphere dynamical core and the FELIX land-ice solver. Numerical experiments show that our single code implementation gives reasonable performance across two multicore/many-core architectures: NVIDIA GPUs and multicore CPUs.
Procedia Computer Science
We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems that arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multigrid (AMG) preconditioner, constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the AMG preconditioner results in faster linear solve times but the ILU preconditioner exhibits better scalability. A weak scalability study is performed on a realistic, moderate resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. Here, we show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the AMG preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the AMG preconditioner.
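ILU(0) factorizes A as LU while discarding any fill-in outside A's sparsity pattern, which keeps it cheap but makes it sensitive to ill-conditioning of the kind described above. A minimal dense-storage sketch for illustration only (production solvers use sparse storage; the function names here are hypothetical and not a Trilinos API):

```python
def ilu0(a):
    """ILU(0) of a dense-stored matrix: only positions nonzero in a are updated.
    Returns L (unit lower, diagonal implied) and U packed into one matrix."""
    n = len(a)
    lu = [row[:] for row in a]
    nz = [[a[i][j] != 0.0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not nz[i][k]:
                continue
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                if nz[i][j]:  # fill-in outside the pattern is dropped
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def ilu_solve(lu, r):
    """Apply the preconditioner: solve L y = r, then U z = y."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):
        y[i] = r[i] - sum(lu[i][j] * y[j] for j in range(i))
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(lu[i][j] * z[j] for j in range(i + 1, n))) / lu[i][i]
    return z
```

For a tridiagonal matrix ILU(0) coincides with the exact LU factorization, so `ilu_solve` returns the exact solution; for a matrix with gaps in its pattern, entries that exact LU would fill stay zero.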
We present the Quantum Computer Aided Design (QCAD) simulator, which targets modeling of quantum devices, particularly silicon double quantum dots (DQDs) developed for use as qubits. The simulator has three differentiating features: (i) its core contains nonlinear Poisson, effective-mass Schrödinger, and Configuration Interaction solvers that have massively parallel capability for high simulation throughput and can be run individually or combined self-consistently for 1D/2D/3D quantum devices; (ii) the core solvers show superior convergence even at near-zero-Kelvin temperatures, which is critical for modeling quantum computing devices; (iii) it couples with the optimization engine Dakota, enabling optimization of gate voltages in DQDs for multiple desired targets. The Poisson solver includes Maxwell-Boltzmann and Fermi-Dirac statistics, supports Dirichlet, Neumann, interface charge, and Robin boundary conditions, and includes the effect of incomplete dopant ionization. The solver has shown robust nonlinear convergence even in the millikelvin temperature range and has been used extensively to quickly obtain the semiclassical electrostatic potential in DQD devices. The self-consistent Schrödinger-Poisson solver has achieved robust and monotonic convergence for 1D/2D/3D quantum devices at very low temperatures by using a predictor-corrector iteration scheme. The QCAD simulator enables the calculation of dot-to-gate capacitances and their comparison with experiment and between solvers. Computed capacitances are roughly consistent with experiment, and quantum confinement increases capacitance when the number of electrons in a quantum dot is fixed. In addition, coupling QCAD with Dakota allows rapid identification of the device layouts that are more likely to yield few-electron quantum dots.
Very efficient QCAD simulations on a large number of fabricated and proposed Si DQDs have made it possible to provide fast feedback for design comparison and optimization.
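The nonlinearity in the Poisson solve comes from the carrier density depending exponentially on the potential. A minimal 1-D finite-difference sketch with Maxwell-Boltzmann electrons and Dirichlet boundaries, in dimensionless form (potential in units of kT/q, densities in units of the intrinsic density, `lam` a scaled Debye length); all names and scalings are assumptions for illustration, whereas QCAD itself is built on Albany's finite-element machinery and also supports Fermi-Dirac statistics and other boundary conditions:

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_poisson_boltzmann(n_dop, lam, phi_left, phi_right,
                            n_nodes=101, tol=1e-10, max_iter=50):
    """Newton solve of -lam^2 phi'' + exp(phi) - n_dop = 0 on [0, 1] with Dirichlet
    values phi(0) = phi_left and phi(1) = phi_right (n_dop > 0)."""
    h = 1.0 / (n_nodes - 1)
    k = lam * lam / (h * h)
    # Charge-neutral initial guess ln(n_dop) in the interior.
    phi = [phi_left] + [math.log(n_dop)] * (n_nodes - 2) + [phi_right]
    interior = range(1, n_nodes - 1)
    for _ in range(max_iter):
        res = [-k * (phi[i - 1] - 2.0 * phi[i] + phi[i + 1]) + math.exp(phi[i]) - n_dop
               for i in interior]
        if max(abs(r) for r in res) < tol:
            return phi
        # Tridiagonal Jacobian: 2k + exp(phi_i) on the diagonal, -k off it.
        diag = [2.0 * k + math.exp(phi[i]) for i in interior]
        off = [-k] * len(diag)
        delta = thomas(off, diag, off, [-r for r in res])
        for j, i in enumerate(interior):
            phi[i] += delta[j]
    raise RuntimeError("Newton iteration did not converge")
```

Away from the contacts the computed potential approaches the charge-neutral value ln(N), with the boundary perturbation screened over a few scaled Debye lengths.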
ACM Transactions on Mathematical Software
Journal of Applied Physics
Over the past few years, we have defined, and gone a long way toward implementing, a component-based strategy for building scientific application codes. We have asserted that this approach offers significant advantages over writing project-based application codes. Several technical and programmatic successes now validate these claims. Not only do code projects that follow this strategy see net benefits, but the most striking gains are in the long-term impact and productivity of our computational science organizations.
Proposed for publication in Communications in Computational Physics.
Computational Electronics (IWCE), 2012 15th International Workshop on
We present the Quantum Computer Aided Design (QCAD) simulator, which targets modeling of quantum devices, particularly Si double quantum dots (DQDs) developed for quantum computing. The simulator core includes Poisson, Schrödinger, and Configuration Interaction solvers, which can be run individually or combined self-consistently. The simulator is built upon Sandia-developed Trilinos and Albany components and is interfaced with the Dakota optimization tool. It is being developed for seamless integration, high flexibility, and high throughput, and is intended to be open source. The QCAD tool has been used to simulate a large number of fabricated silicon DQDs and has provided fast feedback for design comparison and optimization.
Scientific Programming
An approach for incorporating embedded simulation and analysis capabilities in complex simulation codes through template-based generic programming is presented. This approach relies on templating and operator overloading within the C++ language to transform a given calculation into one that can compute a variety of additional quantities that are necessary for many state-of-the-art simulation and analysis algorithms. An approach for incorporating these ideas into complex simulation codes through general graph-based assembly is also presented. These ideas have been implemented within a set of packages in the Trilinos framework and are demonstrated on a simple problem from chemical engineering. © 2012 - IOS Press and the authors. All rights reserved.
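Operator overloading lets a residual written once be evaluated either on ordinary numbers or on types that carry extra information, such as derivatives. A minimal Python sketch of forward-mode automatic differentiation with a dual-number type; Python's duck typing stands in for C++ templates, the `Dual` class and the reaction residual are hypothetical, and this is not the API of Trilinos' Sacado package:

```python
class Dual:
    """Forward-mode AD scalar: a value and its derivative, propagated by overloaded operators."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _lift(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = Dual._lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual._lift(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, o):
        return Dual(o) - self

    def __mul__(self, o):
        o = Dual._lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)  # product rule

    __rmul__ = __mul__


def residual(c, k):
    """Hypothetical steady-state balance for a second-order reaction: feed - k*c^2 - outflow."""
    return 1.0 - k * c * c - c


def newton(k, c=0.0, iters=20):
    """Newton iteration whose derivative comes from running residual() on a Dual seed."""
    for _ in range(iters):
        r = residual(Dual(c, 1.0), k)
        c -= r.val / r.der
    return c
```

`newton` obtains the Jacobian from the same `residual` code used for the function value; for k = 2 it converges to the root c = 0.5 of 2c² + c − 1 = 0.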