The CCR enterprise is closely tied to the laboratories’ broader set of missions and strategies. We share responsibility within Sandia as stewards of important capabilities for the nation in high-strain-rate physics, scientific visualization, optimization, uncertainty quantification, scalable solvers, inverse methods, and computational materials. We also leverage our core technologies to execute projects through various partnerships, such as Cooperative Research and Development Agreements (CRADAs) and Strategic Partnerships, as well as partnerships with universities.
The Advanced Tri-lab Software Environment (ATSE) is an effort led by Sandia in partnership with Lawrence Livermore National Laboratory and Los Alamos National Laboratory to build an open, modular, extensible, community-engaged, vendor-adaptable software ecosystem that enables the prototyping of new technologies for improving the ASC computing environment.
Albany is an implicit, unstructured grid, finite element code for the solution and analysis of partial differential equations. Albany is the main demonstration application of the AgileComponents software development strategy at Sandia. It is a PDE code that strives to be built almost entirely from functionality contained within reusable libraries (such as Trilinos/STK/Dakota/PUMI). Albany plays a large role in demonstrating and maturing functionality of new libraries, and also in the interfaces and interoperability between these libraries. It also serves to expose gaps in our coverage of capabilities and interface design.
In addition to the component-based code design strategy, Albany also attempts to showcase the concept of Analysis Beyond Simulation, where an application code is developed up front for a design and analysis mission. All Albany applications are born with the ability to perform sensitivity analysis, stability analysis, optimization, and uncertainty quantification, with clean interfaces for exposing design parameters for manipulation by analysis algorithms.
Albany also attempts to be a model for software engineering tools and processes, so that new research codes can adopt the Albany infrastructure as a good starting point. This effort involves a close collaboration with the 1400 SEMS team.
The Albany code base is host to several application projects, notably:
- LCM (Laboratory for Computational Mechanics) [PI J. Ostien]: A platform for research in finite deformation mechanics, including algorithms for failure and fracture modeling, discretizations, implicit solution algorithms, advanced material models, coupling to particle-based methods, and efficient implementations on new architectures.
- QCAD (Quantum Computer Aided Design) [PI Nielsen]: A code to aid in the design of quantum dots built in doped silicon devices. QCAD solves the coupled Schrödinger-Poisson system. When QCAD is wrapped in Dakota, optimal operating conditions can be found.
- FELIX (Finite Element for Land Ice eXperiments) [PI Salinger]: This application solves variants of nonlinear Stokes flow to simulate the evolution of ice sheets. In particular, we are modeling the Greenland and Antarctic ice sheets to study the effects of climate change, particularly their influence on sea-level rise. FELIX will be linked into ACME.
- Aeras [PI Spotz]: A component-based approach to atmospheric modeling, where advanced analysis algorithms and design for efficient code on new architectures are built into the code.
In addition, Albany is used as a platform for algorithmic research:
- FASTMath SciDAC project: We are developing a capability for adaptive mesh refinement within an unstructured grid application, in collaboration with Mark Shephard’s group at the SCOREC center at RPI.
- Embedded UQ: Research into embedded UQ algorithms led by Eric Phipps often uses Albany as a demonstration platform.
- Performance Portable Kernels for new architectures: Albany is serving as a research vehicle for programming finite element assembly kernels using the Trilinos/Kokkos programming model and library.
The ALEGRA application is targeted at the simulation of high-strain-rate, magnetohydrodynamic, electromechanic and high energy density physics phenomena for the U.S. defense and energy programs. Research and development in advanced methods, including code frameworks, large-scale inline meshing, multiscale Lagrangian hydrodynamics, resistive magnetohydrodynamic methods, material interface reconstruction, and code verification and validation, keeps the software on the cutting edge of high performance computing.
COINFLIPS – CO-designed Improved Neural Foundations Leveraging Inherent Physics Stochasticity – is a DOE Office of Science Co-Design in Microelectronics project that aims to develop a novel computing paradigm for probabilistic computing that leverages stochastic devices and brain-inspired architecture and algorithm principles.
The brain has effectively proven a powerful inspiration for the development of computing architectures in which processing is tightly integrated with memory, communication is event-driven, and analog computation can be performed at scale. These neuromorphic systems increasingly show an ability to improve the efficiency and speed of scientific computing and artificial intelligence applications. We propose that the brain’s ubiquitous stochasticity represents an additional source of inspiration for expanding the reach of neuromorphic computing to probabilistic applications. To date, many efforts exploring probabilistic computing have focused primarily on one scale of the microelectronics stack, such as implementing probabilistic algorithms on deterministic hardware or developing probabilistic devices and circuits with the expectation that they will be leveraged by eventual probabilistic architectures.
In COINFLIPS, we are exploring a co-design vision by which large numbers of devices, such as magnetic tunnel junctions and tunnel diodes, can be operated in a stochastic regime and incorporated into a scalable neuromorphic architecture that can impact a number of probabilistic computing applications, such as Monte Carlo simulations and Bayesian neural networks.
COINFLIPS has partners at New York University, University of Texas at Austin, University of Tennessee, Oak Ridge National Laboratory, and Temple University.
COMpact, performance-POrtable, SEmi-Lagrangian methods
Compose develops semi-Lagrangian algorithms tailored for parallel computing on heterogeneous architectures, with primary applications for the E3SM Atmosphere model and the MPAS Ocean model.
Key components: Our methods find communication-efficient means for providing accuracy, mass conservation, shape preservation, and tracer-continuity consistency.
The CANGA project is an integrated approach to develop new coupler services comprising (i) application of tasking programming models to develop a flexible design approach for the coupled E3SM model; (ii) formulation of mathematically sound, stable and accurate coupling strategies for E3SM components; and (iii) development of advanced conservative interpolation (remap) services in support of (i) and (ii). Tasking programming models break up a complex computational model into many smaller tasks with associated data dependencies that can be efficiently scheduled across a computing architecture. This approach provides a number of advantages for managing model complexity and improving computational performance on NGPs through enhanced parallelism, fault tolerance and the ability to better map tasks to architectural elements. New coupling strategies and associated analyses will improve how we couple components together and advance the solution in time, providing a sound mathematical basis and optimal techniques that eliminate artifacts associated with previous coupling choices. New remapping services will improve the way E3SM subcomponents exchange information by preserving properties of fields, improving the accuracy of the remapping and allowing the methods to change in space and time to adapt to the evolving solution. The proposed effort will improve E3SM performance on advanced architectures and enable a broad spectrum of mathematical coupling strategies that will enhance our ability to project future changes to the Earth system.
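To make the idea of a conservative remap concrete, the following is a minimal one-dimensional sketch, not CANGA or E3SM code. It assumes cell-averaged data on two grids that cover the same interval; the function names `conservative_remap` and `integral` are hypothetical. Each destination cell receives the overlap-weighted average of the source cells it intersects, so the integral of the field is preserved and no new extrema are created.

```python
def conservative_remap(src_edges, src_means, dst_edges):
    """First-order conservative remap of cell averages between two 1D grids.

    Assumes both edge lists are sorted and span the same interval.
    """
    dst_means = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        for i in range(len(src_edges) - 1):
            # Length of the intersection of source cell i with destination cell j.
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0.0:
                total += src_means[i] * overlap
        dst_means.append(total / (hi - lo))
    return dst_means


def integral(edges, means):
    """Integral of a piecewise-constant field (sum of mean times cell width)."""
    return sum(m * (edges[k + 1] - edges[k]) for k, m in enumerate(means))


src_edges = [0.0, 0.25, 0.5, 0.75, 1.0]
src_means = [1.0, 3.0, 2.0, 4.0]
dst_edges = [0.0, 0.4, 1.0]

dst_means = conservative_remap(src_edges, src_means, dst_edges)
print(dst_means)
print(integral(src_edges, src_means), integral(dst_edges, dst_means))
```

Higher-order and property-preserving remaps of the kind the project describes replace the piecewise-constant reconstruction used here, but the conservation check at the end applies unchanged.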
E3SM is an unprecedented collaboration among seven National Laboratories, the National Center for Atmospheric Research, four academic institutions and one private-sector company to develop and apply the most complete leading-edge climate and Earth system models to the most challenging and demanding climate change research imperatives. It is the only major national modeling project designed to address U.S. Department of Energy (DOE) mission needs and specifically targets DOE Leadership Computing Facility resources now and in the future, because DOE researchers do not have access to other major climate computing centers. A major motivation for the E3SM project is the coming paradigm shift in computing architectures and their related programming models as capability moves into the Exascale era. DOE, through its science programs and early adoption of new computing architectures, traditionally leads many scientific communities, including climate and Earth system simulation, through these disruptive changes in computing.
Container computing has revolutionized how many industries and enterprises develop and deploy software and services. Recently, this model has gained traction in the High Performance Computing (HPC) community through enabling technologies including Charliecloud, Shifter, Singularity, and Docker. In this same trend, container-based computing paradigms have gained significant interest within the DOE/NNSA Exascale Computing Project (ECP). While containers provide greater software flexibility, reliability, ease of deployment, and portability for users, there are still several challenges in this area for Exascale.
The goal of the ECP Supercomputing Containers Project (called Supercontainers) is to use a multi-level approach to accelerate adoption of container technologies for Exascale, ensuring that HPC container runtimes will be scalable, interoperable, and integrated into Exascale supercomputing across DOE. The core components of the Supercontainers project focus on foundational system software research needed for ensuring containers can be deployed at scale, enhanced user and developer support for enabling ECP Application Development (AD) and Software Technology (ST) projects looking to utilize containers, validated container runtime interoperability, and both vendor and E6 facilities system software integration with containers.
The FASTMath SciDAC Institute develops and deploys scalable mathematical algorithms and software tools for reliable simulation of complex physical phenomena and collaborates with application scientists to ensure the usefulness and applicability of FASTMath technologies.
FASTMath is a collaboration between Argonne National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Massachusetts Institute of Technology, Rensselaer Polytechnic Institute, Sandia National Laboratories, Southern Methodist University, and University of Southern California. Dr. Esmond Ng, LBNL, leads the FASTMath project.
The requirements to achieve Exascale computing place great demands on existing hardware and software solutions, including the need to be significantly more energy efficient, to utilize much higher degrees of parallelism, to be resilient to hardware failures, and to tolerate higher latencies to memory and remote communication partners, among many other factors. The codesign process seeks to bring application and domain specialists together with computer scientists, hardware architects, and machine designers to establish an iterative optimization process that balances the many trade-offs and benefits of new processor, memory, or network features with novel approaches in system software, programming models, numerical methods, and application/algorithm design. Sandia is one of the leading Department of Energy laboratories conducting codesign activities, linking production engineering and physics specialists with experts in linear algebra, system software, and computer science, as well as leading industry vendors.
Hobbes was a Sandia-led collaboration between four national laboratories and eight universities supported by the DOE Office of Science Advanced Scientific Computing Research program office. The goal of this three-year project was to deliver an operating system for future extreme-scale parallel computing platforms that will address the major technical challenges of energy efficiency, managing massive parallelism and deep memory hierarchies, and providing resilience in the presence of increasing failures. Our approach was to enable application composition through lightweight virtualization. Application composition is a critical capability that will be the foundation of the way extreme-scale systems must be used in the future. The tighter integration of modeling and simulation capability with analysis and the increasing complexity of application workflows demand more sophisticated machine usage models and new system-level services. Ensemble calculations for uncertainty quantification, large graph analytics, multi-materials and multi-physics applications are just a few examples that are driving the need for these new system software interfaces and mechanisms for managing memory, network, and computational resources. Rather than providing a single unified operating system and runtime system that supports several parallel programming models, Hobbes leveraged lightweight virtualization to provide the flexibility to construct and efficiently execute custom OS/R environments. Hobbes extended previous work on the Kitten lightweight operating system and the Palacios lightweight virtual machine monitor.
HPC resource allocation consists of a pipeline of methods by which distributed-memory work is assigned to distributed-memory resources to complete that work. This pipeline spans both system and application level software. At the system level, it consists of scheduling and allocation. At the application level, broadly speaking, it consists of discretization (meshing), partitioning, and task mapping. Scheduling, given requests for resources and available resources, decides which request will be assigned resources next or when a request will be assigned resources. When a request is granted, allocation decides which specific resources will be assigned to that request. For the application, HPC resource allocation begins with the discretization and partitioning of the work into a distributed-memory model and ends with the task mapping that matches the allocated resources to the partitioned work. Additionally, network architecture and routing have a strong impact on HPC resource allocation. Each of these problems is solved independently and makes assumptions about how the other problems are solved. We have worked in all of these areas and have recently begun work to combine some of them, in particular allocation and task mapping. We have used analysis, simulation, and real system experiments in this work. Techniques specific to any particular application have not been considered in this work.
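The stages of this pipeline can be illustrated with a toy sketch. It is only an illustration under simple assumptions (first-come-first-served scheduling, lowest-numbered-node allocation, block task mapping); the function names and the job dictionaries are hypothetical and do not correspond to any production scheduler or to the methods studied in this work.

```python
from collections import deque


def schedule_fcfs(queue, free_nodes):
    """Scheduling: decide which request is served next (first come, first served)."""
    if queue and queue[0]["nodes"] <= len(free_nodes):
        return queue.popleft()
    return None  # the head of the queue does not fit yet


def allocate(request, free_nodes):
    """Allocation: choose the specific nodes (here, the lowest-numbered free ones)."""
    chosen = sorted(free_nodes)[: request["nodes"]]
    for node in chosen:
        free_nodes.remove(node)
    return chosen


def map_tasks(num_tasks, nodes):
    """Task mapping: place consecutive task ranks on the same node (block mapping)."""
    per_node = -(-num_tasks // len(nodes))  # ceiling division
    return {task: nodes[task // per_node] for task in range(num_tasks)}


free = {0, 1, 2, 3, 4, 5, 6, 7}
queue = deque([
    {"job": "A", "nodes": 2, "tasks": 4},
    {"job": "B", "nodes": 8, "tasks": 8},
])

job = schedule_fcfs(queue, free)          # job A is granted resources
nodes = allocate(job, free)               # which nodes job A receives
mapping = map_tasks(job["tasks"], nodes)  # which task runs on which node
print(job["job"], nodes, mapping)
print(schedule_fcfs(queue, free))         # job B must wait: only 6 nodes are free
```

Each function here makes its decision without knowledge of the others, which mirrors the independence described above; combining allocation and task mapping would mean letting `allocate` see the communication pattern that `map_tasks` is trying to serve.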
The IDEAS Project is intent on improving scientific productivity by qualitatively changing scientific software developer productivity, enabling a fundamentally different attitude to creating and supporting computational science and engineering (CSE) applications.
We are creating an extreme-scale scientific software development ecosystem composed of high-quality, reusable CSE software components and libraries; a collection of best practices, processes and tools; and substantial outreach mechanisms for promoting and disseminating productivity improvements. We intend to improve CSE productivity by enabling better, faster and cheaper CSE application capabilities for extreme-scale computing.
Transforming and decarbonizing the world’s energy systems to make them environmentally sustainable while maintaining high reliability and low cost is a task that requires the very best computational and simulation capabilities to examine a complete range of technology options, ensure that the best choices are made, and to support their rapid and effective implementation.
The Institute for Design of Advanced Energy Systems (IDAES) was originated to bring the most advanced modeling and optimization capabilities to these challenges. The resulting IDAES integrated platform utilizes the most advanced computational algorithms to enable the design and optimization of complex, interacting energy and process systems from individual plant components to the entire electrical grid.
IDAES is led by the National Energy Technology Laboratory (NETL) with participants from Lawrence Berkeley National Laboratory (LBNL), Sandia National Laboratories (SNL), Carnegie Mellon University, West Virginia University, University of Notre Dame, and Georgia Institute of Technology.
The IDAES leadership team is:
- David C. Miller, Technical Director
- Anthony Burgard, NETL PI
- Deb Agarwal, LBNL PI
- John Siirola, SNL PI
Kitten is a current-generation lightweight kernel (LWK) compute node operating system designed for large-scale parallel computing systems. Kitten is the latest in a long line of successful LWKs, including SUNMOS, Puma, Cougar, and Catamount. Kitten distinguishes itself from these prior LWKs by providing a Linux-compatible user environment, a more modern and extendable open-source codebase, and a virtual machine monitor capability via Palacios that allows full-featured guest operating systems to be loaded on-demand.
Modern high performance computing (HPC) nodes have diverse and heterogeneous types of cores and memory. For applications and domain-specific libraries/languages to scale, port, and perform well on these next generation architectures, their on-node algorithms must be re-engineered for thread scalability and performance portability. The Kokkos programming model and its C++ library implementation help HPC applications and domain libraries implement intra-node thread-scalable algorithms that are performance portable across diverse manycore architectures such as multicore CPUs, Intel Xeon Phi, NVIDIA GPUs, and AMD GPUs.
This research, development, and deployment project advances the Kokkos programming model with new intra-node parallel algorithm abstractions, implements these abstractions in the Kokkos library, and supports applications’ and domain libraries’ effective use of Kokkos through consulting and tutorials. The project fosters numerous internal and external collaborations, especially with the ISO/C++ language standard committee to promote Kokkos abstractions into future ISO/C++ language standards. Kokkos is part of the DOE Exascale Computing Project.
Mantevo is a multi-faceted application performance project. It provides application performance proxies known as miniapps. Miniapps combine some or all of the dominant numerical kernels contained in an actual stand-alone application. Miniapps include libraries wrapped in a test driver providing representative inputs. They may also be hard-coded to solve a particular test case so as to simplify the need for parsing input files and mesh descriptions. Miniapps range in scale from partial, performance-coupled components of the application to a simplified representation of a complete execution path through the application.
MR-MPI is an open-source implementation of MapReduce written for distributed-memory parallel machines on top of standard MPI message passing.
MapReduce is the programming paradigm, popularized by Google, which is widely used for processing large data sets in parallel. Its salient feature is that if a task can be formulated as a MapReduce, the user can perform it in parallel without writing any parallel code. Instead the user writes serial functions (maps and reduces) which operate on portions of the data set independently. The data-movement and other necessary parallel operations can be performed in an application-independent fashion, in this case by the MR-MPI library.
The MR-MPI library was developed to solve informatics problems on traditional distributed-memory parallel computers. It includes C++ and C interfaces callable from most high-level languages, and also a Python wrapper and our own OINK scripting wrapper, which can be used to develop and chain MapReduce operations together. MR-MPI and OINK are open-source codes, distributed freely under the terms of the modified Berkeley Software Distribution (BSD) license.
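The division of labor in the MapReduce paradigm can be shown with a small, purely serial sketch. This is not the MR-MPI API; the names `map_words`, `reduce_counts`, and `map_reduce` are hypothetical. The user supplies only the two serial functions, and a generic driver performs the grouping step that a library such as MR-MPI would carry out in parallel across processes.

```python
from collections import defaultdict


def map_words(chunk):
    """User-written serial map: emit (word, 1) for each word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_counts(key, values):
    """User-written serial reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(chunks, map_fn, reduce_fn):
    """Application-independent driver: map each chunk, group by key, then reduce."""
    groups = defaultdict(list)
    for chunk in chunks:  # each chunk could be handled by a different process
        for key, value in map_fn(chunk):
            groups[key].append(value)  # stands in for the parallel data movement
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = map_reduce(["the cat sat", "the dog sat down"], map_words, reduce_counts)
print(counts)
```

Because `map_reduce` never inspects what the map and reduce functions compute, the same driver works for any task that can be phrased this way, which is the property that lets the parallel machinery live entirely inside a library.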
MESQUITE is a linkable software library that applies a variety of node-movement algorithms to improve the quality of, and/or adapt, a given mesh. Mesquite uses advanced smoothing and optimization to:
- Untangle meshes,
- Provide local size control,
- Improve angles, orthogonality, and skew,
- Increase minimum edge-lengths for increased time-steps,
- Improve mesh smoothness,
- Perform anisotropic smoothing,
- Improve surface meshes, adapt to surface curvature,
- Improve hybrid meshes (including pyramids & wedges),
- Smooth meshes with hanging nodes,
- Maintain quality of moving and/or deforming meshes,
- Perform ALE rezoning,
- Improve mesh quality on and near boundaries,
- Improve transitions across internal boundaries,
- Align meshes with vector fields, and
- R-adapt meshes to solutions using error estimates.
Mesquite improves surface or volume meshes which are structured, unstructured, hybrid, or non-conformal. A variety of element types are permitted. Mesquite is designed to be as efficient as possible so that large meshes can be improved.
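As a rough illustration of what a node-movement algorithm does, the sketch below applies plain Laplacian smoothing, the simplest such technique: each free node is moved to the average position of its neighbors while boundary nodes stay fixed. This is a stand-in chosen for brevity, not Mesquite's API or its optimization-based methods, and the function name and data layout are hypothetical.

```python
def laplacian_smooth(coords, neighbors, fixed, iterations=10):
    """Move each free node to the centroid of its neighbors (Jacobi-style sweeps).

    coords:    dict mapping node id to an (x, y) tuple
    neighbors: dict mapping node id to a list of adjacent node ids
    fixed:     set of node ids that must not move (e.g. boundary nodes)
    """
    coords = dict(coords)
    for _ in range(iterations):
        updated = {}
        for node, position in coords.items():
            if node in fixed:
                updated[node] = position
                continue
            nbrs = neighbors[node]
            updated[node] = (
                sum(coords[n][0] for n in nbrs) / len(nbrs),
                sum(coords[n][1] for n in nbrs) / len(nbrs),
            )
        coords = updated
    return coords


# Unit square with four fixed corners and one badly placed interior node.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0), 4: (0.9, 0.8)}
neighbors = {4: [0, 1, 2, 3]}
fixed = {0, 1, 2, 3}

smoothed = laplacian_smooth(coords, neighbors, fixed, iterations=1)
print(smoothed[4])
```

Laplacian smoothing only changes node positions, never connectivity, which is the defining trait of node-movement methods. It can invert elements near concave boundaries, which is one motivation for the optimization-based approaches that a library like Mesquite provides.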
Portals is an interconnect API intended to allow scalable, high-performance network communication between nodes of a large-scale parallel computing system. Portals is based on a building blocks approach that enables multiple upper-level protocols, such as MPI and SHMEM, to be used simultaneously within a process. This approach also encapsulates important semantics that can be offloaded to a network interface controller (NIC) to optimize performance-critical functionality. Previous generations of Portals have been deployed on large-scale production systems, including the Intel ASCI Red machine and Cray’s SeaStar interconnect for their XT product line. The current generation API is being used to enable advanced NIC architecture research for future extreme-scale systems.
Achieving practical exascale supercomputing will require massive increases in energy efficiency. The bulk of this improvement will likely be derived from hardware advances such as improved semiconductor device technologies and tighter integration, hopefully resulting in more energy efficient computer architectures. Still, software will have an important role to play. With every generation of new hardware, more power measurement and control capabilities are exposed. Many of these features require software involvement to maximize feature benefits. This trend will allow algorithm designers to add power and energy efficiency to their optimization criteria. Similarly, at the system level, opportunities now exist for energy-aware scheduling to meet external utility constraints such as time of day cost charging and power ramp rate limitations. Finally, future architectures might not be able to operate all components at full capability for a range of reasons including temperature considerations or power delivery limitations. Software will need to make appropriate choices about how to allocate the available power budget given many, sometimes conflicting considerations. For these reasons, we have developed a portable API for power measurement and control.
Quantum computers have the potential to solve certain problems dramatically faster than is possible with classical computers. We are funded by the DOE Quantum Algorithms Teams program to explore the abilities of quantum computers in three interrelated areas: quantum simulation, optimization, and machine learning. We leverage connections among these areas and unearth deeper ones to fuel new applications of quantum information processing to science and technology.
IO libraries typically focus on writing the entire simulation domain for each output. For many computation classes, this is the correct choice. However, there are some cases where this approach is wasteful in time and space.
The Stitch library was developed initially for use with the SPPARKS kinetic Monte Carlo simulation to handle IO tasks for a welding simulation. This simulation type has a particular feature where there is computational intensity in a small part of the simulation domain with the rest being idle. Given this intensity, only writing the area that changes is far more space efficient than writing the entire simulation domain for each output. Further, the computation can be focused strictly on the area where the data will change rather than the whole domain. This can yield a reduction from 1024 to 16 processes and 1/64th the data written. These combined can lead to a reduction in the computation time with no loss in data quality. If anything, by reducing the amount written each time, more output is possible.
This approach is also applicable for finite element codes that share the same localized physics.
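The idea of writing only the region that changes, and reassembling the full domain later, can be sketched as follows. This is a hypothetical illustration on a small 2D grid of Python lists, not the Stitch library's interface; the grid size and the 4 x 4 active region are chosen so that the written fraction matches the 1/64 figure quoted above.

```python
def changed_region(old, new):
    """Return the bounding box (r0, r1, c0, c1) of cells that differ, or None."""
    changed = [
        (r, c)
        for r, row in enumerate(new)
        for c, value in enumerate(row)
        if old[r][c] != value
    ]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return min(rows), max(rows) + 1, min(cols), max(cols) + 1


def extract_block(grid, region):
    """Copy out only the cells inside the region; this is what would be written."""
    r0, r1, c0, c1 = region
    return [row[c0:c1] for row in grid[r0:r1]]


def stitch(base, region, block):
    """Rebuild the full domain by overlaying a stored block onto a base state."""
    r0, r1, c0, c1 = region
    out = [row[:] for row in base]
    for i, r in enumerate(range(r0, r1)):
        out[r][c0:c1] = block[i]
    return out


n = 32
old = [[0] * n for _ in range(n)]
new = [row[:] for row in old]
for r in range(8, 12):          # a small active area, e.g. around a weld pool
    for c in range(20, 24):
        new[r][c] = 1

region = changed_region(old, new)
block = extract_block(new, region)
fraction_written = (len(block) * len(block[0])) / (n * n)
print(region, fraction_written)
```

Storing the region alongside each block is what makes later reconstruction possible: a reader applies the blocks in time order on top of the initial state to recover the full domain at any output step.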
The code is in the final stages of copyright review and will be released on github.com. A work-in-progress paper was presented at PDSW-DISCS @ SC18. A full CS conference paper is planned for H1 2019, along with a follow-on materials science journal paper.
High performance computing architectures are undergoing a marked transformation. Increasing performance of the largest parallel machines at the same exponential rate will require that applications expose more parallelism at an accelerated pace due to the advent of multi-core processors at relatively flat clock rates. The extreme number of hardware components in these machines along with I/O bottlenecks will necessitate looking beyond the traditional checkpoint/restart mechanism for dealing with machine failures. Additionally, as a result of the high power requirements of these machines, the energy required to obtain a result will become as important as the time to solution. These changes mean that a new approach to the development of extreme-scale hardware and software is needed, relying on the simultaneous exploration of both the hardware and software design space, a process referred to as co-design.
The Structural Simulation Toolkit (SST) enables co-design of extreme-scale architectures by allowing simulation of diverse aspects of hardware and software relevant to such environments. Innovations in instruction set architecture, memory systems, the network interface, and full system network can be explored in the context of design choices for the programming model and algorithms. The package provides two novel capabilities. The first is a fully modular design that enables extensive exploration of an individual system parameter without the need for intrusive changes to the simulator. The second is a parallel simulation environment based on MPI. This provides a high level of performance and the ability to look at large systems. The framework has been successfully used to model concepts ranging from processing in memory to conventional processors connected by conventional network interfaces and running MPI.
The goal of the xSDK is to provide the foundation of an extensible scientific software ecosystem. The first xSDK release (in April 2016) demonstrated the impact of defining draft xSDK community policies to simplify the combined use and portability of independently developed software packages. The xSDK releases have continued to attract more participating libraries. The latest release includes hypre, MAGMA, MFEM, PETSc/TAO, SUNDIALS, SuperLU, and Trilinos. Releases also lay the groundwork for addressing broader issues in software interoperability and performance portability. This work is especially important as emerging extreme-scale architectures provide unprecedented resources for more complex computational science and engineering simulations, yet the current era of disruptive architectural changes requires refactoring and enhancing software packages in order to effectively use these machines for scientific discovery.
Our goal is to make the xSDK a turnkey and standard software ecosystem that is easily installed on common computing platforms, and can be assumed as available on any leadership computing system in the same way that BLAS and LAPACK are available today. The capabilities in the xSDK are essential for the next generation of multiscale and multiphysics applications, where the libraries and components in the xSDK must compile, link, and interoperate from within a single executable.
The Extreme-Scale Scientific Software Stack (E4S) is an open source, open architecture effort to create a US and international collaborative software stack for high performance computing. Led by the US Exascale Computing Project (ECP), E4S provides a conduit for building, testing, integrating and delivering the open source products developed under ECP. E4S further provides easy integration of any other non-ECP products that are available via Spack source code installation. Spack is a meta-build tool widely used to compile a software product and its dependencies.
More information can be found at https://e4s.io
The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.
The Vanguard project is expanding the high-performance computing ecosystem by evaluating and accelerating the development of emerging technologies in order to increase their viability for future large-scale production platforms. The goal of the project is to reduce the risk in deploying unproven technologies by identifying gaps in the hardware and software ecosystem and making focused investments to address them. The approach is to make early investments that identify the essential capabilities needed to move technologies from small-scale testbed to large-scale production use.
The XPRESS Project was one of four major projects of the DOE Office of Science Advanced Scientific Computing Research X-stack Program initiated in September 2012. The purpose of XPRESS was to devise an innovative system software stack to enable practical and useful exascale computing around the end of the decade, with near-term contributions to efficient and scalable operation of trans-Petaflops performance systems in the next two to three years, both for DOE mission-critical applications. To this end, XPRESS directly addressed critical challenges in computing of efficiency, scalability, and programmability through introspective methods of dynamic adaptive resource management and task scheduling.
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
The XVis project concluded in 2017. The final report is available.
The Zoltan project focuses on parallel algorithms for combinatorial scientific computing, including partitioning, load balancing, task placement, graph coloring, matrix ordering, distributed data directories, and unstructured communication plans.
The Zoltan toolkit is an open-source library of MPI-based distributed memory algorithms. It includes geometric and hypergraph partitioners, global graph coloring, distributed data directories using rendezvous algorithms, primitives to simplify data movement and unstructured communication, and interfaces to the ParMETIS, Scotch and PaToH partitioning libraries. It is written in C and can be used as a stand-alone library.
The Zoltan2 toolkit is the next-generation toolkit for multicore architectures. It includes MPI+OpenMP algorithms for geometric partitioning, architecture-aware task placement, and local matrix ordering. It is written in templated C++ and is tightly integrated with the Trilinos toolkit.
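To give a flavor of geometric partitioning, here is a serial sketch of recursive coordinate bisection, a well-known technique in this family. It is not the Zoltan or Zoltan2 interface, and the function name `rcb` is hypothetical. It assumes unit-weight points and at least as many points as parts: the point set is split along its longest coordinate axis so that each side receives a share of points proportional to the number of parts it must produce, and the process recurses.

```python
def rcb(points, num_parts):
    """Partition points into num_parts balanced groups by recursive bisection.

    Assumes len(points) >= num_parts and that every point has unit weight.
    """
    if num_parts == 1:
        return [points]
    dims = range(len(points[0]))
    # Cut perpendicular to the axis along which the points are most spread out.
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = num_parts // 2
    cut = len(ordered) * left_parts // num_parts
    return rcb(ordered[:cut], left_parts) + rcb(ordered[cut:], num_parts - left_parts)


points = [(x, y) for x in range(8) for y in range(4)]  # an 8 x 4 grid of points
parts = rcb(points, 4)
for part in parts:
    print(len(part), sorted({p[0] for p in part}))
```

Geometric methods like this use only coordinates, so they are cheap and need no connectivity information; graph and hypergraph partitioners trade that simplicity for an explicit model of communication between the objects being partitioned.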