Computational Systems & Software Environment

The Computational Systems & Software Environment program builds integrated, high performance computational systems to meet predictive simulation requirements. R&D activities within this program create technologies that address the unique mission needs of Advanced Simulation and Computing (ASC) for scalability, parallelism, performance, and reliability.

Computational Systems & Software Environment activities are distributed across the three nuclear weapons laboratories. Described below are Sandia’s areas of expertise and contribution.

ACES Architecture Office

The Alliance for Computing at Extreme Scale (ACES) is a partnership between Sandia and Los Alamos national laboratories. The objective of the ACES Architecture Office is to define requirements and potential system architectures for platforms that meet future ASC programmatic requirements and drivers.

Next Generation Computing Technologies

ASC plans to motivate and influence next-generation high performance scientific computing system designs using a co-design methodology. This project advances Sandia’s co-design efforts for future-generation technologies. It brings together the forward-looking activities described in the other projects listed here to understand their impact on our applications. Our Mantevo and SST work is of particular importance to this activity.

Advanced Systems Technology

Addressing the critical need to explore a diverse set of architectural alternatives for future systems, this project focuses on experimental architecture test beds. These test beds aim to support path-finding explorations of alternative programming models, architecture-aware algorithms, low-energy runtime and system software, and advanced memory subsystem development.

System Software

Power has become a first-order design constraint for future supercomputers. Accordingly, we are developing data collection techniques that provide new insight into the power requirements of ASC applications. We are also using our long-standing expertise in lightweight kernel (LWK) technology to develop improved system-level mechanisms for correcting memory locality and a highly scalable shared library implementation.

Performance Analysis and Mantevo Mini-applications

Our Mantevo project provides open-source software packages for the analysis, prediction, and improvement of high performance computing applications. Application performance is determined by a combination of choices: hardware platform, runtime environment, languages and compilers, algorithm choice and implementation, and more. Our researchers find that the use of mini-applications (small self-contained proxies for real applications) is an excellent approach for rapidly exploring the parameter space of all these choices.
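To make the idea of a mini-application concrete, the following is a minimal illustrative sketch, not Mantevo code. It assumes a conjugate-gradient solve on a 1D Laplacian as the proxy kernel; the function names (laplacian_matvec, conjugate_gradient, run_proxy) and the problem sizes are hypothetical choices made for this example. The point it shows is the workflow: a small, self-contained kernel that can be rerun quickly while one parameter (here, problem size) is varied.

```python
import time


def laplacian_matvec(x):
    """Apply the 1D Laplacian stencil [-1, 2, -1] with zero boundary values."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(b, tol=1e-10, max_iters=1000):
    """Solve A x = b for the 1D Laplacian A; b must be nonzero.

    Returns the solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for iteration in range(1, max_iters + 1):
        ap = laplacian_matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, iteration
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iters


def run_proxy(n):
    """Time one solve of size n and report simple metrics (hypothetical helper)."""
    b = [1.0] * n
    start = time.perf_counter()
    x, iters = conjugate_gradient(b)
    elapsed = time.perf_counter() - start
    residual = max(abs(bi - yi) for bi, yi in zip(b, laplacian_matvec(x)))
    return {"n": n, "iterations": iters, "seconds": elapsed, "residual": residual}


if __name__ == "__main__":
    # Sweep one parameter of the design space: the problem size.
    for n in (64, 128, 256):
        print(run_proxy(n))
```

In practice the same sweep would be repeated across compilers, runtimes, or hardware platforms, which is feasible because the proxy is small enough to port and rerun quickly.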

Structural Simulation Toolkit (SST)

To design and optimize the next generation of the world’s fastest computers, we developed and released the open-source Structural Simulation Toolkit (SST). This computer simulator models the complex interactions between processors, network, and memory for future supercomputers. When combined with mini-applications, SST facilitates the consideration of a broad space of potential architectural and application/algorithmic designs.

Resilience

The objective of this project is to develop scalable techniques, mechanisms, and enhancements to hardware and system software that increase understanding of system behaviors and application-system interactions. The goal is to reduce the risk of failure at the application level. Four core capabilities are associated with this effort: 1) scalable, real-time failure and system characterization, 2) failure and system modeling, 3) failure detection and prediction, and 4) development of effective response mechanisms.

Input/Output (I/O)

Critical to any high performance computing system is the ability to deliver data to or from the system itself. This project is responsible for storage and networking that can meet the demands of high performance computing systems. Technologies include high-performance parallel file systems, I/O libraries such as TRIOS, interconnect protocols, and high-performance storage systems.

Data Analysis and Visualization

Visualization (or visual representations of information, data, and knowledge) provides researchers important insights into complex physical and engineering processes. Sandia’s capabilities in this area include scalable data analysis software released through ParaView and the Visualization Toolkit (VTK), an in-situ data analysis library for coupling directly with running codes, and ensemble analysis R&D.

ParaView

This free, open-source visualization application is optimized for handling large data sets. It is supported by Sandia, Los Alamos, and Lawrence Livermore national laboratories and a private company, Kitware Inc. Sandia contributes extensively to the functionality and the infrastructure of this project. ParaView is built upon the Visualization Toolkit (VTK) to allow rapid deployment of visualization components. VTK is a popular mainstay in academic visualization courses worldwide.