The Computational
Systems & Software Environment program builds integrated, high-performance
computational systems to meet predictive simulation requirements.
R&D activities within this program create technologies that address Advanced
Simulation and Computing's (ASC)
unique mission needs for scalability, parallelism, performance, and
reliability.
Computational
Systems & Software Environment activities are distributed across the three nuclear weapons laboratories. Described
below are Sandia’s areas of expertise and contribution.
ACES Architecture Office
The Alliance
for Computing at Extreme Scale (ACES) is a partnership between Sandia and Los
Alamos national laboratories. The objective of the ACES Architecture Office is
to define requirements and potential system architectures for platforms that
meet future ASC programmatic requirements and drivers.
Next Generation Computing Technologies
ASC plans to motivate and influence next-generation high-performance scientific computing system designs using a co-design methodology. This project promotes Sandia's co-design efforts for future-generation technologies. It brings together the forward-looking activities described in the other projects here to understand their impact on our applications. Our Mantevo and SST work is particularly important to this activity.
Advanced Systems Technology
Addressing the critical
need to explore a diverse set of architectural alternatives for future systems,
this project focuses on experimental architecture test beds. These test beds aim
to support path-finding explorations of alternative programming models,
architecture-aware algorithms, low-energy runtime and system software, and
advanced memory subsystem development.
System Software
Power has
become a first-order design constraint for future supercomputers. Accordingly, we are developing data collection
techniques that provide new insight into the power requirements
of ASC applications. We are also using our long-standing expertise in
lightweight kernel (LWK) technology to develop improved mechanisms for
system-level memory locality correction techniques and for a highly scalable,
shared library implementation.
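
As an illustration of what collected power data can be used for, the following minimal sketch turns timestamped power samples into an energy estimate using the trapezoidal rule. It is not Sandia's data collection tooling. The function name `energy_joules` and the sample values are hypothetical, and the sketch assumes samples arrive as parallel lists of seconds and watts.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Estimate energy (joules) from sampled power (watts) over time (seconds).

    Uses the trapezoidal rule between consecutive samples.
    Hypothetical helper for illustration only.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    if len(timestamps_s) < 2:
        return 0.0  # a single sample spans no time interval

    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Average power over the interval times the interval length.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


if __name__ == "__main__":
    # Made-up samples: one reading per second for three seconds.
    t = [0.0, 1.0, 2.0, 3.0]
    p = [100.0, 120.0, 120.0, 100.0]
    print(energy_joules(t, p))  # 340.0
```

Dividing the result by the elapsed time gives average power, which is one simple way to compare application runs of different lengths.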
Performance Analysis and Mantevo Mini-applications
Our Mantevo
project provides open-source software packages for the analysis, prediction,
and improvement of high-performance computing applications. Application
performance is determined by a combination of choices: hardware platform,
runtime environment, languages and compilers, algorithm choice and
implementation, and more. Our researchers find that the use of
mini-applications (small self-contained proxies for real applications) is an
excellent approach for rapidly exploring the parameter space of all these choices.
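
The idea of sweeping a parameter space with a small proxy can be sketched as follows. This is a toy example, not a Mantevo mini-application: the three-point stencil kernel, the two implementation variants, and the `sweep` helper are all hypothetical stand-ins chosen to show how problem size and implementation choice can be varied together.

```python
import time
from itertools import product


def stencil_loop(u):
    """3-point averaging stencil written as an explicit index loop."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def stencil_zip(u):
    """The same stencil expressed with zip over shifted slices."""
    if len(u) < 3:
        return u[:]
    interior = [(a + b + c) / 3.0 for a, b, c in zip(u, u[1:], u[2:])]
    return [u[0]] + interior + [u[-1]]


def sweep(sizes, variants, repeats=3):
    """Time every (size, variant) pair and keep the best of `repeats` runs."""
    results = {}
    for n, (name, fn) in product(sizes, variants.items()):
        u = [float(i % 7) for i in range(n)]
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(u)
            best = min(best, time.perf_counter() - t0)
        results[(n, name)] = best
    return results


if __name__ == "__main__":
    variants = {"loop": stencil_loop, "zip": stencil_zip}
    for (n, name), seconds in sorted(sweep([1_000, 10_000], variants).items()):
        print(f"n={n:>6} variant={name:<4} best={seconds:.6f}s")
```

In practice the swept dimensions would also include compilers, runtimes, and hardware platforms; the sketch covers only the two that can be varied from inside a single script.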
To
design and optimize the next generation of the world’s fastest computers, we
developed and released the open-source Structural Simulation Toolkit (SST).
This computer simulator models the complex interactions between processors,
network, and memory for future supercomputers. When combined with mini-applications,
SST facilitates the consideration of a broad space of potential architectural
and application/algorithmic designs.
Resilience
The objective
of this project is to develop scalable techniques, mechanisms, and enhancements
to hardware and system software that increase understanding of system behaviors
and application-system interactions with the goal of reducing the risk of
failure at the application level. Four core capabilities associated with
this effort are: 1) scalable, real-time failure and system characterization, 2)
failure and system modeling, 3) failure detection and prediction, and 4)
development of effective response mechanisms.
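
To give a sense of why failure modeling matters at scale, here is a minimal sketch of a textbook reliability model. It is not this project's model. It assumes node failures are independent and exponentially distributed with a common rate, and the function names and the numbers in the usage example are hypothetical.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, node_count: int) -> float:
    """Mean time between failures for the whole system.

    Assumes independent, exponentially distributed node failures, so the
    system failure rate is the sum of the per-node failure rates.
    """
    if node_mtbf_hours <= 0 or node_count <= 0:
        raise ValueError("node MTBF and node count must be positive")
    return node_mtbf_hours / node_count


def job_survival_probability(node_mtbf_hours: float, node_count: int,
                             job_hours: float) -> float:
    """Probability that a job using every node runs to completion with no failure."""
    if job_hours < 0:
        raise ValueError("job duration must be non-negative")
    return math.exp(-job_hours / system_mtbf_hours(node_mtbf_hours, node_count))


if __name__ == "__main__":
    # Hypothetical machine: 10,000 nodes, each with a 50,000-hour MTBF.
    print(system_mtbf_hours(50_000.0, 10_000))               # 5.0
    print(job_survival_probability(50_000.0, 10_000, 24.0))  # below 0.01
```

Under these assumptions, even very reliable nodes yield a system that fails every few hours, which is why detection, prediction, and response mechanisms become necessary as node counts grow.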
Input/Output (I/O)
Critical to
any high-performance computing system is the ability to move data to and from
the system. This project is responsible for storage and networking that
can meet the demands of such systems.
Technologies include high-performance parallel file systems, I/O libraries such
as TRIOS, interconnect protocols, and high-performance storage systems.
Data Analysis and Visualization
Visualization
(the visual representation of information, data, and knowledge) gives researchers
important insights into complex physical and engineering processes. Sandia’s capabilities
in this area include scalable data analysis software released through ParaView
and the Visualization Toolkit (VTK), an in-situ data analysis
library for coupling directly with running codes, and ensemble analysis
R&D.
ParaView is a free, open-source
visualization application optimized for handling large data sets. It is
supported by Sandia, Los Alamos, and Livermore national laboratories and a private
company, Kitware Inc. Sandia contributes extensively to the functionality and
the infrastructure of this project. ParaView is built upon the Visualization Toolkit (VTK) to
allow rapid deployment of visualization components. VTK is a popular mainstay
in academic visualization courses worldwide.