Publications

Results 1–100 of 157

Accurate data-driven surrogates of dynamical systems for forward propagation of uncertainty

International Journal for Numerical Methods in Engineering

De, Saibal; Jones, Reese E.; Kolla, Hemanth

Stochastic collocation (SC) is a well-known non-intrusive method of constructing surrogate models for uncertainty quantification. In dynamical systems, SC is especially suited for full-field uncertainty propagation that characterizes the distributions of the high-dimensional solution fields of a model with stochastic input parameters. However, due to the highly nonlinear nature of the parameter-to-solution map in even the simplest dynamical systems, the constructed SC surrogates are often inaccurate. This work presents an alternative approach, where we apply the SC approximation over the dynamics of the model, rather than the solution. By combining the data-driven sparse identification of nonlinear dynamics framework with SC, we construct dynamics surrogates and integrate them through time to construct the surrogate solutions. We demonstrate that the SC-over-dynamics framework leads to smaller errors, in both the approximated system trajectories and the model state distributions, than full-field SC applied directly to the solutions. We present numerical evidence of this improvement using three test problems: a chaotic ordinary differential equation, and two partial differential equations from solid mechanics.
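
To make the distinction concrete, the following minimal Python sketch applies the collocation approximation to the right-hand side of a toy one-parameter linear ODE and then integrates the resulting dynamics surrogate; the toy problem, node count, and interpolation scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.integrate import solve_ivp

def rhs(u, k):
    """True dynamics du/dt = f(u; k) for a toy linear ODE."""
    return -k * u

# Collocation nodes for an uncertain rate k ~ U(0.5, 1.5).
xi, _ = leggauss(5)                  # Gauss-Legendre nodes on [-1, 1]
k_nodes = 1.0 + 0.5 * xi             # mapped to [0.5, 1.5]

def lagrange_weights(k):
    """Lagrange basis values at k for interpolation over the nodes."""
    return np.array([
        np.prod([(k - kj) / (ki - kj) for kj in k_nodes if kj != ki])
        for ki in k_nodes
    ])

def surrogate_rhs(t, u, k):
    """SC over dynamics: interpolate the dynamics f(u; .) across k."""
    w = lagrange_weights(k)
    f_nodes = np.array([rhs(u, ki) for ki in k_nodes])   # (nodes, dim)
    return (w[:, None] * f_nodes).sum(axis=0)

# Integrate the dynamics surrogate at an out-of-sample parameter value.
k_star = 0.83
sol = solve_ivp(surrogate_rhs, (0.0, 2.0), [1.0], args=(k_star,))
print(sol.y[0, -1], np.exp(-2.0 * k_star))   # surrogate vs. exact solution
```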

Uncertainty Quantification and Sensitivity Analysis of Low-Dimensional Manifold via Co-Kurtosis PCA in Combustion Modeling

Balakrishnan, Uma; Kolla, Hemanth

For multi-scale multi-physics applications, e.g., the turbulent combustion code Pele, robust and accurate dimensionality reduction is crucial to solving problems at exascale and beyond. A recently developed technique, Co-Kurtosis based Principal Component Analysis (CoK-PCA), which leverages principal vectors of co-kurtosis, is a promising alternative to traditional PCA for complex chemical systems. To improve the effectiveness of this approach, we employ Artificial Neural Networks for reconstructing thermo-chemical scalars, species production rates, and overall heat release rates corresponding to the full state space. Our focus is on bolstering confidence in this deep learning based non-linear reconstruction through Uncertainty Quantification (UQ) and Sensitivity Analysis (SA). UQ involves quantifying uncertainties in inputs and outputs, while SA identifies influential inputs. One of the noteworthy challenges is the computational expense inherent in both endeavors. To address this, we employ Monte Carlo methods to effectively quantify and propagate uncertainties in our reduced spaces while managing computational demands. Our research carries profound implications not only for the realm of combustion modeling but also for a broader audience in UQ. By showcasing the reliability and robustness of CoK-PCA in dimensionality reduction and deep learning predictions, we empower researchers and decision-makers to navigate complex combustion systems with greater confidence.

A co-kurtosis PCA based dimensionality reduction with nonlinear reconstruction using neural networks

Combustion and Flame

Nayak, Dibyajyoti; Jonnalagadda, Anirudh; Balakrishnan, Uma; Kolla, Hemanth; Aditya, Konduri

For turbulent reacting flow systems, identification of low-dimensional representations of the thermo-chemical state space is vitally important, primarily to significantly reduce the computational cost of device-scale simulations. Principal component analysis (PCA), and its variants, are a widely employed class of methods. Recently, an alternative technique that focuses on higher-order statistical interactions, co-kurtosis PCA (CoK-PCA), has been shown to effectively provide a low-dimensional representation by capturing the stiff chemical dynamics associated with spatiotemporally localized reaction zones. While its effectiveness has only been demonstrated based on a priori analyses with linear reconstruction, in this work, we employ nonlinear techniques to reconstruct the full thermo-chemical state and evaluate the efficacy of CoK-PCA compared to PCA. Specifically, we combine a CoK-PCA-/PCA-based dimensionality reduction (encoding) with an artificial neural network (ANN) based reconstruction (decoding) and examine, a priori, the reconstruction errors of the thermo-chemical state. In addition, we evaluate the errors in species production rates and heat release rates, which are nonlinear functions of the reconstructed state, as a measure of the overall accuracy of the dimensionality reduction technique. We employ four datasets to assess CoK-PCA/PCA coupled with ANN-based reconstruction: a zero-dimensional (homogeneous) reactor for autoignition of an ethylene/air mixture that has conventional single-stage ignition kinetics, a dimethyl ether (DME)/air mixture that has two-stage (low and high temperature) ignition kinetics, a one-dimensional freely propagating premixed ethylene/air laminar flame, and a two-dimensional dataset representing turbulent autoignition of ethanol in a homogeneous charge compression ignition (HCCI) engine. Results from the analyses demonstrate the robustness of the CoK-PCA based low-dimensional manifold with ANN reconstruction in accurately capturing the data, specifically from the reaction zones.
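
A schematic of the encode/decode pipeline on synthetic data is sketched below; the data, manifold dimension, and network size are hypothetical stand-ins for the thermo-chemical datasets used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 6))    # stand-in for thermo-chemical state data
Xc = (X - X.mean(0)) / X.std(0)       # center and scale
n, d = Xc.shape

# Co-kurtosis (fourth-order joint moment) tensor, matricized to d x d^3;
# its left singular vectors are the principal co-kurtosis vectors.
K = np.einsum('ti,tj,tk,tl->ijkl', Xc, Xc, Xc, Xc) / n
U, _, _ = np.linalg.svd(K.reshape(d, d**3), full_matrices=False)

q = 2                                 # retained manifold dimension
Z = Xc @ U[:, :q]                     # encode: low-dimensional scores

# Nonlinear decode: ANN mapping the scores back to the full state,
# evaluated a priori by the reconstruction error.
decoder = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=0).fit(Z, Xc)
err = np.mean((decoder.predict(Z) - Xc) ** 2)
print(f"a priori reconstruction MSE: {err:.4f}")
```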

Comprehensive uncertainty quantification (UQ) for full engineering models by solving probability density function (PDF) equation

Kolla, Hemanth; De, Saibal; Jones, Reese E.; Hansen, Michael A.; Plews, Julia A.

This report details a new method for propagating parameter uncertainty (forward uncertainty quantification) in partial differential equation (PDE) based computational mechanics applications. The method provides full-field quantities of interest by solving the joint probability density function (PDF) equations implied by the PDEs with uncertain parameters. Full-field uncertainty quantification enables the design of complex systems where quantities of interest, such as failure points, are not known a priori. The method, motivated by the well-known PDF propagation method of turbulence modeling, uses an ensemble of solutions to provide the joint PDF of desired quantities at every point in the domain. A small subset of the ensemble is computed exactly, and the remainder of the samples are computed with approximation of the driving (dynamics) term of the PDEs based on those exact solutions. Although the proposed method has commonalities with traditional interpolatory stochastic collocation methods applied directly to quantities of interest, it is distinct and exploits the parameter dependence and smoothness of the dynamics term of the governing PDEs. The efficacy of the method is demonstrated by applying it to two target problems: solid mechanics explicit dynamics with uncertain material model parameters, and reacting hypersonic fluid mechanics with uncertain chemical kinetic rate parameters. A minimally invasive implementation of the method for representative codes SPARC (reacting hypersonics) and NimbleSM (finite-element solid mechanics) and associated software details are described. For the solid mechanics demonstration problems, the method shows orders-of-magnitude improvement in accuracy over traditional stochastic collocation. For the reacting hypersonics problem, the method is implemented as a streamline integration, and results show very good accuracy for the approximate sample solutions of re-entry flow past the Apollo capsule geometry at Mach 30.

Parallel memory-efficient computation of symmetric higher-order joint moment tensors

Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022

Li, Zitong; Kolla, Hemanth; Phipps, Eric T.

The decomposition of higher-order joint cumulant tensors of spatio-temporal data sets is useful in analyzing multi-variate non-Gaussian statistics with a wide variety of applications (e.g. anomaly detection, independent component analysis, dimensionality reduction). Computing the cumulant tensor often requires computing the joint moment tensor of the input data first, which is very expensive using a naïve algorithm. The current state-of-the-art algorithm takes advantage of the symmetric nature of a moment tensor by dividing it into smaller cubic tensor blocks and only computing the blocks with unique values and thus reducing computation. We propose a refactoring of this algorithm by posing its computation as matrix operations, specifically Khatri-Rao products and standard matrix multiplications. An analysis of the computational and cache complexity indicates significant performance savings due to the refactoring. Implementations of our refactored algorithm in Julia show speedups up to 10x over the reference algorithm in single processor experiments. We describe multiple levels of hierarchical parallelism inherent in the refactored algorithm, and present an implementation using an advanced programming model that shows similar speedups in experiments run on a GPU.
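
The heart of the refactoring can be shown in a few lines for a third-order moment tensor; this Python/NumPy sketch (dimensions hypothetical, and without the symmetry blocking, higher orders, or GPU parallelism treated in the paper) contrasts the naive computation with the Khatri-Rao formulation of the tensor unfolding.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10000, 8
X = rng.standard_normal((n, d))

# Naive third-order joint moment: M[i,j,k] = mean_t X[t,i] X[t,j] X[t,k].
M_naive = np.einsum('ti,tj,tk->ijk', X, X, X) / n

# Refactored: the mode-1 unfolding M_(1) = X^T (X kr X) / n is a standard
# matrix product with the row-wise Khatri-Rao product of X with itself.
KR = np.einsum('tj,tk->tjk', X, X).reshape(n, d * d)
M_kr = (X.T @ KR / n).reshape(d, d, d)

print(np.allclose(M_naive, M_kr))    # True: same tensor, BLAS-friendly form
```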

A minimally invasive, efficient method for propagation of full-field uncertainty in solid dynamics

International Journal for Numerical Methods in Engineering

Jones, Reese E.; Redle, Michael T.; Kolla, Hemanth; Plews, Julia A.

We present a minimally invasive method for forward propagation of material property uncertainty to full-field quantities of interest in solid dynamics. Full-field uncertainty quantification enables the design of complex systems where quantities of interest, such as failure points, are not known a priori. The method, motivated by the well-known probability density function (PDF) propagation method of turbulence modeling, uses an ensemble of solutions to provide the joint PDF of desired quantities at every point in the domain. A small subset of the ensemble is computed exactly, and the remainder of the samples are computed with approximation of the evolution equations based on those exact solutions. Although the proposed method has commonalities with traditional interpolatory stochastic collocation methods applied directly to quantities of interest, it is distinct and exploits the parameter dependence and smoothness of the driving term of the evolution equations. The implementation is model independent, storage and communication efficient, and straightforward. We demonstrate its efficiency, accuracy, scaling with dimension of the parameter space, and convergence in distribution with two problems: a quasi-one-dimensional bar impact, and a two-material notched plate impact. For the bar impact problem, we provide an analytical solution for the PDF of the solution fields for method validation. With the notched plate problem, we also demonstrate good parallel efficiency and scaling of the method.

A priori analysis of a power-law mixing model for transported PDF model based on high Karlovitz turbulent premixed DNS flames

Proceedings of the Combustion Institute

Zhang, Pei; Xie, Tianfang; Kolla, Hemanth; Wang, Haiou; Hawkes, Evatt R.; Chen, Jacqueline H.; Wang, Haifeng

Accurate modeling of mixing in large-eddy simulation (LES)/transported probability density function (PDF) modeling of turbulent combustion remains an outstanding issue. The issue is particularly salient in turbulent premixed combustion under extreme conditions such as high-Karlovitz number Ka. The present study addresses this issue by conducting an a priori analysis of a power-law scaling based mixing timescale model for the transported PDF model. A recently produced DNS dataset of a high-Ka turbulent jet flame is used for the analysis. A power-law scaling is observed for a scaling factor used to model the sub-filter scale mixing timescale in this high-Ka turbulent premixed DNS flame when the LES filter size is much greater than the characteristic thermal thickness of a laminar premixed flame. The sensitivity of the observed power-law scaling to the different viewpoints (local or global) and to the different scalars for the data analysis is examined and the dependence of the model parameters on the dimensionless numbers Ka and Re (the Reynolds number) is investigated. Different model formulations for the mixing timescale are then constructed and assessed in the DNS flame. The proposed model is found to be able to reproduce the mixing timescale informed by the high-Ka DNS flame significantly better than a previous model.

Improving Scalability of Silent-Error Resilience for Message-Passing Solvers via Local Recovery and Asynchrony

Proceedings of FTXS 2020: Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Kolla, Hemanth; Mayo, Jackson R.; Teranishi, Keita; Armstrong, Robert C.

Benefits of local recovery (restarting only a failed process or task) have been previously demonstrated in parallel solvers. Local recovery has a reduced impact on application performance due to masking of failure delays (for message-passing codes) or dynamic load balancing (for asynchronous many-task codes). In this paper, we implement MPI-process-local checkpointing and recovery of data (as an extension of the Fenix library) in combination with an existing method for local detection of silent errors in partial-differential-equation solvers, to show a path for incorporating lightweight silent-error resilience. In addition, we demonstrate how asynchrony introduced by maximizing computation-communication overlap can halt the propagation of delays. For a prototype stencil solver (including an iterative-solver-like variant) with injected memory bit flips, results show greatly reduced overhead under weak scaling compared to global recovery, and high failure-masking efficiency. The approach is expected to be generalizable to other MPI-based solvers.
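
The detection-plus-local-recovery loop can be illustrated with a serial toy; this sketch is a hypothetical stand-in (the paper's implementation is MPI-based and extends the Fenix library) that uses the discrete maximum principle of an explicit heat-equation stencil as the local silent-error detector.

```python
import numpy as np

def step(u, nu=0.25):
    """One explicit heat-equation update on a 1-D array (fixed ends)."""
    un = u.copy()
    un[1:-1] = u[1:-1] + nu * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return un

u = np.sin(np.linspace(0.0, np.pi, 64))
for it in range(200):
    checkpoint = u.copy()      # process-local, in-memory checkpoint
    u = step(u)
    if it == 77:               # inject a "silent" memory corruption
        u[20] += 1e3
    # Local detector: the explicit scheme obeys a discrete maximum
    # principle, so any value outside the old bounds flags a silent error.
    if u.max() > checkpoint.max() + 1e-12 or u.min() < checkpoint.min() - 1e-12:
        u = step(checkpoint)   # local recovery: roll back and recompute
print("final max:", u.max())
```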

CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows

ACM Transactions on Parallel Computing

Duan, Shaohua; Subedi, Pradeep; Davis, Philip; Teranishi, Keita; Kolla, Hemanth; Gamell, Marc; Parashar, Manish

The dramatic increase in the scale of current and planned high-end HPC systems is leading to new challenges, such as the growing costs of data movement and IO, and the reduced mean time between failures (MTBF) of system components. In-situ workflows, i.e., executing entire application workflows on the HPC system, have emerged as an attractive approach to address data-related challenges by moving computations closer to the data, and staging-based frameworks have been effectively used to support in-situ workflows at scale. However, the resilience of these staging-based solutions has not been addressed, and they remain susceptible to expensive data failures. Furthermore, naive use of data resilience techniques such as n-way replication and erasure codes can impact latency and/or result in significant storage overheads. In this article, we present CoREC, a scalable and resilient in-memory data staging runtime for large-scale in-situ workflows. CoREC uses a novel hybrid approach that combines dynamic replication with erasure coding based on data access patterns. It also leverages multiple levels of replication and erasure coding to support diverse data resiliency requirements. Furthermore, the article presents optimizations for load balancing and conflict-avoiding encoding, and a low-overhead, lazy data recovery scheme. We have implemented the CoREC runtime and deployed it with the DataSpaces staging service on leadership-class computing machines, and we present an experimental evaluation in this article. The experiments demonstrate that CoREC can tolerate in-memory data failures while maintaining low latency and sustaining high overall storage efficiency at large scales.

Anomaly detection in scientific data using joint statistical moments

Journal of Computational Physics

Aditya, Konduri; Kolla, Hemanth; Kegelmeyer, William P.; Shead, Timothy M.; Ling, Julia; Davis, Warren L.

We propose an anomaly detection method for multi-variate scientific data based on analysis of high-order joint moments. Using kurtosis as a reliable measure of outliers, we suggest that principal kurtosis vectors, by analogy to principal component analysis (PCA) vectors, signify the principal directions along which outliers appear. The inception of an anomaly, then, manifests as a change in the principal values and vectors of kurtosis. Obtaining the principal kurtosis vectors requires decomposing a fourth order joint cumulant tensor for which we use a simple, computationally less expensive approach that involves performing a singular value decomposition (SVD) over the matricized tensor. We demonstrate the efficacy of this approach on synthetic data, and develop an algorithm to identify the occurrence of a spatial and/or temporal anomalous event in scientific phenomena. The algorithm decomposes the data into several spatial sub-domains and time steps to identify regions with such events. Feature moment metrics, based on the alignments of the principal kurtosis vectors, are computed at each sub-domain and time step for all features to quantify their relative importance towards the overall kurtosis in the data. Accordingly, spatial and temporal anomaly metrics for each sub-domain are proposed using the Hellinger distance of the feature moment metric distribution from a suitable nominal distribution. We apply the algorithm to two turbulent auto-ignition combustion cases and demonstrate that the anomaly metrics reliably capture the occurrence of auto-ignition in relevant spatial sub-domains at the right time steps.
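
The following Python sketch illustrates the core computation on synthetic data: principal kurtosis values and vectors obtained from an SVD of the matricized fourth-order joint moment tensor, with a change in the leading value and direction signaling an anomaly (the data and injected outliers here are hypothetical).

```python
import numpy as np

def principal_kurtosis(X):
    """Principal kurtosis values/vectors via SVD of the matricized
    fourth-order joint moment tensor of standardized data."""
    Xc = (X - X.mean(0)) / X.std(0)
    n, d = Xc.shape
    K = np.einsum('ti,tj,tk,tl->ijkl', Xc, Xc, Xc, Xc) / n
    U, s, _ = np.linalg.svd(K.reshape(d, d**3), full_matrices=False)
    return s, U

rng = np.random.default_rng(3)
base = rng.standard_normal((5000, 4))
anom = base.copy()
anom[::50, 0] += 8.0                  # sparse outliers along feature 0

s0, U0 = principal_kurtosis(base)
s1, U1 = principal_kurtosis(anom)
align = abs(U0[:, 0] @ U1[:, 0])      # alignment of leading directions
print(f"leading kurtosis {s0[0]:.2f} -> {s1[0]:.2f}, alignment {align:.3f}")
```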

Enabling Resilience in Asynchronous Many-Task Programming Models

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Paul, Sri R.; Hayashi, Akihiro; Slattengren, Nicole L.; Kolla, Hemanth; Whitlock, Matthew J.; Bak, Seonmyeong; Teranishi, Keita; Mayo, Jackson R.; Sarkar, Vivek

Resilience is an imminent issue for next-generation platforms due to projected increases in soft/transient failures as part of the inherent trade-offs among performance, energy, and costs in system design. In this paper, we introduce a comprehensive approach to enabling application-level resilience in Asynchronous Many-Task (AMT) programming models with a focus on remedying Silent Data Corruption (SDC) that can often go undetected by the hardware and OS. Our approach makes it possible for the application programmer to declaratively express resilience attributes with minimal code changes, and to delegate the complexity of efficiently supporting resilience to our runtime system. We have created a prototype implementation of our approach as an extension to the Habanero C/C++ library (HClib), where different resilience techniques including task replay, task replication, algorithm-based fault tolerance (ABFT), and checkpointing are available. Our experimental results show that task replay incurs lower overhead than task replication when an appropriate error checking function is provided. Further, task replay matches the low overhead of ABFT. Our results also demonstrate the ability to combine different resilience schemes. To evaluate the effectiveness of our resilience mechanisms in the presence of errors, we injected synthetic errors at different error rates (1.0% and 10.0%) and found a modest increase in execution times. In summary, the results show that our approach supports efficient and scalable recovery, and that it can be used to influence the design of future AMT programming models and runtime systems that aim to integrate first-class support for user-level resilience.
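
As a minimal illustration of the task-replay scheme (in Python rather than HClib, with a hypothetical wrapper API), a task is re-executed until a user-provided error check passes:

```python
import random

def resilient(check, replay=2):
    """Task-replay wrapper: rerun a task whose user-provided error check
    fails, up to `replay` extra attempts (hypothetical API sketch)."""
    def wrap(task):
        def run(*args):
            for _ in range(replay + 1):
                result = task(*args)
                if check(result):
                    return result
            raise RuntimeError("task failed all replays")
        return run
    return wrap

# A task whose result is occasionally corrupted (simulated SDC).
@resilient(check=lambda r: r >= 0.0)
def dot(a, b):
    r = sum(x * y for x, y in zip(a, b))
    return r - 100.0 if random.random() < 0.01 else r

print(dot([1.0, 2.0], [3.0, 4.0]))   # replays transparently on corruption
```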

ASC CSSE Level 2 Milestone #6362: Resilient Asynchronous Many Task Programming Model

Teranishi, Keita; Kolla, Hemanth; Slattengren, Nicole L.; Whitlock, Matthew J.; Mayo, Jackson R.; Clay, Robert L.; Paul, Sri R.; Hayashi, Akihiro; Sarkar, Vivek

This report is an outcome of the ASC CSSE Level 2 Milestone 6362: Analysis of Resilient Asynchronous Many-Task (AMT) Programming Model. It comprises a summary and in-depth analysis of resilience schemes adapted to the AMT programming model. Herein, performance trade-offs of a resilient-AMT programming model are assessed through two approaches: (1) an analytical model realized by discrete event simulations and (2) empirical evaluation of benchmark programs representing regular and irregular workloads of explicit partial differential equation solvers. As part of this effort, an AMT execution simulator and a prototype resilient-AMT programming framework have been developed. The former permits us to hypothesize the performance behavior of a resilient-AMT model, and has undergone a verification and validation (V&V) process. The latter allows empirical evaluation of the performance of resilience schemes under emulated program failures and enabled the aforementioned V&V process. The outcome indicates that (1) resilience techniques implemented within an AMT framework allow efficient and scalable recovery under frequent failures, that (2) the abstraction of task and data instances in the AMT programming model enables readily usable Application Program Interfaces (APIs) for resilience, and that (3) this abstraction enables predicting the performance of resilient-AMT applications with a simple simulation infrastructure. This outcome will provide guidance for the design of the AMT programming model and runtime systems, user-level resilience support, and application development for ASC's next generation platforms (NGPs).

Embedding Python for In-Situ Analysis

Dunlavy, Daniel M.; Shead, Timothy M.; Aditya, Konduri; Kolla, Hemanth; Kegelmeyer, William P.; Davis, Warren L.

We describe our work to embed a Python interpreter in S3D, a highly scalable parallel direct numerical simulation reacting flow solver written in Fortran. Although S3D had no in-situ capability when we began, embedding the interpreter was surprisingly easy, and the result is an extremely flexible platform for conducting machine-learning experiments in-situ.

SNL ATDM Visualization (ECP ST Capability Assessment Report)

Moreland, Kenneth D.; Kolla, Hemanth

The SNL ATDM Data and Visualization work consolidates existing ATDM activities in scalable data management and visualization. Part of the responsibilities of the SNL ATDM Data and Visualization Project is the maintenance and development of visualization resources for ATDM applications on Exascale platforms. The ATDM Scalable Visualization project provides visualization and analysis required to satisfy the needs of the ASC/ATDM applications on next-generation, many-core platforms. This involves many activities including the re-engineering of visualization algorithms, services, and tools that enable ASC customers to carry out data analysis on ASC systems and ACES platforms. Current tools include scalable data analysis software released open source through ParaView, VTK, and Catalyst. We are also both leveraging and contributing to VTK-m, a many-core visualization library, to satisfy our visualization needs on advanced architectures. The scope of the Scalable Visualization under ATDM at SNL is R&D for the programming model and implementation of visualization code for ASC/ATDM projects and ASC/ATDM application support.

Turbulent Combustion Simulations with High-Performance Computing

Energy, Environment, and Sustainability

Kolla, Hemanth; Chen, Jacqueline H.

Considering that simulations of turbulent combustion are computationally expensive, this chapter takes a decidedly different perspective, that of high-performance computing (HPC). The cost scaling arguments of non-reacting turbulence simulations are revisited and it is shown that the cost scaling for reacting flows is much more stringent for comparable conditions, making parallel computing and HPC indispensable. Hardware abstractions of typical parallel supercomputers are presented which show that for design of an efficient and optimal program, it is essential to exploit both distributed memory parallelism and shared-memory parallelism, i.e. hierarchical parallelism. Principles of efficient programming at various levels of parallelism are illustrated using archetypal code examples. The vast array of numerical methods, particularly schemes for spatial and temporal discretization, are examined in terms of tradeoffs they present from an HPC perspective. Aspects of data analytics that invariably result from large feature-rich data sets generated by combustion simulations are covered briefly.

Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

SIAM Journal on Scientific Computing

Gamell, Marc; Teranishi, Keita; Kolla, Hemanth; Mayo, Jackson R.; Heroux, Michael A.; Chen, Jacqueline H.; Parashar, Manish

In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.

Modeling and simulating multiple failure masking enabled by local recovery for stencil-based applications at extreme scales

IEEE Transactions on Parallel and Distributed Systems

Gamell, Marc; Teranishi, Keita; Mayo, Jackson R.; Kolla, Hemanth; Heroux, Michael A.; Chen, Jacqueline H.; Parashar, Manish

Obtaining multi-process hard failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. If online recovery is performed in a local manner, further scalability is enabled, not only due to the intrinsic lower costs of recovering locally, but also due to derived effects for some application types. In this paper we model one such effect, namely multiple failure masking, which manifests when running stencil parallel computations in an environment where failures are recovered locally. First, the delay propagation shape of one or multiple locally recovered failures is modeled to enable several analyses of the probability of different levels of failure masking under certain stencil application behaviors. Our results indicate that failure masking is an extremely desirable effect at scale, whose manifestation becomes more evident and beneficial as the machine size or the failure rate increases.

ASC ATDM Level 2 Milestone #6015: Asynchronous Many-Task Software Stack Demonstration

Bennett, Janine C.; Bettencourt, Matthew T.; Clay, Robert L.; Edwards, Harold C.; Glass, Micheal W.; Hollman, David S.; Kolla, Hemanth; Lifflander, Jonathan J.; Littlewood, David J.; Markosyan, Aram; Moore, Stan G.; Olivier, Stephen L.; Phipps, Eric T.; Rizzi, Francesco; Slattengren, Nicole L.; Sunderland, Daniel; Wilke, Jeremiah

This report is an outcome of the ASC ATDM Level 2 Milestone 6015: Asynchronous Many-Task Software Stack Demonstration. It comprises a summary and in-depth analysis of DARMA and a DARMA-compliant Asynchronous Many-Task (AMT) runtime software stack. Herein, performance and productivity of the overall approach are assessed on benchmarks and proxy applications representative of the Sandia ATDM applications. As part of the effort to assess the perceived strengths and weaknesses of AMT models compared to more traditional methods, experiments were performed on ATS-1 (Advanced Technology Systems) test bed machines and Trinity. In addition to productivity and performance assessments, this report includes findings on the generality of DARMA's backend API as well as findings on interoperability with node-level and network-level system libraries. Together, this information provides a clear understanding of the strengths and limitations of the DARMA approach in the context of Sandia's ATDM codes, to guide our future research and development in this area.

Scalability of Several Asynchronous Many-Task Models for In Situ Statistical Analysis

Pebay, Philippe P.; Bennett, Janine C.; Kolla, Hemanth; Borghesi, Giulio

This report is a sequel to [PB16], in which we provided a first progress report on research and development towards a scalable, asynchronous many-task, in situ statistical analysis engine using the Legion runtime system. This earlier work included a prototype implementation of a proposed solution, using a proxy mini-application as a surrogate for a full-scale scientific simulation code. The first scalability studies were conducted with the above on modestly-sized experimental clusters. In contrast, in the current work we have integrated our in situ analysis engines with a full-size scientific application (S3D, using the Legion-SPMD model), and have conducted numerical tests on the largest computational platform currently available for DOE science applications. We also provide details regarding the design and development of a lightweight asynchronous collectives library. We describe how this library is utilized within our SPMD-Legion S3D workflow, and compare the data aggregation technique deployed herein to the approach taken within our previous work.

A Unified Data-Driven Approach for Programming In Situ Analysis and Visualization: An Interim Report of Sandia Sub-Team Contributions

Bennett, Janine C.; Pebay, Philippe P.; Kolla, Hemanth; Borghesi, Giulio

As we look ahead to next-generation high performance computing platforms, the placement and movement of data is becoming the key limiting factor on both performance and energy efficiency. Furthermore, the increased quantities of data the systems are capable of generating, in conjunction with the insufficient rate of improvements in the supporting I/O infrastructure, are forcing applications away from the off-line post-processing of data towards techniques based on in situ analysis and visualization. Together, these challenges are shaping how we will both design and develop effective, performant and energy-efficient software. In particular, the challenges highlight the need for data and data-centric operations to be fundamental in the reasoning about, and optimization of, scientific workflows on extreme-scale architectures.

Flame thickness and conditional scalar dissipation rate in a premixed temporal turbulent reacting jet

Combustion and Flame

Chaudhuri, Swetaprovo; Kolla, Hemanth; Dave, Himanshu L.; Hawkes, Evatt R.; Chen, Jacqueline H.; Law, Chung K.

The flame structure corresponding to lean hydrogen–air premixed flames in intense sheared turbulence in the thin reaction zone regime is quantified from flame thickness and conditional scalar dissipation rate statistics, obtained from recent direct numerical simulation data of premixed temporally-evolving turbulent slot jet flames [1]. It is found that, on average, these sheared turbulent flames are thinner than their corresponding planar laminar flames. Extensive analysis is performed to identify the reason for this counter-intuitive thinning effect. The factors controlling the flame thickness are analyzed through two different routes i.e., the kinematic route, and the transport and chemical kinetics route. The kinematic route is examined by comparing the statistics of the normal strain rate due to fluid motion with the statistics of the normal strain rate due to varying flame displacement speed or self-propagation. It is found that while the fluid normal straining is positive and tends to separate iso-scalar surfaces, the dominating normal strain rate due to self-propagation is negative and tends to bring the iso-scalar surfaces closer resulting in overall thinning of the flame. The transport and chemical kinetics route is examined by studying the non-unity Lewis number effect on the premixed flames. The effects from the kinematic route are found to couple with the transport and chemical kinetics route. In addition, the intermittency of the conditional scalar dissipation rate is also examined. It is found to exhibit a unique non-monotonicity of the exponent of the stretched exponential function, conventionally used to describe probability density function tails of such variables. The non-monotonicity is attributed to the detailed chemical structure of hydrogen-air flames in which heat release occurs close to the unburnt reactants at near free-stream temperatures.

A mixing timescale model for TPDF simulations of turbulent premixed flames

Combustion and Flame

Kuron, Michael; Ren, Zhuyin; Hawkes, Evatt R.; Zhou, Hua; Kolla, Hemanth; Chen, Jacqueline H.; Lu, Tianfeng

Transported probability density function (TPDF) methods are an attractive modeling approach for turbulent flames as chemical reactions appear in closed form. However, molecular micro-mixing needs to be modeled and this modeling is considered a primary challenge for TPDF methods. In the present study, a new algebraic mixing rate model for TPDF simulations of turbulent premixed flames is proposed, which is a key ingredient in commonly used molecular mixing models. The new model aims to properly account for the transition in reactive scalar mixing rate behavior from the limit of turbulence-dominated mixing to molecular mixing behavior in flamelets. An a priori assessment of the new model is performed using direct numerical simulation (DNS) data of a lean premixed hydrogen–air jet flame. The new model accurately captures the mixing timescale behavior in the DNS and is found to be a significant improvement over the commonly used constant mechanical-to-scalar mixing timescale ratio model. An a posteriori TPDF study is then performed using the same DNS data as a numerical test bed. The DNS provides the initial conditions and time-varying input quantities, including the mean velocity, turbulent diffusion coefficient, and modeled scalar mixing rate for the TPDF simulations, thus allowing an exclusive focus on the mixing model. The new mixing timescale model is compared with the constant mechanical-to-scalar mixing timescale ratio coupled with the Euclidean Minimum Spanning Tree (EMST) mixing model, as well as a laminar flamelet closure by Pope and Anand (1984). It is found that the laminar flamelet closure is unable to properly capture the mixing behavior in the thin reaction zones regime while the constant mechanical-to-scalar mixing timescale model under-predicts the flame speed. The EMST model coupled with the new mixing timescale model provides the best prediction of the flame structure and flame propagation among the models tested, as the dynamics of reactive scalar mixing across different flame regimes are appropriately accounted for.

Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights

Computational Statistics

Pebay, Philippe P.; Terriberry, Timothy B.; Kolla, Hemanth; Bennett, Janine C.

Formulas for incremental or parallel computation of second order central moments have long been known, and recent extensions of these formulas to univariate and multivariate moments of arbitrary order have been developed. Such formulas are of key importance in scenarios where incremental results are required and in parallel and distributed systems where communication costs are high. We survey these recent results, and improve them with arbitrary-order, numerically stable one-pass formulas which we further extend with weighted and compound variants. We also develop a generalized correction factor for standard two-pass algorithms that enables the maintenance of accuracy over nearly the full representable range of the input, avoiding the need for extended-precision arithmetic. We then empirically examine algorithm correctness for pairwise update formulas up to order four as well as condition number and relative error bounds for eight different central moment formulas, each up to degree six, to address the trade-offs between numerical accuracy and speed of the various algorithms. Finally, we demonstrate the use of the most elaborate among the above mentioned formulas, with the utilization of the compound moments for a practical large-scale scientific application.
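
For reference, the second- and third-order instances of the pairwise update formulas surveyed here are short enough to state in code; this Python sketch merges the summary statistics of two partitions and checks the result against a direct two-pass computation.

```python
import numpy as np

def merge(a, b):
    """Pairwise (parallel/online) update of count, mean, and the 2nd and
    3rd central moment sums for two data partitions."""
    na, ma, M2a, M3a = a
    nb, mb, M2b, M3b = b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    M2 = M2a + M2b + delta**2 * na * nb / n
    M3 = (M3a + M3b
          + delta**3 * na * nb * (na - nb) / n**2
          + 3.0 * delta * (na * M2b - nb * M2a) / n)
    return (n, mean, M2, M3)

def two_pass(v):
    m = v.mean()
    return (len(v), m, ((v - m)**2).sum(), ((v - m)**3).sum())

x = np.random.default_rng(4).standard_normal(10001)
left, right = two_pass(x[:4000]), two_pass(x[4000:])
print(np.allclose(merge(left, right), two_pass(x)))   # True
```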

Velocity and Reactive Scalar Dissipation Spectra in Turbulent Premixed Flames

Combustion Science and Technology

Kolla, Hemanth; Zhao, Xin Y.; Chen, Jacqueline H.; Swaminathan, N.

Dissipation spectra of velocity and reactive scalars—temperature and fuel mass fraction—in turbulent premixed flames are studied using direct numerical simulation data of a temporally evolving lean hydrogen-air premixed planar jet (PTJ) flame and a statistically stationary planar lean methane-air (SP) flame. The equivalence ratio in both cases was 0.7 and the pressure was 1 atm, while the unburned temperature was 700 K for the hydrogen-air PTJ case and 300 K for the methane-air SP case, resulting in data sets with density ratios of 3 and 5, respectively. The turbulent Reynolds numbers for the cases ranged from 200 to 428.4, the Damköhler number from 3.1 to 29.1, and the Karlovitz number from 0.1 to 4.5. The dissipation spectra collapse when normalized by the respective Favre-averaged dissipation rates. However, the normalized dissipation spectra in all the cases deviate noticeably from those predicted by classical scaling laws for constant-density turbulent flows and bear a clear influence of the chemical reactions on the dissipative range of the energy cascade.

DARMA 0.3.0-alpha Specification

Wilke, Jeremiah; Hollman, David S.; Slattengren, Nicole L.; Lifflander, Jonathan; Kolla, Hemanth; Rizzi, Francesco; Teranishi, Keita; Bennett, Janine C.

In this document, we provide the specifications for DARMA (Distributed Asynchronous Resilient Models and Applications), a co-design research vehicle for asynchronous many-task (AMT) programming models that serves to: 1) insulate applications from runtime system and hardware idiosyncrasies, 2) improve AMT runtime programmability by co-designing an application programmer interface (API) directly with application developers, 3) synthesize application co-design activities into meaningful requirements for runtime systems, and 4) facilitate AMT design space characterization and definition, accelerating the development of AMT best practices.

Local recovery and failure masking for stencil-based applications at extreme scales

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Gamell, Marc; Teranishi, Keita; Heroux, Michael A.; Mayo, Jackson R.; Kolla, Hemanth; Chen, Jacqueline H.; Parashar, Manish

Application resilience is a key challenge that has to be addressed to realize the exascale vision. Online recovery, even when it involves all processes, can dramatically reduce the overhead of failures as compared to the more traditional approach where the job is terminated and restarted from the last checkpoint. In this paper we explore how local recovery can be used for certain classes of applications to further reduce overheads due to resilience. Specifically we develop programming support and scalable runtime mechanisms to enable online and transparent local recovery for stencil-based parallel applications on current leadership class systems. We also show how multiple independent failures can be masked to effectively reduce the impact on the total time to solution. We integrate these mechanisms with the S3D combustion simulation, and experimentally demonstrate (using the Titan Cray-XK7 system at ORNL) the ability to tolerate high failure rates (i.e., node failures every 5 seconds) with low overhead while sustaining performance, at scales up to 262144 cores.

Scalable Parallel Distance Field Construction for Large-Scale Applications

IEEE Transactions on Visualization and Computer Graphics

Yu, Hongfeng; Xie, Jinrong; Ma, Kwan L.; Kolla, Hemanth; Chen, Jacqueline H.

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
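
On a single node, the essence of the computation can be sketched with an exact Euclidean distance transform; the grid, iso-surface, and spacing below are hypothetical, and the paper's contribution is the distributed parallel distance tree that scales this operation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Signed distance to the iso-surface phi == 0 of a scalar field.
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
phi = x**2 + y**2 + z**2 - 0.5**2        # sphere of radius 0.5
inside = phi < 0.0

h = 2.0 / 63                             # uniform grid spacing
d_out = distance_transform_edt(~inside, sampling=h)  # distance, outside cells
d_in = distance_transform_edt(inside, sampling=h)    # distance, inside cells
signed = np.where(inside, -d_in, d_out)
print(signed.min(), signed.max())        # about -0.5 and sqrt(3) - 0.5
```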
