Hybrid Parallel Tucker Decomposition of Streaming Data
Abstract not provided.
Derivative computation is a key component of optimization, sensitivity analysis, uncertainty quantification, and the solution of nonlinear problems. Automatic differentiation (AD) is a powerful technique for evaluating such derivatives and has in recent years been integrated into programming environments such as JAX, PyTorch, and TensorFlow to support the derivative computations needed to train machine learning models, facilitating widespread use of these technologies. The C++ language has become a de facto standard for scientific computing, yet its complexity has made widespread adoption of AD technologies for C++ difficult, hampering the incorporation of powerful differentiable programming approaches into C++ scientific simulations. This is exacerbated by the emergence of architectures, such as GPUs, that have limited memory capacity and require massive thread-level concurrency. C++ AD tools must use these environments effectively to bring novel scientific simulations to next-generation DOE experimental and observational facilities. In this project, we investigated source transformation-based automatic differentiation using the LLVM compiler infrastructure to automatically generate portable and efficient gradient computations for Kokkos-based code. We demonstrated the feasibility of the proposed strategy by using a prototype LLVM-based source transformation tool to generate gradients of simple functions composed of sequences of simple Kokkos parallel regions. Speedups of up to 500x compared to Sacado were observed on an NVIDIA V100 GPU.
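To make the target of the source transformation concrete, here is a hand-written sketch of the kind of code the approach operates on and aims to emit: a simple Kokkos parallel reduction and a corresponding reverse-mode gradient kernel. The functions f and f_grad are illustrative assumptions, not output of the prototype tool.

```cpp
// Illustrative only: a simple Kokkos parallel region (the primal) and
// the reverse-mode gradient kernel a source transformation tool would
// aim to generate for it. f and f_grad are hypothetical examples.
#include <Kokkos_Core.hpp>
#include <cstdio>

// Primal: f(x) = sum_i x(i)^2, as a Kokkos parallel reduction.
double f(const Kokkos::View<double*>& x) {
  double sum = 0.0;
  Kokkos::parallel_reduce(
      "f", x.extent(0),
      KOKKOS_LAMBDA(const int i, double& s) { s += x(i) * x(i); }, sum);
  return sum;
}

// Reverse-mode gradient: seeding the output adjoint df, each primal
// contribution s += x(i)*x(i) produces the adjoint g(i) = 2*x(i)*df.
void f_grad(const Kokkos::View<double*>& x,
            const Kokkos::View<double*>& g, const double df = 1.0) {
  Kokkos::parallel_for(
      "f_grad", x.extent(0),
      KOKKOS_LAMBDA(const int i) { g(i) = 2.0 * x(i) * df; });
}

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1000;
    Kokkos::View<double*> x("x", n), g("g", n);
    Kokkos::deep_copy(x, 2.0);
    std::printf("f = %g\n", f(x));  // 4000
    f_grad(x, g);                   // g(i) == 4 for every i
  }
  Kokkos::finalize();
  return 0;
}
```

The gradient remains an ordinary Kokkos kernel, so it inherits the portability and performance characteristics of the primal code.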
Abstract not provided.
AIAA Journal
This is an investigation of two experimental datasets of laminar hypersonic flows over a double-cone geometry, acquired in the Calspan-University at Buffalo Research Center's Large Energy National Shock (LENS)-XX expansion tunnel. These datasets have yet to be modeled accurately. A previous paper suggested that this could be partly due to misspecified inlet conditions. The authors of this paper solved a Bayesian inverse problem to infer the inlet conditions of the LENS-XX test section and found that in one case they lay outside the uncertainty bounds specified in the experimental dataset. However, the inference was performed using approximate surrogate models. In this paper, the experimental datasets are revisited and inversions for the tunnel test-section inlet conditions are performed with a Navier–Stokes simulator. The inversion is deterministic and can provide uncertainty bounds on the inlet conditions under a Gaussian assumption. It was found that deterministic inversion yields inlet conditions that do not agree with those stated in the experiments. An a posteriori method is also presented to check the validity of the Gaussian assumption for the posterior distribution. This paper contributes to ongoing work on the assessment of datasets from challenging experiments conducted in extreme environments, where the experimental apparatus is pushed to the margins of its design and performance envelopes.
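For readers unfamiliar with how a deterministic inversion can yield uncertainty bounds, the following is a minimal sketch of the standard Laplace-style construction under a Gaussian assumption; the symbols (q for the inlet conditions, G for the Navier–Stokes forward map, Sigma for the noise covariance) are assumptions for illustration, not the paper's notation.

```latex
% Minimal sketch of Gaussian uncertainty bounds from a deterministic
% inversion (Laplace-style approximation; notation assumed).
\[
  J(q) = \tfrac{1}{2}\bigl(y - G(q)\bigr)^{\top}\Sigma^{-1}\bigl(y - G(q)\bigr),
  \qquad
  q^{\star} = \arg\min_{q} J(q)
\]
% Under the Gaussian assumption, the posterior is approximated as
% Gaussian, centered at the deterministic solution with covariance
% given by the inverse (Gauss-Newton) Hessian of the misfit:
\[
  H \approx \nabla G(q^{\star})^{\top}\,\Sigma^{-1}\,\nabla G(q^{\star}),
  \qquad
  q \mid y \;\approx\; \mathcal{N}\bigl(q^{\star},\, H^{-1}\bigr)
\]
```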
Abstract not provided.
ACM Transactions on Mathematical Software
Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators, as these devices become more pervasive. In this work, we explore forward-mode, operator overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ library providing APIs for implementing parallel computations portably across a wide variety of emerging architectures. We describe the challenges that arise when differentiating code written with Kokkos for these architectures, and two approaches for overcoming them that ensure optimal memory access patterns and expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on contemporary CPU and GPU architectures. We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.
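As a minimal illustration of the operator overloading approach, the sketch below uses Sacado's forward-mode DFad type on the host; it deliberately omits the Kokkos-specialized view layouts and kernels that are the subject of the paper.

```cpp
// Illustrative only: forward-mode AD with Sacado's dynamically sized
// forward type DFad, evaluated on the host.
#include <Sacado.hpp>
#include <cstdio>

using FadType = Sacado::Fad::DFad<double>;

// Any scalar-templated function can be differentiated by instantiating
// it on the Fad type instead of double.
template <typename Scalar>
Scalar func(const Scalar& x, const Scalar& y) {
  using std::sin;  // unqualified call picks up Sacado's overload via ADL
  return x * x + sin(x * y);
}

int main() {
  // Two independent variables, so derivative arrays have length 2;
  // DFad(sz, i, v) sets value v and seeds the i-th derivative to 1.
  FadType x(2, 0, 1.5);
  FadType y(2, 1, 0.5);

  FadType fv = func(x, y);
  std::printf("f     = %g\n", fv.val());
  std::printf("df/dx = %g\n", fv.dx(0));
  std::printf("df/dy = %g\n", fv.dx(1));
  return 0;
}
```

The same pattern, instantiating a templated kernel on a Fad scalar type instead of double, is what allows existing C++ code to be differentiated without rewriting it.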
Abstract not provided.
The objective of this milestone was to complete the integration of the GenTen tensor decomposition software with the combustion application Pele using the Ascent in situ analysis software, in partnership with the ALPINE and Pele teams, and to demonstrate the use of tensor analysis as part of a combustion simulation.
Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022
The decomposition of higher-order joint cumulant tensors of spatio-temporal data sets is useful for analyzing multivariate non-Gaussian statistics, with a wide variety of applications (e.g., anomaly detection, independent component analysis, dimensionality reduction). Computing the cumulant tensor often requires first computing the joint moment tensor of the input data, which is very expensive using a naïve algorithm. The current state-of-the-art algorithm takes advantage of the symmetric nature of the moment tensor by dividing it into smaller cubic tensor blocks and computing only the blocks with unique values, thus reducing computation. We propose a refactoring of this algorithm that poses its computation as matrix operations, specifically Khatri-Rao products and standard matrix multiplications. An analysis of the computational and cache complexity indicates significant performance savings due to the refactoring. Implementations of our refactored algorithm in Julia show speedups of up to 10x over the reference algorithm in single-processor experiments. We describe multiple levels of hierarchical parallelism inherent in the refactored algorithm, and present an implementation using an advanced programming model that shows similar speedups in experiments run on a GPU.
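To make the refactoring concrete, the following is a sketch of the kind of identity that turns moment-tensor computation into Khatri-Rao products and matrix multiplications; the notation is assumed for illustration, and the block-wise treatment of symmetry is omitted.

```latex
% Sketch of the identity behind the refactoring (notation assumed):
% for samples x_1, ..., x_n in R^p stacked as rows of X, the order-d
% joint moment tensor is the average of d-fold outer products,
\[
  \mathcal{M} = \frac{1}{n} \sum_{i=1}^{n} x_i^{\circ d},
\]
% and its mode-1 unfolding is a single matrix product whose second
% factor is a (d-1)-fold row-wise Khatri-Rao power of X:
\[
  \mathcal{M}_{(1)} = \frac{1}{n}\, X^{\top}
  \bigl(\underbrace{X \odot_{\mathrm{row}} \cdots \odot_{\mathrm{row}} X}_{d-1\ \text{factors}}\bigr),
\]
% so each unique symmetric block can be formed from Khatri-Rao
% products and standard matrix multiplications on column submatrices.
```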
Abstract not provided.
In this paper, we develop a method, which we call OnlineGCP, for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from traditional canonical polyadic (CP) tensor decompositions in that it allows for arbitrary objective functions that the CP model attempts to minimize. This approach can provide better fits and more interpretable models when the observed tensor data is strongly non-Gaussian. In the streaming case, tensor data is observed gradually over time and the algorithm must incrementally update a GCP factorization with limited access to prior data. In this work, we extend the GCP formalism to the streaming context: we derive a GCP optimization problem to be solved as new tensor data is observed, formulate a tunable history term to balance reconstruction of recently observed data against data observed in the past, develop a scalable solution strategy based on segregated solves using stochastic gradient descent methods, describe a software implementation that provides performance and portability on contemporary CPU and GPU architectures and integrates with Matlab for enhanced usability, and demonstrate the utility and performance of the approach and software on several synthetic and real tensor data sets.
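The following is a sketch of one plausible shape of the per-step streaming optimization problem; the symbols and the specific form of the history term are assumptions for illustration, not the paper's formulation.

```latex
% A sketch of the per-step streaming problem (symbols assumed): given
% newly observed data Y_t and a low-rank GCP model M = M(A_1,...,A_d)
% with factor matrices A_k,
\[
  \min_{A_1,\dots,A_d}\;
  \underbrace{F\bigl(Y_t;\, M\bigr)}_{\text{GCP loss on new data}}
  \;+\;
  \mu\, \underbrace{H\bigl(A_1,\dots,A_d\bigr)}_{\text{history term}},
\]
% where F is the (possibly non-Gaussian) elementwise GCP objective and
% the tunable weight \mu balances fitting recent observations against
% reconstructing data observed in the past.
```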
Abstract not provided.
2021 IEEE High Performance Extreme Computing Conference (HPEC 2021)
In this work, we show that reduced-communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling of the Generalized Canonical Polyadic (GCP) tensor decomposition, but at a cost: achieving convergence becomes more difficult. Our MPI-based implementation shows that while one-sided algorithms offer a path to asynchronous execution, the performance of an optimized allreduce is difficult to beat.
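For context, the baseline that the reduced-communication variants are compared against is a synchronous step in which each rank's stochastic gradient is combined with an allreduce before the factor update. The sketch below is illustrative; the names and the update rule are assumptions, not taken from the GCP implementation.

```cpp
// Illustrative only: one synchronous distributed SGD step in which the
// per-rank stochastic gradient is summed with MPI_Allreduce before the
// update. Names and the update rule are assumptions, not GCP-specific.
#include <mpi.h>
#include <cstddef>
#include <vector>

void sgd_step(std::vector<double>& factors,      // flattened model
              std::vector<double>& local_grad,   // this rank's gradient
              const double learning_rate, MPI_Comm comm) {
  int nranks = 1;
  MPI_Comm_size(comm, &nranks);

  // Sum the local gradients across all ranks; every rank receives the
  // result in place. This is the collective that reduced-communication
  // and one-sided variants try to avoid or relax.
  MPI_Allreduce(MPI_IN_PLACE, local_grad.data(),
                static_cast<int>(local_grad.size()),
                MPI_DOUBLE, MPI_SUM, comm);

  // Apply the rank-averaged gradient.
  for (std::size_t i = 0; i < factors.size(); ++i)
    factors[i] -= learning_rate * (local_grad[i] / nranks);
}
```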