Publications

Results 1–25 of 161

An Assessment of the Laminar Hypersonic Double-Cone Experiments in the LENS-XX Tunnel

AIAA Journal

Ray, Jaideep R.; Blonigan, Patrick J.; Phipps, Eric T.; Maupin, Kathryn A.

This paper investigates two experimental datasets of laminar hypersonic flow over a double-cone geometry, acquired in the Calspan-University at Buffalo Research Center's Large Energy National Shock (LENS)-XX expansion tunnel. These datasets have yet to be modeled accurately. A previous paper suggested that this could be partly due to mis-specified inlet conditions. In that previous work, the authors solved a Bayesian inverse problem to infer the inlet conditions of the LENS-XX test section and found that in one case they lay outside the uncertainty bounds specified in the experimental dataset. However, that inference was performed using approximate surrogate models. In this paper, the experimental datasets are revisited and inversions for the tunnel test-section inlet conditions are performed with a Navier–Stokes simulator. The inversion is deterministic and can provide uncertainty bounds on the inlet conditions under a Gaussian assumption. The deterministic inversion yields inlet conditions that do not agree with those reported in the experiments. An a posteriori method is also presented to check the validity of the Gaussian assumption for the posterior distribution. This paper contributes to ongoing work on the assessment of datasets from challenging experiments conducted in extreme environments, where the experimental apparatus is pushed to the margins of its design and performance envelopes.
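
As a rough illustration of the deterministic-inversion-with-Gaussian-bounds idea described above, the sketch below fits a toy algebraic forward model (a stand-in for the paper's Navier–Stokes simulator; the model, sensor locations, and noise level are invented) with nonlinear least squares and reads approximate Gaussian uncertainty bounds off the Jacobian at the optimum.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy forward model standing in for a Navier-Stokes simulator: maps "inlet
# conditions" theta = (velocity, density) to predicted surface observables
# at a few sensor locations (purely illustrative).
def forward_model(theta, x_sensors):
    u, rho = theta
    return rho * u**2 * np.exp(-x_sensors) + 0.1 * u * x_sensors

x_sensors = np.linspace(0.0, 1.0, 8)
theta_true = np.array([2.5, 0.8])   # hypothetical "true" inlet conditions
sigma = 0.05                        # assumed observation noise (std. dev.)
rng = np.random.default_rng(0)
data = forward_model(theta_true, x_sensors) + sigma * rng.standard_normal(x_sensors.size)

# Deterministic inversion: weighted nonlinear least squares for the inlet conditions.
residual = lambda theta: (forward_model(theta, x_sensors) - data) / sigma
fit = least_squares(residual, x0=np.array([1.0, 1.0]))

# Laplace / Gauss-Newton approximation: posterior covariance ~ (J^T J)^{-1}
# at the optimum, which yields Gaussian uncertainty bounds on the inlet conditions.
J = fit.jac
cov = np.linalg.inv(J.T @ J)
std = np.sqrt(np.diag(cov))
print("estimated inlet conditions:", fit.x)
print("approximate 2-sigma bounds:", 2.0 * std)
```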

Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado

ACM Transactions on Mathematical Software

Phipps, Eric T.; Pawlowski, Roger P.; Trott, Christian R.

Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators as these devices become more pervasive. In this work, we explore forward-mode, operator-overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ library providing APIs for implementing parallel computations that are portable to a wide variety of emerging architectures. We describe the challenges that arise when differentiating code for these architectures using Kokkos, and present two approaches for overcoming them that ensure optimal memory access patterns and expose additional dimensions of fine-grained parallelism in the derivative calculation. We report the results of several computational experiments that demonstrate the performance of the approach on contemporary CPU and GPU architectures, and conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.
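
The paper's differentiation is done with the Sacado C++ package inside Kokkos kernels; as a language-agnostic sketch of the underlying idea only, the snippet below implements forward-mode AD via operator overloading with a minimal dual-number class. This is not Sacado's API, just the concept it builds on.

```python
# Minimal illustration of forward-mode AD via operator overloading (the idea
# behind Sacado's forward-mode Fad types); not Sacado's API.
class Dual:
    """Value plus a vector of derivatives w.r.t. chosen independent variables."""
    def __init__(self, val, deriv):
        self.val = val
        self.deriv = list(deriv)

    def __add__(self, other):
        other = _promote(other, len(self.deriv))
        return Dual(self.val + other.val,
                    [a + b for a, b in zip(self.deriv, other.deriv)])

    def __mul__(self, other):
        other = _promote(other, len(self.deriv))
        # product rule: d(xy) = x dy + y dx
        return Dual(self.val * other.val,
                    [self.val * db + other.val * da
                     for da, db in zip(self.deriv, other.deriv)])

    __radd__ = __add__
    __rmul__ = __mul__

def _promote(x, n):
    return x if isinstance(x, Dual) else Dual(x, [0.0] * n)

# Differentiate f(x, y) = x*y + 3*x w.r.t. both inputs in one forward sweep.
x = Dual(2.0, [1.0, 0.0])   # seed dx/dx = 1, dx/dy = 0
y = Dual(5.0, [0.0, 1.0])
f = x * y + 3.0 * x
print(f.val)     # 16.0
print(f.deriv)   # [8.0, 2.0] -> df/dx = y + 3 = 8, df/dy = x = 2
```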

Parallel memory-efficient computation of symmetric higher-order joint moment tensors

Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022

Li, Zitong L.; Kolla, Hemanth K.; Phipps, Eric T.

The decomposition of higher-order joint cumulant tensors of spatio-temporal datasets is useful for analyzing multivariate non-Gaussian statistics, with a wide variety of applications (e.g., anomaly detection, independent component analysis, dimensionality reduction). Computing a cumulant tensor often requires first computing the joint moment tensor of the input data, which is very expensive with a naïve algorithm. The current state-of-the-art algorithm exploits the symmetry of the moment tensor by dividing it into smaller cubic tensor blocks and computing only the blocks with unique values, thereby reducing computation. We propose a refactoring of this algorithm that poses its computation as matrix operations, specifically Khatri-Rao products and standard matrix multiplications. An analysis of the computational and cache complexity indicates significant performance savings due to the refactoring. Implementations of our refactored algorithm in Julia show speedups of up to 10x over the reference algorithm in single-processor experiments. We describe multiple levels of hierarchical parallelism inherent in the refactored algorithm and present an implementation using an advanced programming model that shows similar speedups in experiments run on a GPU.
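
For the full (unblocked) third-order joint moment tensor, the matrix refactoring amounts to a simple identity: the mode-1 unfolding equals the data matrix times a column-wise Khatri-Rao product of the data with itself. The sketch below checks that identity with dense numpy/scipy arrays; it is not the paper's blocked, parallel Julia or GPU implementation.

```python
import numpy as np
from scipy.linalg import khatri_rao

rng = np.random.default_rng(1)
n, d = 1000, 5                      # n samples of a d-variate quantity
X = rng.standard_normal((n, d))

# Naive third-order joint moment tensor: M[a,b,c] = mean_i X[i,a] X[i,b] X[i,c]
M_naive = np.einsum('ia,ib,ic->abc', X, X, X) / n

# Refactored: the column-wise Khatri-Rao product of X^T with itself gives
# Z[(b,c), i] = X[i,b] X[i,c]; one matrix multiplication then yields the
# mode-1 unfolding of M.
Z = khatri_rao(X.T, X.T)            # shape (d*d, n)
M_unfold = (X.T @ Z.T) / n          # shape (d, d*d)
M_refactored = M_unfold.reshape(d, d, d)

print(np.allclose(M_naive, M_refactored))   # True
```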

Streaming Generalized Canonical Polyadic Tensor Decompositions

Phipps, Eric T.; Johnson, Nick; Kolda, Tamara G.

In this paper, we develop a method, which we call OnlineGCP, for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from the traditional canonical polyadic (CP) tensor decomposition in that it allows an arbitrary objective function to be minimized when fitting the low-rank CP model. This can provide better fits and more interpretable models when the observed tensor data is strongly non-Gaussian. In the streaming case, tensor data is observed gradually over time, and the algorithm must incrementally update a GCP factorization with limited access to prior data. In this work, we extend the GCP formalism to the streaming context: we derive a GCP optimization problem to be solved as new tensor data is observed, formulate a tunable history term to balance reconstruction of recently observed data against data observed in the past, develop a scalable solution strategy based on segregated solves using stochastic gradient descent methods, describe a software implementation that provides performance and portability on contemporary CPU and GPU architectures and integrates with Matlab for enhanced usability, and demonstrate the utility and performance of the approach and software on several synthetic and real tensor datasets.
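
As a much-simplified sketch of the streaming setting, assume a three-way tensor whose slices arrive one at a time and a Gaussian loss (one of the objectives GCP admits): each new slice gets its own temporal weights by least squares, and the other factors are updated from exponentially down-weighted running statistics, with the forgetting factor playing the role of a tunable history weight. This is an illustrative stand-in, not the OnlineGCP formulation, its history term, or its SGD-based solver.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, R = 20, 15, 3                  # non-temporal mode sizes and rank (illustrative)
w = 0.9                              # forgetting factor: a tunable history weight

A = rng.standard_normal((I, R))      # factor matrix for mode 1
B = rng.standard_normal((J, R))      # factor matrix for mode 2
# Running, exponentially weighted sufficient statistics for the factor updates
GA = np.zeros((R, R)); HA = np.zeros((I, R))
GB = np.zeros((R, R)); HB = np.zeros((J, R))

def process_slice(Y, A, B, GA, HA, GB, HB):
    """Fold one newly observed time slice Y (I x J) into the factorization."""
    # Temporal weights for this slice: least squares for c in Y ~ A diag(c) B^T
    Z = np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)
    c, *_ = np.linalg.lstsq(Z, Y.reshape(I * J), rcond=None)

    # History: old statistics are down-weighted by w, balancing reconstruction
    # of past slices against the newly observed one.
    Bc = B * c                        # equals B @ diag(c)
    GA = w * GA + Bc.T @ Bc
    HA = w * HA + Y @ Bc
    A = HA @ np.linalg.pinv(GA)       # least-squares update of A

    Ac = A * c
    GB = w * GB + Ac.T @ Ac
    HB = w * HB + Y.T @ Ac
    B = HB @ np.linalg.pinv(GB)       # least-squares update of B
    return A, B, c, GA, HA, GB, HB

for t in range(50):                   # synthetic stream, one slice at a time
    Y_t = rng.standard_normal((I, J))
    A, B, c_t, GA, HA, GB, HB = process_slice(Y_t, A, B, GA, HA, GB, HB)
```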

Low-Communication Asynchronous Distributed Generalized Canonical Polyadic Tensor Decomposition

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Lewis, Cannada L.; Phipps, Eric T.

In this work, we show that reduced-communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling of the Generalized Canonical Polyadic (GCP) tensor decomposition, but at a cost: achieving convergence becomes more difficult. Our MPI-based implementation shows that, while one-sided algorithms offer a path to asynchronous execution, the performance benefits of an optimized allreduce are difficult to beat.
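
The communication trade-off can be illustrated with a generic data-parallel SGD loop in mpi4py: allreducing gradients every step is the synchronous baseline, while taking several local steps and reconciling models only periodically reduces communication at the risk of harder convergence. The linear-regression loss and the reconciliation scheme below are invented for illustration; the paper's implementation uses one-sided MPI inside a GCP solver. Run with, e.g., mpirun -np 4.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)            # each rank holds its own data shard
Xloc = rng.standard_normal((1000, 10))
wtrue = np.arange(10, dtype=float)
yloc = Xloc @ wtrue + 0.01 * rng.standard_normal(1000)

w = np.zeros(10)
lr, batch, comm_every = 1e-2, 32, 8          # communicate only every 8 steps

for step in range(400):
    idx = rng.integers(0, Xloc.shape[0], size=batch)
    grad = 2.0 * Xloc[idx].T @ (Xloc[idx] @ w - yloc[idx]) / batch

    if comm_every == 1:
        # Baseline: allreduce the gradient every step (synchronous SGD)
        gsum = np.empty_like(grad)
        comm.Allreduce(grad, gsum, op=MPI.SUM)
        w -= lr * gsum / nprocs
    else:
        # Reduced communication: local steps, reconcile models periodically
        w -= lr * grad
        if (step + 1) % comm_every == 0:
            wsum = np.empty_like(w)
            comm.Allreduce(w, wsum, op=MPI.SUM)
            w = wsum / nprocs

if rank == 0:
    print("relative error:", np.linalg.norm(w - wtrue) / np.linalg.norm(wtrue))
```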

GMRES with embedded ensemble propagation for the efficient solution of parametric linear systems in uncertainty quantification of computational models

Computer Methods in Applied Mechanics and Engineering

Liegeois, Kim; Boman, Romain; Phipps, Eric T.; Wiesner, Tobias A.; Arnst, Maarten

In previous work, embedded ensemble propagation was proposed to improve the efficiency of sampling-based uncertainty quantification of computational models on emerging computational architectures. It consists of evaluating the model for a subset of samples simultaneously, instead of evaluating them individually. A first approach introduced to solve parametric linear systems with ensemble propagation is ensemble reduction: in Krylov methods, for example, the samples are coupled together through an inner product that sums the sample contributions. Ensemble reduction has the advantages of being able to use optimized BLAS implementations and of having a stopping criterion that involves only one scalar. However, the reduction can decrease the rate of convergence because the spectra of the samples are gathered together. In this paper, we investigate a second approach: ensemble propagation without ensemble reduction in the case of GMRES. This approach solves the samples simultaneously but independently, improving convergence compared to ensemble reduction. It raises two new issues, both addressed in this paper: optimized BLAS implementations can no longer be used, and ensemble divergence, whereby individual samples within an ensemble must follow different code execution paths, can occur. We tackle these issues by implementing a high-performance ensemble GEMV and by using masks. The proposed ensemble GEMV leads to a similar cost per GMRES iteration for both approaches, i.e., with and without reduction. For illustration, we study the performance of the new linear solver in the context of a mesh-tying problem. This example demonstrates improved ensemble-propagation speedup without reduction.
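
The contrast between the two approaches can be seen in a single kernel and stopping test: an "ensemble GEMV" applies each sample's matrix to its own vector in one batched operation, ensemble reduction collapses the sample inner products into one scalar for the stopping criterion, and the non-reduced variant keeps per-sample residuals and masks out samples that have already converged. The dense numpy sketch below illustrates only these pieces, not the paper's GMRES solver or its optimized GEMV implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
S, n = 4, 50                                  # ensemble size and system size
A = rng.standard_normal((S, n, n)) + 10.0 * np.eye(n)   # one matrix per sample
b = rng.standard_normal((n, S))               # one right-hand side per sample
x = np.zeros((n, S))                          # current iterates, one per sample

def ensemble_gemv(A, X):
    """y[:, s] = A[s] @ X[:, s] for all samples at once (the 'ensemble GEMV')."""
    return np.einsum('sij,js->is', A, X)

# Residuals for the current iterates, computed with one ensemble GEMV
r = b - ensemble_gemv(A, x)

# Ensemble reduction: one coupled inner product summing sample contributions,
# so the stopping criterion involves a single scalar for the whole ensemble.
coupled_norm2 = np.sum(r * r)

# No reduction: per-sample norms, with a mask flagging the samples that still
# need to be iterated (samples may converge at different rates).
per_sample_norm2 = np.sum(r * r, axis=0)
active_mask = per_sample_norm2 > 1e-12

print(coupled_norm2, per_sample_norm2, active_mask)
```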
