Publications

Results 26–50 of 60

Designing vector-friendly compact BLAS and LAPACK kernels

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017

Kim, Kyungjoo K.; Costa, Timothy B.; Deveci, Mehmet D.; Bradley, Andrew M.; Hammond, Simon D.; Guney, Murat E.; Knepper, Sarah; Story, Shane; Rajamanickam, Sivasankaran R.

Many applications, such as PDE-based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization may provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup against OpenMP loops around optimized DGEMM, DTRSM, and DGETRF kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present detailed performance analysis of our kernels.
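The interleaved layout described in the abstract above can be illustrated with a small sketch. This is not the KokkosKernels or Intel MKL interface: the function names `pack_compact` and `compact_gemm`, the `[block][row][col][lane]` ordering, and the zero-padding of the last block are assumptions made for illustration only. Pure Python cannot vectorize; the innermost `lane` loop merely marks the work that a single SIMD instruction would cover in a compiled kernel.

```python
def pack_compact(mats, vlen):
    """Interleave a batch of equally sized matrices in blocks of `vlen`.

    Returns a flat list ordered as [block][row][col][lane], so the same
    (row, col) entry of `vlen` consecutive matrices is contiguous.
    The last block is zero-padded when the batch size is not a multiple
    of `vlen` (an assumption of this sketch).
    """
    rows, cols = len(mats[0]), len(mats[0][0])
    nblocks = (len(mats) + vlen - 1) // vlen
    flat = [0.0] * (nblocks * rows * cols * vlen)
    for b, mat in enumerate(mats):
        blk, lane = divmod(b, vlen)
        for i in range(rows):
            for j in range(cols):
                flat[((blk * rows + i) * cols + j) * vlen + lane] = mat[i][j]
    return flat


def compact_gemm(a, b, m, k, n, nblocks, vlen):
    """Compute C = A * B for every matrix of the batch in compact layout.

    `a` holds m-by-k matrices and `b` holds k-by-n matrices, both packed
    by pack_compact with the same `vlen`. The result uses the same layout.
    """
    c = [0.0] * (nblocks * m * n * vlen)
    for blk in range(nblocks):
        for i in range(m):
            for j in range(n):
                c0 = ((blk * m + i) * n + j) * vlen
                for p in range(k):
                    a0 = ((blk * m + i) * k + p) * vlen
                    b0 = ((blk * k + p) * n + j) * vlen
                    # One multiply-add across `vlen` different matrices:
                    # in a compiled kernel this loop is a single SIMD FMA.
                    for lane in range(vlen):
                        c[c0 + lane] += a[a0 + lane] * b[b0 + lane]
    return c
```

For example, packing three 2×2 matrices with `vlen = 2` yields two blocks of eight values each, with the second lane of the last block padded by zeros.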

Constraining the Magmatic System at Mount St. Helens (2004–2008) Using Bayesian Inversion With Physics-Based Models Including Gas Escape and Crystallization

Journal of Geophysical Research: Solid Earth

Wong, Ying Q.; Segall, Paul; Bradley, Andrew M.; Anderson, Kyle

Physics-based models of volcanic eruptions track conduit processes as functions of depth and time. When used in inversions, these models permit integration of diverse geological and geophysical data sets to constrain important parameters of magmatic systems. We develop a 1-D steady state conduit model for effusive eruptions including equilibrium crystallization and gas transport through the conduit and compare with the quasi-steady dome growth phase of Mount St. Helens in 2005. Viscosity increase resulting from pressure-dependent crystallization leads to a natural transition from viscous flow to frictional sliding on the conduit margin. Erupted mass flux depends strongly on wall rock and magma permeabilities due to their impact on magma density. Including both lateral and vertical gas transport reveals competing effects that produce nonmonotonic behavior in the mass flux when increasing magma permeability. Using this physics-based model in a Bayesian inversion, we link data sets from Mount St. Helens such as extrusion flux and earthquake depths with petrological data to estimate unknown model parameters, including magma chamber pressure and water content, magma permeability constants, conduit radius, and friction along the conduit walls. Even with this relatively simple model and limited data, we obtain improved constraints on important model parameters. We find that the magma chamber had low (<5 wt %) total volatiles and that the magma permeability scale is well constrained at ∼$10^{-11.4}$ m$^2$ to reproduce observed dome rock porosities. Compared with previous results, higher magma overpressure and lower wall friction are required to compensate for increased viscous resistance while keeping extrusion rate at the observed value.

Bounding the moment deficit rate on crustal faults using geodetic data: Methods

Journal of Geophysical Research: Solid Earth

Maurer, Jeremy; Segall, Paul; Bradley, Andrew M.

The geodetically derived interseismic moment deficit rate (MDR) provides a first-order constraint on earthquake potential and can play an important role in seismic hazard assessment, but quantifying uncertainty in MDR is a challenging problem that has not been fully addressed. We establish criteria for reliable MDR estimators, evaluate existing methods for determining the probability density of MDR, and propose and evaluate new methods. Geodetic measurements moderately far from the fault provide tighter constraints on MDR than those nearby. Previously used methods can fail catastrophically under predictable circumstances. The bootstrap method works well with strong data constraints on MDR, but can be strongly biased when network geometry is poor. We propose two new methods: the Constrained Optimization Bounding Estimator (COBE) assumes uniform priors on slip rate (from geologic information) and MDR, and can be shown through synthetic tests to be a useful, albeit conservative estimator; the Constrained Optimization Bounding Linear Estimator (COBLE) is the corresponding linear estimator with Gaussian priors rather than point-wise bounds on slip rates. COBE matches COBLE with strong data constraints on MDR. We compare results from COBE and COBLE to previously published results for the interseismic MDR at Parkfield, on the San Andreas Fault, and find similar results; thus, the apparent discrepancy between MDR and the total moment release (seismic and afterslip) in the 2004 Parkfield earthquake remains.

Towards a performance portable compressible CFD code

23rd AIAA Computational Fluid Dynamics Conference, 2017

Howard, Micah A.; Bradley, Andrew M.; Bova, S.W.; Overfelt, James R.; Wagnild, Ross M.; Dinzl, Derek J.; Hoemmen, Mark F.; Klinvex, Alicia M.

High performance computing (HPC) is undergoing a dramatic change in computing architectures. Next-generation HPC systems are being based primarily on many-core processing units and general purpose graphics processing units (GPUs). A computing node on a next-generation system can be, and in practice is, heterogeneous in nature, involving multiple memory spaces and multiple execution spaces. This presents a challenge for the development of application codes that wish to compute at the extreme scales afforded by these next-generation HPC technologies and systems: the best parallel programming model for one system is not necessarily the best parallel programming model for another. This inevitably raises the following question: how does an application code achieve high performance on disparate computing architectures without having entirely different, or at least significantly different, code paths, one for each architecture? This question has given rise to the term ‘performance portability’, a notion concerned with porting application code performance from architecture to architecture using a single code base. In this paper, we present the work being done at Sandia National Labs to develop a performance portable compressible CFD code that is targeting the ‘leadership’ class supercomputers the National Nuclear Security Administration (NNSA) is acquiring over the course of the next decade.

Evaluation of the accuracy of an offline seasonally-varying matrix transport model for simulating ideal age

Ocean Modelling

Bardin, Ann; Primeau, François; Lindsay, Keith; Bradley, Andrew M.

Newton–Krylov solvers for ocean tracers have the potential to greatly decrease the computational costs of spinning up deep-ocean tracers, which can take several thousand model years to reach equilibrium with surface processes. One version of the algorithm uses offline tracer transport matrices to simulate an annual cycle of tracer concentrations and applies Newton's method to find concentrations that are periodic in time. Here we present the impact of time-averaging the transport matrices on the equilibrium values of an ideal-age tracer. We compared annually-averaged, monthly-averaged, and 5-day-averaged transport matrices to an online simulation using the ocean component of the Community Earth System Model (CESM) with a nominal horizontal resolution of 1° × 1° and 60 vertical levels. We found that increasing the time resolution of the offline transport model reduced a low age bias from 12% for the annually-averaged transport matrices, to 4% for the monthly-averaged transport matrices, and to less than 2% for the transport matrices constructed from 5-day averages. The largest differences were in areas with strong seasonal changes in the circulation, such as the Northern Indian Ocean. For many applications the relatively small bias obtained using the offline model makes the offline approach attractive because it uses significantly less computer resources and is simpler to set up and run.
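The idea of applying Newton's method to find a tracer state that is periodic over one annual cycle can be sketched on a toy problem. Everything below is hypothetical and unrelated to CESM: a two-box deep ocean, invented exchange rates, monthly explicit Euler steps, and a surface age held at zero. The sketch also swaps the Krylov linear solver for a finite-difference Jacobian and a direct 2×2 solve, which is only practical because the toy state has two unknowns.

```python
import math

MONTHS = 12
DT = 1.0 / MONTHS  # time step in years


def step_year(age):
    """Advance the two deep-box ideal ages through one annual cycle."""
    a1, a2 = age
    for m in range(MONTHS):
        # Hypothetical seasonally varying exchange with the surface (1/yr).
        k_surf = 0.10 + 0.05 * math.cos(2.0 * math.pi * m / MONTHS)
        k_deep = 0.02  # hypothetical exchange between the deep boxes (1/yr)
        # Water ages 1 yr per yr; mixing with the age-zero surface lowers it.
        da1 = 1.0 - k_surf * a1 + k_deep * (a2 - a1)
        da2 = 1.0 + k_deep * (a1 - a2)
        a1, a2 = a1 + DT * da1, a2 + DT * da2
    return (a1, a2)


def residual(age):
    """G(x) = Phi(x) - x, which vanishes for an annually periodic state."""
    end = step_year(age)
    return (end[0] - age[0], end[1] - age[1])


def newton_periodic(age, tol=1e-10, max_iter=10, eps=1e-6):
    """Solve G(x) = 0 by Newton's method with a finite-difference Jacobian."""
    for _ in range(max_iter):
        g = residual(age)
        if max(abs(g[0]), abs(g[1])) < tol:
            break
        g1 = residual((age[0] + eps, age[1]))
        g2 = residual((age[0], age[1] + eps))
        j11, j21 = (g1[0] - g[0]) / eps, (g1[1] - g[1]) / eps
        j12, j22 = (g2[0] - g[0]) / eps, (g2[1] - g[1]) / eps
        det = j11 * j22 - j12 * j21
        # Cramer's rule for J d = -g.
        d1 = (-g[0] * j22 + g[1] * j12) / det
        d2 = (-g[1] * j11 + g[0] * j21) / det
        age = (age[0] + d1, age[1] + d2)
    return age
```

Starting from `newton_periodic((0.0, 0.0))`, the toy converges in a few Newton iterations, each costing three model years, whereas stepping `step_year` forward until the state stops changing takes many more model years. That contrast is the motivation the abstract gives for the Newton-based approach, although the numbers in this toy carry no physical meaning.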
