Publications

Results 1–25 of 187
A block coordinate descent optimizer for classification problems exploiting convexity

CEUR Workshop Proceedings

Patel, Ravi G.; Trask, Nathaniel A.; Gulian, Mamikon G.; Cyr, Eric C.

Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number of weights in the linear layer and not the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden layer architecture. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of this approach to classification problems. We first prove that the resulting Hessian matrix is symmetric positive semi-definite, and that the Newton step realizes a global minimizer. By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting gains of second-order methods over gradient descent. A TensorFlow implementation of the algorithm is available at github.com/rgp62/.
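
The alternation described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' TensorFlow implementation: the single tanh hidden layer, the toy point-cloud data, the ridge term added to the Hessian, the backtracking line search, the learning rate, and every function name below are assumptions made for illustration. Each outer iteration takes one damped Newton step on the linear-layer weights (where the cross-entropy loss is convex) and one gradient descent step on the hidden layer.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def features(X, V, b):
    """Hidden layer as an adaptive basis: tanh units plus a constant column."""
    h = np.tanh(X @ V + b)
    return np.hstack([h, np.ones((X.shape[0], 1))])

def loss(Phi, W, Y):
    P = softmax(Phi @ W)
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

def linear_layer_hessian(Phi, P):
    """Hessian of the mean cross-entropy w.r.t. W flattened row-major."""
    N, C = P.shape
    m = Phi.shape[1]
    # S[i] = diag(p_i) - p_i p_i^T is the per-sample softmax curvature
    S = np.einsum('ic,cd->icd', P, np.eye(C)) - np.einsum('ic,id->icd', P, P)
    H = np.einsum('ij,ik,icd->jckd', Phi, Phi, S) / N
    return H.reshape(m * C, m * C)

def newton_step(Phi, W, Y, ridge=1e-6):
    """One damped Newton step; the ridge is an assumption, since H is only PSD."""
    P = softmax(Phi @ W)
    g = (Phi.T @ (P - Y) / Y.shape[0]).reshape(-1)
    H = linear_layer_hessian(Phi, P)
    d = -np.linalg.solve(H + ridge * np.eye(H.shape[0]), g).reshape(W.shape)
    f0, t = loss(Phi, W, Y), 1.0
    while t > 1e-8:                                # backtrack until no increase
        W_new = W + t * d
        if loss(Phi, W_new, Y) <= f0:
            return W_new
        t *= 0.5
    return W

def hidden_gd_step(X, V, b, W, Y, lr=0.5):
    """One gradient descent step on the hidden-layer parameters."""
    Phi = features(X, V, b)
    P = softmax(Phi @ W)
    dZ = (P - Y) / X.shape[0]                      # dL/d(logits)
    dA = (dZ @ W[:-1].T) * (1.0 - Phi[:, :-1] ** 2)  # back through tanh
    return V - lr * X.T @ dA, b - lr * dA.sum(axis=0)

# Toy two-class point cloud: inside vs. outside a circle (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.eye(2)[(np.sum(X ** 2, axis=1) > 0.5).astype(int)]
V, b = rng.normal(size=(2, 8)), np.zeros(8)
W = np.zeros((9, 2))
for _ in range(50):
    W = newton_step(features(X, V, b), W, Y)       # second-order block
    V, b = hidden_gd_step(X, V, b, W, Y)           # first-order block
print("final training loss:", loss(features(X, V, b), W, Y))
```

Note that the Hessian here has size (9 x 2) squared regardless of how the hidden features are produced, which mirrors the scaling claim in the abstract.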

A block preconditioner for an exact penalty formulation for stationary MHD

SIAM Journal on Scientific Computing

Phillips, Edward G.; Elman, Howard C.; Cyr, Eric C.; Shadid, John N.; Pawlowski, Roger P.

The magnetohydrodynamics (MHD) equations are used to model the flow of electrically conducting fluids in such applications as liquid metals and plasmas. This system of nonself-adjoint, nonlinear PDEs couples the Navier-Stokes equations for fluids and Maxwell's equations for electromagnetics. There has been recent interest in fully coupled solvers for the MHD system because they allow for fast steady-state solutions that do not require pseudo-time-stepping. When the fully coupled system is discretized, the strong coupling can make the resulting algebraic systems difficult to solve, requiring effective preconditioning of iterative methods for efficiency. In this work, we consider a finite element discretization of an exact penalty formulation for the stationary MHD equations posed in two-dimensional domains. This formulation has the benefit of implicitly enforcing the divergence-free condition on the magnetic field without requiring a Lagrange multiplier. We consider extending block preconditioning techniques developed for the Navier-Stokes equations to the full MHD system. We analyze operators arising in block decompositions from a continuous perspective and apply arguments based on the existence of approximate commutators to develop new preconditioners that account for the physical coupling. This results in a family of parameterized block preconditioners for both Picard and Newton linearizations. We develop an automated method for choosing the relevant parameters and demonstrate the robustness of these preconditioners for a range of the physical nondimensional parameters and with respect to mesh refinement.

A comparison of adjoint and data-centric verification techniques

Cyr, Eric C.; Shadid, John N.; Pawlowski, Roger P.

This document summarizes the results from a level 3 milestone study within the CASL VUQ effort. We compare the adjoint-based a posteriori error estimation approach with a recent variant of a data-centric verification technique. We provide a brief overview of each technique and then we discuss their relative advantages and disadvantages. We use Drekar::CFD to produce numerical results for steady-state Navier-Stokes and SARANS approximations.

A linearity preserving nodal variation limiting algorithm for continuous Galerkin discretization of ideal MHD equations

Journal of Computational Physics

Mabuza, Sibusiso M.; Shadid, John N.; Cyr, Eric C.; Pawlowski, Roger P.; Kuzmin, Dmitri

In this work, a stabilized continuous Galerkin (CG) method for magnetohydrodynamics (MHD) is presented. Ideal, compressible inviscid MHD equations are discretized in space on unstructured meshes using piecewise linear or bilinear finite element bases to get a semi-discrete scheme. Stabilization is then introduced to the semi-discrete method in a strategy that follows the algebraic flux correction paradigm. This involves adding some artificial diffusion to the high-order, semi-discrete method and mass lumping in the time derivative term. The result is a low-order method that provides local extremum diminishing properties for hyperbolic systems. The difference between the low-order method and the high-order method is scaled element-wise using a limiter and added to the low-order scheme. The limiter is solution dependent and computed via an iterative linearity-preserving nodal variation limiting strategy. The stabilization also involves an optional consistent background high-order dissipation that reduces phase errors. The resulting stabilized scheme is a semi-discrete method that can be applied to inviscid shock MHD problems and may even be extended to resistive and viscous MHD problems. To satisfy the divergence-free constraint of the MHD equations, we add parabolic divergence cleaning to the system. Various time integration methods can be used to discretize the scheme in time. We demonstrate the robustness of the scheme by solving several shock MHD problems.
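
The low-order half of the algebraic flux correction strategy (artificial diffusion plus mass lumping) can be sketched on a scalar model problem. The example below is an assumption-laden illustration, not the paper's MHD scheme: it uses 1D periodic linear advection with piecewise linear elements, the standard discrete-upwinding construction of the diffusion operator, and forward Euler in time. The limiter and the antidiffusive correction are omitted, and all function names are hypothetical.

```python
import numpy as np

def galerkin_advection(n, a=1.0):
    """P1 Galerkin matrices for u_t + a u_x = 0 on a periodic unit interval."""
    h = 1.0 / n
    I = np.eye(n)
    right = np.roll(I, 1, axis=1)    # right[i, (i + 1) % n] = 1
    left = np.roll(I, -1, axis=1)    # left[i, (i - 1) % n] = 1
    Mc = h / 6.0 * (4.0 * I + right + left)   # consistent mass matrix
    K = 0.5 * a * (left - right)              # k_ij = -a * int(phi_i * phi_j')
    return Mc, K

def low_order_operator(Mc, K):
    """Lump the mass matrix and add artificial diffusion D to K."""
    D = np.maximum(0.0, np.maximum(-K, -K.T))  # d_ij = max(-k_ij, 0, -k_ji)
    np.fill_diagonal(D, 0.0)
    np.fill_diagonal(D, -D.sum(axis=1))        # zero row sums: d_ii = -sum_j d_ij
    ml = Mc.sum(axis=1)                        # row-sum mass lumping
    return ml, K + D

def advance(ml, L, u, dt, steps):
    """Forward Euler on the lumped-mass low-order system ml * u' = L u."""
    u = u.copy()
    for _ in range(steps):
        u = u + dt * (L @ u) / ml
    return u

n = 50
Mc, K = galerkin_advection(n)
ml, L = low_order_operator(Mc, K)
u0 = np.where(np.arange(n) < 10, 1.0, 0.0)     # discontinuous initial profile
u = advance(ml, L, u0, dt=0.5 / n, steps=40)   # CFL number 0.5
print("min, max after transport:", u.min(), u.max())
```

Because the low-order operator has non-negative off-diagonal entries and zero row sums, each forward Euler update at a CFL number of at most one is a convex combination of neighboring values, so no new extrema appear. For this model problem the construction reduces to first-order upwinding, which is why a limited antidiffusive correction is needed to recover accuracy.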

A physics-informed operator regression framework for extracting data-driven continuum models

Computer Methods in Applied Mechanics and Engineering

Patel, Ravi G.; Trask, Nathaniel A.; Wood, Mitchell A.; Cyr, Eric C.

The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure which may be used to impose biases related to symmetry, isotropy, and conservation form. Here, we demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics we demonstrate this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.

A time-parallel method for the solution of PDE-constrained optimization problems

Ridzal, Denis R.; Cyr, Eric C.; Hajghassem, Mona H.

We study a time-parallel approach to solving quadratic optimization problems with linear time-dependent partial differential equation (PDE) constraints. These problems arise in formulations of optimal control, optimal design and inverse problems that are governed by parabolic PDE models. They may also arise as subproblems in algorithms for the solution of optimization problems with nonlinear time-dependent PDE constraints, e.g., in sequential quadratic programming methods. We apply a piecewise linear finite element discretization in space to the PDE constraint, followed by the Crank-Nicolson discretization in time. The objective function is discretized using finite elements in space and the trapezoidal rule in time. At this point in the discretization, auxiliary state variables are introduced at each discrete time interval, with the goal to enable: (i) a decoupling in time; and (ii) a fixed-point iteration to recover the solution of the discrete optimality system. The fixed-point iterative schemes can be used either as preconditioners for Krylov subspace methods or as smoothers for multigrid (in time) schemes. We present promising numerical results for both use cases.
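
The discretization steps named in this abstract can be sketched for a model problem. The example below covers only the sequential forward solve and the objective evaluation, not the time-parallel fixed-point iteration or the auxiliary state variables. It assumes a 1D heat equation on the unit interval with homogeneous Dirichlet conditions, a distributed control z entering through the mass matrix, and a tracking-type objective with weight alpha; these choices and all function names are illustrative assumptions.

```python
import numpy as np

def fe_matrices(n):
    """P1 mass and stiffness matrices on n interior nodes of (0, 1)."""
    h = 1.0 / (n + 1)
    I = np.eye(n)
    off = np.eye(n, k=1) + np.eye(n, k=-1)
    M = h / 6.0 * (4.0 * I + off)
    K = 1.0 / h * (2.0 * I - off)
    return M, K

def crank_nicolson(M, K, u0, z, dt):
    """Solve M u' + K u = M z with Crank-Nicolson.

    z has shape (nt + 1, n): one control vector per time level.
    Returns the state trajectory with the same shape.
    """
    A = M + 0.5 * dt * K
    B = M - 0.5 * dt * K
    U = [u0]
    for k in range(z.shape[0] - 1):
        rhs = B @ U[-1] + 0.5 * dt * (M @ (z[k] + z[k + 1]))
        U.append(np.linalg.solve(A, rhs))
    return np.array(U)

def objective(M, U, Ud, z, dt, alpha):
    """Tracking objective: M-norm in space, trapezoidal rule in time."""
    w = np.full(U.shape[0], dt)
    w[0] = w[-1] = 0.5 * dt                          # trapezoidal end weights
    track = np.einsum('ki,ij,kj->k', U - Ud, M, U - Ud)
    control = np.einsum('ki,ij,kj->k', z, M, z)
    return 0.5 * np.sum(w * (track + alpha * control))

n, nt, T = 49, 100, 0.1
dt = T / nt
M, K = fe_matrices(n)
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
z = np.zeros((nt + 1, n))                            # uncontrolled run
U = crank_nicolson(M, K, np.sin(np.pi * x), z, dt)
J = objective(M, U, np.zeros_like(U), z, dt, alpha=1e-2)
print("objective for zero control and zero target:", J)
```

With zero control the initial sine profile should decay at a rate close to exp(-pi^2 t), which gives a quick sanity check on the forward solve before any optimization is layered on top.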
