ADELUS: A Performance-Portable Dense LU Solver for Distributed-Memory Hardware-Accelerated Systems
Abstract not provided.
ACM International Conference Proceeding Series
With many recent advances in interconnect technologies and memory interfaces, disaggregated memory systems are approaching industrial adoption. For instance, the Gen-Z consortium focuses on a new memory-semantic protocol that enables fabric-attached memories (FAMs), where memory and other compute units can be attached directly to fabric interconnects. Decoupling memory from compute units becomes feasible as data-transfer rates increase with the emergence of novel interconnect technologies, such as silicon photonic interconnects. Disaggregated memories not only enable more efficient use of capacity (minimizing under-utilization) but also allow easy integration of evolving technologies. Additionally, they simplify the programming model while allowing efficient sharing of data. However, the latency of accessing data in FAMs depends on the latency imposed by the fabric interfaces. To reduce memory access latency and improve the performance of FAM systems, in this paper we explore techniques to prefetch data from FAMs to the local memory present in the node (PreFAM). Because memory access latency is high in FAMs, prefetching a single cache block (64 bytes) from FAM can be inefficient: a demand request to the same FAM location is likely to be issued before the prefetch completes. Hence, we explore predicting and prefetching FAM blocks at a distance, i.e., prefetching blocks that will be accessed in the future but not immediately. We show that, with prefetching, the performance of FAM architectures increases by 38.84%, while memory access latency improves by 39.6%, with only a 17.65% increase in the number of FAM accesses, on average. Further, by prefetching at a distance we show a performance improvement of 72.23%.
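The timing argument behind prefetching at a distance can be sketched with a toy model (all numbers hypothetical, not taken from the paper): for a sequential stream that demands one block every `access_interval` cycles from a FAM with latency `fam_latency`, a prefetch issued `distance` blocks ahead is timely only when `distance * access_interval >= fam_latency`.

```python
def prefetch_timeliness(fam_latency, access_interval, distance, n_blocks=1000):
    """Fraction of demand accesses that find their prefetched block already
    resident, for a purely sequential stream: a prefetch for block i+distance
    is issued when block i is demanded, and completes fam_latency cycles later."""
    hits = 0
    for i in range(n_blocks - distance):
        issue_time = i * access_interval
        arrival = issue_time + fam_latency
        demand_time = (i + distance) * access_interval
        if arrival <= demand_time:
            hits += 1
    return hits / (n_blocks - distance)
```

With a hypothetical 500-cycle FAM latency and one access per 100 cycles, distance-1 prefetches never arrive in time, while distance-5 prefetches always do, which mirrors the motivation for prefetching at a distance.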
International Conference on Simulation of Semiconductor Processes and Devices, SISPAD
We employ a fully charge-self-consistent quantum transport formalism, together with a heuristic elastic scattering model, to study the local density of states (LDOS) and the conductive properties of Si:P δ-layer wires at the cryogenic temperature of 4 K. The simulations allow us to explain the origin of shallow conducting sub-bands recently observed in high-resolution angle-resolved photoemission spectroscopy experiments. Our LDOS analysis shows that the free electrons are spatially separated into layers with different average kinetic energies, which, along with elastic scattering, must be accounted for to reproduce the sheet resistance values obtained over a wide range of δ-layer donor densities.
International Conference on Simulation of Semiconductor Processes and Devices, SISPAD
One big challenge for the emerging atomic precision advanced manufacturing (APAM) technology for microelectronics applications is to realize APAM devices that operate at room temperature (RT). We demonstrate that a semiclassical technology computer-aided design (TCAD) device simulation tool can be employed to understand current leakage and improve APAM device design for RT operation. To establish the applicability of semiclassical simulation, we first show that a semiclassical impurity scattering model with Fermi-Dirac statistics can explain the very low mobility in APAM devices quite well; we also show that semiclassical TCAD reproduces measured sheet resistances when proper mobility values are used. We then apply semiclassical TCAD to simulate current leakage in realistic APAM wires. With insights from modeling, we were able to improve the device design, fabricate Hall bars, and demonstrate RT operation for the first time.
International Conference on Simulation of Semiconductor Processes and Devices, SISPAD
We present a Physics-Informed Graph Neural Network (pigNN) methodology for rapid and automated compact model development. It brings together the inherent strengths of data-driven machine learning, high-fidelity physics from TCAD simulations, and knowledge contained in existing compact models. In this work, we focus on developing a neural network (NN) based compact model for a non-ideal PN diode, which represents one nonlinear edge in a pigNN graph. This model accurately captures the smooth transition between the exponential and quasi-linear response regions. By learning the voltage-dependent non-ideality factor with the NN and employing an inverse response function in the NN loss function, the model also accurately captures the voltage-dependent recombination effect. This NN compact model serves as a basis model for a PN diode, whether a single device or an isolated diode within a complex device identified by topological data analysis (TDA) methods. The pigNN methodology is also applicable to deriving reduced-order models in other engineering areas.
2020 IEEE High Performance Extreme Computing Conference, HPEC 2020
Canonical polyadic tensor decomposition using alternating Poisson regression (CP-APR) is an effective analysis tool for large sparse count datasets. One of its variants, using projected damped Newton optimization for row subproblems (PDNR), offers quadratic convergence and is amenable to parallelization. Despite its potential effectiveness, PDNR performance on modern high-performance computing (HPC) systems is not well understood. To remedy this, we have developed a parallel implementation of PDNR using Kokkos, a performance-portable parallel programming framework that supports efficient execution of a single code base on multiple HPC systems. We demonstrate that the performance of parallel PDNR can be poor if the load imbalance associated with the irregular distribution of nonzero entries in the tensor data is not addressed. Preliminary results using tensors from the FROSTT data set indicate that using multiple kernels to address this imbalance when solving the PDNR row subproblems in parallel can improve performance, with up to 80% speedup on CPUs and 10-fold speedup on NVIDIA GPUs.
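One common way to address the kind of nonzero-driven load imbalance described above (a generic illustration, not the paper's multi-kernel scheme) is a longest-processing-time greedy assignment of row subproblems to workers, weighted by nonzero count:

```python
import heapq

def balance_rows(row_nnz, n_workers):
    """Greedily assign rows to workers so total nonzeros per worker are
    balanced (longest-processing-time heuristic): always give the next
    heaviest row to the currently lightest worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for row in sorted(range(len(row_nnz)), key=lambda r: -row_nnz[r]):
        load, w = heapq.heappop(heap)
        assignment[w].append(row)
        heapq.heappush(heap, (load + row_nnz[row], w))
    return assignment
```

For a hypothetical tensor with one very dense row and many sparse ones, this keeps per-worker nonzero counts nearly equal instead of splitting rows round-robin.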
2020 IEEE High Performance Extreme Computing Conference, HPEC 2020
Tensor decomposition models play an increasingly important role in modern data science applications. One problem of particular interest is fitting a low-rank Canonical Polyadic (CP) tensor decomposition model when the tensor has sparse structure and the tensor elements are nonnegative count data. SparTen is a high-performance C++ library which computes a low-rank decomposition using different solvers: a first-order quasi-Newton or a second-order damped Newton method, along with the appropriate choice of runtime parameters. Since default parameters in SparTen are tuned to experimental results in prior published work on a single real-world dataset conducted using MATLAB implementations of these methods, it remains unclear if the parameter defaults in SparTen are appropriate for general tensor data. Furthermore, it is unknown how sensitive algorithm convergence is to changes in the input parameter values. This report addresses these unresolved issues with large-scale experimentation on three benchmark tensor data sets. Experiments were conducted on several different CPU architectures and replicated with many initial states to establish generalized profiles of algorithm convergence behavior.
Physical Review E
Intuition tells us that a rolling or spinning sphere will eventually stop due to the presence of friction and other dissipative interactions. The resistance to rolling and spinning or twisting torque that stops a sphere also changes the microstructure of a granular packing of frictional spheres by increasing the number of constraints on the degrees of freedom of motion. We perform discrete element modeling simulations to construct sphere packings implementing a range of frictional constraints under a pressure-controlled protocol. Mechanically stable packings are achievable at volume fractions and average coordination numbers as low as 0.53 and 2.5, respectively, when the particles experience high resistance to sliding, rolling, and twisting. Only when the particle model includes rolling and twisting friction were experimental volume fractions reproduced.
ESAIM: Control, Optimisation and Calculus of Variations
In this paper, we consider the optimal control of semilinear elliptic PDEs with random inputs. These problems are often nonconvex, infinite-dimensional stochastic optimization problems for which we employ risk measures to quantify the implicit uncertainty in the objective function. In contrast to previous works in uncertainty quantification and stochastic optimization, we provide a rigorous mathematical analysis demonstrating higher solution regularity (in stochastic state space), continuity and differentiability of the control-to-state map, and existence, regularity and continuity properties of the control-to-adjoint map. Our proofs make use of existing techniques from PDE-constrained optimization as well as concepts from the theory of measurable multifunctions. We illustrate our theoretical results with two numerical examples motivated by the optimal doping of semiconductor devices.
Applied Physics Reviews
Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.
INFORMS Journal on Computing
We provide a comprehensive overview of mixed-integer programming formulations for the unit commitment (UC) problem. UC formulations have been an especially active area of research over the past 12 years due to their practical importance in power grid operations, and this paper serves as a capstone for this line of work. We additionally provide publicly available reference implementations of all formulations examined. We computationally test existing and novel UC formulations on a suite of instances drawn from both academic and real-world data sources. Driven by our computational experience from this and previous work, we contribute some additional formulations for both generator production upper bounds and piecewise linear production costs. By composing new UC formulations using existing components found in the literature and new components introduced in this paper, we demonstrate that performance can be significantly improved—and in the process, we identify a new state-of-the-art UC formulation.
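To make the UC problem concrete, here is a deliberately tiny brute-force version (hypothetical costs and capacities; real UC formulations, including those in the paper, are mixed-integer programs with ramping limits, minimum up/down times, and piecewise linear costs, solved by MIP solvers rather than enumeration):

```python
from itertools import product

def unit_commitment_brute_force(demand, gens):
    """Toy unit commitment: enumerate all on/off schedules for a handful of
    generators and return the cheapest one meeting demand in every period.
    Each generator is (capacity, per-MW cost, no-load cost when committed)."""
    best_cost, best_plan = float("inf"), None
    n_periods, n_gens = len(demand), len(gens)
    for plan in product([0, 1], repeat=n_periods * n_gens):
        cost, feasible = 0.0, True
        for t in range(n_periods):
            on = [plan[t * n_gens + g] for g in range(n_gens)]
            cap = sum(gens[g][0] for g in range(n_gens) if on[g])
            if cap < demand[t]:
                feasible = False
                break
            # dispatch cheapest-first among committed units (merit order)
            remaining = demand[t]
            for g in sorted(range(n_gens), key=lambda j: gens[j][1]):
                if on[g]:
                    p = min(gens[g][0], remaining)
                    cost += p * gens[g][1] + gens[g][2]
                    remaining -= p
        if feasible and cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan
```

For two generators and two periods there are only sixteen schedules; the combinatorial explosion beyond toy sizes is exactly why formulation strength matters so much in practice.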
GPUs are now a fundamental accelerator for many high-performance computing applications and are viewed by many as a technology facilitator for the surge in fields like machine learning and convolutional neural networks. To deliver the best performance on a GPU, we need monitoring tools that ensure the code is optimized for maximum performance and efficiency. Since NVIDIA GPUs are currently the most common in HPC applications and systems, NVIDIA tools are the natural choice for performance monitoring. The Lightweight Distributed Metric Service (LDMS) at Sandia is an infrastructure widely adopted for large-scale system and application monitoring. Sandia has developed CPU application monitoring capability within LDMS, so we chose to develop a GPU monitoring capability within the same framework. In this report, we discuss the current limitations of the NVIDIA monitoring tools and how we overcame them, and we present an overview of the tool we built to monitor GPU performance in LDMS and its capabilities. We also discuss our current validation results: most performance counter values collected through LDMS match those reported by the vendor tools. Furthermore, our tool provides these statistics as a time series over the entire application run, not just aggregate statistics at the end, allowing the user to follow the behavior of an application throughout its lifetime.
Journal of Numerical Mathematics
This paper provides an overview of the new features of the finite element library deal.II, version 9.2.
Journal of Materials Research
Hydrogen lithography has been used to template phosphine-based surface chemistry to fabricate atomic-scale devices, a process we abbreviate as atomic precision advanced manufacturing (APAM). Here, we use mid-infrared variable angle spectroscopic ellipsometry (IR-VASE) to characterize single-nanometer thickness phosphorus dopant layers (δ-layers) in silicon made using APAM compatible processes. A large Drude response is directly attributable to the δ-layer and can be used for nondestructive monitoring of the condition of the APAM layer when integrating additional processing steps. The carrier density and mobility extracted from our room temperature IR-VASE measurements are consistent with cryogenic magneto-transport measurements, showing that APAM δ-layers function at room temperature. Finally, the permittivity extracted from these measurements shows that the doping in the APAM δ-layers is so large that their low-frequency in-plane response is reminiscent of a silicide. However, there is no indication of a plasma resonance, likely due to reduced dimensionality and/or low scattering lifetime.
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including graph mining, recommender systems and natural language processing. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed. However, existing parallel NMF algorithms have not addressed data locality optimizations, which are critical for high performance since data movement costs greatly exceed the cost of arithmetic/logic operations on current computer systems. In this paper, we present a novel optimization method for parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality. Efficient realizations of the algorithm on multi-core CPUs and GPUs are developed, demonstrating a new Accelerated Locality-Optimized NMF (ALO-NMF) that obtains up to 2.29x lower data movement cost and up to 4.45x speedup over existing state-of-the-art parallel NMF algorithms.
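For reference, the baseline HALS update that ALO-NMF reorganizes for locality looks like the following un-optimized NumPy sketch (the locality transformations themselves are not reproduced here):

```python
import numpy as np

def hals_nmf(X, rank, n_iter=500, eps=1e-10, seed=0):
    """Plain (un-optimized) HALS: update one factor column at a time, keeping
    everything nonnegative. W and H are updated alternately; H.T is a NumPy
    view of H, so in-place column updates on H.T modify H itself."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        for A, B, T in ((W, H, X), (H.T, W.T, X.T)):
            G = B @ B.T          # rank x rank Gram matrix
            P = T @ B.T          # tall matrix of cross products
            for k in range(rank):
                # column update: A[:,k] <- max(eps, A[:,k] + (P[:,k] - A G[:,k]) / G[k,k])
                A[:, k] = np.maximum(
                    eps, A[:, k] + (P[:, k] - A @ G[:, k]) / max(G[k, k], eps))
    return W, H
```

Note that each sweep touches X once per factor; reducing that data movement is precisely the opportunity ALO-NMF exploits.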
Journal of Chemical Physics
We present a scale-bridging approach based on a multi-fidelity (MF) machine-learning (ML) framework that leverages Gaussian processes (GP) to fuse atomistic computational model predictions across multiple levels of fidelity. Through the posterior variance of the MFGP, our framework naturally enables uncertainty quantification, providing estimates of confidence in the predictions. Density functional theory serves as the high-fidelity model, while an ML interatomic potential serves as the low-fidelity model. Practical materials-design efficiency is demonstrated by reproducing the ternary composition dependence of a quantity of interest (bulk modulus) across the full aluminum-niobium-titanium ternary random-alloy composition space. The MFGP is then coupled to a Bayesian optimization procedure, and the computational efficiency of this approach is demonstrated by performing an on-the-fly search for the global optimum of the bulk modulus in the ternary composition space. The framework presented in this manuscript is the first application of MFGP to atomistic materials simulations that fuses predictions between density functional theory and classical interatomic potential calculations.
ACM International Conference Proceeding Series
The sparse triangular solve is an important kernel in many computational applications. However, a fast, parallel sparse triangular solver on a manycore architecture such as a GPU has been an open issue in the field for several years. In this paper, we develop a sparse triangular solver that takes advantage of the supernodal structure of the triangular matrices that come from the direct factorization of a sparse matrix. We implemented our solver using Kokkos and Kokkos Kernels so that it is portable to different manycore architectures. This has the additional benefit of allowing our triangular solver to use team-level kernels and take advantage of the hierarchical parallelism available on the GPU. We compare the effects of different scheduling schemes on performance and also investigate an algorithmic variant called the partitioned inverse. Our performance results on an NVIDIA V100 or P100 GPU demonstrate that our implementation can be 12.4× or 19.5× faster than the vendor-optimized implementation in NVIDIA's cuSPARSE library.
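A common way to expose the parallelism such a solver exploits is level scheduling; the sketch below (a generic illustration, not the supernodal Kokkos implementation) groups independent rows of a dense-stored lower-triangular matrix into levels that could be processed concurrently:

```python
import numpy as np

def level_schedule(L):
    """Group rows of a lower-triangular matrix into levels: a row's level is
    one more than the max level of the rows its off-diagonal nonzeros touch.
    Rows in the same level have no dependencies among themselves."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = [j for j in range(i) if L[i, j] != 0]
        if deps:
            level[i] = 1 + max(level[j] for j in deps)
    return [np.flatnonzero(level == l) for l in range(level.max() + 1)]

def triangular_solve_by_levels(L, b):
    """Forward substitution processed level by level; on a GPU each level's
    rows would be solved concurrently."""
    x = np.zeros_like(b, dtype=float)
    for rows in level_schedule(L):
        for i in rows:  # conceptually parallel within a level
            x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x
```

Supernodal solvers go further by fusing dense blocks of columns, which replaces many fine-grained row updates with efficient team-level dense kernels.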
Journal of Physics Condensed Matter
Efficient algorithms for the calculation of minimum energy paths of magnetic transitions are implemented within the geodesic nudged elastic band (GNEB) approach. While an objective function is not available for GNEB and a traditional line search therefore cannot be performed, the use of limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) and conjugate gradient algorithms in conjunction with the orthogonal spin optimization (OSO) approach is shown to greatly outperform the previously used velocity projection and dissipative Landau-Lifshitz dynamics optimization methods. The implementation makes use of energy-weighted springs for the distribution of the discretization points along the path, which is found to improve performance significantly. The various methods are applied to several test problems using a Heisenberg-type Hamiltonian, extended in some cases to include Dzyaloshinskii-Moriya and exchange interactions beyond nearest neighbours. Minimum energy paths are found for magnetization reversals in a nano-island, collapse of skyrmions in two-dimensional layers, and annihilation of a chiral bobber near the surface of a three-dimensional magnet. The LBFGS-OSO method is found to outperform the dynamics-based approaches by up to a factor of 8 in some cases.
Journal of Applied Physics
The Corrected Rigid Spheres (CRIS) equation of state (EOS) model [Kerley, J. Chem. Phys. 73, 469 (1980); 73, 478 (1980); 73, 487 (1980)], developed from fluid perturbation theory using a hard sphere reference system, has been successfully used to calculate the EOS of many materials, including gases and metals. The radial distribution function (RDF) plays a pivotal role in choosing the sphere diameter, through a variational principle, as well as the thermodynamic response. Despite its success, the CRIS model has some shortcomings in that it predicts too large a temperature for liquid-vapor critical points, can break down at large compression, and is computationally expensive. We first demonstrate that an improved analytic representation of the hard sphere RDF does not alleviate these issues. Relaxing the strict adherence of the RDF to hard spheres allows an accurate fit to the isotherms and vapor dome of the Lennard-Jones fluid using an arbitrary reference system. The second order correction is eliminated, limiting the breakdown at large compression and significantly reducing the computation cost. The transferability of the new model to real systems is demonstrated on argon, with an improved vapor dome compared to the original CRIS model.
Journal of Chemical Physics
Hyperdynamics (HD) is a method for accelerating the timescale of standard molecular dynamics (MD). It can be used for simulations of systems with an energy potential landscape that is a collection of basins, separated by barriers, where transitions between basins are infrequent. HD enables the system to escape from a basin more quickly while enabling a statistically accurate renormalization of the simulation time, thus effectively boosting the timescale of the simulation. In the work of Kim et al. [J. Chem. Phys. 139, 144110 (2013)], a local version of HD was formulated, which exploits the intrinsic locality characteristic typical of most systems to mitigate the poor scaling properties of standard HD as the system size is increased. Here, we discuss how both HD and local HD can be formulated to run efficiently in parallel. We have implemented these ideas in the LAMMPS MD code, which means HD can be used with any interatomic potential LAMMPS supports. Together, these parallel methods allow simulations of any size to achieve the time acceleration offered by HD (which can be orders of magnitude), at a cost of 2-4× that of standard MD. As examples, we performed two simulations of a million-atom system to model the diffusion and clustering of Pt adatoms on a large patch of the Pt(100) surface for 80 μs and 160 μs.
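The time renormalization at the heart of hyperdynamics is simple to state: each MD step taken on the biased potential advances the physical clock by dt·exp(ΔV/kT), where ΔV is the bias energy at that configuration. A minimal bookkeeping sketch (illustrative only; LAMMPS performs this accounting internally):

```python
import math

def hyperdynamics_clock(bias_energies, dt, kT):
    """Accumulate the renormalized (boosted) simulation time in hyperdynamics.
    bias_energies: bias potential value at each MD step (zero outside the
    bias region, so those steps advance the clock by plain dt).
    Returns (total boosted time, average boost factor)."""
    t_hyper = 0.0
    for v_bias in bias_energies:
        t_hyper += dt * math.exp(v_bias / kT)
    boost = t_hyper / (dt * len(bias_energies))
    return t_hyper, boost
```

With zero bias the clock reduces to ordinary MD time; a constant bias of kT·ln(10) yields a tenfold boost, illustrating how orders-of-magnitude acceleration arises from modest bias energies.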
IEEE Power and Energy Society General Meeting
This paper presents model formulations for generators that have the ability to use multiple fuels and to switch between them if necessary. These models are used to generate different scenarios of fuel switching penetration from a test power system. With these scenarios, for a severe disruption in the fuel supply to multiple generators, the paper analyzes the effect that fuel switching has on the resilience of the power system. Load not served is used as the proxy metric to evaluate power system resilience. The paper shows that the presence of generators with fuel switching capabilities considerably reduces the amount and duration of the load shed by the system facing the fuel disruption.
Acta Materialia
Determining a process–structure–property relationship is the holy grail of materials science, where both computational prediction in the forward direction and materials design in the inverse direction are essential. Problems in materials design are often considered in the context of the process–property linkage, bypassing the materials structure, or in the context of the structure–property linkage, as in microstructure-sensitive design problems. However, there has been little research effort on materials design problems in the context of the process–structure linkage, which has great implications for reverse engineering. In this work, given a target microstructure, we propose an active learning high-throughput microstructure calibration framework to derive a set of processing parameters that can produce an optimal microstructure statistically equivalent to the target microstructure. The proposed framework is formulated as a noisy multi-objective optimization problem, where each objective function measures a deterministic or statistical difference of the same microstructure descriptor between a candidate microstructure and the target microstructure. Furthermore, to significantly reduce the wall-clock time spent waiting on simulations, we enable the high-throughput feature of the microstructure calibration framework by adopting asynchronous parallel Bayesian optimization that exploits high-performance computing resources. Case studies in additive manufacturing and grain growth are used to demonstrate the applicability of the proposed framework, where kinetic Monte Carlo (kMC) simulation is used as the forward predictive model, such that for a given target microstructure, the processing parameters that produced it are successfully recovered.
Journal of Computational Physics
The study of hypersonic flows and their underlying aerothermochemical reactions is particularly important in the design and analysis of vehicles exiting and reentering Earth's atmosphere. Computational physics codes can be employed to simulate these phenomena; however, code verification of these codes is necessary to certify their credibility. To date, few approaches have been presented for verifying codes that simulate hypersonic flows, especially flows reacting in thermochemical nonequilibrium. In this work, we present our code-verification techniques for verifying the spatial accuracy and thermochemical source term in hypersonic reacting flows in thermochemical nonequilibrium. Additionally, we demonstrate the effectiveness of these techniques on the Sandia Parallel Aerodynamics and Reentry Code (SPARC).
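The core of such order verification is comparing discretization errors at two resolutions against the (manufactured) exact solution. A minimal sketch of that step (the stand-in "solver" here is just a central difference on a known function; SPARC's discretizations are of course far richer):

```python
import numpy as np

def observed_order(u_exact, error_of, h_coarse, refine=2.0):
    """Order-verification step used with manufactured solutions: measure the
    discretization error at two grid spacings and report the observed order
    p = log(e_coarse / e_fine) / log(refinement ratio)."""
    e_coarse = error_of(h_coarse, u_exact)
    e_fine = error_of(h_coarse / refine, u_exact)
    return np.log(e_coarse / e_fine) / np.log(refine)

def central_diff_error(h, u_exact):
    """Stand-in error measure: max error of a second-order central difference
    for u'(x) on (0, 1). Assumes u_exact = np.sin so the exact derivative
    is np.cos (a hypothetical choice for illustration)."""
    x = np.arange(h, 1.0, h)
    approx = (u_exact(x + h) - u_exact(x - h)) / (2 * h)
    return np.max(np.abs(approx - np.cos(x)))
```

A second-order scheme should report p close to 2; a deviation flags either a discretization bug or an inconsistently implemented source term, which is exactly what this style of verification is designed to catch.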
The propagation of a wave pulse due to low-speed impact on a one-dimensional, heterogeneous bar is studied. Due to the dispersive character of the medium, the pulse attenuates as it propagates. This attenuation is studied over propagation distances that are much longer than the size of the microstructure. A homogenized peridynamic material model can be calibrated to reproduce the attenuation and spreading of the wave. The calibration consists of matching the dispersion curve for the heterogeneous material in the limit of long and moderately long wavelengths. It is demonstrated that the peridynamic method reproduces the attenuation of wave pulses predicted by an exact microstructural model over large propagation distances.
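The long-wavelength calibration described above can be stated concretely using the standard dispersion relation of 1-D linearized bond-based peridynamics (quoted here as background; the abstract itself does not specify the micromodulus). For a micromodulus C(ξ) and horizon δ,

```latex
\omega^2(\kappa) = \frac{1}{\rho}\int_{-\delta}^{\delta} C(\xi)\,\bigl(1-\cos(\kappa\xi)\bigr)\,\mathrm{d}\xi ,
\qquad
1-\cos(\kappa\xi) \approx \tfrac{1}{2}\kappa^2\xi^2
\;\Rightarrow\;
\omega^2 \approx c^2\kappa^2 ,\quad
c^2 = \frac{1}{2\rho}\int_{-\delta}^{\delta} C(\xi)\,\xi^2\,\mathrm{d}\xi .
```

The leading term recovers the classical wave speed; matching the next-order behavior in κ against the dispersion curve of the exact microstructural model is what fixes C(ξ) in a calibration of this kind.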
Leibniz International Proceedings in Informatics, LIPIcs
We present a classical approximation algorithm for the MAX-2-Local Hamiltonian problem. This is a maximization version of the QMA-complete 2-Local Hamiltonian problem in quantum computing, with the additional assumption that each local term is positive semidefinite. The MAX-2-Local Hamiltonian problem generalizes NP-hard constraint satisfaction problems, and our results may be viewed as generalizations of approximation approaches for the MAX-2-CSP problem. We work in the product state space and extend the framework of Goemans and Williamson for approximating MAX-2-CSPs. The key difference is that in the product state setting, a solution consists of a set of normalized 3-dimensional vectors rather than Boolean values, and we leverage approximation results for rank-constrained Grothendieck inequalities. For MAX-2-Local Hamiltonian we achieve an approximation ratio of 0.328. This is the first example of an approximation algorithm beating the random quantum assignment ratio of 0.25 by a constant factor.
ACM International Conference Proceeding Series
Deep learning networks have become a vital tool for image and data processing tasks for deployed and edge applications. Resource constraints, particularly low power budgets, have motivated methods and devices for efficient on-edge inference. Two promising methods are reduced precision communication networks (e.g. binary activation spiking neural networks) and weight pruning. In this paper, we provide a preliminary exploration for combining these two methods, specifically in-training weight pruning of whetstone networks, to achieve deep networks with both sparse weights and binary activations.
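The two ingredients being combined can be sketched independently (a generic illustration; Whetstone's activation-sharpening schedule and the paper's in-training pruning schedule are not reproduced here):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Magnitude pruning step: zero out the smallest-magnitude fraction of
    weights and return a mask that can keep them zero in later training."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k] if k > 0 else -np.inf
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def binary_activation(x, threshold=0.0):
    """Hard binary (spiking-style) activation: emit 1 above threshold, else 0."""
    return (x > threshold).astype(x.dtype)
```

Combining them in-training means applying the mask after every weight update while the activations are gradually sharpened toward this hard binary form, yielding a network with both sparse weights and binary activations.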
ACM International Conference Proceeding Series
While dragonflies are well known for their high success rates when hunting prey, how the underlying neural circuitry generates the interception trajectories dragonflies use to hunt remains an open question. I present a model of dragonfly prey interception that uses a neural network to calculate motor commands for prey interception. The model uses the motor outputs of the neural network to internally generate a forward model of prey-image translation resulting from the dragonfly's own turning, which can then serve as a feedback guidance signal, resulting in trajectories whose final approaches are very similar to proportional navigation. The neural network is biologically plausible and can therefore be compared against in vivo neural responses in the biological dragonfly, yet parsimonious enough that the algorithm can be implemented without requiring specialized hardware.
ACM International Conference Proceeding Series
The highly parallel, spiking neural networks of neuromorphic processors can enable computationally powerful formulations. While recent interest has focused primarily on machine learning tasks, the space of appropriate applications is wide and continually expanding. Here, we leverage the parallel and event-driven structure to solve a steady-state heat equation using a random walk method. The random walk can be executed fully within a spiking neural network using stochastic neuron behavior, and we provide results from both IBM TrueNorth and Intel Loihi implementations. Additionally, we position this algorithm as a potential scalable benchmark for neuromorphic systems.
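The Monte Carlo principle behind the spiking implementation can be illustrated off-chip: the steady-state temperature at a site equals the expected boundary value reached by unbiased random walkers started there. A 1-D sketch (illustrative; the TrueNorth and Loihi implementations realize the walk with stochastic neuron behavior):

```python
import random

def heat_by_random_walk(x_start, n_sites, u_left, u_right,
                        n_walkers=20000, seed=1):
    """Estimate the steady-state temperature at one interior site of a 1-D
    rod with fixed end temperatures by averaging the boundary value reached
    by many unbiased random walkers."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walkers):
        x = x_start
        while 0 < x < n_sites - 1:  # walk until a boundary absorbs the walker
            x += rng.choice((-1, 1))
        total += u_left if x == 0 else u_right
    return total / n_walkers
```

For a rod discretized into 11 sites with ends held at 0 and 1, the analytic steady state is linear, so site 3 should report roughly 0.3; the estimate converges as more walkers (or, on neuromorphic hardware, more parallel spiking neurons) are used.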
Journal of Theoretical Biology
Reaction-diffusion systems have been widely used to study spatio-temporal phenomena in cell biology, such as cell polarization. Coupled bulk-surface models naturally include compartmentalization of cytosolic and membrane-bound polarity molecules. Here we study the distribution of the polarity protein Cdc42 in a mass-conserved membrane-bulk model, and explore the effects of diffusion and spatial dimensionality on spatio-temporal pattern formation. We first analyze a one-dimensional (1-D) model for Cdc42 oscillations in fission yeast, consisting of two diffusion equations in the bulk domain coupled to nonlinear ODEs for binding kinetics at each end of the cell. In 1-D, our analysis reveals the existence of symmetric and asymmetric steady states, as well as anti-phase relaxation oscillations typical of slow-fast systems. We then extend our analysis to a two-dimensional (2-D) model with circular bulk geometry, for which species can either diffuse inside the cell or become bound to the membrane and undergo a nonlinear reaction-diffusion process. We also consider a nonlocal system of PDEs approximating the dynamics of the 2-D membrane-bulk model in the limit of fast bulk diffusion. In all three model variants we find that mass conservation selects perturbations of spatial modes that simply redistribute mass. In 1-D, only anti-phase oscillations between the two ends of the cell can occur, and in-phase oscillations are excluded. In higher dimensions, no radially symmetric oscillations are observed. Instead, the only instabilities are symmetry-breaking, either corresponding to stationary Turing instabilities, leading to the formation of stationary patterns, or to oscillatory Turing instabilities, leading to traveling and standing waves. Codimension-two Bogdanov–Takens bifurcations occur when the two distinct instabilities coincide, causing traveling waves to slow down and to eventually become stationary patterns. 
Our work clarifies the effect of geometry and dimensionality on behaviors observed in mass-conserved cell polarity models.
IEEE Computer Architecture Letters
Neural network (NN) inference is an essential part of modern systems and is found at the heart of numerous applications ranging from image recognition to natural language processing. In situ NN accelerators can efficiently perform NN inference using resistive crossbars, which makes them a promising solution to the data movement challenges faced by conventional architectures. Although such accelerators demonstrate significant potential for dense NNs, they often do not benefit from sparse NNs, which contain relatively few non-zero weights. Processing sparse NNs on in situ accelerators results in wasted energy to charge the entire crossbar where most elements are zeros. To address this limitation, this letter proposes Granular Matrix Reordering (GMR): a preprocessing technique that enables an energy-efficient computation of sparse NNs on in situ accelerators. GMR reorders the rows and columns of sparse weight matrices to maximize the crossbars' utilization and minimize the total number of crossbars needed to be charged. The reordering process does not rely on sparsity patterns and incurs no accuracy loss. Overall, GMR achieves an average of 28 percent and up to 34 percent reduction in energy consumption over seven pruned NNs across four different pruning methods and network architectures.
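The energy accounting behind GMR can be made concrete with a toy model: a crossbar only needs charging if its tile of the weight matrix contains at least one nonzero, so reordering rows to group similar sparsity patterns reduces the number of active tiles. The sketch below is a simplified stand-in (the actual GMR method also permutes columns and uses a more sophisticated grouping); the function names and greedy row sort are assumptions for illustration.

```python
def count_active_tiles(m, tile=2):
    """Count tile x tile crossbar blocks containing at least one nonzero weight."""
    rows, cols = len(m), len(m[0])
    active = 0
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            if any(m[r][c] for r in range(r0, min(r0 + tile, rows))
                           for c in range(c0, min(c0 + tile, cols))):
                active += 1
    return active

def reorder_rows_by_pattern(m):
    """Greedy stand-in for GMR: sort rows by their nonzero column pattern so
    rows with nonzeros in the same columns land in the same tile row."""
    return sorted(m, key=lambda row: tuple(1 if v else 0 for v in row), reverse=True)
```

On a 4x4 matrix whose nonzeros alternate between the first and last column, this reordering halves the number of 2x2 tiles that must be charged, mirroring (in miniature) the crossbar-utilization gains the letter reports.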
JPhys Materials
Atomic precision advanced manufacturing (APAM) offers creation of donor devices in an atomically thin layer doped beyond the solid solubility limit, enabling unique device physics. This presents an opportunity to use APAM as a pathfinding platform to investigate digital electronics at the atomic limit. Scaling to smaller transistors is increasingly difficult and expensive, necessitating the investigation of alternative fabrication paths that extend to the atomic scale. APAM donor devices can be created using a scanning tunneling microscope (STM). However, these devices are not currently compatible with industry standard fabrication processes. There exists a tradeoff between low thermal budget (LT) processes to limit dopant diffusion and high thermal budget (HT) processes to grow defect-free layers of epitaxial Si and gate oxide. To this end, we have developed an LT epitaxial Si cap and LT deposited Al2O3 gate oxide integrated with an atomically precise single-electron transistor (SET) that we use as an electrometer to characterize the quality of the gate stack. The surface-gated SET exhibits the expected Coulomb blockade behavior. However, the gate’s leverage over the SET is limited by defects in the layers above the SET, including interfaces between the Si and oxide, and structural and chemical defects in the Si cap. We propose a more sophisticated gate stack and process flow that is predicted to improve performance in future atomic precision devices.
Engineering with Computers
In finite element simulations, not all of the data are of equal importance. In fact, the primary purpose of a numerical study is often to accurately assess only one or two engineering output quantities that can be expressed as functionals. Adjoint-based error estimation provides a means to approximate the discretization error in functional quantities and mesh adaptation provides the ability to control this discretization error by locally modifying the finite element mesh. In the past, adjoint-based error estimation has only been accessible to expert practitioners in the field of solid mechanics. In this work, we present an approach to automate the process of adjoint-based error estimation and mesh adaptation on parallel machines. This process is intended to lower the barrier of entry to adjoint-based error estimation and mesh adaptation for solid mechanics practitioners. We demonstrate that this approach is effective for example problems in Poisson’s equation, nonlinear elasticity, and thermomechanical elastoplasticity.
Computer Methods in Applied Mechanics and Engineering
Standard approaches for uncertainty quantification in cardiovascular modeling pose challenges due to the large number of uncertain inputs and the significant computational cost of realistic three-dimensional simulations. We propose an efficient uncertainty quantification framework utilizing a multilevel multifidelity Monte Carlo (MLMF) estimator to improve the accuracy of hemodynamic quantities of interest while maintaining reasonable computational cost. This is achieved by leveraging three cardiovascular model fidelities, each with varying spatial resolution to rigorously quantify the variability in hemodynamic outputs. We employ two low-fidelity models (zero- and one-dimensional) to construct several different estimators. Our goal is to investigate and compare the efficiency of estimators built from combinations of these two low-fidelity model alternatives and our high-fidelity three-dimensional models. We demonstrate this framework on healthy and diseased models of aortic and coronary anatomy, including uncertainties in material property and boundary condition parameters. Our goal is to demonstrate that for this application it is possible to accelerate the convergence of the estimators by utilizing a MLMF paradigm. Therefore, we compare our approach to single fidelity Monte Carlo estimators and to a multilevel Monte Carlo approach based only on three-dimensional simulations, but leveraging multiple spatial resolutions. We demonstrate significant, on the order of 10 to 100 times, reduction in total computational cost with the MLMF estimators. We also examine the differing properties of the MLMF estimators in healthy versus diseased models, as well as global versus local quantities of interest. As expected, global quantities such as outlet pressure and flow show larger reductions than local quantities, such as those relating to wall shear stress, as the latter rely more heavily on the highest fidelity model evaluations. 
Similarly, healthy models show larger reductions than diseased models. In all cases, our workflow coupling Dakota's MLMF estimators with the SimVascular cardiovascular modeling framework makes uncertainty quantification feasible for constrained computational budgets.
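The core trade the MLMF estimator exploits can be shown with a two-level toy version: spend the budget on many cheap low-fidelity samples and correct the bias with a few expensive high-fidelity ones. This is a generic two-level Monte Carlo sketch, not the Dakota/SimVascular workflow; the toy model functions, sample counts, and names are assumptions.

```python
import random

def two_level_mc(f_lo, f_hi, n_lo=20000, n_hi=500, seed=0):
    """Two-level Monte Carlo estimate of E[f_hi(U)], U ~ Uniform(0,1):
    many cheap samples estimate E[f_lo], and a few expensive samples
    estimate the correction E[f_hi - f_lo]. The telescoping sum keeps the
    estimator unbiased even when f_lo is a biased surrogate."""
    rng = random.Random(seed)
    lo = sum(f_lo(rng.random()) for _ in range(n_lo)) / n_lo
    xs = [rng.random() for _ in range(n_hi)]
    corr = sum(f_hi(x) - f_lo(x) for x in xs) / n_hi
    return lo + corr
```

When the low-fidelity model is well correlated with the high-fidelity one, the correction term has small variance, so only a handful of expensive evaluations are needed; this is the mechanism behind the 10x to 100x cost reductions reported above.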
Proceedings of the ACM SIGMOD International Conference on Management of Data
Given an input stream S of size N, a φ-heavy hitter is an item that occurs at least φN times in S. The problem of finding heavy hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = φN-th occurrence (and hence becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams, and with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space (Ω(N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O-bandwidth (not latency) and support a tunable trade-off between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., ∼100K observations per second.
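For context on the in-RAM baseline the abstract contrasts against, the classic Misra-Gries summary is shown below: it keeps at most k-1 counters, guarantees that every item occurring more than N/k times survives, but admits false positives and undercounts, which is exactly the accuracy trade-off the paper's external-memory structures avoid. This is the textbook algorithm, not the paper's data structure.

```python
def misra_gries(stream, k):
    """Misra-Gries heavy-hitters summary using at most k-1 counters.
    Every item with true frequency > N/k is guaranteed to remain in the
    summary; counts are underestimates by at most N/k."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all, evicting any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

A second pass over the stream (when the data can be replayed) removes the false positives by verifying exact counts; the TED setting forbids such multi-pass fixes, which motivates the external-memory approach.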
Journal of Computational Physics
In this work, a stabilized continuous Galerkin (CG) method for magnetohydrodynamics (MHD) is presented. Ideal, compressible inviscid MHD equations are discretized in space on unstructured meshes using piecewise linear or bilinear finite element bases to get a semi-discrete scheme. Stabilization is then introduced to the semi-discrete method in a strategy that follows the algebraic flux correction paradigm. This involves adding some artificial diffusion to the high order, semi-discrete method and mass lumping in the time derivative term. The result is a low order method that provides local extremum diminishing properties for hyperbolic systems. The difference between the low order method and the high order method is scaled element-wise using a limiter and added to the low order scheme. The limiter is solution dependent and computed via an iterative linearity preserving nodal variation limiting strategy. The stabilization also involves an optional consistent background high order dissipation that reduces phase errors. The resulting stabilized scheme is a semi-discrete method that can be applied to inviscid shock MHD problems and may even be extended to resistive and viscous MHD problems. To satisfy the divergence free constraint of the MHD equations, we add parabolic divergence cleaning to the system. Various time integration methods can be used to discretize the scheme in time. We demonstrate the robustness of the scheme by solving several shock MHD problems.
Leibniz International Proceedings in Informatics, LIPIcs
We study a trajectory analysis problem we call the Trajectory Capture Problem (TCP), in which, for a given input set T of trajectories in the plane, and an integer k ≥ 2, we seek to compute a set of k points ("portals") to maximize the total weight of all subtrajectories of T between pairs of portals. This problem naturally arises in trajectory analysis and summarization. We show that the TCP is NP-hard (even in very special cases) and give some first approximation results. Our main focus is on attacking the TCP with practical algorithm-engineering approaches, including integer linear programming (to solve instances to provable optimality) and local search methods. We study the integrality gap arising from such approaches. We analyze our methods on different classes of data, including benchmark instances that we generate. Our goal is to understand the best performing heuristics, based on both solution time and solution quality. We demonstrate that we are able to compute provably optimal solutions for real-world instances. 2012 ACM Subject Classification: Theory of computation → Design and analysis of algorithms.
Optimization Letters
We present SUSPECT, an open source toolkit that symbolically analyzes mixed-integer nonlinear optimization problems formulated using the Python algebraic modeling library Pyomo. We present the data structures and algorithms used to implement SUSPECT. SUSPECT works on a directed acyclic graph representation of the optimization problem to perform: bounds tightening, bound propagation, monotonicity detection, and convexity detection. We show how the tree-walking rules in SUSPECT balance the need for lightweight computation with effective special structure detection. SUSPECT can be used as a standalone tool or as a Python library to be integrated in other tools or solvers. We highlight the easy extensibility of SUSPECT with several recent convexity detection tricks from the literature. We also report experimental results on the MINLPLib 2 dataset.
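The bounds-propagation step SUSPECT performs on the expression DAG can be illustrated with a minimal interval-arithmetic walk over a toy expression tree. This is a generic sketch of the technique, not SUSPECT's API: the tuple-based node encoding and function names are assumptions, and SUSPECT operates on Pyomo's DAG representation with many more operator rules.

```python
def add_iv(a, b):
    # [a0, a1] + [b0, b1] = [a0 + b0, a1 + b1]
    return (a[0] + b[0], a[1] + b[1])

def mul_iv(a, b):
    # Product interval is the min/max over the four endpoint products.
    prods = [a[i] * b[j] for i in (0, 1) for j in (0, 1)]
    return (min(prods), max(prods))

def propagate(node, var_bounds):
    """Bottom-up interval bound propagation over a tiny expression tree.
    Nodes: ('var', name), ('const', c), ('+', left, right), ('*', left, right)."""
    op = node[0]
    if op == 'var':
        return var_bounds[node[1]]
    if op == 'const':
        return (node[1], node[1])
    left = propagate(node[1], var_bounds)
    right = propagate(node[2], var_bounds)
    return add_iv(left, right) if op == '+' else mul_iv(left, right)
```

For x*y + 2 with x in [-1, 2] and y in [0, 3], the walk yields the bound [-1, 8]; monotonicity and convexity detection follow the same bottom-up tree-walking pattern with different per-operator rules.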