Publications Search

We describe a new approach to computing that moves towards the limits of nanotechnology using a newly formulated scaling rule. This is in contrast to the current computer industry scaling away from von Neumann's original computer at the rate of Moore's Law. We extend Moore's Law to 3D, which leads generally to architectures that integrate logic and memory. To keep power dissipation constant through a 2D surface of the 3D structure requires using adiabatic principles. We call our newly proposed architecture Processor In Memory and Storage (PIMS). We propose a new computational model that integrates processing and memory into "tiles" that comprise logic, memory/storage, and communications functions. Since the programming model will be relatively stable as a system scales, programs represented by tiles could be executed in a PIMS system built with today's technology or could become the "schematic diagram" for implementation in an ultimate 3D nanotechnology of the future. We build a systems software approach that offers advantages over and above the technological and architectural advantages. First, the algorithms may be more efficient in the conventional sense of having fewer steps. Second, the algorithms may run with higher power efficiency per operation by being a better match for the adiabatic scaling rule. The performance analysis based on demonstrated ideas in physical science suggests 80,000x improvement in cost per operation for the (arguably) general purpose function of emulating neurons in Deep Learning.

TYPE SAND Report YEAR 2015

DOI OSTI

A UQ Enabled Aluminum Tabular Multiphase Equation-of-State Model

Carpenter, John H.; Robinson, Allen C.; Wills, Ann E.; Debusschere, Bert D.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Scaling Beyond Moore's Law with Processor-In-Memory-and-Storage (PIMS)

DeBenedictis, Erik

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Comparison of CTH and miniAMR

Vaughan, Courtenay T.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Resistive Memory for Neuromorphic Algorithm Acceleration

Marinella, Matthew J.; Agarwal, Sapan A.; Hughart, David R.; Mickel, Patrick R.; Hsia, Alexander W.; Plimpton, Steven J.; Decker, Seth D.; Apodaca, Roger A.; Aimone, James B.; James, Conrad D.; Draelos, Timothy J.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Designing Shallow Donors in Diamond

Moussa, Jonathan E.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Quantum Monte Carlo Studies of Bulk and Few- or Single-Layer Black Phosphorus

Shulenburger, Luke N.; Baczewski, Andrew D.; Zhu, Zhen; Guan, Jie; Tomanek, David

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

A 2-edge-connected spanning subgraph problem

Carr, Robert D.; Parekh, Ojas D.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

VTK-m Overview (NVIDIA Design Review)

Moreland, Kenneth D.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Exploiting data representation for fault tolerance

Journal of Computational Science

Laros, James H.; Hoemmen, Mark F.; Mueller, F.

Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.
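The effect of a single bit flip on a value of magnitude at most one can be illustrated with a short sketch. This is not the paper's analytic model; it simply flips each of the 64 bits of an IEEE 754 binary64 number in turn and records the absolute error. The helper names `flip_bit` and `single_flip_errors` are hypothetical, chosen for this illustration.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    # Flip the chosen bit and reinterpret the result as a double.
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def single_flip_errors(x: float) -> list:
    """Absolute error caused by flipping each of the 64 bits of x."""
    return [abs(flip_bit(x, b) - x) for b in range(64)]


if __name__ == "__main__":
    for value in (0.5, 1.0e6):
        errs = single_flip_errors(value)
        moderate = [e for e in errs if 1.0 < e < 1e300]
        print(f"x = {value}: largest error {max(errs):.3e}, "
              f"{len(moderate)} flips with error between 1 and 1e300")
```

For the sample value 0.5, every flip in this sketch either changes the value by at most one or produces an enormous number, so a simple magnitude check would catch the damaging case. For an unscaled value such as 1.0e6, some flips produce moderate errors that are harder to distinguish from legitimate data.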

TYPE Journal Article YEAR 2015

DOI OSTI

Why do simple algorithms for triangle enumeration work in the real world?

Internet Mathematics

Berry, Jonathan W.; Fostvedt, Luke A.; Nordman, Daniel J.; Phillips, Cynthia A.; Comandur, Seshadhri C.; Wilson, Alyson G.

Listing all triangles is a fundamental graph operation. Triangles can have important interpretations in real-world graphs, especially social and other interaction networks. Despite the lack of provably efficient (linear, or slightly superlinear) worst-case algorithms for this problem, practitioners run simple, efficient heuristics to find all triangles in graphs with millions of vertices. How are these heuristics exploiting the structure of these special graphs to provide major speedups in running time? We study one of the most prevalent algorithms used by practitioners. A trivial algorithm enumerates all paths of length 2 and checks whether each such path is incident to a triangle. A good heuristic is to enumerate only those paths of length 2 in which the middle vertex has the lowest degree. It is easily implemented and is empirically known to give remarkable speedups over the trivial algorithm. We study the behavior of this algorithm over graphs with heavy-tailed degree distributions, a defining feature of real-world graphs. The erased configuration model (ECM) efficiently generates a graph with asymptotically (almost) any desired degree sequence. We show that the expected running time of this algorithm over the distribution of graphs created by the ECM is controlled by the ℓ_{4/3}-norm of the degree sequence. Norms of the degree sequence are a measure of the heaviness of the tail, and it is precisely this feature that allows nontrivial speedups of simple triangle enumeration algorithms. As a corollary of our main theorem, we prove expected linear-time performance for degree sequences following a power law with exponent α ≥ 7/3, and nontrivial speedup whenever α ∈ (2, 3).
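The lowest-degree-middle-vertex heuristic described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes an undirected simple graph given as a list of edges over integer vertex labels, breaks degree ties by vertex label (an assumption not stated in the abstract), and uses the hypothetical function name `enumerate_triangles`.

```python
from itertools import combinations


def enumerate_triangles(edges):
    """List each triangle once by scanning only the length-2 paths whose
    middle vertex has the lowest (degree, label) rank of the three."""
    # Build an adjacency-set representation, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def rank(v):
        # Order vertices by degree; ties are broken by label (assumption).
        return (len(adj[v]), v)

    triangles = []
    for v, nbrs in adj.items():
        # Keep only neighbors ranked above v, so v is the lowest-ranked
        # vertex of every path examined here.
        higher = sorted((u for u in nbrs if rank(u) > rank(v)), key=rank)
        for u, w in combinations(higher, 2):
            # The path u - v - w closes into a triangle if u and w are adjacent.
            if w in adj[u]:
                triangles.append((v, u, w))
    return triangles


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    print(enumerate_triangles([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

Because high-degree vertices are never used as the middle of a path, the number of paths examined can be far smaller than in the trivial algorithm when a few vertices have very large degree.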

TYPE Conference YEAR 2015

Scopus OSTI

Assessing the role of mini-applications in predicting key performance characteristics of scientific and engineering applications

Journal of Parallel and Distributed Computing

Barrett, R.F.; Crozier, Paul C.; Doerfler, Douglas W.; Heroux, Michael A.; Lin, Paul L.; Thornquist, Heidi K.; Trucano, Timothy G.; Vaughan, Courtenay T.

Computational science and engineering application programs are typically large, complex, and dynamic, and are often constrained by distribution limitations. As a means of making tractable rapid explorations of scientific and engineering application programs in the context of new, emerging, and future computing architectures, a suite of "miniapps" has been created to serve as proxies for full scale applications. Each miniapp is designed to represent a key performance characteristic that does or is expected to significantly impact the runtime performance of an application program. In this paper we introduce a methodology for assessing the ability of these miniapps to effectively represent these performance issues. We applied this methodology to three miniapps, examining the linkage between them and an application they are intended to represent. Herein we evaluate the fidelity of that linkage. This work represents the initial steps required to begin to answer the question, "Under what conditions does a miniapp represent a key performance characteristic in a full app?"

TYPE Journal Article YEAR 2015

Scopus OSTI DOI

A hybrid approach for parallel transistor-level full-chip circuit simulation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Thornquist, Heidi K.; Rajamanickam, Sivasankaran R.

The computer-aided design (CAD) applications that are fundamental to the electronic design automation industry need to harness the available hardware resources to be able to perform full-chip simulation for modern technology nodes (45 nm and below). We present a hybrid (MPI+threads) approach for parallel transistor-level transient circuit simulation that achieves scalable performance for some challenging large-scale integrated circuits. This approach focuses on the computationally expensive part of the simulator: the linear system solve. Hybrid versions of two iterative linear solver strategies are presented: one takes advantage of block triangular form structure, while the other uses a Schur complement technique. Results indicate up to a 27x improvement in total simulation time on 256 cores.

TYPE Conference YEAR 2015

Scopus OSTI DOI

Preserving Lagrangian structure in nonlinear model reduction with application to structural dynamics

SIAM Journal on Scientific Computing

Carlberg, Kevin; Tuminaro, Raymond S.; Boggs, Paul

This work proposes a model-reduction methodology that preserves Lagrangian structure and achieves computational efficiency in the presence of high-order nonlinearities and arbitrary parameter dependence. As such, the resulting reduced-order model retains key properties such as energy conservation and symplectic time-evolution maps. We focus on parameterized simple mechanical systems subjected to Rayleigh damping and external forces, and consider an application to nonlinear structural dynamics. To preserve structure, the method first approximates the system's "Lagrangian ingredients" (the Riemannian metric, the potential-energy function, the dissipation function, and the external force) and subsequently derives reduced-order equations of motion by applying the (forced) Euler-Lagrange equation with these quantities. From the algebraic perspective, key contributions include two efficient techniques for approximating parameterized reduced matrices while preserving symmetry and positive definiteness: matrix gappy proper orthogonal decomposition and reduced-basis sparsification. Results for a parameterized truss-structure problem demonstrate the practical importance of preserving Lagrangian structure and illustrate the proposed method's merits: it reduces computation time while maintaining high accuracy and stability, in contrast to existing nonlinear model-reduction techniques that do not preserve structure.

TYPE Journal Article YEAR 2015

Scopus OSTI DOI
