Publications Search

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC),Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies near-memory and DRAM to be similar. Thus, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient. In this paper, we explore the design space for a user-controlled multi-level main memory. Our work identifies situations in which rewriting application kernels can provide significant performance gains when using near-memory. We present algorithms designed for two-level main memory, using divide-and-conquer to partition computations and streaming to exploit data locality. We consider algorithms for the fundamental application of sorting and for the data analysis kernel k-means. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories’ SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance improvements for our sorting algorithm. In contrast, the k-means algorithm is generally CPU bound and does not improve when using near-memory except under extreme conditions. These conditions require large instances that rule out SST simulation, but we demonstrate improvements by running on a customized machine with high and low bandwidth memory. These case studies in co-design serve as positive and cautionary templates, respectively, for the major task of optimizing the computational kernels of many fundamental applications for two-level main memory systems.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

All-quad meshing without cleanup

CAD Computer Aided Design

Ebeida, Mohamed; Rushdi, Ahmad A.; Mitchell, Scott A.; Mahmoud, Ahmed H.; Bajaj, Chandrajit C.

We present an all-quad meshing algorithm for general domains. We start with a strongly balanced quadtree. In contrast to snapping the quadtree corners onto the geometric domain boundaries, we move them away from the geometry. Then we intersect the moved grid with the geometry. The resulting polygons are converted into quads with midpoint subdivision. Moving away avoids creating any flat angles, either at a quadtree corner or at a geometry–quadtree intersection. We are able to handle two-sided domains, and more complex topologies than prior methods. The algorithm is provably correct and robust in practice. It is cleanup-free, meaning we have angle and edge length bounds without the use of any pillowing, swapping, or smoothing. Thus, our simple algorithm is fast and predictable. This paper has better quality bounds, and the algorithm is demonstrated over more complex domains, than our prior version.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

Modeling propellant-based stimulation of a borehole with peridynamics

International Journal of Rock Mechanics and Mining Sciences

Panchadhara, Rohan; Gordon, Peter A.; Parks, Michael L.

A non-local formulation of classical continuum mechanics theory known as peridynamics is used to study fracture initiation and growth from a wellbore penetrating the subsurface within the context of propellant-based stimulation. The principal objectives of this work are to analyze the influence of loading conditions on the resulting fracture pattern, to investigate the effect of in-situ stress anisotropy on fracture propagation, and to assess the suitability of peridynamics for modeling complex fracture formation. It is shown that the loading rate significantly influences the number and extent of fractures initiated from a borehole. Results show that low loading rates produce fewer but longer fractures, whereas high loading rates produce numerous shorter fractures around the borehole. The numerical method is able to predict fracture growth patterns over a wide range of loading and stress conditions. Our results also show that fracture growth is attenuated with increasing in-situ confining stress, and, in the case of confining stress anisotropy, fracture extensions are largest in the direction perpendicular to the minimum compressive stress. Since the results are in broad qualitative agreement with experimental and numerical studies found in the literature, suggesting that peridynamics can be a powerful tool in the study of complex fracture network formation.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem (PI Meeting Slides 2017)

Moreland, Kenneth D.; Geveci, Berk; Pugmire, David; Rogers, David; Ma, Kwan-Liu; Childs, Hank

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI OSTI

Monotone local projection stabilization for nonlinear hyperbolic systems

Mabuza, Sibusiso; Shadid, John N.; Kuzmin, Dmitri

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI OSTI

High-throughput stochastic tensile performance of additively manufactured stainless steel

Journal of Materials Processing Technology

Boyce, Brad L.; Salzbrenner, Bradley; Rodelas, Jeffrey; Madison, Jonathan D.; Jared, Bradley H.; Swiler, Laura P.; Shen, Yu L.

An adage within the Additive Manufacturing (AM) community is that “complexity is free”. Complicated geometric features that normally drive manufacturing cost and limit design options are not typically problematic in AM. While geometric complexity is usually viewed from the perspective of part design, this advantage of AM also opens up new options in rapid, efficient material property evaluation and qualification. In the current work, an array of 100 miniature tensile bars are produced and tested for a comparable cost and in comparable time to a few conventional tensile bars. With this technique, it is possible to evaluate the stochastic nature of mechanical behavior. The current study focuses on stochastic yield strength, ultimate strength, and ductility as measured by strain at failure (elongation). However, this method can be used to capture the statistical nature of many mechanical properties including the full stress-strain constitutive response, elastic modulus, work hardening, and fracture toughness. Moreover, the technique could extend to strain-rate and temperature dependent behavior. As a proof of concept, the technique is demonstrated on a precipitation hardened stainless steel alloy, commonly known as 17-4PH, produced by two commercial AM vendors using a laser powder bed fusion process, also commonly known as selective laser melting. Using two different commercial powder bed platforms, the vendors produced material that exhibited slightly lower strength and markedly lower ductility compared to wrought sheet. Moreover, the properties were much less repeatable in the AM materials as analyzed in the context of a Weibull distribution, and the properties did not consistently meet minimum allowable requirements for the alloy as established by AMS. The diminished, stochastic properties were examined in the context of major contributing factors such as surface roughness and internal lack-of-fusion porosity. This high-throughput capability is expected to be useful for follow-on extensive parametric studies of factors that affect the statistical reliability of AM components.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

ShyLU: A Collection of Node-Scalable Sparse Linear Solvers

Rajamanickam, Sivasankaran; Bradley, Andrew M.; Kim, Kyungjoo; Boman, Erik G.; Deveci, Mehmet

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Publications

Search results