Publications Search

The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that captures these primary performance characteristics, then this miniapp can be used both to improve the performance of the app and to provide a co-design tool for the high-performance computing community. However, a critical question is whether a miniapp can effectively capture the key performance behavior of an app. This study compares an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Single compute node performance is compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.

TYPE Journal Article YEAR 2015

OSTI DOI

Phase Detection with Hidden Markov Models for DVFS on Many-Core Processors

Proceedings - International Conference on Distributed Computing Systems

Booth, Joshua D.; Kotra, Jagadish; Zhao, Hui; Kandemir, Mahmut; Raghavan, Padma

The energy concerns of many-core processors are increasing with the number of cores. We provide a new method that reduces energy consumption of an application on many-core processors by identifying unique segments to which dynamic voltage and frequency scaling (DVFS) can be applied. Our method, phase-based voltage and frequency scaling (PVFS), hinges on the identification of phases, i.e., segments of code with unique performance and power attributes, using hidden Markov models. In particular, we demonstrate the use of this method to target hardware components on many-core processors such as the Network-on-Chip (NoC). PVFS uses these phases to construct a static power schedule that uses DVFS to reduce energy with minimal performance penalty. This general scheme can be used with a variety of performance and power metrics to match the needs of the system and application. More importantly, the flexibility in the general scheme allows for targeting of the unique hardware components of future many-core processors. We provide an in-depth analysis of PVFS applied to five threaded benchmark applications, and demonstrate the advantage of using PVFS for 4 to 32 cores in a single socket. Empirical results of PVFS show a reduction of up to 10.1% of total energy while only impacting total time by at most 2.7% across all core counts. Furthermore, PVFS outperforms standard coarse-grain time-driven DVFS, while scaling better in terms of energy savings with increasing core counts.
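As a rough illustration of how a hidden Markov model can label phases in a performance trace, the sketch below runs Viterbi decoding over a sequence of bucketed counter readings. It is not the PVFS implementation: the two phase names, the "low"/"high" observation buckets, and all probabilities are assumptions invented for this example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state (phase) sequence for the observations."""
    # Work in log space so long traces do not underflow.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for landing in state s at time t.
            prev, best = max(
                ((p, score[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            score[t][s] = best + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical model: two phases, observed through a bucketed cache-miss rate.
states = ["compute", "memory"]
start_p = {"compute": 0.5, "memory": 0.5}
trans_p = {"compute": {"compute": 0.9, "memory": 0.1},
           "memory": {"compute": 0.1, "memory": 0.9}}
emit_p = {"compute": {"low": 0.8, "high": 0.2},
          "memory": {"low": 0.2, "high": 0.8}}

trace = ["low", "low", "low", "high", "high", "high", "high"]
phases = viterbi(trace, states, start_p, trans_p, emit_p)
print(phases)
```

With phase boundaries recovered this way, a static schedule could in principle assign one voltage/frequency setting per phase; how PVFS actually builds its schedule is described only at the level of the abstract above.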

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

High-Performance Graph Analytics on Manycore Processors

Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015

Slota, George M.; Rajamanickam, Sivasankaran R.; Madduri, Kamesh

The divergence in the computer architecture landscape has resulted in different architectures being considered mainstream at the same time. For application and algorithm developers, a dilemma arises when one must focus on using underlying architectural features to extract the best performance on each of these architectures, while writing portable code at the same time. We focus on this problem with graph analytics as our target application domain. In this paper, we present an abstraction-based methodology for performance-portable graph algorithm design on manycore architectures. We demonstrate our approach by systematically optimizing algorithms for the problems of breadth-first search, color propagation, and strongly connected components. We use Kokkos, a manycore library and programming model, for prototyping our algorithms. Our portable implementation of the strongly connected components algorithm on the NVIDIA Tesla K40M is up to 3.25× faster than a state-of-the-art parallel CPU implementation on a dual-socket Sandy Bridge compute node.
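To make the color propagation kernel concrete, here is a minimal serial sketch in Python. It is a simplified, undirected variant that labels connected components, not the paper's Kokkos implementation or its strongly connected components algorithm; the function name and the edge-list input format are assumptions chosen for illustration.

```python
def color_propagation(num_vertices, edges):
    """Label each vertex with the smallest vertex id in its connected component."""
    # Build an undirected adjacency list.
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Every vertex starts with its own id as its color.
    colors = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        for v in range(num_vertices):
            # Adopt the smallest color seen among the neighbors, if smaller.
            best = min((colors[u] for u in adj[v]), default=colors[v])
            if best < colors[v]:
                colors[v] = best
                changed = True
    return colors


# Two components plus one isolated vertex.
labels = color_propagation(6, [(0, 1), (1, 2), (3, 4)])
print(labels)  # [0, 0, 0, 3, 3, 5]
```

The inner loop over vertices is the part a manycore implementation would run in parallel; this sketch keeps it serial for clarity.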

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Cooperative Computing for Autonomous Data Centers

Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015

Berry, Jonathan W.; Collins, Michael; Kearns, Aaron; Phillips, Cynthia A.; Saia, Jared; Smith, Randy

We present a new distributed model for graph computations motivated by limited information sharing. Two or more independent entities have collected large social graphs. They wish to compute the result of running graph algorithms on the entire set of relationships. Because the information is sensitive or economically valuable, they do not wish to simply combine the information in a single location. We consider two models for computing the solution to graph algorithms in this setting: 1) limited-sharing: the two entities can share only a polylogarithmic-size subgraph; 2) low-trust: the entities must not reveal any information beyond the query answer, assuming they are all honest but curious. We believe this model captures realistic constraints on cooperating autonomous data centers. We have results for both models for s-t connectivity, one of the simplest graph problems that requires global information in the worst case. In the limited-sharing model, our results exploit social network structure. Standard communication complexity gives polynomial lower bounds on s-t connectivity for general graphs. However, if the graph for each data center has a giant component and these giant components intersect, then we can overcome this lower bound, computing s-t connectivity while exchanging O(log^2 n) bits for a constant number of data centers. We can also test the assumption that the giant components overlap using O(log^2 n) bits, provided the (unknown) overlap is sufficiently large. The second result is in the low-trust model. We give a secure multi-party computation (MPC) algorithm that 1) does not make cryptographic assumptions when there are 3 or more entities, and 2) is efficient, especially when compared to the usual garbled circuit approach. The entities learn only the yes/no answer. No party learns anything about the others' graph, not even node names. This algorithm does not require any special graph structure. This secure MPC result for s-t connectivity is one of the first that involves a few parties computing on large inputs, instead of many parties computing on a few local values.
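The abstract does not describe the MPC algorithm itself. As background only, the sketch below shows additive secret sharing, a standard building block for MPC among honest-but-curious parties that needs no cryptographic assumptions: parties jointly compute a sum while each individual input stays hidden. The modulus, function names, and inputs are assumptions made for this example, and the code is not the s-t connectivity protocol from the paper.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an assumed choice for this sketch


def share(value, num_parties):
    """Split value into additive shares that sum to value modulo PRIME."""
    # All but the last share are uniformly random field elements.
    shares = [secrets.randbelow(PRIME) for _ in range(num_parties - 1)]
    # The last share is fixed so that the shares sum to the value.
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs):
    """Simulate parties computing the sum of their inputs from shares only."""
    n = len(private_inputs)
    # Party i deals share j of its input to party j.
    dealt = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining the published partial sums reveals the total and nothing else.
    return sum(partial) % PRIME


total = secure_sum([3, 5, 9])
print(total)  # 17
```

Any proper subset of the shares of a value is uniformly random, so a party that sees fewer than all of them learns nothing about that value beyond what the final result implies.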

TYPE Conference Poster YEAR 2015

OSTI Scopus

Sandia Equation of State (EOS) Deep Dive

Robinson, Allen C.; Mattsson, Thomas M.; Shulenburger, Luke N.; Carpenter, John H.; Debusschere, Bert D.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Beyond Moore's Law and Implications for Computing in Space

DeBenedictis, Erik; Cook, Jeanine C.; Metodi, Tzvetan S.; Hoemmen, Mark F.; Marinella, Matthew J.; Schiek, Richard S.; Zima, Hans

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Constrained Interpolation Remap for Interface-Capturing Finite Element Methods Applied to Multi-Material Electromagnetics

Kramer, Richard M.; Siefert, Christopher S.; Voth, Thomas E.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Ice Sheet Dynamics: High-Order Approximation on the Sphere

Perego, Mauro P.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

A Highly-Scalable Implicit SPH Code for Simulating Single- and Multi-phase Flows in Geometrically Complex Bounded Domains

Trask, Nathaniel; Kim, Kyungjoo K.; Tartakovsky, Alexandre; Perego, Mauro P.; Parks, Michael L.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

A Summary Of Modeling and Simulation Assessments Techniques 2015

Weirs, Vincent G.; Kamm, James R.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Secure Distributed Membership Tests via Secret Sharing How to Hide Your Hostile Hosts: Harnessing Shamir Secret Sharing

Kroeger, Thomas M.; Zage, David J.; Benson, Thomas; Donoghue, Nolan P.; Xu, Helen; Hahn, Bridger; Saia, Jared

TYPE Conference Poster YEAR 2015

OSTI

Risk-averse Optimization with PDE Constraints

Kouri, Drew P.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Human Factors within the Transportation Security Administration: Optimizing performance through human factors assessments

Speed, Ann S.; Kudrick, Bonnie; Caggiano, Daniel

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Monitoring and Accelerating Progressive Hedging with Cross-scenario Information

Siirola, John D.; Watson, Jean-Paul W.; Woodruff, David L.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Kokkos: Enabling Performance Portability Across Next Generation Platforms

Edwards, Harold C.; Trott, Christian R.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Different weightings of input components to hippocampal CA1 place cells in young and aged rats

Chance, Frances S.; Maurer, Andrew; Burke, Sara; Barnes, Carol

Abstract not provided.

TYPE Conference Poster YEAR 2015

DOI OSTI

Modeling Mathematical Programs with Equilibrium Constraints in Pyomo

Hart, William E.; Siirola, John D.

We describe new capabilities for modeling MPEC problems within the Pyomo modeling software. These capabilities include new modeling components that represent complementarity conditions, modeling transformations for re-expressing models with complementarity conditions in other forms, and meta-solvers that apply transformations and numeric optimization solvers to optimize MPEC problems. We illustrate the breadth of Pyomo's modeling capabilities for MPEC problems, and we describe how Pyomo's meta-solvers can perform local and global optimization of MPEC problems.

TYPE SAND Report YEAR 2015

DOI OSTI