Results 6901–6950 of 9,998
Salinger, Andrew G.
Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Moreland, Kenneth D.
The predictions for exascale computing are dire. Although we have benefited from a consistent supercomputer architecture design, even across manufacturers, for well over a decade, recent trends indicate that future high-performance computers will have different hardware structure and programming models to which software must adapt. This paper provides an informal discussion on the ways in which changes in high-performance computing architecture will profoundly affect the scalability of our current generation of scientific visualization and analysis codes and how we must adapt our applications, workflows, and attitudes to continue our success at exascale computing. © 2012 IEEE.
Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Barrett, Richard F. ; Crozier, Paul C. ; Doerfler, Douglas W. ; Hammond, Simon D. ; Heroux, Michael A. ; Lin, Paul L. ; Trucano, Timothy G. ; Vaughan, Courtenay T. ; Williams, Alan B.
The push to exascale computing is informed by the assumption that the architecture, regardless of the specific design, will be fundamentally different from petascale computers. The Mantevo project has been established to produce a set of proxies, or 'miniapps,' which enable rapid exploration of key performance issues that impact a broad set of scientific application programs of interest to ASC and the broader HPC community. The conditions under which a miniapp can be confidently used as predictive of an application's behavior must be clearly elucidated. Toward this end, we have developed a methodology for assessing the predictive capabilities of application proxies. Adhering to the spirit of experimental validation, our approach provides a framework for comparing data from the application with that provided by its proxies. In this poster we present this methodology and apply it to three miniapps developed by the Mantevo project. © 2012 IEEE.
Reliability Engineering and System Safety
Weirs, V.G.; Kamm, James R. ; Swiler, Laura P. ; Tarantola, Stefano; Ratto, Marco; Adams, Brian M. ; Rider, William J. ; Eldred, Michael S.
Sensitivity analysis comprises techniques to quantify the effects of the input variables on a set of outputs. In particular, sensitivity indices can be used to infer which input parameters most significantly affect the results of a computational model. With continually increasing computing power, sensitivity analysis has become an important technique by which to understand the behavior of large-scale computer simulations. Many sensitivity analysis methods rely on sampling from distributions of the inputs. Such sampling-based methods can be computationally expensive, requiring many evaluations of the simulation; in this case, the Sobol method provides an easy and accurate way to compute variance-based measures, provided a sufficient number of model evaluations are available. As an alternative, meta-modeling approaches have been devised to approximate the response surface and estimate various measures of sensitivity. In this work, we consider a variety of sensitivity analysis methods, including different sampling strategies, different meta-models, and different ways of evaluating variance-based sensitivity indices. The problem we consider is the 1-D Riemann problem. By a careful choice of inputs, discontinuous solutions are obtained, leading to discontinuous response surfaces; such surfaces can be particularly problematic for meta-modeling approaches. The goal of this study is to compare the estimated sensitivity indices with exact values and to evaluate the convergence of these estimates with increasing sample sizes and under an increasing number of meta-model evaluations. © 2011 Elsevier Ltd. All rights reserved.
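As an illustration of the sampling-based, variance-based indices this abstract refers to, the sketch below estimates first-order Sobol indices with a pick-freeze (Saltelli-style) estimator. It is a minimal sketch under stated assumptions, not the authors' code: the function name `first_order_sobol`, the uniform inputs on [0, 1], and the linear test model are all hypothetical choices made for the example, and the paper's actual test case (the 1-D Riemann problem) is not reproduced here.

```python
import random


def first_order_sobol(model, dim, n_samples, seed=0):
    """Estimate first-order Sobol indices of `model` on independent U(0, 1) inputs.

    Hypothetical helper for illustration only. Uses two independent sample
    matrices A and B and, for each input i, a mixed matrix equal to A with
    column i taken from B.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]

    # Output mean and variance, pooled over both sample sets.
    outputs = fA + fB
    mean = sum(outputs) / len(outputs)
    variance = sum((y - mean) ** 2 for y in outputs) / (len(outputs) - 1)

    indices = []
    for i in range(dim):
        total = 0.0
        for a_row, b_row, ya, yb in zip(A, B, fA, fB):
            mixed = list(a_row)
            mixed[i] = b_row[i]  # shares only input i with the B row
            # Centering yb leaves the expectation unchanged and reduces noise.
            total += (yb - mean) * (model(mixed) - ya)
        indices.append(total / n_samples / variance)
    return indices


# Assumed test model: Y = X1 + 2*X2, whose exact indices are 1/5 and 4/5.
estimates = first_order_sobol(lambda x: x[0] + 2.0 * x[1], dim=2, n_samples=20000, seed=1)
```

The cost is `n_samples * (dim + 2)` model evaluations, which is why the abstract contrasts this approach with meta-models when each evaluation is an expensive simulation.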
DeBenedictis, Erik
Schultz, Peter A.
Gao, Xujiao G. ; Nielsen, Erik N. ; Muller, Richard P. ; Young, Ralph W. ; Salinger, Andrew G. ; Kalashnikova, Irina
Spotz, William S.
Phillips, Cynthia A. ; Plimpton, Steven J.
Schultz, Peter A.
Brightwell, Ronald B. ; Pedretti, Kevin; Wheeler, Kyle B. ; Hemmert, Karl S. ; Barrett, Brian B.
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Swiler, Laura P. ; Romero, Vicente J.
Lofstead, Gerald F.
Lofstead, Gerald F. ; Oldfield, Ron A.
Moreland, Kenneth D.
Desjarlais, Michael P. ; Knudson, Marcus D. ; Thompson, Aidan P. ; Lane, James M.
Forsythe, James C. ; Glickman, Matthew R. ; Haass, Michael J. ; Whetzel, Jonathan H.
Moreland, Kenneth D.
Fabian, Nathan D. ; McClain, Jonathan T. ; Davis, Warren L.
Liu, Zhen L. ; Safta, Cosmin S. ; Sargsyan, Khachik S. ; van Bloemen Waanders, Bart G. ; Bambha, Ray B. ; Michelsen, Hope A.
Missert, Nancy A. ; Garcia, Robert M. ; Nagasubramanian, Ganesan N. ; Leung, Kevin L. ; Rempe, Susan R. ; Rogers, David R.
Curry, Matthew L.
Parks, Michael L.
Proposed for publication in Engineering with Computers.
Baczewski, Andrew D.
Ridzal, Denis R.
Lofstead, Gerald F.
Littlewood, David J. ; Tikare, Veena T.
Knupp, Patrick K. ; Day, David M.
Barrett, Richard F. ; Hammond, Simon D. ; Vaughan, Courtenay T. ; Doerfler, Douglas W. ; Heroux, Michael A.
Hammond, Simon D.
Silling, Stewart A.
Olivier, Stephen L.
Olivier, Stephen L.
Doerfler, Douglas W.
Littlewood, David J. ; Mish, Kyran D. ; Pierson, Kendall H.
Rajan, Mahesh R. ; Doerfler, Douglas W. ; Lin, Paul L. ; Hammond, Simon D. ; Barrett, Richard F. ; Vaughan, Courtenay T.
Proposed for publication in Human Factors: The Journal of Human Factors and Ergonomics Society.
Abbott, Robert G. ; Haass, Michael J.
Moreland, Kenneth D.
Moreland, Kenneth D.
Littlewood, David J. ; Bignell, John B. ; Tikare, Veena T.
Barrett, Brian B. ; Brightwell, Ronald B. ; Hemmert, Karl S.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Hoefler, Torsten; Dinan, James; Buntinas, Darius; Balaji, Pavan; Barrett, Brian W.; Brightwell, Ronald B. ; Gropp, William; Kale, Vivek; Thakur, Rajeev
Hybrid parallel programming with MPI for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of utilizing two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the upcoming MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40% to the communication component of a five-point stencil solver. © 2012 Springer-Verlag.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Barrett, Brian W.; Brightwell, Ronald B. ; Underwood, Keith D.
Message passing paradigms provide for many-to-one messaging patterns that result in receive-side resource exhaustion. Traditionally, MPI implementations layered over the Portals network programming interface provided a large default unexpected receive buffer space; the user was expected to configure the buffer size to the application demand, and the application was aborted when the buffer space was overrun. The Portals 4 design provides a set of primitives for implementing scalable resource exhaustion recovery without negatively impacting normal operation. A resource exhaustion recovery protocol for MPI implementations is presented, as well as performance results for an Open MPI implementation of the protocol. © 2012 Springer-Verlag.
Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Rajamanickam, Sivasankaran R. ; Boman, Erik G. ; Heroux, Michael A.
Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Azad, Ariful; Halappanavar, Mahantesh; Rajamanickam, Sivasankaran R. ; Boman, Erik G. ; Khan, Arif; Pothen, Alex
We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multicore and massively multithreaded computers. As computers with larger numbers of slower cores dominate the commodity processor market, the design of multithreaded algorithms to solve large matching problems becomes a necessity. Recent work on serial algorithms for the matching problem has shown that their performance is sensitive to the order in which the vertices are processed for matching. In a multithreaded environment, imposing a serial order in which vertices are considered for matching would lead to loss of concurrency and performance. But this raises the question: Would parallel matching algorithms on multithreaded machines improve performance over a serial algorithm? We answer this question in the affirmative. We report efficient multithreaded implementations of three classes of algorithms based on their manner of searching for augmenting paths: breadth-first-search, depth-first-search, and a combination of both. The Karp-Sipser initialization algorithm is used to make the parallel algorithms practical. We report extensive results and insights using three shared-memory platforms (a 48-core AMD Opteron, a 32-core Intel Nehalem, and a 128-processor Cray XMT) on a representative set of real-world and synthetic graphs. To the best of our knowledge, this is the first study of augmentation-based parallel algorithms for bipartite cardinality matching that demonstrates good speedups on multithreaded shared memory multiprocessors. © 2012 IEEE.
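For readers unfamiliar with augmentation-based matching, the following is a minimal serial sketch of the depth-first-search variant mentioned in this abstract. It is not the authors' multithreaded implementation: the function name `max_bipartite_matching` and the adjacency-list input are assumptions made for the example, and a plain greedy pass stands in for the Karp-Sipser initialization, which is more refined.

```python
def max_bipartite_matching(adjacency, n_right):
    """Return match_left, where match_left[u] is the right vertex matched to u or -1.

    `adjacency[u]` lists the right-side neighbors of left vertex u.
    Hypothetical serial sketch: greedy initialization, then DFS augmentation.
    """
    match_left = [-1] * len(adjacency)
    match_right = [-1] * n_right

    # Greedy initialization (a simpler stand-in for Karp-Sipser).
    for u, neighbors in enumerate(adjacency):
        for v in neighbors:
            if match_right[v] == -1:
                match_left[u] = v
                match_right[v] = u
                break

    def try_augment(u, visited):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adjacency[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                # Flip the edges along the path found.
                match_left[u] = v
                match_right[v] = u
                return True
        return False

    for u in range(len(adjacency)):
        if match_left[u] == -1:
            try_augment(u, set())
    return match_left


# Small example: the greedy pass leaves left vertex 1 unmatched,
# and one augmenting path then completes the matching.
matching = max_bipartite_matching([[0, 1], [0], [1, 2]], n_right=3)
```

The loop over unmatched left vertices is the serial ordering that the abstract identifies as an obstacle to concurrency; the recursive search is kept for brevity and would need an explicit stack on large graphs.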
Journal of Computational Physics
Scovazzi, Guglielmo S.
Lofstead, Gerald F.
Proposed for publication in IEEE Computer Magazine.
Moreland, Kenneth D.
Lofstead, Gerald F.
Lofstead, Gerald F.