Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping
Abstract not provided.
Proceedings of the 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015
The Bulk Synchronous Parallel programming model is showing performance limitations at high processor counts. We propose over-decomposition of the domain, operated on as tasks, to smooth out utilization of the computing resource, in particular the node interconnect and processing cores, and to hide intra- and inter-node data movement. Our approach maintains the existing coding style commonly employed in computational science and engineering applications. Although we show improved performance on existing computers, up to 131,072 processor cores, the effectiveness of this approach on expected future architectures will require the continued evolution of capabilities throughout the codesign stack. Success would then not only decrease time to solution, but also make better use of the hardware capabilities and reduce power and energy requirements, while fundamentally maintaining the current code configuration strategy.
Journal of Computational Science
Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.
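The bit-flip premise in the abstract above can be explored with a minimal sketch. This is not the authors' analytic model; it only enumerates the absolute error caused by flipping each of the 64 bits of one IEEE 754 double. The helper names `flip_bit` and `bit_flip_errors` are hypothetical and chosen for illustration, and the sample value 0.5 is an assumption standing in for an entry of a normalized vector.

```python
import struct


def flip_bit(x: float, k: int) -> float:
    """Return x with bit k flipped (0 = lowest mantissa bit, 63 = sign bit)."""
    if not 0 <= k < 64:
        raise ValueError("k must be in the range [0, 64)")
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    # XOR toggles exactly one bit; reinterpret the result as a double again.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << k)))
    return flipped


def bit_flip_errors(x: float) -> list[float]:
    """Absolute error |flip_bit(x, k) - x| for every bit position k."""
    return [abs(flip_bit(x, k) - x) for k in range(64)]


if __name__ == "__main__":
    x = 0.5  # assumed sample entry of a normalized vector (|x| <= 1)
    errors = bit_flip_errors(x)
    # Bit positions whose flip perturbs x by more than one.
    large = [k for k, err in enumerate(errors) if err > 1.0]
    print("bits causing an error above 1:", large)
    print("largest error among the remaining bits:",
          max(err for err in errors if err <= 1.0))
```

For this one sample value, the errors separate into a small group and a very large group, which is the kind of gap the abstract relies on for detection. Other inputs, in particular values with magnitude above one, behave differently, so the sketch should not be read as a proof of the paper's bound.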
Quantum tomography is used to characterize quantum operations implemented in quantum information processing (QIP) hardware. Traditionally, state tomography has been used to characterize the quantum state prepared in an initialization procedure, while quantum process tomography is used to characterize dynamical operations on a QIP system. As such, tomography is critical to the development of QIP hardware, since it is necessary for both debugging and validating as-built devices, and its results are used to influence the next generation of devices. But tomography suffers from several critical drawbacks. In this report, we present new research that resolves several of these flaws. We describe a new form of tomography called gate set tomography (GST), which unifies state and process tomography, avoids prior methods' critical reliance on precalibrated operations that are not generally available, and can achieve unprecedented accuracies. We report on theory and experimental development of adaptive tomography protocols that achieve far higher fidelity in state reconstruction than non-adaptive methods. Finally, we present a new theoretical and experimental analysis of process tomography on multispin systems, and demonstrate how to more effectively detect and characterize quantum noise using carefully tailored ensembles of input states.