Publications

Results 8401–8450 of 9,998

Search results

Jump to search filters

Building more powerful less expensive supercomputers using Processing-In-Memory (PIM) LDRD final report

Murphy, Richard C.

This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential of PIM-based architectures to support mission critical Sandia applications and an emerging class of more data intensive informatics applications. This work has resulted in a stronger architecture/implementation collaboration between 1400 and 1700. Additionally, key technology components have impacted vendor roadmaps, and we are in the process of pursuing these new collaborations. This work has the potential to impact future supercomputer design and construction, reducing power and increasing performance. This final report is organized as follow: this summary chapter discusses the impact of the project (Section 1), provides an enumeration of publications and other public discussion of the work (Section 1), and concludes with a discussion of future work and impact from the project (Section 1). The appendix contains reprints of the refereed publications resulting from this work.

More Details

Final Report on LDRD project 130784 : functional brain imaging by tunable multi-spectral Event-Related Optical Signal (EROS)

Hsu, Alan Y.; Speed, Ann S.

Functional brain imaging is of great interest for understanding correlations between specific cognitive processes and underlying neural activity. This understanding can provide the foundation for developing enhanced human-machine interfaces, decision aides, and enhanced cognition at the physiological level. The functional near infrared spectroscopy (fNIRS) based event-related optical signal (EROS) technique can provide direct, high-fidelity measures of temporal and spatial characteristics of neural networks underlying cognitive behavior. However, current EROS systems are hampered by poor signal-to-noise-ratio (SNR) and depth of measure, limiting areas of the brain and associated cognitive processes that can be investigated. We propose to investigate a flexible, tunable, multi-spectral fNIRS EROS system which will provide up to 10x greater SNR as well as improved spatial and temporal resolution through significant improvements in electronics, optoelectronics and optics, as well as contribute to the physiological foundation of higher-order cognitive processes and provide the technical foundation for miniaturized portable neuroimaging systems.

More Details

LDRD final report : massive multithreading applied to national infrastructure and informatics

Barrett, Brian B.; Hendrickson, Bruce A.; Laviolette, Randall A.; Leung, Vitus J.; Mackey, Greg; Murphy, Richard C.; Phillips, Cynthia A.; Pinar, Ali P.

Large relational datasets such as national-scale social networks and power grids present different computational challenges than do physical simulations. Sandia's distributed-memory supercomputers are well suited for solving problems concerning the latter, but not the former. The reason is that problems such as pattern recognition and knowledge discovery on large networks are dominated by memory latency and not by computation. Furthermore, most memory requests in these applications are very small, and when the datasets are large, most requests miss the cache. The result is extremely low utilization. We are unlikely to be able to grow out of this problem with conventional architectures. As the power density of microprocessors has approached that of a nuclear reactor in the past two years, we have seen a leveling of Moores Law. Building larger and larger microprocessor-based supercomputers is not a solution for informatics and network infrastructure problems since the additional processors are utilized to only a tiny fraction of their capacity. An alternative solution is to use the paradigm of massive multithreading with a large shared memory. There is only one instance of this paradigm today: the Cray MTA-2. The proposal team has unique experience with and access to this machine. The XMT, which is now being delivered, is a Red Storm machine with up to 8192 multithreaded 'Threadstorm' processors and 128 TB of shared memory. For many years, the XMT will be the only way to address very large graph problems efficiently, and future generations of supercomputers will include multithreaded processors. Roughly 10 MTA processor can process a simple short paths problem in the time taken by the Gordon Bell Prize-nominated distributed memory code on 32,000 processors of Blue Gene/Light. We have developed algorithms and open-source software for the XMT, and have modified that software to run some of these algorithms on other multithreaded platforms such as the Sun Niagara and Opteron multi-core chips.

More Details

Palacios and Kitten : high performance operating systems for scalable virtualized and native supercomputing

Pedretti, Kevin T.T.; Levenhagen, Michael J.; Brightwell, Ronald B.

Palacios and Kitten are new open source tools that enable applications, whether ported or not, to achieve scalable high performance on large machines. They provide a thin layer over the hardware to support both full-featured virtualized environments and native code bases. Kitten is an OS under development at Sandia that implements a lightweight kernel architecture to provide predictable behavior and increased flexibility on large machines, while also providing Linux binary compatibility. Palacios is a VMM that is under development at Northwestern University and the University of New Mexico. Palacios, which can be embedded into Kitten and other OSes, supports existing, unmodified applications and operating systems by using virtualization that leverages hardware technologies. We describe the design and implementation of both Kitten and Palacios. Our benchmarks show that they provide near native, scalable performance. Palacios and Kitten provide an incremental path to using supercomputer resources that is not performance-compromised.

More Details

HPC application fault-tolerance using transparent redundant computation

Ferreira, Kurt; Riesen, Rolf; Oldfield, Ron A.; Brightwell, Ronald B.; Laros, James H.; Pedretti, Kevin P.

As the core count of HPC machines continue to grow in size, issues such as fault tolerance and reliability are becoming limiting factors for application scalability. Current techniques to ensure progress across faults, for example coordinated checkpoint-restart, are unsuitable for machines of this scale due to their predicted high overheads. In this study, we present the design and implementation of a novel system for ensuring reliability which uses transparent, rank-level, redundant computation. Using this system, we show the overheads involved in redundant computation for a number of real-world HPC applications. Additionally, we relate the communication characteristics of an application to the overheads observed.

More Details

Algebraic connectivity and graph robustness

Feddema, John T.

Recent papers have used Fiedler's definition of algebraic connectivity to show that network robustness, as measured by node-connectivity and edge-connectivity, can be increased by increasing the algebraic connectivity of the network. By the definition of algebraic connectivity, the second smallest eigenvalue of the graph Laplacian is a lower bound on the node-connectivity. In this paper we show that for circular random lattice graphs and mesh graphs algebraic connectivity is a conservative lower bound, and that increases in algebraic connectivity actually correspond to a decrease in node-connectivity. This means that the networks are actually less robust with respect to node-connectivity as the algebraic connectivity increases. However, an increase in algebraic connectivity seems to correlate well with a decrease in the characteristic path length of these networks - which would result in quicker communication through the network. Applications of these results are then discussed for perimeter security.

More Details

IceT users' guide and reference

Moreland, Kenneth D.

The Image Composition Engine for Tiles (IceT) is a high-performance sort-last parallel rendering library. In addition to providing accelerated rendering for a standard display, IceT provides the unique ability to generate images for tiled displays. The overall resolution of the display may be several times larger than any viewport that may be rendered by a single machine. This document is an overview of the user interface to IceT.

More Details

Using adversary text to detect adversary phase changes

Doser, Adele D.; Speed, Ann S.; Warrender, Christina E.

The purpose of this work was to help develop a research roadmap and small proof ofconcept for addressing key problems and gaps from the perspective of using text analysis methods as a primary tool for detecting when a group is undergoing a phase change. Self- rganizing map (SOM) techniques were used to analyze text data obtained from the tworld-wide web. Statistical studies indicate that it may be possible to predict phase changes, as well as detect whether or not an example of writing can be attributed to a group of interest.

More Details
Results 8401–8450 of 9,998
Results 8401–8450 of 9,998