Publications

Results 201–225 of 272

Search results

Jump to search filters

Design methodology for optimizing optical interconnection networks in high performance systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Rumley, Sebastien; Glick, Madeleine; Hammond, Simon D.; Rodrigues, Arun; Bergman, Keren

Modern high performance computers connect hundreds of thousands of endpoints and employ thousands of switches. This allows for a great deal of freedom in the design of the network topology. At the same time, due to the sheer numbers and complexity involved, it becomes more challenging to easily distinguish between promising and improper designs. With ever increasing line rates and advances in optical interconnects, there is a need for renewed design methodologies that comprehensively capture the requirements and expose tradeoffs expeditiously in this complex design space. We introduce a systematic approach, based on Generalized Moore Graphs, allowing one to quickly gauge the ideal level of connectivity required for a given number of end-points and traffic hypothesis, and to collect insight on the role of the switch radix in the topology cost. Based on this approach, we present a methodology for the identification of Pareto-optimal topologies. We apply our method to a practical case with 25,000 nodes and present the results.

More Details

Trinity Benchmarks on the Intel Xeon Phi (Knights Corner)

Rajan, Mahesh R.; Doerfler, Douglas W.; Hammond, Simon D.

This report documents the early experiences with porting and performance analysis of the Tri-Lab Trinity benchmark applications on Intel Xeon Phi (Knights Corner) (KNC) processor. KNC, the second generation of the Intel Many Integrated Core (MIC) architectures, uses a large number of small P54C-x86 cores with wide vector units and is deployed as PCI bus attached process accelerators. Sandia has experimental test beds of small InifiniBand clusters and workstations to investigate the performance of the MIC architecture. On these experimental test beds the programming models that may be investigated are "offload", "symmetric" and "native". Among these program usage models our primary interest is in the so called "native" mode, because the planned Trinity system to be deployed in 2016 using the next generation MIC processor architecture called Knights Landing would be self-hosted. Trinity / NERSC-8 benchmark programs cover a variety of scientific disciplines and they were used to guide the procurement of these systems. Architectures such as the Intel MIC are well suited to study evolving processor architectures and a usage model commonly referred to as MPI + X that facilitates migration of our applications to use both coarse grain and fine grain parallelism. Our focus with the applications selected is on the efficacy of algorithms in these applications to take advantage of features like: large number of cores, wide vector units, higher-bandwidth and deeper memory sub-system. This is a first step towards understanding applications, algorithms and programming environments for Trinity and future exascale computing systems.

More Details

An evaluation of MPI message rate on hybrid-core processors

International Journal of High Performance Computing Applications

Brightwell, Ronald B.; Barrett, Brian W.; Grant, Ryan E.; Hammond, Simon D.; Hemmert, Karl S.

Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.

More Details
Results 201–225 of 272
Results 201–225 of 272