SOS7 Machines Session

Revised March 26, 2003

Introduction

There will be two sessions at SOS7 dedicated to presentations about machines. The Tuesday session will discuss machines that are already operational, while the Wednesday session is for planned machines. Rather than presenting technical details of their machines, the panelists were asked to answer three or four specific questions. As a reminder, and for people less familiar with the individual machines, we summarize here, and briefly before each session, the main technical details of each machine presented.

The panelists for the session on existing machines are:

  • ASCI White: Mark Seager
  • ASCI Q: John Morrison
  • NCSA Cluster: Dan Reed
  • PSC Cluster: Mike Levine
  • NOAA Cluster: Leslie Hart
  • French CEA Machine: Jean Gonnord

Here are the questions these panelists were asked to answer:

  1. Is your machine living up to the performance expectations? If yes, how? If not, what is the root cause?
  2. What is the MTBI (mean time between interrupts)? What are the topmost reasons for interrupts? What is the average utilization rate?
  3. What is the primary complaint, if any, from the users?

The panelists for the planned machines session are:

  • LLNL Purple & MCR Cluster: Mark Seager
  • SNL Red Storm: Jim Tomkins
  • LANL Pink Cluster: Ron Minnich
  • PNNL Cluster: Scott Studham
  • ORNL X1/X2: Buddy Bland
  • LLNL/IBM Blue Gene/L: Jose Moreira (IBM Watson Center)
  • NERSC Blue Planet: Brent Gorda

Here are the questions these panelists were asked to answer:

  1. What is unique in structure and function of your machine?
  2. What characterizes your applications? Examples: intensity of message passing, memory utilization, computation, I/O, and data.
  3. What prior experience guided you to this choice?
  4. Other than your own machine, what are the best and worst machines for your needs, and why?

Technical Data

We will now present a series of tables comparing the technical aspects of these machines.

Hardware: General

| Machine | Maker | Available | Power | Size | Plans |
|---|---|---|---|---|---|
| ASCI Q | HP | QA Aug. 2002; QB Feb. 2003 | 3.5 MW | > 12,000 ft2 (2 segments) | |
| PSC Cluster | Compaq/PSC | Apr. 2002 | 0.46 MW | 2,500 ft2 | Add small number of EV7 |
| NOAA Cluster | HPTi | Oct. 2002 | | 1,200 ft2 | |
| French CEA | Compaq | Feb. 2001 | 0.6 MW | | Phase 2 in 2003: 50 TF peak |
| SNL Red Storm | Cray/Sandia | Aug. 2004 | < 2 MW | < 3,000 ft2 | Expandable to > 100 TF |
| LANL Pink Cluster | Linux Networx | Jun. 2003 | | 1,000 ft2 | Could expand to 8192 nodes |
| PNNL Cluster | HP | Phase 1: Dec. 2002; Phase 2: Jul. 2003 | 7.5 MW | 2,500 ft2 | Upgrade Sep. 2003 |
| ORNL X1/X2 | Cray | Sep. 2003 | 0.4 MW | 100 ft2 | 4 cabinets in 2003, 10 in 2004, 50 in 2005 |
| LLNL Blue Gene/L | IBM | 2004/2005 | 1.5 MW | 1,200 ft2 | |
| NERSC Blue Planet | IBM | Jun. 2005 | 6 MW | 12,000 ft2 | Later expansion |
  • Power consumption for some machines includes cooling.

Hardware: Node Level

| Machine | CPU type | CPUs / node | Mem / CPU | NIC | CPU / NIC |
|---|---|---|---|---|---|
| ASCI Q | Alpha EV-68, 1.25 GHz, 2.5 GF, 16 MB cache | 4 | 2, 4, & 8 GB/CPU, 4 GB/s | Quadrics Elan 3 | 4? |
| PSC Cluster | Alpha EV-68, 1 GHz, 2 GF, 8 MB cache | 4 | 4 GB, 4 GB/s | Quadrics Elan 3 | 1/2 |
| NOAA Cluster | Intel P4 Xeon, 2.2 GHz | 2 | 0.5 GB, 400 MHz | Myrinet 2000 | 2? |
| French CEA | Alpha EV-68, 1.0 GHz | 4 | 1 GB | Quadrics | 1 |
| SNL Red Storm | AMD Sledgehammer, 2 GHz, 1 MB cache | 1 | 1 GB DDR @ 333 MHz | Cray | 1 |
| LANL Pink Cluster | Intel P4, 2.4 GHz, 2.4 GF? 512 kB cache | 2 | 1 GB | Myrinet LANai 9 | 2? |
| PNNL Cluster | McKinley, 1 GHz, 4 GF, 3 MB cache; Phase 2 upgrade: Madison, 1.5 GHz, 6 GF, 6 MB cache | 2 | 6 GB; 6.4 GB/s | Phase 2 upgrade: 1 Elan3 (270 MB/s), 1 Elan4 (> 700 MB/s) | 1/3? |
| ORNL X1/X2 | Cray vector, 12.8 GF | 64 | 4 GB; 51 GB/s | Cray; 100 GB/s | |
| LLNL Blue Gene/L | 440 PowerPC; 700 MHz; 2.8 GF; 32 kB L1 cache; 4 MB cache | 2 | 128 MB; 5.6 GB/s | IBM 175 MB/s | 2 |
| NERSC Blue Planet | Power-5, 8-10 GF | 8 | 16 GB | IBM | |

Hardware: Machine Level

| Machine | tot. CPUs | tot. Mem | tot. Mem BW | Peak TF | Linpack |
|---|---|---|---|---|---|
| ASCI Q | 8192 | | | | 7.7 TF (10T) |
| PSC Cluster | 3000 | 12 TB | 12 TB/s | 6 TF | 4.5 TF |
| NOAA Cluster | 1816 | 0.9 TB | | | 3.337 TF |
| French CEA | 2560 | 3 TB | 5 TB/s | 5 TF | 3.98 TF |
| SNL Red Storm | 10,368 | 10 TB DDR @ 333 MHz | ~55 TB/s | ~40 TF | > 14 TF |
| LANL Pink Cluster | 2048 | | | | |
| PNNL Cluster | Phase 2 upgrade: > 1,900 | | | 11.4 TF | 8 TF |
| ORNL X1/X2 | 32 (3/03); 128 (6/03); 256 (9/03) | | | 3.2 TF | |
| LLNL Blue Gene/L | 131,072 | 16 TB | | ~200 TF | |
| NERSC Blue Planet | 16,384 | 256 TB | | 40-50 TF | |
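A useful figure derived from the table above is Linpack efficiency, the ratio of measured (or projected) Linpack performance to peak. A minimal sketch in Python, using only the machines for which the table reports both numbers:

```python
# Linpack efficiency (Linpack / peak) for machines where the table
# above reports both figures. All values in TF.
machines = {
    "PSC Cluster": (6.0, 4.5),     # (peak, Linpack)
    "French CEA": (5.0, 3.98),
    "PNNL Cluster": (11.4, 8.0),   # Phase 2 upgrade projection
}

for name, (peak_tf, linpack_tf) in machines.items():
    efficiency = linpack_tf / peak_tf
    print(f"{name}: {efficiency:.0%}")
```

The clusters in the table that report both numbers sit in the 70-80% range, typical for Linpack on tightly coupled machines of this era.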

Hardware: Network and Reliability Features

| Machine | Topology | Bi-section BW | MTBI | Features |
|---|---|---|---|---|
| ASCI Q | Fat tree | 128 GB/s (20T) | 6.5 h per 10T (2.1 h for 30T?) | |
| PSC Cluster | Fat tree | 165 GB/s | 11 h | Redundant power, NIC, network |
| NOAA Cluster | Clos | | | |
| French CEA | QSW dual rail | | | |
| SNL Red Storm | Full 3D mesh | > 1.5 TB/s | > 50 h | Sophisticated RAS system |
| LANL Pink Cluster | | | | Redundant BIOS & fans; no mechanical parts |
| PNNL Cluster | 3 fat trees (Elan3, Elan4, and GigE) | 164 GB/s (Elan3), 900 GB/s (Elan4), 10 GB/s (GigE) | | Checkpoint/restart, RAID5, multiple SAN paths; failover I/O, login, and mgmt nodes |
| ORNL X1/X2 | | | | |
| LLNL Blue Gene/L | 3D torus, 64x32x32 | 700 GB/s | | |
| NERSC Blue Planet | 2 separate Federation switches, each with a third-stage 8192 switch links | | | |
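The Blue Gene/L bisection figure can be checked against the torus geometry and the 175 MB/s per-link figure from the node-level table. A sketch, assuming the quoted number counts both directions of each bidirectional link (an assumption on our part):

```python
# Consistency check for the Blue Gene/L bisection bandwidth: cutting a
# 64x32x32 3D torus across its longest dimension exposes 32*32 node
# positions, and the torus wrap-around contributes 2 links per position
# (double the plain-mesh count). Each link carries 175 MB/s per
# direction (the NIC figure from the node-level table).
x, y, z = 64, 32, 32      # torus dimensions
link_mb_s = 175           # MB/s per link, per direction

links_across_cut = 2 * y * z                               # wrap-around doubles the cut
bisection_gb_s = links_across_cut * 2 * link_mb_s / 1000   # both directions, in GB/s
print(f"{bisection_gb_s:.1f} GB/s")
```

This yields about 717 GB/s, consistent with the ~700 GB/s quoted in the table.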

Hardware: I/O

| Machine | tot. Disk | Aggregate BW (local FS) | Aggregate BW (global FS) | Off-machine I/O |
|---|---|---|---|---|
| ASCI Q | 442 TB | 19.2 GB/s (10T) | 19.2 GB/s (10T) | |
| PSC Cluster | 30 TB | | < 32 GB/s | |
| NOAA Cluster | 20 TB | | | |
| French CEA | 50 TB | | 7.5 GB/s | |
| SNL Red Storm | 240 TB | 50 GB/s | 50 GB/s | 25 GB/s |
| LANL Pink Cluster | | | | |
| PNNL Cluster | 256 TB (200 TB local, 56 TB global) | 132 GB/s | 3.2 GB/s | |
| ORNL X1/X2 | | | | |
| LLNL Blue Gene/L | | | | 128 GB/s |
| NERSC Blue Planet | 2,500 TB | | | |
