Revised March 26, 2003
Introduction
Two sessions at SOS7 will be dedicated to presentations about machines. The Tuesday session will discuss machines that are already operational, while the Wednesday session covers planned machines. Instead of presenting technical details of their machines, the panelists were asked to answer three or four specific questions. As a reminder, and for people less familiar with the individual machines, we summarize the main technical details of each machine here and again briefly before each session.
The panelists for the existing machines session are:
- ASCI White: Mark Seager
- ASCI Q: John Morrison
- NCSA Cluster: Dan Reed
- PSC Cluster: Mike Levine
- NOAA Cluster: Leslie Hart
- French CEA Machine: Jean Gonnord
Here are the questions these panelists were asked to answer:
- Is your machine living up to the performance expectations? If yes, how? If not, what is the root cause?
- What is the MTBI (mean time between interrupts)? What are the topmost reasons for interrupts? What is the average utilization rate?
- What is the primary complaint, if any, from the users?
The panelists for the planned machines session are:
- LLNL Purple & MCR Cluster: Mark Seager
- SNL Red Storm: Jim Tomkins
- LANL Pink Cluster: Ron Minnich
- PNNL Cluster: Scott Studham
- ORNL X1/X2: Buddy Bland
- LLNL/IBM Blue Gene/L: Jose Moreira (IBM Watson Center)
- NERSC Blue Planet: Brent Gorda
Here are the questions these panelists were asked to answer:
- What is unique in structure and function of your machine?
- What characterizes your applications? Examples are the intensities of message passing, memory utilization, computing, I/O, and data.
- What prior experience guided you to this choice?
- Other than your own machine, what are the best and worst machines for your needs, and why?
Technical Data
We will now present a series of tables comparing the technical aspects of these machines.
Hardware: General
Machine | Maker | Available | Power* | Size | Plans |
ASCI Q | HP | QA: Aug. 2002; QB: Feb. 2003 | 3.5 MW | > 12,000 ft² (2 segments) | |
PSC Cluster | Compaq/PSC | Apr. 2002 | 0.46 MW | 2,500 ft² | Add small number of EV7 |
NOAA Cluster | HPTi | Oct. 2002 | | 1,200 ft² | |
French CEA | Compaq | Feb. 2001 | 0.6 MW | | Phase 2 in 2003: 50 TF peak |
SNL Red Storm | Cray/Sandia | Aug. 2004 | < 2 MW | < 3,000 ft² | Expandable to > 100 TF |
LANL Pink Cluster | Linux Networx | Jun. 2003 | | 1,000 ft² | Could expand to 8192 nodes |
PNNL Cluster | HP | Phase 1: Dec. 2002; Phase 2: Jul. 2003 | 7.5 MW | 2,500 ft² | Upgrade Sep. 2003 |
ORNL X1/X2 | Cray | Sep. 2003 | 0.4 MW | 100 ft² | 4 cabinets in 2003, 10 in 2004, 50 in 2005 |
LLNL Blue Gene/L | IBM | 2004/2005 | 1.5 MW | 1,200 ft² | |
NERSC Blue Planet | IBM | Jun. 2005 | 6 MW | 12,000 ft² | Later expansion |
* Power consumption for some machines includes cooling.
Hardware: Node Level
Machine | CPU type | CPUs / node | Mem / CPU | NIC | CPU / NIC |
ASCI Q | Alpha EV-68, 1.25 GHz, 2.5 GF, 16 MB cache | 4 | 2, 4, & 8 GB/CPU, 4 GB/s | Quadrics Elan 3 | 4? |
PSC Cluster | Alpha EV-68, 1 GHz, 2 GF, 8 MB cache | 4 | 4 GB, 4 GB/s | Quadrics Elan 3 | 1/2 |
NOAA Cluster | Intel P4 Xeon, 2.2 GHz | 2 | 0.5 GB, 400 MHz | Myrinet 2000 | 2? |
French CEA | Alpha EV-68, 1.0 GHz | 4 | 1 GB | Quadrics | 1 |
SNL Red Storm | AMD Sledgehammer, 2 GHz, 1 MB cache | 1 | 1 GB DDR @ 333 MHz | Cray | 1 |
LANL Pink Cluster | Intel P4, 2.4 GHz, 2.4 GF?, 512 kB cache | 2 | 1 GB | Myrinet LANai 9 | 2? |
PNNL Cluster | McKinley, 1 GHz, 4 GF, 3 MB cache; Phase 2 upgrade: Madison, 1.5 GHz, 6 GF, 6 MB cache | 2 | 6 GB; 6.4 GB/s | Phase 2 upgrade: 1 Elan3 (270 MB/s), 1 Elan4 (> 700 MB/s) | 1/3? |
ORNL X1/X2 | Cray vector, 12.8 GF | 64 | 4 GB; 51 GB/s | Cray; 100 GB/s | |
LLNL Blue Gene/L | PowerPC 440; 700 MHz; 2.8 GF; 32 kB L1 cache; 4 MB cache | 2 | 128 MB; 5.6 GB/s | IBM 175 MB/s | 2 |
NERSC Blue Planet | Power-5, 8-10 GF | 8 | 16 GB | IBM | |
Hardware: Machine Level
Machine | tot. CPUs | tot. Mem | tot. Mem BW | Peak TF | Linpack |
ASCI Q | 8192 | | | | 7.7 TF (10T) |
PSC Cluster | 3000 | 12 TB | 12 TB/s | 6 TF | 4.5 TF |
NOAA Cluster | 1816 | 0.9 TB | | | 3.337 TF |
French CEA | 2560 | 3 TB | 5 TB/s | 5 TF | 3.98 TF |
SNL Red Storm | 10,368 | 10 TB DDR @ 333 MHz | ~55 TB/s | ~40 TF | > 14 TF |
LANL Pink Cluster | 2048 | | | | |
PNNL Cluster | Phase 2 upgrade: > 1,900 | | | 11.4 TF | 8 TF |
ORNL X1/X2 | 32 in 3/03; 128 in 6/03; 256 in 9/03 | | | 3.2 TF | |
LLNL Blue Gene/L | 131,072 | 16 TB | | ~200 TF | |
NERSC Blue Planet | 16,384 | 256 TB | 40-50 TF |
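The peak and Linpack figures in this table can be compared directly. The following is a minimal Python sketch, not part of the original summary, that computes Linpack performance as a fraction of peak for the machines that list both figures. The names `MACHINES` and `linpack_efficiency` are illustrative, and the approximate Red Storm entries (~40 TF peak, > 14 TF Linpack) are treated as plain numbers, so that result is only a rough estimate.

```python
# Peak and Linpack figures (in TF) copied from the Machine Level table.
# Only machines that list both values are included. The Red Storm entries
# "~40 TF" and "> 14 TF" are approximated by the plain numbers 40 and 14.
MACHINES = {
    "PSC Cluster": (6.0, 4.5),
    "French CEA": (5.0, 3.98),
    "SNL Red Storm": (40.0, 14.0),
    "PNNL Cluster": (11.4, 8.0),
}


def linpack_efficiency(peak_tf: float, linpack_tf: float) -> float:
    """Return Linpack performance as a fraction of peak performance."""
    if peak_tf <= 0:
        raise ValueError("peak performance must be positive")
    return linpack_tf / peak_tf


if __name__ == "__main__":
    for name, (peak, linpack) in MACHINES.items():
        percent = 100 * linpack_efficiency(peak, linpack)
        print(f"{name:15s} {percent:5.1f}% of peak")
```

Under these inputs, the three Alpha and Itanium clusters come out at roughly 70 to 80 percent of peak, while the Red Storm figure mostly reflects how conservative the "> 14 TF" estimate is.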
Hardware: Network and Reliability Features
Machine | Topology | Bi-section BW | MTBI | Features |
ASCI Q | Fat tree | 128 GB/s (20T) | 6.5 h per 10T (2.1 h for 30T?) | |
PSC Cluster | Fat tree | 165 GB/s | 11 h | Redundant Power, NIC, network |
NOAA Cluster | Clos | | | |
French CEA | QSW Dual Rail | | | |
SNL Red Storm | Full 3D mesh | > 1.5 TB/s | > 50 h | Sophisticated RAS system |
LANL Pink Cluster | | | | Redundant BIOS & fans; no mechanical parts |
PNNL Cluster | 3 fat trees (Elan3, Elan4, and GigE) | 164 GB/s (Elan3), 900 GB/s (Elan4), 10 GB/s (GigE) | | Checkpoint/restart, RAID5, multiple SAN paths, failover I/O, login, mgmt nodes |
ORNL X1/X2 | | | | |
LLNL Blue Gene/L | 3D torus 64x32x32 | 700 GB/s | | |
NERSC Blue Planet | 2 separate federation switches, each with a third stage; 8192 switch links | | | |
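The ASCI Q entry gives an MTBI of 6.5 h per 10T and a tentative 2.1 h for 30T. One way to see where such a figure could come from is the sketch below. It assumes, and this is not stated in the table, that "10T" and "30T" denote a 10 TF segment and a 30 TF configuration of three such segments, that interrupts in each segment occur independently at a constant rate, and that any interrupt stops the whole machine. The function name `scaled_mtbi` is hypothetical.

```python
def scaled_mtbi(mtbi_hours: float, segments: int) -> float:
    """Estimate the MTBI of several identical segments run as one machine.

    Assumption: each segment is interrupted independently at a constant
    rate of 1 / mtbi_hours, and an interrupt in any segment interrupts the
    whole machine. The rates then add, so the combined MTBI is the
    single-segment MTBI divided by the number of segments.
    """
    if mtbi_hours <= 0:
        raise ValueError("MTBI must be positive")
    if segments < 1:
        raise ValueError("at least one segment is required")
    return mtbi_hours / segments


if __name__ == "__main__":
    # 6.5 h is the per-10T value from the table; 3 segments is an assumption.
    print(f"Estimated MTBI for three segments: {scaled_mtbi(6.5, 3):.2f} h")
```

With these assumptions the estimate is about 2.2 h, close to the tentative 2.1 h in the table. Real interrupts are rarely fully independent, so this is only a first-order check.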
Hardware: I/O
Machine | tot. Disk | Aggregate BW (local FS) | Aggregate BW (global FS) | Off machine I/O | cat5 |
ASCI Q | 442 TB | 19.2 GB/s (10T) | 19.2 GB/s (10T) | ||
PSC Cluster | 30 TB | < 32 GB/s | |||
NOAA Cluster | 20 TB | | | | |
French CEA | 50 TB | 7.5 GB/s | |||
SNL Red Storm | 240 TB | 50 GB/s | 50 GB/s | 25 GB/s | |
LANL Pink Cluster | | | | | |
PNNL Cluster | 256 TB; 200 TB local and 56 TB global | 132 GB/s | 3.2 GB/s | | |
ORNL X1/X2 | | | | | |
LLNL Blue Gene/L | 128 GB/s | ||||
NERSC Blue Planet | 2,500 TB | | | | |