Sandia LabNews

Sandia purchases, installs high-capacity ‘Thunderbird’ supercomputing cluster


Sandia has purchased a 4,096-node Dell high-performance computer cluster, called Thunderbird, that will provide more than 8,000 processors of compute capacity to meet the laboratory’s high demand for cluster computing. In aggregate, the computer will have approximately 24 terabytes of memory and a speed of 60 tera-OPS (trillion operations per second).

Sandia, with Dell Professional Services and Albuquerque’s Technology Integration Group, will install the system at Sandia’s Central Computing Facility in Albuquerque. Delivery of Thunderbird should be completed by the end of July, and integration and testing will occur over the next several months. The system is expected to be fully operational in early October.

Thunderbird is Sandia’s second installment of an institutionally maintained cluster. Sandia’s first institutional cluster was installed in October 2003 and provides approximately seven tera-OPS of capacity to the laboratory.

“Our first institutional cluster was an important investment for the lab, but it has been fully utilized from the first day it was installed,” says Ken Washington, CIO and director of Sandia’s Information Systems and Services Program. “Thunderbird will make a huge impact by more than quadrupling our institutional capacity. The increase allows the Labs to meet a significant fraction of previously unmet institutional capacity computing requirements in one fell swoop.”

Thunderbird is referred to as a capacity cluster because it is ideally suited to performing many mid-sized tasks very rapidly, rather than running one huge task across its entire system as Sandia’s highly customized and tightly coupled Red Storm supercomputer does.

Thunderbird consists of 4,096 Dell PowerEdge 1850 servers, each equipped with two Intel 64-bit (EM64T) processors, for a total of more than 8,000 processors.
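For readers who want to check the headline numbers, the short sketch below works through the arithmetic using only figures quoted in this article. The per-node and per-processor values it derives are back-of-envelope estimates, not published specifications. The choice of decimal units and the assumption that the first cluster stays in service alongside Thunderbird are illustrative, and the function and key names are hypothetical.

```python
# Figures quoted in the article (approximate where the article says so).
NODES = 4096                 # Dell PowerEdge 1850 servers
PROCESSORS_PER_NODE = 2      # Intel EM64T processors per server
TOTAL_MEMORY_TB = 24         # approximate aggregate memory, terabytes
PEAK_TERA_OPS = 60           # approximate aggregate speed, tera-OPS
FIRST_CLUSTER_TERA_OPS = 7   # approximate capacity of the 2003 cluster


def derived_figures():
    """Return back-of-envelope figures derived from the constants above."""
    processors = NODES * PROCESSORS_PER_NODE
    # Assumption: decimal units (1 TB = 1000 GB); the article does not say.
    memory_per_node_gb = TOTAL_MEMORY_TB * 1000 / NODES
    # 1 tera-OPS = 1000 giga-OPS.
    giga_ops_per_processor = PEAK_TERA_OPS * 1000 / processors
    # Assumption: both clusters remain in service, so the combined capacity
    # is compared against the first cluster alone.
    capacity_ratio = (PEAK_TERA_OPS + FIRST_CLUSTER_TERA_OPS) / FIRST_CLUSTER_TERA_OPS
    return {
        "processors": processors,
        "memory_per_node_gb": memory_per_node_gb,
        "giga_ops_per_processor": giga_ops_per_processor,
        "capacity_ratio": capacity_ratio,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the processor count comes to 8,192, consistent with "more than 8,000," and the capacity ratio is well above four, consistent with the "more than quadrupling" described above.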

A high-performance Infiniband interconnect from Cisco was chosen because it scales more linearly than most proprietary technologies for building large clusters — an important consideration in assembling a large number of processors. Lower cost was another factor in Sandia’s selection of this widely used interconnect.

The procurement also includes a smaller 128-node developmental cluster to be installed in the Distributed Information Systems Lab at Sandia’s California site. It will enable Sandia to develop and test system software solutions required to successfully integrate and deploy Thunderbird for production use.

“Thunderbird makes important strategic connections between Sandia, Dell, and other vendors,” says Bill Camp, director of Sandia’s Computation, Computers, Information and Mathematics Center. “Our purchase opens a venue to them in high-performance cluster computing. Together we will break new ground by deploying a cluster with commodity processors and an Infiniband interconnect at the scale of thousands of processors.”

“Sandia has been a leader in putting Infiniband on the high-performance computing map,” Ken Washington says. “It is only natural that we be the place where such a large Infiniband cluster is first realized for meeting an institutional computing requirement.”

“Specific thanks,” says John Zepper (9320), “go to Facilities for power and cooling modifications; Purchasing for rapid JIT placement of the order; for technical contributions, Matt Leininger (8961), Geoff McGirt, Carl Leishman, and Kevin Kelsey (all 9324), David Martinez and Archie Gibson (9335), Chris Maestas (9326), Josh England (8963), Sean Taylor (9328), and Jerry Friesen (8963); and Rob Leland (9300), Jim Ang (9224), and Art Hale (9900).”

“Expanded capacity computing will deliver on the modeling and simulation vision for the Sandia community,” says John.

Both the Ethernet Input/Output and the command and control of the Thunderbird cluster are based on the Force 10 E-Series switch/routers. The Force 10 E1200, which supports 1,260 gigabit Ethernet ports, offers the industry’s leading gigabit and 10-gigabit port density — providing the scalable performance required to support the largest cluster computers in deployment.