By Neal Singer
A $15 million upgrade to Sandia's Red Storm computer has increased its peak speed from 41.5 to 124.4 teraflops in a computing terrain in which a single teraflop was a big deal only six years ago.
The machine, built by Cray Inc., is now rated second fastest in the world, with a Linpack speed of 101.4 teraflops. The widely recognized Linpack test measures a supercomputer’s speed on a standard computing problem: solving a dense system of linear equations.
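As a rough check on these figures, the short Python sketch below works out how much the upgrade raised peak speed and what fraction of that peak the machine sustained on Linpack. The teraflop values are the ones quoted above; the variable names are illustrative only.

```python
# Figures quoted in the article, in teraflops.
PEAK_BEFORE_TF = 41.5   # peak speed before the upgrade
PEAK_AFTER_TF = 124.4   # peak speed after the upgrade
LINPACK_TF = 101.4      # measured Linpack speed after the upgrade

# How many times higher the upgraded machine's peak speed is.
upgrade_factor = PEAK_AFTER_TF / PEAK_BEFORE_TF

# Fraction of the theoretical peak actually sustained on Linpack.
linpack_fraction = LINPACK_TF / PEAK_AFTER_TF

print(f"Peak speed grew by a factor of {upgrade_factor:.1f}")
print(f"Linpack reached {linpack_fraction:.1%} of peak")
```

By this arithmetic the upgrade roughly tripled peak speed, and the Linpack run sustained about 81.5 percent of the new peak.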
In peak speed, Red Storm remains well behind BlueGene/L at Lawrence Livermore National Laboratory, but, “in terms of scalability, Red Storm is the best in the world,” says Bill Camp (1400), director of Sandia's Computation, Computers, Information, and Math Center.
Scalability refers to a supercomputer’s computational efficiency as the number of processors on a job is increased. “You want to use more processors to get large jobs done more quickly,” says Bill, “but if the computer doesn’t scale well you can lose much of that speedup.” Red Storm loses very little efficiency on large numbers of processors.
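One common way to put a number on scalability is parallel efficiency: the speedup over a single processor divided by the number of processors used. The Python sketch below illustrates the idea. The timings are hypothetical, chosen only to show how efficiency can fall as processors are added; they are not measurements from Red Storm or any other machine.

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """How many times faster the parallel run is than the one-processor run."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(serial_seconds: float, parallel_seconds: float,
                        processors: int) -> float:
    """Speedup divided by processor count; 1.0 would be perfect scaling."""
    return speedup(serial_seconds, parallel_seconds) / processors


# Hypothetical timings for one job (assumed values, not real measurements).
SERIAL_SECONDS = 10_000.0
runs = {100: 105.0, 1_000: 12.5, 10_000: 2.0}  # processors -> wall-clock seconds

for processors, seconds in runs.items():
    eff = parallel_efficiency(SERIAL_SECONDS, seconds, processors)
    print(f"{processors:>6} processors: efficiency {eff:.0%}")
```

A machine that scales well keeps this efficiency close to 100 percent even at the largest processor counts.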
“The Cray XT3 supercomputers now dominating the highest end of computing worldwide are based upon Sandia’s Red Storm,” says Bill, who together with Sandia colleague Jim Tomkins (1420) led the design of the machine. “Scientists love it because they can do bigger science more quickly on it than any other computer in existence, except for molecular dynamics studies on BlueGene. Otherwise, it’s the best thing since night baseball.”
“The machine’s also a computational workhorse. It gets the job done,” says Sandia researcher Steve Attaway (1534), winner of several national computing awards. He runs large engineering simulations on the machine.
Red Storm, designed under NNSA’s Simulation & Computing program, became the basis for the Cray XT3™ massively parallel processor supercomputer that has been installed at supercomputing centers around the world.
Purchasers of this design include Oak Ridge National Laboratory, which will create an even bigger supercomputer than Red Storm based on the same design; Lawrence Berkeley Lab; Pittsburgh Supercomputing Center (the largest National Science Foundation site); the US Army; the United Kingdom’s Atomic Weapons Establishment program; the national computing centers in Finland, Switzerland, and the UK; and other US and allied government sites.
Thrifty in its use of power
Red Storm is Sandia’s largest high-performance computer, but is thrifty in its use of power. It uses 2.2 megawatts; by comparison, IBM Purple, another highly capable NNSA platform, requires 4.5 megawatts. This means that comparatively less of Red Storm’s energy is converted to useless heat.
Red Storm also takes up a relatively small area — about 3,500 square feet.
Its Linpack test demonstrated high reliability, repeatedly running for nine hours on more than 26,000 processor cores without a failure.
The machine was created in less than three years from concept to customer shipment. It was relatively inexpensive to develop and build — $77.5 million including engineering and design costs — and is used for large scientific and technical problems.
Sandia developed the architectural specifications of the machine and did much of the software development. “The hardware at Cray was built to meet our specifications,” says Jim Tomkins.
The upgrade added a fifth row of cabinets and refitted the entire system with dual-core AMD Opteron™ processors, resulting in a supercomputer with more than 26,000 processor cores. Dual-core technology fits two processor cores on a single die, doubling processing capacity with minimal impact on power consumption and temperature levels.
Why is Red Storm so efficient? In part, says Sandia researcher Robert Balance (4328), because its operating system is based on minimalist software — termed a lightweight kernel — which carries just enough functionality to load the job, put it on the network, and stop it. Any other software is job-specific; thus, each computer node (at which two chips are located) in effect lugs no useless software on its back.
The original technology was pioneered by Sandia on its ASCI Red machine, built by Intel Corporation, which became the world’s first terascale supercomputer.
Sandia’s 8,960-processor Thunderbird Linux cluster, developed in collaboration with Dell Inc. and Cisco, maintained its sixth position in the Top500 Supercomputers list by achieving an improved overall performance of 53 teraflops, an increase of more than 18 percent over last year’s performance testing.
The Top500 ranking of supercomputers is based on the Linpack benchmark, a yardstick of performance to test processor speed, scalability, and accuracy.
“This achievement represents a long-term investment to meet our mission to transform engineering and provide greater processing capacity,” says Computing Systems Senior Manager John Zepper (4320).
Sandia researchers use Thunderbird to perform a broad range of weapons simulations, including atomistic scale-to-device modeling of radiation effects on semiconductor electronics, assessing weapon-response safety in extreme thermal and impact environments, and quantifying uncertainties in weapon performance.
The level of detail modeled in these assessments was not practical before Thunderbird provided its new level of scalable capacity.
With its 4,480 commodity compute servers linked with an Infiniband message-passing interconnect, Thunderbird is the largest cluster of its type in the world.
The improvements in Thunderbird’s performance were propelled by its switch to OpenFabrics Enterprise Distribution (OFED) and OpenMPI. Together, these form a Linux-based open-source software stack qualified by the OpenFabrics Alliance to operate with multi-vendor Infiniband hardware and to implement the open-source Message Passing Interface (MPI) protocol.
The achievement was a joint venture involving Sandia and Cisco. Cisco, an active developer in the OFED and OpenMPI projects, had its engineers on site at Sandia to assist with monitoring, diagnosing, and fine-tuning Thunderbird’s performance.
The new software-stack environment allows for more memory per node to be available for parallel jobs at runtime, as well as an increase in reliability and scalability of users’ jobs. Sandia’s extensive use of the new software ironed out bugs and tweaked performance — improvements that benefit the entire high-performance-computing community.
Infiniband is widely regarded as one of the most attractive commodity interconnect technologies because of its high bandwidth, low latency, and low cost. This is the first time Infiniband, OpenMPI, and OFED have been used in such a massive configuration as Thunderbird. -- Neal Singer
By integrating readily available generic sensors with a more sophisticated sensor, Sandia researchers have developed a detection system that promises to make it easier to catch perpetrators trying to infiltrate prohibited areas.
Researchers from Embedded Sensor System Dept. 2623 and Exploratory Real Time Systems Dept. 5432 spent the last four months of FY06 figuring out how small, low-cost, low-power, commercially available sensors can supplement their in-house customized sensors developed between 2002 and 2005. During that time, numerous projects — Target Acquisition, Location, Observation, and Neutralization (TALON), Hard and Deeply Buried Target Grand Challenge (HDBT), Sensor Dart, and Virtual Perimeter System (VPS) — contributed to the advancement of unattended ground sensor (UGS) technology.
As a result, Sandia now has a well-developed sensor system complete with onboard GPS, compass, local and long-haul radios, digital signal processor, and video capabilities. However, it is significantly larger than the off-the-shelf sensors and is not currently available for mass production.
“We wanted inexpensive sensors to act as a first line of defense identifying potential targets and then through a series of radio signals wake up the UGS package. The Sandia-developed UGS package could then use advanced pattern-recognition techniques to classify four-legged animals, two-legged humans, or civilian and military vehicles,” says Hung Nguyen (5432), project investigator. “The significance of this is that by combining commercial sensors with our UGS, we can cover more ground for less.”
The integration of the more powerful sensor and the smaller ones will increase detection range, lower false alarms, and increase the area of coverage per dollar spent in complex terrains.
The $75,000 in funding for the off-the-shelf sensor work came through Sandia’s internal Laboratory Directed Research and Development (LDRD) program. It was “late start” money awarded near the end of the fiscal year to help solve a specific problem.
The commercial sensors, provided by Crossbow Technology, Inc., were modified with Sandia algorithms and some minor hardware changes. They can be powered by either a battery or a solar panel, depending on customer needs. Each sensor uses a geophone equipped with a four-inch pointed spike planted in the ground to detect movement by measuring seismic waves. To complete the situational awareness package, the sensors form what Isaac Toledo (5432) describes as “an elegant and seamless network configuration capable of self-configuring and self-healing.” Any events detected are reported back to the UGS via this network.
“Our customized unattended ground sensors work extremely well for monitoring various situations but for wide areas can be very costly,” says Mark Ladd, Dept. 2623 manager. “Using the commercial sensors in combination with a handful of our UGS devices is a viable alternate solution.”
Researcher Jonathan Van Houten (2623) says one potential application of the sensor system would be to strategically place off-the-shelf sensors at out-of-sight locations around a secure facility. The Sandia UGS would be placed nearby and video-linked to a security station monitored by guards.
“You could put them in arroyos or other places guards can’t immediately see,” Jeremy Giron (2623) says. “If an intruder shows up, the commercial sensors can send a signal to the Sandia UGS, which in turn performs more analysis and notifies the guard via Google Earth.”
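The two-tier flow described here can be sketched in a few lines of Python. This is an illustration only, not Sandia's software: the class names, the seismic threshold, the `footfall_rate_hz` feature, and the toy classification rule are all hypothetical stand-ins for the real radio wake-up and pattern-recognition steps.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SeismicEvent:
    sensor_id: str
    amplitude: float         # arbitrary units (assumed)
    footfall_rate_hz: float  # hypothetical feature for the toy classifier


@dataclass
class UGSPackage:
    """Stand-in for the more capable sensor; it sleeps until woken."""
    awake: bool = False
    alerts: List[str] = field(default_factory=list)

    def wake_and_classify(self, event: SeismicEvent) -> str:
        self.awake = True
        label = self.classify(event)
        # In the article this step ends with a notification to a guard.
        self.alerts.append(f"{label} near {event.sensor_id}")
        return label

    @staticmethod
    def classify(event: SeismicEvent) -> str:
        # Toy rule standing in for real pattern recognition (assumed values).
        if event.amplitude > 50.0:
            return "vehicle"
        if event.footfall_rate_hz > 3.0:
            return "four-legged animal"
        return "two-legged human"


class CommercialSensor:
    """Cheap first-line sensor that only decides whether to wake the UGS."""

    def __init__(self, sensor_id: str, ugs: UGSPackage, threshold: float = 5.0):
        self.sensor_id = sensor_id
        self.ugs = ugs
        self.threshold = threshold  # hypothetical trigger level

    def sense(self, amplitude: float, footfall_rate_hz: float) -> Optional[str]:
        if amplitude < self.threshold:
            return None  # nothing significant; the UGS stays asleep
        event = SeismicEvent(self.sensor_id, amplitude, footfall_rate_hz)
        return self.ugs.wake_and_classify(event)


ugs = UGSPackage()
arroyo_sensor = CommercialSensor("arroyo-1", ugs)
arroyo_sensor.sense(1.0, 2.0)   # background noise, ignored
arroyo_sensor.sense(12.0, 2.0)  # strong enough to wake the UGS
print(ugs.alerts)
```

The point of the split is that the inexpensive sensors do only a simple threshold check, so the larger and costlier package runs its analysis only when something has already been detected.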
Now that the initial integration of commercial sensors with custom UGS has been demonstrated, Mark is quick to point out that the next logical step is to seek out customers interested in both advancing and deploying this architecture. These sensors will also become part of the intrusion detection work done by 6429.
“We are eager to propel this system to the next level and meet a need that we know is out there,” Mark says. “Eventually the technology would be transferred to a manufacturer.” -- Chris Burroughs
At the end of a long, dusty gravel road, behind yellow barricades that warn wanderers away from active testing sites, scientists from Sandia and two Canadian institutions, together with experts from the UK, are conducting experiments to better understand what happens on the ground and in the air when explosives detonate on specific surfaces.
After the second five-pound charge of the day detonates, the group receives the all-clear to investigate the blast site.
“Bit of a shelf in that crater,” remarks John Marriage from the Atomic Weapons Establishment in the UK. Fred Harper (6417), senior scientist in High Consequence Assessment and Technology, feigns skepticism. “That's half a shelf, at best,” he teases.
Testing allows scientists to better predict the consequences of explosive and nonexplosive radiological dispersal devices (RDDs). The purpose of this series of tests performed at the 9920 test site was to learn more about the interactions between explosive fireballs and different ground surfaces and to better characterize the buoyant behavior of the resulting plumes. This information will be combined with the results of the explosive aerosolization work that has been performed at the 9920 test site to understand the impact of RDDs detonated in urban environments.
Much of Fred's work has been in indoor experiments. Working with the Canadian teams and the UK experts allowed him to bring his work outdoors to study different aspects of dispersal.
Other Sandians involved in the testing include Will Wente, Paul Johnson, Mark Naro, Weldon Teague, Roger Goode, Chris Parchert, Lindsay Dvorak (all 6417), Byron Demosthenous (1535), and Gary Zender (1822).
“The soot and dust swept into the fireball can combine to change the nature of the aerosol originally produced by the dispersal device,” Fred says. “This can significantly change the impact on the population. The indoor experiments are done on a smaller scale in a clean environment, and tell you what is produced after the interaction between the material and the shock wave on the microsecond time scale.
“The outdoor experiments tell you what happens to the material when it is exposed to soot and dust in the fireball on the millisecond time scale and how high the material initially rises on the second time scale.”
During this round of testing, 50 one-, five-, and 10-pound charges were detonated on asphalt, concrete, grass, play sand, loose dirt, and packed dirt so scientists could study the characteristics of the resulting debris clouds and the resulting surface damage.
After each explosion, the Canadian team tracked the resulting plume with a light detection and ranging (lidar) system that provides a four-dimensional model of the plume's progress.
The lidar system is similar to Doppler weather forecasting systems. It generates a four-dimensional map (three spatial dimensions plus time) that records the plume's density, distance, and dispersion as it evolves.
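One way to picture such a four-dimensional map is as a list of density readings, each tagged with a time and a position in space. The Python sketch below uses made-up sample values and hypothetical names; it is not the Canadian team's data format. It computes the density-weighted mean height of the plume at each time step, a simple measure of how high the material has risen.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class PlumeSample(NamedTuple):
    t_s: float      # seconds after detonation
    x_m: float      # horizontal position, meters
    y_m: float      # horizontal position, meters
    z_m: float      # height above ground, meters
    density: float  # relative particle density (arbitrary units)


# Made-up readings for illustration only.
samples = [
    PlumeSample(1.0, 0.0, 0.0, 2.0, 8.0),
    PlumeSample(1.0, 1.0, 0.0, 4.0, 2.0),
    PlumeSample(5.0, 3.0, 1.0, 10.0, 3.0),
    PlumeSample(5.0, 5.0, 1.0, 20.0, 1.0),
]


def mean_height_by_time(readings: List[PlumeSample]) -> Dict[float, float]:
    """Density-weighted mean plume height at each time step."""
    weighted_height = defaultdict(float)
    total_density = defaultdict(float)
    for s in readings:
        weighted_height[s.t_s] += s.z_m * s.density
        total_density[s.t_s] += s.density
    return {t: weighted_height[t] / total_density[t] for t in sorted(total_density)}


for t, height in mean_height_by_time(samples).items():
    print(f"t = {t:.0f} s: mean plume height {height:.1f} m")
```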
Each test tracks the movements of debris plumes for up to seven minutes. This allows researchers to better understand how variables like wind, explosive charge size, and the impacted surface affect the quantity of materials released into the air and the rate at which they travel.
After the plume disperses, the UK team characterizes the resulting impact crater on a variety of surfaces including hard-packed dirt, sand, concrete, grass, and steel plates.
Gilles Roy from Defence R&D, Valcartier, says the lidar “sees more particles for a longer period of time than other systems.” The Canadians drove the lidar system with the delicate instrumentation on top all the way from Quebec, Canada, for the tests.
This is the Canadian team’s seventh trip to Sandia. Roy’s colleague Patrick Brousseau says that since Fred’s group had already done considerable research in the field, it seemed logical to share information and advance knowledge without duplicating effort. The research programs and facilities in Canada were modeled after the program at Sandia.
Brousseau says work such as Fred's helped them get into the field of detecting and analyzing radiological threats. He also says that after 9/11, the Canadian government decided it was something they should study further for their own national security.
Project team members from the Royal Military College of Canada and Environment Canada are using the measurements of the plume evolution to validate atmospheric dispersal models. Information coming out of these joint experiments is used by the health ministry in Canada to model biological effects of radiological dispersals.
John Marriage says work such as this provides emergency preparedness personnel and first responders with invaluable data to identify potentially dangerous source items. Lorne Erhardt from Defence Research and Development Canada says this research contributes to the ongoing work to prepare those involved in emergency activities to assess the possibility and consequence of potentially harmful agents, both radiological and biological. -- Stephanie Holinka