Red Storm is ranked the world’s most efficient supercomputer in two of six categories

A new series of measurements, the next step in the evolution of criteria for determining the efficiency of supercomputers more accurately, has rated Sandia’s Red Storm computer the best in the world in two of six new categories and very high in two other important categories. Red Storm was already judged sixth fastest in the world on the older but more commonly accepted Linpack test.

The two first-place benchmarks measure the efficiency of keeping track of individual pieces of data (a test called random access), and of communicating data between processors. This is the equivalent of how well a good basketball team works its offense, rapidly passing the ball many times to score against a tough opponent.

NNSA Administrator Linton Brooks got an advance peek at the results and touted them publicly during his recent Sandia visit (Lab News, Feb. 17). Sandia president Tom Hunter proudly noted them in his talks to Sandians and the community last week (see pages 1 and 5).

To understand why success in the new categories is more definitive than the more easily understandable measurement of mere speed (and is not just shopping for a hard-to-understand test that gives the most favorable result for the home team), it’s probably worth a moment to examine how technical ratings were established in the first place.

In the mid-19th century, researchers had all they could do to figure out the speed with which electricity traversed a simple wire.

Much more complicated in the late 20th and early 21st centuries were measurements made of currents flowing along the intricate circuitry of a computer chip.

Basic task of supercomputer

Still more difficult was to arrive at a meaningful number describing the information flowing electrically between many chips intended to work together like an orchestra — each instrument coming in at exactly the right time to solve small portions of large problems, and then pass along that information to the next set of chips waiting to continue the symphony.

This is the basic task of a modern supercomputer.

The only way to know whether all the pieces are playing together would be to check the output of each chip, and there are thousands and thousands of chips in computers processing information in parallel circuits. Such tests would be expensive and time-consuming.

Thus, in the early 1990s, supercomputer manufacturers distinguished the capabilities of their products by announcing Theoretical Peak numbers, says Sudip Dosanjh (1420). This figure represented how fast a computer with many chips in parallel circuits could run if all processors worked perfectly and in unison. The number was best considered a hopeful estimate.
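As a rough illustration of why a Theoretical Peak figure is only an upper bound, the arithmetic can be sketched as follows. The function names and the machine parameters are assumptions made for this example and do not describe Red Storm or any real system.

```python
def theoretical_peak_flops(processors, clock_hz, flops_per_cycle):
    """Upper bound: every processor completes its maximum
    number of floating point operations on every clock cycle."""
    return processors * clock_hz * flops_per_cycle


def efficiency(measured_flops, peak_flops):
    """Fraction of the theoretical peak that a real benchmark achieved."""
    return measured_flops / peak_flops


# Hypothetical machine (illustrative numbers only).
peak = theoretical_peak_flops(processors=10_000, clock_hz=2.0e9, flops_per_cycle=2)
measured = 3.0e13  # hypothetical measured benchmark result, in flops
print(f"peak: {peak / 1e12:.1f} teraflops")
print(f"efficiency: {efficiency(measured, peak):.0%}")
```

The gap between the two numbers is what measured benchmarks such as Linpack were introduced to expose.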

Next came the Linpack benchmark, which provided a real but relatively simple series of algorithms for a supercomputer to solve. Since 1993, that part of the world interested in supercomputers has watched for the new Linpack numbers, published every six months, to determine the 500 fastest computers in the world, and which entrant is the fastest of them all. For several years, this was the Sandia ASCI Red supercomputer.

More recently, the limitations of this approach have encouraged the Linpack founders, in conjunction with supercomputer manufacturers, to develop still more realistic tests. These indicate how well supercomputers handle essential functions such as passing large amounts of data between processors, which is necessary to solve real-world problems.

It is in this revised series of tests, called the High Performance Computing Challenge (HPCC) test suite, that Sandia’s Red Storm supercomputer — funded by NNSA’s Advanced Simulation & Computing (ASC) program — has done extremely well.

Rob Leland, director of Computing and Network Services Center 4300, offers this example of a complicated problem: “Suppose your computer is modeling a car crash,” he told the Lab News. “You’re doing calculations about when the windshield is going to break. And then the hood goes through it. This is a very discontinuous event: Out of the blue, something else enters the picture dramatically.”

“You have to remesh every point [of your visualization],” agrees John Zepper (4320).

Fundamental problem solved

Continues Rob, “This is the fundamental problem that Sandia solved in Red Storm: how to monitor what’s coming at you, in every stage of your calculations. You need very good communications infrastructure, because the information is concise, very intense. You need a lot of bandwidth and low latency [to be able to transmit a lot of information with minimum delays], and because the incoming information is very unpredictable, you have to be good [read, ‘aware’] in every direction.”

Rob gives particular credit to Steve Plimpton (1412) and Courtenay Vaughan (1422) for their contributions to solving these problems.

David Womble, acting director of Computation, Computers, and Math Dept. 1400, uses another metaphor. “The question,” he says, “is how much traffic can you move how fast through crowded city streets.” Red Storm, he says, does so well because it has “a balance that doesn’t exist in other machines between communication bandwidth [the ability of a processor to get data it needs from anywhere in the machine quickly] and floating point computation [how fast each processor can do the additions, multiplications, and other operations it needs to do in solving problems].”

More technically, Red Storm posted 1.8 TB/sec (1.8 trillion bytes per second) on one HPCC test: an interconnect bandwidth challenge called PTRANS, for parallel matrix transpose. This test, requiring repeated “reads,” “stores,” and communications among processors, is a measure of the total communication capacity of the internal interconnect. Sandia’s achievement in this category represents 40 times more communications power per teraflop (trillion floating point operations per second) than the PTRANS result posted by IBM’s Blue Gene system that has more than 10 times as many processors. Red Storm is the first computer to surpass the 1 terabyte-per-second (1 TB/sec) performance mark measuring communications among processors, a measure that indicates the capacity of the network to communicate when dealing with the most complex situations.
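To see why a matrix transpose stresses the interconnect, consider a simplified, single-process sketch. It is not the HPCC PTRANS code: the block layout, the function names, and the idea of counting "words sent" are assumptions made for illustration. Each block of the matrix is imagined to live on its own processor, and transposing forces every off-diagonal block to move to a different processor.

```python
def block_transpose(matrix, grid):
    """Transpose an n x n matrix the way a grid x grid set of processors would.

    Block (r, c) is imagined to live on processor (r, c). After the
    transpose it belongs to processor (c, r), so every off-diagonal
    block has to cross the interconnect.
    Returns the transposed matrix and the number of matrix elements moved
    between (imaginary) processors.
    """
    n = len(matrix)
    assert n % grid == 0, "matrix size must divide evenly into blocks"
    b = n // grid  # edge length of one block
    result = [[0] * n for _ in range(n)]
    words_sent = 0
    for r in range(grid):
        for c in range(grid):
            # Copy block (r, c) into position (c, r), transposing its contents.
            for i in range(b):
                for j in range(b):
                    result[c * b + j][r * b + i] = matrix[r * b + i][c * b + j]
            if r != c:
                words_sent += b * b  # this block changed owners
    return result, words_sent


def bandwidth_bytes_per_sec(words_sent, bytes_per_word, seconds):
    """Aggregate communication rate for a timed transpose."""
    return words_sent * bytes_per_word / seconds


m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
transposed, sent = block_transpose(m, grid=2)
print(transposed, sent)
```

On a real machine the blocks move simultaneously over the network, and the reported figure is the total number of bytes moved divided by the elapsed time.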

Random access benchmark

The “random access” benchmark checks performance in moving individual data rather than large arrays of data. Moving individual data quickly and well means that the computer can handle chaotic situations efficiently.
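The flavor of a random access test can be sketched in a few lines. This is a simplified stand-in, not the HPCC implementation: it uses Python's built-in random number generator and a modulo to choose table locations, and the function names are hypothetical. The essential feature is that each update touches one unpredictable memory location rather than a long contiguous array.

```python
import random


def apply_updates(table, num_updates, seed=0):
    """XOR pseudo-random values into pseudo-random table slots, in place."""
    rng = random.Random(seed)
    size = len(table)
    for _ in range(num_updates):
        value = rng.getrandbits(64)
        index = value % size    # unpredictable location in the table
        table[index] ^= value   # read, modify, and write one element


def gups(num_updates, seconds):
    """Giga-updates per second, the usual unit for this kind of test."""
    return num_updates / seconds / 1e9


table = list(range(1024))
apply_updates(table, num_updates=4096, seed=1)
# Because XOR undoes itself, repeating the same updates restores the table,
# which gives a simple way to check that no update was lost.
apply_updates(table, num_updates=4096, seed=1)
print(table == list(range(1024)))
```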

The computer has already modeled how much explosive power it would take to destroy an asteroid targeting Earth, how a raging fire would affect critical machinery, and elements of Earth’s atmosphere, in addition to the basic stockpile calculations the machine is designed to address.

It would be effective in visualizing complex defense-related events like an aircraft crashing with nuclear weapons on it, says Jim Tomkins (1420).

Red Storm also did very well in categories it did not win, says Courtenay, finishing second in the world behind Blue Gene in FFT (“Fast Fourier Transform,” a method of transforming data into frequencies or logarithmic forms easier to work with); and third behind Purple and Blue Gene in the “streams” category (total memory bandwidth measurement). Higher memory bandwidth helps prevent processors from being starved for data.
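A memory bandwidth test of this kind is typically built around very simple loops, such as the "triad" operation a[i] = b[i] + q * c[i], in which the arithmetic is trivial and the time is dominated by moving data to and from memory. The sketch below shows the operation and the byte count used to turn a timing into a bandwidth figure. It is illustrative only: the function names are assumptions, and timing pure Python would mostly measure interpreter overhead rather than memory hardware.

```python
def triad(b, c, scalar):
    """Compute a[i] = b[i] + scalar * c[i] for every element."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_bytes_moved(n, bytes_per_element=8):
    """Two arrays are read (b and c) and one is written (a)."""
    return 3 * n * bytes_per_element


def bandwidth_bytes_per_sec(n, seconds, bytes_per_element=8):
    """Memory bandwidth implied by one timed triad pass over n elements."""
    return triad_bytes_moved(n, bytes_per_element) / seconds


a = triad([1.0, 2.0], [3.0, 4.0], scalar=2.0)
print(a, triad_bytes_moved(len(a)))
```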

The two remaining tests involve the effectiveness of individual chips, rather than overall computer design.

In a normalization of benchmarks, which involves dividing them by the Linpack speed, Jim found that Red Storm had the best ratio. That is, Red Storm — of all the supercomputers — was best balanced to do real work.
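The normalization described here amounts to dividing each benchmark result by the same machine's Linpack speed, so that a large machine is not credited simply for being large. A minimal sketch follows; the machine names and every number in it are hypothetical and are not the published HPCC results.

```python
def normalize_by_linpack(results, linpack_tflops):
    """Express each benchmark result per teraflop of Linpack speed."""
    return {name: value / linpack_tflops for name, value in results.items()}


# Hypothetical machines (illustrative numbers only).
machine_a = normalize_by_linpack({"ptrans_gb_per_s": 1000.0}, linpack_tflops=50.0)
machine_b = normalize_by_linpack({"ptrans_gb_per_s": 400.0}, linpack_tflops=200.0)

# Machine A moves more data per unit of computing speed, so it is the
# better balanced of the two even though machine B is faster on Linpack.
print(machine_a, machine_b)
```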

An unusual feature of Red Storm’s architecture, says Jim, is that the computer can do both classified and unclassified work with the physical throwing of a secure switch, similar to the way a railroad switch can divert a train from one track to another. “That’s important at Sandia because we have a whole community here that does science. We can allocate part or even the whole machine to a science problem, and then move to DOE interests and do secure work.” The secure transfer does not require any movement of discs. There are no hard drives in any Red Storm cabinet.

“We get the value of a big machine that can do classified and unclassified,” says John. The capability of the machine to put its entire computing weight behind science jobs enabled one Sandia researcher to get an entire year’s worth of calculations done in a month, he says.

Red Storm’s architecture was designed by Jim and Bill Camp. The pair’s work has already helped Sandia partner Cray Computers Inc. sell 15 copies of the supercomputer in various sizes to US government agencies and universities, Canadians, and overseas customers in England, Switzerland, and Japan.

Cray holds Sandia licenses to reproduce Red Storm architecture and some system software, says Jim. “The operating system was written here, but the IO [input-output] is Cray’s.”

Sandia LabNews
