Sandia LabNews

A better benchmark for supercomputer performance


A new benchmark to more accurately measure the capabilities of modern supercomputers has been crafted by Sandia researcher Mike Heroux (1426), in collaboration with Jack Dongarra, creator of the widely used LINPACK benchmark, and his colleagues at the University of Tennessee and Oak Ridge National Laboratory. (Photo by Randy Montoya)

A new benchmark to more accurately measure the capabilities of modern supercomputers has been crafted by Sandia researcher Mike Heroux (1426), in collaboration with Jack Dongarra, creator of the widely used LINPACK benchmark, and his colleagues at the University of Tennessee and Oak Ridge National Laboratory.

The new test — a relatively small program called HPCG, for “high performance conjugate gradient” — is being field-tested on a number of NNSA supercomputers. It is expected to be released at SC13, the supercomputing conference to be held in Denver in November.

Says Mike, “We have known for quite a few years that LINPACK was not a good performance proxy for complex modern applications. But we could still design a cost-effective machine that satisfied two increasingly distinct design points: real application performance and LINPACK performance. Thus we got improvements for our application base and still posted a good TOP500 number [that certified the new machine was one of the 500 fastest in the world].” 

But, he says, the two goals have diverged far enough that, like classroom teachers rebelling against “teaching to the test” rather than improving overall knowledge, “computer designers feel that designing for both is no longer appropriate.”

The National Nuclear Security Administration has supported work on the new test because, Mike says, “while NNSA realizes it needs to invest in new supercomputers over the coming decades, it is unwilling to spend public money to develop architecture solely to do well on LINPACK. NNSA wants a more meaningful measure.”

LINPACK’s influential semi-annual TOP500 listing of the 500 fastest machines has been noted worldwide for two decades, initially because it was considered a simple and accurate metric, readily understood and appreciated by non-experts.

“The TOP500 was and continues to be the best advertising supercomputing gets,” Mike says. “Twice a year when the new rankings come out, we get articles in media around the world. My 6-year-old can appreciate what it means.” 

However, in recent years the gap between LINPACK performance and real applications performance has grown dramatically. 

In the early years of supercomputing, applications and problems were simpler, better matching the algorithms and data structures of LINPACK. Since then, applications and problems have become much more complex, demanding a broader collection of capabilities from the computer system than LINPACK.

“The specifications of the LINPACK benchmark are like telling race car designers to build the fastest car for completely flat, open terrain,” Mike says. “In that setting the car has to satisfy only a single design goal. It does not need brakes, a steering wheel, or other control features, making it impractical for real driving situations.”

The LINPACK benchmark pushes computer designers to build systems that have lots of arithmetic units but very weak data networks and primitive execution models.

“Because modern applications cannot use all the arithmetic units without better access to data and more flexible execution models,” Mike says, “the extra arithmetic units are useless.” 

Mike led development of the new benchmark, starting with a teaching code he wrote to instruct students and junior staff members on how to develop parallel applications. This code later became the first “miniapp” in the Mantevo project, which recently won a 2013 R&D 100 Award.

The technical challenge of HPCG is to develop a very small program that captures as much of the essential performance of a large application as possible without being too complicated to use. “We created a program with 4,000 lines that behaves a lot like a real code of 1 million lines but is much simpler,” Mike says. “If we run HPCG on a simulator or new system and modify the code or computer design so that the code runs faster, we can make the same changes to make the real code run faster. The beauty of the approach is that it really works.”

HPCG generates a large collection of algebraic equations that must be satisfied simultaneously. The conjugate gradient algorithm used in HPCG to solve these equations is an iterative method. It is the simplest practical method of its kind, so it is both a real algorithm that people care about, and not too complicated to implement.
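What that looks like in practice: the sketch below is a bare-bones conjugate gradient loop for a tiny three-equation system, written as an illustration only. It is not the HPCG source, which adds a preconditioner, sparse data structures, and parallelism; the matrix, right-hand side, and tolerance here are invented for the example.

// Bare-bones conjugate gradient for a tiny symmetric positive definite
// system Ax = b. Illustration only: the real HPCG benchmark adds a
// preconditioner, sparse storage, and parallelism on far larger problems.
// The matrix and tolerance here are made up for the example.
#include <cmath>
#include <cstdio>

constexpr int N = 3;

// y = A*x for a small dense matrix (HPCG itself works with sparse matrices).
void matvec(const double A[N][N], const double x[N], double y[N]) {
    for (int i = 0; i < N; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < N; ++j) y[i] += A[i][j] * x[j];
    }
}

double dot(const double a[N], const double b[N]) {
    double s = 0.0;
    for (int i = 0; i < N; ++i) s += a[i] * b[i];
    return s;
}

int main() {
    double A[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};  // symmetric positive definite
    double b[N]    = {1, 2, 3};

    double x[N] = {0, 0, 0};  // initial guess
    double r[N], p[N], Ap[N];
    for (int i = 0; i < N; ++i) { r[i] = b[i]; p[i] = r[i]; }  // residual and first search direction
    double rs_old = dot(r, r);

    for (int iter = 0; iter < 100 && std::sqrt(rs_old) > 1e-10; ++iter) {
        matvec(A, p, Ap);
        double alpha = rs_old / dot(p, Ap);  // step length along direction p
        for (int i = 0; i < N; ++i) {
            x[i] += alpha * p[i];            // improve the solution
            r[i] -= alpha * Ap[i];           // update the residual
        }
        double rs_new = dot(r, r);
        double beta = rs_new / rs_old;       // keeps successive directions A-conjugate
        for (int i = 0; i < N; ++i) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }

    std::printf("solution: %.6f %.6f %.6f\n", x[0], x[1], x[2]);
    return 0;
}

Each pass through the loop is dominated by a matrix-vector product and a few vector operations, which is why a benchmark built around it stresses data movement rather than raw arithmetic.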

One basis of the method’s relevance is that it uses data structures that closely match those of real applications. The data structures used by LINPACK are no longer used for large problems in real applications because they require the storage of many zero values. Decades ago, when application problems and computer memory sizes were much smaller, LINPACK’s data storage techniques were acceptable. Today’s problem sizes are so large that data structures must distinguish zero from nonzero values and store only the nonzeros, which is what HPCG does.
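For readers curious about the difference, here is a small sketch, not HPCG’s actual data structure, of one widely used sparse layout, compressed sparse row (CSR), which stores only the nonzero entries of a matrix along with their positions. The 3-by-3 matrix below is invented for the example.

// Sketch of sparse storage in the common compressed sparse row (CSR)
// layout: only nonzero entries are kept, along with their positions.
// This is not HPCG's actual data structure, just the general idea.
#include <cstdio>
#include <vector>

struct CsrMatrix {
    int nrows;
    std::vector<double> values;   // nonzero values, listed row by row
    std::vector<int>    cols;     // column index of each stored value
    std::vector<int>    row_ptr;  // where each row starts in values/cols (size nrows + 1)
};

// Sparse matrix-vector product y = A*x: touches only the stored nonzeros.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.nrows, 0.0);
    for (int i = 0; i < A.nrows; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            y[i] += A.values[k] * x[A.cols[k]];
    return y;
}

int main() {
    // The 3x3 matrix [[4,1,0],[1,3,1],[0,1,2]] stored without its zeros.
    CsrMatrix A{3,
                {4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0},  // 7 nonzeros instead of 9 entries
                {0,   1,   0,   1,   2,   1,   2},
                {0, 2, 5, 7}};
    std::vector<double> x = {1.0, 1.0, 1.0};
    std::vector<double> y = spmv(A, x);
    std::printf("A*x = %.1f %.1f %.1f\n", y[0], y[1], y[2]);
    return 0;
}

A dense layout of the kind LINPACK uses would store all nine entries of this matrix; the sparse layout stores seven nonzeros plus index arrays, and the savings grow rapidly for the huge, mostly zero matrices that arise in real applications.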

“By providing a new benchmark, we hope system designers will build hardware and software that will run faster for our very large and complicated problems,” Mike says.

Testing done so far indicates the approach will work. More formal testing by November will show whether Mike, Sandia, and the collaborating labs have a product worth its salt.

The HPCG code tests science and engineering problems involving complex equations. It is not related to another Sandia-led benchmark effort known as Graph 500, which assesses and ranks the capabilities of supercomputers on “big data” problems that search for relationships through graphs.