Sandia LabNews

Sandia and Cray Inc. to tackle big data in new supercomputing institute


Sandia and supercomputer manufacturer Cray Inc. have signed a cooperative research and development agreement (CRADA) to form an institute focused on data-intensive supercomputers.

The Supercomputing Institute for Learning and Knowledge Systems (SILKS), to be located at Sandia/New Mexico, is expected to leverage the strengths of both Sandia and Cray by making software and hardware resources available to researchers who focus on a relatively new application of supercomputing. Such supercomputers are tasked with making sense of huge collections of data rather than with the traditional modeling and simulation of scientific problems.

“It’s an unusual opportunity,” says Bruce Hendrickson, Sandia senior manager of computational sciences and math (1440). “Cray has an exciting machine [the XMT] and we know how to use it well. This CRADA should help originate new technologies for efficiently analyzing large data sets. New capabilities will be applicable to Sandia’s fundamental science and mission work.”

Shoaib Mufti, director of knowledge management in Cray’s custom engineering group, says, “Sandia is a leading national lab with strong expertise in areas of data analysis. The concept of big data in the HPC [high-performance computing] environment is an important area of focus for Cray, and we are excited about the prospect of new solutions that may result from this collaborative effort with Sandia.”

Says Rob Leland, director of computing research (1400), “This is a great example of how Sandia engages our industrial partners. The XMT was originally developed at Sandia’s suggestion. It combined an older processor technology Cray had developed with the Red Storm infrastructure we jointly designed, giving birth to a new class of machines. That’s now come full circle. The institute will help leverage this technology to help us in our national security mission work, benefiting the Labs and the nation as well as our partner.”

The XMT has a different mode of operation from conventional parallel-processing systems. Says Bruce, “Think about your desktop: The memory system’s main job is to keep the processor fed. It achieves this through a complex hierarchy of intermediate memory caches that stage data that might be needed soon. The XMT does away with this hierarchy. Though its memory accesses are distant and time-consuming to reach, the processor keeps busy by finding something else to do in the meantime.”

In a desktop machine or ordinary supercomputer, Bruce says, high performance can be achieved only if the memory hierarchy succeeds at getting data to the processor fast enough. But for many important applications, this isn’t possible, and so processors sit idle most of the time. Said another way, traditional machines try to avoid latency (waiting for data) through the use of complex memory hierarchies.

XMT embraces latency

The XMT doesn’t avoid latency; instead, it embraces it. The processor supports many “threads,” fine-grained snippets of a program, and switches to a new thread whenever a memory access would otherwise make it wait for data.
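
The thread-switching idea can be illustrated with a toy model. The Python sketch below is not a model of the actual XMT hardware; the function name `utilization`, the 99-cycle memory latency, and the thread counts are all illustrative assumptions. It supposes that each thread executes one instruction and then waits a fixed number of cycles for memory, while the processor runs whichever thread is ready.

```python
def utilization(num_threads, memory_latency, cycles):
    """Fraction of cycles in which a toy processor does useful work.

    Assumed model (hypothetical, for illustration only): every thread
    runs one instruction, then waits `memory_latency` cycles for data.
    Each cycle the processor runs one ready thread, or idles if none is.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may run again
    busy = 0
    for cycle in range(cycles):
        # Pick the thread that has been ready the longest.
        t = min(range(num_threads), key=lambda i: ready_at[i])
        if ready_at[t] <= cycle:
            busy += 1
            # The thread issues a memory access and must wait for it.
            ready_at[t] = cycle + 1 + memory_latency
        # Otherwise every thread is waiting on memory and the cycle is lost.
    return busy / cycles


if __name__ == "__main__":
    for threads in (1, 50, 100):
        print(threads, utilization(threads, memory_latency=99, cycles=1000))
    # prints:
    # 1 0.01
    # 50 0.5
    # 100 1.0
```

Under these assumptions, a single thread keeps the processor busy only 1 percent of the time. Utilization rises in proportion to the number of threads until there are enough of them to cover the entire wait, at which point the processor never idles even though no individual memory access got any faster.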

“Traditional machines are pretty good for many science applications, but the XMT’s latency tolerance is a superior approach for lots of complex data applications,” Bruce says. “For example, following a chain of data links to draw some inference totally trashes memory locality because the data may be anywhere.”
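
The locality problem Bruce describes can be seen in a small sketch. The Python example below is an illustration under stated assumptions, not a measurement of any real machine: the function name `same_line_fraction`, the block size of eight slots standing in for a cache line, and the chain length are all hypothetical. It compares a chain of links stored in consecutive memory slots with the same chain scattered at random, and counts how often the next link lands in the block that was just visited.

```python
import random


def same_line_fraction(order, line_size=8):
    """Fraction of hops that stay inside the same block of `line_size` slots.

    `order` lists the memory slots in the order the chain visits them.
    A hop that stays in the same block is one a cache could serve quickly.
    """
    hops = list(zip(order, order[1:]))
    hits = sum(1 for a, b in hops if a // line_size == b // line_size)
    return hits / len(hops)


n = 4096
sequential = list(range(n))          # links laid out one after another
scattered = sequential[:]            # same links, arbitrary placement
random.Random(0).shuffle(scattered)  # fixed seed so the run is repeatable

if __name__ == "__main__":
    print("sequential:", same_line_fraction(sequential))
    print("scattered: ", same_line_fraction(scattered))
```

In the sequential layout, seven of every eight hops stay within the current block. In the scattered layout almost none do, so a cache hierarchy has little to offer and nearly every hop pays the full memory latency.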

More broadly, he says, the XMT is very good at working with large data collections that can be represented as graphs. Such computations appear in biology, law enforcement, business intelligence, and in various national security applications. Instead of a single answer, results are often best viewed as graphs. Sandia and other labs have already built software to run graph algorithms, though “the software is still pretty immature,” Bruce says. “That’s one reason for the institute. As semantic database technology grows in popularity, these kinds of applications may become ubiquitous.”

Among its other virtues, the XMT saves power because it runs at slower speeds.

SILKS’ primary objectives, as described in the CRADA, are to accelerate the development of high-performance computing, overcome barriers to implementation, and apply new technologies to enable discovery and innovation in science, engineering, and homeland security. The CRADA’s main technical categories include software, hardware, services, outreach, education, and training.

University students and faculty, as well as scientists and engineers from industry and government, are expected to be invited to take part in and benefit from the institute’s research.

A CRADA is a written agreement between a private company and a government agency to work together on a project. It allows the federal government and non-federal partners to optimize their resources, share technical expertise in a protected environment, share intellectual property emerging from the effort, and speed the commercialization of federally developed technology.