There’s a new supercomputer being born at Sandia, and it will stand on the shoulders of giants. But it will also be a more democratic supercomputer: of Sandia, by Sandia, and for all Sandians. Red Sky, now being assembled in the space where the legendary ASCI Red system once stood, will replace Thunderbird, which currently serves as Sandia’s “everyday” computer.
Red Sky will deliver more than 160 teraflops of peak performance and will provide roughly three times the computing capacity of Thunderbird. Red Sky is designed to be expanded economically to several times this initial capacity to meet future growth in demand. That’s important because in-house requests for institutional computing cycles currently outpace supply by a factor of four.
“One thing that’s really exciting to me about this project,” says Rob Leland, director of Computing and Network Services Center 9300, “is that we’re taking the architectural philosophy and design principles that we pioneered in systems for the weapons program such as ASCI Red and Red Storm and building a machine that will be broadly available to the entire Laboratory.”
Rob says Red Sky is intended to be a capacity machine (one that supports a large number of small and medium-sized jobs) but a much more scalable one (allowing larger, more complex jobs to be run) than typical commercially provided capacity systems. The trick, Rob says, is to leverage the economics of commodity parts and yet incorporate the design principles learned from previous generations of specialized high-performance computing systems.
The use of “red” in the name is meant to evoke the successful supercomputing systems and programs of Sandia’s past. The designers and builders of Red Sky built on the design principles and successes of earlier machines such as ASCI Red from the mid-1990s and Red Storm from the early 2000s.
“We use ‘red’ in the name to convey that we intend to deliver a very high-caliber system to the user community at Sandia,” Rob says. “It’s also intended to convey to the broader computing community that this is a machine consistent in its approach and its philosophy with those previous machines that were so highly regarded. We’re trying to create that continuity and that sense of legacy.”
Red Sky will be more than just a capacity machine. “It’s a scalable design,” Rob says, “which means that application codes can run very efficiently using lots of processors on the system. But it’s also an extensible machine, meaning we can physically build it out. We can add more computing power to it in an efficient way that doesn’t require us to rework the machine.”
Rob says that in a more typical machine, designers have to add infrastructure at a nonlinear rate to improve performance.
“Normally,” Rob says, “if you doubled the size of the machine, for example, you’d need four times as much cabling. That’s not true here because the machine is very replicable and very extensible.”
One key feature that makes it possible to physically extend the machine is its simple “topology,” Rob says, meaning that its components are connected in a three-dimensional mesh-type grid.
“It turns out,” Rob says, “that structure is a good choice for mapping physical codes onto the machine because physics is typically expressed mathematically in a three-dimensional grid that matches well to the machine.”
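For readers who want to see the scaling argument in numbers, here is a minimal Python sketch. It is an illustration only, not a description of Red Sky’s actual interconnect: it assumes a plain three-dimensional mesh with no wraparound links, and it uses a fully pairwise-connected machine as a stand-in for a design whose cabling grows quadratically. The function names are hypothetical.

```python
def mesh_links(nx, ny, nz):
    """Count nearest-neighbor links in an nx x ny x nz mesh (assumed: no wraparound)."""
    return (nx - 1) * ny * nz + nx * (ny - 1) * nz + nx * ny * (nz - 1)


def pairwise_links(n):
    """Count links if every one of n nodes were wired to every other (comparison model only)."""
    return n * (n - 1) // 2


def mesh_neighbors(x, y, z, dims):
    """Return the in-bounds nearest neighbors of grid point (x, y, z)."""
    nx, ny, nz = dims
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    neighbors = []
    for dx, dy, dz in steps:
        px, py, pz = x + dx, y + dy, z + dz
        if 0 <= px < nx and 0 <= py < ny and 0 <= pz < nz:
            neighbors.append((px, py, pz))
    return neighbors


if __name__ == "__main__":
    # Double the node count from 64 to 128 by extending one dimension.
    print(mesh_links(4, 4, 4), mesh_links(8, 4, 4))
    print(pairwise_links(64), pairwise_links(128))
    # An interior grid point has six neighbors, like an interior cell in a 3-D physics grid.
    print(len(mesh_neighbors(1, 1, 1, (4, 4, 4))))
```

Under these assumptions, doubling the mesh from 64 to 128 nodes raises the link count from 144 to 304, a little more than twice as many, while the pairwise model grows from 2,016 to 8,128, roughly four times as many. Each interior grid point also talks only to its six nearest neighbors, which is the same pattern a three-dimensional physics grid produces.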
A project of this complexity and ambition requires a close partnership with leading-edge vendors. In this case, Sandia worked with Sun Microsystems and Intel.
“Sun was willing to take substantial risks and create and invest in technology for the partnership,” Rob says, “so it was a very good fit for our needs and goals.” Sun was also willing to work with Sandia to innovate in several key dimensions, he says.
“Intel gave us early access to their latest processing technology and very competitive pricing for that new technology,” Rob says. Intel was a natural choice because it has been very actively reestablishing itself in the scientific high-performance computing market in recent years. Rob says Intel’s processor technology is moving intentionally toward incorporating certain key technologies and design features that support Sandia’s goals for the machine.
The Red Sky project, Rob notes, required a commitment to both strong technical innovation and strong value, because the machine must provide the highest-quality service at the lowest possible price. The Labs also wanted the project to continue Sandia’s legacy of innovation and excellence in high-performance computing and leadership in the field.
Red Sky is expected to be online in full production service later this year.