Sandia Lab News

Inside Cronus, the hybrid supercomputer built for AI and ModSim


Sandians invited to submit jobs on the new system

<strong>DEPTH TO CRONUS </strong>— AI-forward GPUs and many new interconnects help make Cronus a fast, powerful and versatile addition to Sandia’s computing resources. (Photo by Craig Fritz)

Sandia is beefing up its high-performance computing capabilities with a new, AI-forward system that recently became available to members of the workforce.

“Cronus is the largest current-generation NVIDIA-based, AI-capable system at Sandia,” said Steve Monk, manager of the Labs’ high-performance computing team.

Sandia data centers house 18 computing clusters, eight of which ranked among the fastest supercomputers in the world on the November 2025 TOP500 list. Together, Sandia computing facilities can perform 160 quadrillion calculations per second, or 160 petaflops.

And while the bulk of these calculations are devoted to high-precision modeling and simulation, researchers across the Labs are increasingly interested in training and using AI models. These tools benefit from different kinds of chips than you find in traditional supercomputing powerhouses like Sandia’s El Dorado, which was the world’s 20th fastest supercomputer when it launched in 2024.

Cronus balances these demands, expanding what’s possible for the Labs’ AI work and traditional scientific computing on a shared, versatile platform.

New system a response to evolving needs

<strong>TIME-SENSITIVE </strong>— The decision to build a high-performance computing system with advanced AI capabilities came in response to growing demand at Sandia for AI model training and inference. (Photo by Craig Fritz)

The team modeled Cronus after another Sandia system called Hops. Both handle AI and high-performance computing workloads, but Cronus has newer graphics processing units, or GPUs, to accelerate computations, and more of them.

“Hops has four GPUs per node,” Steve said. “Cronus has eight.”

But building and running a supercomputer, according to Jeff Ogden, a software stack engineer on Steve’s team, is like getting an orchestra to perform a complicated piece. You don’t get better performance just by adding more instruments. “We’re trying to make them all play together in tune,” he said.

Jeff and the rest of the HPC staff are the conductors, floor managers and repair techs. They integrate hardware, software, networking, storage, schedulers and cybersecurity so the systems run reliably. They also fix the components that inevitably misbehave.

Cronus has 16 nodes. Each is made of many chips and acts like its own section of the orchestra. Some workloads need only one node, but others require more, which means those nodes must work together.

One of the design goals was better node-to-node performance than earlier systems. On Hops, two cables send information in and out of each node. But on Cronus, “each GPU has a dedicated connection to the high-speed interconnect,” Steve said, which helps keep performance high when workloads scale across multiple nodes.

How high? Jeff said he has seen data transfer rates hit a terabyte per second.

“Any time you connect things as fast as possible together, it makes them appear closer” for computation, which can dramatically improve data-heavy workflows like AI and ModSim, Jeff said.
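For a sense of scale, the figures quoted above can be tallied in a few lines of Python. This is only an illustrative sketch: the node and GPU counts come from the article, while the variable names are made up for the example, and it assumes that "a dedicated connection" means one interconnect link per GPU.

```python
# Illustrative tally using figures quoted in this article.
# Variable names are hypothetical, not part of any Sandia tool.
CRONUS_NODES = 16
GPUS_PER_NODE = {"Hops": 4, "Cronus": 8}

# Links in and out of each node.
# Hops: two cables per node (from the article).
# Cronus: ASSUMES one interconnect link per GPU ("dedicated connection").
links_per_node = {"Hops": 2, "Cronus": GPUS_PER_NODE["Cronus"]}

total_cronus_gpus = CRONUS_NODES * GPUS_PER_NODE["Cronus"]
gpus_per_link = {
    name: GPUS_PER_NODE[name] / links_per_node[name] for name in GPUS_PER_NODE
}

print(total_cronus_gpus)  # 128
print(gpus_per_link)      # {'Hops': 2.0, 'Cronus': 1.0}
```

Under that assumption, GPUs on a Hops node share each link two at a time, while every Cronus GPU has a link to itself.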

Testing and validation recently completed

Cronus was installed in late 2025. After extensive testing, Steve and Jeff began inviting select groups to use the system in February to benchmark and validate real workflows. In early May, it became available to all members of the workforce.

<strong>SHARING THE LOAD</strong> — Although their hardware components come from industry, Sandia supercomputers are custom machines designed for the Labs’ unique computing needs. Cronus supports mission and enterprise workloads. (Photo by Craig Fritz)

Sandia’s Atlas team, which develops and maintains the Labs’ homegrown, locally hosted generative AI tool of the same name, was one of the first users. Two of its bigger AI models needed more than four GPUs per node to run, exceeding the maximum capacity of Hops.

“State-of-the-art GPUs allow for quick model exploration, greatly increase uptime and ultimately limit supply chain risks by keeping our entire tech stack in-house,” said Atlas developer Shane Poldervaart.
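The constraint the Atlas team ran into comes down to a simple per-node comparison, sketched here in Python. The helper `systems_that_fit` and the example value of six GPUs are hypothetical. The article says only that the models needed more than four GPUs per node, and this is not how Sandia's schedulers actually place jobs.

```python
# Hypothetical helper: which systems can host a job on a single node?
GPUS_PER_NODE = {"Hops": 4, "Cronus": 8}  # per-node counts from the article


def systems_that_fit(gpus_needed: int) -> list[str]:
    """Return the systems whose nodes have at least `gpus_needed` GPUs."""
    return [name for name, gpus in GPUS_PER_NODE.items() if gpus >= gpus_needed]


print(systems_that_fit(4))  # ['Hops', 'Cronus']
print(systems_that_fit(6))  # ['Cronus']  (6 is an assumed example value)
```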

Modeling and simulation teams are beginning to use the new nodes as well.

“Though Sierra porting to the new Cronus cluster is in its early stages, we are confident this new machine and its even more powerful H200 GPUs will further accelerate the trend towards real time computational informed decision making across Sandia engineering disciplines,” said Nate Crane from the Computational Simulation center.

For now, Steve said, the message from the HPC team is that if your group has an AI workload, a hybrid AI-simulation workflow, or a compute-heavy problem you’ve been shelving for lack of the right platform, this is your invitation to bring it to the orchestra.

Members of the workforce can click here for details on the system and how to access it. 

Recent articles by Troy Rummler
