Computing Collaborations

Sandia and NVIDIA explore new advanced memory architectures

Figure 1a. Traditional 2.5D Integration. Traditional 2.5D designs place components side by side, resulting in long-distance data transmission that increases power consumption and limits bandwidth scaling. While easier to cool, this horizontal layout creates large physical footprints and energy bottlenecks that challenge post-exascale performance.

Figure 1b. Advanced 3D Integration. Advanced 3D integration stacks memory directly on compute layers to significantly reduce data transmission distance and energy while maximizing available bandwidth. This vertical architecture requires innovative thermal strategies and next-generation hybrid copper bonding solutions to enable post-exascale computing efficiency. (Graphic courtesy of NVIDIA)


Sandia has embarked on a second round of collaboration with NVIDIA as part of the Advanced Memory Technology (AMT) program within Advanced Simulation and Computing (ASC). The AMT program brings together industry and the NNSA laboratories (Sandia, Lawrence Livermore and Los Alamos) to inspire and support technology research and advancement in critical areas that impact ASC mission needs. Building on investments made in the previous round, this phase will focus on developing future multi-layer memory architectures that address the growing energy cost of moving data while improving bandwidth to meet the requirements of the post-exascale computing era.

“Although much of the attention today focuses on the compute capabilities of chips to support AI, our experience shows we continue to need significant innovation in the memory subsystem,” said Simon Hammond, program director for advanced computing in the Office of Advanced Simulation & Computing at the NNSA. “Efficiently feeding each compute engine with data requires the most advanced memory technologies to ensure we can maximize the performance of each device.”

With DOE’s announcement of the Genesis Mission in the fall of 2025, this collaboration is positioned to respond to growing needs in artificial intelligence while meeting current modeling and simulation (Mod-Sim) demands. Future systems that combine traditional Mod-Sim and AI workflows to support the ASC mission will require accelerators that use advanced 3D integration of ultra-high-bandwidth, low-latency and energy-efficient memory designs to meet the aggressive application performance and efficiency targets outlined at the inception of the AMT program.

“The merging of modeling and simulation with AI for science and engineering has reached escape velocity and is now driving scientific progress around the world. Because performance of all parts of these hybrid workflows is directly dependent on memory technologies, accelerating the adoption of 3D stacked memories that can scale bandwidth, while simultaneously reducing power, is essential for extending the NNSA’s supercomputing leadership into the future,” said Dan Ernst, Senior Director of Supercomputing Products at NVIDIA.

The energy cost of moving data in and out of the processor is an industry-wide challenge that this collaboration is working to reduce. The farther data must travel to reach the processor, the more time and energy are required.

In current 2D designs, where components are mostly side by side, the distance can be quite large, but cooling solutions can be placed directly against the hot component. In a 3D design, where components are stacked in layers, as in high-bandwidth memory, the distance data must travel can be reduced significantly, but each layer produces its own heat. That heat can currently be dissipated only through a surface layer, which increases the complexity of the cooling design.

With current technology, the significant thermal complexities and manufacturing bottlenecks introduced by full 3D stacking would limit the performance of both AI and Mod-Sim workloads. This collaboration focuses on overcoming these limitations to unlock future performance and energy efficiency. Clay Hughes, principal member of technical staff, said, “The tighter integration of logic and memory in 3D packages creates significant opportunities, but important challenges remain in thermal management, power delivery and yield.”

By investing in early research and development and collaborating with industry partners like NVIDIA, the AMT program can advance future 2.5D and 3D memory architectures to address current and future performance requirements of ASC mission codes. In addition, researchers can explore the advancements in power efficiency and cooling methodologies that these future architectures require. This early engagement is key to mitigating risks and ensuring the successful deployment of these technologies.

Though AMT’s primary focus currently remains on traditional Mod-Sim, better memory technologies will also have a significant impact on AI-driven applications like those in the Genesis Mission. The advancements in memory technology developed through this partnership will play a crucial role in shaping the future of high-performance computing, ensuring that the ASC mission is well-equipped to meet the challenges of tomorrow.

By taking a co-design approach, Sandia, Los Alamos, Lawrence Livermore and NVIDIA can ensure the most important Mod-Sim and AI workloads run at peak performance and energy efficiency on future architectures.

“The Tri-labs have had a long successful relationship with NVIDIA, leveraging multiple generations of accelerator technology for production ASC mission cycles,” said James H. Laros III, Senior Scientist and AMT program lead. “We are very happy to continue our leading-edge research and development efforts with NVIDIA so future technologies can likewise benefit evolving ASC mission requirements.”

About NNSA: Established by Congress in 2000, NNSA is a semi-autonomous agency within the U.S. Department of Energy responsible for enhancing national security through the military application of nuclear science. NNSA maintains and enhances the safety, security, and effectiveness of the U.S. nuclear weapons stockpile; works to reduce the global danger from weapons of mass destruction; provides the U.S. Navy with safe and militarily effective nuclear propulsion; and responds to nuclear and radiological emergencies in the United States and abroad.