Scalable Parallel Crash Simulations
We are pleased to submit our efforts in parallelizing the PRONTO application suite for consideration in the SuParCup 99 competition. PRONTO is a finite element transient dynamics simulator that includes a smoothed particle hydrodynamics (SPH) capability; it is similar in scope to the well-known DYNA, PamCrash, and ABAQUS codes. Our efforts over the last few years have produced a fully parallel version of the entire PRONTO code which (1) runs fast and scalably on thousands of processors, (2) has performed the largest finite element transient dynamics simulations we are aware of, and (3) includes several new parallel algorithmic ideas that solve difficult problems associated with contact detection and SPH scalability. We motivate this work, describe the novel algorithmic advances, give performance numbers for PRONTO running on Sandia's Intel Teraflop machine, and highlight two prototypical large-scale computations we have performed with the parallel code.

We have successfully parallelized a large-scale production transient dynamics code with a novel algorithmic approach that uses multiple decompositions for different key segments of the computation. Simulating a model of more than ten million elements in a few tenths of a second per timestep is unprecedented for solid dynamics simulations, especially when full global contact searches are required. This is made possible by our new algorithmic ideas for efficiently parallelizing the contact detection stage. To our knowledge, scalability of this computation had never before been demonstrated on more than 64 processors. This has enabled parallel PRONTO to become the only solid dynamics code we are aware of that can run effectively on thousands of processors. More importantly, our parallel performance compares very favorably to the original serial PRONTO code, which is optimized for vector supercomputers. On the container crush problem, a Teraflop node is as fast as a single processor of the Cray Jedi.
This means that on the Teraflop machine we can now run simulations with tens of millions of elements thousands of times faster than we could on the Jedi! This is enabling transient dynamics simulations of unprecedented scale and fidelity. Not only can previous applications be run with vastly improved resolution and speed, but qualitatively new and different analyses have been made possible.
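The arithmetic behind the "thousands of times faster" statement can be sketched as follows. This is an illustration only, not a measurement: the function name, the node counts, and the parallel efficiency value are hypothetical and do not come from the PRONTO runs. The only figure taken from the text is the per-node ratio of 1.0, meaning one Teraflop node is about as fast as one Jedi processor on the container crush problem.

```python
def estimated_speedup(nodes, parallel_efficiency, node_to_vector_ratio=1.0):
    """Rough speedup of a parallel run over a single vector processor.

    nodes                -- number of parallel nodes used (hypothetical input)
    parallel_efficiency  -- fraction of ideal scaling achieved, in (0, 1]
                            (hypothetical input, not a reported figure)
    node_to_vector_ratio -- speed of one node relative to one vector
                            processor; 1.0 matches the statement that a
                            Teraflop node is as fast as one Jedi processor
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 < parallel_efficiency <= 1.0:
        raise ValueError("parallel_efficiency must be in (0, 1]")
    return nodes * parallel_efficiency * node_to_vector_ratio


# Illustrative values only: an assumed 80% parallel efficiency.
for nodes in (1000, 2000, 4000):
    print(nodes, estimated_speedup(nodes, parallel_efficiency=0.8))
```

Under these assumptions, any run on a few thousand nodes that holds reasonable parallel efficiency gives a speedup in the thousands, which is the sense of the claim above.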