Controlling uncertainty when optimizing supercomputer simulations
Working to solve a problem, supercomputing researchers may encounter incomplete data or flawed programs. Sandia researcher Drew Kouri has attracted interest from the broader computing community for his ability to mitigate uncertainty in both supercomputer programs and data, optimizing each to reach the best solutions.
His research was awarded a best-paper designation for 2019 in the journal Optimization Letters and now has earned him a DOE Early Career Research Program grant, titled “Adaptive and Fault-Tolerant Algorithms for Data-Driven Optimization, Design, and Learning.”
The DOE grant, awarded through the department's Advanced Scientific Computing Research program, provides about $500,000 per year for five years and is expected to cover Drew's salary and research expenses, including the salaries of postdoctoral assistants.
“Maintaining our nation’s brain trust of world-class scientists and researchers is one of DOE’s top priorities, and that means we need to give them the resources they need to succeed early on in their careers,” Secretary of Energy Jennifer M. Granholm said. The grant is one of 83 distributed this year by the 12-year-old program.
Drew’s optimization algorithms solve complex problems in technical fields that may involve uncertain responses from subcomponents. Among those of interest to him are interactions between ice sheets and sea ice in climate models, and between fuel pellets and protective cladding in light-water nuclear reactors. Other applications for Drew’s optimization algorithms are radio frequency cavity designs for particle accelerators, energy network resource allocation, parameter estimation in seismology and the training of machine-learning models.
His methods involve novel online modeling approaches that he expects will ensure rapid convergence to an optimal solution. He overcomes the performance degradation that conventional algorithms suffer as problem sizes increase by applying randomized sketching, a technique that uses random projections to reduce the dimensionality of the data.
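The idea behind randomized sketching can be illustrated with a small toy example (this is a generic Johnson-Lindenstrauss-style sketch, not Drew's actual algorithm): a random projection compresses a tall data matrix down to far fewer rows while approximately preserving norms and inner products.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall data matrix: 10,000 samples in 500 dimensions.
n, d, k = 10_000, 500, 50
A = rng.standard_normal((n, d))

# Sketching matrix: a random Gaussian projection, scaled so that
# norms are preserved in expectation.
S = rng.standard_normal((k, n)) / np.sqrt(k)

# The sketch S @ A compresses 10,000 rows down to 50; downstream
# computations on the sketch cost a fraction of the full problem.
sketch = S @ A
print(sketch.shape)  # (50, 500)
```

Working with the 50-row sketch instead of the full 10,000-row matrix is what keeps cost and memory from growing with the problem size.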
Already the joint author of more than 20 papers on the theme of developing algorithms for risk minimization, Drew earned his doctorate in 2012 from Rice University in Houston, Texas, in computational and applied mathematics. (His dissertation title, which seems a signpost for Drew’s later work, is “An Approach for the Adaptive Solution of Optimization Problems Governed by Partial Differential Equations with Uncertain Coefficients.”)
He served as the J. H. Wilkinson Fellow at Argonne National Laboratory before coming to Sandia in 2013, where he became a lead developer of the Rapid Optimization Library, an optimization software package.
Resilient algorithms to optimize extreme-scale simulations
Among the challenges that draw Drew's interest are uncertainties about how next-generation computing platforms will perform, as well as uncertainties associated with the operating and environmental conditions for the system being modeled.
“I am developing optimization algorithms to produce solutions that are resilient to faults and errors induced by three factors: next-generation supercomputing hardware, physical data insufficiencies and uncertainties in the model,” he said. “While it may not always be possible to ‘reduce uncertainty,’ still, one must make a decision that accounts for the uncertainty.”
His algorithms, which handle uncertainties by mathematically quantifying their effects, are a way for supercomputer programmers to work around mistakes caused by error-prone hardware or software “without throwing an entire day’s work away,” he suggests.
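One common way to make a decision that accounts for uncertainty, rather than trying to eliminate it, is the sample average approximation: replace the unknown expected cost with an average over observed samples and optimize that. The toy problem below is purely illustrative (the cost function and parameter distribution are invented for the example, not taken from Drew's work).

```python
import numpy as np

rng = np.random.default_rng(1)

# Uncertain parameter: e.g. an operating condition known only
# through samples (hypothetical toy distribution).
xi = rng.normal(loc=2.0, scale=0.5, size=5_000)

def cost(x, xi):
    # Invented toy cost of choosing design x when the
    # condition turns out to be xi.
    return (x - xi) ** 2 + 0.1 * x

# Sample average approximation: replace E[cost(x, xi)] with an
# average over the samples, then pick the best design on a grid.
grid = np.linspace(0.0, 4.0, 401)
avg = np.array([cost(x, xi).mean() for x in grid])
x_star = grid[avg.argmin()]
```

The chosen design `x_star` hedges against the whole distribution of `xi` rather than against a single guessed value.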
Uncertainties may grow as researchers at the national laboratories and other supercomputing sites upgrade their computers from petascale (a million billion operations per second) to exascale (a billion billion operations per second); Drew notes that the increased speed and data flow of the incoming machines may magnify omissions and other errors.
“My work,” he said, “aims to help engineers formulate and efficiently solve optimization problems that account for these uncertainties.”
AI and machine learning relationships
He sees a relationship among his work, artificial intelligence and machine learning, all of which use optimization techniques to reach their solutions.
“Machine learning and AI problems are typically posed as optimization problems,” he said, “and the algorithms that I develop could be applied to solve them.” The difference lies in how the problems are modeled.
“Machine learning and artificial intelligence models are often not motivated by physics. For the problems that I consider, the models typically come from physical laws,” he said.
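A minimal sketch of what "posed as an optimization problem" means in machine learning: training a linear model is just minimizing a loss function with an iterative optimizer. The data and model here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny learning problem: recover weights w that minimize the
# mean squared error of a linear model on synthetic data.
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(200)

# Gradient descent on the loss L(w) = mean((X w - y)^2) --
# the same kind of iteration an optimization algorithm performs,
# whether the model comes from data or from physical laws.
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
```

In a physics-governed problem the loss and constraints would instead come from a discretized model such as a partial differential equation, but the optimization machinery is analogous.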
According to the Early Career grant description of Drew’s intent, his “work will permit inexpensive approximations during early iterations (and)… employ randomized compression of individual components to lessen the memory burden and to safeguard against hardware faults and failures.”