Machine learning models are promising as surrogates in optimization when replacing difficult to solve equations or black-box type models. This work demonstrates the viability of linear model decision trees as piecewise-linear surrogates in decision-making problems. Linear model decision trees can be represented exactly in mixed-integer linear programming (MILP) and mixed-integer quadratic constrained programming (MIQCP) formulations. Furthermore, they can represent discontinuous functions, bringing advantages over neural networks in some cases. We present several formulations using transformations from Generalized Disjunctive Programming (GDP) formulations and modifications of MILP formulations for gradient boosted decision trees (GBDT). We then compare the computational performance of these different MILP and MIQCP representations in an optimization problem and illustrate their use on engineering applications. We observe faster solution times for optimization problems with linear model decision tree surrogates when compared with GBDT surrogates using the Optimization and Machine Learning Toolkit (OMLT).
This report documents the Resilience Enhancements through Deep Learning Yields (REDLY) project, a three-year effort to improve electrical grid resilience by developing scalable methods for system operators to protect the grid against threats leading to interrupted service or physical damage. The computational complexity and uncertain nature of current real-world contingency analysis presents significant barriers to automated, real-time monitoring. While there has been a significant push to explore the use of accurate, high-performance machine learning (ML) model surrogates to address this gap, their reliability is unclear when deployed in high-consequence applications such as power grid systems. Contemporary optimization techniques used to validate surrogate performance can exploit ML model prediction errors, which necessitates the verification of worst-case performance for the models.
This work focuses on estimation of unknown states and parameters in a discrete-time, stochastic, SEIR model using reported case counts and mortality data. An SEIR model is based on classifying individuals with respect to their status in regards to the progression of the disease, where S is the number individuals who remain susceptible to the disease, E is the number of individuals who have been exposed to the disease but not yet infectious, I is the number of individuals who are currently infectious, and R is the number of recovered individuals. For convenience, we include in our notation the number of infections or transmissions, T, that represents the number of individuals transitioning from compartment S to compartment E over a particular interval. Similarly, we use C to represent the number of reported cases.
Sandia National Laboratories has developed a capability to estimate parameters of epidemiological models from case reporting data to support responses to the COVID-19 pandemic. A differentiating feature of this work is the ability to simultaneously estimate county-specific disease transmission parameters in a nation-wide model that considers mobility between counties. The approach is focused on estimating parameters in a stochastic SEIR model that considers mobility between model patches (i.e., counties) as well as additional infectious compartments. The inference engine developed by Sandia includes (1) reconstruction and (2) transmission parameter inference. Reconstruction involves estimating current population counts within each of the compartments in a modified SEIR model from reported case data. Reconstruction produces input for the inference formulations, and it provides initial conditions that can be used in other modeling and planning efforts. Inference involves the solution of a large-scale optimization problem to estimate the time profiles for the transmission parameters in each county. These provide quantification of changes in the transmission parameter over time (e.g., due to impact of intervention strategies). This capability has been implemented in a Python-based software package, epi_inference, that makes extensive use of Pyomo [5] and IPOPT [10] to formulate and solve the inference formulations.
This report summarizes the activities performed as part of the Science and Engineering of Cybersecurity by Uncertainty quantification and Rigorous Experimentation (SECURE) Grand Challenge LDRD project. We provide an overview of the research done in this project, including work on cyber emulation, uncertainty quantification, and optimization. We present examples of integrated analyses performed on two case studies: a network scanning/detection study and a malware command and control study. We highlight the importance of experimental workflows and list references of papers and presentations developed under this project. We outline lessons learned and suggestions for future work.
PAO is a Python-based package for Adversarial Optimization. The goal of this package is to provide a general modeling and analysis capability for bilevel, trilevel and other multilevel optimization forms that express adversarial dynamics. PAO integrates two different modeling abstractions: 1. Algebraic models extend the modeling concepts in the Pyomo algebraic modeling language to express problems with an intuitive algebraic syntax. Thus, we expect that this modeling abstraction will commonly be used by PAO end-users. 2. Compact models express objective and constraints in a manner that is typically used to express the mathematical form of these problems (e.g. using vector and matrix data types). PAO denes custom Multilevel Problem Representations (MPRs) that simplify the implementation of solvers for bilevel, trilevel and other multilevel optimization problems.
Securing cyber systems is of paramount importance, but rigorous, evidence-based techniques to support decision makers for high-consequence decisions have been missing. The need for bringing rigor into cybersecurity is well-recognized, but little progress has been made over the last decades. We introduce a new project, SECURE, that aims to bring more rigor into cyber experimentation. The core idea is to follow the footsteps of computational science and engineering and expand similar capabilities to support rigorous cyber experimentation. In this paper, we review the cyber experimentation process, present the research areas that underlie our effort, discuss the underlying research challenges, and report on our progress to date. This paper is based on work in progress, and we expect to have more complete results for the conference.
Remote sensing systems have firmly established a role in providing immense value to commercial industry, scientific exploration, and national security. Continued maturation of sensing technology has reduced the cost of deploying highly-capable sensors while at the same time increased reliance on the information these sensors can provide. The demand for time on these sensors is unlikely to diminish. Coordination of next-generation sensor systems, larger constellations of satellites, unmanned aerial vehicles, ground telescopes, etc. is prohibitively complex for existing heuristics-based scheduling techniques. The project was a two-year collaboration spanning multiple Sandia centers and included a partnership with Texas A&M University. We have developed algorithms and software for collection scheduling, remote sensor field-of-view pointing models, and bandwidth-constrained prioritization of sensor data. Our approach followed best practices from the operations research and computational geometry communities. These models provide several advantages over state of the art techniques. In particular, our approach is more flexible compared to heuristics that tightly couple models and solution techniques. First, our mixed-integer linear models afford a rigorous analysis so that sensor planners can quantitatively describe a schedule relative to the best possible. Optimal or near-optimal schedules can be produced with commercial solvers in operational run-times. These models can be modified and extended to incorporate different scheduling and resource constraints and objective function definitions. Further, we have extended these models to proactively schedule sensors under weather and ad hoc collection uncertainty. This approach stands in contrast to existing deterministic schedulers which assume a single future weather or ad hoc collection scenario. The field-of-view pointing algorithm produces a mosaic with the fewest number of images required to fully cover a region of interest. The bandwidth-constrained algorithms find the highest priority information that can be transmitted. All of these are based on mixed-integer linear programs so that, in the future, collection scheduling, field-of-view, and bandwidth prioritization can be combined into a single problem. Experiments conducted using the developed models, commercial solvers, and benchmark datasets have demonstrated that proactively scheduling against uncertainty regularly and significantly outperforms deterministic schedulers.
We describe new capabilities for modeling bilevel programs within the Pyomo modeling software. These capabilities include new modeling components that represent subproblems, modeling transformations for re-expressing models with bilevel structure in other forms, and optimize bilevel programs with meta-solvers that apply transformations and then perform op- timization on the resulting model. We illustrate the breadth of Pyomo's modeling capabilities for bilevel programs, and we describe how Pyomo's meta-solvers can perform local and global optimization of bilevel programs.
Sandia has approached the analysis of big datasets with an integrated methodology that uses computer science, image processing, and human factors to exploit critical patterns and relationships in large datasets despite the variety and rapidity of information. The work is part of a three-year LDRD Grand Challenge called PANTHER (Pattern ANalytics To support High-performance Exploitation and Reasoning). To maximize data analysis capability, Sandia pursued scientific advances across three key technical domains: (1) geospatial-temporal feature extraction via image segmentation and classification; (2) geospatial-temporal analysis capabilities tailored to identify and process new signatures more efficiently; and (3) domain- relevant models of human perception and cognition informing the design of analytic systems. Our integrated results include advances in geographical information systems (GIS) in which we discover activity patterns in noisy, spatial-temporal datasets using geospatial-temporal semantic graphs. We employed computational geometry and machine learning to allow us to extract and predict spatial-temporal patterns and outliers from large aircraft and maritime trajectory datasets. We automatically extracted static and ephemeral features from real, noisy synthetic aperture radar imagery for ingestion into a geospatial-temporal semantic graph. We worked with analysts and investigated analytic workflows to (1) determine how experiential knowledge evolves and is deployed in high-demand, high-throughput visual search workflows, and (2) better understand visual search performance and attention. Through PANTHER, Sandia's fundamental rethinking of key aspects of geospatial data analysis permits the extraction of much richer information from large amounts of data. The project results enable analysts to examine mountains of historical and current data that would otherwise go untouched, while also gaining meaningful, measurable, and defensible insights into overlooked relationships and patterns. The capability is directly relevant to the nation's nonproliferation remote-sensing activities and has broad national security applications for military and intelligence- gathering organizations.
We describe new capabilities for modeling MPEC problems within the Pyomo modeling software. These capabilities include new modeling components that represent complementarity conditions, modeling transformations for re-expressing models with complementarity conditions in other forms, and meta-solvers that apply transformations and numeric optimization solvers to optimize MPEC problems. We illustrate the breadth of Pyomo's modeling capabilities for MPEC problems, and we describe how Pyomo's meta-solvers can perform local and global optimization of MPEC problems.
The Water Security Toolkit (WST) is a suite of open source software tools that can be used by water utilities to create response strategies to reduce the impact of contamination in a water distribution network . WST includes hydraulic and water quality modeling software , optimizati on methodologies , and visualization tools to identify: (1) sensor locations to detect contamination, (2) locations in the network in which the contamination was introduced, (3) hydrants to remove contaminated water from the distribution system, (4) locations in the network to inject decontamination agents to inactivate, remove, or destroy contaminants, (5) locations in the network to take grab sample s to help identify the source of contamination and (6) valves to close in order to isolate contaminate d areas of the network. This user manual describes the different components of WST , along w ith examples and case studies. License Notice The Water Security Toolkit (WST) v.1.2 Copyright c 2012 Sandia Corporation. Under the terms of Contract DE-AC04-94AL85000, there is a non-exclusive license for use of this work by or on behalf of the U.S. government. This software is distributed under the Revised BSD License (see below). In addition, WST leverages a variety of third-party software packages, which have separate licensing policies: Acro Revised BSD License argparse Python Software Foundation License Boost Boost Software License Coopr Revised BSD License Coverage BSD License Distribute Python Software Foundation License / Zope Public License EPANET Public Domain EPANET-ERD Revised BSD License EPANET-MSX GNU Lesser General Public License (LGPL) v.3 gcovr Revised BSD License GRASP AT&T Commercial License for noncommercial use; includes randomsample and sideconstraints executable files LZMA SDK Public Domain nose GNU Lesser General Public License (LGPL) v.2.1 ordereddict MIT License pip MIT License PLY BSD License PyEPANET Revised BSD License Pyro MIT License PyUtilib Revised BSD License PyYAML MIT License runpy2 Python Software Foundation License setuptools Python Software Foundation License / Zope Public License six MIT License TinyXML zlib License unittest2 BSD License Utilib Revised BSD License virtualenv MIT License Vol Common Public License vpykit Revised BSD License Additionally, some precompiled WST binary distributions might bundle other third-party executables files: Coliny Revised BSD License (part of Acro project) Dakota GNU Lesser General Public License (LGPL) v.2.1 PICO Revised BSD License (part of Acro project) i Revised BSD License Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Sandia National Laboratories nor Sandia Corporation nor the names of its con- tributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IM- PLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUD- ING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ii Acknowledgements This work was supported by the U.S. Environmental Protection Agency through its Office of Research and Development (Interagency Agreement # DW8992192801). The material in this document has been subject to technical and policy review by the U.S. EPA, and approved for publication. The views expressed by individual authors, however, are their own, and do not necessarily reflect those of the U.S. Environmental Protection Agency. Mention of trade names, products, or services does not convey official U.S. EPA approval, endorsement, or recommendation. The Water Security Toolkit is an extension of the Threat Ensemble Vulnerability Assessment-Sensor Place- ment Optimization Tool (TEVA-SPOT), which was also developed with funding from the U.S. Environ- mental Protection Agency through its Office of Research and Development (Interagency Agreement # DW8992192801). The authors acknowledge the following individuals for their contributions to the devel- opment of TEVA-SPOT: Jonathan Berry (Sandia National Laboratories), Erik Boman (Sandia National Laboratories), Lee Ann Riesen (Sandia National Laboratories), James Uber (University of Cincinnati), and Jean-Paul Watson (Sandia National Laboratories). iii Acronyms ATUS American Time-Use Survey BLAS Basic linear algebra sub-routines CFU Colony-forming unit CVAR Conditional value at risk CWS Contamination warning system EA Evolutionary algorithm EDS Event detection system EPA U.S. Environmental Protection Agency EC Extent of Contamination ERD EPANET results database file GLPK GNU Linear Programming Kit GRASP Greedy randomized adaptive sampling process HEX Hexadecimal HTML HyperText markup language INP EPANET input file LP Linear program MC Mass consumed MILP Mixed integer linear program MIP Mixed integer program MSX Multi-species extension for EPANET NFD Number of failed detections NS Number of sensors NZD Non-zero demand PD Population dosed PE Population exposed PK Population killed TAI Threat assessment input file TCE Tailed-conditioned expectation TD Time to detection TEC Timed extent of contamination TEVA Threat ensemble vulnerability assessment TSB Tryptic soy broth TSG Threat scenario generation file TSI Threat simulation input file VAR Value at risk VC Volume consumed WST Water Security Toolkit YML YAML configuration file format for WST iv Symbols Notation Definition Example { , } set brackets { 1,2,3 } means a set containing the values 1,2, and 3. [?] is an element of s [?] S means that s is an element of the set S . [?] for all s = 1 [?] s [?] S means that the statement s = 1 is true for all s in set S . P summation P n i =1 s i means s 1 + s 2 + * * * + s n . \ set minus S \ T means the set that contains all those elements of S that are not in set T . %7C given %7C is used to define conditional probability. P ( s %7C t ) means the prob- ability of s occurring given that t occurs. %7C ... %7C cardinality Cardinality of a set is the number of elements of the set. If set S = { 2,4,6 } , then %7C S %7C = 3. v
Decision makers increasingly rely on large-scale computational models to simulate and analyze complex man-made systems. For example, computational models of national infrastructures are being used to inform government policy, assess economic and national security risks, evaluate infrastructure interdependencies, and plan for the growth and evolution of infrastructure capabilities. A major challenge for decision makers is the analysis of national-scale models that are composed of interacting systems: effective integration of system models is difficult, there are many parameters to analyze in these systems, and fundamental modeling uncertainties complicate analysis. This project is developing optimization methods to effectively represent and analyze large-scale heterogeneous system of systems (HSoS) models, which have emerged as a promising approach for describing such complex man-made systems. These optimization methods enable decision makers to predict future system behavior, manage system risk, assess tradeoffs between system criteria, and identify critical modeling uncertainties.
World Environmental and Water Resources Congress 2011: Bearing Knowledge for Sustainability - Proceedings of the 2011 World Environmental and Water Resources Congress
We consider the problem of placing a limited number of sensors in a municipal water distribution network to minimize the impact over a given suite of contamination incidents. In its simplest form, the sensor placement problem is a p-median problem that has structure extremely amenable to exact and heuristic solution methods. We describe the solution of real-world instances using integer programming or local search or a Lagrangian method. The Lagrangian method is necessary for solution of large problems on small PCs. We summarize a number of other heuristic methods for effectively addressing issues such as sensor failures, tuning sensors based on local water quality variability, and problem size/approximation quality tradeoffs. These algorithms are incorporated into the TEVA-SPOT toolkit, a software suite that the US Environmental Protection Agency has used and is using to design contamination warning systems for US municipal water systems.
These slides describe different strategies for installing Python software. Although I am a big fan of Python software development, robust strategies for software installation remains a challenge. This talk describes several different installation scenarios. The Good: the user has administrative privileges - Installing on Windows with an installer executable, Installing with Linux application utility, Installing a Python package from the PyPI repository, and Installing a Python package from source. The Bad: the user does not have administrative privileges - Using a virtual environment to isolate package installations, and Using an installer executable on Windows with a virtual environment. The Ugly: the user needs to install an extension package from source - Installing a Python extension package from source, and PyCoinInstall - Managing builds for Python extension packages. The last item referring to PyCoinInstall describes a utility being developed for the COIN-OR software, which is used within the operations research community. COIN-OR includes a variety of Python and C++ software packages, and this script uses a simple plug-in system to support the management of package builds and installation.
The Python Optimization Modeling Objects (Pyomo) package [1] is an open source tool for modeling optimization applications within Python. Pyomo provides an objected-oriented approach to optimization modeling, and it can be used to define symbolic problems, create concrete problem instances, and solve these instances with standard solvers. While Pyomo provides a capability that is commonly associated with algebraic modeling languages such as AMPL, AIMMS, and GAMS, Pyomo's modeling objects are embedded within a full-featured high-level programming language with a rich set of supporting libraries. Pyomo leverages the capabilities of the Coopr software library [2], which integrates Python packages (including Pyomo) for defining optimizers, modeling optimization applications, and managing computational experiments. A central design principle within Pyomo is extensibility. Pyomo is built upon a flexible component architecture [3] that allows users and developers to readily extend the core Pyomo functionality. Through these interface points, extensions and applications can have direct access to an optimization model's expression objects. This facilitates the rapid development and implementation of new modeling constructs and as well as high-level solution strategies (e.g. using decomposition- and reformulation-based techniques). In this presentation, we will give an overview of the Pyomo modeling environment and model syntax, and present several extensions to the core Pyomo environment, including support for Generalized Disjunctive Programming (Coopr GDP), Stochastic Programming (PySP), a generic Progressive Hedging solver [4], and a tailored implementation of Bender's Decomposition.
We consider the problem of placing sensors in a municipal water network when we can choose both the location of sensors and the sensitivity and specificity of the contamination warning system. Sensor stations in a municipal water distribution network continuously send sensor output information to a centralized computing facility, and event detection systems at the control center determine when to signal an anomaly worthy of response. Although most sensor placement research has assumed perfect anomaly detection, signal analysis software has parameters that control the tradeoff between false alarms and false negatives. We describe a nonlinear sensor placement formulation, which we heuristically optimize with a linear approximation that can be solved as a mixed-integer linear program. We report the results of initial experiments on a real network and discuss tradeoffs between early detection of contamination incidents, and control of false alarms.
Biofouling, the unwanted growth of biofilms on a surface, of water-treatment membranes negatively impacts in desalination and water treatment. With biofouling there is a decrease in permeate production, degradation of permeate water quality, and an increase in energy expenditure due to increased cross-flow pressure needed. To date, a universal successful and cost-effect method for controlling biofouling has not been implemented. The overall goal of the work described in this report was to use high-performance computing to direct polymer, material, and biological research to create the next generation of water-treatment membranes. Both physical (micromixers - UV-curable epoxy traces printed on the surface of a water-treatment membrane that promote chaotic mixing) and chemical (quaternary ammonium groups) modifications of the membranes for the purpose of increasing resistance to biofouling were evaluated. Creation of low-cost, efficient water-treatment membranes helps assure the availability of fresh water for human use, a growing need in both the U. S. and the world.
Currently, electrical power generation uses about 140 billion gallons of water per day accounting for over 39% of all freshwater withdrawals thus competing with irrigated agriculture as the leading user of water. Coupled to this water use is the required pumping, conveyance, treatment, storage and distribution of the water which requires on average 3% of all electric power generated. While water and energy use are tightly coupled, planning and management of these fundamental resources are rarely treated in an integrated fashion. Toward this need, a decision support framework has been developed that targets the shared needs of energy and water producers, resource managers, regulators, and decision makers at the federal, state and local levels. The framework integrates analysis and optimization capabilities to identify trade-offs, and 'best' alternatives among a broad list of energy/water options and objectives. The decision support framework is formulated in a modular architecture, facilitating tailored analyses over different geographical regions and scales (e.g., national, state, county, watershed, NERC region). An interactive interface allows direct control of the model and access to real-time results displayed as charts, graphs and maps. Ultimately, this open and interactive modeling framework provides a tool for evaluating competing policy and technical options relevant to the energy-water nexus.