We propose a novel global solution algorithm for the network-constrained unit commitment problem that incorporates a nonlinear alternating current (ac) model of the transmission network, which is a nonconvex mixed-integer nonlinear programming problem. Our algorithm is based on the multi-tree global optimization methodology, which iterates between a mixed-integer lower-bounding problem and a nonlinear upper-bounding problem. We exploit the mathematical structure of the unit commitment problem with ac power flow constraints and leverage second-order cone relaxations, piecewise outer approximations, and optimization-based bounds tightening to provide a globally optimal solution at convergence. Numerical results on four benchmark problems illustrate the effectiveness of our algorithm, both in terms of convergence rate and solution quality.
Many physical systems are modeled using partial differential equations (PDEs) with uncertain or random inputs. For such systems, naively propagating a fixed number of samples of the input probability law (or an approximation thereof) through the PDE is often inadequate to accurately quantify the “risk” associated with critical system responses. In this paper, we develop a goal-oriented, adaptive sampling and local reduced basis approximation for PDEs with random inputs. Our method determines a set of samples and an associated (implicit) Voronoi partition of the parameter domain on which we build local reduced basis approximations of the PDE solution. The samples are selected in an adaptive manner using an a posteriori error indicator. A notable advantage of the proposed approach is that the computational cost of the approximation during the adaptive process remains constant. We provide theoretical error bounds for our approximation and numerically demonstrate the performance of our method when compared to widely used adaptive sparse grid techniques. In addition, we tailor our approach to accurately quantify the risk of quantities of interest that depend on the PDE solution. We demonstrate our method on an advection–diffusion example and a Helmholtz example.
Power measurement capabilities are becoming commonplace on large-scale HPC system deployments. Several approaches to providing power measurements are in use today, primarily in-band and out-of-band measurement. Both of these fundamental techniques can be augmented with application-level profiling, and different techniques can also be combined. However, it can be difficult to assess the type and detail of measurement needed to gain insight into the power profile of an application. In addition, the heterogeneity of modern hybrid supercomputing platforms requires that different CPU architectures be examined as well. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three relevant HPC mini-applications are analyzed across systems of multicore and manycore nodes to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that a combination of out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. Furthermore, we confirm the energy and power profile of these mini-applications at extreme scale on the Trinity supercomputer. This finding validates the extrapolation of the power profiling techniques from a testbed scale of just several dozen nodes to extreme-scale petaflops supercomputing systems, and it provides a set of recommendations on how best to profile future HPC workloads.
We develop a general risk quadrangle that gives rise to a large class of spectral risk measures. The statistic of this new risk quadrangle is the average value-at-risk at a specific confidence level. As such, this risk quadrangle generates a continuum of error measures that can be used for superquantile regression. For risk-averse optimization, we introduce an optimal approximation of spectral risk measures using quadrature. We prove the consistency of this approximation and demonstrate our results through numerical examples.
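As a rough illustration of approximating a spectral risk measure by quadrature over average value-at-risk (AVaR) levels, the sketch below evaluates a weighted sum of empirical AVaR values. It is a minimal sketch under stated assumptions, not the optimal approximation developed in the paper: the function names are hypothetical, the mixing measure is assumed to be uniform on [0, 1), and a plain midpoint rule stands in for the paper's quadrature.

```python
import numpy as np


def average_value_at_risk(samples, alpha):
    """Empirical AVaR at level alpha in [0, 1), via the Rockafellar-Uryasev formula."""
    x = np.asarray(samples, dtype=float)
    # The objective t + E[(X - t)^+] / (1 - alpha) is convex and piecewise linear
    # in t with kinks at the sample values, so it suffices to test t at each sample.
    excess = np.maximum(x[None, :] - x[:, None], 0.0).mean(axis=1)
    return float(np.min(x + excess / (1.0 - alpha)))


def spectral_risk(samples, levels, weights):
    """Quadrature approximation: weighted sum of AVaR at the given confidence levels."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("quadrature weights must sum to one")
    return float(sum(w * average_value_at_risk(samples, a)
                     for a, w in zip(levels, weights)))


def midpoint_rule(m):
    """Assumed quadrature: m midpoint nodes on [0, 1) with equal weights."""
    levels = (np.arange(m) + 0.5) / m
    weights = np.full(m, 1.0 / m)
    return levels, weights


if __name__ == "__main__":
    losses = np.random.default_rng(0).normal(size=500)
    levels, weights = midpoint_rule(8)
    print(spectral_risk(losses, levels, weights))
```

Because AVaR at level 0 equals the sample mean and AVaR is nondecreasing in the level, the approximation is never smaller than the mean of the samples.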
This paper explores key differences in MPI match lists for several important United States Department of Energy (DOE) applications and proxy applications. This understanding is critical in determining the most promising hardware matching design for any given high-speed network. The results of MPI match list studies for the major open-source MPI implementations, MPICH and Open MPI, are presented, and we modify an MPI simulator, LogGOPSim, to provide match list statistics. These results are discussed in the context of several different potential design approaches to MPI matching-capable hardware. The data illustrate the requirements for different hardware designs in terms of performance and memory capacity. Finally, this paper contributes the collection and analysis of data that help inform hardware designers of common MPI requirements, and it highlights the difficulties in determining these requirements by examining only a single MPI implementation.
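To make the notion of match list statistics concrete, the following toy model records how deep a linear search must go in a posted-receive list before an incoming message matches. It is a simplified sketch and not the instrumentation used in MPICH, Open MPI, or LogGOPSim: the class and field names are hypothetical, the communicator is ignored, and the constant `ANY` stands in for the MPI wildcards.

```python
from collections import namedtuple

ANY = -1  # stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG in this toy model

PostedRecv = namedtuple("PostedRecv", ["source", "tag"])


class PostedReceiveList:
    """Toy posted-receive list searched in posting order."""

    def __init__(self):
        self.entries = []
        self.search_depths = []  # one entry per successful match

    def post(self, source, tag):
        """Append a receive; earlier receives are matched first."""
        self.entries.append(PostedRecv(source, tag))

    def match(self, source, tag):
        """Return the 1-based search depth of the first match, or None."""
        for depth, recv in enumerate(self.entries, start=1):
            if recv.source in (ANY, source) and recv.tag in (ANY, tag):
                del self.entries[depth - 1]
                self.search_depths.append(depth)
                return depth
        # No match: a real implementation would queue the message as unexpected.
        return None


if __name__ == "__main__":
    prq = PostedReceiveList()
    for tag in range(4):
        prq.post(source=0, tag=tag)
    print(prq.match(source=0, tag=3), len(prq.entries))
```

Histograms of `search_depths` and of the list length over time are the kind of data that bear on the performance and memory capacity a hardware matching design would need.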
Recent advances in nanotechnology have enabled researchers to manipulate small collections of quantum-mechanical objects with unprecedented accuracy. In semiconductor quantum-dot qubits, this manipulation requires controlling the dot orbital energies, the tunnel couplings, and the electron occupations. These properties all depend on the voltages placed on the metallic electrodes that define the device, the positions of which are fixed once the device is fabricated. While there has been much success with small numbers of dots, as the number of dots grows, it will be increasingly useful to control these systems with as few electrode voltage changes as possible. Here, we introduce a protocol, which we call the "compressed optimization of device architectures" (CODA), in order both to efficiently identify sparse sets of voltage changes that control quantum systems and to introduce a metric that can be used to compare device designs. As an example of the former, we apply this method to simulated devices with up to 100 quantum dots and show that CODA automatically tunes devices more efficiently than other common nonlinear optimizers. To demonstrate the latter, we determine the optimal lateral scale for a triple quantum dot, yielding a simulated device that can be tuned with small voltage changes on a limited number of electrodes.
Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel acceleration where the contexts for thousands of concurrent threads are resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. The design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. To address the need for a scalable, decentralized GPU model that can model large GPUs, chiplet-based GPUs, and multi-node GPUs, this report details the first steps in integrating the open-source, execution-driven GPGPU-Sim into the SST framework. The first stage of this project creates two elements: a kernel scheduler SST element that accepts work from SST CPU models, and an SM-collection element to which that work is scheduled and which performs cycle-by-cycle timing using SST's MemHierarchy to model a flexible memory system.
This document summarizes the mathematical models used in the DARPA TRADES project for the solid rocket motor design challenge. It is hoped that this brief description of the models will be of use to those who are working on the project.
Since the attacks carried out against the United States on September 11, 2001, which involved the commandeering of commercial aircraft, interest has increased in performing trajectory analysis of vehicle types not constrained by roadways or railways, i.e., aircraft and watercraft. Anomalous trajectories need to be automatically identified along with other trajectories of interest to flag them for further investigation. There is also interest in analyzing trajectories without a focus on anomaly detection. Various approaches to analyzing these trajectories have been undertaken with useful results to date. In this research, we seek to augment trajectory analysis by carrying out analysis of the trajectory curvature along with other parameters, including distance and total deflection (change in direction). At each point triplet in the ordered sequence of points, these parameters are computed. Adjacent point triplets with similar values are grouped together to form a higher level of semantic categorization. These categorizations are then analyzed to form a yet higher level of categorization which has more specific semantic meaning. This top level of categorization is then summarized for all trajectories under study, allowing for fast identification of trajectories with various semantic characteristics.
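The per-triplet computation and the first level of grouping can be sketched as follows. This is an illustrative sketch only: it assumes planar (x, y) coordinates, uses the Menger curvature of the three points as the curvature measure, and labels triplets with a hypothetical 5-degree deflection threshold; the report does not specify these choices.

```python
import math
from itertools import groupby


def triplet_features(p0, p1, p2):
    """Return (distance, deflection, curvature) for three consecutive 2-D points."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    len_a, len_b = math.hypot(ax, ay), math.hypot(bx, by)
    len_c = math.hypot(p2[0] - p0[0], p2[1] - p0[1])
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    distance = len_a + len_b
    # Signed change in heading between the two segments (radians, left turn > 0).
    deflection = math.atan2(cross, dot)
    # Menger curvature: 4 * triangle area / product of side lengths.
    denom = len_a * len_b * len_c
    curvature = 0.0 if denom == 0.0 else 2.0 * abs(cross) / denom
    return distance, deflection, curvature


def label_triplet(deflection, tol=math.radians(5.0)):
    """Assumed labeling rule: classify a triplet by the sign of its deflection."""
    if deflection > tol:
        return "left"
    if deflection < -tol:
        return "right"
    return "straight"


def segment_trajectory(points, tol=math.radians(5.0)):
    """Group adjacent triplets with the same label into (label, count) runs."""
    labels = [
        label_triplet(triplet_features(*points[i:i + 3])[1], tol)
        for i in range(len(points) - 2)
    ]
    return [(name, len(list(run))) for name, run in groupby(labels)]


if __name__ == "__main__":
    path = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
    print(segment_trajectory(path))
```

The runs returned by `segment_trajectory` correspond to the first grouping level; a second pass over those runs (for example, detecting a sequence of same-direction turns) would give the higher-level categories described above.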
Laky, Daniel; Xu, Shu; Rodriguez, Jose S.; Vaidyaraman, Shankar; Munoz, Salvador G.; Laird, Carl D.
To increase manufacturing flexibility and system understanding in pharmaceutical development, the FDA launched the quality by design (QbD) initiative. Within QbD, the design space is the multidimensional region (of the input variables and process parameters) where product quality is assured. Given the high cost of extensive experimentation, there is a need for computational methods to estimate the probabilistic design space that considers interactions between critical process parameters and critical quality attributes, as well as model uncertainty. In this paper we propose two algorithms that extend the flexibility test and flexibility index formulations to replace simulation-based analysis and identify the probabilistic design space more efficiently. The effectiveness and computational efficiency of these approaches are shown on a small example and an industrial case study.
In January 2019, the U.S. Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions for in situ data management (ISDM). The workshop defined ISDM as the practices, capabilities, and procedures to control the organization of data and enable the coordination and communication among heterogeneous tasks, executing simultaneously in a high-performance computing system, cooperating toward a common objective. The workshop revealed two primary, interdependent motivations for processing and managing data in situ. The first motivation is that the in situ methodology enables scientific discovery from a broad range of data sources over a wide scale of computing platforms: leadership-class systems, clusters, clouds, workstations, and embedded devices at the edge. The successful development of ISDM capabilities will benefit real-time decision-making, design optimization, and data-driven scientific discovery. The second motivation is the need to decrease data volumes. ISDM can make critical contributions to managing large data volumes from computations and experiments to minimize data movement, save storage space, and boost resource efficiency, often while simultaneously increasing scientific precision.
Lipscomb, William H.; Price, Stephen F.; Hoffman, Matthew J.; Leguy, Gunter R.; Bennett, Andrew R.; Bradley, Sarah L.; Evans, Katherine J.; Fyke, Jeremy G.; Kennedy, Joseph H.; Perego, Mauro P.; Ranken, Douglas M.; Sacks, William J.; Salinger, Andrew G.; Vargo, Lauren J.; Worley, Patrick H.
We describe and evaluate version 2.1 of the Community Ice Sheet Model (CISM). CISM is a parallel, 3-D thermomechanical model, written mainly in Fortran, that solves equations for the momentum balance and the thickness and temperature evolution of ice sheets. CISM's velocity solver incorporates a hierarchy of Stokes flow approximations, including shallow-shelf, depth-integrated higher order, and 3-D higher order. CISM also includes a suite of test cases, links to third-party solver libraries, and parameterizations of physical processes such as basal sliding, iceberg calving, and sub-ice-shelf melting. The model has been verified for standard test problems, including the Ice Sheet Model Intercomparison Project for Higher-Order Models (ISMIP-HOM) experiments, and has participated in the initMIP-Greenland initialization experiment. In multimillennial simulations with modern climate forcing on a 4 km grid, CISM reaches a steady state that is broadly consistent with observed flow patterns of the Greenland ice sheet. CISM has been integrated into version 2.0 of the Community Earth System Model, where it is being used for Greenland simulations under past, present, and future climates. The code is open-source with extensive documentation and remains under active development.
We present VideoSwarm, a system for visualizing video ensembles generated by numerical simulations. VideoSwarm is a web application in which linked views of the ensemble each represent the data using a different level of abstraction. VideoSwarm uses multidimensional scaling to reveal relationships between a set of simulations relative to a single moment in time, and to show the evolution of video similarities over a span of time. VideoSwarm is a plug-in for Slycat, a web-based visualization framework that provides a web server, database, and Python infrastructure. The Slycat framework provides support for managing multiple users, maintains access control, and requires only a Slycat-supported commodity browser (such as Firefox, Chrome, or Safari).
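A minimal sketch of multidimensional scaling applied to one moment in time is shown below. It assumes that each simulation contributes one frame flattened into a vector and that frames are compared with Euclidean distance, and it uses classical (Torgerson) MDS; the abstract does not say which MDS variant or distance VideoSwarm uses, and the function names are hypothetical.

```python
import numpy as np


def pairwise_distances(frames):
    """Euclidean distances between rows; each row is one flattened video frame."""
    frames = np.asarray(frames, dtype=float)
    diff = frames[:, None, :] - frames[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def classical_mds(distances, k=2):
    """Embed n items in k dimensions from an n-by-n distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give the Gram matrix of the embedding.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    # Clip small negative eigenvalues caused by round-off or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames_at_t = rng.random((6, 32 * 32))  # 6 simulations, one 32x32 frame each
    coords = classical_mds(pairwise_distances(frames_at_t), k=2)
    print(coords.shape)
```

Repeating the embedding for successive time steps gives one point cloud per frame, which is one way to follow how similarities between videos evolve over time.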
In order to support the codesign needs of ECP applications on current and future hardware in the area of machine learning, the ExaLearn team at Sandia studied the different machine learning use cases in three different ECP applications. This report is a summary of the needs of the three applications. The Sandia ExaLearn team will develop a proxy application representative of ECP application needs, specifically the ExaSky and EXAALT ECP projects. The proxy application will allow us to demonstrate performance portable kernels within machine learning codes. Furthermore, current training scalability of machine learning networks in these applications is negatively affected by large batch sizes. Training throughput of the network will increase as batch size increases, but network accuracy and generalization worsen. The proxy application will contain hybrid model- and data-parallelism to improve training efficiency while maintaining network accuracy. The proxy application will also target optimizing 3D convolutional layers, specific to scientific machine learning, which have not been as thoroughly explored by industry.
The STDA05-17 milestone comprises the following 3 deliverables. VTK-m Release 2: We will provide a release of VTK-m software and associated documentation. The source code repository will be tagged at a stable state, and, at a minimum, tarball captures of the source code will be made available from the web site. A version of the VTK-m User's Guide documenting this release will also be made available. Productionize zfp compression: The "ZFP: Compressed Floating-Point Arrays" project (WBS 1.3.4.13) is creating an implementation of ZFP compression in VTK-m. Their implementation will be focused on operating in CUDA. The VTK-m project will assist by generalizing the implementation to other devices (such as multi-core CPUs). We will also assist in productionizing the code such that it can be used by external projects and products. Clip: Clip operations intersect meshes with implicit functions. Clipping is the foundation of spatial subsetting algorithms, such as "box," and of data-based subsetting, such as "isovolume." The algorithm requires considering thousands of possible cases and is thus quite difficult to implement. This milestone will implement clipping sufficient for VisIt's and ParaView's needs.
The STDA05-16 milestone comprises the following 3 distinct deliverables. OpenMP: VTK-m currently supports three types of devices: serial CPU, TBB, and CUDA. To run algorithms on multicore CPU-type devices (such as Xeon and Xeon Phi), TBB is required. However, there are known issues with integrating a software product using TBB with another one using OpenMP. Therefore, we will add an OpenMP device to the VTK-m software. When engaged, this device will run parallel algorithms using OpenMP directives. This will integrate more cleanly with other code that also uses OpenMP. Rendering Topological Entities: VTK-m currently supports surface rendering by tessellating data structures and rendering the resulting triangles. We will extend current functionality to include face, edge, and point rendering. Better Dynamic Types Impl: For the best efficiency across all platforms, VTK-m algorithms use static typing with C++ templates. However, many libraries like VTK, ParaView, and VisIt use dynamic types with virtual functions because data types often cannot be determined at compile time. We have an interface in VTK-m to merge these two typing mechanisms by generating all possible combinations of static types when faced with a dynamic type. Although this mechanism works, it generates very large executables and takes a very long time to compile. As we move forward, it is clear that these problems will get worse and that the approach will become infeasible at exascale. We will rectify the problem by introducing some level of virtual methods, which require only a single code path, within VTK-m algorithms. This first milestone produces a design document to propose an approach to the new system.