Cyber defense is an asymmetric battle today. We need to understand better what options are available for providing defenders with possible advantages. Our project combines machine learning, optimization, and game theory to obscure our defensive posture from the information the adversaries are able to observe. The main conceptual contribution of this research is to separate the problem of prediction, for which machine learning is used, and the problem of computing optimal operational decisions based on such predictions, coupled with a model of adversarial response. This research includes modeling of the attacker and defender, formulation of useful optimization models for studying adversarial interactions, and user studies to measure the impact of the modeling approaches in realistic settings.
We consider four-dimensional qudits as qubit pairs and their qudit Pauli operators as qubit Clifford operators. This introduces a nesting, C21 ⊂ C42 ⊂ C23, where Cmn is the nth level of the m-dimensional qudit Clifford hierarchy. If we can convert between logical qubits and qudits, then qudit Clifford operators are qubit non-Clifford operators. Conversion is achieved by qubit fusion and qudit fission using stabilizer circuits that consume a resource state. This resource is a fused qubit stabilizer state with a fault- tolerant state preparation using stabilizer circuits.
Scientific workloads running on current extreme-scale systems routinely generate tremendous volumes of data for postprocessing. This data movement has become a serious issue due to its energy cost and the fact that I/O bandwidths have not kept pace with data generation rates. In situ analytics is an increasingly popular alternative in which post-simulation processing is embedded into an application, running as part of the same MPI job. This can reduce data movement costs but introduces a new potential source of interference for the application. Using a validated simulation-based approach, we investigate how best to mitigate the interference from time-shared in situ tasks for a number of key extreme-scale workloads. This paper makes a number of contributions. First, we show that the independent scheduling of in situ analytics tasks can significantly degradation application performance, with slowdowns exceeding 1000%. Second, we demonstrate that the degree of synchronization found in many modern collective algorithms is sufficient to significantly reduce the overheads of this interference to less than 10% in most cases. Finally, we show that many applications already frequently invoke collective operations that use these synchronizing MPI algorithms. Therefore, the syncronization introduced by these MPI collective algorithms can be leveraged to efficiently schedule analytics tasks with minimal changes to existing applications. This paper provides critical analysis and guidance for MPI users and developers on the importance of scheduling in situ analytics tasks. It shows the degree of synchronization needed to mitigate the performance impacts of these time-shared coupled codes and demonstrates how that synchronization can be realized in an extreme-scale environment using modern collective algorithms.
Holmes, Daniel; Mohror, Kathryn; Grant, Ryan E.; Skjellum, Anthony; Schulz, Martin; Bland, Wesley; Squyres, Jeffrey M.
MPI includes all processes in MPI COMM WORLD; this is untenable for reasons of scale, resiliency, and overhead. This paper offers a new approach, extending MPI with a new concept called Sessions, which makes two key contributions: a tighter integration with the underlying runtime system; and a scalable route to communication groups. This is a fundamental change in how we organise and address MPI processes that removes well-known scalability barriers by no longer requiring the global communicator MPI COMM - WORLD.
Exascale systems promise the potential for computation atunprecedented scales and resolutions, but achieving exascale by theend of this decade presents significant challenges. A key challenge isdue to the very large number of cores and components and the resultingmean time between failures (MTBF) in the order of hours orminutes. Since the typical run times of target scientific applicationsare longer than this MTBF, fault tolerance techniques will beessential. An important class of failures that must be addressed isprocess or node failures. While checkpoint/restart (C/R) is currentlythe most widely accepted technique for addressing processor failures, coordinated, stable-storage-based global C/R might be unfeasible atexascale when the time to checkpoint exceeds the expected MTBF. This paper explores transparent recovery via implicitly coordinated, diskless, application-driven checkpointing as a way to tolerateprocess failures in MPI applications at exascale. The discussedapproach leverages User Level Failure Mitigation (ULFM), which isbeing proposed as an MPI extension to allow applications to createpolicies for tolerating process failures. Specifically, this paper demonstrates how different implementations ofapplication-driven in-memory checkpoint storage and recovery comparein terms of performance and scalability. We also experimentally evaluate the effectiveness and scalability ofthe Fenix online global recovery framework on a production system-the Titan Cray XK7 at ORNL-and demonstrate the ability of Fenix totolerate dynamically injected failures using the execution of fourbenchmarks and mini-applications with different behaviors.
Proceedings - 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2016
Baseman, Elisabeth; Debardeleben, Nathan; Ferreira, Kurt B.; Levy, Scott L.; Raasch, Steven; Sridharan, Vilas; Siddiqua, Taniya; Guan, Qiang
As high-performance computing systems continue to grow in scale and complexity, the study of faults and errors is critical to the design of future systems and mitigation schemes. Fault modes in system DRAM are a frequently-investigated key aspect of memory reliability. While current schemes require offline analysis for proper classification, current state-of-the-art mitigation techniques require accurate online prediction for optimal performance. In this work, we explore the predictive performance of an online machine learning-based approach in classifying DRAM fault modes from two leadership-class supercomputing facilities. Our results compare the predictive performance of this online approach with the current rule-based approach based on expert knowledge, finding a 12% predictive performance improvement. We also investigate the universality of our classifiers by evaluating predictive performance using training data from disparate computing systems to achieve a 7% improvement in predictive performance. Our work provides a critical analysis of this online learning technique and can benefit system designers to help inform best practices for dealing with reliability on future systems.
A critical aspect of applying modern computational solution methods to complex multiphysics systems of relevance to nuclear reactor modeling, is the assessment of the predictive capability of specific proposed mathematical models. In this respect the understanding of numerical error, the sensitivity of the solution to parameters associated with input data, boundary condition uncertainty, and mathematical models is critical. Additionally, the ability to evaluate and or approximate the model efficiently, to allow development of a reasonable level of statistical diagnostics of the mathematical model and the physical system, is of central importance. In this study we report on initial efforts to apply integrated adjoint-based computational analysis and automatic differentiation tools to begin to address these issues. The study is carried out in the context of a Reynolds averaged Navier-Stokes approximation to turbulent fluid flow and heat transfer using a particular spatial discretization based on implicit fully-coupled stabilized FE methods. Initial results are presented that show the promise of these computational techniques in the context of nuclear reactor relevant prototype thermal-hydraulics problems.
Proceedings of the National Academy of Sciences of the United States of America
Du, Huiyun; Deng, Wei; Aimone, James B.; Ge, Minyan; Parylak, Sarah; Walch, Keenan; Cook, Jonathan; Zhang, Wei; Song, Huina; Wang, Liping; Gage, Fred H.; Mu, Yangling
Rewarding experiences are often well remembered, and such memory formation is known to be dependent on dopamine modulation of the neural substrates engaged in learning and memory; however, it is unknown how and where in the brain dopamine signals bias episodic memory toward preceding rather than subsequent events. Here we found that photostimulation of channelrhodopsin-2-expressing dopaminergic fibers in the dentate gyrus induced a long-term depression of cortical inputs, diminished theta oscillations, and impaired subsequent contextual learning. Computational modeling based on this dopamine modulation indicated an asymmetric association of events occurring before and after reward in memory tasks. In subsequent behavioral experiments, preexposure to a natural reward suppressed hippocampus-dependent memory formation, with an effective time window consistent with the duration of dopamine-induced changes of dentate activity. Overall, our results suggest a mechanism by which dopamine enables the hippocampus to encode memory with reduced interference from subsequent experience.
Newton–Krylov solvers for ocean tracers have the potential to greatly decrease the computational costs of spinning up deep-ocean tracers, which can take several thousand model years to reach equilibrium with surface processes. One version of the algorithm uses offline tracer transport matrices to simulate an annual cycle of tracer concentrations and applies Newton's method to find concentrations that are periodic in time. Here we present the impact of time-averaging the transport matrices on the equilibrium values of an ideal-age tracer. We compared annually-averaged, monthly-averaged, and 5-day-averaged transport matrices to an online simulation using the ocean component of the Community Earth System Model (CESM) with a nominal horizontal resolution of 1° × 1° and 60 vertical levels. We found that increasing the time resolution of the offline transport model reduced a low age bias from 12% for the annually-averaged transport matrices, to 4% for the monthly-averaged transport matrices, and to less than 2% for the transport matrices constructed from 5-day averages. The largest differences were in areas with strong seasonal changes in the circulation, such as the Northern Indian Ocean. For many applications the relatively small bias obtained using the offline model makes the offline approach attractive because it uses significantly less computer resources and is simpler to set up and run.