Protecting against multi-step attacks of uncertain start times and duration forces the defenders into indefinite, always ongoing, resource-intensive response. To allocate resources effectively, the defender must analyze and respond to an uncertain stream of potentially undetected multiple multi-step attacks and take measures of attack and response intensity over time into account. Such response requires estimation of overall attack success metrics and evaluating effect of defender strategies and actions associated with specific attack steps on overall attack metrics. We present a novel game-theoretic approach GPLADD to attack metrics estimation and demonstrate it on attack data derived from MITRE's ATT&CK Framework and other sources. In GPLADD, the time to complete attack steps is explicit; the attack dynamics emerges from attack graph and attacker-defender capabilities and strategies and therefore reflects 'physics' of attacks. The time the attacker takes to complete an attack step is drawn from a probability distribution determined by attacker and defender strategies and capabilities. This makes time a physical constraint on attack success parameters and enables comparing different defender resource allocation strategies across different attacks. We solve for attack success metrics by approximating attacker-defender games as discrete-time Markov chains and show evaluation of return on detection investments associated with different attack steps. We apply GPLADD to MITRE's APT3 data from ATT&CK Framework and show that there are substantial and un-intuitive differences in estimated real-world vendor performance against a simplified APT3 attack. We focus on metrics that reflect attack difficulty versus attacker ability to remain hidden in the system after gaining control. This enables practical defender optimization and resource allocation against multi-step attacks.
Network intrusion detection systems (NIDS) are commonly used to detect malware communications, including command-and-control (C2) traffic from botnets. NIDS performance assessments have been studied for decades, but mathematical modeling has rarely been used to explore NIDS performance. This paper details a mathematical model that describes a NIDS performing packet inspection and its detection of malware's C2 traffic. Here, the paper further describes an emulation testbed and a set of cyber experiments that used the testbed to validate the model. These experiments included a commonly used NIDS (Snort) and traffic with contents from a pervasive malware (Emotet). Results are presented for two scenarios: a nominal scenario and a “stressed” scenario in which the NIDS cannot process all incoming packets. Model and experiment results match well, with model estimates mostly falling within 95 % confidence intervals on the experiment means. Model results were produced 70-3000 times faster than the experimental results. Consequently, the model's predictive capability could potentially be used to support decisions about NIDS configuration and effectiveness that require high confidence results, quantification of uncertainty, and exploration of large parameter spaces. Furthermore, the experiments provide an example for how emulation testbeds can be used to validate cyber models that include stochastic variability.
Virtual machine emulation environments provide ideal testbeds for cybersecurity evaluations because they run real software binaries in a scalable, offline test setting that is suitable for assessing the impacts of software security flaws on the system. Verification of such emulations determines whether the environment is working as intended. Verification can focus on various aspects such as timing realism, traffic realism, and resource realism. In this paper, we study resource realism and issues associated with virtual machine resource utilization. We examine telemetry metrics gathered from a series of structured experiments which involve large numbers of parallel emulations meant to oversubscribe resources at some point. We present an approach to use telemetry metrics for emulation verification, and we demonstrate this approach on two cyber scenarios. Descriptions of the experimental configurations are provided along with a detailed discussion of statistical tests used to compare telemetry metrics. Results demonstrate the potential for a structured experimental framework, combined with statistical analysis of telemetry metrics, to support emulation verification. We conclude with comments on generalizability and potential future work.
This report summarizes the activities performed as part of the Science and Engineering of Cybersecurity by Uncertainty quantification and Rigorous Experimentation (SECURE) Grand Challenge LDRD project. We provide an overview of the research done in this project, including work on cyber emulation, uncertainty quantification, and optimization. We present examples of integrated analyses performed on two case studies: a network scanning/detection study and a malware command and control study. We highlight the importance of experimental workflows and list references of papers and presentations developed under this project. We outline lessons learned and suggestions for future work.
This report presents the results of the “Foundations of Rigorous Cyber Experimentation” (FORCE) Laboratory Directed Research and Development (LDRD) project. This project is a companion project to the “Science and Engineering of Cyber security through Uncertainty quantification and Rigorous Experimentation” (SECURE) Grand Challenge LDRD project. This project leverages the offline, controlled nature of cyber experimentation technologies in general, and emulation testbeds in particular, to assess how uncertainties in network conditions affect uncertainties in key metrics. We conduct extensive experimentation using a Firewheel emulation-based cyber testbed model of Invisible Internet Project (I2P) networks to understand a de-anonymization attack formerly presented in the literature. Our goals in this analysis are to see if we can leverage emulation testbeds to produce reliably repeatable experimental networks at scale, identify significant parameters influencing experimental results, replicate the previous results, quantify uncertainty associated with the predictions, and apply multi-fidelity techniques to forecast results to real-world network scales. The I2P networks we study are up to three orders of magnitude larger than the networks studied in SECURE and presented additional challenges to identify significant parameters. The key contributions of this project are the application of SECURE techniques such as UQ to a scenario of interest and scaling the SECURE techniques to larger network sizes. This report describes the experimental methods and results of these studies in more detail. In addition, the process of constructing these large-scale experiments tested the limits of the Firewheel emulation-based technologies. Therefore, another contribution of this work is that it informed the Firewheel developers of scaling limitations, which were subsequently corrected.
Cyber testbeds provide an important mechanism for experimentally evaluating cyber security performance. However, as an experimental discipline, reproducible cyber experimentation is essential to assure valid, unbiased results. Even minor differences in setup, configuration, and testbed components can have an impact on the experiments, and thus, reproducibility of results. This paper documents a case study in reproducing an earlier emulation study, with the reproduced emulation experiment conducted by a different research group on a different testbed. We describe lessons learned as a result of this process, both in terms of the reproducibility of the original study and in terms of the different testbed technologies used by both groups. This paper also addresses the question of how to compare results between two groups' experiments, identifying candidate metrics for comparison and quantifying the results in this reproduction study.
Protecting against multi-step attacks of uncertain duration and timing forces defenders into an indefinite, always ongoing, resource-intensive response. To effectively allocate resources, a defender must be able to analyze multi-step attacks under assumption of constantly allocating resources against an uncertain stream of potentially undetected attacks. To achieve this goal, we present a novel methodology that applies a game-theoretic approach to the attack, attacker, and defender data derived from MITRE´s ATT&CK® Framework. Time to complete attack steps is drawn from a probability distribution determined by attacker and defender strategies and capabilities. This constraints attack success parameters and enables comparing different defender resource allocation strategies. By approximating attacker-defender games as Markov processes, we represent the attacker-defender interaction, estimate the attack success parameters, determine the effects of attacker and defender strategies, and maximize opportunities for defender strategy improvements against an uncertain stream of attacks. This novel representation and analysis of multi-step attacks enables defender policy optimization and resource allocation, which we illustrate using the data from MITRE´ s APT3 ATT&CK® Framework.
Port scanning is a commonly applied technique in the discovery phase of cyber attacks. As such, defending against them has long been the subject of many research and modeling efforts. Though modeling efforts can search large parameter spaces to find effective defensive parameter settings, confidence in modeling results can be hampered by limited or omitted validation efforts. In this paper, we introduce a novel, mathematical model that describes port scanning progress by an attacker and intrusion detection by a defender. The paper further describes a set of emulation experiments that we conducted with a virtual testbed and used to validate the model. Results are presented for two scanning strategies: a slow, stealthy approach and a fast, loud approach. Estimates from the model fall within 95% confidence intervals on the means estimated from the experiments. Consequently, the model's predictive capability provides confidence in its use for evaluation and development of defensive strategies against port scanning.
Securing cyber systems is of paramount importance, but rigorous, evidence-based techniques to support decision makers for high-consequence decisions have been missing. The need for bringing rigor into cybersecurity is well-recognized, but little progress has been made over the last decades. We introduce a new project, SECURE, that aims to bring more rigor into cyber experimentation. The core idea is to follow the footsteps of computational science and engineering and expand similar capabilities to support rigorous cyber experimentation. In this paper, we review the cyber experimentation process, present the research areas that underlie our effort, discuss the underlying research challenges, and report on our progress to date. This paper is based on work in progress, and we expect to have more complete results for the conference.