International Journal of Networked and Distributed Computing
Shrestha, Madhukar; Kim, Yonghyun; Oh, Jeehyun; Rhee, Junghwan (John); Choe, Yung R.; Zuo, Fei; Park, Myungah; Qian, Gang
System provenance forensic analysis has been studied by a large body of research work. This area needs fine granularity data such as system calls along with event fields to track the dependencies of events. While prior work on security datasets has been proposed, we found a useful dataset of realistic attacks and details that are needed for high-quality provenance tracking is lacking. We created a new dataset of eleven vulnerable cases for system forensic analysis. It includes the full details of system calls including syscall parameters. Realistic attack scenarios with real software vulnerabilities and exploits are used. For each case, we created two sets of benign and adversary scenarios which are manually labeled for supervised machine-learning analysis. In addition, we present an algorithm to improve the data quality in the system provenance forensic analysis. We demonstrate the details of the dataset events and dependency analysis of our dataset cases.
Proceedings - 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications, SERA 2023
Shrestha, Madhukar; Kim, Yonghyun; Oh, Jeehyun; Rhee, Junghwan; Choe, Yung R.; Zuo, Fei; Park, Myungah; Qian, Gang
System provenance forensic analysis has been studied by a large body of research work. This area needs fine granularity data such as system calls along with event fields to track the dependencies of events. While prior work on security datasets has been proposed, we found a useful dataset of realistic attacks and details that can be used for provenance tracking is lacking. We created a new dataset of eleven vulnerable cases for system forensic analysis. It includes the full details of system calls including syscall parameters. Realistic attack scenarios with real software vulnerabilities and exploits are used. Also, we created two sets of benign and adversary scenarios which are manually labeled for supervised machine-learning analysis. We demonstrate the details of the dataset events and dependency analysis.
The recent paradigm shift introduced by the Internet of Things (IoT) has brought embedded systems into focus as a target for both security analysts and malicious adversaries. Typified by their lack of standardized hardware, diverse software, and opaque functionality, IoT devices present unique challenges to security analysts due to the tight coupling between their firmware and the hardware for which it was designed. In order to take advantage of modern program analysis techniques, such as fuzzing or symbolic execution, with any kind of scale or depth, analysts must have the ability to execute firmware code in emulated (or virtualized) environments. However, these emulation environments are rarely available and are cumbersome to create through manual reverse engineering, greatly limiting the analysis of binary firmware. In this work, we explore the problem of firmware re-hosting, the process by which firmware is migrated from its original hardware environment into a virtualized one. We show that an approach capable of creating virtual, interactive environments in an automated manner is a necessity to enable firmware analysis at scale. We present the first proof-of-concept system aiming to achieve this goal, called PRETENDER, which uses observations of the interactions between the original hardware and the firmware to automatically create models of peripherals, and allows for the execution of the firmware in a fully-emulated environment. Unlike previous approaches, these models are interactive, stateful, and transferable, meaning they are designed to allow the program to receive and process new input, a requirement of many analyses. We demonstrate our approach on multiple hardware platforms and firmware samples, and show that the models are flexible enough to allow for virtualized code execution, the exploration of new code paths, and the identification of security vulnerabilities.
Outlier detection has been shown to be a promising machine learning technique for a diverse array of felds and problem areas. However, traditional, supervised outlier detection is not well suited for problems such as network intrusion detection, where proper labelled data is scarce. This has created a focus on extending these approaches to be unsupervised, removing the need for explicit labels, but at a cost of poorer performance compared to their supervised counterparts. Recent work has explored ways of making up for this, such as creating ensembles of diverse models, or even diverse learning algorithms, to jointly classify data. While using unsupervised, heterogeneous ensembles of learning algorithms has been proposed as a viable next step for research, the implications of how these ensembles are built and used has not been explored.
Outlier detection has been shown to be a promising machine learning technique for a diverse array of felds and problem areas. However, traditional, supervised outlier detection is not well suited for problems such as network intrusion detection, where proper labelled data is scarce. This has created a focus on extending these approaches to be unsupervised, removing the need for explicit labels, but at a cost of poorer performance compared to their supervised counterparts. Recent work has explored ways of making up for this, such as creating ensembles of diverse models, or even diverse learning algorithms, to jointly classify data. While using unsupervised, heterogeneous ensembles of learning algorithms has been proposed as a viable next step for research, the implications of how these ensembles are built and used has not been explored.
In the past decade, we have come to rely on computers for various safety and security-critical tasks, such as securing our homes, operating our vehicles, and controlling our finances. To facilitate these tasks, chip manufacturers have begun including trusted execution environments (TEEs) in their processors, which enable critical code (e.g., cryptographic functions) to run in an isolated hardware environment that is protected from the traditional operating system (OS) and its applications. While code in the untrusted environment (e.g., Android or Linux) is forbidden from accessing any memory or state within the TEE, the code running in the TEE, by design, has unrestricted access to the memory of the untrusted OS and its applications. However, due to the isolation between these two environments, the TEE has very limited visibility into the untrusted environment’s security mechanisms (e.g., kernel vs. application memory). In this paper, we introduce BOOMERANG, a class of vulnerabilities that arises due to this semantic separation between the TEE and the untrusted environment. These vulnerabilities permit untrusted user-level applications to read and write any memory location in the untrusted environment, including security-sensitive kernel memory, by leveraging the TEE’s privileged position to perform the operations on its behalf. BOOMERANG can be used to steal sensitive data from other applications, bypass security checks, or even gain full control of the untrusted OS. To quantify the extent of this vulnerability, we developed an automated framework for detecting BOOMERANG bugs within the TEEs of popular mobile phones. Using this framework, we were able to confirm the existence of BOOMERANG on four different TEE platforms, affecting hundreds of millions of devices on the market today. Moreover, we confirmed that, in at least two instances, BOOMERANG could be leveraged to completely compromise the untrusted OS (i.e., Android). While the implications of these vulnerabilities are severe, defenses can be quickly implemented by vendors, and we are currently in contact with the affected TEE vendors to deploy adequate fixes. To this end, we evaluated the two most promising defense proposals and their inherent trade-offs. This analysis led the proposal of a novel BOOMERANG defense, addressing the major shortcomings of the existing defenses with minimal performance overhead. Our findings have been reported to and verified by the corresponding vendors, who are currently in the process of creating security patches.
The Center for Strategic and International Studies estimates the annual cost from cyber crime to be more than $400 billion. Most notable is the recent digital identity thefts that compromised millions of accounts. These attacks emphasize the security problems of using clonable static information. One possible solution is the use of a physical device known as a Physically Unclonable Function (PUF). PUFs can be used to create encryption keys, generate random numbers, or authenticate devices. While the concept shows promise, current PUF implementations are inherently problematic: inconsistent behavior, expensive, susceptible to modeling attacks, and permanent. Therefore, we propose a new solution by which an unclonable, dynamic digital identity is created between two communication endpoints such as mobile devices. This Physically Unclonable Digital ID (PUDID) is created by injecting a data scrambling PUF device at the data origin point that corresponds to a unique and matching descrambler/hardware authentication at the receiving end. This device is designed using macroscopic, intentional anomalies, making them inexpensive to produce. PUDID is resistant to cryptanalysis due to the separation of the challenge response pair and a series of hash functions. PUDID is also unique in that by combining the PUF device identity with a dynamic human identity, we can create true two-factor authentication. We also propose an alternative solution that eliminates the need for a PUF mechanism altogether by combining tamper resistant capabilities with a series of hash functions. This tamper resistant device, referred to as a Quasi-PUDID (Q-PUDID), modifies input data, using a black-box mechanism, in an unpredictable way. By mimicking PUF attributes, Q-PUDID is able to avoid traditional PUF challenges thereby providing high-performing physical identity assurance with or without a low performing PUF mechanism. Three different application scenarios with mobile devices for PUDID and Q-PUDID have been analyzed to show their unique advantages over traditional PUFs and outline the potential for placement in a host of applications.
Developers and security analysts have been using static analysis for a long time to analyze programs for defects and vulnerabilities. Generally a static analysis tool is run on the source code for a given program, flagging areas of code that need to be further inspected by a human analyst. These tools tend to work fairly well – every year they find many important bugs. These tools are more impressive considering the fact that they only examine the source code, which may be very complex. Now consider the amount of data available that these tools do not analyze. There are many additional pieces of information available that would prove useful for finding bugs in code, such as a history of bug reports, a history of all changes to the code, information about committers, etc. By leveraging all this additional data, it is possible to find more bugs with less user interaction, as well as track useful metrics such as number and type of defects injected by committer. This paper provides a method for leveraging development metadata to find bugs that would otherwise be difficult to find using standard static analysis tools. We showcase two case studies that demonstrate the ability to find new vulnerabilities in large and small software projects by finding new vulnerabilities in the cpython and Roundup open source projects.