Publications Search

Developing an Active Learning algorithm for learning Bayesian classifiers under the Multiple Instance Learning scenario

In the Multiple Instance Learning scenario, the training data consists of instances grouped into bags, and each bag is labelled with whether it is positive, i.e. contains at least one positive instance. First, Active Learning, in which additional labels can be iteratively requested, has the potential to allow more accurate classifiers to be learned with less labels. Active Learning has been applied to the Multiple Instance Learning under two settings: when bag labels of unlabelled bags can be requested, and when instance labels within bags known to be positive can be requested. Second, Bayesian Active learning methods have the potential to learn accurate classifiers with few labels, because they explicitly track the classifier uncertainty and can thus address its knowledge gaps. Yet, there does not exist any Bayesian Active Learning method for the Multiple Instance Learning Scenario. In this work, we develop the first such method. We develop a Bayesian classifier for the Multiple Instance Learning scenario, show how it can be efficiently used for Bayesian Active Learning, and perform experiments assessing its performance. While its performance exceeds that when no Active Learning is used, it is sometimes better, sometimes worse than the naive baseline of uncertainty sampling, depending on the situation. This suggests future work: building more customizable Bayesian Active Learning methods for the Multiple Instance Scenario, customizable to whether bag or instance label accuracy is targeted, and the labeling budget.

More Details

TYPE SAND Report YEAR 2020

DOI OSTI

Cyber threat modeling and validation: Port scanning and detection

ACM International Conference Proceeding Series

Vugrin, Eric; Cruz, Gerardo J.; Reedy, Christian; Tarman, Thomas D.; Pinar, Ali P.

Port scanning is a commonly applied technique in the discovery phase of cyber attacks. As such, defending against them has long been the subject of many research and modeling efforts. Though modeling efforts can search large parameter spaces to find effective defensive parameter settings, confidence in modeling results can be hampered by limited or omitted validation efforts. In this paper, we introduce a novel, mathematical model that describes port scanning progress by an attacker and intrusion detection by a defender. The paper further describes a set of emulation experiments that we conducted with a virtual testbed and used to validate the model. Results are presented for two scanning strategies: a slow, stealthy approach and a fast, loud approach. Estimates from the model fall within 95% confidence intervals on the means estimated from the experiments. Consequently, the model's predictive capability provides confidence in its use for evaluation and development of defensive strategies against port scanning.

More Details

TYPE Conference Poster YEAR 2020

OSTI Scopus

Time Series Dimension Reduction for Surrogate Models of Port Scanning Cyber Emulations

Foulk, James W.; Swiler, Laura P.; Pinar, Ali P.

Surrogate model development is a key resource in the scientific modeling community for providing computational expedience when simulating complex systems without loss of great fidelity. The initial step to development of a surrogate model is identification of the primary governing components of the system. Principal component analysis (PCA) is a widely used data science technique that provides inspection of such driving factors, when the objective for modeling is to capture the greatest sources of variance inherent to a dataset. Although an efficient linear dimension reduction tool, PCA makes the fundamental assumption that the data is continuous and normally distributed. Thus, it provides ideal performance when these conditions are met. In the case for which cyber emulations provide realizations of a port scanning scenario, the data to be modeled follows a discrete time series function comprised of monotonically increasing piece-wise constant steps. The sources of variance are related to the timing and magnitude of these steps. Therefore, we consider using XPCA, an extension to PCA for continuous and discrete random variates. This report provides the documentation of the trade-offs between the PCA and XPCA linear dimension reduction algorithms, for the intended purpose to identify key components of greatest variance in our time series data. These components will ultimately provide the basis for future surrogate models of port scanning cyber emulations.

More Details

TYPE SAND Report YEAR 2020

DOI OSTI

Cyber Threat Modeling & Validation: Port Scanning & Detection

Vugrin, Eric; Cruz, Gerardo J.; Reedy, Christian; Tarman, Thomas D.; Pinar, Ali P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Brief Announcement: Provable neuromorphic advantages for computing constrained shortest paths

Aimone, James B.; Ho, Yang; Parekh, Ojas D.; Phillips, Cynthia A.; Pinar, Ali P.; Severa, William M.; Wang, Yipu

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Brief Announcement: Provable neuromorphic advantages for computing constrained shortest paths

Aimone, James B.; Ho, Yang; Parekh, Ojas D.; Phillips, Cynthia A.; Pinar, Ali P.; Severa, William M.; Wang, Yipu

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

A scalable graph generation algorithm to sample over a given shell distribution

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Ozkaya, M.Y.; Balin, M.F.; Pinar, Ali P.; Catalyurek, Umit V.

Graphs are commonly used to model the relationships between various entities. These graphs can be enormously large and thus, scalable graph analysis has been the subject of many research efforts. To enable scalable analytics, many researchers have focused on generating realistic graphs that support controlled experiments for understanding how algorithms perform under changing graph features. Significant progress has been made on scalable graph generation which preserve some important graph properties (e.g., degree distribution, clustering coefficients). In this paper, we study how to sample a graph from the space of graphs with a given shell distribution. Shell distribution is related to the k-core, which is the largest subgraph where each vertex is connected to at least kother vertices. A k-shell is the subset of vertices that are in k-core but not ( k +1)-core, and the shell distribution comprises the sizes of these shells. Core decompositions are widely used to extract information from graphs and to assist other computations. We present a scalable shared and distributed memory graph generator that, given a shell decomposition, generates a random graph that conforms to it. Our extensive experimental results show the efficiency and scalability of our methods. Our algorithm generates 233 vertices and 237 edges in less than 50 seconds on 384 cores.11This work is funded by the Laboratory Directed Research and Development program of Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.

More Details

TYPE Conference Poster YEAR 2020

DOI OSTI Scopus

Extracting Stable Community Information on Relational Data

Gabert, Kasimir G.; Pinar, Ali P.; Anderson, Dylan Z.

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

Rigorous Cyber Experimentation for Security of Cyber Physical Systems

Pinar, Ali P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Extracting Stable Community Information on Relational Data

Gabert, Kasimir G.; Pinar, Ali P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Residual core maximization: An efficient algorithm for maximizing the size of the k-core

Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020

Laishram, Ricky; Sariyuce, Ahmet E.; Eliassi-Rad, Tina; Pinar, Ali P.; Soundarajan, Sucheta

In many online social networking platforms, the participation of an individual is motivated by the participation of others. If an individual chooses to leave a platform, this may produce a cascade in which that person’s friends then choose to leave, causing their friends to leave, and so on. In some cases, it may be possible to incentivize key individuals to stay active within the network, thus preventing such a cascade. This problem is modeled using the anchored k-core of a network, which, for a network G and set of anchor nodes A, is the maximal subgraph of G in which every node has a total of at least k neighbors between the subgraph and anchors. In this work, we propose Residual Core Maximization (RCM), a novel algorithm for finding b anchor nodes so that the size of the anchored k-core is maximized. We perform a comprehensive experimental evaluation on numerous real-world networks and compare RCM to various baselines. We observe that RCM is more effective and efficient than the state-of-the-art methods: on average, RCM produces anchored k-cores that are 1.65 times larger than those produced by the baseline algorithm, and is approximately 500 times faster on average.

More Details

TYPE Conference Poster YEAR 2020

DOI OSTI Scopus

SECURE: An Evidence-based Approach to Cyber Experimentation

Proceedings - 2019 Resilience Week, RWS 2019

Pinar, Ali P.; Benz, Zachary O.; Castillo, Andrea; Hart, William E.; Swiler, Laura P.; Tarman, Thomas D.

Securing cyber systems is of paramount importance, but rigorous, evidence-based techniques to support decision makers for high-consequence decisions have been missing. The need for bringing rigor into cybersecurity is well-recognized, but little progress has been made over the last decades. We introduce a new project, SECURE, that aims to bring more rigor into cyber experimentation. The core idea is to follow the footsteps of computational science and engineering and expand similar capabilities to support rigorous cyber experimentation. In this paper, we review the cyber experimentation process, present the research areas that underlie our effort, discuss the underlying research challenges, and report on our progress to date. This paper is based on work in progress, and we expect to have more complete results for the conference.

More Details

TYPE Conference Poster YEAR 2019

DOI OSTI Scopus

Data Analytics are Powerful -- Handle with Care

Wendt, Jeremy; Kegelmeyer, William P.; Pinar, Ali P.; Shead, Timothy M.; Saavedra, Gary; Safta, Cosmin; Bertino, Joseph

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Multiple Instance Learning: Application to Seismic Event Detection and Open Problems

Wang, Fulton; Ray, Jaideep; Pinar, Ali P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

RetSynth: Determining all optimal and sub-optimal synthetic pathways that facilitate synthesis of target compounds in chassis organisms

BMC Bioinformatics

Whitmore, Leanne S.; Nguyen, Bernard; Pinar, Ali P.; George, Anthe G.; Hudson, Corey M.

Background: The efficient biological production of industrially and economically important compounds is a challenging problem. Brute-force determination of the optimal pathways to efficient production of a target chemical in a chassis organism is computationally intractable. Many current methods provide a single solution to this problem, but fail to provide all optimal pathways, optional sub-optimal solutions or hybrid biological/non-biological solutions. Results: Here we present RetSynth, software with a novel algorithm for determining all optimal biological pathways given a starting biological chassis and target chemical. By dynamically selecting constraints, the number of potential pathways scales by the number of fully independent pathways and not by the number of overall reactions or size of the metabolic network. This feature allows all optimal pathways to be determined for a large number of chemicals and for a large corpus of potential chassis organisms. Additionally, this software contains other features including the ability to collect data from metabolic repositories, perform flux balance analysis, and to view optimal pathways identified by our algorithm using a built-in visualization module. This software also identifies sub-optimal pathways and allows incorporation of non-biological chemical reactions, which may be performed after metabolic production of precursor molecules. Conclusions: The novel algorithm designed for RetSynth streamlines an arduous and complex process in metabolic engineering. Our stand-alone software allows the identification of candidate optimal and additional sub-optimal pathways, and provides the user with necessary ranking criteria such as target yield to decide which route to select for target production. Furthermore, the ability to incorporate non-biological reactions into the final steps allows determination of pathways to production for targets that cannot be solely produced biologically. With this comprehensive suite of features RetSynth exceeds any open-source software or webservice currently available for identifying optimal pathways for target production.

More Details

TYPE Journal Article YEAR 2019

DOI OSTI Scopus

Dynamic Programming with Spiking Neural Computing

Aimone, James B.; Parekh, Ojas D.; Phillips, Cynthia A.; Pinar, Ali P.; Severa, William M.; Xu, Helen

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

DOI OSTI

Interpretable Sequence Classification via Sparse Multiple Instance Learning

Wang, Fulton; Pinar, Ali P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Dynamic Programming with Spiking Neural Computing

Aimone, James B.; Parekh, Ojas D.; Phillips, Cynthia A.; Pinar, Ali P.; Severa, William M.; Xu, Helen

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

DOI OSTI

Multiple Instance Learning

Wang, Fulton; Pinar, Ali P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

SECURE: An Evidence-based Approach to Cybersecurity

Pinar, Ali P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

SECURE: An Evidence-based Approach to Cybersecurity-- Technical Overview

Pinar, Ali P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

SECURE Overview

Pinar, Ali P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Inferring Graph Properties with Limited Access to the Graph

Pinar, Ali P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

An Example of Counter-Adversarial Community Detection Analysis

Kegelmeyer, William P.; Wendt, Jeremy; Pinar, Ali P.

Community detection is often used to understand the nature of a network. However, there may exist an adversarial member of the network who wishes to evade that understanding. We analyze one such specific situation, quantifying the efficacy of certain attacks against a particular analytic use of community detection and providing a preliminary assessment of a possible defense.

More Details

TYPE SAND Report YEAR 2018

DOI OSTI

Chance-constrained economic dispatch with renewable energy and storage

Computational Optimization and Applications

Safta, Cosmin; Cheng, Jianqiang; Najm, Habib N.; Pinar, Ali P.; Chen, Richard L.Y.; Watson, Jean-Paul

Increasing penetration levels of renewables have transformed how power systems are operated. High levels of uncertainty in production make it increasingly difficulty to guarantee operational feasibility; instead, constraints may only be satisfied with high probability. We present a chance-constrained economic dispatch model that efficiently integrates energy storage and high renewable penetration to satisfy renewable portfolio requirements. Specifically, we require that wind energy contribute at least a prespecified proportion of the total demand and that the scheduled wind energy is deliverable with high probability. We develop an approximate partial sample average approximation (PSAA) framework to enable efficient solution of large-scale chance-constrained economic dispatch problems. Computational experiments on the IEEE-24 bus system show that the proposed PSAA approach is more accurate, closer to the prescribed satisfaction tolerance, and approximately 100 times faster than standard sample average approximation. Finally, the improved efficiency of our PSAA approach enables solution of a larger WECC-240 test system in minutes.

More Details

TYPE Journal Article YEAR 2018

DOI OSTI Scopus

Publications

Search results