Publications Details
Time Series Dimension Reduction for Surrogate Models of Port Scanning Cyber Emulations
Laros, James H.; Swiler, Laura P.; Pinar, Ali P.
Surrogate model development is a key resource in the scientific modeling community for providing computational expedience when simulating complex systems without loss of great fidelity. The initial step to development of a surrogate model is identification of the primary governing components of the system. Principal component analysis (PCA) is a widely used data science technique that provides inspection of such driving factors, when the objective for modeling is to capture the greatest sources of variance inherent to a dataset. Although an efficient linear dimension reduction tool, PCA makes the fundamental assumption that the data is continuous and normally distributed. Thus, it provides ideal performance when these conditions are met. In the case for which cyber emulations provide realizations of a port scanning scenario, the data to be modeled follows a discrete time series function comprised of monotonically increasing piece-wise constant steps. The sources of variance are related to the timing and magnitude of these steps. Therefore, we consider using XPCA, an extension to PCA for continuous and discrete random variates. This report provides the documentation of the trade-offs between the PCA and XPCA linear dimension reduction algorithms, for the intended purpose to identify key components of greatest variance in our time series data. These components will ultimately provide the basis for future surrogate models of port scanning cyber emulations.