Background/Objectives: Children's biological age does not always correspond to their chronological age. In BMI trajectories, this can appear as phase variation, seen as shifting, stretching, or shrinking between trajectories. Treating maturation as a process moving towards its final state, adult BMI, we assessed whether children can be divided into latent groups reflecting similar maturational age of BMI. The groups were characterised by early factors and time-related features of the trajectories. Subjects/Methods: We used data from two general-population birth cohort studies, the Northern Finland Birth Cohorts 1966 and 1986 (NFBC1966 and NFBC1986). Height (n = 6329) and weight (n = 6568) measurements were interpolated at 34 shared time points using B-splines, and BMI values were calculated between 3 months and 16 years. Pairwise phase distances of 2999 females and 3163 males were used as the similarity measure in k-medoids clustering. Results: We identified three clusters of trajectories in females and males (Type 1: females, n = 1566, males, n = 1669; Type 2: females, n = 1028, males, n = 973; Type 3: females, n = 405, males, n = 521). Similar distinct timing patterns were identified in males and females. The clusters did not differ by sex or by the early growth determinants studied. Conclusions: The Type 1 trajectory cluster reflected the shape typically illustrated as the childhood BMI trajectory in the literature. However, the other two patterns have not been identified previously. The Type 2 pattern was more common in the NFBC1966, suggesting a generational shift in BMI maturational patterns.
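To make the pipeline concrete, the sketch below (not the study's code) shows B-spline interpolation onto a shared grid of time points and a minimal k-medoids step on a precomputed pairwise distance matrix. The study clusters pairwise phase distances from elastic alignment; a generic Euclidean matrix D stands in here, and all data are synthetic.

```python
# Illustrative sketch only: interpolate growth measurements onto shared time
# points with B-splines, then cluster a precomputed pairwise distance matrix
# with a basic k-medoids loop.
import numpy as np
from scipy.interpolate import make_interp_spline

rng = np.random.default_rng(0)

def interpolate_bmi(ages, bmi, shared_ages):
    """Fit a cubic B-spline to one child's measurements and evaluate it
    at the shared grid of time points."""
    spline = make_interp_spline(ages, bmi, k=3)
    return spline(shared_ages)

def k_medoids(D, k, n_iter=100):
    """Minimal k-medoids on a precomputed n x n distance matrix D."""
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids

# Toy usage: 50 synthetic curves on a shared grid, Euclidean stand-in distance.
shared_ages = np.linspace(0.25, 16, 34)          # 3 months to 16 years, 34 points
curves = np.array([interpolate_bmi(np.sort(rng.uniform(0.25, 16, 12)),
                                   16 + rng.normal(0, 1, 12), shared_ages)
                   for _ in range(50)])
D = np.linalg.norm(curves[:, None, :] - curves[None, :, :], axis=2)
labels, medoids = k_medoids(D, k=3)
```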
Detecting changepoints in functional data has become an important problem as interest has increased in monitoring climate phenomena, where the data are functional in nature. The observed data often contain both amplitude (y-axis) and phase (x-axis) variability. If not accounted for properly, true changepoints may go undetected, and the estimated underlying mean change functions will be incorrect. In this article, an elastic functional changepoint method is developed which properly accounts for these types of variability. The method can detect amplitude and phase changepoints, which current methods in the literature do not, as they focus solely on amplitude changepoints. The method can be implemented using the functions directly or computed via functional principal component analysis to ease the computational burden. We apply the method and its non-elastic competitors to both simulated and observed data to show its efficiency in handling data with phase variation and with both amplitude and phase changepoints. We use the method to evaluate potential changes in stratospheric temperature due to the eruption of Mt. Pinatubo in the Philippines in June 1991. Using an epidemic changepoint model, we find evidence of an increase in stratospheric temperature during a period that contains the immediate aftermath of Mt. Pinatubo, with most detected changepoints occurring in the tropics, as expected.
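A worked form of the epidemic changepoint model may help fix notation; this is a generic sketch (the symbols below are illustrative, not taken from the article), and the elastic version additionally allows each function to be time-warped so that changes may occur in amplitude, in phase, or in both.

```latex
% Epidemic (two-changepoint) model for a time-ordered sequence of functions y_1,...,y_n.
y_i(t) =
\begin{cases}
\mu(t) + \varepsilon_i(t), & i \le \tau_1 \ \text{or}\ i > \tau_2,\\[4pt]
\mu(t) + \delta(t) + \varepsilon_i(t), & \tau_1 < i \le \tau_2,
\end{cases}
\qquad 1 \le \tau_1 < \tau_2 \le n .
```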
Dynamic shockless compression experiments provide the ability to explore material behavior at extreme pressures but relatively low temperatures. Typically, the data from these experiments are interpreted through an analytic method called Lagrangian analysis. In this work, alternative approaches based on modern statistical methods are explored. Specifically, Bayesian model calibration is applied to a new set of platinum data shocklessly compressed to 570 GPa. Several platinum equation-of-state models are evaluated, including traditional parametric forms as well as a novel non-parametric model concept. The results are compared to those in Paper I obtained by inverse Lagrangian analysis. The comparisons suggest that Bayesian calibration not only provides a viable framework for precise quantification of the compression path but also reveals insights pertaining to trade-offs surrounding model form selection, sensitivities of the relevant experimental uncertainties, and assumptions and limitations within Lagrangian analysis. The non-parametric model method, in particular, is found to give precise, unbiased results and is expected to be useful over a wide range of applications. The calibration results in estimates of the platinum principal isentrope over the full range of experimental pressures with a standard error of 1.6%, which extends the results from Paper I while maintaining the high precision required for the platinum pressure standard.
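For readers unfamiliar with the terminology, the generic structure of a Bayesian model calibration of this kind can be written as follows. This is only a schematic: the noise model and symbols are illustrative, not the paper's exact formulation.

```latex
% theta: equation-of-state parameters; d: measured experimental data;
% eta(theta): forward simulation of the experiment; Sigma: experimental and
% model-form uncertainties (illustrative notation).
p(\theta \mid d) \;\propto\; p(d \mid \theta)\, p(\theta),
\qquad
d = \eta(\theta) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \Sigma).
```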
The International Commission on Illumination (CIE) designed its color space to be perceptually uniform, so that a given numerical change in the color code corresponds to a comparable perceived change in color. This color encoding is demonstrated to be advantageous in the scientific visualization and analysis of vector fields. The specific application is the analysis of ice motion in the Arctic, where patterns appear in smooth, monthly-averaged ice motion. Furthermore, fractures occurring in the ice cover result in discontinuities in the ice motion, and this jump in the displacement vector can also be visualized. We then analyze modeled and observed fractures using a metric on the color space together with image amplitude and phase metrics. The amplitude and phase metrics arise from image registration that is accomplished by sampling images along space-filling curves, thus reducing the image registration problem to the more reliable functional alignment problem. We demonstrate this through an exploration of the metrics to compare model runs to an observed ice crack.
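A minimal sketch of the image-to-function step may be useful. The specific space-filling curve used in the work is not stated here, so a simple boustrophedon (snake) scan stands in for illustration; the point is only that sampling two images along the same curve reduces 2-D registration to 1-D functional alignment.

```python
# Sketch: flatten images into 1-D functions along a common space-filling curve,
# so 2-D image registration becomes functional alignment of g1 and g2.
import numpy as np

def snake_scan(image):
    """Flatten a 2-D array into a 1-D function by scanning rows left-to-right
    and right-to-left alternately, keeping consecutive samples spatially adjacent."""
    rows = [row if i % 2 == 0 else row[::-1] for i, row in enumerate(image)]
    return np.concatenate(rows)

rng = np.random.default_rng(1)
img1 = rng.random((64, 64))
img2 = np.roll(img1, shift=3, axis=1)   # a small displacement between the two fields

g1, g2 = snake_scan(img1), snake_scan(img2)
t = np.linspace(0, 1, g1.size)          # common domain for functional alignment
```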
7th IEEE Electron Devices Technology and Manufacturing Conference: Strengthen the Global Semiconductor Research Collaboration After the Covid-19 Pandemic, EDTM 2023
This paper presents an assessment of electrical device measurements using functional data analysis (FDA) on a test case of Zener diode devices. We employ three techniques from FDA to quantify the variability in device behavior, primarily due to production lot and demonstrate that this has a significant effect in our data set. We also argue for the expanded use of FDA methods in providing principled, quantitative analysis of electrical device data.
Tucker, J.D.; Martinez, Matthew T.; Laborde, Jose M.
With the recent surge in big data analytics for hyperdimensional data, there is a renewed interest in dimensionality reduction techniques. In order for these methods to improve performance gains and understanding of the underlying data, a proper metric needs to be identified. This step is often overlooked, and metrics are typically chosen without consideration of the underlying geometry of the data. In this paper, we present a method for incorporating elastic metrics into the t-distributed stochastic neighbour embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). We apply our method to functional data, which is uniquely characterized by rotations, parameterization and scale. If these properties are ignored, they can lead to incorrect analysis and poor classification performance. Through our method, we demonstrate improved performance on shape identification tasks for three benchmark data sets (MPEG-7, Car data set and Plane data set of Thankoor), where we achieve 0.77, 0.95 and 1.00 F1 score, respectively.
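The embedding pipeline can be sketched as follows. This is not the authors' code: the elastic distance is replaced by a crude stand-in (the L2 distance between square-root velocity functions, with no warping optimization), and the data are synthetic; the point is that both t-SNE and UMAP accept a precomputed distance matrix, which is how a non-Euclidean elastic metric enters the embedding.

```python
# Requires scikit-learn; the umap-learn package is optional (see final comment).
import numpy as np
from sklearn.manifold import TSNE

def srvf(f, t):
    """Square-root velocity function q = sign(f') * sqrt(|f'|)."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 101)
# Two toy classes of curves with random phase shifts and amplitude jitter.
F = np.array([np.sin(2 * np.pi * (t - rng.uniform(-0.1, 0.1))) * (1 + 0.05 * rng.normal())
              if i % 2 == 0 else
              np.sin(4 * np.pi * (t - rng.uniform(-0.1, 0.1))) * (1 + 0.05 * rng.normal())
              for i in range(60)])
Q = np.array([srvf(f, t) for f in F])
D = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2) * np.sqrt(t[1] - t[0])

emb = TSNE(metric="precomputed", init="random", perplexity=15,
           random_state=0).fit_transform(D)
# For UMAP, the analogous call is umap.UMAP(metric="precomputed").fit_transform(D).
```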
Event-based sensors are a novel sensing technology that captures the dynamics of a scene via pixel-level change detection. This technology operates with high speed (>10 kHz), low latency (10 µs), low power consumption (<1 W), and high dynamic range (120 dB). Compared to conventional, frame-based architectures that report data for every pixel at a given frame rate, event-based sensor pixels report data only if a change in pixel intensity occurs. This affords the possibility of dramatically reducing the data reported in bandwidth-limited environments (e.g., remote sensing), and thus the data that need to be processed, while still recovering significant events. Degraded visual environments, such as those generated by fog, often hinder situational awareness by decreasing optical resolution and transmission range via random scattering of light. To respond to this challenge, we present the deployment of an event-based sensor in a controlled, experimentally generated, well-characterized degraded visual environment (a fog analogue) for detection of a modulated signal, and a comparison of data collected from the event-based sensor and from a traditional framing sensor.
Widespread integration of social media into daily life has fundamentally changed the way society communicates, and, as a result, how individuals develop attitudes, personal philosophies, and worldviews. The excess spread of disinformation and misinformation due to this increased connectedness and streamlined communication has been extensively studied, simulated, and modeled. Less studied is the interaction of many pieces of misinformation, and the resulting formation of attitudes. We develop a framework for the simulation of attitude formation based on exposure to multiple cognitions. We allow a set of cognitions with some implicit relational topology to spread on a social network, which is defined with separate layers to specify online and offline relationships. An individual’s opinion on each cognition is determined by a process inspired by the Ising model for ferromagnetism. We conduct experimentation using this framework to test the effect of topology, connectedness, and social media adoption on the ultimate prevalence of and exposure to certain attitudes.
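A toy version of the opinion-update step, under stated assumptions: a single cognition, a single network layer, and a Glauber-style Ising update. The graph type, temperature, and sweep count below are illustrative choices, not the paper's settings.

```python
# Each agent holds a +/-1 opinion and updates toward the local field of its
# neighbours with a Glauber-style probability at "temperature" T.
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)
G = nx.watts_strogatz_graph(n=200, k=6, p=0.1, seed=3)   # stand-in social layer
spin = {v: rng.choice([-1, 1]) for v in G.nodes}
T = 1.5                                                   # noise level

for _ in range(50):                                       # sweeps over the network
    for v in G.nodes:
        h = sum(spin[u] for u in G.neighbors(v))          # local social field
        p_up = 1.0 / (1.0 + np.exp(-2.0 * h / T))         # Glauber flip probability
        spin[v] = 1 if rng.random() < p_up else -1

prevalence = np.mean([spin[v] for v in G.nodes])          # net attitude in the population
```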
The purpose of our report is to discuss the notion of entropy and its relationship with statistics. Our goal is to provide a way to think about entropy, its central role within information theory, and its relationship with statistics. We review various relationships between information theory and statistics; nearly all are well known but are unfortunately often not recognized. Entropy quantifies the "average amount of surprise" in a random variable and lies at the heart of information theory, which studies the transmission, processing, extraction, and utilization of information. For us, data is information. What, then, is the distinction between information theory and statistics? Information theorists work with probability distributions, whereas statisticians work with samples. In so many words, information theory applied to samples is the practice of statistics.
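A small numerical example of the distinction drawn above: the entropy of a known distribution (the information theorist's object) versus its plug-in estimate computed from a sample (the statistician's object).

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_true = np.array([0.5, 0.25, 0.125, 0.125])
print(entropy(p_true))                        # 1.75 bits

rng = np.random.default_rng(4)
sample = rng.choice(len(p_true), size=1000, p=p_true)
p_hat = np.bincount(sample, minlength=len(p_true)) / sample.size
print(entropy(p_hat))                         # plug-in estimate, close to 1.75
```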
Childhood body mass index (BMI) is a widely used measure of adiposity in children (<18 years of age). Children grow at an individual tempo, and individuals of the same age, or of the same BMI, might be in different phases of their individual growth curves. Variability between different childhood BMI curves can be separated into two components: phase variability (x-axis; time) and amplitude variability (y-axis; BMI). Phase variability can be thought of as arising from differences in maturational age between individuals and is related to the timing of peaks and valleys in a child's BMI curve.
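The distinction can be made concrete with a short sketch: g differs from f only through a time warping (phase), while h differs only in magnitude (amplitude). The specific curve and warping function below are illustrative.

```python
import numpy as np

t = np.linspace(0, 1, 200)
f = np.exp(-(t - 0.5) ** 2 / 0.02)    # a curve with a single peak

gamma = t ** 1.5                      # warping function: increasing, gamma(0)=0, gamma(1)=1
g = np.interp(gamma, t, f)            # phase-varied copy g(t) = f(gamma(t)); peak shifts in time
h = 1.3 * f                           # amplitude-varied copy; same timing, different height
```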
Inverse prediction models have commonly been developed to handle scalar data from physical experiments. However, it is not uncommon for data to be collected in functional form, and such data must then be aggregated to fit the form of traditional methods, which often results in a loss of information. For expensive experiments, this loss of information can be costly. In this study, we introduce the functional inverse prediction (FIP) framework, a general approach which uses the full information in functional response data to provide inverse predictions with probabilistic prediction uncertainties obtained with the bootstrap. The FIP framework is a general methodology that can be modified by practitioners to accommodate many different applications and types of data. We demonstrate the framework, highlighting points of flexibility, with a simulation example and applications to weather data and to nuclear forensics. Results show how functional models can improve the accuracy and precision of predictions.
Within the past half-decade, it has become overwhelmingly clear that suppressing the spread of deliberately false and misleading information is of the utmost importance for protecting democratic institutions. Disinformation has been found to come from both foreign and domestic actors, but the effects from either can be disastrous. From the simple encouragement of unwarranted distrust to conspiracy theories promoting violence, the results of disinformation have put the functionality of American democracy under direct threat. Present scientific challenges posed by this problem include detecting disinformation, quantifying its potential impact, and preventing its amplification. We present a model on which we can experiment with possible strategies toward the third challenge: the prevention of amplification. This is a social contagion network model, which is decomposed into layers to represent physical, "offline" interactions as well as virtual interactions on a social media platform. Along with the topological modifications to the standard contagion model, we use state-transition rules designed specifically for disinformation, and distinguish between contagious and non-contagious infected nodes. We use this framework to explore the effect of grassroots social movements on the size of disinformation cascades by simulating these cascades in scenarios where a proportion of the agents remove themselves from the social platform. We also test the efficacy of strategies that could be implemented at the administrative level by the online platform to minimize such spread. These top-down strategies include banning agents who disseminate false information, or providing corrective information to individuals exposed to false information to decrease their probability of believing it. We find an abrupt transition to smaller cascades when a critical number of random agents are removed from the platform, as well as steady decreases in the size of cascades with increasingly convincing corrective information. Finally, we compare simulated cascades on this framework with real cascades of disinformation recorded on WhatsApp surrounding the 2019 Indian election. We find a set of hyperparameter values that produces a distribution of cascades matching the scaling exponent of the distribution of actual cascades recorded in the dataset. We outline future directions for improving the performance of the framework and its validation methods, as well as ways to extend the model to capture additional features of social contagion.
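The agent-removal experiment can be illustrated with a heavily simplified cascade model. This sketch omits the multilayer structure, the disinformation-specific state transitions, and the contagious/non-contagious distinction described above, and all parameters are illustrative.

```python
# Simplified cascade: remove a fraction of agents from the platform, seed one
# infected node, spread with a fixed transmission probability, record cascade size.
import numpy as np
import networkx as nx

def cascade_size(G, removed_frac, p_transmit, rng):
    nodes = list(G.nodes)
    keep = {int(v) for v in rng.choice(nodes, size=int((1 - removed_frac) * len(nodes)),
                                       replace=False)}
    H = G.subgraph(keep)
    if H.number_of_nodes() == 0:
        return 0
    seed = int(rng.choice(list(H.nodes)))
    infected, frontier = {seed}, {seed}
    while frontier:
        new = set()
        for v in frontier:
            for u in H.neighbors(v):
                if u not in infected and rng.random() < p_transmit:
                    new.add(u)
        infected |= new
        frontier = new
    return len(infected)

rng = np.random.default_rng(5)
G = nx.barabasi_albert_graph(1000, 3, seed=5)
sizes = {f: np.mean([cascade_size(G, f, 0.1, rng) for _ in range(20)])
         for f in (0.0, 0.2, 0.4, 0.6)}   # mean cascade size vs. removed fraction
```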
Functional data registration is a necessary processing step for many applications. The observed data can be inherently noisy, often due to measurement error or natural process uncertainty, which most functional alignment methods cannot handle. A pair of functions can also have multiple optimal alignment solutions, which is not addressed in the current literature. In this paper, a flexible Bayesian approach to functional alignment is presented which appropriately accounts for noise in the data without any pre-smoothing. Additionally, by running parallel MCMC chains, the method can account for multiple optimal alignments via the multi-modal posterior distribution of the warping functions. To sample the warping functions efficiently, the approach relies on a modification of standard Hamiltonian Monte Carlo that is well defined on the infinite-dimensional Hilbert space. This flexible Bayesian alignment method is applied to both simulated and real data sets to show its efficiency in handling noisy functions and successfully accounting for multiple optimal alignments in the posterior, thereby characterizing the uncertainty surrounding the warping functions.
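The target of such an approach can be sketched, in generic notation, as a posterior over warping functions built from the square-root velocity framework; this is a schematic, not the paper's exact model.

```latex
% q_1, q_2: square-root velocity functions of the two curves; gamma: warping
% function; sigma^2: observation-noise level; pi(gamma): prior on warpings.
% Multi-modality of this posterior corresponds to multiple optimal alignments.
\pi(\gamma \mid q_1, q_2) \;\propto\;
\exp\!\left\{ -\frac{1}{2\sigma^2}
\left\lVert q_1 - (q_2 \circ \gamma)\sqrt{\dot{\gamma}} \right\rVert_2^2 \right\}
\, \pi(\gamma).
```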
We propose a new family of depth measures called the elastic depths that can be used to greatly improve shape anomaly detection in functional data. Shape anomalies are functions that have considerably different geometric forms or features from the rest of the data. Identifying them is generally more difficult than identifying magnitude anomalies because shape anomalies are often not distinguishable from the bulk of the data with visualization methods. The proposed elastic depths use the recently developed elastic distances to directly measure the centrality of functions in the amplitude and phase spaces. Measuring shape outlyingness in these spaces provides a rigorous quantification of shape, which gives the elastic depths a strong theoretical and practical advantage over other methods in detecting shape anomalies. A simple boxplot and thresholding method is introduced to identify shape anomalies using the elastic depths. We assess the elastic depths' detection skill on simulated shape outlier scenarios and compare them against popular shape anomaly detectors. Finally, we use hurricane trajectories to demonstrate the elastic depth methodology on manifold-valued functional data.
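The boxplot-and-threshold step can be sketched as follows, assuming depth values have already been computed (the elastic depths themselves require elastic amplitude/phase distances, which are not reproduced here); the whisker constant and toy depth values are illustrative.

```python
# Flag functions with unusually *low* depth, i.e. below the lower boxplot fence
# of the depth distribution, as shape anomalies.
import numpy as np

def depth_outliers(depths, whisker=1.5):
    """Indices of observations whose depth falls below Q1 - whisker * IQR."""
    q1, q3 = np.percentile(depths, [25, 75])
    fence = q1 - whisker * (q3 - q1)
    return np.where(depths < fence)[0]

rng = np.random.default_rng(6)
depths = np.clip(rng.beta(5, 2, size=100), 0, 1)   # toy depth values in [0, 1]
depths[:3] = [0.02, 0.05, 0.08]                    # a few shape-outlying curves
print(depth_outliers(depths))
```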
Compression analytics have gained recent interest for application in malware classification and digital forensics. This interest is due to the fact that compression analytics rely on measured similarity between byte sequences in datasets without requiring prior feature extraction; in other words, these methods are featureless. Being featureless makes compression analytics particularly appealing for computer security applications, where good static features are either unknown or easy to circumvent by adversaries. However, previous classification methods based on compression analytics relied on algorithms that scaled with the size of each labeled class and the number of classes. In this work, we introduce an approach that, in addition to being featureless, can perform fast and accurate inference that is independent of the size of each labeled class. Our method is based on calculating a representative sample, the Fréchet mean, for each labeled class and using it at inference time. We introduce a greedy algorithm for calculating the Fréchet mean and evaluate its utility for classification across a variety of computer security applications, including authorship attribution of source code, file fragment type detection, and malware classification.
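A sketch of the featureless ingredients follows. The normalized compression distance (NCD) is computed with zlib, and a per-class representative is chosen as the member minimizing the sum of squared NCDs, i.e. a sample Fréchet mean restricted to the observed class members; the paper's greedy construction of the Fréchet mean may differ from this simplification.

```python
import zlib

def c(x: bytes) -> int:
    """Compressed length of a byte sequence."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte sequences."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)

def frechet_representative(samples):
    """Class member minimizing the sum of squared NCDs to the other members."""
    return min(samples, key=lambda s: sum(ncd(s, t) ** 2 for t in samples))

def classify(query: bytes, reps: dict) -> str:
    """Inference against one representative per class, independent of class size."""
    return min(reps, key=lambda label: ncd(query, reps[label]))

# Toy usage with two "classes" of byte sequences.
class_a = [b"GET /index.html HTTP/1.1" * 4, b"GET /home.html HTTP/1.1" * 4]
class_b = [b"\x00\x01\x02\x03" * 24, b"\x03\x02\x01\x00" * 24]
reps = {"a": frechet_representative(class_a), "b": frechet_representative(class_b)}
print(classify(b"GET /about.html HTTP/1.1" * 4, reps))   # expected: "a"
```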
Functional variables are often used as predictors in regression problems. A commonly used parametric approach, called scalar-on-function regression, uses the L2 inner product to map functional predictors into scalar responses. This method can perform poorly when predictor functions contain undesired phase variability, causing phases to have disproportionately large influence on the response variable. One past solution has been to perform phase–amplitude separation (as a pre-processing step) and then use only the amplitudes in the regression model. Here we propose a more integrated approach, termed elastic functional regression model (EFRM), where phase-separation is performed inside the regression model, rather than as a pre-processing step. This approach generalizes the notion of phase in functional data, and is based on the norm-preserving time warping of predictors. Due to its invariance properties, this representation provides robustness to predictor phase variability and results in improved predictions of the response variable over traditional models. We demonstrate this framework using a number of datasets involving gait signals, NMR data, and stock market prices.
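For reference, the standard scalar-on-function model referred to above can be written with the L2 inner product as below (generic notation); the elastic model keeps this structure but replaces each predictor with a norm-preserving time-warped representation.

```latex
% y_i: scalar response; f_i: functional predictor; beta: coefficient function.
y_i \;=\; \alpha + \langle f_i, \beta \rangle + \epsilon_i
     \;=\; \alpha + \int_0^1 f_i(t)\,\beta(t)\,dt + \epsilon_i .
```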
Measurements performed on a population of electronic devices reveal part-to-part variation due to manufacturing process variation. Corner models are a useful tool for designers to bound the effect of this variation on circuit performance. To accurately simulate circuit-level behavior, compact model parameters for devices within a circuit must be calibrated to experimental data. However, determining the bounding data for corner model calibration is difficult, primarily because available tolerance bound calculation methods consider variability along only one dimension and do not adequately consider the variability across both the current and voltage axes. This paper demonstrates a novel functional data analysis approach that generates tolerance bounds on these two types of variability separately; these bounds are then transformed for use in corner model calibration.
We develop a method for constructing tolerance bounds for functional data with random warping variability. In particular, we define a generative, probabilistic model for the amplitude and phase components of such observations, which parsimoniously characterizes variability in the baseline data. Based on the proposed model, we define two different types of tolerance bounds that are able to measure both types of variability and, as a result, identify when the data have gone beyond the bounds of amplitude and/or phase. The first functional tolerance bounds are computed via a bootstrap procedure on the geometric space of amplitude and phase functions. The second functional tolerance bounds utilize functional principal component analysis to construct a tolerance factor. This work is motivated by two main applications: process control and disease monitoring. The problem of statistical analysis and modeling of functional data in process control is important in determining when a production process has moved beyond a baseline. Similarly, in biomedical applications, doctors use long, approximately periodic signals (such as the electrocardiogram) to diagnose and monitor diseases. In this context, it is desirable to identify abnormalities in these signals. We additionally consider a simulated example to assess our approach and compare it to two existing methods.
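The bootstrap idea behind the first type of bound can be illustrated with a deliberately crude sketch: it works directly on raw curves and ignores both the amplitude/phase separation and the fPCA tolerance factor that the actual method uses; the coverage levels and data below are illustrative.

```python
# Rough pointwise bootstrap tolerance band on a baseline sample of curves.
import numpy as np

rng = np.random.default_rng(7)
t = np.linspace(0, 1, 100)
# Baseline sample: noisy, slightly time-shifted bumps (stand-in for e.g. ECG beats).
F = np.array([np.exp(-(t - 0.5 - rng.normal(0, 0.02)) ** 2 / 0.01)
              + rng.normal(0, 0.03, t.size) for _ in range(40)])

def bootstrap_tolerance_band(F, n_boot=500, content=0.95, conf=0.90):
    """For each bootstrap resample take the central `content` quantile envelope,
    then take an outer `conf` envelope of those limits across resamples."""
    a = (1 - content) / 2
    lows, highs = [], []
    for _ in range(n_boot):
        Fb = F[rng.integers(0, F.shape[0], F.shape[0])]
        lows.append(np.quantile(Fb, a, axis=0))
        highs.append(np.quantile(Fb, 1 - a, axis=0))
    lower = np.quantile(np.array(lows), 1 - conf, axis=0)   # conservative lower limit
    upper = np.quantile(np.array(highs), conf, axis=0)      # conservative upper limit
    return lower, upper

lower, upper = bootstrap_tolerance_band(F)   # flag new curves leaving [lower, upper]
```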
Functional data are fast becoming a preeminent source of information across a wide range of industries. A particularly challenging aspect of functional data is bounding uncertainty. In this unique case study, we present our attempts at creating bounding functions for selected applications at Sandia National Laboratories (SNL). The first attempt involved a simple extension of functional principal component analysis (fPCA) to incorporate covariates. Though this method was straightforward, the extension was plagued by poor coverage accuracy for the bounding curve. This led to a second attempt utilizing elastic methodology which yielded more accurate coverage at the cost of more complexity.