Transparent Risks: The Impact of the Specificity and Visual Encoding of Uncertainty on Decision Making
IEEE Computer Graphics and Applications
Although visualizations are a useful tool for helping people to understand information, they can also have unintended effects on human cognition. This is especially true for uncertain information, which is difficult for people to understand. Prior work has found that different methods of visualizing uncertain information can produce different patterns of decision making from users. However, uncertainty can also be represented via text or numerical information, and few studies have systematically compared these types of representations to visualizations of uncertainty. We present two experiments that compared visual representations of risk (icon arrays) to numerical representations (natural frequencies) in a wildfire evacuation task. Like prior studies, we found that different types of visual cues led to different patterns of decision making. In addition, our comparison of visual and numerical representations of risk found that people were more likely to evacuate when they saw visualizations than when they saw numerical representations. These experiments reinforce the idea that design choices are not neutral: seemingly minor differences in how information is represented can have important impacts on human risk perception and decision making.
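As a rough illustration of the two risk formats compared here, the sketch below renders an icon array as a text grid and states the same risk as a natural frequency. The grid size, glyphs, and wording are illustrative assumptions, not the experimental stimuli.

```python
# Minimal sketch contrasting the two risk representations compared in the
# experiments: an icon array (visual) versus a natural frequency (numerical).
# The 10x10 grid, glyphs, and phrasing are illustrative, not the actual stimuli.

def natural_frequency(risk_count: int, total: int = 100) -> str:
    """Express the risk as a natural frequency sentence."""
    return f"{risk_count} out of {total} homes like yours will be destroyed."

def icon_array(risk_count: int, total: int = 100, cols: int = 10) -> str:
    """Render a simple text icon array: filled squares mark at-risk outcomes."""
    icons = ["■"] * risk_count + ["□"] * (total - risk_count)
    rows = [" ".join(icons[i:i + cols]) for i in range(0, total, cols)]
    return "\n".join(rows)

if __name__ == "__main__":
    print(natural_frequency(23))
    print(icon_array(23))
```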
Software reverse engineering (RE) requires analysts to closely read and make decisions about code. Little is known about what makes an analyst successful, making it difficult to train new analysts or design tools to augment existing ones. The goal of this project was to quantify the eye movement behaviors supporting RE and code comprehension more generally. We applied eye-tracking methods from the language comprehension literature to understand where analysts direct their attention over time when completing tasks (e.g., function identification, bug detection). Across three studies, we manipulated aspects of code hypothesized to impact comprehension (e.g., variable name meaningfulness, code complexity) and presentation methods (e.g., line-by-line, free viewing, gaze-contingent moving window) to understand effects on accuracy and gaze patterns. Results showed clear benefits of meaningful variable names, and effects of expertise on global and line-specific viewing patterns. Findings could inspire empirically-supported tool or analytic adaptations that help to reduce analyst workload.
The goal of this project was to test how different representations of state uncertainty impact human decision making. Across a series of experiments, we sought to answer fundamental questions about human cognitive biases and how they are impacted by visual and numerical information. The results of these experiments identify problems and pitfalls to avoid when presenting algorithmic outputs that include state uncertainty to human decision makers. Our findings also point to important areas for future research that will enable system designers to minimize biases in human interpretation of the outputs of artificial intelligence, machine learning, and other advanced analytic systems.
This research explores novel methods for extracting relevant information from EEG data to characterize individual differences in cognitive processing. Our approach combines expertise in machine learning, statistics, and cognitive science, advancing the state of the art in all three domains. Specifically, by using cognitive science expertise to interpret results and inform algorithm development, we have developed a generalizable and interpretable machine learning method that can accurately predict individual differences in cognition. The output of the machine learning method revealed surprising features of the EEG data that, when interpreted by the cognitive science experts, provided novel insights into the underlying cognitive task. Additionally, the outputs of the statistical methods show promise as a principled approach to quickly find regions within the EEG data where individual differences lie, thereby supporting cognitive science analysis and informing machine learning models. This work lays methodological groundwork for applying the large body of cognitive science literature on individual differences to high consequence mission applications.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Creation of streaming video stimuli that allow for strict experimental control while providing ease of scene manipulation is difficult to achieve but desired by researchers seeking to approach ecological validity in contexts that involve processing streaming visual information. To that end, we propose leveraging video game modding tools as a method of creating research-quality stimuli. As a pilot effort, we used a video game sandbox tool (Garry's Mod) to create three streaming video scenarios designed to mimic video feeds that physical security personnel might observe. All scenarios required participants to identify the presence of a threat appearing during the video feed. The scenarios differed in level of complexity: one required only location monitoring, one required location and action monitoring, and one required location, action, and conjunction monitoring, in which an action was considered a threat only when performed by a certain character model. While there was no behavioral effect of scenario in terms of accuracy or response times, in all scenarios we found evidence of a P300 when comparing responses to threatening stimuli with responses to standard stimuli. Results therefore indicate that sufficient levels of experimental control may be achieved to allow for the precise timing required for ERP analysis. Thus, we demonstrate the feasibility of using existing modding tools to create video scenarios amenable to neuroimaging analysis.
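For readers unfamiliar with the analysis, the threat-versus-standard P300 comparison reduces to averaging epochs by condition and contrasting amplitude in a post-stimulus window. The sketch below uses simulated data; the sampling rate, the 300-500 ms window, and the omission of filtering and artifact rejection are simplifying assumptions.

```python
import numpy as np

# Minimal sketch of a threat-vs-standard ERP comparison. Epoch extraction,
# the 250 Hz sampling rate, and the P300 window are illustrative assumptions;
# real pipelines also filter the signal and reject artifacts.

FS = 250  # sampling rate in Hz (assumed)

def average_erp(epochs: np.ndarray) -> np.ndarray:
    """Average across trials; epochs has shape (n_trials, n_samples)."""
    return epochs.mean(axis=0)

def p300_amplitude(erp: np.ndarray, start_s: float = 0.3, end_s: float = 0.5) -> float:
    """Mean amplitude in the P300 window, assuming epochs start at stimulus onset."""
    return float(erp[int(start_s * FS):int(end_s * FS)].mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    standard = rng.normal(0, 1, size=(80, FS))        # 1-s epochs, noise only
    threat = rng.normal(0, 1, size=(20, FS))
    threat[:, int(0.3 * FS):int(0.5 * FS)] += 3.0     # simulated P300 deflection
    print("standard:", p300_amplitude(average_erp(standard)))
    print("threat:  ", p300_amplitude(average_erp(threat)))
```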
Proceedings of the Annual Hawaii International Conference on System Sciences
With machine learning (ML) technologies rapidly expanding to new applications and domains, users are collaborating with artificial intelligence-assisted diagnostic tools to an ever greater extent. But what impact does ML aid have on cognitive performance, especially when the ML output is not always accurate? Here, we examined the cognitive effects of the presence of simulated ML assistance (including both accurate and inaccurate output) on two tasks: a domain-specific nuclear safeguards task and a domain-general visual search task. Patterns of performance varied across the two tasks for both the presence of ML aid and the category of ML feedback (e.g., false alarm). These results indicate that differences such as domain can influence users' performance with ML aid, and suggest the need to test the effects of ML output (and associated errors) in the specific context of use, especially when the stimuli of interest are vague or ill-defined.
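The feedback categories referenced above follow standard signal detection terms. A minimal sketch, assuming each trial records whether a target was present and whether the simulated ML flagged it; the field names are hypothetical:

```python
# Bin each trial of simulated ML assistance into a feedback category
# (hit, miss, false alarm, correct rejection). Trial structure is assumed.

def ml_feedback_category(target_present: bool, ml_flagged: bool) -> str:
    if target_present and ml_flagged:
        return "hit"
    if target_present and not ml_flagged:
        return "miss"
    if not target_present and ml_flagged:
        return "false alarm"
    return "correct rejection"

trials = [(True, True), (True, False), (False, True), (False, False)]
for present, flagged in trials:
    print(present, flagged, "->", ml_feedback_category(present, flagged))
```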
Cognitive Research: Principles and Implications
Eye tracking is a useful tool for studying human cognition, both in the laboratory and in real-world applications. However, there are cases in which eye tracking is not possible, such as in high-security environments where recording devices cannot be introduced. After facing this challenge in our own work, we sought to test the effectiveness of using artificial foveation as an alternative to eye tracking for studying visual search performance. Two groups of participants completed the same list comparison task, which was a computer-based task designed to mimic an inventory verification process that is commonly performed by international nuclear safeguards inspectors. We manipulated the way in which the items on the inventory list were ordered and color coded. For the eye tracking group, an eye tracker was used to assess the order in which participants viewed the items and the number of fixations per trial in each list condition. For the artificial foveation group, the items were covered with a blurry mask except when participants moused over them. We tracked the order in which participants viewed the items by moving their mouse and the number of items viewed per trial in each list condition. We observed the same overall pattern of performance for the various list display conditions, regardless of the method. However, participants were much slower to complete the task when using artificial foveation and had more variability in their accuracy. Our results indicate that the artificial foveation method can reveal the same pattern of differences across conditions as eye tracking, but it can also impact participants’ task performance.
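A minimal sketch of the artificial foveation logic, under assumed item geometry: every list item stays blurred unless the mouse is over it, and reveal order is logged in place of fixation order. The actual task would run in a GUI or web front end; only the bookkeeping is shown here.

```python
# Mouse-contingent "artificial foveation": items are rendered blurred except
# the one under the cursor, and the order of reveals stands in for fixations.
# Item bounding boxes and the event loop are assumptions for illustration.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class FoveatedList:
    boxes: List[Tuple[int, int, int, int]]        # (x, y, width, height) per item
    viewing_order: List[int] = field(default_factory=list)

    def on_mouse_move(self, mx: int, my: int) -> Optional[int]:
        """Return the index of the item to unblur, logging each new reveal."""
        for i, (x, y, w, h) in enumerate(self.boxes):
            if x <= mx < x + w and y <= my < y + h:
                if not self.viewing_order or self.viewing_order[-1] != i:
                    self.viewing_order.append(i)   # analogous to a new fixation
                return i
        return None                                # everything stays blurred

fl = FoveatedList(boxes=[(0, i * 30, 200, 30) for i in range(5)])
for pos in [(10, 5), (10, 35), (10, 95)]:
    fl.on_mouse_move(*pos)
print(fl.viewing_order)  # [0, 1, 3]
```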
Reverse engineering (RE) analysts struggle to address critical questions about the safety of binary code accurately and promptly, and their supporting program analysis tools are simply wrong sometimes. The analysis tools have to approximate in order to provide any information at all, but this means that they introduce uncertainty into their results. And those uncertainties chain from analysis to analysis. We hypothesize that exposing sources, impacts, and control of uncertainty to human binary analysts will allow the analysts to approach their hardest problems with high-powered analytic techniques that they know when to trust. Combining expertise in binary analysis algorithms, human cognition, uncertainty quantification, verification and validation, and visualization, we pursue research that should benefit binary software analysis efforts across the board. We find a strong analogy between RE and exploratory data analysis (EDA); we begin to characterize sources and types of uncertainty found in practice in RE (both in the process and in supporting analyses); we explore a domain-specific focus on uncertainty in pointer analysis, showing that more precise models do help analysts answer small information flow questions faster and more accurately; and we test a general population with domain-general sudoku problems, showing that adding "knobs" to an analysis does not significantly slow down performance. This document describes our explorations in uncertainty in binary analysis.
In this project, our goal was to develop methods that would allow us to make accurate predictions about individual differences in human cognition. Understanding such differences is important for maximizing human and human-system performance. There is a large body of research on individual differences in the academic literature. Unfortunately, it is often difficult to connect this literature to applied problems, where we must predict how specific people will perform or process information. In an effort to bridge this gap, we set out to answer the question: can we train a model to make predictions about which people understand which languages? We chose language processing as our domain of interest because of the well-characterized differences in neural processing that occur when people are presented with linguistic stimuli that they do or do not understand. Although our original plan to conduct several electroencephalography (EEG) studies was disrupted by the COVID-19 pandemic, we were able to collect data from one EEG study and a series of behavioral experiments in which data were collected online. The results of this project indicate that machine learning tools can make reasonably accurate predictions about an individual's proficiency in different languages, using EEG data or behavioral data alone.
Due to their recent increases in performance, machine learning and deep learning models are being increasingly adopted across many domains for visual processing tasks. One such domain is international nuclear safeguards, which seeks to verify the peaceful use of commercial nuclear energy across the globe. Despite recent impressive performance results from machine learning and deep learning algorithms, there is always at least some small level of error. Given the significant consequences of international nuclear safeguards conclusions, we sought to characterize how incorrect responses from a machine or deep learning-assisted visual search task would cognitively impact users. We found that not only do some types of model errors have larger negative impacts on human performance than others, but the scale of those impacts changes depending on the accuracy of the model with which participants are presented, and the impacts persist in scenarios of evenly distributed errors and single-error presentations. Further, we found that experiments conducted using a common visual search dataset from the psychology community have similar implications to a safeguards-relevant dataset of images containing hyperboloid cooling towers when the cooling tower images are presented to expert participants. While novice performance was considerably different (and worse) on the cooling tower task, we saw increased novice reliance on the model for the most challenging cooling tower images compared to experts. These findings are relevant not just to the cognitive science community, but also to developers of machine and deep learning systems that will be implemented in multiple domains. For safeguards, this research provides key insights into how machine and deep learning projects should be implemented, considering the domain's special requirement that information not be missed.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
As the ability to collect and store data grows, so does the need to efficiently analyze that data. As human-machine teams that use machine learning (ML) algorithms to inform human decision-making grow in popularity, it becomes increasingly critical to understand the optimal methods of implementing algorithm-assisted search. To better understand how algorithm confidence values associated with object identification can influence participant accuracy and response times during a visual search task, we compared models that provided appropriate confidence, random confidence, and no confidence, as well as a model biased toward overconfidence and a model biased toward underconfidence. Results indicate that randomized confidence is likely harmful to performance, while non-random confidence values are likely better than no confidence value for maintaining accuracy over time. Providing participants with appropriate confidence values did not seem to benefit performance any more than providing under- or overconfident models.
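The five confidence conditions can be made concrete with a small sketch. The mapping from detection correctness to a confidence value, and the 0.25 bias offsets, are illustrative assumptions rather than the parameters used in the study.

```python
import random
from typing import Optional

# Simulate the confidence value shown to a participant under each of the five
# conditions. "Appropriate" confidence tracks whether the model's detection is
# correct; the biased models shift values up or down. All numbers are assumed.

def confidence(condition: str, detection_correct: bool,
               rng: random.Random) -> Optional[float]:
    base = rng.uniform(0.7, 0.95) if detection_correct else rng.uniform(0.3, 0.6)
    if condition == "none":
        return None                        # no confidence value displayed
    if condition == "random":
        return rng.uniform(0.0, 1.0)       # uninformative value
    if condition == "appropriate":
        return base
    if condition == "overconfident":
        return min(1.0, base + 0.25)
    if condition == "underconfident":
        return max(0.0, base - 0.25)
    raise ValueError(condition)

rng = random.Random(1)
for cond in ["none", "random", "appropriate", "overconfident", "underconfident"]:
    print(cond, confidence(cond, detection_correct=True, rng=rng))
```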
Proceedings of the 43rd Annual Meeting of the Cognitive Science Society: Comparative Cognition: Animal Minds, CogSci 2021
Studies of bilingual language processing typically assign participants to groups based on their language proficiency and average across participants in order to compare the two groups. This approach loses much of the nuance and individual differences that could be important for furthering theories of bilingual language comprehension. In this study, we present a novel use of machine learning (ML) to develop a predictive model of language proficiency based on behavioral data collected in a priming task. The model achieved 75% accuracy in predicting which participants were proficient in both Spanish and English. Our results indicate that ML can be a useful tool for characterizing and studying individual differences.
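As a rough illustration of this modeling approach, the sketch below trains a classifier on simulated priming-effect features. The feature definitions, the logistic regression choice, and the simulated data are assumptions for illustration, not the published pipeline or dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Predict bilingual proficiency from behavioral priming features.
# Hypothetical features per participant: [English priming effect (ms),
# Spanish priming effect (ms)]. Data are simulated for illustration.

rng = np.random.default_rng(42)
n = 60
X = np.vstack([
    rng.normal([40, 35], 15, size=(n // 2, 2)),   # simulated Spanish-English bilinguals
    rng.normal([40, 5], 15, size=(n // 2, 2)),    # simulated English-dominant speakers
])
y = np.array([1] * (n // 2) + [0] * (n // 2))     # 1 = proficient in both languages

clf = LogisticRegression()
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```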
Translational Issues in Psychological Science
The testing effect refers to the benefits to retention that result from structuring learning activities in the form of a test. As educators consider implementing test-enhanced learning paradigms in real classroom environments, we think it is critical to consider how an array of factors affecting test-enhanced learning in laboratory studies bear on test-enhanced learning in real-world classroom environments. This review discusses the degree to which test feedback, test format (of formative tests), number of tests, level of the test questions, timing of tests (relative to initial learning), and retention duration have import for testing effects in ecologically valid contexts (e.g., classroom studies). Attention is also devoted to characteristics of much laboratory testing-effect research that may limit translation to classroom environments, such as the complexity of the material being learned, the value of the testing effect relative to other generative learning activities in classrooms, an educational orientation that favors criterial tests focused on transfer of learning, and online instructional modalities. We consider how student-centric variables present in the classroom (e.g., cognitive abilities, motivation) may have bearing on the effects of testing-effect techniques implemented in the classroom. We conclude that the testing effect is a robust phenomenon that benefits a wide variety of learners in a broad array of learning domains. Still, studies are needed to compare the benefit of testing to other learning strategies, to further characterize how individual differences relate to testing benefits, and to examine whether testing benefits learners at advanced levels.
Personnel Assessment and Decisions
The Sandia Matrices are a free alternative to the Raven's Progressive Matrices (RPMs). This study offers a psychometric review of Sandia Matrices items focused on two of the most commonly investigated issues regarding the RPMs: (a) dimensionality and (b) sex differences. Model-data fit of three alternative factor structures are compared using confirmatory multidimensional item response theory (IRT) analyses, and measurement equivalence analyses are conducted to evaluate potential sex bias. Although results are somewhat inconclusive regarding factor structure, results do not show evidence of bias or mean differences by sex. Finally, although the Sandia Matrices software can generate a virtually unlimited number of items, editing and validating items may be infeasible for many researchers. Further, to aid implementation of the Sandia Matrices, we provide scoring materials for two brief static tests and a computer adaptive test. Implications and suggestions for future research using the Sandia Matrices are discussed.
The analysis of seismic data for evidence of possible nuclear explosion testing is a critical global security mission that relies heavily on human expertise to identify and mark seismic signals embedded in background noise. To assist analysts in making these determinations, we adapted two compression distance metrics for use with seismic data. First, we demonstrated that the Normalized Compression Distance (NCD) metric can be adapted for use with waveform data and can identify the arrival times of seismic signals. Then we tested an approximation for the NCD called Sliding Information Distance (SLID), which can be computed much faster than NCD. We assessed the accuracy of the SLID output by comparing it to both the Akaike Information Criterion (AIC) and the judgments of expert seismic analysts. Our results indicate that SLID effectively identifies arrival times and provides analysts with useful information that can aid their analysis process.
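The Normalized Compression Distance has a standard definition that this work builds on: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below applies it to a simulated waveform; the byte quantization, zlib compressor, and windowed scan for an arrival are illustrative assumptions, and SLID, the faster approximation, is not reproduced here.

```python
import zlib
import numpy as np

# Normalized Compression Distance (NCD) over a waveform: a jump in NCD
# between a reference noise window and later windows marks a change in
# signal character, i.e., a candidate arrival. All parameters are assumed.

def c(data: bytes) -> int:
    """Compressed size, the stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, level=9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)

def quantize(w: np.ndarray) -> bytes:
    """Map waveform samples onto bytes so they can be compressed."""
    scaled = np.interp(w, (w.min(), w.max()), (0, 255))
    return scaled.astype(np.uint8).tobytes()

rng = np.random.default_rng(0)
noise = rng.normal(0, 1, 2000)
signal = np.concatenate([noise,
                         np.sin(np.linspace(0, 200, 2000)) + rng.normal(0, 1, 2000)])
ref = quantize(signal[:500])                     # reference: pre-arrival noise
for start in range(0, len(signal) - 500, 500):
    window = quantize(signal[start:start + 500])
    print(start, round(ncd(ref, window), 3))     # NCD rises after the arrival
```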