Vulnerability analysts protecting software lack adequate tools for understanding data flow in binaries. We present a case study in which we used human factors methods to develop a taxonomy for understanding data flow and the visual representations needed to support decision making for binary vulnerability analysis. Using an iterative process, we refined and evaluated the taxonomy by generating three different data flow visualizations for small binaries, trained an analyst to use these visualizations, and tested the utility of the visualizations for answering data flow questions. Throughout the process and with minimal training, analysts were able to use the visualizations to understand data flow related to security assessment. Our results indicate that the data flow taxonomy is promising as a mechanism for improving analyst understanding of data flow in binaries and for supporting efficient decision making during analysis.
Recently, an approach for determining the value of a visualization was proposed, one moving beyond simple measurements of task accuracy and speed. The value equation contains components for the time savings a visualization provides, the insights and insightful questions it spurs, the overall essence of the data it conveys, and the confidence about the data and its domain it inspires. This articulation of value is purely descriptive, however, providing no actionable method of assessing a visualization's value. In this work, we create a heuristic-based evaluation methodology to accompany the value equation for assessing interactive visualizations. We refer to the methodology colloquially as ICE-T, based on an anagram of the four value components. Our approach breaks the four components down into guidelines, each of which is made up of a small set of low-level heuristics. Evaluators who have knowledge of visualization design principles then assess the visualization with respect to the heuristics. We conducted an initial trial of the methodology on three interactive visualizations of the same data set, each evaluated by 15 visualization experts. We found that the methodology showed promise, obtaining consistent ratings across the three visualizations and mirroring judgments of the utility of the visualizations by instructors of the course in which they were developed.
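As a rough illustration of the bottom-up aggregation this methodology implies, the hypothetical sketch below averages heuristic ratings into guideline scores, guideline scores into component scores, and component scores into one evaluator's overall value rating. The guideline and heuristic names are invented placeholders, not the paper's actual instrument.

```python
# A minimal sketch of aggregating ICE-T heuristic ratings (hypothetical
# guideline/heuristic names; the real breakdown comes from the paper).
from statistics import mean

# Each value component maps to guidelines; each guideline holds a small
# set of low-level heuristic ratings (e.g., on a 1-7 scale) from one evaluator.
ratings = {
    "Insight":    {"spur_questions": [6, 5], "new_hypotheses": [7]},
    "Confidence": {"trust_in_data": [5, 6]},
    "Essence":    {"big_picture": [6], "gestalt": [5, 4]},
    "Time":       {"search_speed": [7, 6], "reduce_effort": [6]},
}

def component_scores(evaluator_ratings):
    """Average heuristics within each guideline, then guidelines within
    each component, mirroring the bottom-up structure described above."""
    return {
        component: mean(mean(hs) for hs in guidelines.values())
        for component, guidelines in evaluator_ratings.items()
    }

scores = component_scores(ratings)
overall_value = mean(scores.values())  # one evaluator's overall rating
print(scores, round(overall_value, 2))
```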
National security missions require understanding third-party software binaries, a key element of which is reasoning about how data flows through a program. However, vulnerability analysts protecting software lack adequate tools for understanding data flow in binaries. To reduce the human time burden for these analysts, we used human factors methods in a rolling discovery process to derive user-centric visual representation requirements. We encountered three main challenges: analysis projects span weeks, analysis goals significantly affect approaches and required knowledge, and analyst tools, techniques, conventions, and prioritization are based on personal preference. To address these challenges, we initially focused our human factors methods on an attack surface characterization task. We generalized our results using a two-stage modified sorting task, creating requirements for a data flow visualization. We implemented these requirements partially in manual static visualizations, which we informally evaluated, and partially in automatically generated interactive visualizations, which have yet to be integrated into workflows for evaluation. Our observations and results indicate that 1) this data flow visualization has the potential to enable novel code navigation, information presentation, and information sharing, and 2) it is an excellent time to pursue research applying human factors methods to binary analysis workflows.
Evaluating the effectiveness of data visualizations is a challenging undertaking and often relies on one-off studies that test a visualization in the context of one specific task. Researchers across the fields of data science, visualization, and human-computer interaction are calling for foundational tools and principles that could be applied to assessing the effectiveness of data visualizations in a more rapid and generalizable manner. One possibility for such a tool is a model of visual saliency for data visualizations. Visual saliency models are typically based on the properties of the human visual cortex and predict which areas of a scene have visual features (e.g. color, luminance, edges) that are likely to draw a viewer's attention. While these models can accurately predict where viewers will look in a natural scene, they typically do not perform well for abstract data visualizations. In this paper, we discuss the reasons for the poor performance of existing saliency models when applied to data visualizations. We introduce the Data Visualization Saliency (DVS) model, a saliency model tailored to address some of these weaknesses, and we test the performance of the DVS model and existing saliency models by comparing the saliency maps produced by the models to eye tracking data obtained from human viewers. Finally, we describe how modified saliency models could be used as general tools for assessing the effectiveness of visualizations, including the strengths and weaknesses of this approach.
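For readers unfamiliar with how such model-versus-eye-tracking comparisons are typically scored, the sketch below shows one common approach: the correlation coefficient (CC) metric between a model saliency map and a smoothed fixation map. The data and array shapes are illustrative only, not drawn from the study.

```python
# Correlation coefficient (CC) between a model saliency map and a
# smoothed fixation map; shapes and data are illustrative only.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, shape, sigma=25):
    """Turn (row, col) fixation points into a smoothed density map."""
    fmap = np.zeros(shape)
    for r, c in fixations:
        fmap[r, c] += 1
    return gaussian_filter(fmap, sigma)

def correlation_coefficient(saliency, fmap):
    """Pearson correlation between standardized maps (higher is better)."""
    s = (saliency - saliency.mean()) / saliency.std()
    f = (fmap - fmap.mean()) / fmap.std()
    return float((s * f).mean())

rng = np.random.default_rng(0)
saliency = gaussian_filter(rng.random((480, 640)), 10)  # mock model output
fixations = [(240, 320), (250, 300), (100, 500)]        # mock eye tracking
print(correlation_coefficient(saliency, fixation_map(fixations, (480, 640))))
```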
This project was inspired by two needs. The first is a need for tools to help scientists and engineers to design effective data visualizations for communicating information, whether to the user of a system, an analyst who must make decisions based on complex data, or in the context of a technical report or publication. Most scientists and engineers are not trained in visualization design, and they could benefit from simple metrics to assess how well their visualization's design conveys the intended message. In other words, will the most important information draw the viewer's attention? The second is the need for cognition-based metrics for evaluating new types of visualizations created by researchers in the information visualization and visual analytics communities. Evaluating visualizations is difficult even for experts. However, all visualization methods and techniques are intended to exploit the properties of the human visual system to convey information efficiently to a viewer. Thus, developing evaluation methods that are rooted in the scientific knowledge of the human visual system could be a useful approach. In this project, we conducted fundamental research on how humans make sense of abstract data visualizations, and how this process is influenced by their goals and prior experience. We then used that research to develop a new model, the Data Visualization Saliency Model, that can make accurate predictions about which features in an abstract visualization will draw a viewer's attention. The model is an evaluation tool that can address both of the needs described above, supporting both visualization research and Sandia mission needs.
Data visualizations are used to communicate information to people in a wide variety of contexts, but few tools are available to help visualization designers evaluate the effectiveness of their designs. Visual saliency maps that predict which regions of an image are likely to draw the viewer’s attention could be a useful evaluation tool, but existing models of visual saliency often make poor predictions for abstract data visualizations. These models do not take into account the importance of features like text in visualizations, which may lead to inaccurate saliency maps. In this paper we use data from two eye tracking experiments to investigate attention to text in data visualizations. The data sets were collected under two different task conditions: a memory task and a free viewing task. Across both tasks, the text elements in the visualizations consistently drew attention, especially during early stages of viewing. These findings highlight the need to incorporate additional features into saliency models that will be applied to visualizations.
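One way such a text feature could be incorporated, sketched below under the assumption that text regions are supplied by an external detector, is to blend a smoothed text-location channel into a base saliency map. The weighting scheme here is illustrative, not an approach validated in the paper.

```python
# Blending a smoothed text-location channel into a base saliency map; the
# text mask is assumed to come from an external text detector, and the
# weighting is illustrative rather than a validated model.
import numpy as np
from scipy.ndimage import gaussian_filter

def add_text_channel(base_saliency, text_mask, weight=0.5, sigma=10):
    """Weighted blend of a base map with a smoothed, normalized text mask."""
    text_channel = gaussian_filter(text_mask.astype(float), sigma)
    if text_channel.max() > 0:
        text_channel /= text_channel.max()
    combined = (1 - weight) * base_saliency + weight * text_channel
    return combined / combined.max()

rng = np.random.default_rng(1)
base = gaussian_filter(rng.random((300, 400)), 8)
base /= base.max()
mask = np.zeros((300, 400), dtype=bool)
mask[20:40, 150:260] = True  # e.g., a chart-title region flagged as text
print(add_text_channel(base, mask).shape)
```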
There is a great deal of debate concerning the benefits of working memory (WM) training and whether that training can transfer to other tasks. Although a consistent finding is that WM training programs elicit a short-term near-transfer effect (i.e., improvement in WM skills), results are inconsistent when considering persistence of such improvement and far transfer effects. In this study, we compared three groups of participants: a group that received WM training, a group that received training on how to use a mental imagery memory strategy, and a control group that received no training. Although the WM training group improved on the trained task, their posttraining performance on nontrained WM tasks did not differ from that of the other two groups. In addition, although the imagery training group’s performance on a recognition memory task increased after training, the WM training group’s performance on the task decreased after training. Participants’ descriptions of the strategies they used to remember the studied items indicated that WM training may lead people to adopt memory strategies that are less effective for other types of memory tasks. These results indicate that WM training may have unintended consequences for other types of memory performance.
In this paper, we argue that information theoretic measures may provide a robust, broadly applicable, repeatable metric to assess how a system enables people to reduce high-dimensional data into topically relevant subsets of information. Explosive growth in electronic data necessitates the development of systems that balance automation with human cognitive engagement to facilitate pattern discovery, analysis and characterization, variously described as "cognitive augmentation" or "insight generation." However, operationalizing the concept of insight in any measurable way remains a difficult challenge for visualization researchers. The "golden ticket" of insight evaluation would be a precise, generalizable, repeatable, and ecologically valid metric that indicates the relative utility of a system in heightening cognitive performance or facilitating insights. Unfortunately, the golden ticket does not yet exist. In its place, we are exploring information theoretic measures derived from Shannon's ideas about information and entropy as a starting point for precise, repeatable, and generalizable approaches for evaluating analytic tools. We are specifically concerned with needle-in-haystack workflows that require interactive search, classification, and reduction of very large heterogeneous datasets into manageable, task-relevant subsets of information. We assert that systems aimed at facilitating pattern discovery, characterization and analysis - i.e., "insight" - must afford an efficient means of sorting the needles from the chaff; and simple compressibility measures provide a way of tracking changes in information content as people shape meaning from data.
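To make the compressibility idea concrete, the sketch below computes byte-level Shannon entropy and a zlib compression ratio for a synthetic "haystack" and its reduced, task-relevant subset; the data are illustrative only.

```python
# Byte-level Shannon entropy and zlib compression ratio as crude,
# repeatable measures of information content; data are illustrative.
import zlib
from collections import Counter
from math import log2

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def compression_ratio(data: bytes) -> float:
    """Compressed size over original size (lower = more redundant)."""
    return len(zlib.compress(data)) / len(data)

haystack = (b"routine log line 42\n" * 500) + b"ANOMALY: needle found\n"
reduced = b"ANOMALY: needle found\n"  # the task-relevant subset
for label, blob in [("haystack", haystack), ("reduced set", reduced)]:
    print(label, round(shannon_entropy(blob), 3),
          round(compression_ratio(blob), 3))
```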
The transformation of the distribution grid from a centralized to decentralized architecture, with bi-directional power and data flows, is made possible by a surge in network intelligence and grid automation. While changes are largely beneficial, the interface between grid operator and automated technologies is not well understood, nor are the benefits and risks of automation. Quantifying and understanding the latter is an important facet of grid resilience that needs to be fully investigated. The work described in this document represents the first empirical study aimed at identifying and mitigating the vulnerabilities posed by automation for a grid that for the foreseeable future will remain a human-in-the-loop critical infrastructure. Our scenario-based methodology enabled us to conduct a series of experimental studies to identify causal relationships between grid-operator performance and automated technologies and to collect measurements of human performance as a function of automation. Our findings, though preliminary, suggest there are predictive patterns in the interplay between human operators and automation, patterns that can inform the rollout of distribution automation and the hiring and training of operators, and contribute in multiple and significant ways to the field of grid resilience.
This summary of PANTHER Human Analytics work describes three of the team's major work activities: research with teams to elicit and document work practices; experimental studies of visual search performance and visual attention; and the application of spatio-temporal algorithms to the analysis of eye tracking data. Our intent is to provide a basic introduction to the work area and a selected set of representative HA team publications as a starting point for readers interested in our team's work.
A critical challenge in data science is conveying the meaning of data to human decision makers. While working with visualizations, decision makers are engaged in a visual search for information to support their reasoning process. As sensors proliferate and high performance computing becomes increasingly accessible, the volume of data decision makers must contend with is growing continuously and driving the need for more efficient and effective data visualizations. Consequently, researchers across the fields of data science, visualization, and human-computer interaction are calling for foundational tools and principles to assess the effectiveness of data visualizations. In this paper, we compare the performance of three different saliency models across a common set of data visualizations. This comparison establishes a performance baseline for assessment of new data visualization saliency models.
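A typical way to establish such a baseline is to score each model's saliency map against the same fixation data with an ROC-style metric. The sketch below implements an AUC-Judd-like score on synthetic maps and fixations, which stand in for the models and eye tracking data in the study.

```python
# An AUC-Judd-like score on synthetic data: fixated pixels are positives,
# all pixels form the negative pool, and the threshold sweeps over each
# fixated pixel's saliency value.
import numpy as np

def auc_judd(saliency, fixation_points):
    sal = saliency.ravel()
    fix_vals = np.array([saliency[r, c] for r, c in fixation_points])
    thresholds = np.sort(fix_vals)[::-1]
    tp = [0.0] + [(fix_vals >= t).mean() for t in thresholds] + [1.0]
    fp = [0.0] + [(sal >= t).mean() for t in thresholds] + [1.0]
    # Trapezoidal integration of the ROC curve.
    return sum((fp[i + 1] - fp[i]) * (tp[i + 1] + tp[i]) / 2
               for i in range(len(fp) - 1))

rng = np.random.default_rng(2)
fixations = [(50, 60), (52, 61), (200, 300)]  # illustrative (row, col) points
models = {"model_A": rng.random((240, 320)), "model_B": rng.random((240, 320))}
for name, sal_map in models.items():
    print(name, round(auc_judd(sal_map, fixations), 3))
```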
Inferring the cognitive state of an individual in real time during task performance allows for implementation of corrective measures prior to the occurrence of an error. Current technology allows for real-time cognitive state assessment based on objective physiological data through techniques such as neuroimaging and eye tracking. Although early results indicate that constructing classifiers that distinguish between cognitive states in real time is possible in some settings, implementing these classifiers in real-world settings poses a number of challenges. Cognitive states of interest must be sufficiently distinct to allow for continuous discrimination in the operational environment using technology that is both currently available and practical to implement.
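As a toy illustration of this classification setup, the sketch below trains and cross-validates a classifier on two synthetic physiological features (stand-ins for, e.g., pupil diameter and fixation duration); real operational data would be far noisier and would require online rather than offline evaluation.

```python
# A toy sketch of cognitive-state classification; the two features are
# synthetic stand-ins for physiological measures such as pupil diameter
# (mm) and fixation duration (ms).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 200
# State 0: baseline; state 1: high workload (larger pupils, longer fixations).
X0 = rng.normal([3.0, 250.0], [0.4, 40.0], size=(n, 2))
X1 = rng.normal([3.6, 310.0], [0.4, 40.0], size=(n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

clf = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(clf, X, y, cv=5)  # offline accuracy estimate
print("cross-validated accuracy:", round(float(scores.mean()), 3))
```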
‘Big data’ is a phrase that has gained much traction recently. It has been defined as ‘a broad term for data sets so large or complex that traditional data processing applications are inadequate and there are challenges with analysis, searching and visualization’ [1]. Many domains struggle with providing experts accurate visualizations of massive data sets so that the experts can understand and make decisions about the data, e.g., [2, 3, 4, 5]. Abductive reasoning is the process of forming a conclusion that best explains observed facts, and this type of reasoning plays an important role in process and product engineering. Throughout a production lifecycle, engineers will test subsystems for critical functions and use the test results to diagnose and improve production processes. This paper describes a value-driven evaluation study [7] of expert analysts' interactions with big data in a complex visual abductive reasoning task. Participants were asked to perform different tasks using a new tool, while eye tracking data of their interactions with the tool was collected. The participants were also asked to give their feedback and assessments regarding the usability of the tool. The results showed that the interactive nature of the new tool allowed the participants to gain new insights into their data sets, and all participants indicated that they would begin using the tool in its current state.
In this study, eye tracking metrics and visual saliency maps were used to assess analysts' interactions with synthetic aperture radar (SAR) imagery. Participants with varying levels of experience with SAR imagery completed a target detection task while their eye movements and behavioral responses were recorded. The resulting gaze maps were compared with maps of bottom-up visual saliency and with maps of automatically detected image features. The results showed striking differences between professional SAR analysts and novices in terms of how their visual search patterns related to the visual saliency of features in the imagery. They also revealed patterns that reflect the utility of various features in the images for the professional analysts. These findings have implications for system design and for the design and use of automatic feature classification algorithms.
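The sketch below illustrates one way gaze maps of the kind described here can be built and compared between groups, using histogram intersection of normalized fixation-density maps; the fixation coordinates are invented for illustration, and the metric is an assumption rather than necessarily the one used in the study.

```python
# Histogram intersection between two normalized gaze maps; fixation
# coordinates are invented for illustration.
import numpy as np
from scipy.ndimage import gaussian_filter

def gaze_map(fixations, shape, sigma=20):
    """Smoothed fixation-density map, normalized to sum to 1."""
    m = np.zeros(shape)
    for r, c in fixations:
        m[r, c] += 1
    m = gaussian_filter(m, sigma)
    return m / m.sum()

def overlap(p, q):
    """Histogram intersection: 1.0 means identical attention distributions."""
    return float(np.minimum(p, q).sum())

shape = (200, 200)
experts = gaze_map([(60, 60), (62, 58), (61, 61)], shape)    # tight cluster
novices = gaze_map([(60, 60), (150, 40), (30, 170)], shape)  # dispersed
print("expert/novice overlap:", round(overlap(experts, novices), 3))
```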
Researchers at Sandia National Laboratories are integrating qualitative and quantitative methods from anthropology, human factors and cognitive psychology in the study of military and civilian intelligence analyst workflows in the United States’ national security community. Researchers who study human work processes often use qualitative theory and methods, including grounded theory, cognitive work analysis, and ethnography, to generate rich descriptive models of human behavior in context. In contrast, experimental psychologists typically do not receive training in qualitative induction, nor are they likely to practice ethnographic methods in their work, since experimental psychology tends to emphasize generalizability and quantitative hypothesis testing over qualitative description. However, qualitative frameworks and methods from anthropology, sociology, and human factors can play an important role in enhancing the ecological validity of experimental research designs.
Numerous domains, ranging from medical diagnostics to intelligence analysis, involve visual search tasks in which people must find and identify specific items within large sets of imagery. These tasks rely heavily on human judgment, making fully automated systems infeasible in many cases. Researchers have investigated methods for combining human judgment with computational processing to increase the speed at which humans can triage large image sets. One such method is rapid serial visual presentation (RSVP), in which images are presented in rapid succession to a human viewer. While viewing the images and looking for targets of interest, the participant’s brain activity is recorded using electroencephalography (EEG). The EEG signals can be time-locked to the presentation of each image, producing event-related potentials (ERPs) that provide information about the brain’s response to those stimuli. The participants’ judgments about whether or not each set of images contained a target and the ERPs elicited by target and non-target images are used to identify subsets of images that merit close expert scrutiny [1]. Although the RSVP/EEG paradigm holds promise for helping professional visual searchers to triage imagery rapidly, it may be limited by the nature of the target items. Targets that do not vary a great deal in appearance are likely to elicit useable ERPs, but more variable targets may not. In the present study, we sought to extend the RSVP/EEG paradigm to the domain of aviation security screening, and in doing so to explore the limitations of the technique for different types of targets. Professional Transportation Security Officers (TSOs) viewed bag X-rays that were presented using an RSVP paradigm. The TSOs viewed bursts of images containing 50 segments of bag X-rays that were presented for 100 ms each. Following each burst of images, the TSOs indicated whether or not they thought there was a threat item in any of the images in that set. EEG was recorded during each burst of images and ERPs were calculated by time-locking the EEG signal to the presentation of images containing threats and matched images that were identical except for the presence of the threat item. Half of the threat items had a prototypical appearance and half did not. We found that the bag images containing threat items with a prototypical appearance reliably elicited a P300 ERP component, while those without a prototypical appearance did not. These findings have implications for the application of the RSVP/EEG technique to real-world visual search domains.
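For readers unfamiliar with the ERP computation described above, the sketch below epochs a synthetic single-channel EEG signal around image-onset events, baseline-corrects each epoch, and averages target and non-target epochs; the sampling rate, onset times, and signal are all illustrative.

```python
# Time-locking a continuous EEG channel to image-onset events and
# averaging epochs for target vs. non-target images; data are synthetic.
import numpy as np

fs = 250                                              # sampling rate (Hz)
eeg = np.random.default_rng(4).normal(size=fs * 60)   # 60 s of one channel
onsets_target = [1000, 3500, 6000]     # sample indices of threat images
onsets_nontarget = [2000, 4500, 7000]  # matched threat-free images

def erp(signal, onsets, pre=0.2, post=0.8, fs=fs):
    """Average epochs from -pre to +post seconds around each onset."""
    a, b = int(pre * fs), int(post * fs)
    epochs = np.stack([signal[o - a:o + b] for o in onsets])
    epochs -= epochs[:, :a].mean(axis=1, keepdims=True)  # baseline-correct
    return epochs.mean(axis=0)

target_erp = erp(eeg, onsets_target)
nontarget_erp = erp(eeg, onsets_nontarget)
# With real data, a P300 would appear roughly 300 ms post-onset for targets.
print(target_erp.shape, nontarget_erp.shape)
```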
The potential for bias to affect the results of knowledge elicitation studies is well recognized. Researchers and knowledge engineers attempt to control for bias through careful selection of elicitation and analysis methods. Recently, the development of a wide range of physiological sensors, coupled with fast, portable and inexpensive computing platforms, has added an additional dimension of objective measurement that can reduce bias effects. In the case of an abductive reasoning task, bias can be introduced through design of the stimuli, cues from researchers, or omissions by the experts. We describe a knowledge elicitation methodology robust to various sources of bias, incorporating objective and cross-referenced measurements. The methodology was applied in a study of engineers who use multivariate time series data to diagnose the performance of devices throughout the production lifecycle. For visual reasoning tasks, eye tracking is particularly effective at controlling for biases of omission by providing a record of the subject's attention allocation.
Visual search data describe people’s performance on the common perceptual problem of identifying target objects in a complex scene. Technological advances in areas such as eye tracking now provide researchers with a wealth of data not previously available. The goal of this work is to support researchers in analyzing this complex and multimodal data and in developing new insights into visual search techniques. We discuss several methods drawn from the statistics and machine learning literature for integrating visual search data derived from multiple sources and performing exploratory data analysis. We ground our discussion in a specific task performed by officers at the Transportation Security Administration and consider the applicability, likely issues, and possible adaptations of several candidate analysis methods.
The impact of automation on human performance has been studied by human factors researchers for over 35 years. One unresolved facet of this research is measurement of the level of automation across and within engineered systems. Repeatable methods of observing, measuring and documenting the level of automation are critical to the creation and validation of generalized theories of automation's impact on the reliability and resilience of human-in-the-loop systems. Numerous qualitative scales for measuring automation have been proposed. However, these methods require subjective assessments based on the researcher's knowledge and experience, or through expert knowledge elicitation involving highly experienced individuals from each work domain. More recently, quantitative scales have been proposed, but have yet to be widely adopted, likely due to the difficulty associated with obtaining a sufficient number of empirical measurements from each system component. Our research suggests the need for a quantitative method that enables rapid measurement of a system's level of automation, is applicable across domains, and can be used by human factors practitioners in field studies or by system engineers as part of their technical planning processes. In this paper we present our research methodology and early research results from studies of electricity grid distribution control rooms. Using a system analysis approach based on quantitative measures of level of automation, we provide an illustrative analysis of select grid modernization efforts. This measure of the level of automation can be displayed either as a static, historical view of the system's automation dynamics (the dynamic interplay between human and automation required to maintain system performance) or incorporated into real-time visualization systems already present in control rooms.
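As a heavily hedged illustration (the paper's actual metric is not specified here), the sketch below computes one plausible quantitative level-of-automation signal: the per-time-window share of control actions initiated by automation rather than the human operator, producing the kind of time series described above.

```python
# A hypothetical level-of-automation (LOA) signal, not the paper's metric:
# the fraction of control actions initiated by automation per time window.
from collections import Counter

# (timestamp_minutes, actor) event log; actors are "human" or "auto".
events = [(1, "auto"), (3, "human"), (7, "auto"), (8, "auto"),
          (12, "human"), (14, "auto"), (21, "auto")]

def loa_series(events, window=10):
    """Per-window share of automated actions, as a crude LOA time series."""
    buckets = {}
    for t, actor in events:
        buckets.setdefault(t // window, Counter())[actor] += 1
    return {w: c["auto"] / sum(c.values()) for w, c in sorted(buckets.items())}

print(loa_series(events))  # e.g., {0: 0.75, 1: 0.5, 2: 1.0}
```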
Electric distribution utilities, the companies that feed electricity to end users, are overseeing a technological transformation of their networks, installing sensors and other automated equipment, that are fundamentally changing the way the grid operates. These grid modernization efforts will allow utilities to incorporate some of the newer technology available to the home user – such as solar panels and electric cars – which will result in a bi-directional flow of energy and information. How will this new flow of information affect control room operations? How will the increased automation associated with smart grid technologies influence control room operators’ decisions? And how will changes in control room operations and operator decision making impact grid resilience? These questions have not been thoroughly studied, despite the enormous changes that are taking place. In this study, which involved collaborating with utility companies in the state of Vermont, the authors proposed to advance the science of control-room decision making by understanding the impact of distribution grid modernization on operator performance. Distribution control room operators were interviewed to understand daily tasks and decisions and to gain an understanding of how these impending changes will impact control room operations. Situation awareness was found to be a major contributor to successful control room operations. However, the impact of growing levels of automation due to smart grid technology on operators’ situation awareness is not well understood. Future work includes performing a naturalistic field study in which operator situation awareness will be measured in real-time during normal operations and correlated with the technological changes that are underway. The results of this future study will inform tools and strategies that will help system operators adapt to a changing grid, respond to critical incidents and maintain critical performance skills.
Tensor (multiway array) factorization and decomposition offers unique advantages for activity characterization in spatio-temporal datasets because these methods are compatible with sparse matrices and maintain multiway structure that is otherwise lost in collapsing for regular matrix factorization. This report describes our research as part of the PANTHER LDRD Grand Challenge to develop a foundational basis of mathematical techniques and visualizations that enable unsophisticated users (e.g. users who are not steeped in the mathematical details of matrix algebra and multiway computations) to discover hidden patterns in large spatiotemporal data sets.
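As a brief illustration of the kind of multiway factorization involved, the sketch below runs a rank-3 CP (canonical polyadic) decomposition of a synthetic location-by-hour-by-day activity tensor using the tensorly library, one of several available toolkits; the data and rank are illustrative, not from the project.

```python
# Rank-R CP decomposition of a small spatio-temporal tensor, via the
# tensorly library (assumed available); data and rank are illustrative.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Illustrative tensor: (location x hour-of-day x day) activity counts.
rng = np.random.default_rng(5)
tensor = tl.tensor(rng.poisson(3.0, size=(20, 24, 30)).astype(float))

# Factorize into R = 3 components; each component pairs a spatial pattern
# with a daily rhythm and a day-to-day trend.
weights, factors = parafac(tensor, rank=3, n_iter_max=200)
locations, hours, days = factors
print(locations.shape, hours.shape, days.shape)  # (20, 3) (24, 3) (30, 3)
```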
This report summarizes research conducted through the Sandia National Laboratories Robust Automated Knowledge Capture Laboratory Directed Research and Development project. The objective of this project was to advance scientific understanding of the influence of individual cognitive attributes on decision making. The project has developed a quantitative model known as RumRunner that has proven effective in predicting the propensity of an individual to shift strategies on the basis of task- and experience-related parameters. Three separate studies are described that validated the basic RumRunner model. This work provides a basis for better understanding human decision making in high-consequence national security applications and, in particular, the individual characteristics that underlie adaptive thinking.