Publications

Results 1–25 of 42
Skip to search filters

Active Learning in the Era of Big Data

Jamieson, Kevin J.; Davis, Warren L.

Active learning methods automatically adapt data collection by selecting the most informative samples in order to accelerate machine learning. Because of this, real-world testing and comparing active learning algorithms requires collecting new datasets (adaptively), rather than simply applying algorithms to benchmark datasets, as is the norm in (passive) machine learning research. To facilitate the development, testing and deployment of active learning for real applications, we have built an open-source software system for large-scale active learning research and experimentation. The system, called NEXT, provides a unique platform for realworld, reproducible active learning research. This paper details the challenges of building the system and demonstrates its capabilities with several experiments. The results show how experimentation can help expose strengths and weaknesses of active learning algorithms, in sometimes unexpected and enlightening ways.

More Details

Data privacy and security considerations for personal assistantsfor learning (PAL)

International Conference on Intelligent User Interfaces, Proceedings IUI

Raybourn, Elaine M.; Fabian, Nathan D.; Davis, Warren L.; Parks, Raymond C.; McClain, Jonathan T.; Trumbo, Derek T.; Regan, Damon; Durlach, Paula J.

A hypothetical scenario is utilized to explore privacy and security considerations for intelligent systems, such as a Personal Assistant for Learning (PAL). Two categories of potential concerns are addressed: factors facilitated by user models, and factors facilitated by systems. Among the strategies presented for risk mitigation is a call for ongoing, iterative dialog among privacy, security, and personalization researchers during all stages of development, testing, and deployment.

More Details

Development of machine learning models for turbulent wall pressure fluctuations

AIAA SciTech Forum - 55th AIAA Aerospace Sciences Meeting

Ling, Julia L.; Barone, Matthew F.; Davis, Warren L.; Chowdhary, K.; Fike, Jeffrey A.

In many aerospace applications, it is critical to be able to model fluid-structure interactions. In particular, correctly predicting the power spectral density of pressure fluctuations at surfaces can be important for assessing potential resonances and failure modes. Current turbulence modeling methods, such as wall-modeled Large Eddy Simulation and Detached Eddy Simulation, cannot reliably predict these pressure fluctuations for many applications of interest. The focus of this paper is on efforts to use data-driven machine learning methods to learn correction terms for the wall pressure fluctuation spectrum. In particular, the non-locality of the wall pressure fluctuations in a compressible boundary layer is investigated using random forests and neural networks trained and evaluated on Direct Numerical Simulation data.

More Details

Grandmaster: Interactive Text-Based Analytics of Social Media

Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015

Fabian, Nathan D.; Davis, Warren L.; Raybourn, Elaine M.; Lakkaraju, Kiran L.; Whetzel, Jonathan H.

People use social media resources like Twitter, Facebook, forums etc. to shareand discuss various activities or topics. By aggregating topic trends acrossmany individuals using these services, we seek to construct a richer profileof a person's activities and interests as well as provide a broader context ofthose activities. This profile may then be used in a variety of ways tounderstand groups as a collection of interests and affinities and anindividual's participation in those groups. Our approach considers that muchof these data will be unstructured, free-form text. By analyzing free-form text directly, we may be able to gain an implicit grouping ofindividuals with shared interests based on shared conversation, and not onexplicit social software linking them. In this paper, we discuss aproof-of-concept application called Grandmaster built to pull short sections oftext, a person's comments or Twitter posts, together by analysis andvisualization to allow a gestalt understanding of the full collection of allindividuals: how groups are similar and how they differ, based on theirtext inputs.

More Details

Grandmaster: Interactive text-based analytics of social media

Fabian, Nathan D.; Davis, Warren L.; Raybourn, Elaine M.; Lakkaraju, Kiran L.; Whetzel, Jonathan H.

People use social media resources like Twitter, Facebook, forums etc. to share and discuss various activities or topics. By aggregating topic trends across many individuals using these services, we seek to construct a richer profile of a person’s activities and interests as well as provide a broader context of those activities. This profile may then be used in a variety of ways to understand groups as a collection of interests and affinities and an individual’s participation in those groups. Our approach considers that much of these data will be unstructured, free-form text. By analyzing free-form text directly, we may be able to gain an implicit grouping of individuals with shared interests based on shared conversation, and not on explicit social software linking them. In this paper, we discuss a proof-of-concept application called Grandmaster built to pull short sections of text, a person’s comments or Twitter posts, together by analysis and visualization to allow a gestalt understanding of the full collection of all individuals: how groups are similar and how they differ, based on their text inputs.

More Details

Hybrid methods for cybersecurity analysis :

Davis, Warren L.; Dunlavy, Daniel D.

Early 2010 saw a signi cant change in adversarial techniques aimed at network intrusion: a shift from malware delivered via email attachments toward the use of hidden, embedded hyperlinks to initiate sequences of downloads and interactions with web sites and network servers containing malicious software. Enterprise security groups were well poised and experienced in defending the former attacks, but the new types of attacks were larger in number, more challenging to detect, dynamic in nature, and required the development of new technologies and analytic capabilities. The Hybrid LDRD project was aimed at delivering new capabilities in large-scale data modeling and analysis to enterprise security operators and analysts and understanding the challenges of detection and prevention of emerging cybersecurity threats. Leveraging previous LDRD research e orts and capabilities in large-scale relational data analysis, large-scale discrete data analysis and visualization, and streaming data analysis, new modeling and analysis capabilities were quickly brought to bear on the problems in email phishing and spear phishing attacks in the Sandia enterprise security operational groups at the onset of the Hybrid project. As part of this project, a software development and deployment framework was created within the security analyst work ow tool sets to facilitate the delivery and testing of new capabilities as they became available, and machine learning algorithms were developed to address the challenge of dynamic threats. Furthermore, researchers from the Hybrid project were embedded in the security analyst groups for almost a full year, engaged in daily operational activities and routines, creating an atmosphere of trust and collaboration between the researchers and security personnel. The Hybrid project has altered the way that research ideas can be incorporated into the production environments of Sandias enterprise security groups, reducing time to deployment from months and years to hours and days for the application of new modeling and analysis capabilities to emerging threats. The development and deployment framework has been generalized into the Hybrid Framework and incor- porated into several LDRD, WFO, and DOE/CSL projects and proposals. And most importantly, the Hybrid project has provided Sandia security analysts with new, scalable, extensible analytic capabilities that have resulted in alerts not detectable using their previous work ow tool sets.

More Details

Incremental learning for automated knowledge capture

Davis, Warren L.; Dixon, Kevin R.; Martin, Nathaniel M.; Wendt, Jeremy D.

People responding to high-consequence national-security situations need tools to help them make the right decision quickly. The dynamic, time-critical, and ever-changing nature of these situations, especially those involving an adversary, require models of decision support that can dynamically react as a situation unfolds and changes. Automated knowledge capture is a key part of creating individualized models of decision making in many situations because it has been demonstrated as a very robust way to populate computational models of cognition. However, existing automated knowledge capture techniques only populate a knowledge model with data prior to its use, after which the knowledge model is static and unchanging. In contrast, humans, including our national-security adversaries, continually learn, adapt, and create new knowledge as they make decisions and witness their effect. This artificial dichotomy between creation and use exists because the majority of automated knowledge capture techniques are based on traditional batch machine-learning and statistical algorithms. These algorithms are primarily designed to optimize the accuracy of their predictions and only secondarily, if at all, concerned with issues such as speed, memory use, or ability to be incrementally updated. Thus, when new data arrives, batch algorithms used for automated knowledge capture currently require significant recomputation, frequently from scratch, which makes them ill suited for use in dynamic, timecritical, high-consequence decision making environments. In this work we seek to explore and expand upon the capabilities of dynamic, incremental models that can adapt to an ever-changing feature space.

More Details
Results 1–25 of 42
Results 1–25 of 42