miniTri: a Data Analytics miniApp
MTGL: MultiThreaded Graph Library
PHISH: Parallel Harness for Informatic Stream Hashing
PYOMO: Python Optimization Modeling Objects
Pyomo is a Python-based open-source software package that supports a diverse set of capabilities for formulating, solving, and analyzing optimization models.
A core capability of Pyomo is modeling structured optimization applications. The Pyomo software package can be used to define general symbolic problems, create specific problem instances, and solve these instances using standard commercial and open-source solvers. Pyomo's modeling objects are embedded within Python, a full-featured high-level programming language with a rich set of supporting libraries. This distinguishes Pyomo from other algebraic modeling languages such as AMPL, AIMMS, and GAMS.
Pyomo supports a wide range of problem types, including:
- Linear programming
- Quadratic programming
- Nonlinear programming
- Mixed-integer linear programming
- Mixed-integer quadratic programming
- Mixed-integer nonlinear programming
- Mixed-integer stochastic programming
- Generalized disjunctive programming
- Differential algebraic equations
- Bilevel programming
- Mathematical programming with equilibrium constraints
For further information, and to start using Pyomo, visit http://www.pyomo.org/
miniTri is a triangle enumeration-based data analytics miniApp that mimics the computational requirements of an important set of data science applications that are not well represented by traditional graph search benchmarks such as Graph500. In particular, the computation in miniTri is similar to the computation used in dense subgraph detection, graph characterization and generation, and community detection. Related application areas include cyber security, intelligence, and functional biology. The intent is for miniTri to serve as a proxy for evaluating the performance of these applications on current and future architectures.
In a nutshell, miniTri identifies all triangles for a given graph and computes a statistic for each of these triangles (an upper bound on the max clique size for the triangle). Over 25 variants of miniTri have been developed, including different algorithmic approaches and different programming models. Much of our research has focused on linear-algebra based approaches to miniTri. One of these linear-algebra based approaches served as the basis for the IEEE/DARPA/Amazon Graph Challenge triangle counting challenge problem.
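The two steps described above (enumerate every triangle, then compute a per-triangle statistic) can be illustrated with a minimal pure-Python sketch. This is not miniTri's code, and it does not use the linear-algebra formulation. The function names are hypothetical, and the statistic shown is a deliberately simple bound chosen for illustration: any clique containing a triangle can only add vertices adjacent to all three corners, so 3 plus the number of common neighbors bounds the clique size from above. miniTri's actual statistic may be computed differently.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def enumerate_triangles(adj):
    """Return each triangle exactly once as a sorted tuple (u, v, w)."""
    triangles = []
    for u in sorted(adj):
        # Only look at higher-numbered neighbors so no triangle repeats.
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                triangles.append((u, v, w))
    return triangles


def clique_size_upper_bound(adj, triangle):
    """Illustrative bound: 3 corners plus vertices adjacent to all of them."""
    u, v, w = triangle
    common = adj[u] & adj[v] & adj[w]
    return 3 + len(common)


# A 4-clique on vertices 0..3 with one extra edge hanging off vertex 3.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
example_adj = build_adjacency(example_edges)
for tri in enumerate_triangles(example_adj):
    print(tri, clique_size_upper_bound(example_adj, tri))
```

Restricting each vertex to its higher-numbered neighbors is one common way to avoid reporting the same triangle several times; other orderings (for example, by degree) are often used to balance work on skewed graphs.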
miniTri is released as part of the Mantevo suite of miniApps and can be found in the following GitHub repository: https://github.com/Mantevo/miniTri
- M. Wolf, J. Berry, D. Stark: “A Task-Based Linear Algebra Building Blocks Approach for Scalable Graph Analytics,” Proceedings of 2015 IEEE High Performance Extreme Computing Conference (HPEC), 2015.
- M. Wolf, H. C. Edwards, S. Olivier: “Kokkos/Qthreads Task-Parallel Approach to Linear Algebra Based Graph Analytics,” Proceedings of 2016 High Performance Extreme Computing Conference (HPEC), 2016.
PHISH stands for Parallel Harness for Informatic Stream Hashing. The phishy metaphor is meant to evoke the image of many small minnows (programs) swimming in a stream (of data).
PHISH is a lightweight framework which a set of independent processes can use to exchange data as they run on the same desktop machine, on processors of a parallel machine, or on different machines across a network. This enables them to work in a coordinated parallel fashion to perform computations on either streaming or archived data.
The PHISH distribution includes a simple, portable library for performing data exchanges in useful patterns either via MPI message-passing or ZMQ sockets. PHISH input scripts are used to describe a data-processing algorithm, and an additional tool provided in the PHISH distribution converts the script into a form that can be launched as a parallel job.
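The idea of small independent programs passing data along a stream can be sketched in plain Python. The example below is only an analogy: it uses threads and in-memory queues in place of separate processes connected by MPI or ZMQ, and it does not use the PHISH library or its input-script format. All names (`source`, `tokenize`, `count`, `run_pipeline`) are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel that tells the next stage the stream has ended


def source(lines, out_q):
    """First stage: emit each input line into the stream."""
    for line in lines:
        out_q.put(line)
    out_q.put(DONE)


def tokenize(in_q, out_q):
    """Middle stage: split lines into lowercase words."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)
            return
        for word in item.split():
            out_q.put(word.lower())


def count(in_q, result):
    """Last stage: tally how often each word appears."""
    while True:
        item = in_q.get()
        if item is DONE:
            return
        result[item] = result.get(item, 0) + 1


def run_pipeline(lines):
    """Wire the three stages together and run them concurrently."""
    q1, q2 = queue.Queue(), queue.Queue()
    result = {}
    stages = [
        threading.Thread(target=source, args=(lines, q1)),
        threading.Thread(target=tokenize, args=(q1, q2)),
        threading.Thread(target=count, args=(q2, result)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return result


print(run_pipeline(["a stream of data", "A stream of minnows"]))
```

Each stage knows only about its input and output queues, which is what lets the same stages be rearranged into other exchange patterns without changing their bodies.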
Using a unique algorithm, GazeAppraise offers a new way to understand human performance as it relates to analysis of dynamic soft-copy images. GazeAppraise is a set of software libraries that work on raw eye movement data without assumptions defining fixations and regions of interest. This new approach to scanpath analysis uses multidimensional clustering to interpret collections of eye movements as whole shapes, with geometric features (such as length, angularity, aspect ratio, etc.) that can be used to differentiate and classify variations in eye movement patterns among individuals and across different image sets. Evaluating collections of eye movements as a holistic pattern, rather than a set of discrete spatiotemporal events, provides insight into the strategies of experts as they try to gain information from dynamic imagery display systems—not only with moving targets, but in still scenes as well.
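To make the notion of geometric scanpath features concrete, here is a minimal Python sketch that computes three such features for a sequence of gaze points. It is not GazeAppraise code. The function name and the specific definitions (path length as summed segment lengths, aspect ratio from the bounding box, angularity as the mean absolute turning angle) are assumptions made for illustration; the library's own definitions may differ.

```python
import math


def scanpath_features(points):
    """Compute simple shape features for a list of (x, y) gaze samples."""
    if len(points) < 2:
        raise ValueError("need at least two points")

    # Total distance travelled along the path.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))

    # Bounding-box aspect ratio (width / height).
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    aspect_ratio = width / height if height > 0 else math.inf

    # Mean absolute change of heading between consecutive segments.
    turns = []
    for a, b, c in zip(points, points[1:], points[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        delta = abs(h2 - h1)
        turns.append(min(delta, 2 * math.pi - delta))  # wrap into [0, pi]
    angularity = sum(turns) / len(turns) if turns else 0.0

    return {"length": length, "aspect_ratio": aspect_ratio, "angularity": angularity}


print(scanpath_features([(0, 0), (4, 0), (4, 2)]))
```

Feature vectors like this one could then be passed to a clustering routine so that whole scanpaths, rather than individual fixations, are grouped by shape.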
Publications & Awards
- Michael J. Haass, Laura E. Matzen, Karin M. Butler, and Mika Armenta. 2016. A new method for categorizing scanpaths from eye tracking data. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications (ETRA '16). ACM, New York, NY, USA, 35-38.
- 2016 Federal Laboratories Consortium award for notable technology development
- A cooperative research and development agreement (CRADA) with EyeTracking, Inc., a company specializing in this field, gives the Sandia team access to a wide array of eye tracking systems and a pathway to commercial applications for this planned technology transfer.
- Two academic partners, Georgia Tech (GT) and the University of Illinois at Urbana-Champaign (UIUC), have helped Sandia to fine-tune the quality of the technology and algorithms in GazeAppraise.
The MultiThreaded Graph Library (MTGL) is an open source collection of algorithms and data structures designed to run on shared-memory platforms, including SMP machines and multi-core workstations.
MTGL has the following characteristics:
- Enables multithreaded graph algorithms
- Is written in the same style as the Boost Graph Library
- Abstracts data structures and other application specifics
- Hides shared memory issues
- Preserves good multithreaded performance
DYMATICA models are designed to quantitatively represent interactions between key actors to indicate likely outcomes over time. DYMATICA can help organizations develop, understand, and compare the likely effects of potential courses of action (COA) under a variety of geopolitical scenarios. It supports hypothesis generation and COA development, analysis, and comparison, while accounting for uncertainty in the environment. It enables comparison and integration of views from multiple subject matter experts in a common, decision theory-based format.
Informs High Consequence Decisions
- Helps analysts better understand and anticipate the interplay between specific individuals, political, social, and military organizations, and general society in response to potential courses of action or events
- Enables analysts to assess higher-order (cascading) influences and reactions to events, as well as to gauge the uncertainty about whether an event will produce the desired results over time
The Citrus Text Analysis Library and its associated applications have been developed to adapt rapidly to specific user analytical needs across a wide array of problems and deployment environments, including document caches, web crawls, and database contents, and to provide custom parsing, ingestion, and analysis quickly. The software is a Java-based framework employing best-in-class text analysis tools and related libraries. Citrus is available for government use. The software has been applied to a number of real-world problems ranging from document search, classification, mining, and trending to cyber-security. Applications built on the Citrus library range from thick clients to interactive command-line or scripting sessions, as well as to web-browser applications.
In addition to the Citrus command-line and scripting capabilities, Citrus employs a plugin architecture that allows rapid expansion and customization for in-field modifications. Citrus capabilities include wide-ranging and customizable parsers for document ingestion, indexing for efficient search and display, vector and machine learning based algorithms for clustering and classification, web crawling, natural language processing, graph creation and visualization, document similarity and recommendations, foreign language support, and trending analysis. http://citrus.sandia.gov
Supervised machine learning is the process of using past experience to predict the future. "Ensembles" are a machine-learning meta-method that can be applied to most machine learning algorithms. Ensembles generally improve accuracy substantially, reduce or remove most of the design issues presented by machine learning, and are well suited to parallel and distributed computation.
The Avatar Tools codes are an implementation of ensembles specifically for decision trees.
Some features that distinguish Avatar Tools from other "ensembles for decision trees" codes are:
- Does the bookkeeping necessary for out-of-bag (OOB) validation.
- Can use OOB validation to automatically determine optimal ensemble size.
- Implements a number of methods for skew correction; in particular, the first production implementation of SMOTE.
- Provides convenient tools for cross-validation, to assess the accuracy provided by a training set.
- Provides convenient post-processing tools for assessing feature importance and correlation.
- Provides convenient post-processing tools for extracting a task-specific measure of sample similarity.
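The out-of-bag bookkeeping mentioned in the list above can be illustrated with a small, self-contained Python sketch of bagging. This is not Avatar Tools code: one-level decision stumps stand in for full decision trees to keep the example short, and every function name here is hypothetical. Each ensemble member is trained on a bootstrap sample, and each training row is scored only by the members that never saw it, which yields an error estimate without a separate validation set.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split tree: (feature, threshold, left_label, right_label)."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def bagged_stumps_with_oob(rows, labels, n_trees, seed=0):
    """Train a bagged ensemble and estimate its error from OOB votes."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx])
        ensemble.append(stump)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # row i was not used to train this stump
                oob_votes[i][predict_stump(stump, rows[i])] += 1
    scored = [(v.most_common(1)[0][0], labels[i])
              for i, v in enumerate(oob_votes) if v]
    oob_error = sum(p != y for p, y in scored) / len(scored) if scored else None
    return ensemble, oob_error


def predict(ensemble, row):
    """Majority vote over all ensemble members."""
    return Counter(predict_stump(s, row) for s in ensemble).most_common(1)[0][0]


rows = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble, oob_error = bagged_stumps_with_oob(rows, labels, n_trees=25)
print(len(ensemble), oob_error, predict(ensemble, [0.95]))
```

Because the OOB error can be recomputed after each added member, tracking it as the ensemble grows is one way to decide when adding more members stops helping.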
The Avatar Tools codes are intended for Unix machines, and are known to build and pass their tests on Linux, Mac OS X, and Solaris machines.
- Banfield, R. E., Hall, L. O., Bowyer, K. W., Bhadoria, D., Kegelmeyer, W. P., and Eschrich, S. A comparison of ensemble creation techniques. In Proceedings of the Fifth International Conference on Multiple Classifier Systems, MCS2004 (2004), J. K. F. Roli and T. Windeatt, Eds., vol. 3077 of Lecture Notes in Computer Science, Springer-Verlag.
- Banfield, R. E., Hall, L. O., Bowyer, K. W., and Kegelmeyer, W. P. Ensemble diversity measures and their application to thinning. Information Fusion Journal 6, 1 (March 2005), 49-62.
- Banfield, R. E., Hall, L. O., Bowyer, K. W., and Kegelmeyer, W. P. A comparison of decision tree ensemble creation techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 1 (January 2007), 173-180.
- Breiman, L. Bagging predictors. Machine Learning 24 (1996), 123-140.
- Chawla, N. V., Hall, L. O., Bowyer, K. W., and Kegelmeyer, W. P. Learning ensembles from bites: A scalable and accurate approach. Journal of Machine Learning Research 5 (2004), 421-451.
The qthreads API is designed to make using large numbers of threads convenient and easy, and to allow portable access to threading constructs used in massively parallel shared memory environments. The API maps well to both MTA-style threading and PIM-style threading, and is still quite useful in a standard SMP context. The qthreads API provides access to full/empty-bit (FEB) semantics, where every word of memory can be marked either full or empty, and a thread can wait for any word to attain either state.
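Full/empty-bit semantics can be illustrated with a small Python simulation. The sketch below is not the qthreads API, which is a C library. The class `FEBWord` and its methods `write_ef`, `read_fe`, and `read_ff` are hypothetical names that model a single word whose full or empty state controls when readers and writers may proceed.

```python
import threading


class FEBWord:
    """One memory word guarded by a full/empty bit (simulated)."""

    def __init__(self):
        self._value = None
        self._full = False
        self._cond = threading.Condition()

    def write_ef(self, value):
        """Wait until the word is empty, write it, and mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_fe(self):
        """Wait until the word is full, read it, and mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value

    def read_ff(self):
        """Wait until the word is full, read it, and leave it full."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            return self._value


def demo(n=5):
    """Hand n values from a producer thread to a consumer thread."""
    word = FEBWord()
    received = []

    def producer():
        for i in range(n):
            word.write_ef(i)  # blocks while the previous value is unread

    def consumer():
        for _ in range(n):
            received.append(word.read_fe())  # blocks until a value arrives

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(demo())
```

Because the writer can only proceed when the word is empty and the reader only when it is full, the two threads alternate strictly and no value is lost or read twice, with no separate lock visible in the calling code.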
The qthreads library on an SMP (i.e., the POSIX implementation) is essentially a library for spawning and controlling coroutines: threads with small (4k) stacks. The threads are entirely in user-space and use their blocked/unblocked status as part of their scheduling. The library's metaphor is that there are many qthreads and several "shepherds". A shepherd can be thought of as a thread mobility domain; shepherds map to specific processors or memory regions. Qthreads are assigned to specific shepherds and do not migrate unless directed to migrate.
The API includes utility functions for making threaded loops, sorting, and similar operations convenient.
The qthreads library was developed to explore innovations in highly concurrent systems where the ultimate system either does not exist or is sufficiently hard to obtain that developing software for it becomes difficult.
Development is currently hosted on GitHub: https://github.com/Qthreads/qthreads.