Scalable Generation of Graphs for Benchmarking HPC Community-Detection Algorithms
We seek scalable benchmarks for entity resolution problems. Solutions to these problems range from trivial approaches such as string sorting to sophisticated methods such as statistical relational learning. The theoretical and practical complexity of these approaches varies widely, so one of the primary purposes of a benchmark will be to quantify the trade-off between solution quality and runtime. We are motivated by the ubiquitous nature of entity resolution as a fundamental problem faced by any organization that ingests large amounts of noisy text data.

A benchmark is typically a rigid specification that provides an objective measure usable for ranking implementations of an algorithm. For example, the Top500 and HPCG benchmarks rank supercomputers based on their performance on dense and sparse linear algebra problems, respectively. These two benchmarks require participants to report the FLOPS attainable on various machines. Our purpose is slightly different: whereas the supercomputing benchmarks mentioned above hold algorithms constant and aim to rank machines, we are primarily interested in ranking algorithms. As mentioned above, entity resolution problems can be approached in completely different ways. We believe that users of our benchmarks must decide what sort of procedure to run before comparing implementations and architectures. Eventually, we also wish to provide a mechanism for ranking machines while holding the algorithmic approach constant.

Our primary contributions are parallel algorithms for computing solution quality measures per entity. We find in some real datasets that many entities are quite easy to resolve while others are difficult, with a heavy skew toward the former case. Therefore, measures such as global confusion matrices, F measures, etc., do not meet our benchmarking needs. We design methods for computing solution quality at the granularity of a single entity in order to know when proposed solutions do well in difficult situations (perhaps justifying extra computational effort) or struggle in easy situations.

We report on progress toward a viable benchmark for comparing entity resolution algorithms. Our work is incomplete, but we have designed and prototyped several algorithms to help evaluate the solution quality of competing approaches to these problems. We envision a benchmark in which the objective measure is a ratio of solution quality to runtime.
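As an illustration of what a per-entity quality measure can look like, the sketch below computes pairwise precision, recall, and F1 restricted to the records of each ground-truth entity. This is a hypothetical serial Python rendering, not the parallel algorithms described above; the function name `per_entity_f1` and the input format (dicts mapping record id to cluster id, with every record present in both mappings) are assumptions made for the example.

```python
from collections import defaultdict

def per_entity_f1(truth, pred):
    """Per-entity pairwise precision/recall/F1 (illustrative sketch).

    truth, pred: dicts mapping record id -> cluster id. Assumes every
    record in `truth` also appears in `pred`.
    Returns a dict mapping each true entity id -> (precision, recall, f1).
    """
    true_clusters = defaultdict(set)
    pred_clusters = defaultdict(set)
    for r, c in truth.items():
        true_clusters[c].add(r)
    for r, c in pred.items():
        pred_clusters[c].add(r)

    scores = {}
    for entity, members in true_clusters.items():
        # Ground-truth pairs within this entity (the "relevant" pairs).
        true_pairs = {(a, b) for a in members for b in members if a < b}
        # Pairs the proposed solution links to this entity's records.
        pred_pairs = set()
        for a in members:
            cluster = pred_clusters[pred[a]]
            pred_pairs |= {(a, b) if a < b else (b, a)
                           for b in cluster if b != a}
        tp = len(true_pairs & pred_pairs)
        precision = tp / len(pred_pairs) if pred_pairs else 1.0
        recall = tp / len(true_pairs) if true_pairs else 1.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[entity] = (precision, recall, f1)
    return scores

# Example: an over-merged prediction scores well on recall per entity
# but poorly on precision for the singleton entity e2.
truth = {"r1": "e1", "r2": "e1", "r3": "e2"}
pred = {"r1": "c1", "r2": "c1", "r3": "c1"}
print(per_entity_f1(truth, pred))
```

Because scores are computed independently per entity, a parallel version can partition entities across threads or ranks; the skewed easy/hard distribution noted above is exactly what such per-entity scores expose.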
This report summarizes the work performed under the project "Advanced Data Structures for Improved Cyber Resilience and Awareness in Untrusted Environments." The goal of the project was to design, analyze, and test new data structures for cybersecurity applications. We had two major thrusts: 1) using new or improved write-optimized data structures and algorithms to better manage and analyze high-speed, massive-volume cyberstreams, and 2) adding security features to data structures at minimum cost.

Write optimization allows data structures to better use secondary memory to store and search a larger amount of useful information. Secondary memory is large compared to main memory, but data movement to and from secondary memory must be carefully managed to run quickly enough to keep up with fast streams. The first thrust included managing cyberstreams in parallel, both multi-threaded and distributed, and improving the benchmarking infrastructure for testing new streaming data structures. We considered both (near) real-time discovery of particular patterns and improved logging for better forensics.

We considered two kinds of security-feature problems. The first was high-performance history-independent external-memory data structures, which provide certain protections to data if a disk is stolen. We also prove some trade-offs between speed and security in this setting. The second data-security problem is more secure data lookup in secret-shared databases.

This report summarizes the project's major accomplishments, along with the background needed to understand them. It gathers the abstracts and references for the six refereed publications that have appeared as part of this work. We summarize several accomplishments that will be submitted for publication. We then archive one piece of partial work that is not likely to be published in the near future: validation of history-independent data structure implementations.
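To make the write-optimization idea concrete, here is a minimal, hypothetical sketch in the spirit of LSM-trees and B^epsilon-trees: inserts are absorbed into a small in-memory buffer and periodically merged into a sorted on-disk run with one sequential pass, trading a little lookup work for far fewer random writes. The class name, buffer policy, and in-memory "disk" are invented for illustration and are not the project's actual data structures.

```python
import bisect

class BufferedMap:
    """Toy write-optimized key/value map (illustrative only).

    Writes land in a small in-memory buffer; when the buffer fills,
    one sequential merge rebuilds the sorted run. This mimics how
    write-optimized structures amortize secondary-memory traffic.
    """
    def __init__(self, buffer_limit=4):
        self.buffer = {}        # recent writes, cheap to absorb
        self.run = []           # sorted (key, value) pairs standing in for disk
        self.buffer_limit = buffer_limit

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self):
        # One sequential merge replaces many random writes.
        merged = dict(self.run)
        merged.update(self.buffer)
        self.run = sorted(merged.items())
        self.buffer.clear()

    def lookup(self, key):
        # Check the buffer first, then binary-search the sorted run.
        if key in self.buffer:
            return self.buffer[key]
        i = bisect.bisect_left(self.run, (key,))
        if i < len(self.run) and self.run[i][0] == key:
            return self.run[i][1]
        return None
```

The key trade-off the sketch exhibits is the one discussed above: lookups must consult both the buffer and the run, but insert-heavy streams touch "disk" only during infrequent sequential flushes.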
2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our miniTri data analytics miniapplication [1] and our efforts to pose graph algorithms in the language of linear algebra. We leverage KokkosKernels to implement this approach efficiently on multicore architectures. Our performance results are competitive with the fastest known graph traversal-based approaches and are significantly faster than the Graph Challenge reference implementations, up to 670,000 times faster than the C++ reference and 10,000 times faster than the Python reference on a single Intel Haswell node.
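For readers unfamiliar with the linear-algebra formulation, the toy sketch below counts triangles directly from the adjacency matrix using the identity triangles = trace(A^3)/6, which holds for simple undirected graphs because each triangle contributes a closed walk starting at each of its three vertices in each of two directions. This dense NumPy version only illustrates the general approach; it is not the paper's KokkosKernels-based sparse implementation.

```python
import numpy as np

def triangle_count(A):
    """Count triangles as trace(A^3) / 6.

    A must be a symmetric 0/1 adjacency matrix with a zero diagonal.
    Dense and O(n^3): for exposition only, not for large graphs.
    """
    A = np.asarray(A)
    return int(np.trace(A @ A @ A) // 6)

# Example: a 4-cycle 0-1-2-3-0 plus the chord 1-2 has two triangles,
# {0,1,2} and {1,2,3}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])
print(triangle_count(A))  # -> 2
```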