MapReduce-MPI

MR-MPI is an open-source implementation of MapReduce written for distributed-memory parallel machines on top of standard MPI message passing.

MapReduce is the programming paradigm, popularized by Google, which is widely used for processing large data sets in parallel. Its salient feature is that if a task can be formulated as a MapReduce, the user can perform it in parallel without writing any parallel code. Instead the user writes serial functions (maps and reduces) which operate on portions of the data set independently. The data-movement and other necessary parallel operations can be performed in an application-independent fashion, in this case by the MR-MPI library.

The MR-MPI library was developed to solve informatics problems on traditional distributed-memory parallel computers.  It includes C++ and C interfaces callable from most hi-level languages, and also a Python wrapper and our own OINK scripting wrapper, which can be used to develop and chain MapReduce operations together. MR-MPI and OINK are open-source codes, distributed freely under the terms of the modified Berkeley Software Distribution (BSD) license.