MapReduce-MPI (MR-MPI) Library Documentation

7 Apr 2014 version

Version info:

The MR-MPI "version" is the date when it was released, such as 1 May 2010. MR-MPI is updated continuously. Whenever we fix a bug or add a feature, we release it immediately, and post a notice on this page of the WWW site. Each dated copy of MR-MPI contains all the features and bug-fixes up to and including that version date. The version date is printed to the screen every time you run a program that uses MR-MPI. It is also in the file src/version.h and in the MR-MPI directory name created when you unpack a tarball.

The MapReduce-MPI (MR-MPI) library is open-source software that implements the MapReduce operation popularized by Google on top of standard MPI message passing.

The library is designed for parallel execution on distributed-memory platforms, but will also operate on a single processor. It requires no additional software to build and run, except linking with an MPI library if you wish to perform MapReduces in parallel. Similar to the original Google design, a user performs a MapReduce by writing a small program that invokes the library. The user typically provides two application-specific functions, a "map()" and a "reduce()", that are called back from the library when a MapReduce operation is executed. "Map()" and "reduce()" are serial functions, meaning they are invoked independently on individual processors on portions of your data when performing a MapReduce operation in parallel.
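As a rough illustration of the callback pattern described above, here is a minimal serial sketch of a word count written in plain C++. It does not use the MR-MPI API: the type aliases and the functions map_words(), reduce_sum(), and word_count() are hypothetical names invented for this example, and the driver function only stands in for the work the library would perform. In a real MR-MPI program the callbacks would be registered with the library and invoked on different processors.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for this sketch; they are not MR-MPI classes.
using KeyValuePairs = std::vector<std::pair<std::string, int>>;
using KeyMultiValues = std::map<std::string, std::vector<int>>;

// "map" callback: processes one chunk of input on its own and
// emits a (word, 1) pair for every whitespace-separated word.
void map_words(const std::string &chunk, KeyValuePairs &out) {
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) out.emplace_back(word, 1);
}

// "reduce" callback: receives one unique key together with all of
// the values emitted for it, and returns their sum.
int reduce_sum(const std::string & /*key*/, const std::vector<int> &values) {
    assert(!values.empty());  // every key was emitted at least once
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// Serial driver standing in for the library: call map on every chunk,
// group the values by key, then call reduce once per unique key.
std::map<std::string, int> word_count(const std::vector<std::string> &chunks) {
    KeyValuePairs pairs;
    for (const auto &chunk : chunks) map_words(chunk, pairs);

    KeyMultiValues grouped;
    for (const auto &kv : pairs) grouped[kv.first].push_back(kv.second);

    std::map<std::string, int> counts;
    for (const auto &entry : grouped)
        counts[entry.first] = reduce_sum(entry.first, entry.second);
    return counts;
}
```

Calling word_count() with the two chunks "a b a" and "b c" gives a count of 2 for "a", 2 for "b", and 1 for "c". Both callbacks see only the data handed to them, which is what allows a library to run them independently on separate processors.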

The MR-MPI library is written in C++ and is callable from high-level languages such as C++, C, and Fortran. A Python wrapper is also included, so MapReduce programs can be written in Python, including the map() and reduce() user callback methods. Also included is OINK, a high-level scripting interface to the MR-MPI library, which can be used to develop MapReduce algorithms and chain them together in scripts, with commands that simplify data management tasks. OINK has its own manual and doc pages.

The goal of the MR-MPI library is to provide a simple and portable interface for users to create their own MapReduce programs, which can then be run on any desktop or large parallel machine using MPI. See the Background section for features and limitations of this implementation.

The distribution includes a few examples of simple programs that illustrate the use of MR-MPI.

Source code for the library and OINK is freely available for download from the MR-MPI web site and is licensed under the modified Berkeley Software Distribution (BSD) License. This basically means they can be used by anyone for any purpose. See the LICENSE file provided with the distribution for more details.

The authors of the MR-MPI library are Steve Plimpton and Karen Devine, who can be contacted via email: sjplimp,kddevin at sandia.gov.


The MR-MPI documentation is organized into the following sections. If you find errors or omissions in this manual or have suggestions for useful information to add, please send an email to the developers so we can improve the MR-MPI documentation.

Once you are familiar with MR-MPI, you may want to bookmark this page at interface_c++.html, since it gives quick access to documentation for all the MR-MPI library methods.

PDF file of the entire manual, generated by htmldoc