MPSCP

 

 
 

 

MPSCP SOFTWARE DESIGN

SAND Number: 2005-4968P

Rationale

In the present, capability-driven model for apex computational and analytic science and engineering, separate high-performance hardware resources are connected via IP networking, and huge datasets, with single files many gigabytes to terabytes in size, must be copied from one local filesystem to another. Newer, faster network technologies, such as ATM and Gigabit Ethernet, provide the foundation for accomplishing this in a reasonable amount of time, that is, where any increase in transfer time has a sub-linear, perhaps logarithmic, relation to the increase in file size. However, with TCP remaining the overwhelming choice of protocol, traditional applications like serial FTP are not capable of “filling the pipe”, that is, capturing a suitable majority of the maximum potential bandwidth for a single transfer from one host to another. High latency, whether due to geographic distance or other causes, and heterogeneous end platforms exacerbate this shortfall. New FTP-type applications therefore introduce parallelism in the TCP streams to achieve the desired performance in these cases.

Concept

Unlike the original, command-driven FTP, with its own server daemon listening at a known port, MPSCP follows the rexec paradigm of BSD, which is now most popularly realized in SSH. The user interface looks and feels like SCP 1, and no daemon other than sshd is required on the remote host. The remote host is no longer necessarily a server side: the data-moving processes are essentially symmetrical partners, distinguished only by which is the source and which the sink, and by the direction in which the sockets are connected (which can be reversed on the command line). This allows the ubiquity of SSH to be leveraged, along with its encryption for authentication and other control communications.

Deployment Requirements

Two POSIX-compliant, single-OS-image platforms must be connected by at least one pair of IP network interfaces for MPSCP to operate, just as with any INET-type service. Each single-OS-image platform must provide the following capabilities to all processors:

·        Process creation via fork, with signals, wait and kill.

·        Shared memory.

·        Socket-based connections.

·        Single, uniformly mounted filesystem.

If either end-point has more than one interface on which to move data, some or all of the data movement can be directed over these supplemental routes by either an environment variable or a configuration file. An SSH client-server connection must be obtainable from the local side of the operation, and on the remote side an MPSCP executable must be on an available path.

Functional Detail

The departure from normal SCP comes from switching to a mode more like FTP, and even parallel FTP: instead of piping the data through the SSH connection, separate, dedicated sockets are opened for moving the data. Just as with parallel FTP, the parent process on one side allocates one or more sockets to listen on (the number, or stripe-width in parallel I/O terms, has already been determined) and sends their IP addresses, which may differ, and port numbers back to the other side, where sockets are allocated to connect. Each side uses fork to spawn one process per socket, and uses shared memory and signals for communication from each of these back to the parent. The figure below depicts processes and communications paths.
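The socket setup described above can be sketched as follows. This is a minimal illustration in Python, not MPSCP's own code: the function names are hypothetical, the endpoint list is simply returned rather than sent over the SSH control channel, ports are chosen by the operating system rather than from a configured range, and the fork of one process per socket is omitted.

```python
import socket


def open_listeners(width, host="127.0.0.1"):
    """Listening side: allocate one listening socket per stripe."""
    listeners = []
    for _ in range(width):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, 0))  # port 0 lets the OS pick a free port (assumption)
        s.listen(1)
        listeners.append(s)
    return listeners


def advertised_endpoints(listeners):
    """The (IP address, port) pairs the listening side would send back."""
    return [s.getsockname() for s in listeners]


def connect_all(endpoints):
    """Connecting side: open one data socket per advertised endpoint."""
    return [socket.create_connection(ep) for ep in endpoints]


if __name__ == "__main__":
    listeners = open_listeners(4)
    endpoints = advertised_endpoints(listeners)
    connections = connect_all(endpoints)
    accepted = [s.accept()[0] for s in listeners]
    print(f"{len(accepted)} data channels established")
    for s in listeners + connections + accepted:
        s.close()
```

Because each endpoint carries its own IP address, the same scheme covers the case where the stripes are spread over several network interfaces.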

 


Two more performance-enhancing features distinguish MPSCP: (a) the ability to define and copy one portion of a file, and (b) next-available queueing of the parallel data blocks over the multiple TCP streams. The first of these is controlled by command line arguments specifying the offset and length of the file portion to copy. This provides a straightforward means of multi-node or cluster parallelism, which some call striping. Additional, independent MPSCP jobs can be executed on separate nodes, possibly also connecting to separate nodes of a remote cluster, with each moving a disjoint piece of a single file. For instance, two MPSCP sessions between two pairs of nodes, one moving the first half of the file and the other the second, could double the effective transfer rate.
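As an illustration of how a file might be divided among independent jobs, the sketch below computes disjoint (offset, length) portions that together cover the whole file. The function name is hypothetical and the even split is an assumption; MPSCP itself only takes the offset and length it is given.

```python
def split_portions(file_size, jobs):
    """Return a list of (offset, length) pairs that tile [0, file_size)."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    base, extra = divmod(file_size, jobs)
    portions = []
    offset = 0
    for i in range(jobs):
        # The first `extra` portions absorb one leftover byte each.
        length = base + (1 if i < extra else 0)
        portions.append((offset, length))
        offset += length
    return portions


if __name__ == "__main__":
    # Two sessions, each moving one half of a 10-byte file.
    for offset, length in split_portions(10, 2):
        print(f"offset={offset} length={length}")
```

Each pair would then be handed to a separate MPSCP session on its own node.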

The intent and efficacy of the next-available queueing feature emerge in the case of less robust or more contended network connections that suffer TCP packet loss and retransmission. The speed of any single file transfer is defined not by how many bits travel on the network, but by the time elapsed between the reading of the first block to move and the writing of the last. If one process or stream among many is slower and lags behind, it will be the determinant of the overall, true speed. The protocol for data-block movement has two pieces: the data itself, and a single number giving the block's offset position in the file (the block size for a file or portion is established in the initial control negotiation). The data-moving processes on the source side maintain, in a mutually exclusive shared-memory variable, a common offset value for the next block to transfer, and whichever process first finishes its last assigned block takes the next one. Notably, since each block comes with its own offset, the sink side needs no special protocol or synchronization.
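The queueing scheme can be modeled in a few lines. The sketch below is an assumption-laden stand-in, not the MPSCP implementation: threads and a lock replace forked processes and the shared-memory variable, an in-memory buffer replaces the sockets and the destination file, and all names are hypothetical.

```python
import threading


def copy_next_available(source, block_size, width):
    """Copy `source` using `width` workers that each claim the next free block."""
    sink = bytearray(len(source))
    next_offset = 0                 # shared "next block to transfer" value
    lock = threading.Lock()         # makes updates mutually exclusive
    blocks_per_worker = [0] * width

    def claim_block():
        nonlocal next_offset
        with lock:
            if next_offset >= len(source):
                return None
            offset = next_offset
            next_offset += block_size
            return offset

    def worker(index):
        while True:
            offset = claim_block()
            if offset is None:
                return
            data = source[offset:offset + block_size]
            # Sink side: the offset travels with the data, so the block is
            # written in place with no coordination between streams.
            sink[offset:offset + len(data)] = data
            blocks_per_worker[index] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(width)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(sink), blocks_per_worker


if __name__ == "__main__":
    payload = bytes(range(256)) * 100
    copied, counts = copy_next_available(payload, 1000, 4)
    print(copied == payload, counts)
```

A worker that is slowed by retransmissions simply claims fewer blocks, so no stream is left holding a fixed share of the file while the others sit idle.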

README

Download: current tar


Man Page

MPSCP(1L)    MISC. REFERENCE

 

NAME

     mpscp - high-performance remote file copy

 

SYNOPSIS

 

     mpscp [-prvBCa] [-S path-to-ssh] [-o ssh-options] [-P port]

           [-c cipher] [-i identity] [-a

           [[user@]host1:]filename1...  [[user@]host2:]filename2

 

DESCRIPTION

 

     Non-encrypted, TCP-streams, host-to-host file copy utility

     intended to enable greater transfer rates, while having the basic

     user-interface concept of BSD rcp.

 

     The underlying design provides the opportunity to maximize

     performance by allocating supplemental IP sockets as dedicated and

     optimizable data-channels, in a mode similar to FTP.

     Additionally, a parallel I/O scheme, based on the HPSS paradigm,

     is also employed, offering potentials for multiple-sockets over a

     single network connection (which, in many circumstances,

     especially those involving heterogeneous host platforms or

     large latency situations, can capture more bandwidth), and also

     the distribution of the sockets across multiple parallel IP

     network connections.

 

OPTIONS

 

     -a

           Alternate the direction of the data-socket connections. The

           default has the local side listening and accepting, while

           the remote connects. In alternate mode, the remote side also

           looks to read a configuration file.

 

     -m path

           Specifies a variant path-to-mpscp executable on remote side.

           Default is mpscp (a relative path).

 

     -w width

           Sets width of data-stripe, i.e., number of TCP-streams used

           in parallel. Default is 1, maximum is 64.

 

     -b blocksize

           Sets the blocksize for reads and writes in each data stream.

           Default is 1048576, maximum is 33554432.

 

     -c cipher  

           Selects the cipher to use for encrypting the control data transfer.

           This option is directly passed to ssh.

 

     -i identity_file

           Selects the file from which the identity (private key) for

           RSA authentication is read.  This option is directly passed

           to ssh.

 

     -o ssh-options

           Ssh options passed to ssh.

 

     -p    Preserves modification times, access times, and modes from the

           original file.

 

     -r    Recursively copy entire directories.

 

     -v    Verbose mode.  Causes mpscp and ssh to print debugging

           messages about their progress.  This is helpful in debugging

           connection, authentication, and configuration problems.

 

     -P port

           Sets the port to connect to on the remote host.  Note

           that this option is written with a capital P, because -p is

           already reserved for preserving the times and modes of the

           file in rcp. Default is 22.

 

     -S path-to-ssh

           Specifies the path to ssh program.

 

 

 

ENVIRONMENT VARIABLES

 

     MPSCP_CONFIG - Local [path]name for configuration file



     MPSCP_SERVER - Remote [path]name for executable



     MPSCP_DATA_IP - One local IP address for high-speed data socket



     MPSCP_PORTS - Port number range for socket binding

FILES

 

     /usr/local/mpscp/config - Default path for configuration file.

                               Can be reassigned dynamically with

                               the environment variable MPSCP_CONFIG.

 

     The following show the formatting and an example for this

     file:

 

     # Any numeric value coming as the first, non-white character

     # on a line will be interpreted as an IP address to be used

     # for a data channel. This seems like a bug in the inet_addr

     # routines I have encountered on multiple vendor, unix

     # platforms, but I have accepted it as true.

 

     # This machine has four OC-3 ATM interfaces to be utilized in parallel

     132.175.26.150

     192.168.1.3

     192.168.2.3

     192.168.3.3

     # The keyword "filesize" is the first non-white string on these lines.

     # filesize key word   file size  I/O blocksize  Stripewidth   Optional TCP window size

         filesize         9999999999    1048576          16                262144

         filesize          999999999    8388608           8   

         filesize           99999999    4194304           4

         filesize            9999999    4194304           1

     # This hierarchy of file sizes is a contrived example.

     # Also, the entries do not have to be listed in order as shown; the application

     # sorts them to find the best match: the listed size that the file is at least

     # as large as, while being smaller than the next larger entry.
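
To make the format concrete, the following Python sketch parses a file laid out as above and selects a filesize entry. It is only an illustration: the function names are hypothetical, and the matching rule (the largest listed size not exceeding the file size) is one reading of the comment in the example, not a statement of how mpscp itself is coded.

```python
EXAMPLE = """\
# This machine has four OC-3 ATM to be utilized in parallel
132.175.26.150
192.168.1.3
192.168.2.3
192.168.3.3
filesize 9999999999 1048576 16 262144
filesize 999999999 8388608 8
filesize 99999999 4194304 4
filesize 9999999 4194304 1
"""


def parse_config(text):
    """Return (addresses, rules); each rule is (size, blocksize, width, window)."""
    addresses, rules = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "filesize":
            size, blocksize, width = (int(f) for f in fields[1:4])
            window = int(fields[4]) if len(fields) > 4 else None
            rules.append((size, blocksize, width, window))
        elif line[0].isdigit():
            # A leading numeric value is taken as a data-channel IP address.
            addresses.append(fields[0])
    rules.sort(key=lambda rule: rule[0])  # listing order does not matter
    return addresses, rules


def match_rule(rules, file_size):
    """Assumed rule: the largest listed size that file_size is at least as large as."""
    best = None
    for rule in rules:  # ascending by size
        if rule[0] <= file_size:
            best = rule
    return best


if __name__ == "__main__":
    addresses, rules = parse_config(EXAMPLE)
    print(addresses)
    print(match_rule(rules, 50_000_000))
```

Under this reading, a 50,000,000-byte file matches the 9999999 entry, and a file smaller than every listed size matches no entry.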

 

   

INSTALLATION

 

     Similar to scp, an executable mirroring the local-side mpscp

     must be found on the remote host. The default is simply the name

     mpscp, which is the simplest approach: the same name is expected

     to be on a user's default login path. This could be changed to a

     full path, /usr/local/bin/mpscp for instance, by modifying the

     Makefile and recompiling. Alternatively, the local shell

     environment variable MPSCP_SERVER can be set dynamically, or the

     -m option can be used on the command line.

 

     For options such as multiple IP addresses for data channels, or

     automatic assignment of stripe-widths and blocksizes based on

     file size, a configuration file must be found by the local client

     (or the remote side if the -a option is used). The default for

     this is hardcoded at /usr/local/mpscp/config and, again, there

     are several opportunities for adjusting it, including the environment

     variable MPSCP_CONFIG.

 

 


 


Contact
SMTS
Marty Barnaby
(mlbarna@sandia.gov)
(505)844-4488