SAND Number: 2005-4968P
Rationale
In the present, capability-driven model for apex computational and analytic
science and engineering, separate high-performance hardware resources are connected
via IP networking, and huge datasets, with individual files many gigabytes to
terabytes in size, must be copied from one local filesystem to another.
New, faster network technologies, such as ATM and Gigabit Ethernet, provide
the foundation for accomplishing this in a reasonable amount of time, that
is, where any delta in transfer time has a sub-linear, perhaps logarithmic,
relation with the increase in file size. However, with TCP continuing to be
the overwhelming choice for protocol, traditional applications like serial
FTP are not capable of “filling the pipe”, or capturing a suitable majority
of the maximum potential bandwidth for a single transfer from one host to
another. High latency, due to geographic distances or other reasons, and
heterogeneous end platforms exacerbate this insufficiency. Parallelism in
the TCP streams is therefore introduced in new FTP-type applications in
order to achieve the desired performance for these cases.
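As a rough illustration of why a single stream may fail to fill the pipe: a TCP sender can have at most one window of unacknowledged data in flight per round trip, so one stream is bounded by roughly window size divided by round-trip time. The sketch below applies that well-known bound. The window, latency and link figures are illustrative assumptions, not MPSCP measurements, and the function names are hypothetical.

```python
import math

def single_stream_limit_bps(window_bytes, rtt_seconds):
    """Upper bound on one TCP stream: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds

def streams_needed(link_bps, window_bytes, rtt_seconds):
    """Smallest number of such streams whose combined bound reaches the link rate."""
    per_stream = single_stream_limit_bps(window_bytes, rtt_seconds)
    return max(1, math.ceil(link_bps / per_stream))

if __name__ == "__main__":
    window = 64 * 1024          # assumed 64 KiB TCP window
    rtt = 0.050                 # assumed 50 ms round-trip time
    link = 1_000_000_000        # assumed 1 Gbit/s link
    print("per-stream bound (bit/s):", single_stream_limit_bps(window, rtt))
    print("streams needed:", streams_needed(link, window, rtt))
```

Under these assumptions the bound shrinks as latency grows, which is consistent with the observation above that high latency exacerbates the problem.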
Concept
Unlike the original, command-driven FTP, with its own server daemon listening
at a known port, MPSCP follows the rexec paradigm of BSD, which is now most
popularly realized in SSH. The user interface looks and feels like SCP 1, and
no daemon other than sshd is required on the remote host. The remote host is
no longer necessarily the server side, since the data-moving processes are
basically symmetrical partners, distinguished only by the direction in which
the sockets are connected (which can be reversed on the command line) and by
which is the source and which is the sink. This allows the ubiquity of SSH to
be leveraged, as well as its encryption for authentication and other control
communications.
Deployment Requirements
Two POSIX-compliant, single-OS-image platforms must be connected by at least
one pair of IP network interfaces for MPSCP to operate, just as with any
INET-type service. Each single-OS-image platform must provide the following
capabilities to all processors:
· Process creation via fork, with signals, wait and kill.
· Shared memory.
· Socket-based connections.
· Single, uniformly mounted filesystem.
If either end-point has
more than one interface on which to move data, some or all of the data
movement can be directed over these supplemental routes by either an
environment variable or a configuration file. An SSH client-server
connection must be obtainable from the local side of the operation, and on
the remote side an MPSCP executable must be on an available path.
Functional Detail
The departure from normal SCP comes from switching to a mode more like FTP,
and even parallel FTP, where, instead of piping the data through the SSH
connection, separate, additional, dedicated sockets are opened for moving the
data. Just as with
parallel FTP, the parent process on one side allocates one or more sockets
for listening on (the number, or stripe-width in parallel I/O terms, has
already been determined), and sends the IP addresses, which may be
different, and port numbers back to the other side, where sockets will be
allocated to connect. Each side uses fork to spawn one process per socket,
and uses shared memory and signals for each of these processes to communicate
back to the parent. The figure below depicts processes and communications paths.
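The listen-and-report step described above can be sketched in a few lines. This is a minimal illustration, not MPSCP's code: it opens the listening sockets on the loopback address, lets the operating system choose the ports, and uses a hypothetical "ip:port" text format for the address list. It leaves out the fork of one process per socket and the shared-memory reporting back to the parent.

```python
import socket

def open_listeners(width, host="127.0.0.1"):
    """Listening side: allocate `width` TCP sockets on OS-chosen ports."""
    listeners = []
    for _ in range(width):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, 0))       # port 0 lets the OS pick a free port
        s.listen(1)
        listeners.append(s)
    return listeners

def endpoints(listeners):
    """The (ip, port) pairs the listening side reports to its partner."""
    return [s.getsockname() for s in listeners]

def format_endpoints(eps):
    """Hypothetical control-channel encoding: space-separated ip:port."""
    return " ".join(f"{ip}:{port}" for ip, port in eps)

def parse_endpoints(text):
    """Inverse of format_endpoints, used on the connecting side."""
    pairs = []
    for item in text.split():
        ip, port = item.rsplit(":", 1)
        pairs.append((ip, int(port)))
    return pairs

def connect_all(eps):
    """Connecting side: open one data socket per reported endpoint."""
    return [socket.create_connection(ep) for ep in eps]

if __name__ == "__main__":
    listeners = open_listeners(4)
    message = format_endpoints(endpoints(listeners))
    conns = connect_all(parse_endpoints(message))
    print("stripe width:", len(conns))
    for s in conns + listeners:
        s.close()
```

Because each listener may be bound to a different address, the same exchange covers the case where the data channels are spread over several network interfaces.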
Two more performance-enhancing features distinguish
MPSCP: (a) the ability to define and copy one portion of a file, and (b)
next-available queueing of the parallel data blocks over the multiple TCP
streams. The first of these is controlled by command line arguments
specifying the offset and length of the file portion to copy. This allows
for a straightforward means of multi-node or cluster parallelism, which
some call striping. Additional, independent MPSCP jobs can be executed on
separate nodes, respectively, possibly also connecting to separate nodes of
a remote cluster, with each moving a disjoint piece of a single file. For
instance, two MPSCP sessions, between two pairs of nodes, with one moving
the first half of the file and the other the second, could double the
effective transfer rate.
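A minimal sketch of how a file might be divided into disjoint portions for independent jobs follows. The function name is hypothetical, and the command line arguments that carry the offset and length are not named in this section, so none are shown.

```python
def split_portions(file_size, jobs):
    """Return `jobs` disjoint (offset, length) pairs that cover the file.

    Any remainder is spread one byte at a time over the first portions,
    so the portion lengths differ by at most one byte.
    """
    base, extra = divmod(file_size, jobs)
    portions = []
    offset = 0
    for i in range(jobs):
        length = base + (1 if i < extra else 0)
        portions.append((offset, length))
        offset += length
    return portions

if __name__ == "__main__":
    # Two sessions, each moving one half of a 10-byte file.
    print(split_portions(10, 2))
```

Each pair would be handed to one MPSCP session as its offset and length.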
The intent and efficacy of the next-available queueing feature emerge in the
case of less robust or more contended network connections that suffer TCP
packet loss and retransmission. The speed of any single file transfer is not
defined by how many bits travel on a network, but by the amount of time spent
between the reading of the first block to move and the writing of the last.
If one process or stream among many is slower and lags behind, it will be the
determinant of the overall and true speed. The protocol for the data-block
movement has two pieces, the data itself, and a single number for the
offset position in the file (the size of a block for a file or portion is
established in the initial control negotiation). With the data-moving
processes of the source side maintaining, in a mutually-exclusive, shared
memory variable, a common offset value for the next block to transfer, the
first process finished with its last assigned block will take it. Notably,
since each block comes with its own offset, the sink-side needs no special
protocol or synchronization.
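The next-available scheme can be sketched as follows. This is an illustration under stated assumptions, not the MPSCP implementation: threads and a lock stand in for the forked processes and the mutually exclusive shared-memory variable, and an in-memory buffer stands in for the sockets and the destination file. All names are hypothetical.

```python
import threading

class NextBlock:
    """Shared offset of the next block to transfer, guarded by a lock."""

    def __init__(self, total, block_size):
        self._lock = threading.Lock()
        self._next = 0
        self.total = total
        self.block_size = block_size

    def take(self):
        """Claim the next unassigned block; return its offset or None."""
        with self._lock:
            if self._next >= self.total:
                return None
            offset = self._next
            self._next += self.block_size
            return offset

def copy_blocks(source, width, block_size):
    """Copy `source` using `width` workers that each take the next free block."""
    sink = bytearray(len(source))
    counter = NextBlock(len(source), block_size)

    def worker():
        while True:
            offset = counter.take()
            if offset is None:
                return
            block = source[offset:offset + block_size]
            # Each block carries its own offset, so the sink simply writes
            # it in place; no ordering between workers is required.
            sink[offset:offset + len(block)] = block

    threads = [threading.Thread(target=worker) for _ in range(width)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(sink)

if __name__ == "__main__":
    data = bytes(range(256)) * 64
    print(copy_blocks(data, width=4, block_size=1000) == data)
```

A slow worker only delays the blocks it has already claimed; the remaining blocks are picked up by whichever worker finishes first.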
README
Man Page
MPSCP(1L) MISC. REFERENCE
NAME
mpscp - high-performance remote file copy
SYNOPSIS
mpscp [-prvBCa] [-S path-to-ssh] [-o ssh-options] [-P port]
[-c cipher] [-i identity] [-a]
[[user@]host1:]filename1... [[user@]host2:]filename2
DESCRIPTION
Non-encrypted, TCP-streams, host-to-host file copy utility intended to
enable greater transfer rates, while having the basic user-interface
concept of BSD rcp. The underlying design provides the opportunity to
maximize performance by allocating supplemental IP sockets as dedicated
and optimizable data channels, in a mode similar to FTP.
Additionally, a parallel I/O scheme, based on the HPSS paradigm, is also
employed, offering the potential for multiple sockets over a single
network connection (which, in many circumstances, especially those
involving heterogeneous host platforms or large-latency situations, can
capture more bandwidth), and also the distribution of the sockets across
multiple parallel IP network connections.
OPTIONS
-a
Alternate the direction of the data-socket connections. The default has
the local side listening and accepting, while the remote connects. In
alternate mode, the remote side also looks to read a configuration file.
-m path
Specifies a variant path to the mpscp executable on the remote side.
Default is mpscp (a relative path).
-w width
Sets the width of the data stripe, i.e., the number of TCP streams used
in parallel. Default is 1, maximum is 64.
-b blocksize
Sets the blocksize for reads and writes in each data stream.
Default is 1048576, maximum is 33554432.
-c cipher
Selects the cipher to use for encrypting the control data transfer.
This option is directly passed to ssh.
-i identity_file
Selects the file from which the identity (private key) for RSA
authentication is read. This option is directly passed to ssh.
-o ssh-options
Ssh options passed to ssh.
-p
Preserves modification times, access times, and modes from the
original file.
-r
Recursively copy entire directories.
-v
Verbose mode. Causes mpscp and ssh to print debugging messages about
their progress. This is helpful in debugging connection, authentication,
and configuration problems.
-P port
Sets the port to connect to on the remote host. Note that this option is
written with a capital P, because -p is already reserved for preserving
the times and modes of the file in rcp. Default is 22.
-S path-to-ssh
Specifies the path to the ssh program.
ENVIRONMENT VARIABLES
MPSCP_CONFIG - Local [path]name for configuration file
MPSCP_SERVER - Remote [path]name for executable
MPSCP_DATA_IP - One local IP address for high-speed data socket
MPSCP_PORTS - Port number range for socket binding
FILES
/usr/local/mpscp/config - Default path for configuration file.
Can be reassigned dynamically with the environment variable MPSCP_CONFIG.
The following shows the formatting and an example for this file:
# Any numeric value coming as the first, non-white character
# on a line will be interpreted as an IP address to be used
# for a data channel. This seems like a bug in the inet_addr
# routines I have encountered on multiple vendor, unix
# platforms, but I have accepted it as true.
# This machine has four OC-3 ATM to be utilized in parallel
132.175.26.150
192.168.1.3
192.168.2.3
192.168.3.3
# keyword "filesize" the first non-white string
# keyword   file size   I/O blocksize  stripe width  optional TCP window size
filesize    9999999999  1048576        16            262144
filesize    999999999   8388608        8
filesize    99999999    4194304        4
filesize    9999999     4194304        1
# This hierarchy of file sizes is a convoluted example.
# Also, they do not have to be listed in order as shown, the application will
# sort them to find the best match, as large as a listed size, less than the
# next larger.
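The matching rule described in the closing comments can be sketched as below. This is one reading of that rule, not the application's code: a file is matched to the largest listed size that it equals or exceeds. The parser and its function names are hypothetical, and it treats any non-comment line that begins with a digit as a data-channel address.

```python
def parse_config(text):
    """Split a config file into data-channel addresses and filesize rules."""
    addresses, rules = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "filesize":
            size, blocksize, width = (int(f) for f in fields[1:4])
            # The TCP window size column is optional.
            window = int(fields[4]) if len(fields) > 4 else None
            rules.append((size, blocksize, width, window))
        elif fields[0][0].isdigit():
            addresses.append(fields[0])
    return addresses, rules

def select_rule(rules, file_size):
    """Pick the rule with the largest listed size not exceeding file_size."""
    best = None
    for rule in sorted(rules, key=lambda r: r[0]):
        if rule[0] <= file_size:
            best = rule
    return best

EXAMPLE = """\
132.175.26.150
192.168.1.3
filesize 9999999999 1048576 16 262144
filesize 999999999 8388608 8
filesize 99999999 4194304 4
filesize 9999999 4194304 1
"""

if __name__ == "__main__":
    addresses, rules = parse_config(EXAMPLE)
    print(addresses)
    print(select_rule(rules, 50_000_000))
```

Under this reading, a file smaller than every listed size matches no rule, and the command-line or built-in defaults would presumably apply.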
INSTALLATION
Similar to scp, an executable mirroring the local-side mpscp must be found
on the remote host. The default is simply the name mpscp, which is the
simplest approach: the same name is expected to be on a user's default
login path. This could be changed to a full path, /usr/local/bin/mpscp,
for instance, by modifying the Makefile and recompiling. Also,
dynamically, the local shell environment variable MPSCP_SERVER can be
set, and finally the -m option can be used on the command line.
For options such as multiple IP addresses for data channels, or automatic
assignment of stripe-widths and blocksizes based on file size, a
configuration file must be found by the local client (or the remote side
if the -a option is used). The default for this is hardcoded at
/usr/local/mpscp/config and, again, there are several opportunities for
adjusting it, including the environment variable MPSCP_CONFIG.
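The lookups described in this section can be sketched as follows. The order of precedence shown (command-line option, then environment variable, then compiled-in default) is an assumption, since the text lists the mechanisms without ranking them. The function names are hypothetical.

```python
import os

DEFAULT_CONFIG = "/usr/local/mpscp/config"   # hardcoded default named above
DEFAULT_SERVER = "mpscp"                     # relative name, found on the login path

def config_path(environ=os.environ):
    """Configuration file: MPSCP_CONFIG if set, else the default path."""
    return environ.get("MPSCP_CONFIG") or DEFAULT_CONFIG

def remote_executable(option_m=None, environ=os.environ):
    """Remote executable: -m value, else MPSCP_SERVER, else the default.

    The precedence is an assumption made for this sketch.
    """
    return option_m or environ.get("MPSCP_SERVER") or DEFAULT_SERVER

if __name__ == "__main__":
    print(config_path())
    print(remote_executable())
```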