yod(l) manual page
Table of Contents

NAME

yod - Allocate a SUNMOS mesh partition, load and run a SUNMOS user program.

SYNOPSIS

yod [-D level] [-comm size] [-heap size] [-stack size] [-base node] [-proc mode] [-allocation mode] [-help] [-load] [-retry] [[-size n] [-sz n] [-exitonfault] filename [command line arguments] | -F loadfile]

DESCRIPTION

Yod allocates a mesh partition, loads and executes the program, and provides file I/O, including stdin, stdout, and stderr. Rudimentary job control is also provided (Ctrl-C). DO NOT issue a kill -9 to yod, because this may hang the Paragon.

OPTIONS

-D level
Turn debugging output on. Level 0 (the default) produces no output, while level 4 probably produces too much information. Level 1 tracks the mesh allocation and program load. This level does not produce any output once the program is running.

-comm size
Reserve communication buffer space of size bytes. The default is 256k bytes. Comm space is a pool where incoming messages are collected when the matching receive has not been posted.

-heap size
Reserve size bytes for the heap. The default is to allocate the remaining memory on each node after the comm, stack, program (text and data), and OS space have been allocated.

-stack size
Reserve size bytes for the stack. The default is 256k bytes. If other CPUs on the same node are used for computation (the -proc option), the stack is divided evenly among the CPUs.

-base node
Base (top left corner) of the mesh partition.

Node can be a decimal number representing the physical ID of the node. Physical numbering starts at the top left of the mesh and proceeds to the right, row by row, to the bottom right. All slots (even empty ones) are counted. It does not matter whether the node is running OSF, SUNMOS, or is an I/O node.

An alternate numbering scheme labels the rows, starting at the top, from a to p, and the columns, starting at the left, from 0 to n (n depends on the machine size).

Valid arguments to -base include: 0, a0, p5, 422, etc.
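
The two numbering schemes can be related with a short sketch. This is only an illustration, not yod's own code: the function name parse_base and the parameter mesh_width (the number of columns in the physical mesh) are assumptions, and the formula row * mesh_width + col is inferred from the row-by-row numbering described above.

```python
import re


def parse_base(arg, mesh_width):
    """Return the physical node ID named by a -base style argument.

    mesh_width is the number of columns in the physical mesh. It is an
    assumption of this sketch; the manual only says it depends on the
    machine size.
    """
    if re.fullmatch(r"[0-9]+", arg):
        return int(arg)                      # already a physical ID
    m = re.fullmatch(r"([a-p])([0-9]+)", arg)
    if m is None:
        raise ValueError(f"not a valid -base argument: {arg!r}")
    row = ord(m.group(1)) - ord("a")         # rows a..p map to 0..15
    col = int(m.group(2))
    if col >= mesh_width:
        raise ValueError(f"column {col} is outside a mesh of width {mesh_width}")
    return row * mesh_width + col            # row-by-row numbering
```

On a hypothetical mesh that is 16 columns wide, parse_base("a0", 16) gives 0 and parse_base("p5", 16) gives 245, while a purely numeric argument such as 422 is returned unchanged.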

-proc mode
The default mode is 0, which uses only one of the i860 microprocessors on each node board. If mode is set to 1, the second processor is turned on and used as a message coprocessor. Mode 2 allows programs to use the cop() and cop2() functions to execute code simultaneously on all processors.

-size n
The number of nodes that should be allocated. The size can be specified as a single decimal number, indicating how many nodes should be allocated, or as a string of the form ``height x width''. The argument rnd or random to the -size option allocates a random number of nodes. -size all allocates all currently free SUNMOS nodes. The default for n is 1.

-sz n
Same as the -size n option.
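
The accepted forms of the size argument can be summarized in a small sketch. The function name parse_size and the returned tags are hypothetical, chosen only for illustration. The sketch accepts optional blanks around the x, which is a guess; the loadfile example in this page writes the form without blanks (4x4).

```python
import re


def parse_size(arg):
    """Classify a -size / -sz style argument.

    Returns ("count", n), ("shape", (height, width)), ("random", None)
    or ("all", None). The tags are illustrative, not part of yod.
    """
    if arg in ("rnd", "random"):
        return ("random", None)
    if arg == "all":
        return ("all", None)
    if re.fullmatch(r"[0-9]+", arg):
        return ("count", int(arg))
    m = re.fullmatch(r"([0-9]+)\s*x\s*([0-9]+)", arg)
    if m:
        return ("shape", (int(m.group(1)), int(m.group(2))))
    raise ValueError(f"not a valid size argument: {arg!r}")
```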

-allocation mode
The possible allocation modes are: any, lax, or strict. The default is any, if the size is specified with a single decimal number: -size n. If the size is specified with a height and width (-size height x width), the default is strict.

In strict mode, the allocator tries to find a partition that does not span nodes of other applications. In lax mode, the allocator will use partitions that are non-contiguous, if necessary. If mode is any, then any collection of nodes will be allocated, assuming there are enough free nodes, and no other allocation method has succeeded.

The allocator always starts in strict mode, then tries lax, and finally any. The -allocation option prevents the allocator from trying modes weaker than mode. If no partition of the required size (and shape, if size was specified with a width and height) is available, yod aborts the load.

The -allocation option also accepts the argument rnd or random which causes random nodes to be allocated. This option is meant for debug purposes and should not be used during prime time, since it may prevent allocation of rectangular areas for other jobs.

-help
Displays a message explaining briefly all available options.

-load
Load the program, but do not execute it. Mainly for diagnostic purposes.

-retry
If a program cannot be started, retry in 30 seconds. With this option, yod waits for service or compute nodes to become available, so the application can be started.

-exitonfault
If one or more nodes of a job fault, and this option is set, then yod will kill all other nodes in the application and terminate. This is the default behavior when running in NQS batch mode.

-F loadfile
This option is used for heterogeneous load (loading different executables on different nodes). The programs to be loaded are read from the loadfile. When this option is present, the -size and -sz options should not be used on the command line. All the other options remain valid. The option arguments become the default values for all the programs specified in the loadfile.

The format of the loadfile is as follows:

- The first line should contain the mesh size specification, same as for the -size option.
- The commands describing the programs to be loaded follow.

A loadfile command's synopsis is:

    yod [-D level] [-comm size] [-heap size] [-stack size] [-proc mode] [-allocation mode] [-size nodespec] [-sz nodespec] filename [command line arguments]

All the options this time refer to the individual programs, and they override the default values. The -size option, which specifies the size and relative position of the running program within the global mesh, requires a slightly different syntax than the command line option. nodespec takes the form hxw:y,x, specifying the height and width, followed by the logical y and x offsets into the global mesh. No negative offsets are allowed. This is currently the only submesh size specification allowed. The -allocation option currently has no effect, as the allocation mode for a heterogeneous application is set internally to strict, and cannot be changed by the user.

Example:

command line:

    yod -heap 500000 -base e0 -F myload.file

myload.file:

    4x4
    yod -heap 300000 -sz 2x2:0,0 prog1 arglist1
    yod -stack 500000 -sz 2x4:2,0 prog2 arglist2
    yod -sz 2x2:0,2 prog3 arglist3
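
The nodespec form hxw:y,x can be decoded as in the sketch below, which also checks the example loadfile above. The helper names are hypothetical. Rejecting submeshes that overlap or fall outside the global mesh is an assumption of this sketch; this page does not say how yod treats such loadfiles.

```python
import re


def parse_nodespec(spec):
    """Split 'hxw:y,x' into (height, width, y_offset, x_offset)."""
    m = re.fullmatch(r"([0-9]+)x([0-9]+):([0-9]+),([0-9]+)", spec)
    if m is None:                       # also rejects negative offsets
        raise ValueError(f"not a valid nodespec: {spec!r}")
    h, w, y, x = (int(g) for g in m.groups())
    return h, w, y, x


def submesh_cells(spec):
    """Return the (row, column) cells of the global mesh a nodespec covers."""
    h, w, y, x = parse_nodespec(spec)
    return {(r, c) for r in range(y, y + h) for c in range(x, x + w)}


def check_layout(global_h, global_w, specs):
    """Return all covered cells; raise if a submesh overlaps or does not fit.

    Both checks are assumptions of this sketch, not documented yod behavior.
    """
    used = set()
    for spec in specs:
        cells = submesh_cells(spec)
        if any(r >= global_h or c >= global_w for r, c in cells):
            raise ValueError(f"{spec} does not fit in {global_h}x{global_w}")
        if used & cells:
            raise ValueError(f"{spec} overlaps an earlier submesh")
        used |= cells
    return used
```

For the example loadfile, check_layout(4, 4, ["2x2:0,0", "2x4:2,0", "2x2:0,2"]) covers all 16 cells of the 4x4 global mesh: prog1 takes the top left quarter, prog3 the top right quarter, and prog2 the bottom two rows.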

All options can be abbreviated, as long as the resulting string is a unique prefix of a valid option.
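
The unique-prefix rule can be illustrated with the following sketch. The option list is taken from the SYNOPSIS; the function name resolve_option is hypothetical, and letting an exact match win before prefix matching is an assumption.

```python
OPTIONS = ["-D", "-comm", "-heap", "-stack", "-base", "-proc", "-allocation",
           "-help", "-load", "-retry", "-size", "-sz", "-exitonfault", "-F"]


def resolve_option(word):
    """Expand an abbreviated option to its full name.

    Raises ValueError if the word matches no option or more than one.
    """
    if word in OPTIONS:                 # exact match wins (assumption)
        return word
    matches = [opt for opt in OPTIONS if opt.startswith(word)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: {word}")
    raise ValueError(f"ambiguous option {word}: {', '.join(matches)}")
```

Under this rule -a resolves to -allocation and -hea to -heap, while -s (-stack, -size, -sz) and -h (-heap, -help) are ambiguous and need more characters.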

MESH ALLOCATION AND NODE MAPPING

The mesh allocator is controlled with the -size, -base, and -allocation flags. If the -size option is used with a height and width, then the allocator restricts itself to the given aspect ratio. In all other cases the allocator tries to find a contiguous region in the mesh that is ``as square as possible''.

If a base is specified, only partitions with their top left corner at that base are tried.

The -allocation flag determines the restrictions imposed on the allocator. Strict allocation means that the partition has to be contiguous, i.e., not spanning nodes with other applications running on them, or nodes not running SUNMOS. If allocation is lax, then the allocator may skip rows and/or columns of nodes running other applications or OSF. The weakest allocation mode, any, allows the allocator to use any collection of nodes, assuming there are enough free nodes available.

In all cases the allocator starts out in strict mode and tries to satisfy the request. If this fails and lax mode is allowed, then the allocator continues its search for a suitable partition in that mode. If this also fails, then the any mode is used as a last resort. The weakest mode used by the allocator can be specified with the -allocation option. The default is any, unless a height and width have been given to the -size flag, in which case the default is lax.
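
The fallback order described above can be sketched as follows. This is not the allocator itself: try_mode is a hypothetical callback standing in for the actual partition search, and it is assumed to return None when no partition is found in the given mode.

```python
ORDER = ["strict", "lax", "any"]        # strongest to weakest mode


def modes_to_try(weakest):
    """Return the modes the allocator may try, given the weakest one allowed."""
    return ORDER[: ORDER.index(weakest) + 1]


def allocate(request, weakest, try_mode):
    """Try each permitted mode in order; return (mode, partition) or None.

    try_mode(mode, request) is a placeholder for the real search. A result
    of None from allocate corresponds to yod aborting the load.
    """
    for mode in modes_to_try(weakest):
        partition = try_mode(mode, request)
        if partition is not None:
            return mode, partition
    return None
```

With -allocation lax, for instance, modes_to_try("lax") yields strict and lax only, so the any mode is never attempted.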

The base node (specified by the -base option, or chosen by the allocator) receives the logical node ID 0. Node numbering continues to the right, then to the next row, until the bottom right node of the partition is assigned the logical ID n - 1, where n is the number of nodes in the partition.
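
For a rectangular partition, this numbering amounts to the arithmetic in the sketch below. The function names are hypothetical, and rows and columns are counted from 0 relative to the base node. How logical IDs are laid out for non-rectangular partitions (any mode) is not covered by this sketch.

```python
def logical_id(row, col, width):
    """Logical node ID of the node at (row, col) in a partition of given width."""
    return row * width + col


def position(logical, width):
    """Inverse mapping: (row, col) of a logical node ID."""
    return divmod(logical, width)
```

In a 4 x 4 partition the base node (0, 0) has logical ID 0 and the bottom right node (3, 3) has logical ID 15, which is n - 1 for n = 16.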

For heterogeneous loads, the global mesh is allocated as described above, according to the global mesh size specification indicated on the first line of the loadfile. The submeshes specified in the load file are then logically allocated within the global mesh. Application 1 gets logical nodes 0 through h1*w1 - 1, application 2 nodes h1*w1 through (h1*w1 + h2*w2 -1), and so on. The width, height and base offset of each submesh relative to the global mesh is of course determined according to the submesh size string.
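
The logical node ranges follow from a running sum of the submesh sizes, as in this sketch. The function name logical_ranges is hypothetical; the submeshes are given as (height, width) pairs in loadfile order.

```python
def logical_ranges(shapes):
    """Return (first, last) logical node IDs for each submesh, in order."""
    ranges = []
    first = 0
    for height, width in shapes:
        count = height * width
        ranges.append((first, first + count - 1))
        first += count                  # next application starts here
    return ranges
```

For submeshes of 2x2, 2x4, and 2x2, the applications receive logical nodes 0 through 3, 4 through 11, and 12 through 15.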

PROGRAM LOAD AND EXECUTION

Yod loads the COFF image of the user program into memory, calculates the sizes of the text, data, and bss segments, and determines the start address of the executable. The command line arguments following the filename are collected, as well as the current environment variables.

This information is then sent to the base node of the allocated partition. The SUNMOS kernel on the base node partitions the physical memory into the requested sections and fans the information out to the other nodes assigned to this application. If SUNMOS determines that there is not enough memory to run the user program, an error message is sent to yod. The load is then aborted and yod terminates.

After a successful initialization, the program text and data are loaded onto all nodes in the application and the bss segment is initialized. Then yod sends the start signal and the application starts running. The first thing it does is request the command line arguments (argv) and the environment (env) from yod. The first three file descriptors are opened (stdin, stdout, and stderr), and the user's main() is called.

In a heterogeneous load, a program's parameters, text, data, arguments, and environment are sent to the base node of its logical submesh. The fanout takes place in the submesh only.

NQS AND MACS INTERFACE

If NQS is set up properly to manage SUNMOS partitions, it is possible to queue and execute SUNMOS jobs under NQS. In that case, NQS will set the environment variable NX_DFLT_PART to the partition it has created. A script containing the yod command and its arguments will be executed by NQS. Yod will perform node allocation within the partition assigned by NQS, and run the SUNMOS job.

If the NX_DFLT_PART environment variable is not set, yod behaves as it always has, as if NQS and MACS were not present. This mode will probably be removed in later releases and replaced by the following mode as the default.

If the environment variable NX_DFLT_PART is set to .sunmos.interactive and the partition exists, yod will allocate nodes from that partition and allow MACS to keep track of the number of nodes allocated and the start and end times. Note: SUNMOS nodes not in the .sunmos.interactive partition are considered to be in use by NQS. Therefore, to run interactive SUNMOS jobs you have to create the .sunmos.interactive partition or unset NX_DFLT_PART.
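
The three cases above can be summarized in a sketch. The function name, the returned labels, and the partition_exists callback are all hypothetical. The case where NX_DFLT_PART names .sunmos.interactive but the partition does not exist is not described in this page, so the sketch only marks it as undocumented.

```python
def yod_partition_mode(environ, partition_exists):
    """Classify how yod would pick its partition, per the text above.

    environ is a mapping such as os.environ; partition_exists(name) is a
    placeholder for a check that the named partition exists.
    """
    part = environ.get("NX_DFLT_PART")
    if part is None:
        return "standalone"             # as if NQS and MACS were not present
    if part == ".sunmos.interactive":
        if partition_exists(part):
            return "interactive"        # allocation tracked by MACS
        return "undocumented"           # behavior not stated in this page
    return "nqs"                        # partition created by NQS
```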

SERVICES

While the user program is running, yod performs file I/O operations on behalf of the application. Any I/O operation of the application is translated by the C library into a message to yod to execute the requested command on the service node. Therefore, all file systems (local and remote) accessible on the service node are available to all nodes of the application.

If a kill signal (Ctrl-C) is sent to yod, then the application is terminated and the partition freed, before yod exits. After all nodes in the application exit, yod will also exit.

FILES

/mach_servers/sunmos.config
The current mesh allocation.

/sunmos/yodnodes
List of service nodes used by yod.

/var/adm/compute/run.log
A log recording restarts of the machine and information about parallel application runs under OSF and SUNMOS.

/usr/tmp/fyod.services
List of directories managed by a fyod.

/sunmos/job_id_file
List of job IDs currently in use.

SEE ALSO

showmesh(l), create_yod_config(l), fyod(l)

AUTHORS

Rolf Riesen, Sandia National Laboratories. Stephen Wheat, Sandia National Laboratories. Gabi Istrail, Sandia National Laboratories. (Heterogeneous load capability.)

BUGS

If yod terminates abnormally, or is killed with a signal it cannot catch (e.g. kill -9), then the mesh partition remains allocated.

