2.8. Manual Job Control
The output of a running job can be controlled externally through the use of “shutdown” and “control” files. This mechanism allows for additional output of restart, results, history, and/or heartbeat data to be requested, as well as an optional graceful shutdown of the job.
A graceful shutdown is requested by inserting a “shutdown” file in the working directory of a running Sierra/SM job. The name of this file can be any of the following:
sierra.shutdown
adagio.shutdown
<base_name>.shutdown
If the <base_name>.shutdown variant is used, <base_name> is the base name of the input file. For example, if the input file name is my_analysis.i, then the shutdown file would be my_analysis.shutdown.
If the code detects a shutdown file, it writes an output step to every open restart, results, history, and heartbeat file and then gracefully terminates the job. An entry is written to the log file stating that a shutdown file was detected, and the shutdown file is deleted. The contents of the file are not important; the shutdown file is only checked for existence. Note that the code looks for the shutdown file only at the end of a completed time step. Thus, if the solve for a time step takes a very long time, the shutdown file will not be detected until that solve completes.
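The shutdown request described above amounts to creating a file with the right name in the working directory. The following is a minimal Python sketch of that step, not part of Sierra/SM; the function name request_shutdown and its parameters are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path
from typing import Optional


def request_shutdown(workdir: str, base_name: Optional[str] = None) -> Path:
    """Create an empty shutdown file in the job's working directory.

    With no base_name, the generic name sierra.shutdown is used; otherwise
    <base_name>.shutdown, where base_name is the input file name without
    its extension (my_analysis for my_analysis.i).
    """
    name = f"{base_name}.shutdown" if base_name else "sierra.shutdown"
    path = Path(workdir) / name
    # Only the existence of the file is checked, so an empty file is enough.
    path.touch()
    return path
```

For example, request_shutdown("/path/to/run", "my_analysis") would create my_analysis.shutdown in that directory; the job would act on it only after the current time step completes.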
The control file provides more control over output and execution than the shutdown file does. The control file uses the same names as the shutdown file, except that the file name suffix is .control instead of .shutdown. The file contains a single line, which is case insensitive. The syntax is:
DUMP [RESTART] [RESULTS] [HISTORY] [HEARTBEAT] [STEP] [TIME] [STOP|ABORT|SHUTDOWN|CONTINUE]
The optional strings RESTART, RESULTS, HISTORY, HEARTBEAT, STEP, and TIME specify the type of output that is written before an action is taken. For RESTART, RESULTS, HISTORY, and HEARTBEAT, output is written only if an output block of that type was defined in the input file for the model. The STEP and TIME options result in the last step and time being written to the log file. Multiple output types may be requested. If no output type is requested, all types of output specified in the input file will be written.
In addition to controlling the type of output that occurs, the .control file also specifies whether the job should be terminated or allowed to continue. If STOP, ABORT, or SHUTDOWN is specified at the end of the line, the job will be gracefully terminated. If CONTINUE is specified or no option is specified, the job will continue.
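To make the rules above concrete, here is a hedged Python sketch that interprets a control line the way this section describes. It is an illustration only and not the parser used by the code; the names ControlRequest and interpret_control_line are hypothetical. One assumption is marked in the comments: the section does not say whether a bare DUMP also logs the step and time, so the sketch does not imply it.

```python
from dataclasses import dataclass
from typing import FrozenSet

FILE_OUTPUTS = frozenset({"RESTART", "RESULTS", "HISTORY", "HEARTBEAT"})
LOG_OUTPUTS = frozenset({"STEP", "TIME"})
TERMINATE = frozenset({"STOP", "ABORT", "SHUTDOWN"})


@dataclass(frozen=True)
class ControlRequest:
    file_outputs: FrozenSet[str]  # output file types to write
    log_step_time: bool           # write step and time to the log file
    terminate: bool               # gracefully end the job afterwards


def interpret_control_line(line: str) -> ControlRequest:
    tokens = line.upper().split()  # contents are case insensitive
    if not tokens or tokens[0] != "DUMP":
        raise ValueError("control line must start with DUMP")
    body = tokens[1:]
    terminate = False
    # The action, if present, is the last word; the default is to continue.
    if body and body[-1] in TERMINATE | {"CONTINUE"}:
        terminate = body.pop() in TERMINATE
    unknown = [t for t in body if t not in FILE_OUTPUTS | LOG_OUTPUTS]
    if unknown:
        raise ValueError(f"unrecognized option(s): {unknown}")
    if not body:
        # No output type requested: all file output types are written.
        # Assumption: step/time logging is not implied by a bare DUMP.
        return ControlRequest(FILE_OUTPUTS, False, terminate)
    return ControlRequest(
        frozenset(body) & FILE_OUTPUTS,
        bool(set(body) & LOG_OUTPUTS),
        terminate,
    )
```

Whether a requested file type is actually written still depends on a matching output block being defined in the input file, which this sketch cannot know.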
Several examples of the contents of control files are shown below. The following examples all result in output being written to all types of output files and the job continuing:
dump
dump continue
dump restart results history heartbeat step continue
In both of the following examples, the current step and time will be written to the log file, but no additional output will be written and the job will continue:
dump step
dump time continue
This example would result in a write to all output types and a graceful shutdown:
dump stop
In the following example, no output would be written to any files, but the current step and time would be written to the log file before a graceful shutdown:
dump step abort
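Control lines like the examples above can also be generated by a script. The sketch below is one possible Python helper, not a Sierra/SM utility; write_control_file and its parameters are hypothetical. It writes the line under a temporary name and then renames it so that the running job cannot read a half-written line, which is a precaution chosen here rather than a documented requirement.

```python
import os
from pathlib import Path

OUTPUTS = ("restart", "results", "history", "heartbeat", "step", "time")
ACTIONS = ("stop", "abort", "shutdown", "continue")


def write_control_file(workdir, outputs=(), action="continue", stem="sierra"):
    """Write <stem>.control containing one DUMP line and return its path."""
    bad = [o for o in outputs if o.lower() not in OUTPUTS]
    if bad or action.lower() not in ACTIONS:
        raise ValueError(f"invalid request: outputs={bad}, action={action!r}")
    line = " ".join(["dump", *(o.lower() for o in outputs), action.lower()])
    target = Path(workdir) / f"{stem}.control"
    # Write under a temporary name first, then rename into place.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(line + "\n")
    os.replace(tmp, target)
    return target
```

For instance, write_control_file(run_dir, ["step"], "abort") would produce a sierra.control file containing dump step abort, matching the last example above.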
An alternate abbreviated syntax is also supported. The abbreviated commands are shown below along with the full commands to which they are mapped:
Abbreviated    Full command
sw1            dump restart shutdown
sw2            dump step continue
sw3            dump restart continue
sw4            dump results continue
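The mapping above is a simple lookup. As a minimal Python sketch (the names ABBREVIATIONS and expand_control_line are hypothetical, and matching the abbreviations without regard to case is an assumption based on the control file being case insensitive):

```python
ABBREVIATIONS = {
    "sw1": "dump restart shutdown",
    "sw2": "dump step continue",
    "sw3": "dump restart continue",
    "sw4": "dump results continue",
}


def expand_control_line(line: str) -> str:
    """Return the full command for an abbreviated line, else the line itself."""
    stripped = line.strip()
    # Assumption: abbreviations are matched case-insensitively.
    return ABBREVIATIONS.get(stripped.lower(), stripped)
```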
If either a shutdown file or a control file is used, a message is written to the log file giving the name of that file and, for a control file, its contents. The shutdown or control file is then deleted.
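Because the file is deleted once it has been processed, a script can treat its disappearance as a sign that the request was picked up. The following Python sketch polls for that; wait_until_consumed, its timeout, and its polling interval are hypothetical choices made for illustration.

```python
import time
from pathlib import Path


def wait_until_consumed(path, timeout=600.0, poll=1.0) -> bool:
    """Return True once the file is gone, False if the timeout elapses first."""
    path = Path(path)
    deadline = time.monotonic() + timeout
    while path.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

Since the files are only checked at the end of a completed time step, the timeout should allow for the duration of the longest expected solve. The log file remains the authoritative record of what was detected.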