9.9. Restart Data
BEGIN RESTART DATA <string>restart_name
#
# Database Commands
DATABASE NAME = <string>restart_file
INPUT DATABASE NAME = <string>restart_input_file
OUTPUT DATABASE NAME = <string>restart_output_file
DATABASE TYPE = <string>database_type(exodusII)
OVERWRITE = <string>OFF|ON|TRUE|FALSE|YES|NO
(ON|TRUE|YES)
PROPERTY = <string>propName = <string>propValue
RESTART TIME = <real>time
RESTART = AUTOMATIC
START TIME = <real>restart_start_time
TIMESTEP ADJUSTMENT INTERVAL = <integer>steps
AT TIME <real>time_begin INCREMENT =
<real>time_increment_dt
ADDITIONAL TIMES = <real>output_time1
<real>output_time2 ...
AT STEP <integer>step_begin INCREMENT =
<integer>step_increment
ADDITIONAL STEPS = <integer>output_step1
<integer>output_step2 ...
AT WALL TIME <real>time_begin[s|m|h|d] INCREMENT =
<real>time_increment_dt[s|m|h|d]
TERMINATION TIME = <real>termination_time_value
OVERLAY COUNT = <integer>overlay_count
CYCLE COUNT = <integer>cycle_count
FILE CYCLE COUNT = <integer>file_cycle_count
SYNCHRONIZE OUTPUT
USE OUTPUT SCHEDULER <string>scheduler_name
COMPONENT SEPARATOR CHARACTER = <string>character|NONE
OUTPUT ON SIGNAL = <string>SIGALRM|SIGFPE|SIGHUP|SIGINT|
SIGPIPE|SIGQUIT|SIGTERM|SIGUSR1|SIGUSR2|SIGABRT|
SIGKILL|SIGILL|SIGSEGV
OPTIONAL
DECOMPOSITION METHOD = RCB|RIB|LINEAR|HSFC|BLOCK|CYCLIC|
RANDOM|KWAY|GEOM_KWAY|KWAY_GEOM|METIS_SFC|EXTERNAL
SHIFT TO START TIME
END [RESTART DATA <string>restart_name]
The restart capability in Sierra/SM allows a simulation to resume after the analysis has terminated. Restart can be used to break a long analysis into several smaller runs so the user can examine intermediate results before proceeding. In case of abnormal termination, such as a machine outage, restart may be used to avoid rerunning the entire simulation. If restart data has been written at intervals throughout the analysis, the analysis can be restarted from a time before the abnormal termination occurred, and the simulation can continue from there.
In general, the restart capability assumes the simulation input parameters are the same before and after the restart. Therefore, it is highly recommended that parameters such as boundary conditions, element types, and material models remain the same for at least the first step of each “restart run.”
Restarting from an implicit simulation directly into an explicit simulation, or vice versa, is also strongly discouraged. If an implicit/explicit restart is desired, refer to Section 9.9.8 below.
Many input parameters may be modified in a restart, such as hourglass control or solver parameters. If an abnormal termination is encountered due to, for example, element inversion or an implicit solver convergence problem, it is recommended to modify these parameters to allow the simulation to continue past the abnormal termination time previously encountered.
The amount of data written at a restart time is large, since it contains a complete description of the state for the problem at that time. The restart data includes not only information such as displacement, velocity, and acceleration, but also information such as element stresses and all state variables for the material model associated with each element. Careful consideration should be used to choose the increment for writing restart files (so that storage space is not completely filled) and to determine if it is necessary to overwrite previous restart files.
The restart time, or automatic restart, can be specified within the SIERRA scope (see Section 11.2 for more details); the same commands are also available within the RESTART DATA command block.
In order to perform a basic restart, it is required to specify (i) the restart database name to read or write and (ii) either automatic restart or the time for restart to take place, for example:
BEGIN RESTART DATA RESTART_DATA
DATABASE NAME = g.rsout
Restart = auto
END RESTART DATA RESTART_DATA
Note that if this command block is used as shown, restart data will only be written at the last step of the analysis or if Sierra/SM detects an internal error such as element inversion. To write restart data more frequently, the AT TIME ... or AT STEP ... command should be used, for example:
BEGIN RESTART DATA RESTART_DATA
DATABASE NAME = g.rsout
Restart = auto
At Time 1e-4 Increment = 1e-5
END RESTART DATA RESTART_DATA
For more complex uses of restart, please read below for further details.
9.9.1. Restart Options
The RESTART DATA command block begins with the input line:
BEGIN RESTART DATA <string>restart_name
and is terminated with:
END [RESTART DATA <string>restart_name]
where restart_name is a user-selected name for the RESTART DATA command block.
Nested within the RESTART DATA command block are a set of command lines, as shown in the block summary given above.
DATABASE NAME = <string>restart_file
INPUT DATABASE NAME = <string>restart_input_file
OUTPUT DATABASE NAME = <string>restart_output_file
DATABASE TYPE = <string>database_type(exodusII)
OPTIONAL
The database name commands are used to specify the names of the files to read restart data from or write restart data to. If the OUTPUT DATABASE NAME command is used, restart data will be written to the filename specified. If the INPUT DATABASE NAME command is used, restart data will be read from the file with the specified filename. If the DATABASE NAME command is used, the filename specified will be used for both writing out restart data and reading in restart data. Either the DATABASE NAME command should be used alone, or the INPUT DATABASE NAME and OUTPUT DATABASE NAME command lines should be used, but not both. The DATABASE NAME command is more automated than specifying individual input and output database names and is the preferred method. Section 9.9.1.2 through Section 9.9.1.5 show how these command lines are used in specific instances.
See Section 9.2 for more details on the general database commands such as DATABASE NAME, DATABASE TYPE, OVERWRITE, etc.
In coupled physics analyses with multiple regions, only a subset of the regions may have a restart database associated with them. The OPTIONAL command (Section 9.9.1) is used to tell the application that it is acceptable to restart the analysis even though a region does not have an associated restart database. This is only allowed in analyses containing multiple regions; if there is only a single region, it must have a restart database to restart.
Certain simulation scenarios require a shift in time when restarting. The command
SHIFT TO START TIME
allows the code to ignore the time associated with the restart file; the time at the beginning of the restarted analysis is then controlled by the START TIME command in the TIME CONTROL input block. For example, suppose the restart file contains data at time \(t=3\,\mathrm{s}\), but the restarted analysis time needs to be shifted to \(t=0\,\mathrm{s}\) to correspond with an external time-dependent loading. The restart command SHIFT TO START TIME can be used in this scenario.
9.9.1.1. What Times to Write
Several options exist to write restart at specific points in the analysis.
The AT TIME and ADDITIONAL TIMES commands define restart in terms of analysis time.
The AT STEP and ADDITIONAL STEPS commands define restart in terms of analysis step number.
The AT WALL TIME command defines restart in terms of wall clock runtime. An example of the expected format is
AT WALL TIME 3600s INCREMENT = 60m
This example would write the first restart after 3600 seconds of runtime; then additional restart data will be written every 60 minutes thereafter. The units of time must be specified by s (seconds), m (minutes), h (hours), or d (days).
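The wall-time schedule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Sierra/SM's parser: the helper names `wall_time_seconds` and `wall_time_dumps` are hypothetical, and the sketch assumes lowercase unit suffixes exactly as listed in the text.

```python
import re

# Seconds per unit for the suffixes listed in the text: s, m, h, d.
_UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def wall_time_seconds(token: str) -> float:
    """Convert a token such as '3600s' or '60m' to seconds (hypothetical helper)."""
    match = re.fullmatch(
        r"\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)\s*([smhd])\s*", token
    )
    if match is None:
        raise ValueError(f"expected <real>[s|m|h|d], got {token!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def wall_time_dumps(begin: str, increment: str, runtime_limit_s: float) -> list:
    """Wall-clock seconds at which restart output would be requested.

    Assumption: requests fall at begin, begin + increment, begin + 2*increment, ...
    up to and including runtime_limit_s.
    """
    start = wall_time_seconds(begin)
    step = wall_time_seconds(increment)
    if step <= 0.0:
        raise ValueError("increment must be positive")
    times = []
    k = 0
    while start + k * step <= runtime_limit_s:
        times.append(start + k * step)
        k += 1
    return times
```

For the command line shown above, `wall_time_dumps("3600s", "60m", 4 * 3600)` lists the requested dump times for a four-hour job. The actual write still happens at the end of the step in progress when each wall time is reached.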
Note that restart can only be written at the end of a completed step. If a specified restart time \(t_s\) does not coincide with a simulation step time \(t_n\), restart data will be written at the end of the next analysis step, \(t_{n+1} \ge t_s\). For example, if an analysis has steps at times 0.1 and 0.2, asking for restart at time 0.125 will write the restart data at time 0.2 (the end of the next completed step). Additionally, restart will only be written at the end of a completed implicit loadstep.
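The rule that a requested restart time is honored at the end of the next completed step can be illustrated as follows. This is a minimal sketch, assuming the step end times are known in advance and sorted; the function name `restart_write_time` is hypothetical and is not part of Sierra/SM.

```python
from bisect import bisect_left


def restart_write_time(step_end_times, requested_time):
    """Return the first step end time >= requested_time.

    step_end_times must be sorted in increasing order. Returns None when the
    analysis ends before the requested time is reached.
    """
    i = bisect_left(step_end_times, requested_time)
    return step_end_times[i] if i < len(step_end_times) else None
```

With steps ending at 0.1 and 0.2, `restart_write_time([0.1, 0.2], 0.125)` returns 0.2, matching the example in the text. The comparison here is exact; a real time integrator would likely apply a tolerance when a requested time nearly coincides with a step time.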
9.9.1.2. Automatic Read and Write of Restart Files
Restart files can be tracked in an automated fashion by using the RESTART = AUTO command. Automatic restart can best be explained by an example. For the below example, assume the problem is run on two processors and all files are in the current working directory.
The automated restart option will not only manage the restart files to prevent overwriting, it will also manage the results files and history files to prevent overwriting. In the example given, assume the results file is defined with the command line
DATABASE NAME = rslt.e
where this command will generate the file rslt.e. Also assume history output is defined with the command line
DATABASE NAME = hist.h
to generate the file hist.h.
For all runs in this example, the command line
RESTART = AUTOMATIC
should be in either the SIERRA scope or the RESTART DATA command block. The RESTART TIME command line should not be used with RESTART = AUTOMATIC. For additional documentation of the RESTART TIME command, refer to Section 2.1.3.1.
Finally, assume the RESTART DATA command block is defined as follows:
BEGIN RESTART DATA RESTART_DATA
DATABASE NAME = g.rsout
AT TIME 0.0 INCREMENT = 0.25E-3
RESTART = AUTO #Optionally placed here or SIERRA scope
END RESTART DATA RESTART_DATA
In this block, the DATABASE NAME command line specifies a root file name for the restart file. The AT TIME command line gives the time and the increment for which restart information will be written.
Assume the first job is launched; the first run will generate the following output files:
# restart files
g.rsout.2.0
g.rsout.2.1
# results files
rslt.e.2.0
rslt.e.2.1
# history files
hist.h.2.0
hist.h.2.1
For the above files, the .2.0 and .2.1 extensions designate how the mesh was decomposed across the two processors.
Now, it is desired to start a second simulation using the restart data from the first, at the termination time of the previous run. The restart information in the input file will remain the same. In the TIME CONTROL command block, the only line that needs to be changed is the TERMINATION TIME command line. Even if the START TIME remains at 0.0, this time will be superseded by the last output time in the restart file.
The second job may then be launched; when it is completed, the following output files will be in the current working directory:
# restart files
g.rsout.2.0
g.rsout.2.1
g.rsout-s0002.2.0
g.rsout-s0002.2.1
# results files
rslt.e.2.0
rslt.e.2.1
rslt.e-s0002.2.0
rslt.e-s0002.2.1
# history files
hist.h.2.0
hist.h.2.1
hist.h-s0002.2.0
hist.h-s0002.2.1
Files that were previously written remain untouched; new files with the extension -s0002 contain the output data associated with the second run.
The process outlined above can be continued as long as necessary. As long as the RESTART = AUTO command remains, new output files with the -s* extension will continue to be generated and the previous files will not be overwritten.
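The naming pattern in this example can be summarized with a short Python sketch. It is inferred from the file listings above only: the helper `auto_restart_names` is hypothetical, the continuation of the sequence beyond the second run (for example -s0003) is an assumption, and zero-padding the processor rank to the width of the processor count is assumed from the usual convention for decomposed Exodus files.

```python
def auto_restart_names(base: str, run_number: int, num_procs: int) -> list:
    """File names expected for one run of an automatic restart sequence.

    Assumptions: the first run has no suffix, run k > 1 appends '-s' followed
    by a four-digit run number, and each file ends in '.<nprocs>.<rank>'.
    """
    if run_number < 1 or num_procs < 1:
        raise ValueError("run_number and num_procs must be at least 1")
    suffix = "" if run_number == 1 else f"-s{run_number:04d}"
    width = len(str(num_procs))  # assumed rank padding, e.g. '.16.00'
    return [
        f"{base}{suffix}.{num_procs}.{rank:0{width}d}" for rank in range(num_procs)
    ]
```

For the two-processor example, `auto_restart_names("g.rsout", 2, 2)` reproduces the restart file names written by the second run; the same call with `"rslt.e"` or `"hist.h"` gives the results and history names.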
9.9.1.3. User-Controlled Read and Write of Restart Files
Restart times and input/output database names may be specified using a combination of the RESTART TIME command line in the SIERRA scope and the INPUT DATABASE NAME and OUTPUT DATABASE NAME command lines in the RESTART DATA command block, allowing the user fine-grained control of restart data.
Note, however, that restart information is always automatically written upon element inversion.
User-controlled restart can best be explained by an example. We will use a two-processor example and assume all files will be in our current directory. Unlike the automated option, user-controlled restart requires the user to manage restart file names manually so that previously generated restart files are not overwritten. The user must also prevent existing results and history files from being overwritten; creating new results/history files for each run in the sequence of restart runs requires changing the DATABASE NAME command line in the RESULTS OUTPUT and HISTORY OUTPUT command blocks. Explicit examples of DATABASE NAME command line usage in the RESULTS OUTPUT and HISTORY OUTPUT command blocks are omitted, since they closely parallel the pattern for managing restart file names.
For the first run in the restart sequence, only a RESTART DATA command block is present in the region; there is no restart-related command line in the SIERRA scope of the input file. However, a
START TIME = 0.0
command line in a TIME STEPPING command block (within the TIME CONTROL command block) and a
TERMINATION TIME = 2.5E-3
command line within the TIME CONTROL command block are present to set the limits for the begin and end times. The RESTART DATA command block in the example input file is as follows:
BEGIN RESTART DATA RESTART_DATA
OUTPUT DATABASE NAME = RS1.rsout
AT TIME 0.0 INCREMENT = 0.5E-3
END RESTART DATA RESTART_DATA
In the first run, the restart option will generate the restart files:
RS1.rsout.2.0
RS1.rsout.2.1
The .2.0 and .2.1 file name extensions correspond to each processor’s mesh file in a two-processor run. If our mesh file is mesh.g, then our mesh files on the individual processors will be mesh.g.2.0 and mesh.g.2.1. All restart information in the above files appears at time intervals of \(0.5 \times 10^{-3}\), and the last restart information is written at time \(2.5 \times 10^{-3}\).
For the second run in the sequence of restart runs, it is desired to start at the previous termination time, \(2.5 \times 10^{-3}\), and terminate at time \(5.0 \times 10^{-3}\). To do this, the command line
RESTART TIME = 2.5E-3
must be added to the SIERRA scope, and the termination time must be set using the command line
TERMINATION TIME = 5.0E-3
within the TIME CONTROL command block.
The actual start time for the second run in our analysis is now set by the restart time set on the RESTART TIME command line, \(2.5 \times 10^{-3}\). The command line START TIME = 0.0 in the TIME STEPPING command block is now superseded as the actual starting time for the second run by the restart commands. Any START TIME command line in a TIME STEPPING command block is still valid for defining time stepping blocks (used to set activation periods), but the restart process sets the actual start time for our analysis. This pattern of control for setting the actual start time holds for any run in the sequence of restart runs.
The RESTART DATA command block must also be modified to read:
BEGIN RESTART DATA RESTART_DATA
INPUT DATABASE NAME = RS1.rsout
OUTPUT DATABASE NAME = RS2.rsout
AT TIME 0.0 INCREMENT = 0.5E-3
END RESTART DATA RESTART_DATA
For the second run, the files
RS1.rsout.2.0
RS1.rsout.2.1
are read, and the files
RS2.rsout.2.0
RS2.rsout.2.1
are written. All restart information in RS2.rsout.2.0 and RS2.rsout.2.1 appears at time intervals of \(0.5 \times 10^{-3}\), and restart information is written from times \(2.5 \times 10^{-3}\) to \(5.0 \times 10^{-3}\). The restart files from the first run of the restart sequence have been preserved, since distinct input and output database names, RS1.rsout and RS2.rsout, respectively, have been specified.
Now, a third run is conducted in the restart sequence, starting at time \(4.5 \times 10^{-3}\) and terminating at time \(8.5 \times 10^{-3}\). The second run terminated at time \(5.0 \times 10^{-3}\); thus, a restart time must be manually specified:
RESTART TIME = 4.5E-3
and the TERMINATION TIME command line within the TIME CONTROL command block must be specified:
TERMINATION TIME = 8.5E-3
The RESTART DATA command block should also read:
BEGIN RESTART DATA RESTART_DATA
INPUT DATABASE NAME = RS2.rsout
OUTPUT DATABASE NAME = RS3.rsout
AT TIME 0.0 INCREMENT = 0.5E-3
END RESTART DATA RESTART_DATA
The third run will read restart files:
RS2.rsout.2.0
RS2.rsout.2.1
and write restart data to files:
RS3.rsout.2.0
RS3.rsout.2.1
All restart information in RS3.rsout.2.0 and RS3.rsout.2.1 appears at time intervals of \(0.5 \times 10^{-3}\), and restart information is written from times \(4.5 \times 10^{-3}\) to \(8.5 \times 10^{-3}\). The restart files from the first and second runs of the restart sequence have been preserved, since distinct input and output database names have been specified.
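The bookkeeping in this three-run sequence follows a simple pattern: run k reads the database written by run k-1 and writes a new one. A minimal Python sketch of that pattern is given below. The function `restart_block` is hypothetical, and the RS<k>.rsout naming simply mirrors the example; any distinct names would serve.

```python
def restart_block(run_number: int, prefix: str = "RS", increment: str = "0.5E-3") -> str:
    """Text of a RESTART DATA block for run `run_number` of a manual sequence.

    The first run only writes restart data; later runs read the database
    written by the previous run and write to a new one.
    """
    if run_number < 1:
        raise ValueError("run_number must be at least 1")
    lines = ["BEGIN RESTART DATA RESTART_DATA"]
    if run_number > 1:
        lines.append(f"  INPUT DATABASE NAME = {prefix}{run_number - 1}.rsout")
    lines.append(f"  OUTPUT DATABASE NAME = {prefix}{run_number}.rsout")
    lines.append(f"  AT TIME 0.0 INCREMENT = {increment}")
    lines.append("END RESTART DATA RESTART_DATA")
    return "\n".join(lines)
```

Calling `print(restart_block(3))` produces a block equivalent to the one used for the third run. The RESTART TIME and TERMINATION TIME command lines, and the results and history database names, still have to be updated by hand for each run.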
9.9.1.4. Overwriting Restart Files
By default, every time an analysis is restarted, the output file to which the restart data is written will be overwritten. As previously stated, it is best to have a separate restart file (or files for parallel runs) associated with each run in a sequence of restart runs if storage space permits. To prevent a restart file from being overwritten, the OVERWRITE = OFF command should be used. If this command is used and a restart file already exists with the name given in the DATABASE NAME command line, Sierra/SM will throw an error on start-up. To fix this error, the name in the DATABASE NAME command should be changed.
The below example demonstrates how the restart data will be written:
For the first run, suppose the termination time is set as
TERMINATION TIME = 1.0E-3
and the RESTART DATA command block is as follows:
BEGIN RESTART DATA
DATABASE NAME = RS.out
AT TIME 0.0 INCREMENT = 0.25E-3
END RESTART DATA
The above command block will output restart data to the file RS.out at intervals of \(0.25 \times 10^{-3}\) from time 0.0 through the termination time.
For the second run, suppose the termination time is set as:
TERMINATION TIME = 2.0E-3
and the restart data block is:
BEGIN RESTART DATA
DATABASE NAME = RS.out
AT TIME 0.0 INCREMENT = 0.25E-3
RESTART TIME = 1.0E-3
END RESTART DATA
For this second run, restart information is read from the RS.out file(s) at time \(1.0 \times 10^{-3}\). Then, these files are overwritten with new restart data generated starting at time \(1.0 \times 10^{-3}\) at intervals of \(0.25 \times 10^{-3}\).
This pattern will continue for all analyses run in the set of restart runs.
9.9.1.5. Recovering from a Corrupted Restart
When performing a sequence of restart runs, a restart file can occasionally become corrupted (due to a system crash, for example). If the RESTART=AUTO option is exercised, restart will detect the corrupted entry and find the closest previous valid entry which can be used for restart.
A manual recovery can also be performed using the RESTART TIME command. Using this method, the user must select a restart time before the file corruption occurred. The INPUT DATABASE NAME and OUTPUT DATABASE NAME command lines should be used to avoid overwriting previous restart files. Additionally, the filenames should also be changed in the results and history command blocks to avoid overwriting those files.
9.9.2. Component Separator Character
COMPONENT SEPARATOR CHARACTER = <string>character|NONE
The COMPONENT SEPARATOR CHARACTER command can be used in the same manner it is used in the exodus output command block. See Section 9.3.1.7 for more details about this command.
9.9.3. Specifying Time Steps for Output
The same output step control commands available for results output are also available for restart output. See Section 9.3.1.9.
9.9.4. Reducing Restart File Size
As stated in the introduction, the amount of data written for restart output is significant; therefore, restart files can become large. The commands in this section may be used to limit restart file size. These commands may be used individually or collectively.
9.9.4.1. Overlay Count
OVERLAY COUNT = <integer>overlay_count
The OVERLAY COUNT command specifies the number of restart output times that will be overlaid on top of the current step before advancing to the next step. For example, suppose the overlay_count parameter \(n_o = 2\), and restart data is set to be written every \(\Delta t_{s} = 0.1\) second. At time 0.1 seconds, restart step 1 will be written to the output restart database. At time 0.2 seconds, restart information will be written over the step 1 information, which originally contained restart information at 0.1 seconds; similarly, step 1 information will once again be overwritten at time 0.3 seconds. At time 0.4 seconds, step 1 will now be complete and data will be written to step 2. At time 0.5 seconds, restart information will be written over the step 2 information, which originally contained information at 0.4 seconds. This process then continues throughout the simulation.
9.9.4.2. Cycle Count
CYCLE COUNT = <integer>cycle_count
FILE CYCLE COUNT = <integer>file_cycle_count
The CYCLE COUNT command line specifies the number of restart steps that will be written to the output restart database before previously written steps are overwritten. For example, suppose the cycle_count parameter \(n_c = 5\), and restart data is set to be written every \(\Delta t_{s} = 0.1\) second. The restart system writes information to the output restart database at times \(0.1,\, 0.2,\, \ldots,\, 0.5\) seconds. The information at step 1 (originally written at time 0.1) is overwritten with information at time 0.6. At time 0.7, step 2 is overwritten, etc. At time 0.8, the output restart database will contain information at times 0.6, 0.7, 0.8, 0.4, and 0.5 seconds; time will not necessarily be monotonically increasing on a restart database that uses CYCLE COUNT. If it is desired to only keep the last step available on the output restart database, CYCLE COUNT should be set to 1.
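The cycling behavior described above can be reproduced with a small Python sketch. It models the database as a list of step slots and is only an illustration of the bookkeeping; the function name `cycle_contents` is hypothetical.

```python
def cycle_contents(write_times, cycle_count):
    """Times held in each database step after all writes, with cycling.

    Write k (0-based) lands in slot k % cycle_count, so once cycle_count
    steps exist the oldest step is overwritten first.
    """
    if cycle_count < 1:
        raise ValueError("cycle_count must be at least 1")
    slots = []
    for k, time in enumerate(write_times):
        slot = k % cycle_count
        if slot < len(slots):
            slots[slot] = time  # overwrite a previously written step
        else:
            slots.append(time)  # database still growing
    return slots
```

Calling `cycle_contents([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], 5)` returns `[0.6, 0.7, 0.8, 0.4, 0.5]`, the non-monotonic ordering noted in the text.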
9.9.4.3. Example
As stated previously, the CYCLE COUNT and OVERLAY COUNT command lines can be used at the same time. The following example is used to explain that scenario:
It is assumed the overlay count \(n_{o} = 2\) and the cycle count \(n_{c} = 5\). Information is written to the output restart database every \(\Delta t_s = 0.1\) second. The times at which information is written to the output restart database are \(t_n = n \Delta t_s,\, n = 1,\, 2,\,\ldots,\,N\) seconds, where \(n\) is an output step and \(N\) is the number of steps. The overlay command will result in information at times \(t_i = i(n_{o}+1) \Delta t_s,\, i = 1,\,2,\,\ldots,\,n_{c}\) seconds written at steps \(i\) on the output restart database; in the example, these are times 0.3, 0.6, 0.9, 1.2, and 1.5, at steps 1, 2, 3, 4, and 5. For \(t_n > n_c(n_o+1)\Delta t_s\), or 1.5 seconds, the cycle command will take effect because \(n_{c} = 5\) different time steps are currently written on the output restart database. The overlay procedure then resumes at time \(t = (1+n_c(n_o+1))\Delta t_s\): information at subsequent times 1.6, 1.7, and 1.8 seconds overwrites the information at step 1, which previously held information at time 0.3 second. Information at times 1.9, 2.0, and 2.1 seconds overwrites the information at step 2, etc. For any output step \(n\), its corresponding restart output database step number \(n_{s}\) is given by
\[ n_{s} = 1 + \frac{(n-1) \mod \bigl(n_c(n_o+1)\bigr)}{n_o+1} \]
where \(i \mod j\) represents the modulo operator, or the remainder of \(i/j\). The quotient is evaluated using integer arithmetic (any fractional remainder is discarded). For example, if \(n=4\), \(n_c=5\), and \(n_o=2\), then \((4-1) \mod \bigl(5(2+1)\bigr) = 3\) and \(n_{s} = 1 + 3/3 = 2\).
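The mapping from output step to database step in this combined example can be checked numerically. The sketch below assumes the mapping takes the form 1 + ((n - 1) mod (n_c (n_o + 1))) / (n_o + 1) with integer division, which reproduces every value quoted in the example; the function name `database_step` is hypothetical.

```python
def database_step(n: int, cycle_count: int, overlay_count: int) -> int:
    """Database step that output step n (1-based) is written to.

    Each database step receives overlay_count + 1 consecutive writes, and the
    database cycles back to step 1 after cycle_count steps have been filled.
    """
    if n < 1 or cycle_count < 1 or overlay_count < 0:
        raise ValueError("n and cycle_count must be >= 1, overlay_count >= 0")
    writes_per_step = overlay_count + 1
    return 1 + ((n - 1) % (cycle_count * writes_per_step)) // writes_per_step
```

With \(n_c = 5\) and \(n_o = 2\), output steps 1 through 3 map to database step 1, step 4 maps to step 2, step 15 (time 1.5) maps to step 5, and step 16 (time 1.6) wraps back to step 1.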
The FILE CYCLE COUNT command is related to CYCLE COUNT; using this option, each restart dump will be written to a separate file suffixed with A, B, etc., where the count specifies how many separate files are written before the cycle repeats. For example, FILE CYCLE COUNT = 3 would cycle through file-A.rs, file-B.rs, file-C.rs, file-A.rs, etc. The maximum value for the file cycle count is 26.
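The letter cycling used by FILE CYCLE COUNT can be sketched as below. Only the choice of suffix letter is modeled; how the letter is spliced into the file name is left out, since the text gives a single example. The function name `file_cycle_suffix` and the 0-based dump index are assumptions made for illustration.

```python
import string


def file_cycle_suffix(dump_index: int, file_cycle_count: int) -> str:
    """Suffix letter for the dump_index-th restart dump (0-based).

    The limit of 26 corresponds to the letters A through Z.
    """
    if not 1 <= file_cycle_count <= 26:
        raise ValueError("file_cycle_count must be between 1 and 26")
    if dump_index < 0:
        raise ValueError("dump_index must be non-negative")
    return string.ascii_uppercase[dump_index % file_cycle_count]
```

For a count of 3, the first four dumps receive the suffixes A, B, C, and A again, matching the sequence in the text.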
9.9.5. Synchronize Output
SYNCHRONIZE OUTPUT
The SYNCHRONIZE OUTPUT command can be used in the same manner it is used in the exodus output command block. See Section 9.3.1.10 for more details about this command.
9.9.6. Use Output Scheduler
USE OUTPUT SCHEDULER <string>scheduler_name
The USE OUTPUT SCHEDULER command can be used in the same manner it is used in the exodus output command block. See Section 9.3.1.11 for more details about this command.
9.9.7. Write Restart If System Error Encountered
OUTPUT ON SIGNAL = <string>SIGALRM|SIGFPE|SIGHUP|SIGINT|
SIGPIPE|SIGQUIT|SIGTERM|SIGUSR1|SIGUSR2|SIGABRT|
SIGKILL|SIGILL|SIGSEGV
The OUTPUT ON SIGNAL command line is used to initiate the writing of a restart file when the computer system encounters an error. Only one error type in the list of error types should be entered for this command line. These system errors cause the code to terminate before the code can add any current restart output (restart output past the last restart output time step) to the restart file. If the code encounters the specified error type during execution, a restart file will be written before execution is terminated. This command line can also be used to force the writing of a restart file at some point during execution of the code. Suppose the command line
OUTPUT ON SIGNAL = SIGUSR2
is included in the input file. While the code is running, a user can execute (from the keyboard) the system command line
kill -s SIGUSR2 <pid>
to terminate execution and force the writing of a restart file. In the above system command line, pid is the process identifier, an integer.
The most useful application of the command line is to send a signal via a system command line to write a restart file. The OUTPUT ON SIGNAL command line is primarily a debugging tool for code developers.
9.9.8. Restarting an Implicit/Explicit Analysis
Restarting from an implicit analysis directly into an explicit analysis (or vice versa) is not expected to work correctly; Sierra/SM uses different default settings for the two simulation types, and internal modifications are required when transitioning from a simulation which does not consider mass/inertial effects (implicit quasistatics) to one that does (explicit dynamics), or vice versa. To start an explicit simulation from a restart file generated by an implicit simulation, for example, the following procedure should be followed:
Suppose the input file used to generate the implicit restart file is called implicit.i, and the file that contains the input commands for the explicit simulation is called explicit.i. Before restarting into the explicit simulation, the contents of the adagio procedure in implicit.i must be copied and pasted into explicit.i ahead of the presto procedure; the two procedures should have different names. If using the RESTART=AUTO option, the restart block in the implicit procedure may remain unchanged. If manually specifying the time at which to restart, the RESTART TIME should be set to the termination time of the implicit procedure (or an earlier time, if desired). A procedural transfer block should then be added to the presto procedure performing the explicit dynamics portion of the run. The simplest and most conventional procedural transfer block for this purpose is as follows:
BEGIN PROCEDURAL TRANSFER trans
INCLUDE ALL BLOCKS
END
For more information on procedural transfers, see Section 11.2. This pattern should always be followed when restarting between different analysis types (e.g., Implicit Quasistatics to Implicit Dynamics, Explicit Dynamics to Implicit Quasistatics, Explicit Dynamics to Implicit Dynamics, etc.).