6.1. Running Aria

6.1.1. Loading the Sierra Module

Once you have finished setting up an input file, you are ready to run Aria. From a CEE or HPC UNIX environment at Sandia, you can load the sierra module to access the latest release of Aria.

$ module load sierra

This will load the current release of Sierra. To load other versions of Sierra, you can use the following modules

  • module load sierra/x.x - Load version x.x (e.g. module load sierra/5.10)

  • module load sierra/sprint - Load the latest sprint release (released every three weeks)

  • module load sierra/daily - Load the daily build of Sierra.

Warning

Using the sierra/daily module exposes you to potential bugs and instabilities since it is actively developed. If the nightly Sierra build process fails, the sierra/daily executable may not exist, or may be much older than expected.
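
If you already have one version of Sierra loaded and want to switch to another, unload the current module before loading the new one. For example, using standard module commands (the available version names depend on what is installed on your machine):

$ module unload sierra
$ module load sierra/sprint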

To see a list of the available Sierra versions (and other useful modules such as apps/anaconda3 or apps/matlab) you can use

$ module avail
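
To narrow the listing to just the Sierra modules, or to check which modules you currently have loaded, the standard module subcommands can be used:

$ module avail sierra   # list only the sierra modules
$ module list           # show currently loaded modules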

6.1.2. Running Aria Locally

To run a job on a non-queued system (e.g. a CEE blade or a CEE compute machine) you can call launch or mpirun. For example, to run a job on 4 processors, you would use

$ module load sierra
$ launch -n 4 aria -i demo.i

The launch command is usually equivalent to using mpirun for local execution, but handles setting required MPI flags when running on some HPC systems.

$ module load sierra
$ mpirun -np 4 aria -i demo.i
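
For longer local runs it can be convenient to send the output to a log file and keep the job running after you log out. This is plain shell usage rather than anything Aria-specific (the log file name demo.log is arbitrary):

$ nohup mpirun -np 4 aria -i demo.i > demo.log 2>&1 &
$ tail -f demo.log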

6.1.3. Using the Sierra Script

The sierra script included in the sierra modules provides some additional functionality for launching Sierra jobs. Its use is similar to the launch and mpirun commands.

$ sierra --np 4 aria -i demo.i

By default, the sierra script will perform extra steps that are not necessary for running Aria. These include:

  • Reading the input file to find the mesh file and running decomp on it beforehand.

  • Reading the input file to find the output files and running epu on them after the simulation is done.

Aria will automatically decompose your mesh, which is usually faster than running decomp manually, and most visualization tools can read decomposed output files directly, so running epu to combine them into a single file is usually not needed either. To use the sierra script without invoking those steps, add the --run option.

$ sierra --run --np 4 aria -i demo.i
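
If you do want to decompose a mesh or recombine parallel results yourself, the decomp and epu utilities are available once the sierra module is loaded. A minimal sketch, assuming a mesh file demo.g and an output file demo.e from a 4-processor run (check decomp --help and epu --help for the exact options on your system):

$ decomp -p 4 demo.g    # split demo.g into demo.g.4.0 ... demo.g.4.3
$ epu -auto demo.e.4.0  # merge the 4 output pieces back into a single demo.e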

The sierra script can also be used to build user plugins using the --make option. See the sections on user plugins and user subroutines for more information.

$ sierra --make aria -i usersub_demo.i

6.1.4. Running Aria on an HPC

The HPC systems at Sandia use Slurm to schedule jobs. To run a job on an HPC you need to submit it to Slurm along with some additional information (listed below); the system will put your job in the queue and run it at some point in the future.

  • The number of compute nodes to run on.

  • The number of cores to use per node.

  • The wall-clock duration of the job (Slurm will kill the job if it is not finished within this time limit).

  • A WCID for the job based on the project funding it. The WCID is used for tracking purposes and also determines the job priority. Use the WC Tool web site to check your WCIDs or get a new one.

  • Which queue to submit to (most HPCs have “batch”, “short”, and “long” - “batch” is the standard).

Refer to the HPC homepage for details about queue limits, core counts per node, and other useful HPC information for the machine you intend to run on.
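
Some of this information can also be queried directly from a login node using standard Slurm commands; the format string below is just one way to show the time limit, node count, and cores per node of a partition:

$ sinfo
$ sinfo -p batch -o "%P %l %D %c"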

If you are using SAW to submit your jobs, it can handle collecting the required information and submitting the job to the queue. If you do not use SAW, you can submit your jobs manually using either a batch script or the sierra script.

For simple job submissions you can use the sierra script directly after logging in to the HPC you want to run on. For example, to run Aria on 360 processors for up to 24 hours you would log in to the HPC and run the following command.

$ sierra --run --np 360 aria -i demo.i --account WCID --queue-name batch --time-limit 24:00:00

For more complicated submissions, you may need to prepare a batch script to perform any custom pre-processing steps you need. There are example submission scripts for different platforms in /projects/samples at Sandia. An example script to run Aria is shown below.

#!/bin/bash
#SBATCH --nodes=10                    # Number of nodes
#SBATCH --ntasks-per-node=36          # Number of cores per node
#SBATCH --time=24:00:00               # Wall clock time (HH:MM:SS)
#SBATCH --account=PUT_YOUR_WCID_HERE  # WC ID
#SBATCH --job-name=test               # Name of job
#SBATCH --partition=batch             # partition/queue name: short or batch

nodes=$SLURM_JOB_NUM_NODES
cores=36

# do any pre-processing steps you need

mpiexec --bind-to core --npernode $cores --n $(($cores*$nodes)) aria -i demo.i

If you saved the above script as run_aria then you would submit it to the queue using

$ sbatch run_aria

You can check on the status of your queued jobs using squeue -u myusername. To see an estimate of when they will start, use squeue -u myusername --start.
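
Two other standard Slurm commands are often useful: scancel to remove a job from the queue and sacct to review jobs that have already run (the job id 123456 below is a placeholder for the id reported by squeue):

$ scancel 123456
$ sacct -u myusername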

6.1.5. Running MPMD Jobs

The command used to run in MPMD mode is different from the one used to run Aria normally. For a standard Aria case you simply launch a parallel Aria job using one of the commands discussed above, e.g.

$ mpirun -np 100 aria -i input.i

In this case one executable, Aria, is using 100 cores. With MPMD runs you launch two separate MPI jobs running two different codes that can communicate with each other. An example MPMD launch command would look like

$ mpirun -np 100 aria --mpmd_master --mpi_color 9999 \
    -i input.i : -np 100 follower_app [follower_app args...]

The mpmd_master flag indicates that Aria will control the execution of the simulation (the “leader”) while the other application “follows” Aria. Aria can be run as a follower by omitting this flag, e.g.

$ mpirun -np 100 aria --mpi_color 9999 \
    -i input.i : -np 100 leader_app [leader_app args...]

Note that the order of the two apps does not matter. The mpi_color argument specifies an id used to label the cores executing Aria so they can be distinguished from the cores executing the coupled application. The id can be any integer, but it must be unique across the coupled apps.

There is no requirement that the two codes use the same number of cores, so depending on the mesh and computational costs you may choose a different allocation per code. For example, if your coupled app is very expensive, you may allocate more cores to it than Aria:

$ mpirun -np 50 aria --mpmd_master --mpi_color 9999 \
    -i input.i : -np 150 other_app [other_app args...]

Special care must be taken when submitting MPMD jobs on the HPCs or any queued environment. By default, the two MPMD codes cannot share cores, so to launch the case above on an HPC you would need to request an allocation of 200 cores. This is unnecessarily wasteful, though, if the coupled app is not using its 150 cores while Aria runs and Aria is not using its 50 cores while the coupled app runs. To get around this, you must enable oversubscription. For couplings where the two apps run sequentially (never executing at the same time) you can allow them to share resources. For example, to get an allocation of 100 cores and use all of them for both codes you must add additional MPI flags:

$ mpiexec --oversubscribe --bind-to core:overload-allowed -np 100  \
    aria --mpmd_master --mpi_color 9999 -i input.i : \
    --bind-to core:overload-allowed -np 100 other_app [other_app args...]

Keep in mind that the specific command to use can be platform dependent. A more complete example submission script on an HPC may look like

#!/bin/bash

#SBATCH --nodes=10
#SBATCH --time=48:00:00
#SBATCH --account=PUT_YOUR_WCID_HERE
#SBATCH --job-name=aria
#SBATCH --partition=batch

nodes=$SLURM_JOB_NUM_NODES
cores=36

module load sierra
export OMPI_MCA_rmaps_base_oversubscribe=1
mpiexec --oversubscribe                                        \
  --bind-to core:overload-allowed                              \
  --npernode $cores --n $(($cores*$nodes)) aria --mpmd_master  \
  --mpi_color 9999 -i input.i :                                \
  --bind-to core:overload-allowed                              \
  --npernode $cores --n $(($cores*$nodes)) other_app [other_app args...]

Contact sierra-help@sandia.gov if you need more help or encounter issues running MPMD jobs.