Tutorial (OAR primer)

Note

The source code of the examples is located in /soft/igrida/examples.

In this OAR tutorial section, you will learn how to launch:

  1. an interactive job using 1 core (default)
  2. an interactive job using n cores
  3. an interactive job on 2 compute nodes (useful for MPI development & debugging, not for production!)
  4. a batch job using one single core
  5. same as before, but in best-effort mode
  6. an interactive parallel run on 1 compute node, using MPI (useful for development & debugging, not for production!)
  7. an interactive parallel run on 2 compute nodes, using MPI (idem)
  8. a batch parallel run on 2 compute nodes, using MPI (appropriate for production jobs)
  9. a simple job array of multi-threaded runs, using OpenMP
  10. a parametric job array of sequential runs

Please note that some useful OAR features are not yet covered in this tutorial (see the reading suggestions below):

  • job containers (useful e.g. for organizing training sessions)
  • moldable jobs
  • etc.

Interactive job using 1 core

To launch an interactive job using 1 core (default), just type:

> oarsub -I
[ADMISSION RULE] Modify resource description with type constraints
[ADMISSION RULE] Set default walltime to 10 minutes.
[ADMISSION RULE](15) stdout : /temp_dd/igrida-fs1/scampion/OAR_%jobid%.stdout
[ADMISSION RULE](15) stderr : /temp_dd/igrida-fs1/scampion/OAR_%jobid%.stderr
Generate a job key...
OAR_JOB_ID=17036
Interactive mode : waiting...
Starting...

Connect to OAR job 17036 via the node igrida05-01.irisa.fr

You are automatically logged in to an interactive node, where you only have access to one core. The information echoed by oarsub includes the default walltime (10 minutes in the output above, as configured on the cluster).

To then visualize and monitor your jobs, please have a look at the following links:

  • The Monika web page. This page provides an overview of the cluster load (compute nodes). Running jobs are highlighted, whereas unused resources are indicated as “Free”. You can recognize your own job ID in one of the highlighted cluster cells. Clicking on this cell displays more information.
  • The DrawGantt web page. This page gives an overview of all submitted jobs, along with their estimated scheduling time (for jobs not yet running). The current time is indicated with a vertical red line. You should find your own one-core interactive job somewhere on this Gantt diagram, and placing the mouse cursor over any depicted job shows some details. Note that each line of this Gantt diagram represents one individual core. These lines are grouped by CPU, and CPUs are grouped by hostname.

To terminate your interactive job, just type exit in your terminal window:

> exit
Connection to igrida05-01.irisa.fr closed.
Disconnected from OAR job 17036

Interactive job using n cores

You may proceed similarly to reserve n cores for an interactive job. If n is large enough, the batch system may reserve cores on several nodes for you. For instance, to reserve 6 cores:

oarsub -I -l core=6
...
...
igrida02-09%

Warning

This command does NOT require the 6 cores to be on the same node!

When the system allocates resources that are distributed over several nodes, you are interactively logged in to one of these nodes. The detailed list of your 6 cores may be obtained by typing:

igrida02-09%cat $OAR_NODEFILE
igrida02-09.irisa.fr
igrida02-09.irisa.fr
igrida02-10.irisa.fr     --> note here that the 6 allocated cores
igrida02-10.irisa.fr         are NOT on the same node!
igrida02-10.irisa.fr
igrida02-10.irisa.fr
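
Since the node file contains one hostname per reserved core, a quick per-node summary can be obtained with standard tools. The following is only a minimal sketch: cores_per_node is a hypothetical helper name (not an OAR command), and the script is written for sh, so save it to a file and run it with sh if your login shell is csh.

```shell
#!/bin/sh
# cores_per_node: hypothetical helper, not part of OAR.
# Prints one line per host with the number of cores reserved on it.
cores_per_node() {
    # The node file lists one hostname per reserved core,
    # so counting duplicates gives the cores per node.
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 " core(s)" }'
}

# Only meaningful inside an OAR job, where OAR_NODEFILE is set.
if [ -n "${OAR_NODEFILE:-}" ] && [ -r "$OAR_NODEFILE" ]; then
    cores_per_node "$OAR_NODEFILE"
fi
```

For the allocation listed above, this would report 2 cores on igrida02-09.irisa.fr and 4 cores on igrida02-10.irisa.fr.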

You can connect to any of the other nodes listed in this file with the oarsh command. Note that you do not need to give any password, since ssh keys are automatically installed by OAR for the lifetime of your job:

oarsh igrida02-10
...
exit #to leave this ssh connection

Have a look at the Monika and DrawGantt pages (do not forget to refresh them) to visualize your new interactive job.

If you want 10 cores on the same server (the most common situation!), use this command:

oarsub -I -l /nodes=1/core=10

Warning

Please always remember to leave any OAR interactive job by typing:

exit


Interactive job on 2 compute nodes

Useful for MPI development & debugging, not for production!

Staying logged into the front-end node, you can now try to reserve 2 full compute nodes for an interactive job (typically for MPI debugging):

oarsub -I -l /nodes=2
...
...
cat $OAR_NODEFILE

Do not forget to leave this interactive job with the simple exit command:

exit

Batch job using one single core

Let us now launch a first batch job, on a single core.

As a good practice, we recommend that you create a SCRATCHDIR directory, located in /temp_dd/igrida-fs1/..., where the run will take place. In particular, your run outputs should never be written to your home directory, since this would sooner or later cause an NFS crash. For instance, type:

mkdir -p /temp_dd/igrida-fs1/$USER/SCRATCH/

and create the SCRATCHDIR variable in your environment files. For example, put the line:

setenv SCRATCHDIR /temp_dd/igrida-fs1/$USER/SCRATCH

in your $HOME/.cshrc_perso (in the IRISA network). For this new environment variable to be taken into account, just type:

source $HOME/.cshrc

An equivalent brute-force way to proceed is to leave your front-end session (with the exit command) and connect to igrida again. Either way, check that the new variable was properly taken into account:

echo $SCRATCHDIR
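
Beyond echoing the variable, it can be worth checking that the directory really exists and is writable before submitting jobs that depend on it. The sketch below is one possible check, not a site-provided tool: check_scratch is a hypothetical helper name, and the script is written for sh (run it with sh if your login shell is csh).

```shell
#!/bin/sh
# check_scratch: hypothetical helper, not a site-provided command.
# Succeeds only if the given directory exists and is writable.
check_scratch() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "SCRATCHDIR is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "$dir is missing or not writable" >&2
        return 1
    fi
    echo "scratch directory OK: $dir"
}

# Check the directory named by the SCRATCHDIR environment variable.
check_scratch "${SCRATCHDIR:-}" || true
```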

Then copy the following first-job-with-oar.sh example script somewhere in your home directory, and replace my_login with your own LDAP identifier:

#!/bin/sh
#OAR -l core=1,walltime=00:05:00
#OAR -O /temp_dd/igrida-fs1/my_login/SCRATCH/fake_job.%jobid%.output
#OAR -E /temp_dd/igrida-fs1/my_login/SCRATCH/fake_job.%jobid%.error
set -xv

echo
echo OAR_WORKDIR : $OAR_WORKDIR
echo
echo "cat \$OAR_NODE_FILE :"
cat $OAR_NODE_FILE
echo

echo "
##########################################################################
# Where will your run take place ?
#
# * It is NOT recommended to run in $HOME/... (especially to write),
#   but rather in /temp_dd/igrida-fs1/...
#   Writing directly somewhere in $HOME/... will necessarily cause NFS problems at some time.
#   Please respect this policy.
#
# * The program to run may be somewhere in your $HOME/... however
#
##########################################################################
"

TMPDIR=$SCRATCHDIR/$OAR_JOB_ID
mkdir -p $TMPDIR
cd $TMPDIR

#EXECUTABLE=$HOME/some/where/my_program.exe

echo "pwd :"
pwd

echo
echo "=============== RUN ==============="

#-- FAKE RUN EXECUTION
echo "Running ..."
sleep 60   # fake job, 1 minute

#-- FAKE RUN OUTPUTS
cat > my_program_summary.out <<EOF
For example, some short solver statistics are summarized here.
1.e-10
1.e-13
1.e-14
1.e-16
Converged
EOF

echo "Done"
echo "==================================="

#-- ECHO SOME SUMMARY OUTPUTS OF THE RUN IN THE ***.output FILE
echo
echo "cat my_program_summary.out"
echo "---------------------"
cat my_program_summary.out
echo "---------------------"
echo
echo OK

Make this script executable:

chmod u+x first-job-with-oar.sh

Then, you are ready to submit the job to OAR in batch mode:

oarsub -S first-job-with-oar.sh
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=1679
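
The job ID is needed later to locate the output files or to delete the job, so it can be convenient to capture it instead of copying it by hand. The following sketch assumes that oarsub prints the ID on a line of the form OAR_JOB_ID=<number>, as in the output above; extract_job_ids is a hypothetical helper name, and the script is written for sh.

```shell
#!/bin/sh
# extract_job_ids: hypothetical helper, not part of OAR.
# Reads oarsub output on stdin and prints each job ID found on a
# line of the form "OAR_JOB_ID=<number>".
extract_job_ids() {
    sed -n 's/^OAR_JOB_ID=\([0-9][0-9]*\)$/\1/p'
}

# Demonstration on the output shown above, fed in as literal text:
printf '%s\n' \
    '[ADMISSION RULE] Modify resource description with type constraints' \
    'Generate a job key...' \
    'OAR_JOB_ID=1679' | extract_job_ids
```

In a real session, the same filter could be applied to the submission itself, e.g. job_id=$(oarsub -S first-job-with-oar.sh | extract_job_ids) in an sh-compatible shell.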

To check your job status:

oarstat

or with more verbose outputs:

oarstat -f

To watch your run progress, you may look at the job output file:

tail -n 100 -f $SCRATCHDIR/fake_job.nnnn.output

where you need to replace nnnn with the OAR_JOB_ID echoed by your previous oarsub command.

Note that the “-f” option causes tail to not stop when end of file is reached, but rather to wait for additional data to be appended to the file. Therefore, you need to type Control-C in the terminal window to finish this command.

In a similar manner, you can have a look at the job error file:

cat $SCRATCHDIR/fake_job.nnnn.error

Refresh your Monika and DrawGantt web pages to visualize your batch job scheduling.

In case you want to delete your job, just type:

oardel nnnn

with nnnn as above.

Same as before, but in best-effort mode

To use best-effort mode (the job can use a lot of resources, but can be killed at any time; see 3. Best effort computing), use:

oarsub -t besteffort -t idempotent -S first-job-with-oar.sh
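
Because a best-effort job can be killed at any time and may then be started again, it helps if running the job script a second time is harmless. One common way to achieve this is to record each completed step with a marker file and skip it on the next run. The sketch below illustrates the idea only; run_once and the marker file names are hypothetical, and where to keep the markers is an assumption (a restarted job may get a new job ID, so a directory named after $OAR_JOB_ID would not be found again).

```shell
#!/bin/sh
# run_once: hypothetical helper illustrating an idempotent step.
# Runs the command only if its marker file does not exist yet, and
# creates the marker when the command succeeds.
run_once() {
    marker=$1
    shift
    if [ -e "$marker" ]; then
        echo "skipping (already done): $*"
        return 0
    fi
    "$@" && : > "$marker"
}

# Demo in a throwaway directory; a real job would keep its markers
# in a stable directory of its own (an assumption, not a site rule).
demo_dir=$(mktemp -d)
run_once "$demo_dir/step1.done" echo "running step 1"
run_once "$demo_dir/step1.done" echo "running step 1"   # skipped this time
rm -rf "$demo_dir"
```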

Interactive parallel run on 1 compute node, using MPI

Useful for development & debugging, not for production!

Let’s now launch a simple interactive parallel run on 1 node using MPI.

oarsub -I -p "infiniband='YES'" -l nodes=1

Copy the following hello-world-mpi.c example (taken from Wikipedia):

/*
 *   "Hello World" MPI Test Program
 *
 */
 #include <mpi.h>
 #include <stdio.h>
 #include <string.h>
 
 #define BUFSIZE 128
 #define TAG 0
 
 int main(int argc, char *argv[])
 {
   char idstr[32];
   char buff[BUFSIZE];
   int numprocs;
   int myid;
   int i;
   MPI_Status stat;
 
   MPI_Init(&argc,&argv); /* all MPI programs start with MPI_Init; all 'N' processes exist thereafter */
   MPI_Comm_size(MPI_COMM_WORLD,&numprocs); /* find out how big the SPMD world is */
   MPI_Comm_rank(MPI_COMM_WORLD,&myid); /* and find out this process's rank */
 
   /* At this point, all programs are running equivalently, the rank distinguishes
 *       the roles of the programs in the SPMD model, with rank 0 often used specially... */
   if(myid == 0)
   {
     printf("%d: We have %d processes\n", myid, numprocs);
     for(i=1;i<numprocs;i++)
     {
       sprintf(buff, "Hello %d! ", i);
       MPI_Send(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD);
     }
     for(i=1;i<numprocs;i++)
     {
       MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
       printf("%d: %s\n", myid, buff);
     }
   }
   else
   {
     /* receive from rank 0: */
     MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
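     sprintf(idstr, "Process %d ", myid);
     /* strncat's last argument is the maximum number of characters to append,
        so pass the space remaining in buff rather than its total size */
     strncat(buff, idstr, BUFSIZE - strlen(buff) - 1);
     strncat(buff, "reporting for duty\n", BUFSIZE - strlen(buff) - 1);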
     /* send to rank 0: */
     MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
   }
 
   MPI_Finalize(); /* MPI Programs end with MPI Finalize; this is a weak synchronization point */
   return 0;
 }

Load the MPI environment module:

module load openmpi

To compile and link your code with the MPI library, do NOT directly use the cc compiler, but rather the mpicc wrapper instead:

mpicc ./hello-world-mpi.c -o hello-world-mpi

To now run your code using MPI, proceed through the following steps.

Have a look at the nodes which OAR reserved for you:

cat $OAR_NODEFILE

You may attempt to run your code as usual by typing:

./hello-world-mpi

This works, but such a call starts a single process only, so the program simply runs sequentially. To run in parallel with MPI, you must use the mpirun launcher.

For example, in your current interactive session, you have reserved one node (4 cores). If you run with only 2 processes, you use only 2 cores and the other 2 remain idle:

mpirun -np 2 ./hello-world-mpi

To fully use your 4 cores, it is therefore recommended, in this case, to type:

mpirun -np 4 ./hello-world-mpi
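
Rather than hard-coding the process count, it can be derived from the node file, which contains one line per reserved core. This is a minimal sketch under that assumption; reserved_cores is a hypothetical helper name, and the script is written for sh. It only prints the command it would run, so that it can be tried safely.

```shell
#!/bin/sh
# reserved_cores: hypothetical helper, not part of OAR.
# Counts the lines of a node file, i.e. the number of reserved cores.
reserved_cores() {
    awk 'END { print NR }' "$1"
}

# Only meaningful inside an OAR job, where OAR_NODEFILE is set.
if [ -n "${OAR_NODEFILE:-}" ] && [ -r "$OAR_NODEFILE" ]; then
    np=$(reserved_cores "$OAR_NODEFILE")
    echo "would run: mpirun -np $np ./hello-world-mpi"
fi
```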

You may even ask for more processes than the number of available cores:

mpirun -np 10 ./hello-world-mpi

Note that in this case your 10 processes compete for the 4 cores of the node, which would give poor performance in a real situation.

Now leave your interactive job:

exit

Interactive parallel run on 2 compute nodes, using MPI

Let’s now launch the same MPI example on 2 nodes, interactively (useful for debugging, not for production). Note that the allowed walltime for interactive sessions is quite short. To reserve 2 nodes for an interactive session, type:

oarsub -I -l nodes=2 -p "infiniband='YES'"

and then:

module load openmpi
mpirun -machinefile $OAR_NODEFILE ./hello-world-mpi

Do not worry about the warning messages. Your code has indeed run on 8 cores, and over 2 nodes.


Batch parallel run on 2 compute nodes, using MPI

Appropriate for production jobs

Let’s now launch the previous job in batch mode instead of an interactive session (more appropriate for production jobs). Copy the following second-job-with-oar.sh script example somewhere in your home directory:

#!/bin/sh
#OAR -l nodes=2,walltime=00:03:00
#OAR -O /temp_dd/igrida-fs1/my_login/SCRATCH/test_mpi_2nodes.%jobid%.output
#OAR -E /temp_dd/igrida-fs1/my_login/SCRATCH/test_mpi_2nodes.%jobid%.error
set -xv

TMPDIR=$SCRATCHDIR/$OAR_JOB_ID
mkdir -p $TMPDIR
cd $TMPDIR

echo $OAR_NODEFILE :
cat $OAR_NODEFILE
echo

EXECUTABLE=$OAR_WORKDIR/hello-world-mpi  # or put absolute path

cd $TMPDIR

echo "============= MPI RUN ============="
mpirun --mca plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE $EXECUTABLE
echo "==================================="

echo OK

Replace my_login with your own LDAP identifier, and make the script executable by typing:

chmod u+x second-job-with-oar.sh

and then submit your job in batch mode (i.e. with the “-S” option of the oarsub command):

oarsub -S second-job-with-oar.sh

Check the output file as previously:

cat $SCRATCHDIR/test_mpi_2nodes.nnnn.output

where nnnn denotes the job ID echoed by the oarsub command.


Simple job array of multi-threaded runs, using OpenMP

To conclude this “First steps” section, let’s play with job arrays. Two types of job arrays are available within OAR, either simple or parametric job arrays.

Assume you want to run a simple job array of multi-threaded runs. At first, copy the following hello-world-omp.c source code:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* for sleep() */
 
int main (int argc, char *argv[]) {
  int th_id, nthreads;
  #pragma omp parallel private(th_id)
  {
    th_id = omp_get_thread_num();
    printf("Hello World from thread %d\n", th_id);
    sleep(60);
    #pragma omp barrier
    if ( th_id == 0 ) {
      nthreads = omp_get_num_threads();
      printf("There are %d threads\n",nthreads);
    }
  }
  return EXIT_SUCCESS;
}

and compile it:

cc hello-world-omp.c -fopenmp -o hello-world-omp

Copy the associated array-job-with-oar.sh job script, substitute my_login and make the shell script executable (just as done before):

#!/bin/sh
#OAR -l core=2,walltime=00:05:00
#OAR --array 10
#OAR -O /temp_dd/igrida-fs1/my_login/SCRATCH/array_job.%jobid%.output
#OAR -E /temp_dd/igrida-fs1/my_login/SCRATCH/array_job.%jobid%.error
set -xv

echo
echo OAR_WORKDIR : $OAR_WORKDIR
echo
echo OAR_JOB_ID : $OAR_JOB_ID
echo
echo "cat \$OAR_NODE_FILE :"
cat $OAR_NODE_FILE
echo

TMPDIR=$SCRATCHDIR/$OAR_JOB_ID
mkdir -p $TMPDIR
cd $TMPDIR

EXECUTABLE=$OAR_WORKDIR/hello-world-omp   # you may put the absolute path here

echo "pwd :"
pwd

echo
echo "=============== RUN ==============="
echo "Running ..."
export OMP_NUM_THREADS=2
$EXECUTABLE
echo "Done"
echo "==================================="
echo

cat > my_program_summary.out <<EOF
Job array, current job with OAR_JOB_ID $OAR_JOB_ID
Each job of this array ran on 2 cores (using OpenMP)

For example, some short solver statistics are summarized here.
1.e-10
1.e-13
1.e-14
1.e-16
Converged
EOF

#-- ECHO SOME SUMMARY OUTPUTS OF THE RUN IN THE ***.output FILE
echo
echo "cat my_program_summary.out"
echo "---------------------"
cat my_program_summary.out
echo "---------------------"
echo
echo OK

In this file, note the following lines in particular:

  • the OAR number of cores and the OAR array directive,
  • the definition of the EXECUTABLE variable, pointing to the multi-threaded program,
  • the OMP_NUM_THREADS variable (which must be consistent with the number of cores you asked for),
  • the OAR_JOB_ID echoed in the output file (corresponding to the current task of the job array).

#OAR -l core=2,walltime=00:05:00
#OAR --array 10
...
EXECUTABLE=$OAR_WORKDIR/hello-world-omp
...
export OMP_NUM_THREADS=2
...
cat > my_program_summary.out <<EOF
Job array, current job with OAR_JOB_ID $OAR_JOB_ID
Each job of this array ran on 2 cores (using OpenMP)
...

We are now ready to launch an array of 10 jobs, each of them being multi-threaded, and using 2 cores. To proceed, submit your job as before:

igrida02-02% oarsub -S ./array-job-with-oar.sh
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
Generate a job key...
Generate a job key...
Generate a job key...
Generate a job key...
Generate a job key...
Generate a job key...
Generate a job key...
Generate a job key...
Generate a job key...
OAR_JOB_ID=1813
OAR_JOB_ID=1814
OAR_JOB_ID=1815
OAR_JOB_ID=1816
OAR_JOB_ID=1817
OAR_JOB_ID=1818
OAR_JOB_ID=1819
OAR_JOB_ID=1820
OAR_JOB_ID=1821
OAR_JOB_ID=1822
OAR_ARRAY_ID=1813

Notice that each task of the array has an OAR_JOB_ID, exactly like the standard jobs we’ve seen before, but the array itself is identified by an additional OAR_ARRAY_ID variable (equal to the OAR_JOB_ID of the first task). To get specific information about your array, you can use the --array option of oarstat:

oarstat --array

Refresh your Monika and DrawGantt web pages to monitor your job life cycle.


Parametric job array of sequential runs

As a last use case, assume you want to run a parametric job array of sequential runs. With a parametric job array, OAR creates as many jobs as there are parameter lines in your file, and each parameter line is passed as the arguments of your job executable (see this discussion).

To create a sample parameter file, you can type:

cat > $SCRATCHDIR/param-file.txt <<EOF
# this is a parameter file to be used within a parametric job array
#
#==> a subjob with one single parameter
100
#==> a subjob without parameters
""
#==> a subjob with multiple parameters
12.34 44.67 3.14 -10.56
EOF
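
Before submitting, it can be useful to check how many jobs a parameter file will generate. The sketch below counts the lines that are neither comments nor blank. It assumes that OAR ignores comment lines starting with # and blank lines, which the three jobs created from the sample file later in this section suggest but which is not stated explicitly here. count_param_jobs is a hypothetical helper name, and the script is written for sh.

```shell
#!/bin/sh
# count_param_jobs: hypothetical helper, not part of OAR.
# Counts the lines of a parameter file that are neither comments
# (first non-blank character is '#') nor blank.
count_param_jobs() {
    grep -v '^[[:space:]]*#' "$1" | grep -c '[^[:space:]]'
}

# Report on the sample file created above, if it is present.
if [ -n "${SCRATCHDIR:-}" ] && [ -r "$SCRATCHDIR/param-file.txt" ]; then
    echo "parameter lines: $(count_param_jobs "$SCRATCHDIR/param-file.txt")"
fi
```

Note that the line containing only "" counts as a parameter line here, since it stands for a subjob without parameters.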

Now copy the following program-with-arguments.c source code:

#include <stdio.h>
#include <unistd.h>   /* for sleep() */

int main(int argc, char** argv)
{
    int i;
    printf("Program runs with the following arguments\n");
    printf("Nb. of arguments = %d\n", argc-1);
    for (i = 1; i < argc; i++)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    sleep(60);   /* fake work, 1 minute */
    return 0;
}

and compile it:

cc ./program-with-arguments.c -o program-with-arguments

Now copy the array-parametric-job-with-oar.sh script, replace my_login with your own identifier, and make the script executable:

#!/bin/sh
#OAR -l core=1,walltime=00:05:00
#OAR --array-param-file /temp_dd/igrida-fs1/my_login/SCRATCH/param-file.txt
#OAR -O /temp_dd/igrida-fs1/my_login/SCRATCH/param_array_job.%jobid%.output
#OAR -E /temp_dd/igrida-fs1/my_login/SCRATCH/param_array_job.%jobid%.error
set -xv

TMPDIR=$SCRATCHDIR/$OAR_JOB_ID
mkdir -p $TMPDIR
cd $TMPDIR

EXECUTABLE=$OAR_WORKDIR/program-with-arguments   # you may put the absolute path here

echo
echo "=============== RUN ==============="
echo "Running ..."
$EXECUTABLE $*
echo "Done"
echo "==================================="

You are ready to submit:

igrida02-02% oarsub -S ./array-parametric-job-with-oar.sh
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
Generate a job key...
Generate a job key...
OAR_JOB_ID=1852
OAR_JOB_ID=1853
OAR_JOB_ID=1854
OAR_ARRAY_ID=1852

Refresh your web monitoring tools, and have a look at the job output and error files to see what happens.


After these first steps, you should be ready to go further and run your own executable programs, either sequentially or in parallel, using several cluster nodes if necessary. Some reading suggestions are given below, which will help you gain deeper insight into the OAR features and internal mechanisms. To start with, we recommend that you watch the “OAR presentation” movie (40 minutes, in French) for an overview of the OAR tool.

Further reading

To become more familiar with the OAR batch system, please visit the official OAR website. We particularly recommend that you have a look at the following documents: