How Job Scripts Work
Initially you may not want to understand how a script does what it does; you may be content to copy one of the example scripts and make minor changes to filenames to suit your needs. If so, skip to the Example Scripts section below now. However, after a while you might find it useful to understand the basic anatomy of a script and, in particular, how the PBS batch system reads instructions about your job and the resources it needs.
Anatomy of a script
As a first example we'll look at a very simple script. All it does is wait for 60 seconds before writing "Hello There!" to the job output file. It has just the following four lines, which are described one at a time.
The very first line looks meaningless to the uninitiated, but is very important to the system:
#!/bin/bash
This line tells the job to use the BASH shell to interpret any commands used later in the script. (If you don't know anything about command shells don't worry, just accept the recipes we give you for now.)
The next line is an example of a comment which provides information but is not executed:
# This script waits for 1 minute before writing a greeting
Anything following a # symbol on a line is a comment. In this case the comment describes what the script does. The first command in the job is on the next line:
sleep 60
The sleep command tells the job to wait and do nothing for the number of seconds given in the argument before executing the command on the next line:
echo "Hello There!"
This prints the message "Hello There!" to the job output file.
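Put together, the four lines described above make up the whole script. As a minimal sketch, the snippet below writes them to a file and checks the syntax. The filename hello_job is only an illustrative assumption; any name, and any text editor, would do the same job.

```shell
# Write the four-line example script to a file.
# The filename "hello_job" is a hypothetical choice, not one the system requires.
cat > hello_job <<'EOF'
#!/bin/bash
# This script waits for 1 minute before writing a greeting
sleep 60
echo "Hello There!"
EOF

# Check the syntax without running the script (bash -n only parses it)
bash -n hello_job && echo "hello_job looks OK"
```

The file could then be submitted with qsub hello_job, in the same way as the example jobs described later on this page.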
As a more realistic example we'll take the template for a Matlab job script.
The first line sets the BASH shell as before, and this is followed by a number of comments. In particular, there are some interesting comments that start with #PBS. These are PBS directives. Although they are not executed in the job itself, they are noticed by the PBS system when you submit the job with the qsub command, and are used to pass information to the batch system about your job's requirements. The first such PBS directive is:
#PBS -S /bin/bash
This is another way to specify that the BASH shell should be used, but it has an extra effect of ensuring that your user environment is set up correctly in this shell (there are reasons why it's worth using both methods which we won't bother you with).
The second PBS directive is more interesting:
#PBS -l walltime=4:00:00
This tells the batch system that your job requires a maximum of 4 hours, 0 minutes and 0 seconds to run (as seen by a clock on the wall). You can override this directive when you submit the job by specifying a different time, e.g. with qsub -l walltime=6:30:00 run_matlab, but the directive provides a convenient default. It saves a lot of typing if you re-use the script several times and, even better, saves you having to remember what to type.
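Because directives are just comment lines that begin with #PBS, it is easy to check what a script asks for before you submit it. The sketch below builds a cut-down job script and lists its directives with grep. The filename demo_job and the script body are assumptions for illustration, not the actual Matlab template.

```shell
# Build a small job script containing the two directives discussed above.
# "demo_job" is a hypothetical filename used only for this illustration.
cat > demo_job <<'EOF'
#!/bin/bash
# An ordinary comment: ignored by both bash and PBS
#PBS -S /bin/bash
#PBS -l walltime=4:00:00
echo "job body starts here"
EOF

# List the lines that the batch system would treat as directives
directives=$(grep '^#PBS' demo_job)
echo "$directives"

# To bash itself these lines are only comments, so the script still runs normally
bash demo_job
```

Running the script directly with bash shows that the shell skips the #PBS lines; only the batch system pays attention to them.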
The first line that isn't a comment or directive in the script is the following:
module load matlab
This command is a convenient way of setting up your user environment so that it knows how to find and run Matlab. There is more information about modules on the Environment modules page, but all you probably need to know for now is how to load the module you want.
There is one more potentially useful line before we actually run Matlab:
cd $PBS_O_WORKDIR
This line has the effect of changing the working directory for the job to the directory from which the job was submitted. (When a job starts it assumes all the files you need are in your home directory. If in fact these are in a different sub-directory, it's much easier to change to this first so that all input and output files are easily located.) When you submit a job the batch system makes a note of which directory you were in at the time and passes this on to your job when it starts to run in the environment variable $PBS_O_WORKDIR.
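A slightly more defensive form of this line is sketched below. Quoting the variable protects against directory names that contain spaces. The fallback to the current directory is an assumption added here so that the snippet can also be tried outside the batch system, where $PBS_O_WORKDIR is not set.

```shell
# Inside a PBS job, PBS_O_WORKDIR holds the directory the job was submitted from.
# Outside PBS it is unset, so fall back to the current directory (an assumption
# made here purely so the snippet can be tried interactively).
workdir="${PBS_O_WORKDIR:-$PWD}"

# Stop straight away if the directory cannot be entered
cd "$workdir" || exit 1
echo "Working directory: $PWD"
```

Stopping when the cd fails avoids running the rest of the job in the wrong place, where input files would not be found.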
Finally we can run a set of Matlab commands contained in the file matlab_input with:
matlab -nojvm -nodisplay < matlab_input > outputfile
Note that the flags "-nojvm -nodisplay" tell Matlab to disable its graphical interface. The "<" and ">" redirection symbols are also important. The "<" symbol tells an application where to find input that you would otherwise type in on the screen if you were running interactively. Similarly, the ">" symbol tells an application that output that would normally go to the screen should be written to the named file instead. (These symbols apply in general to all Linux commands and applications, not just to Matlab.)
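Since the redirection symbols work with any command, they can be tried out without Matlab. The sketch below uses the standard tr command in place of an application; the filenames demo_input and demo_output are hypothetical.

```shell
# Create a small input file (the filenames here are hypothetical)
printf 'hello\nthere\n' > demo_input

# tr reads from standard input and writes to standard output, so:
#   "<" feeds it the contents of demo_input instead of the keyboard
#   ">" sends its output to demo_output instead of the screen
tr 'a-z' 'A-Z' < demo_input > demo_output

# Show what ended up in the output file
cat demo_output
```

Note that ">" overwrites the named file if it already exists, so choose a different output filename for each run you want to keep.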
Parallel Jobs
The examples so far have used just a single processor on a node, but there are 8 processor-cores per node, so it would be nice to be able to take advantage of these. The simplest way is to run several instances of the same application on the node, perhaps using different data sets. This makes sense if you have a lot of different instances you need to run for different parameter sets (and there is no issue with getting enough software licences - in the case of Fluent, CFX & Ansys, licences are very expensive, so the number of licences an individual can use has to be restricted). An example of how to run multiple sub-jobs on a single node from within a single script will be added later.
For some applications it is also possible to solve a single problem in parallel using multiple processors. Most CFD and structural analysis programs now offer this possibility. Often all that is required is a modification of the way the application is called, to instruct it to run in parallel and specify the number of processors to be used. A good example is Fluent, where you just need to add a flag specifying the number of processors to run over (add "-t2" to your normal fluent command to run over 2 processors). You can specify the number manually, or you can get your script to work this out for you from the number of processors you requested when you submitted the job. The example Fluent script linked below shows how to do this, with the number of processors read from a shell variable $nprocs, e.g.
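One common shell technique for running several instances at once is to start each one in the background with "&" and then use wait so that the script does not finish until they all have. The sketch below is an assumption about how such a script might look, not this site's own example. The function run_case is a hypothetical stand-in for a real application, and the scratch directory is only there so the snippet can be tried anywhere.

```shell
# Scratch directory for the outputs (used only for this illustration)
outdir=$(mktemp -d)

# Hypothetical stand-in for a real application run on one parameter set
run_case() {
    sleep 1                                   # pretend to do some work
    echo "finished case $1" > "$outdir/case_$1.out"
}

# Start one instance per parameter set; "&" puts each in the background
for case_id in 1 2 3 4; do
    run_case "$case_id" &
done

# Without this the script (and so the job) would end before the instances finish
wait
ls "$outdir"
```

Keep the number of simultaneous instances at or below the number of processor-cores you requested, and remember that each instance of a licensed application may need its own licence.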
fluent 3d -t$nprocs -g -i elbow.jou > output_file
The number of processors has been calculated earlier in the script, in the line beginning "nprocs=". Don't worry too much about just how this line works - just make sure you have an exact copy.
(If you want to understand a little more, this line gets the information it needs from the contents of a file which is passed to the job when it starts. The name of the file is given by another shell variable, $PBS_NODEFILE, whose contents are just the names of the compute nodes which have been assigned to your job. If you've requested multiple processors on a node then the name of the node will be repeated the appropriate number of times. The rest of the line is just a way of counting how many processors have been requested in total.)
Note: parallel execution for licensed applications normally requires special licences and, although these are cheaper than standard licences, we still have limited numbers of parallel licences, so access to these may be restricted. For instance, we regret that Undergraduate and MSc project students are currently restricted to 4-way jobs at most.
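As a rough illustration of the counting idea, the sketch below counts the lines in a node file. It is not necessarily the same as the "nprocs=" line in the example Fluent script, so keep using that line as given. The function name count_procs, the node name node001 and the temporary file are all assumptions made so the snippet can be run outside the batch system.

```shell
# Count the processors assigned to a job: the node file has one line per processor
count_procs() {
    # tr strips the padding that some versions of wc put around the number
    wc -l < "$1" | tr -d '[:space:]'
}

# Fake node file standing in for $PBS_NODEFILE: 4 processors on one node
# (the node name "node001" is made up for this illustration)
fake_nodefile=$(mktemp)
printf 'node001\nnode001\nnode001\nnode001\n' > "$fake_nodefile"

nprocs=$(count_procs "$fake_nodefile")
echo "nprocs=$nprocs"

# Inside a real job you would instead use: nprocs=$(count_procs "$PBS_NODEFILE")
```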
Example Scripts
Links to example scripts for the applications will be added progressively. Copies of these scripts and associated files are also available in the sub-directories of /local/software/examples on both Iridis and Lyceum, so you can copy them from there.
- Ludicrously simple job
- Simple job - use this to run an executable you've compiled yourself
- Matlab
- Fluent - in parallel on a single node
- CFX
- Ansys
- Starccm+
- Parallel MPI job
- R (statistics) job
- Multiple sub-jobs on a single node
- Abaqus
Copying and Running the Example jobs
The directory /local/software/examples contains examples of batch jobs for some common applications on both Lyceum and Iridis. Each example contains a README file, a sample job file and associated input files plus sample output. You can copy them to your own filestore, to experiment with, using the command copy_example.
If you just use the copy_example command on its own it will list which examples are available. You can then use the command again to copy the desired example. For instance:
copy_example fluent
This will copy a Fluent example run to your own filestore, normally to ~/fluent_example (where ~ is shorthand for your home directory). This example corresponds to the Fluent parallel processing tutorial for the mixing elbow.
Change to this directory with:
cd ~/fluent_example
To run this example as a batch job on a compute node, use the command:
qsub run_fluent
The output files produced by the job should be similar to those in the subdirectory sample_output. To adapt the example to your own jobs you need to provide your own case file and substitute your own set of fluent text commands for the elbow.jou file (refer to sec 1.3 of the Fluent User Guide initially, and then look at the Fluent Text Command List - use flu_man to access these manuals in a browser).
Taking it further
Shell scripts take a sequence of commands and execute them - so they are potentially a powerful programming environment in their own right if you use the features built into the shell. These provide variables, loops, tests and I/O. If you want to do something a bit more complicated, or just understand how some of the example scripts work, try dipping into some of the links below.
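As a small taste of those features, the sketch below uses a variable, a loop, a test and some file I/O together. The data files and their names are made up for the illustration, and the processing step (sorting numbers) simply stands in for whatever your own job would do.

```shell
# Variable: a scratch directory holding two hypothetical data files
outdir=$(mktemp -d)
echo "3 1 2" > "$outdir/run_a.dat"
echo "9 8 7" > "$outdir/run_b.dat"

count=0
# Loop: visit each data file in turn
for datafile in "$outdir"/run_*.dat; do
    # Test: only process files that exist and are not empty
    if [ -s "$datafile" ]; then
        # I/O: read from the data file, write a sorted copy alongside it
        tr ' ' '\n' < "$datafile" | sort -n > "${datafile%.dat}.sorted"
        count=$((count + 1))
    fi
done
echo "Processed $count files"
```

The same pattern, with the sort replaced by a call to your application, is a common way of working through a batch of input files in one job.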
You're not constrained to just using the bash shell either. A job script could use the tcsh shell (which is the normal login shell on Lyceum, but not Iridis, and perhaps better for interactive work). If you're familiar with perl or python then you can use these high-level scripting languages to accomplish complex tasks. Scripts written in these languages can also be submitted as PBS job scripts.
