Recent Developments on Iridis
This page lists recent developments on Iridis, prior to them being fully integrated into the main web-pages. Some of the PBS flags required to access the new facilities may seem a bit cumbersome but we plan to introduce a simpler interface later (if you use PBS directives in your scripts you can avoid having to type these flags every time you submit a job).
Login Nodes and RHEL 4.0 upgrade
All the compute nodes have now been upgraded to Red Hat Enterprise Linux 4.0 (RHEL4.0). However, the main Iridis2 login node and the alternative login node, blue14.iridis.soton.ac.uk, are still built with RHEL 3.0. Most users will not be affected by the RHEL version, but there are differences in the versions of the glibc libraries and also in the versions of the Tcl/Tk libraries that can be significant for a few. Hence, we have provided another login node, blue18.iridis.soton.ac.uk, which you can ssh to in the normal way. It is important that all users check that their codes will compile in the RHEL 4.0 environment on blue18 and that the resulting code runs satisfactorily on the compute nodes. Note that we intend to make blue18 the login node in the near future (with luck this might help with the problems with the login node hanging) so it is in your interests to check that you can compile and run your code under RHEL 4.0 as soon as possible. Note that blue18 only has 2 GB of memory at the moment - we have ordered more memory and once this is installed we will arrange for blue18 to become the default login node.
Myrinet Network
A high-performance Myrinet network has been added to 64 of the single-core compute nodes. This network is targeted at users with MPI applications (including Fluent) which have demanding inter-node communications requirements. Applications where communications latency is currently a performance bottleneck will particularly benefit, though the increase in bandwidth will also help. Non-MPI jobs will not benefit from the Myrinet network, and some MPI jobs where the work is computation-bound rather than communication-bound (e.g. some pdns jobs) will see little benefit. If you are unsure whether you need to use the Myrinet network, we suggest you make some comparative timings. You can get some clues as to whether your jobs are communication-bound on the standard-network single-core nodes by logging in to a node on which your job is running and using the "top" command to observe the %CPU utilisation. If this is consistently below 80% then it is likely that your job would benefit from the Myrinet network. (Intensive IO will also reduce %CPU utilisation, but this is normally temporary. On the dual-core nodes, main-memory bandwidth limitations may also reduce %CPU utilisation for some codes.) As the Myrinet nodes are a limited and valuable resource, we request that jobs which do not derive a significant performance benefit are not submitted to these nodes.
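The %CPU check described above can be roughed out in a few lines of shell. This is only a sketch: the function names mean_cpu and below_threshold are hypothetical, the 80% figure is the rule of thumb quoted above, and the top invocation in the comment assumes the usual procps column layout in which %CPU is the ninth field.

```shell
#!/bin/sh
# Sketch: summarise a series of %CPU samples and compare with a threshold.
# Function names are hypothetical; 80 is the rule of thumb from the text.

# Read one %CPU sample per line on stdin and print the mean to one decimal place.
mean_cpu() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# Succeed (exit status 0) if the given mean is below the threshold (default 80).
below_threshold() {
    mean=$1
    threshold=${2:-80}
    awk -v m="$mean" -v t="$threshold" 'BEGIN { exit !(m < t) }'
}

# Possible use on a compute node, for a process ID held in $pid
# (assumes %CPU is field 9 of top's batch output):
#   top -b -n 5 -d 2 -p "$pid" | awk -v p="$pid" '$1 == p { print $9 }' | mean_cpu
```

A mean that stays below the threshold over several samples is only a hint; comparative timings on both networks remain the more reliable test.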
How do I compile for the Myrinet nodes?
Instead of building with the standard MPI-enabled versions of the compilers you will need to use the equivalent versions for Myrinet, which use the mpichGM communications libraries. These can be found in /local/mpichgm-1.2.6-pgi for the PGI compilers or /local/mpichgm-1.2.6 if you prefer to use the GNU compilers. For example:
/local/mpichgm-1.2.6-pgi/bin/mpif90
How do I modify my job scripts for the Myrinet nodes?
If you are running your own MPI executable then you just need to change mpirun to the mpichGM version that matches the compilers you built with. (Note that the name of the mpirun executable also changes, to mpirun.ch_gm.) For example, the mpirun line might become:
/local/mpichgm-1.2.6-pgi/bin/mpirun.ch_gm -np $nprocs -machinefile $PBS_NODEFILE mpi_prog
If you are running Fluent then you merely need to change the network communicator by using the Fluent flag -pgmpi rather than -pnmpi or -pnet. (Note that it does not make sense to use the Myrinet nodes for single-node Fluent jobs; these should be run using the shared-memory communicator flag -psmpi. Note also that it may be worth experimenting with using 4 CPU-cores on the dual-core nodes for single-node Fluent jobs as described below.)
For convenience you may want to add a PBS directive line, given in the next section, to your script to ensure that your job runs on Myrinet nodes.
How do I ensure my jobs run on the Myrinet nodes?
Support for specifying that jobs should run on the Myrinet nodes will be simplified later. In the interim you can request that a particular job runs on the Myrinet nodes by passing the flag "-Wx=NODESET:ONEOF:FEATURE:myrinet" to PBS, either directly with qsub:
qsub -Wx=NODESET:ONEOF:FEATURE:myrinet myjob
or as a directive line in the script (this must be before all executable lines!):
#PBS -Wx=NODESET:ONEOF:FEATURE:myrinet
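Putting the pieces together, a complete job script for the Myrinet nodes might look roughly like the sketch below. The node count, the program name mpi_prog and the helper count_procs are assumptions for illustration; the paths, the mpirun.ch_gm name and the PBS flag are those given above. The launch is guarded so that nothing runs outside a PBS job.

```shell
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -Wx=NODESET:ONEOF:FEATURE:myrinet
# Sketch of a Myrinet job script. Assumed for illustration: 4 nodes with
# 2 processes each, an executable called mpi_prog built with the PGI
# mpichGM compilers, and the helper name count_procs.

MPICH_GM=/local/mpichgm-1.2.6-pgi

# Count the entries in a PBS node file (one line per process slot).
count_procs() {
    grep -c . "$1"
}

# Only launch when running under PBS, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    nprocs=$(count_procs "$PBS_NODEFILE")
    "$MPICH_GM/bin/mpirun.ch_gm" -np "$nprocs" -machinefile "$PBS_NODEFILE" ./mpi_prog
fi
```

Keeping the mpichGM location in a single variable makes it harder to mix the PGI and GNU builds between compiling and running.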
Using the Dual-core nodes
Iridis has been augmented with 72 dual-processor, dual-core nodes. These nodes need to be requested specifically when submitting jobs to the batch system. (It should be emphasised that not all codes will run equally well on dual-core nodes. See the note below.)
What are the characteristics of the dual-core nodes?
Each node has 2 processors, like the single-core nodes, but each processor has 2 computational cores on the same chip, so each node has 4 computational cores. The nodes run at a slightly lower clock rate than the single-core nodes (2.0 GHz as opposed to 2.2 GHz). On its own this would reduce performance by around 10%, but this is compensated to some extent by the nodes having faster memory. In terms of amount of memory, there is 2 GB per node, which means that there is 0.5 GB per core rather than 1 GB per core on the single-core nodes.
How do I run my jobs on the dual-core nodes?
The dual-core nodes do not yet form part of the default pool of nodes. In the interim you can request that a particular job runs on the dual-core nodes by passing the flag "-Wx=NODESET:ONEOF:FEATURE:switch10:switch11:switch12" to PBS, either directly with qsub:
qsub -Wx=NODESET:ONEOF:FEATURE:switch10:switch11:switch12 myjob
or as a directive line in the script
#PBS -Wx=NODESET:ONEOF:FEATURE:switch10:switch11:switch12
If you need to run multi-node jobs which use the $PBS_NODEFILE variable, then you will have to specify that there are effectively 4 processors per node. For example, for a 2-node, 8-process job:
qsub -l nodes=2:ppn=4 -Wx=NODESET:ONEOF:FEATURE:switch10:switch11:switch12 myjob
If you want to run 4 sequential sub-jobs per node, then a simple modification of the script for single-core nodes (the 2 sub-jobs per node example) to add more sub-jobs should work.
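As an illustration of the 4 sub-jobs per node case, the sketch below starts each sub-job in the background and waits for all of them to finish. It is not the site's own single-core script: the helper name run_subjobs, the program name my_prog and the input and output file names are all hypothetical.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=4
#PBS -Wx=NODESET:ONEOF:FEATURE:switch10:switch11:switch12
# Sketch: run 4 independent sequential sub-jobs on one dual-core node.
# run_subjobs, my_prog and the file names are hypothetical.

# Start each argument as a background command, then wait for all to finish.
run_subjobs() {
    for cmd in "$@"; do
        sh -c "$cmd" &
    done
    wait
}

# Only launch when running under PBS, where PBS_O_WORKDIR is set.
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    run_subjobs \
        "./my_prog input1 > output1" \
        "./my_prog input2 > output2" \
        "./my_prog input3 > output3" \
        "./my_prog input4 > output4"
fi
```

The job ends only when the slowest sub-job has finished, so sub-jobs of similar length make the best use of the node.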
Will my jobs benefit from dual-core nodes?
To make it worthwhile using the dual-core nodes, you must use all 4 CPU cores - otherwise you might as well use the standard single-core nodes. There are two questions you need to answer:
- Does it perform well?
We expect jobs using Monte-Carlo type codes to perform very well on dual-core nodes. This is because most of the data required for computation comes from cache and the data transferred to and from main memory is modest. For some codes the bandwidth to main memory is a major bottleneck and will limit the overall performance. These codes will not generally be able to make good use of all 4 CPU-cores on a node, because the memory bandwidth to each processor is now shared between its 2 computational cores, so the memory bandwidth seen by each core is halved. For yet other codes (possibly Fluent) the situation is between the two extremes, so the performance on dual-core nodes may not be as good as on twice as many single-core nodes, but it may be acceptable (e.g. perhaps 20% slower). The responsibility is on the user to decide whether the dual-core nodes are suited to their code, preferably through the results of timing measurements, but we will be happy to give advice on the methodology of making such measurements.
One other case that might show an advantage is MPI jobs on a single node, where you can now use 4 CPU-cores per job rather than 2. Because all communication is internal to the node this will be much faster than if you were running 4 MPI processes over 2 single-core nodes (but you will have to be careful to make sure your problem fits into 2 GB). This may be advantageous for some single-node Fluent jobs.
- Is there enough memory?
The standard dual-core compute nodes have 2 GB of memory, so if you are running 4 processes on each node then they only have an average of 0.5 GB available to each process rather than 1 GB on single-core nodes.
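The memory arithmetic can be checked with a couple of small shell functions. This is a sketch only: the function names are hypothetical, sizes are in MB, and the 2 GB figure for single-core nodes is inferred from 1 GB per core on 2 processors. It ignores memory used by the operating system, so the real headroom will be somewhat smaller.

```shell
#!/bin/sh
# Sketch: per-process memory on a node. Sizes in MB; names are hypothetical.

# Print the average memory available to each process on a node.
mem_per_process_mb() {
    node_mb=$1
    procs=$2
    echo $(( node_mb / procs ))
}

# Succeed if procs processes of per_proc_mb each fit within node_mb.
fits_on_node() {
    per_proc_mb=$1
    procs=$2
    node_mb=$3
    [ $(( per_proc_mb * procs )) -le "$node_mb" ]
}

mem_per_process_mb 2048 4   # dual-core node, 4 processes: prints 512
mem_per_process_mb 2048 2   # single-core node, 2 processes: prints 1024
```

For example, fits_on_node 700 4 2048 fails, indicating that four 700 MB processes would not fit on a dual-core node.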
