Learning how to use the SANBI computing environment
SANBI has a small but powerful cluster of computers that provide a High Powered Computing environment for our users.
Prerequisites
This lesson guides you through the basics of submitting and monitoring jobs on the SANBI cluster. If you have a working knowledge of the Linux/Unix shell, you’re ready for this lesson.
If you know how to qsub scripts, you probably won’t learn a lot from this lesson.
The SANBI cluster
The design of SANBI’s cluster has two intended purposes: users should be able to run jobs without caring (much) on which computer the job runs and users’ jobs should not interfere with each other.
SANBI cluster design
Users log onto the login node named queue00 using ssh. This machine is only available from inside SANBI. If you want to access it from outside SANBI you need to use our VPN or first ssh to gate.sanbi.ac.za.
The cluster is managed by a machine called grid00. Users never log in here, and keeping this separate from queue00 means that even if users crash queue00 the cluster will keep running. The actual computing happens on machines named grid01, grid02 and so forth. The Sun Grid Engine (SGE) software on grid00 starts the jobs running on these machines, users never log into them directly.
The SANBI /cip0 filesystem
All our research data is stored in directories under /cip0. In general, no data should be stored in your home directory (/usr/people/username). On /cip0 each user has a scratch directory and a research directory named /cip0/research/username and /cip0/research/scratch/username. All work should be done in subdirectories of the scratch directory and final results stored in the research directory. Make sure that you are now in your scratch directory:
$ id -unusername$ cd /cip0/research/username $ pwd/cip0/research/username
Topics
LO summary
- Be able to create a script and run it on the SANBI cluster
- Be able to find software on the SANBI cluster
- Be able to use ‘module add’ within a script
- Be able to specify resource requirements
- Be able to monitor job status
- Be able to kill jobs using qdel
- Be able to find the standard and error output of the job
- Be able to move data to and from the cluster using scp