Usage
There are two clusters available for use: Andromeda for researchers, and Seawulf for educational use, such as classes.
Alternatives
You may also want to check out XSEDE Education Allocations. These are available for free, with an ongoing review process for allocation requests. An allocation lasts for a semester. See the site for more details.
Access
The clusters are available via SSH. If you are travelling or otherwise off the campus network, you may need to use a VPN connection.
The recommended Windows client is PuTTY. In PuTTY's hostname box, you can enter both the username and hostname, as in: username@seawulf.uri.edu.
From a Mac, you can use ssh from the Terminal application (found under /Applications/Utilities). Linux users can run ssh from any terminal or console.
To access the Educational cluster, use:
ssh -l username seawulf.uri.edu
Please do not leave yourself logged in when you are not using the system.
Getting Started
Here are some links to tutorials that may help you get started using the cluster environment.
- Shell commands explains the general concepts for navigating the system, working with files, and related commands.
- Software Carpentry has lessons on Python and R, which can be useful for pre- or post-processing data, as well as computation.
- HPC Carpentry has lessons on using HPC environments in general.
Data
When using this cluster for a class, some data may be pre-loaded. The instructor will provide the path to this data.
Transferring Data
Use scp/sftp to transfer data to the cluster. The recommended client for this is Cyberduck. If you are using a graphical interface for transferring data, you might want to bookmark the paths you use most frequently. See this video for setting up Cyberduck.
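If you prefer the command line, transfers look like the following sketch (myfile.dat and results.txt are placeholder names; replace username with your own):
$ scp myfile.dat username@seawulf.uri.edu:
$ scp username@seawulf.uri.edu:results.txt .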
Submitting jobs
All jobs on the cluster are written as shell scripts and submitted using the command sbatch.
A sample script might look like:
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --export=NONE
./single_job
The lines starting with “#SBATCH” provide default values for parameters to the sbatch command.
- -t HH:MM:SS
- This specifies the maximum wall-clock time the process will be given before it is terminated. For array jobs this is the time for each instance, not the entire array.
- --nodes=1 --ntasks-per-node=1
- This specifies, for a single-threaded program, to allocate 1 node and 1 processor per node.
- --export=NONE
- This tells SLURM not to pass current environment variables to your job. This is important to ensure you are loading a consistent set of modules in your script.
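These defaults can be overridden on the sbatch command line, which takes precedence over the #SBATCH lines; for example, sbatch -t 2:00:00 myjob.sh would request two hours of wall-clock time. Submitting the script looks like: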
$ sbatch myjob.sh
Submitted batch job 30484
The output of the sbatch command is the job id. This can be used to check the status of the job:
$ squeue --job 30484
JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
30484   default  bash  user  R  3:04     1 n005
sbatch Parameters
NOTE: You can always check the manual on the system for more options using the command man sbatch. Useful parameters are listed below; a combined example follows the list.
- -t HH:MM:SS
- This specifies the maximum wall-clock time the process will be given before it is terminated.
- --nodes 2 --ntasks-per-node=12
- This specifies to allocate 2 nodes and 12 processors (cores) per node. For most simple programs, this should be --nodes=1. (See below for how to use multiple nodes effectively.)
- --cpus-per-task=4
- This specifies to allocate 4 processors (cores) of a node. For most simple programs, this should be used with a --nodes=1 parameter.
- --mem=20G
- If you need more than 6GB of RAM per node, you can request more, for example --mem=120G.
- --mem-per-cpu=6G
- Per-CPU version of the above.
- --mail-type=BEGIN,END,FAIL
- Email you when the job starts, completes, or fails. TIME_LIMIT_80 alerts when 80% of the wall time is reached; TIME_LIMIT_50 and TIME_LIMIT_90 are also available.
- --mail-user=username@example.edu
- Specify the email address to send notifications to. You could also put this in ~/.forward
- -o filename
- Specify where the output from the program should go (includes stderr unless -e is specified).
- -e filename
- Specify where the error output from the program should go (stderr only)
- --array=1-100%10
- Create an array job. In this case, the array indices would be 1-100, with a limit of 10 jobs running concurrently.
- --export=VAR=val
- Add an environment variable to the job.
- --export=NONE
- Start the job with a “clean” environment, not with the variables in the current shell.
- --exclusive
- Node exclusive (See below for when to use this.)
- --chdir=/start/path
- Specify a path for the job to start working in. Normally jobs are started in $HOME. $SLURM_SUBMIT_DIR holds the directory sbatch was run from; many scripts start with cd $SLURM_SUBMIT_DIR to make that directory the working directory for the rest of the script.
- --partition
- If you need to use a GPU, specify --partition=dgx.
- --gres=gpu:1
- Depending on your account configuration, you can request between 1 and 8 GPUs
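As a combined sketch using several of these parameters (the program name, email address, and resource values are placeholders to adapt):
#!/bin/bash
#SBATCH -t 4:00:00
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --mem=20G
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=username@example.edu
#SBATCH -o myjob.out
#SBATCH --export=NONE
# Work in the directory sbatch was run from.
cd $SLURM_SUBMIT_DIR
./my_prog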
Additional Resources
See here for complete documentation of sbatch.
Using multiple nodes
You must check whether your program is capable of using multiple cores and/or multiple nodes for computation before requesting resources with the --nodes option. If your program uses the MPI libraries, then you should be able to adjust these parameters within the constraints of that program. For example, some programs require that the number of concurrent processes be even, or a multiple of some other number. If your program allows you to specify a number of threads, then you may want to set -c 56 (the number of cores in most of our nodes) and use $SLURM_CPUS_ON_NODE to pass that number to your program.
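For example, a sketch of a single-node threaded job (my_threaded_prog and its --threads flag are placeholders for your own program's interface):
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH --nodes=1
#SBATCH -c 56
#SBATCH --export=NONE
# SLURM_CPUS_ON_NODE holds the number of cores allocated on this node.
./my_threaded_prog --threads=$SLURM_CPUS_ON_NODE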
Array Jobs
If you need to run a job over several instances of data, and each instance can run independently, you might want to create an array job. Because there is some expense to creating a job, it does not make sense to use this to iterate over tens of thousands of very short jobs; those should be batched into chunks. Here are some guidelines, followed by a sketch of an array job script:
- An instance of a job should run for at least 10 minutes.
- No job array should have more than two or three thousand instances.
- Limit the concurrent jobs in the array to 1 if using a whole node, as in: --nodes=1 --cpus-per-task=56 --array=1-100%1
- Limit the concurrent jobs in the array to 56 if using a single core: --nodes=1 --cpus-per-task=1 --array=1-100%56
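A minimal array job script might look like this (process_one and the input file naming are placeholders):
#!/bin/bash
#SBATCH -t 30:00
#SBATCH --nodes=1 --cpus-per-task=1
#SBATCH --array=1-100%56
#SBATCH --export=NONE
# Each instance selects its own input file using the array index.
./process_one input_${SLURM_ARRAY_TASK_ID}.dat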
To cancel specific instances of an array job, give the job ID and an index range:
scancel '123_[2-5]'
To alter the number of concurrent array tasks running, you can use scontrol update jobid=JJ ArrayTaskThrottle=NN where JJ is the JOBID and NN is the new count.
Exclusive node use
There are a couple of situations in which you might want to reserve an entire node for your job.
- Your job is going to use fewer cores than the node has, but almost all of the RAM on that node.
- You want to use fewer cores than the node has, but are measuring the performance of the job and want to ensure consistent results.
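In either case, add --exclusive to an otherwise normal script, as in this sketch (my_prog is a placeholder):
#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --exclusive
#SBATCH --export=NONE
./my_prog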
Environment
While jobs are running on the computation nodes, several environment variables are available. The most important of these are below:
- SLURM_JOB_ID
- The ID of the job. You can use squeue to see the nodes allocated to your job.
- SLURM_SUBMIT_DIR
- Directory from which sbatch was called.
- SLURM_ARRAY_TASK_ID
- Unique ID assigned to this instance of the job. (Array jobs only.)
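A sketch of how these variables are typically used inside a job script:
#!/bin/bash
#SBATCH --export=NONE
# Work in the directory sbatch was run from.
cd $SLURM_SUBMIT_DIR
echo "This is job $SLURM_JOB_ID"
# Set for array jobs only; fall back to a note otherwise.
echo "Array task: ${SLURM_ARRAY_TASK_ID:-not an array job}"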
Job Status
There are a couple of commands that can be used to check on the status of a job.
- squeue
- Gives a list of jobs in the queue. Some options are:
- --long or -l
- Long format (more details).
- --states=RUNNING or -t RUNNING
- List only running jobs
- --states=PENDING or -t PENDING
- List only pending jobs
- -r
- Expand array jobs (only for pending jobs).
- -u $USER
- Show only your jobs.
- scontrol show job
- Shows the details of a currently running job.
- sacct
- Show job history for your user. Use -l for all details.
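For example, to list only your running jobs, or to see the full history of a job (30484 stands in for your own job ID):
$ squeue -u $USER -t RUNNING
$ sacct -l -j 30484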
Jobs in Pending State
If squeue reports your job as pending (short form ST=PD), this could be for several reasons:
- ReqNodeNotAvail
- This could happen if:
- There is a reservation in the system for that node (or all nodes). This is usually for maintenance purposes.
- You specified particular nodes using --nodelist (not advised).
- AssocGrp*Limit
- You have hit a quota on the amount of resources currently consumed. Your job will start when a sufficient number of your other jobs finish.
- Resources
- All the nodes are busy. This job will be the next to run.
- Priority
- All the nodes are busy or are waiting for a job that was submitted earlier and is also pending.
Cancelling a job
If your job is running but not producing useful output, you should cancel it. This can be done with the command scancel JJ, where JJ is the Job ID.
Interactive use
Often some short tests need to be done to make sure that programs are working as expected before running a full job. In this case, you should still request computation time on a node, so as not to slow down the login node. To do this, use either interactive (preferred) or srun --pty /bin/bash. This creates a new session on a compute node, as if you had used ssh to reach that node. The interactive script is set to 8 hours; with srun, -t can be used as above. Please limit your walltime request to the time you will actually be interactive, and submit a batch job for longer computations.
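For example, a one-hour interactive shell on a compute node:
$ srun -t 1:00:00 --pty /bin/bash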
X11 forwarding
If you need to use an X11 display, make sure you have configured your local SSH client properly (command-line ssh uses -X or -Y) when connecting to the cluster. You can test whether it is working with a command like xset q. Once that is working, the interactive command should pick it up automatically. If you are using srun, you need to pass --x11.
Software Management
There is a fair number of software packages and libraries installed on the clusters. You can use the module avail command to see what is currently installed. Send an email to hpc@etal.uri.edu if you would like something new installed.
Using modules
The module command is used to set up your environment to run a particular version of software.
- avail
- Shows a list of software that is available to use. Some software is in groups, so you can do module avail bio/ to see the list of biology-related software. Use module avail without any modifiers to see the complete list, although there will be some duplication.
- load
- Used to set up the environment for a particular software package. E.g., module load BamTools will make the bamtools command available. Note that some modules may not be loaded side-by-side, such as those compiled with different dependencies. Sometimes multiple versions of a package will be available; you can specify an exact version, such as module load BamTools/2.4.0-foss-2016b.
- unload
- Unload a particular module.
- list
- Shows the list of loaded modules.
- show
- Shows how a module will change the environment.
- purge
- Reset the environment.
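A typical session might look like this (BamTools is the example module from above):
$ module load BamTools
$ module list              # confirm BamTools is among the loaded modules
$ bamtools                 # the command is now on your PATH
$ module purge             # reset the environment when done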
Conflicts
You may get errors when trying to load a module if it conflicts with another module that is already loaded. This usually happens if they use different toolchains (foss-2020b vs foss-2018b, etc.). See this page for a complete list of compatible GCC/foss versions. If you can't find a compatible set, you may request that such a set be made available. Generally, later versions are better. In some cases you can change midstream:
module load xxx/yyy-foss-2020b
./prog1 # depends only on xxx
module purge
module load www/zzz-foss-2018b
./prog2 # depends only on www
You cannot have xxx and www loaded at the same time, so if xxx itself tries to run www as a sub-process, this procedure won't work. In that case, you can contact hpc@etal.uri.edu to install compatible versions.
Miscellaneous
Resetting your password
You may use the passwd command to update your password once you have logged into the system.
Reporting job issues
If your job failed and you believe it to be an issue with the cluster or the software that is installed, please report the following details with your request for assistance:
- Job number
- Job script location (as run, not modified)
- Job log file location(s)
- Exact error message received, if any
- Software the job uses
- If it is a software failure, please provide a small example of data and how to run the software.