    Usage

    There are three clusters available for use.

    • Andromeda is the on-premises cluster available to researchers.
    • Unity is a larger cluster available for researchers.
    • Seawulf is available for educational use, such as classes.

    Access

    The Andromeda cluster is available via SSH on campus. If you are off campus, you will need to use a VPN connection.

    Generating an ssh key

    On Windows you can install the “OpenSSH Client” under “Optional Features” in Windows Settings. You can then use a PowerShell window to run the ssh commands. From a Mac, you can use ssh from the Terminal application (found under /Applications/Utilities). Linux users can run ssh from any terminal or console.

    If you don’t already have a key, you must generate an ssh key and send the public half (never send the private key) to hpc@etal.uri.edu when you first sign up. Students should CC their advisor on this email. If you are not the PI on the project, please CC them in the request. On the command line (PowerShell, Terminal, etc.), you can use the ssh-keygen command to create a key (when prompted, press Enter to use the default file name). Set a passphrase on this key; you will be prompted for it when you connect. This will generate two files: the private key ~/.ssh/id_rsa and the public key ~/.ssh/id_rsa.pub. Only send the id_rsa.pub file in the email.
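
    A minimal sketch of the steps above, run from PowerShell or a terminal (the id_rsa file names match the default described here; newer ssh clients may default to a different key type and file name, such as id_ed25519):
    # generate a key pair; press Enter to accept the default file name,
    # then choose a passphrase when prompted
    ssh-keygen

    # confirm the two files exist, and send only the .pub file
    ls ~/.ssh/id_rsa ~/.ssh/id_rsa.pub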

    NOTE: On macOS you can either attach the file, or use the command cat ~/.ssh/id_rsa.pub | pbcopy to copy the contents to the clipboard. You may need to use ⌘-Shift-Period to toggle being able to see files/directories starting with a “.”.

    Here is a ▶️ video of creating a key and connecting.

    To access the Research cluster, use:

    ssh -l username ssh3.hac.uri.edu

    Please do not leave yourself logged in when you are not using the system.
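
    If you connect often, you may find it convenient to add an entry to ~/.ssh/config so that a short alias works in place of the full command (the alias name andromeda is arbitrary; replace username with your assigned login name):
    Host andromeda
        HostName ssh3.hac.uri.edu
        User username
        IdentityFile ~/.ssh/id_rsa
    After that, ssh andromeda is equivalent to the command above.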

    Getting Started

    Here are some links to tutorials that may help you get started using the cluster environment.
    • Shell commands explains the general concepts for navigating the system, working with files and related commands.
    • Software Carpentry has lessons on Python and R, which can be useful for pre- or post- processing data, as well as computation.
    • HPC Carpentry has lessons on using HPC environments in general.

    Data

    Working data for a job should be stored in a directory under /data. When a group is granted access to the cluster, the exact path will be determined (usually /data/groupname). Backup copies of code, scripts, and small summary results may be stored in your home directory, but the job script and data MUST be under /data or the job will not run.

    Transferring Data

    Small files

    Use scp/sftp to transfer small amounts of data to/from the cluster. The recommended client for this is Cyberduck. If you are using a graphical interface for transferring data, you might want to bookmark the paths you use most frequently. See this ▶️ video for setting up Cyberduck.
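
    If you prefer the command line, an scp transfer might look like this (username, groupname, and the file name are placeholders):
    # copy a local file up to the cluster
    scp results.csv username@ssh3.hac.uri.edu:/data/groupname/
    # copy it back down to the current directory
    scp username@ssh3.hac.uri.edu:/data/groupname/results.csv .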

    Remote transfers

    To transfer data from remote sites, you will need to pull the data while logged in to the cluster, as our system cannot be reached from outside. So, to transfer data into /data/somegroup/, choose one of the following (in order of preference):

    rsync -av remotesystem:/remote/path/ ./localpath/
    scp -r remotesystem:/remote/path ./localpath
    wget -O ./localpath https://remotesystem/url
    curl -O https://remotesystem/url
    

    Or, to copy data off, one of (in order of preference):

    rsync -av ./localpath/ remotesystem:/remote/path/
    scp -r ./localpath remotesystem:/remote/path
    

    See the man pages for how to use each of rsync/scp/wget/curl.

    Note that if remotesystem in the commands above refers to your own machine, you must make sure that you are running an appropriate server (such as sshd) and that your machine is reachable from the public internet (not behind a firewall). You can find your public IP address by asking Google. Private addresses (beginning with 192.168. or 10.) will not work.

    Managing Data

    If your program needs to create temporary files, it should do so in /tmp. For some programs, the way to do this is to set the environment variable via export TMPDIR="/tmp/". Note that most compute nodes have ~1TB available there, and it is not shared across nodes. Also make sure your script cleans up anything it puts there.
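
    A minimal sketch of this in a job script, using a per-job directory under /tmp and removing it when the script exits (the directory naming and my_program are only illustrations):
    # use a per-job scratch directory and clean it up on exit
    export TMPDIR="/tmp/${SLURM_JOB_ID}"
    mkdir -p "$TMPDIR"
    trap 'rm -rf "$TMPDIR"' EXIT

    ./my_program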

    Quota
    If your group is subject to a quota, you can see how much space is left by using the df command in the directory you are using.
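
    For example, after changing into your group's directory (replace groupname with your group's path):
    cd /data/groupname
    df -h .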

    Scheduler

    The cluster is using SLURM as the scheduler. Official SLURM user documentation is available here.

    Important items

    MPI
    SLURM has some built-in support for working with OpenMPI. Information on working with all MPI implementations can be found at SLURM’s MPI page.
    Interactive mode
    See below for details of using interactive/debugging sessions.
    Environment variables
    SLURM by default will copy your existing environment from the login shell. While this may be helpful for testing, for reproducible results, it is usually better if your script sets all the variables and loads any modules directly. To get this behavior, use sbatch --export=NONE.

    Submitting jobs

    All jobs on the cluster are written as shell scripts and submitted using the command sbatch. A sample script might look like:
    #!/bin/bash
    #SBATCH -t 1:00:00
    #SBATCH --nodes=1 --ntasks-per-node=1
    #SBATCH --export=NONE
    ./single_job

    The lines starting with “#SBATCH” provide default values for parameters to the sbatch command. For this job a single core will be allocated: one node, one task per node, and the default of one CPU per task.

    -t HH:MM:SS
    This specifies the maximum wall-clock time the process will be given before it is terminated. For array jobs this is the time for each instance, not the entire array.
    --nodes=1 --ntasks-per-node=1
    This specifies, for a single-threaded script, to allocate 1 node and 1 processor-per-node.
    --export=NONE
    This tells SLURM not to pass current environment variables to your job. This is important to ensure you are loading a consistent set of modules in your script.
    You can then submit this job to run:
    $ sbatch myjob.sh
    Submitted batch job 30484
    The output of the sbatch command is the job id. This can be used to check the status of the job:
    $ squeue --job 30484
          JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          30484   default     bash     user  R       3:04      1 n001
    A simpler way to run a single command is to use the --wrap argument.
    $ sbatch -t 1:00:00 --nodes=1 --ntasks-per-node=1 --wrap="./single_job with_args"
    Submitted batch job 30485

    sbatch Parameters

    NOTE: You can always check the manual on the system for more options using the command man sbatch.

    There is tab-completion support for all of the SLURM commands, so you can type sbatch --<tab><tab> to see all of the options. You can also review the online SLURM man pages. SLURM’s documentation includes a guide to converting from other schedulers here (PDF), which may be helpful if your software has examples of using it in other HPC environments.

    Useful parameters are:

    -t HH:MM:SS
    This specifies the maximum wall-clock time the process will be given before it is terminated.
    --nodes=2 --ntasks-per-node=36
    This specifies to allocate 2 nodes and 36 processors (cores) per node. For most simple programs, this should be --nodes=1. (See below for how to change this effectively.)
    --cpus-per-task=4
    This specifies to allocate 4 processors (cores) of a node. For most simple programs, this should be used with a --nodes=1 parameter.
    --mem=20G
    If you need more than 6GB of RAM per task, you can request it with something like --mem=250G or --mem=510G.
    --mem-per-cpu=6G
    Per-cpu version of above.
    --mail-type=BEGIN,END,FAIL,TIME_LIMIT_{50,80,90}
    Email you when the job starts, completes or fails. TIME_LIMIT_80 alerts when 80% of the wall time is reached. 50 and 90% are also available.
    --mail-user=username@example.edu
    Specify the email address to send notifications to. You could also put this in ~/.forward.
    -o slurm-%j.out
    Specify where the output from the program should go (stdout only if -e is also specified). Use %j for job id or %A_%a for array jobs.
    -e slurm-%j.err
    Specify where the error output from the program should go (stderr only).
    --array=1-100%10
    Create an array job. In this case, the array indices would be 1-100, with a limit of 10 jobs running concurrently.
    --export=VAR=val
    Add an environment variable to the job.
    --export=NONE
    Start the job with a “clean” environment, not with the variables in the current shell.
    --exclusive
    Node exclusive (See below for when to use this.)
    --exclusive=user
    When using less than a whole node (for example -c 1), use --exclusive=user to keep your jobs on as few nodes as possible.
    --chdir=/start/path
    Specify a path for the job to start working in. By default, jobs start in the directory from which sbatch was run, and $SLURM_SUBMIT_DIR holds that directory; many scripts begin with cd $SLURM_SUBMIT_DIR to make it the working directory for the rest of the script.
    --wrap="command"
    Specify a single command to run as a batch job. This removes the need to have a script, if you specify all of the sbatch parameters on the command line.
    --qos="normal"
    Specify a Quality of Service to use (for groups with high priority access, see below).
    Additional Resources
    See here for complete documentation of sbatch.

    Choosing a job size

    To decide on which parameters to use when selecting the number of nodes and/or cores to run your job on, consider the following (general guidance; check your program documentation):
    1. Does the program you are using support MPI (the documentation will say so)? If yes, then you can use multiple nodes, and want to use them in exclusive mode. You will also likely use ‘srun’ or ‘mpirun’ to run the program, depending on how the program was compiled. This may take a bit of experimentation. You need to also look further to see if it supports threads. If so (sometimes called “hybrid” mode), then you want to use $SLURM_CPUS_ON_NODE as the number of threads you specify to the program (unless the program auto-detects this, or has another recommendation). If it does not support threads, then use --ntasks-per-core=1.
    2. If your program does not use MPI (the typical case), then determine if it supports threads. If it does, then you should again use exclusive mode with --nodes=1 and $SLURM_CPUS_ON_NODE to maximize use of the node.
    3. If the program does not mention using MPI or threads, it may mention using OpenMP (note this is different from OpenMPI). OpenMP can also use threads (--exclusive --nodes=1), but in this case you want to export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE (see the sketch after this list). Depending on the application, you may wish to balance --ntasks-per-node and --cpus-per-task to achieve the best performance.
    4. If none of the above apply, then your program likely only runs on one core, and you don’t need to specify node, cpu or task option (the default is one core). In this case, if you are running the same code against independent data, you may want to consider array jobs.
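
    As an illustration of cases 2 and 3 above, a whole-node threaded or OpenMP job might be sketched as follows (my_threaded_program and its --threads flag are hypothetical; check your program's documentation for how it accepts a thread count):
    #!/bin/bash
    #SBATCH -t 4:00:00
    #SBATCH --nodes=1
    #SBATCH --exclusive
    #SBATCH --export=NONE

    # for OpenMP programs (case 3), export the thread count
    export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

    # for programs with their own thread option (case 2), pass it explicitly;
    # "--threads" is a hypothetical example flag
    ./my_threaded_program --threads "$SLURM_CPUS_ON_NODE"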

    Using multiple nodes

    You must check whether your program is capable of using multiple cores and/or multiple nodes to do computation before requesting resources with the --nodes option. If your program uses the MPI libraries, then you should be able to adjust these parameters within the constraints of that program. For example, some programs require that the number of concurrent processes is even or a multiple of some other number. If your program allows you to specify a number of threads, then you may want to set -c 36 (the number of cores in most of our nodes, although some have 24) and use $SLURM_CPUS_PER_TASK to pass that number to your program.
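
    For an MPI-capable program, a two-node job might be sketched like this (my_mpi_program is a placeholder, and whether to launch with srun or mpirun depends on how the program was compiled):
    #!/bin/bash
    #SBATCH -t 12:00:00
    #SBATCH --nodes=2 --ntasks-per-node=36
    #SBATCH --exclusive
    #SBATCH --export=NONE

    # load the toolchain the program was built with (foss/2016b is just an example)
    module load foss/2016b

    # start one MPI rank per task; srun reads the allocation from SLURM
    srun ./my_mpi_program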

    Array Jobs

    If you need to run a job over several instances of data, but each instance can run independently, you might want to create an array job. Because there is some expense to creating a job, it does not make sense to use this to iterate over tens of thousands of very short jobs, and those should be batched into chunks. Here are some guidelines:

    • An instance of a job should run for at least 10 minutes.
    • No job array should have more than two or three thousand instances.
    • Limit the concurrent jobs in the array to 10 if using a whole node, as in: --nodes=1 --cpus-per-task=36 --array=1-100%10
    • Limit the concurrent jobs in the array to 200 if using a single core: --nodes=1 --cpus-per-task=1 --array=1-1000%200

    An array job is listed in the form 123_[1-5] for the entire array, and 123_3 for an instance in that array. You can use $SLURM_ARRAY_TASK_ID within your script to distinguish each instance.
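
    A minimal array-job sketch, assuming the inputs are named input_1.txt through input_100.txt (the file naming and the program are placeholders):
    #!/bin/bash
    #SBATCH -t 1:00:00
    #SBATCH --nodes=1 --ntasks-per-node=1
    #SBATCH --array=1-100
    #SBATCH --export=NONE

    # each instance processes the file matching its array index
    ./single_job "input_${SLURM_ARRAY_TASK_ID}.txt"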

    To alter the number of concurrent array tasks running, you can use scontrol update jobid=JJ ArrayTaskThrottle=NN where JJ is the JOBID and NN is the new count.

    Exclusive node use

    There are a couple of situations in which you might want to reserve an entire node to your job, which you can do with --exclusive.
    • Your job is going to use all of the cores in a node. In this case, you can pass $SLURM_CPUS_ON_NODE to your program for the number of threads.
    • Your job is going to use fewer than the number of cores on a node, but almost all of the RAM on that node.
    • You want to use less than the number of cores on a node, but are measuring the performance of the job and want to ensure consistent results.

    Large memory jobs

    To submit a job on a node with at least 256GB or 512GB of memory, use exclusive mode and request the node with sbatch --mem=250GB or sbatch --mem=500GB, respectively.
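
    For example, a one-line submission for a large-memory run (the command and time limit are placeholders):
    sbatch --exclusive --mem=500GB -t 24:00:00 --wrap="./big_memory_job"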

    Environment

    While jobs are running on the compute nodes, a number of SLURM environment variables are available. The most important of these are listed below:

    SLURM_JOB_ID
    The ID of the job. You can use squeue to see the nodes allocated to your job.
    SLURM_SUBMIT_DIR
    Directory from which sbatch was called.
    SLURM_ARRAY_TASK_ID
    Unique ID assigned to this instance of the job. (Array jobs only.)

    Job Status

    There are several commands that can be used to check on the status of a job.
    squeue
    Gives a list of jobs in the queue. Some options are:
    --long or -l
    Long format (more details).
    --states=RUNNING or -t RUNNING
    List only running jobs
    --states=PENDING or -t PENDING
    List only pending jobs
    -r
    Expand array jobs (only for pending jobs).
    -u $USER
    Show only your jobs.
    scontrol show job
    Shows the details of a currently running job.
    sacct
    Show job history for your user. Use -l for all details.
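
    For example, to check on the job submitted earlier (the --format fields shown for sacct are just one reasonable selection):
    squeue -u $USER                    # all of your queued and running jobs
    scontrol show job 30484            # full details of one job
    sacct -j 30484 --format=JobID,JobName,State,Elapsed,MaxRSS
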
    Jobs in Pending State
    If squeue reports your job as pending (short form ST=PD), this could be for several reasons:
    ReqNodeNotAvail
    This could happen if:
    1. There is a reservation in the system for that node (or all nodes). This is usually for maintenance purposes.
    2. You specified particular nodes using --nodelist (not advised).
    If it is due to maintenance, then it might help to reduce your requested time (--time) so that it will finish before the reservation (Use scontrol show reservations to see the details).
    AssocGrp*Limit
    You have hit a quota on the amount of resources currently consumed. Your job will start when a sufficient number of your other jobs finish.
    Resources
    All the nodes are busy. This job will be the next to run.
    Priority
    All the nodes are busy, or the job is waiting behind a higher-priority job that was submitted earlier and is also pending.
    For other reason codes, check the man page (man squeue) or contact the system administrator.

    Canceling a job

    If your job is running, but not producing useful output, you should cancel it. This can be done with the command scancel JJ where JJ is the Job ID.
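
    scancel also accepts other selectors, for example:
    scancel 30484         # cancel one job
    scancel 30484_7       # cancel a single instance of an array job
    scancel -u $USER      # cancel all of your own jobs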

    Interactive use

    Often some short tests need to be done to make sure that programs are working as expected before running a full job. In this case, you should still request computation time on a node, so as to not slow down the login node. To do this, use either interactive (preferred) or srun --pty /bin/bash. This will create a new session on a compute node as if you had used ssh to reach that node. The interactive script requests 8 hours by default; with srun, -t can be used as described above. Please limit your walltime request to the time you will actually be working interactively, and submit a batch job for longer computations.
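
    For example, to request a two-hour interactive session on a single core with srun (the site-provided interactive command handles similar defaults for you):
    srun -t 2:00:00 --nodes=1 --ntasks-per-node=1 --pty /bin/bash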

    X11 forwarding

    If you need to use an X11 display, make sure you have configured your local SSH client properly (command line ssh uses -X or -Y) when connecting to the cluster. You can test if it is working with a command like xset q. Once that is working, the interactive command should pick it up automatically. If you are using ‘srun’, you need to pass --x11.

    Software Management

    There is a fair number of software packages and libraries installed on the clusters. You can use the module avail command to see what is currently installed. Send an email to hpc@etal.uri.edu if you would like something new installed, and include a link to the software, as well as any other requirements. NOTE: Some packages will not run on the login node, even for simple arguments like cmd --help. You should start an interactive session to test your scripts.

    Using modules

    The module command is used to set up your environment to run a particular version of software.
    avail
    Shows a list of software that is available to use. Some software is in groups, so you can do module avail bio/ to see the list of biology-related software. Use module avail without any modifiers to see the complete list, although there will be some duplication.
    load
    Used to set up the environment for a particular piece of software. E.g., module load BamTools will make the bamtools command available. Note that some modules may not be loaded side-by-side, such as those compiled with different dependencies. Sometimes multiple versions of a package will be available; you can specify an exact version, such as module load BamTools/2.4.0-foss-2016b.
    unload
    Unload a particular module.
    list
    Shows the list of loaded modules.
    show
    Shows how a module will change the environment.
    purge
    Reset the environment.
    A note about module naming: module versions with a suffix of “foss-2016b” are built with gcc-5.4.0, OpenBLAS-0.2.18, LAPACK-3.6.1, OpenMPI-1.10.3, FFTW-3.3.4, and ScaLAPACK-2.20.2. If you want to build a package that uses any of these libraries, it is best to load them as this set (you can load foss/2016b directly). The environment will then be set up correctly for using these versions.
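
    A typical module session might look like this (BamTools is the example from above; the exact versions available may differ):
    module avail                             # everything that is installed
    module load BamTools/2.4.0-foss-2016b    # load one specific version
    module list                              # confirm what is loaded
    module show BamTools/2.4.0-foss-2016b    # see how it changes the environment
    module purge                             # reset when finished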

    Conflicts

    You may get errors when trying to load a module if it conflicts with another module that is already loaded. This usually happens when the modules use different toolchains (foss-2016b vs foss-2018b, etc.). See this page for a complete list of compatible GCC/foss versions. If you can’t find a compatible set, you may request that such a set be made available. Generally, later versions are better. In some cases you can change midstream:
    module load xxx/yyy-foss-2018b
    ./prog1 # depends only on xxx
    module purge
    module load www/zzz-foss-2016b
    ./prog2 # depends only on www
    You cannot have xxx and www loaded at the same time. So if xxx itself tries to run www as a sub-process, this procedure won’t work. In that case, you can contact hpc@etal.uri.edu to install compatible versions.

    Policy

    Scheduling Policy

    Some portions of this cluster have been purchased by researchers (if you would like to contribute, please review the HPC Policy and then contact hpc@etal.uri.edu), whose groups then receive priority access to a comparable set of resources. Currently this means that if requested (by specifying a QoS with -q when submitting the job), the job will enter the queue with increased priority. If the job does not start within 24 hours, the scheduler may stop already-running jobs (if they have run for more than 24 hours) in order to make room. Note that this only occurs if the entire cluster is in use by lower-priority jobs for 24 consecutive hours, which is currently very unusual, as the majority of jobs take less time. A further note for groups using this mechanism: it will also affect jobs submitted by your group that do not use this priority.

    To view which QoS levels you can use:

    sacctmgr show association where user=$USER format=account,qos%40

    To view the limits on a QoS:

    sacctmgr show qos format=name%15,grptres%20,mintres,maxtres
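
    For example, if sacctmgr shows a QoS named (hypothetically) mygroup-priority, a priority submission might look like:
    sbatch --qos=mygroup-priority -t 12:00:00 --wrap="./single_job"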

    Reporting job issues

    If your job failed and you believe it to be an issue with the cluster or the installed software, please report the following details with your request for assistance:
    • Job number
    • Job script location (as run, not modified)
    • Job log file location(s)
    • Exact error message received, if any
    • Software the job uses
    • If it is a software failure, please provide a small example of data and how to run the software.

    External Resources

    The national XSEDE program has online training available, much of which should apply to any cluster environment: https://portal.xsede.org/online-training.