Working on a computing cluster

The Slurm Queuing System

Many people have access to Euler. If everyone ran whatever program they liked, whenever they liked, the system would soon grind to a halt as it tried to share its limited resources among all the users. To prevent this, and to ensure fair usage of the server, a queueing system automatically manages which jobs run when. Any program that will use more than one CPU (sometimes referred to as a core or thread, though there are minor technical differences between these terms), more than a few MB of RAM, or will run for longer than a few minutes should be placed in the queue.

To correctly submit a job to the queue on Euler, it’s usually easiest to write a short shell script based on a template. Our server Cousteau also uses the Slurm Queuing System.

#!/bin/bash
#SBATCH --job-name example          # Job name
#SBATCH --output example_out.log    # Output log file path
#SBATCH --error example_error.log   # Error log file path
#SBATCH --ntasks 8                  # Number of CPUs
#SBATCH --mem-per-cpu=2G            # Memory per CPU
#SBATCH --time=1:00:00              # Approximate time needed

# Insert your commands here
echo "This job ran with $SLURM_NTASKS threads on $SLURM_JOB_NODELIST"
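Variables such as `$SLURM_NTASKS` are set by Slurm inside a running job; in a normal shell they are empty. A minimal sketch of the difference, using the shell's `${VAR:-default}` fallback expansion (the values here are illustrative, not taken from a real job):

```shell
#!/bin/bash
# SLURM_NTASKS is only defined inside a job started by Slurm.
unset SLURM_NTASKS
echo "Tasks: ${SLURM_NTASKS:-not running under Slurm}"   # prints the fallback

# Simulate what a job submitted with --ntasks 8 would see:
SLURM_NTASKS=8
echo "Tasks: ${SLURM_NTASKS:-not running under Slurm}"   # prints "Tasks: 8"
```

This is why the template's `echo` line prints meaningful values only when the script runs through `sbatch`, not when you execute it directly.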

The queue is then controlled with a few simple commands:

# Submit the job to the queue
sbatch my_jobscript.sh

# Check the status of your jobs
squeue

# Remove a job from the queue (replace jobid with the number shown by squeue)
scancel jobid
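When working with several jobs it helps to capture the job ID in a variable instead of copying it from `squeue` by hand. One way to sketch this is with `sbatch --parsable`, which prints only the job ID; the snippet below is guarded so it does nothing on a machine without Slurm, and `my_jobscript.sh` is just a placeholder name:

```shell
#!/bin/bash
# Submit a job and keep its ID for later inspection or cancellation.
# 'sbatch --parsable' prints only the job ID instead of "Submitted batch job NNN".
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable my_jobscript.sh)
    squeue -j "$jobid"     # show the status of just this job
    scancel "$jobid"       # remove it from the queue again
else
    echo "Slurm is not available on this machine"
fi
```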

Exercise 0.9

  • Copy the script /nfs/teaching/551-0132-00L/2_Good_practices/submit_slurm.sh to your home directory.

# Copy the submit script to your home directory
cp /nfs/teaching/551-0132-00L/2_Good_practices/submit_slurm.sh ~/
  • Submit the script to the job queue with sbatch and look at the output file

# Submit the script
sbatch submit_slurm.sh

# Check if it is in the queue (it may finish too quickly for you to catch it)
squeue

# Check the output files
less example*error.txt # Note: "*" stands for a number
# Should be empty
less example*out.txt # Note: "*" stands for a number
# Should tell you that it ran with 8 threads on localhost
  • Now edit the script:

    • Remove the existing echo command.

    • Add a command that runs the script you wrote for Exercise 2.5 on one of the fasta files in /nfs/teaching/551-0132-00L/1_Unix/genomes.

    • Use only 1 CPU instead of 8; the other parameters can stay the same, unless you want to rename the job and log files.

# Modify the submit script (submit_slurm.sh) to look something like this:

        #!/bin/bash
        #SBATCH --job-name fastacount              # Job name
        #SBATCH --output out.log                   # Output log file path
        #SBATCH --error error.log                  # Error log file path
        #SBATCH --ntasks 1                         # Number of CPUs
        #SBATCH --mem-per-cpu=2G                   # Memory per CPU
        #SBATCH --time=1:00:00                     # Approximate time needed

        ./fastacount.sh /nfs/teaching/551-0132-00L/1_Unix/genomes/bacteria/escherichia/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna
  • Submit the job. When the job is finished, look at the output files for yourself.

# Then you submit it like this:
sbatch submit_slurm.sh

# Check the output
less error.log      # Should be empty
less out.log        # Should have the output of your script, e.g. 4302
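The exact contents of `fastacount.sh` depend on your own solution to Exercise 2.5, but the core of such a script, counting the sequences in a FASTA file, could be sketched like this (the counting logic is an illustration, not the official solution):

```shell
#!/bin/bash
# Count the sequences in a FASTA file by counting header lines,
# which always start with '>'.
# Usage: ./fastacount.sh file.fna
grep -c '^>' "$1"
```

For a CDS file like the E. coli one above, each `>` header marks one coding sequence, so the number printed is the sequence count that should appear in out.log.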