Additional content#

This content is an additional resource for interested students who want to learn about a wider variety of topics and commands used by scientists working on the command line on a daily basis.

Installing software#

In some cases, people want to install specific software for their own use that is not commonly used by others. Such software might not be available pre-installed on a module system. In that case, to install it you could (from easiest to hardest):

  • Ask your friendly system administrator to install it for you (someone with administrator rights, e.g. IT services, installs the software on the computer for you)

  • Use a Docker or Singularity image (installing an existing environment/container that contains a piece of software together with its dependencies)

  • Install it using Conda (using an environment manager to install a piece of software and automatically retrieve and install all dependencies)

  • Install it from scratch, including libraries and dependencies, while ensuring that it does not clash with any existing installations (installing software and its dependencies manually)

These installation instructions are purely for educational purposes; you will not be required to install software yourself in this course.
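
As an illustration, installing a tool into its own Conda environment usually looks something like the sketch below; the environment name mytools and the package samtools are only examples, and nothing here needs to be run for this course.

# Create a new environment and install a tool into it
# (bioconda is a community channel that provides many bioinformatics packages)
conda create --name mytools -c bioconda samtools

# Activate the environment so the tool is available on your PATH
conda activate mytools

# Check that the installation worked
samtools --version

# Alternatively, pull an existing container image, e.g. from Docker Hub, with Singularity
singularity pull docker://ubuntu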

Working on a computing cluster#

Many people have access to Euler. If everyone ran whatever program they liked, whenever they liked, the system would soon grind to a halt as it tried to manage the limited resources between all the users. To prevent this, and to ensure fair usage of the cluster, there is a queueing system that automatically manages which jobs are run when. Any program that will use more than 1 CPU (sometimes referred to as cores or threads, though there are minor technical differences between these terms), more than a few MB of RAM, or that will run for longer than a few minutes, should be placed in the queue.

The Slurm Queuing System#

To submit a job to the queue on Euler correctly, it is usually easiest to write a short shell script based on a template like the one below. Our server Cousteau also uses the Slurm queuing system.

#! /bin/bash
#SBATCH --job-name example              # Job name
#SBATCH --output example_out.log        # Output log file path
#SBATCH --error example_error.log       # Error log file path
#SBATCH --ntasks 8                      # Number of CPUs
#SBATCH --mem-per-cpu=2G                # Memory per CPU
#SBATCH --time=1:00:00                  # Approximate time needed

# Insert your commands here
echo This job ran with $SLURM_NTASKS threads on $SLURM_JOB_NODELIST

You can then submit and manage your job with the following commands:

# Submit the job to the queue
sbatch my_jobscript.sh

# Check the status of your jobs
squeue

# Remove a job from the queue
scancel jobid
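
On a busy cluster, squeue lists everyone's jobs, which can be a lot of output. Assuming a standard Slurm installation, these variations are often useful (12345 is a placeholder job ID):

# Show only your own jobs
squeue -u $USER

# Show detailed information about one job, e.g. requested resources and current state
scontrol show job 12345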

Exercise 2.6#

  • Copy the script /nfs/teaching/551-0132-00L/2_Good_practices/submit_slurm.sh to your home directory.

# Copy the submit script to your home directory
cp /nfs/teaching/551-0132-00L/2_Good_practices/submit_slurm.sh ~/
  • Submit the script to the job queue with sbatch and look at the output file

# Submit the script
sbatch submit_slurm.sh

# Check if it is in the queue (it may finish too quickly for you to catch it)
squeue

# Check the output files
less example*error.txt # Note: "*" stands for a number
# Should be empty
less example*out.txt # Note: "*" stands for a number
# Should tell you that it ran with 8 threads on localhost
  • Now edit the script:

    • Remove the existing echo command.

    • Put a command to run the script you wrote for Exercise 2.5 on one of the fasta files in /nfs/teaching/551-0132-00L/1_Unix/genomes.

    • You should only use 1 CPU instead of 8; the other parameters can stay the same unless you want to rename the job and log files.

# Modify the submit script (submit_slurm.sh) to look something like this:

        #! /bin/bash
        #SBATCH --job-name fastacount              # Job name
        #SBATCH --output out.log                   # Output log file path
        #SBATCH --error error.log                  # Error log file path
        #SBATCH --ntasks 1                         # Number of CPUs
        #SBATCH --mem-per-cpu=2G                   # Memory per CPU
        #SBATCH --time=1:00:00                     # Approximate time needed

        ./fastacount.sh /nfs/teaching/551-0132-00L/1_Unix/genomes/bacteria/escherichia/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna
  • Submit the job. When the job is finished, look at the output files for yourself.

# Then you submit it like this:
sbatch submit_slurm.sh

# Check the output
less error.log      # Should be empty
less out.log        # Should have the output of your script, e.g. 4302
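
If you are curious how much time and memory a finished job actually used, Slurm's accounting command can report this, assuming job accounting is enabled on the cluster (12345 is a placeholder job ID):

# Report run time, peak memory usage and final state of a completed job
sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,State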