Comparative Sequence Analysis
In this section of the course, you will learn how to analyse an unknown sequence. For simplicity, we will focus on the procedure you would follow for the genes in a novel bacterial genome. The same techniques generally apply to other organisms and other features, but they are more difficult or more computationally intensive to carry out there. We will briefly discuss applications beyond bacterial genes in the relevant sections.
The common element in all these techniques is sequence comparison. Determining the similarities and differences between two or more sequences allows us to infer the features and functions of a sequence (annotation) and its relationship to other sequences (phylogeny). However, to compare sequences we must first understand the concept of alignment.
Sequence alignment
One way to compare two sequences is through alignment - arranging them against one another to identify areas of similarity. There are two general approaches you could take:
Global alignment attempts to align every residue (every base in a nucleotide sequence, every amino acid in a protein sequence) between the two sequences.
Local alignment looks for regions of similarity between the two sequences and ignores the rest (the dissimilar regions).
We then need an algorithm that will arrange and score different possible alignments.
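To make the idea of arranging and scoring alignments concrete, here is a minimal Python sketch that computes the best alignment score by dynamic programming. The global variant follows the approach usually called Needleman-Wunsch, and the local variant follows the approach usually called Smith-Waterman. The function name `alignment_score` and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not values taken from this course, and the sketch returns only the score rather than the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score of sequences a and b.

    Scoring values are illustrative assumptions. With local=False every
    residue must be aligned (global); with local=True only the best
    matching region is scored (local).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: gaps at the start of either sequence are penalised
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Align a[i-1] with b[j-1] (match or mismatch)
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap, or b[j-1] with a gap
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local alignment: a region may restart from zero at any cell
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences; local: best region found
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))                     # global: one gap, three matches
print(alignment_score("GGGACGT", "ACGTCCC"))              # global: flanking regions cost points
print(alignment_score("GGGACGT", "ACGTCCC", local=True))  # local: only the shared ACGT is scored
```

In the last two calls the sequences share the region ACGT but differ elsewhere, so the local score is higher than the global score: the local variant ignores the dissimilar flanking residues, while the global variant must account for all of them.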