BLAST

BLAST, the Basic Local Alignment Search Tool, is the most commonly used sequence alignment tool and, as its name suggests, performs local alignment. The details of the algorithm are beyond the scope of this course, but understanding its scoring system is important.

BLAST scores an alignment position by position, according to whether each aligned pair is a match, a mismatch, or a gap (a gap arises from the insertion or deletion of one or more residues in one of the sequences).

mismatch
|
ATGACTAGCTGCTATATCAGCTAC
 || ||||||||     |||||||  <-- every | indicates a match
GTG-CTAGCTGC-----CAGCTAC
   |          |
   gap        extended gap
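To make the three cases concrete, the short Python sketch below labels each column of the example alignment above. It assumes the alignment is given as two equal-length strings with `-` marking a gap; `classify_columns` is a hypothetical helper for illustration, not part of BLAST.

```python
def classify_columns(top, bottom):
    """Label each column of a pairwise alignment as 'match', 'mismatch' or 'gap'."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    labels = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            labels.append("gap")  # a residue inserted/deleted in one sequence
        elif a == b:
            labels.append("match")
        else:
            labels.append("mismatch")
    return labels


# The example alignment from the diagram above.
top = "ATGACTAGCTGCTATATCAGCTAC"
bottom = "GTG-CTAGCTGC-----CAGCTAC"
labels = classify_columns(top, bottom)
print({kind: labels.count(kind) for kind in ("match", "mismatch", "gap")})
# {'match': 17, 'mismatch': 1, 'gap': 6}
```

The six gap columns form two separate gaps: a single-position gap and a five-position extended gap.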

The score for a match or mismatch depends on exactly which residues are being compared. For DNA this is simple: every match earns the same reward and every mismatch receives the same penalty. For amino acids it is more complex: substitutions between residues with similar properties (say, hydrophobicity or polarity) are penalised less than substitutions between very different residues. A scoring matrix gives the score for every possible pair of residues. Here we show the BLAST DNA scoring matrix and a commonly used amino acid matrix, BLOSUM62:

      A    C    G    T
A    +5   -4   -4   -4
C    -4   +5   -4   -4
G    -4   -4   +5   -4
T    -4   -4   -4   +5

[Figure: BLOSUM62 amino acid scoring matrix]
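As a rough sketch of how such matrices are used, the Python snippet below looks up substitution scores. The DNA scores come straight from the table above (+5 for identical bases, -4 otherwise). For protein it hard-codes only a handful of BLOSUM62 entries rather than the full 20 by 20 matrix; `BLOSUM62_EXCERPT`, `dna_score` and `protein_score` are hypothetical names, and a real tool would load the complete matrix.

```python
# BLAST DNA scoring matrix from the table above.
DNA_MATCH, DNA_MISMATCH = 5, -4


def dna_score(a, b):
    """Score one aligned pair of bases."""
    return DNA_MATCH if a == b else DNA_MISMATCH


# A few BLOSUM62 entries only. The matrix is symmetric, so each unordered
# pair is stored once under its alphabetically sorted key.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4,
    ("I", "L"): 2,   # both hydrophobic: mild reward
    ("I", "V"): 3,   # both hydrophobic: mild reward
    ("D", "E"): 2,   # both acidic: mild reward
    ("D", "I"): -3,  # acidic vs hydrophobic: penalised
    ("W", "W"): 11,
}


def protein_score(a, b):
    """Look up a pair in the excerpt (raises KeyError for pairs not included)."""
    return BLOSUM62_EXCERPT[tuple(sorted((a, b)))]


print(dna_score("A", "A"), dna_score("A", "G"))          # 5 -4
print(protein_score("L", "I"), protein_score("I", "D"))  # 2 -3
```

The contrast between the last two lookups reflects the point made above: swapping one hydrophobic residue for another (I/L) still scores positively, while swapping a hydrophobic residue for an acidic one (I/D) is penalised.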

Gaps are scored with two separate penalties. Opening a gap incurs a gap-opening penalty, and each additional consecutive gap position incurs a gap-extension penalty, which is typically smaller than the opening penalty. In BLAST, the exact penalty values depend on the specific BLAST program used.
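The gap rule described above, where the first gap position costs the opening penalty and each further consecutive position costs the extension penalty, can be sketched as follows. The penalty values are illustrative assumptions, not BLAST's defaults. Some tools, NCBI BLAST among them, instead charge the extension penalty for every gap position, including the first.

```python
def gap_score(length, open_penalty=-5, extend_penalty=-2):
    """Score a run of `length` consecutive gap positions in one sequence.

    The first position costs `open_penalty`; each further position costs
    `extend_penalty`. Default values are illustrative only.
    """
    if length < 1:
        raise ValueError("a gap must span at least one position")
    return open_penalty + (length - 1) * extend_penalty


print(gap_score(1), gap_score(5))  # -5 -13
```

Under the same values, five separate single-position gaps would cost 5 * -5 = -25, so this scheme favours one longer gap over many scattered ones.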

The final score of an alignment is therefore the sum of the match, mismatch, gap-opening and gap-extension scores over all of its positions.
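Putting the pieces together, here is a minimal Python sketch that scores a whole alignment by summing those contributions. It assumes the +5/-4 DNA scores from the table and illustrative gap penalties (-5 to open, -2 to extend), and it treats consecutive gap columns as one gap only while the gap stays in the same sequence. `alignment_score` is a hypothetical helper, not BLAST's implementation.

```python
def alignment_score(top, bottom, match=5, mismatch=-4, gap_open=-5, gap_extend=-2):
    """Sum match, mismatch, gap-opening and gap-extension scores over all columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap = None  # which sequence had a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            side = "top" if a == "-" else "bottom"
            # Continuing a gap in the same sequence extends it; otherwise open a new one.
            score += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            score += match if a == b else mismatch
            prev_gap = None
    return score


top = "ATGACTAGCTGCTATATCAGCTAC"
bottom = "GTG-CTAGCTGC-----CAGCTAC"
print(alignment_score(top, bottom))  # 63
```

For the example alignment this is 17 matches (+85), one mismatch (-4), a one-position gap (-5) and a five-position gap (-5 + 4 * -2 = -13), giving 85 - 4 - 5 - 13 = 63.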