End Gaps
Consider the following alignment:
    ACGGACC-AGGTGA
    ----ACCTA-----
There are four matches, yielding +4 points, and 10 columns in which a letter is
aligned with a gap, which, given a gap penalty of -2, yields -20 points, for a
total score of -16.
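The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `score_alignment` is made up for illustration, and the mismatch score of 0 is an assumption, since the text gives only the match score (+1) and the gap penalty (-2) and this alignment contains no mismatches.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-2):
    """Score a finished alignment column by column.

    match and gap follow the text; mismatch=0 is an assumption,
    because the text does not give a mismatch score.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # a letter aligned with a gap
        elif a == b:
            total += match      # identical letters
        else:
            total += mismatch   # different letters
    return total


top = "ACGGACC-AGGTGA"
bottom = "----ACCTA-----"
print(score_alignment(top, bottom))  # 4 matches, 10 gap columns: 4 - 20 = -16
```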
Now realign the sequences as follows:
    ACGGACCAGGTGA
    AC---C----T-A
Now we have 5 matches, yielding +5 points, and 8 columns with gaps, yielding
-16 points, for a total of -11 points. Is this better?
Well, -11 is a better score than -16 mathematically, but the context of our
comparison is biology. We're interested in the idea that the similarity of the
two sequences may be due to a series of changes over time: insertions,
deletions, etc. In that light the first alignment is more interesting than the
second despite its lower score. It is more interesting to view the string ACCTA
as serving some biological purpose that, in being inserted in the first string
almost intact, improved that sequence. The letters of a short sequence can
always be scattered along a longer one, as was done in the second alignment,
but there's almost no way to determine whether that is significant.
The problem lies with the end gaps. We shouldn't be penalizing end gaps, those
that lie entirely to the left or right of one of the two sequences, if we're
not interested in the global alignment, i.e., the alignment of the full
sequences. And most of the time we're not. As a first step toward fully local
alignments, we'll look at partially local alignments, those that don't penalize
end gaps, and we'll do that on the next page. That page will be much like
page 3 of this section: a JavaScript program that will ask you to input two
sequences.
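To see what dropping the end-gap penalty does to the two alignments on this page, here is a minimal Python sketch. It is not the program used on the next page. The function name `score_free_end_gaps` is hypothetical, and the mismatch score of 0 is an assumption, since the text gives only the match score (+1) and the gap penalty (-2). It scores only the columns in which both sequences have started and neither has ended.

```python
def score_free_end_gaps(top, bottom, match=1, mismatch=0, gap=-2):
    """Score an alignment, ignoring gaps that lie wholly to the left
    or right of either sequence (end gaps).

    mismatch=0 is an assumption; the text gives no mismatch score.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")

    def letter_span(row):
        # Indices of the first and last real letters in a row.
        letters = [i for i, ch in enumerate(row) if ch != "-"]
        if not letters:
            raise ValueError("each row must contain at least one letter")
        return letters[0], letters[-1]

    top_first, top_last = letter_span(top)
    bot_first, bot_last = letter_span(bottom)
    start = max(top_first, bot_first)  # both sequences have begun
    end = min(top_last, bot_last)      # neither sequence has ended

    total = 0
    for a, b in zip(top[start:end + 1], bottom[start:end + 1]):
        if a == "-" or b == "-":
            total += gap        # internal gap, still penalized
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


# First alignment: only the internal gap opposite T is penalized.
print(score_free_end_gaps("ACGGACC-AGGTGA", "----ACCTA-----"))  # 4 - 2 = 2
# Second alignment: it has no end gaps, so its score is unchanged.
print(score_free_end_gaps("ACGGACCAGGTGA", "AC---C----T-A"))    # 5 - 16 = -11
```

Under these assumptions the first alignment, the one that keeps ACCTA nearly intact, now outscores the second, 2 to -11.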