The derivation of a scoring matrix given on the previous pages was fairly straightforward and rigorous. Notice that qAA + qBB + qCC = 0.566, which is the total probability of having no change, i.e., the probability that over one step any given residue will remain stable. This number is fairly low: over only one evolutionary step we should expect a lot more stability than that.

It was arbitrarily decided in a smoke-filled room at a secret meeting of geneticists that an acceptable one-step stability probability should be 0.99 (99 percent stable, not mutating), and that this would define what one step meant. Having eradicated all opposition in a series of daring night-time raids, they devised a mathematical kludge that accomplished their goal. It's not a bad kludge, although it's not hard to imagine special conditions in which it gives nonsense results. Still, we may safely assume that such conditions are not present in the real world.
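One way such a kludge could work is sketched below. This is an assumption for illustration, not necessarily the exact recipe the geneticists used: every off-diagonal (mutation) probability is multiplied by one common factor, here called lam, chosen so that the expected probability of no change (the sum over residues of background frequency times diagonal entry) comes out to exactly 0.99. Each diagonal entry then takes up the slack so that every row still sums to 1. The function names, the three-letter alphabet, and all the numbers are hypothetical.

```python
# Hypothetical sketch: rescale a mutation probability matrix so that the
# expected one-step stability is exactly `target` (0.99 by default).

def rescale_to_stability(P, freqs, target=0.99):
    """Return M with sum_i freqs[i] * M[i][i] == target.

    P[i][j] is the probability that residue i is replaced by residue j in one
    step (each row sums to 1); freqs[i] is the background frequency of i.
    Off-diagonal entries are scaled by a common factor lam; each diagonal
    entry absorbs the difference, so every row of M still sums to 1.
    """
    n = len(P)
    # Expected probability of a change under the unscaled matrix.
    change = sum(freqs[i] * (1.0 - P[i][i]) for i in range(n))
    lam = (1.0 - target) / change
    M = [[lam * P[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][i] = 1.0 - lam * (1.0 - P[i][i])
    return M

def expected_stability(M, freqs):
    """Probability that a randomly chosen residue is unchanged after one step."""
    return sum(freqs[i] * M[i][i] for i in range(len(M)))

# Made-up three-residue example (A, B, C).
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.3, 0.6]]
freqs = [0.3, 0.4, 0.3]
print(round(expected_stability(P, freqs), 3))   # 0.56 before rescaling
M = rescale_to_stability(P, freqs)
print(round(expected_stability(M, freqs), 3))   # 0.99 after rescaling
```

In this sketch, if the unscaled matrix were already more stable than the target, lam would exceed 1 and a highly mutable residue could end up with a negative diagonal entry, which is one way this particular construction can produce nonsense.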

The 20x20 scoring matrix resulting from this method would be a PAM-1 matrix, where PAM = Percent of Accepted Mutations and 1 = one step. To get the PAM-250 scoring matrix we have to go back to the kludged Mutation Probability Matrix, M, and raise it to the 250th power (multiply 250 copies of it together) to move 250 steps into the future, then rederive a scoring matrix from that point. (The number 250 was presumably decided upon by the same cabal in the same smoke-filled room, when they weren't covering up their work on extraterrestrials.) The result is a widely used scoring matrix, but it is not the only choice.
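The step from PAM-1 to PAM-250 can be sketched in a few lines. This is a minimal sketch under stated assumptions: M[i][j] is taken to be the probability that residue i becomes residue j in one step, and the score for the pair (i, j) is the log-odds ratio of M^n[i][j] to the background frequency of j (equivalently, the joint probability freqs[i] * M^n[i][j] divided by freqs[i] * freqs[j]). The factor of 10 with base-10 logarithms is an assumed scaling convention, and the 3x3 matrix and frequencies are made up. The matrix power is computed by repeated squaring, which gives the same result (up to rounding) as 249 successive multiplications with far fewer operations.

```python
import math

def mat_mult(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(M, steps):
    """Return M**steps (the product of `steps` copies of M) by repeated squaring."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = M
    while steps > 0:
        if steps & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        steps >>= 1
    return result

def log_odds_scores(M, freqs, steps, scale=10.0):
    """Score[i][j] = scale * log10(Mn[i][j] / freqs[j]) with Mn = M**steps.

    Mn[i][j] is the probability that residue i has become residue j after
    `steps` steps; dividing by freqs[j] compares that with pure chance.
    """
    Mn = mat_power(M, steps)
    n = len(M)
    return [[scale * math.log10(Mn[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical three-residue example: uniform frequencies, 99% one-step stability.
freqs = [1 / 3, 1 / 3, 1 / 3]
M = [[0.99, 0.005, 0.005],
     [0.005, 0.99, 0.005],
     [0.005, 0.005, 0.99]]
pam1 = log_odds_scores(M, freqs, 1)
pam250 = log_odds_scores(M, freqs, 250)
print([round(x, 2) for x in pam1[0]])
print([round(x, 2) for x in pam250[0]])
```

With these made-up numbers the one-step scores strongly reward identities and strongly penalize substitutions, while the 250-step scores lie much closer to zero, because each row of M**250 has drifted toward the background frequencies.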
Multiple Sequence Alignments

LARRY-
MO--E-
CURLY-
SH--EP
To complete this section, let me say a brief word about multiple alignments. Often the task of geneticists is to take a single query sequence, pop it into a database, find optimal pairwise scores for aligning the query with each member of the database, then pick out the scores above a certain cut-off and look at the corresponding optimal alignments.
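The database search just described can be sketched as a loop over pairwise alignments. This is a minimal sketch, assuming global (Needleman-Wunsch) alignment with a linear gap penalty; the flat scores (match +1, mismatch -1, gap -2), the cut-off, and the tiny "database" are hypothetical illustrations rather than values from these pages. A real search would use a substitution matrix such as PAM-250 in place of the flat match/mismatch scores.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b with a linear gap penalty.

    Dynamic programming over prefixes; only two rows of the table are kept.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

def search(query, database, cutoff, **scoring):
    """Return (score, name) for every entry scoring above cutoff, best first."""
    hits = []
    for name, seq in database.items():
        score = global_score(query, seq, **scoring)
        if score > cutoff:
            hits.append((score, name))
    return sorted(hits, reverse=True)

# Hypothetical query and database, made up for illustration.
database = {"larry": "LARRY", "curly": "CURLY", "moe": "MOE"}
print(search("HARRY", database, cutoff=0))   # [(3, 'larry')]
```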

Sometimes, however, we have a collection of k > 2 sequences that we are pretty sure are all related, and to better understand that relationship we want to align all k members of the collection at once. The alignment just above (k = 4) is an example of a multiple alignment, although perhaps not an optimal one. The simplest, and arguably the most reasonable, way to score it is to sum all the pairwise scores that appear in the multiple alignment, giving any gap-gap alignments a score of zero. (I'll leave it to the reader to prove that there are 36 pairwise scores in the example above, with 5 of those being gap-gap zeros.) Anyway, there are dynamic programming algorithms similar to those already discussed which will yield optimal alignments and scores for any collection of sequences, although their running time grows exponentially with k. We needn't go into details here.
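Readers who would rather let a computer do the counting can use the sketch below. It scores a multiple alignment by the sum-of-pairs rule just described, with gap-gap pairs scoring zero, and also counts the pairwise scores. The flat scores (match +1, mismatch -1, gap -2) and the function names are hypothetical stand-ins for a real substitution matrix and gap penalty.

```python
from itertools import combinations

GAP = "-"

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned pair of characters (hypothetical flat scores)."""
    if x == GAP and y == GAP:
        return 0  # gap-gap alignments score zero
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch

def sum_of_pairs(rows, **scoring):
    """Sum the pairwise scores over every column and every pair of rows."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows of an alignment must have the same length")
    return sum(pair_score(r1[c], r2[c], **scoring)
               for c in range(width)
               for r1, r2 in combinations(rows, 2))

def count_pairs(rows):
    """Return (number of pairwise scores, number of gap-gap pairs)."""
    pairs = [(r1[c], r2[c]) for c in range(len(rows[0]))
             for r1, r2 in combinations(rows, 2)]
    return len(pairs), sum(1 for x, y in pairs if x == GAP and y == GAP)

alignment = ["LARRY-", "MO--E-", "CURLY-", "SH--EP"]
print(count_pairs(alignment))   # (36, 5)
print(sum_of_pairs(alignment))
```

Each of the 6 columns contributes one score for each of the 4 * 3 / 2 = 6 pairs of rows, which is where the 36 comes from.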

We will discuss more advanced aspects of these subjects in a fifth section if funding can be found to support the effort. Meanwhile, this concludes the alignment section. The fourth section covers Statistics.