
The derivation of a scoring matrix given on the previous pages was fairly straightforward
and rigorous. Notice that q_{AA} + q_{BB} + q_{CC} = 0.566, which is
the total probability of having no change, i.e., that over one step any given residue will
remain stable. This number is fairly low. Over only one evolutionary step we should
expect a lot more stability than that.
It was arbitrarily decided in a smoke-filled room at a secret meeting of geneticists that
an acceptable one-step stability probability should be 0.99 (99 percent
stable, not mutating), and that this would define what one step meant. Having eradicated
all opposition in a series of daring nighttime raids, they devised a mathematical kludge
that accomplished their goal. It's not a bad kludge, although it's not hard to imagine
special conditions in which it gives nonsense results. Still, we may safely assume that
such conditions are not present in the real world.
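
For readers who like to see the gears turn, here is a minimal sketch of one way such a rescaling can be done. It follows the spirit of Dayhoff's construction and is not necessarily the exact kludge referred to above: every off-diagonal entry of a mutation probability matrix is shrunk by a common factor, and each diagonal entry absorbs the difference, so that the frequency-weighted stability comes out at 0.99. The 3-letter matrix M, the frequencies freqs, and the function names are all hypothetical, and I am assuming the convention that M[i][j] is the probability that residue i turns into residue j in one step.

```python
def stability(M, freqs):
    """Frequency-weighted probability that a residue is unchanged after one step."""
    return sum(f * M[i][i] for i, f in enumerate(freqs))


def rescale(M, freqs, target=0.99):
    """Shrink all off-diagonal entries by one factor so stability equals target."""
    lam = (1.0 - target) / (1.0 - stability(M, freqs))
    n = len(M)
    return [
        [1.0 - lam * (1.0 - M[i][i]) if i == j else lam * M[i][j]
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical residue frequencies and one-step matrix (each row sums to 1).
freqs = [0.5, 0.3, 0.2]
M = [
    [0.60, 0.25, 0.15],
    [0.30, 0.50, 0.20],
    [0.30, 0.20, 0.50],
]

M1 = rescale(M, freqs)
print("stability before:", stability(M, freqs))
print("stability after: ", stability(M1, freqs))
```

Each row of the rescaled matrix still sums to 1, because whatever is taken away from the off-diagonal entries is handed back to the diagonal.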

The 20x20 scoring matrix resulting from this method would be a PAM1 matrix, where
PAM = Percent of Accepted Mutations; 1 = one step. To get the PAM250 scoring matrix we have to go back to
the kludged Mutation Probability Matrix, M, raise it to the 250th power to move 250 steps into the
future, then rederive a scoring matrix from that point (the number 250 was decided upon presumably by the
same cabal in the same smoke-filled room, when they weren't covering up their work on extraterrestrials).
The result is a widely used scoring matrix, but it is not the only choice.
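
The recipe of moving 250 steps ahead and then rederiving scores can be sketched on a toy 3-letter alphabet. Everything numeric here is made up for illustration: the joint probabilities q (chosen symmetric, with a diagonal summing to 0.99), the frequencies freqs, and the scale factor of 10 with base-10 logarithms. I am also assuming that M[i][j] = q[i][j] / freqs[i] and that a score is the log of M^n[i][j] / freqs[j], which may differ in detail from the derivation on the previous pages.

```python
import math


def mat_mul(X, Y):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, steps):
    """M raised to the power `steps` (steps >= 1) by repeated multiplication."""
    result = M
    for _ in range(steps - 1):
        result = mat_mul(result, M)
    return result


def log_odds(Mn, freqs, scale=10.0):
    """Log-odds scores: scale * log10(P(i -> j after n steps) / P(j))."""
    n = len(Mn)
    return [[scale * math.log10(Mn[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical one-step joint probabilities; row sums equal the frequencies.
freqs = [0.5, 0.3, 0.2]
q = [
    [0.497, 0.002, 0.001],
    [0.002, 0.296, 0.002],
    [0.001, 0.002, 0.197],
]
M = [[q[i][j] / freqs[i] for j in range(3)] for i in range(3)]

M250 = mat_pow(M, 250)
S250 = log_odds(M250, freqs)
for row in S250:
    print(["%.3f" % s for s in row])
```

Because the toy q is symmetric, the resulting score table is symmetric too, which is what one wants from a scoring matrix.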

Multiple Sequence Alignments

L  A  R  R  Y  -
M  O  -  -  E  -
C  U  R  L  Y  -
S  H  -  -  E  P

To complete this section let me say a brief word about Multiple Alignments.
Often the task of geneticists is to take a single query sequence, pop it into a database,
find optimal pairwise scores for aligning the single sequence with each member of the database,
then pick out the scores above a certain cutoff and look at the corresponding optimal alignments.
Sometimes, however, we have a collection of k > 2 sequences that we are pretty sure are all related,
and to better understand that relationship we want to align all k members of the collection
at once. There is an example of a multiple alignment, although perhaps not optimal, just above,
with k = 4. The simplest, and arguably the most reasonable, way to score this is to sum all the pairwise
scores that appear in the multiple alignment, giving any gap-gap alignments a score of zero. (I'll
leave it to the reader to prove that there are 36 pairwise scores in the example above, with 5 of
those being gap-gap zeros.) Anyway, there are dynamic programming algorithms, similar to those
already discussed, that will yield optimal alignments and scores for any collection of sequences.
We needn't go into details here.
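
For the impatient reader, the sum-of-pairs bookkeeping can be sketched in a few lines. I am assuming the blank cells in the four-row example are gaps, written here as '-', so that the alignment has six columns. The function pair_score and its values (+1 match, -1 mismatch, -2 residue against gap) are purely hypothetical stand-ins for a real scoring matrix and gap penalty; only the gap-gap score of zero comes from the text.

```python
from itertools import combinations

# The example alignment, one string per row, with '-' marking a gap.
alignment = ["LARRY-", "MO--E-", "CURLY-", "SH--EP"]


def pair_score(a, b):
    """Hypothetical pairwise score; gap-gap pairs score zero."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def sum_of_pairs(rows, score=pair_score):
    """Return (total score, number of pairs, number of gap-gap pairs)."""
    total = 0
    n_pairs = 0
    n_gapgap = 0
    for column in zip(*rows):                  # walk the alignment column by column
        for a, b in combinations(column, 2):   # every pair of rows in this column
            total += score(a, b)
            n_pairs += 1
            if a == "-" and b == "-":
                n_gapgap += 1
    return total, n_pairs, n_gapgap


total, n_pairs, n_gapgap = sum_of_pairs(alignment)
print("pairs:", n_pairs, "gap-gap pairs:", n_gapgap, "score:", total)
```

Each column of k = 4 rows contributes 6 pairs, so six columns give the 36 pairwise scores mentioned above.
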
We will discuss more advanced aspects of these subjects in a fifth section if funding can be found
to support the effort. Meanwhile, this concludes the alignment section. The fourth section covers
Statistics. Head back to the Menu to get to page 1.

