We begin by searching
our database for all pairs of sequences that, once aligned, have 80% or
greater matches. We assume that the majority of the mismatches result from single
substitutions (and likewise that the vast majority of the matches
occur as a result of stability, that is, no change at all). We use all these aligned pairs to build a one-step
mutation probability matrix (recall: we saw an example of this on page 6 of the
probability chapter).
Let's investigate the process by example. Say we have a 3-letter codon alphabet, {A,B,C},
and two 30-residue sequences aligned as follows:

Sequence Q: 
A  B  C  C  C  A  B  A  A  C 
A  C  C  C  B  A  C  B  C  C 
C  A  A  B  B  B  A  A  B  B 
Sequence L: 
C  B  C  C  C  A  B  C  B  C 
B  B  C  A  B  C  B  B  C  C 
A  C  A  B  B  A  A  C  B  C 
Let k_{A} = 16, k_{B} = 20 and k_{C} = 24 be the total numbers of A's, B's and
C's, respectively, among the 60 residues of the two sequences combined. Therefore,
k_{A}+k_{B}+k_{C} = 60.
Let's suppose these sequences are representative of the entire population of genes arising from the
3-letter alphabet. In that case the three fractions,
p_{A} = k_{A}/60 = 16/60, p_{B} = k_{B}/60 = 20/60, p_{C} = k_{C}/60 = 24/60,
are the occurrence probabilities for A, B and C in the total population. Note that
p_{A}+p_{B}+p_{C} = 1. That is, there is a probability of 1 (certainty) that any given
residue position will be filled with an A, B or C.
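
The counting above is easy to check by machine. The following is a minimal sketch, assuming the two sequences are typed in as plain strings; the names SEQ_Q, SEQ_L and occurrence_probabilities are ours, chosen for illustration only. Fractions are used so the results stay exact (note that Python prints them in lowest terms, so 16/60 shows as 4/15).

```python
from collections import Counter
from fractions import Fraction

# The two aligned 30-residue sequences from the example, typed in row by row.
SEQ_Q = "ABCCCABAAC" "ACCCBACBCC" "CAABBBAABB"
SEQ_L = "CBCCCABCBC" "BBCABCBBCC" "ACABBAACBC"


def occurrence_probabilities(*sequences):
    """Return (counts, probabilities) pooled over all the given sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    probs = {letter: Fraction(n, total) for letter, n in counts.items()}
    return counts, probs


counts, probs = occurrence_probabilities(SEQ_Q, SEQ_L)
for letter in sorted(counts):
    # k_X and p_X for each letter X of the alphabet
    print(letter, counts[letter], probs[letter])
print("sum of probabilities:", sum(probs.values()))
```

The counts reproduce k_{A}, k_{B} and k_{C}, and the three fractions add up to exactly 1.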

Construction of the Probability Matrix.
The probability matrix for this case has the following nine elements:
M_{AA}  M_{AB}  M_{AC}
M_{BA}  M_{BB}  M_{BC}
M_{CA}  M_{CB}  M_{CC}
where, for example, M_{BA} = probability that if we start with A, we'll
end up with B after one step (NOT vice versa!).

So, for example, the three probabilities M_{AA}, M_{BA}, M_{CA}
of the first column represent the probabilities of everything that can happen to A:
M_{AA} (A -> A), M_{BA} (A -> B) and M_{CA} (A -> C).
Since one of these must happen, M_{AA} + M_{BA} + M_{CA} = 1.
By a similar argument the sum down each of the three columns must be 1.
We calculate M_{BA}, the probability that A will mutate to B, as follows:
M_{BA} = (# times A aligned with B) / (# times A occurs total) = 3/k_{A} = 3/16.
Note that (# times A aligned with B) = (# times B aligned with A) = 3; these are positions 9, 11 and 26 of the alignment above, counting along the rows.
But k_{B} is not equal to k_{A}, so M_{AB} = 3/20 is not equal to M_{BA} = 3/16.
The matrix is not symmetric.
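
To see the asymmetry concretely, here is a small sketch that counts the A/B mismatches in the alignment and divides by k_{A} and by k_{B} in turn. The helper names aligned_count and total_count are hypothetical, introduced only for this illustration, and the sequences are assumed to be stored as plain strings.

```python
from fractions import Fraction

# The two aligned 30-residue sequences from the example, typed in row by row.
SEQ_Q = "ABCCCABAAC" "ACCCBACBCC" "CAABBBAABB"
SEQ_L = "CBCCCABCBC" "BBCABCBBCC" "ACABBAACBC"


def aligned_count(x, y, seq1, seq2):
    """Count positions where x and y face each other, in either order."""
    return sum(1 for a, b in zip(seq1, seq2) if {a, b} == {x, y})


def total_count(x, seq1, seq2):
    """Count how often x occurs in the two sequences combined (k_x)."""
    return (seq1 + seq2).count(x)


n_ab = aligned_count("A", "B", SEQ_Q, SEQ_L)  # same number for A/B and B/A

# Same numerator, different denominators: k_A for M_BA, k_B for M_AB.
M_BA = Fraction(n_ab, total_count("A", SEQ_Q, SEQ_L))
M_AB = Fraction(n_ab, total_count("B", SEQ_Q, SEQ_L))

print("A aligned with B:", n_ab)
print("M_BA =", M_BA, " M_AB =", M_AB)
```

The numerator is shared, so the two entries differ only because A and B occur with different total frequencies.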
Finally, having determined M_{BA} = 3/16 and M_{CA} = 7/16, we set
M_{AA} = 1 - M_{BA} - M_{CA} = 6/16. (In reality when dealing with genome databases,
diagonal probabilities like M_{AA}, which measure the likelihood a codon will NOT change, will be
much closer to 1, and all off-diagonal probabilities, like M_{BA}, much closer to zero.)
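
The same recipe can be applied column by column. The sketch below is one possible way to do it, not a definitive implementation: each off-diagonal entry is (# times the two letters are aligned) divided by the total count of the starting letter, and each diagonal entry is set to 1 minus the rest of its column, exactly as we did for M_{AA}. The function name one_step_matrix and the dictionary layout M[new, old] are our own assumptions.

```python
from fractions import Fraction
from itertools import permutations

ALPHABET = "ABC"

# The two aligned 30-residue sequences from the example, typed in row by row.
SEQ_Q = "ABCCCABAAC" "ACCCBACBCC" "CAABBBAABB"
SEQ_L = "CBCCCABCBC" "BBCABCBBCC" "ACABBAACBC"


def one_step_matrix(seq1, seq2, alphabet=ALPHABET):
    """Return a dict M with M[new, old] = probability that old becomes new."""
    pooled = seq1 + seq2
    k = {x: pooled.count(x) for x in alphabet}  # k_A, k_B, k_C
    M = {}
    # Off-diagonal entries: mismatch count divided by k of the starting letter.
    for new, old in permutations(alphabet, 2):
        pairs = sum(1 for a, b in zip(seq1, seq2) if {a, b} == {new, old})
        M[new, old] = Fraction(pairs, k[old])
    # Diagonal entries: whatever is left so that each column sums to 1.
    for old in alphabet:
        M[old, old] = 1 - sum(M[new, old] for new in alphabet if new != old)
    return M


M = one_step_matrix(SEQ_Q, SEQ_L)

# Print rows M_{A.}, M_{B.}, M_{C.}; fractions appear in lowest terms.
for new in ALPHABET:
    print("  ".join(f"{str(M[new, old]):>5}" for old in ALPHABET))

column_sums = {old: sum(M[new, old] for new in ALPHABET) for old in ALPHABET}
print("column sums:", {old: str(s) for old, s in column_sums.items()})
```

Because the diagonal is defined as the complement, the column sums equal 1 by construction; the useful check is that the first column reproduces 6/16, 3/16 and 7/16.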
On the next page we'll write out the full one-step mutation probability matrix for this case and
show how to use it to get multi-step matrices.

