Scoring More.
 Ok, so anyway, we rummage in our database and choose all pairs of sequences that, once aligned, have 80% or greater matches. We assume that the majority of the mismatches result from single substitutions (and likewise that the vast majority of the matches occur because the residue never changed). We use all these aligned pairs to build a one-step mutation probability matrix (recall: we saw an example of this on page 6 of the probability chapter). Let's investigate the process by example. Say we have a 3-letter codon alphabet, {A,B,C}, and two 30-residue sequences aligned as follows:
 Sequence Q: A B C C C A B A A C A C C C B A C B C C C A A B B B A A B B
 Sequence L: C B C C C A B C B C B B C A B C B B C C A C A B B A A C B C
 Let kA = 16, kB = 20, kC = 24 be the total number of A's, B's and C's, respectively, in the total of 60 residues. Therefore, kA+kB+kC = 60. Let's suppose these sequences are representative of the entire population of genes arising from the 3-letter alphabet. In that case the three fractions, pA = kA/60 = 16/60, pB = kB/60 = 20/60, pC = kC/60 = 24/60, are the occurrence probabilities for A, B and C in the total population. Note that pA+pB+pC = 1. That is, there is a probability of 1 (certainty) that any given residue will be filled with an A, B or C.
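The counting above is easy to check by machine. Here is a minimal Python sketch; the variable names (seq_q, seq_l, k, p) are our own choices for illustration, and the two strings are simply the aligned sequences Q and L typed in without spaces.

```python
from collections import Counter
from fractions import Fraction

# The two aligned 30-residue sequences from the example (spaces removed).
seq_q = "ABCCCABAACACCCBACBCCCAABBBAABB"
seq_l = "CBCCCABCBCBBCABCBBCCACABBAACBC"

# k[x] = number of times letter x occurs among all 60 residues.
k = Counter(seq_q + seq_l)
total = sum(k.values())

# p[x] = k[x] / total, kept as exact fractions rather than floats.
p = {letter: Fraction(count, total) for letter, count in k.items()}

for letter in "ABC":
    print(letter, k[letter], p[letter])
```

Fraction reduces to lowest terms, so 16/60 is shown as 4/15; the three values still add up to exactly 1.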
Construction of the Probability Matrix.
The probability matrix for this case has the following nine elements:
 MAA MAB MAC
 MBA MBB MBC
 MCA MCB MCC
where, for example, MBA = probability that if we start with A, we'll end up with B after one step (NOT vice versa!).
So, for example, the three probabilities, MAA, MBA, MCA, of the first column represent the probabilities of everything that can happen to A:

MAA - (A-->A); and MBA - (A-->B); and MCA - (A-->C).

Since one of these must happen, MAA + MBA + MCA = 1. By a similar argument the sum down each of the three columns must be 1.

We calculate MBA - the probability that A will mutate to B - as follows:
 MBA = (# times A aligned with B) / (# times A occurs in total) = 3/kA = 3/16.
Note that (# times A aligned with B) = (# times B aligned with A) = 3 (positions 9, 11 and 26 of the alignment). But kB is not equal to kA, so MAB = 3/20 is not equal to MBA = 3/16. The matrix is not symmetric.
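To see the asymmetry concretely, here is a small Python sketch that counts the A/B alignments once and divides by the two different totals. The helper names aligned_count and occurrences are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# The two aligned 30-residue sequences from the example (spaces removed).
seq_q = "ABCCCABAACACCCBACBCCCAABBBAABB"
seq_l = "CBCCCABCBCBBCABCBBCCACABBAACBC"

def aligned_count(x, y):
    """Number of alignment positions where x faces y, in either order."""
    return sum(1 for a, b in zip(seq_q, seq_l) if {a, b} == {x, y})

def occurrences(x):
    """Total number of times x occurs in both sequences together."""
    return (seq_q + seq_l).count(x)

n_ab = aligned_count("A", "B")            # the same count serves both directions
m_ba = Fraction(n_ab, occurrences("A"))   # A --> B: divide by kA
m_ab = Fraction(n_ab, occurrences("B"))   # B --> A: divide by kB

print(n_ab, m_ba, m_ab)  # prints: 3 3/16 3/20
```

The numerator is shared; only the denominator changes, and that alone is what makes the matrix non-symmetric.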

Finally, having determined MBA = 3/16, and MCA = 7/16, we set MAA = 1 - MBA - MCA = 6/16. (In reality when dealing with genome databases, diagonal probabilities like MAA, which measure the likelihood a codon will NOT change, will be much closer to 1, and all off-diagonal probabilities, like MBA, much closer to zero.)
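The same recipe, applied to one starting letter at a time, gives one column of the matrix. The sketch below is only an illustration of the steps just described, not a general-purpose tool; the function name mutation_column is our own.

```python
from fractions import Fraction

# The two aligned 30-residue sequences from the example (spaces removed).
seq_q = "ABCCCABAACACCCBACBCCCAABBBAABB"
seq_l = "CBCCCABCBCBBCABCBBCCACABBAACBC"
letters = "ABC"

def mutation_column(start):
    """Return {end: probability that start becomes end in one step}."""
    k_start = (seq_q + seq_l).count(start)
    column = {}
    for end in letters:
        if end != start:
            # Count alignment positions pairing start with end, in either order.
            pairs = sum(1 for a, b in zip(seq_q, seq_l) if {a, b} == {start, end})
            column[end] = Fraction(pairs, k_start)
    # Diagonal entry: the probability left over, so the column sums to 1.
    column[start] = 1 - sum(column.values())
    return column

col_a = mutation_column("A")
for end in letters:
    print("A -->", end, col_a[end])
```

Because Fraction reduces to lowest terms, MAA = 6/16 is displayed as 3/8.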

On the next page we'll write out the full one-step mutation probability matrix for this case and show how to use it to get multistep matrices.