So, we rummage
in our database and choose all pairs of sequences that, once aligned, match at 80% or
more of their positions. We assume that the majority of the mismatches result from single
substitutions (and likewise that the vast majority of the matches
are residues that simply have not changed). We use all these aligned pairs to build a one-step
mutation probability matrix (recall: we saw an example of this on page 6 of the
probability chapter).
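As a rough sketch of the selection step, the Python below computes the fraction of matching positions for a pair that is already aligned and keeps the pairs at or above 80%. The function names and the toy sequences are hypothetical, and the sketch assumes equal-length strings with no gap characters.

```python
def fraction_matching(seq1, seq2):
    """Fraction of aligned positions at which the two residues are identical."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return matches / len(seq1)


def select_close_pairs(aligned_pairs, threshold=0.80):
    """Keep only the pairs whose fraction of matches reaches the threshold."""
    return [(s1, s2) for s1, s2 in aligned_pairs
            if fraction_matching(s1, s2) >= threshold]


# Hypothetical toy data: the first pair matches at 9 of 10 positions,
# the second at 6 of 10, so only the first pair survives the 80% cut.
pairs = [
    ("ABCABCABCA", "ABCABCABCB"),
    ("ABCABCABCA", "CCCABCBBBA"),
]
close_pairs = select_close_pairs(pairs)
print(close_pairs)
```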
Let's investigate the process by example. Say we have a 3-letter codon alphabet, {A,B,C},
and two 30-residue sequences aligned as follows:
Sequence Q:
A | B | C | C | C | A | B | A | A | C |
A | C | C | C | B | A | C | B | C | C |
C | A | A | B | B | B | A | A | B | B |
Sequence L:
C | B | C | C | C | A | B | C | B | C |
B | B | C | A | B | C | B | B | C | C |
A | C | A | B | B | A | A | C | B | C |
Let kA = 16, kB = 20 and kC = 24 be the total numbers of A's, B's and
C's, respectively, among the 60 residues of the two sequences combined. Therefore,
kA+kB+kC = 60.
Let's suppose these sequences are representative of the entire population of genes arising from the
3-letter alphabet. In that case the three fractions,
pA = kA/60 = 16/60, pB = kB/60 = 20/60, pC = kC/60 = 24/60,
are the occurrence probabilities for A, B and C in the total population. Note that
pA+pB+pC = 1. That is, there is a probability of 1 (certainty) that any given
residue position will be filled with an A, B or C.
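The counts and occurrence probabilities can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and the two strings are the sequences Q and L from the alignment above, typed in row by row.

```python
from collections import Counter
from fractions import Fraction

# Sequences Q and L from the alignment, one row of ten residues per piece.
Q = "ABCCCABAAC" + "ACCCBACBCC" + "CAABBBAABB"
L = "CBCCCABCBC" + "BBCABCBBCC" + "ACABBAACBC"

# k[x] = total number of times residue x occurs in the two sequences combined.
k = Counter(Q + L)
total = sum(k.values())

# p[x] = occurrence probability of residue x, kept as an exact fraction.
p = {x: Fraction(k[x], total) for x in "ABC"}

for x in "ABC":
    print(x, k[x], p[x])
print("sum of probabilities:", sum(p.values()))
```

Note that Fraction reduces to lowest terms, so 16/60 is displayed as 4/15.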
Construction of the Probability Matrix.
The probability matrix for this case has the following nine elements:
MAA | MAB | MAC |
MBA | MBB | MBC |
MCA | MCB | MCC |
where, for example, MBA = probability that if we start with A, we'll
end up with B after one step (NOT vice versa!).
So, for example, the three probabilities, MAA, MBA, MCA,
of the first column represent the probabilities of everything that can happen to A:
MAA - (A-->A); and MBA - (A-->B); and MCA - (A-->C).
Since one of these must happen, MAA + MBA + MCA = 1.
By a similar argument the sum down each of the three columns must be 1.
We calculate MBA - the probability that A will mutate to B - as follows:
MBA = (# times A aligned with B) / (# times A occurs in total) = 3/kA = 3/16.
Note that (# times A aligned with B) = (# times B aligned with A) = 3 (positions 9, 11 and 26 of the alignment, indicated above in red).
But kB is not equal to kA, so MAB = 3/20 is not equal to MBA = 3/16.
The matrix is not symmetric.
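The asymmetry is easy to confirm numerically. The sketch below counts the columns in which A and B face each other (in either order) and divides by kA and kB in turn. The helper name times_aligned is hypothetical, not something defined elsewhere in these notes.

```python
from collections import Counter
from fractions import Fraction

Q = "ABCCCABAAC" + "ACCCBACBCC" + "CAABBBAABB"
L = "CBCCCABCBC" + "BBCABCBBCC" + "ACABBAACBC"

k = Counter(Q + L)  # total occurrences of each residue in both sequences


def times_aligned(x, y):
    """Number of alignment columns pairing x with y, in either order (x != y)."""
    return sum({a, b} == {x, y} for a, b in zip(Q, L))


n_AB = times_aligned("A", "B")   # the same count serves A->B and B->A

M_BA = Fraction(n_AB, k["A"])    # probability that A becomes B
M_AB = Fraction(n_AB, k["B"])    # probability that B becomes A

print("A-B columns:", n_AB)
print("MBA =", M_BA, " MAB =", M_AB)
```

The numerator is shared; only the denominators kA and kB differ, which is exactly why MBA and MAB disagree.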
Finally, having determined MBA = 3/16, and MCA = 7/16, we set
MAA = 1 - MBA - MCA = 6/16. (In reality when dealing with genome databases,
diagonal probabilities like MAA, which measure the likelihood a codon will NOT change, will be
much closer to 1, and all off-diagonal probabilities, like MBA, much closer to zero.)
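Here is a small sketch that completes the first column in the same way. As a consistency check that is not part of the procedure described above, it also counts MAA directly: each column pairing A with A contains two unchanged A's, so the direct count is 2 x (# A-A columns) / kA. The helper and variable names are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction

Q = "ABCCCABAAC" + "ACCCBACBCC" + "CAABBBAABB"
L = "CBCCCABCBC" + "BBCABCBBCC" + "ACABBAACBC"

k = Counter(Q + L)


def times_aligned(x, y):
    """Number of alignment columns pairing x with y, in either order (x != y)."""
    return sum({a, b} == {x, y} for a, b in zip(Q, L))


# Off-diagonal entries of the first column: what A can change into.
M_BA = Fraction(times_aligned("A", "B"), k["A"])
M_CA = Fraction(times_aligned("A", "C"), k["A"])

# Diagonal entry fixed by the requirement that the column sums to 1.
M_AA = 1 - M_BA - M_CA

# Cross-check (an added sanity test): count the unchanged A's directly.
aa_columns = sum(a == b == "A" for a, b in zip(Q, L))
M_AA_direct = Fraction(2 * aa_columns, k["A"])

print("MBA =", M_BA, " MCA =", M_CA, " MAA =", M_AA)
print("direct count agrees:", M_AA == M_AA_direct)
```

Fraction prints 6/16 in lowest terms as 3/8.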
On the next page we'll write out the full one-step mutation probability matrix for this case and
show how to use it to get multistep matrices.