Sequence Q: |
A | B | C | C | C | A | B | A | A | C |
A | C | C | C | B | A | C | B | C | C |
C | A | A | B | B | B | A | A | B | B |
Sequence L: |
C | B | C | C | C | A | B | C | B | C |
B | B | C | A | B | C | B | B | C | C |
A | C | A | B | B | A | A | C | B | C |
Log-odds.
Here are the probabilities of encountering, purely by chance, each of the nine possible aligned
pairs of our 3-letter alphabet.
PAA=pApA=0.07111 | PAB=pApB=0.08889 | PAC=pApC=0.10667 |
PBA=pBpA=0.08889 | PBB=pBpB=0.11111 | PBC=pBpC=0.13333 |
PCA=pCpA=0.10667 | PCB=pCpB=0.13333 | PCC=pCpC=0.16 |
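The table above can be reproduced with a short Python sketch. It assumes (this page does not say so explicitly) that the letter frequencies pA, pB and pC are the pooled letter counts of Q and L; that assumption does give the tabulated values. The variable names are illustrative only.

```python
from collections import Counter

# Sequences Q and L from the top of the page, written as strings.
Q = "ABCCCABAAC" "ACCCBACBCC" "CAABBBAABB"
L = "CBCCCABCBC" "BBCABCBBCC" "ACABBAACBC"

# Assumption: background frequencies are pooled over both sequences.
counts = Counter(Q + L)
total = sum(counts.values())
p = {letter: counts[letter] / total for letter in "ABC"}

# Chance probability of each ordered pair: Pij = pi * pj.
P = {(i, j): p[i] * p[j] for i in "ABC" for j in "ABC"}

for i in "ABC":
    print(" | ".join(f"P{i}{j}={P[i, j]:.5f}" for j in "ABC"))
```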
And here we have the probabilities derived from the alignment of Q and L up top, an
alignment assumed to represent Nature's preferences.
qAA=MAApA=0.1 | qAB=MABpB=0.05 | qAC=MACpC=0.11667 |
qBA=MBApA=0.05 | qBB=MBBpB=0.23333 | qBC=MBCpC=0.05 |
qCA=MCApA=0.11667 | qCB=MCBpB=0.05 | qCC=MCCpC=0.23333 |
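As a rough check, the q values can be recovered by counting the aligned columns of Q and L directly. The sketch below assumes that the order within a pair is ignored and that each off-diagonal count is split evenly between the (i, j) and (j, i) cells. That is an inference from the symmetry of the table, not something stated here, but it does reproduce the numbers.

```python
from collections import Counter

Q = "ABCCCABAAC" "ACCCBACBCC" "CAABBBAABB"
L = "CBCCCABCBC" "BBCABCBBCC" "ACABBAACBC"

# Count aligned pairs column by column, ignoring which sequence each letter came from.
pair_counts = Counter(tuple(sorted(pair)) for pair in zip(Q, L))
n = len(Q)

q = {}
for i in "ABC":
    for j in "ABC":
        c = pair_counts[tuple(sorted((i, j)))]
        # Assumption: an off-diagonal pair is shared equally by cells (i, j) and (j, i).
        q[i, j] = c / n if i == j else c / (2 * n)

for i in "ABC":
    print(" | ".join(f"q{i}{j}={q[i, j]:.5f}" for j in "ABC"))
```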
The first step in computing the log-odds matrix is to divide each element of the q-array by the
corresponding element of the P-array. Here's the result:
qAA/PAA=MAA/pA=1.40625 | qAB/PAB=MAB/pA=0.5625 | qAC/PAC=MAC/pA=1.09375 |
qBA/PBA=MBA/pB=0.5625 | qBB/PBB=MBB/pB=2.1 | qBC/PBC=MBC/pB=0.375 |
qCA/PCA=MCA/pC=1.09375 | qCB/PCB=MCB/pC=0.375 | qCC/PCC=MCC/pC=1.45833 |
Ok, so look at these numbers. If the ratio is greater than 1, then q is greater than P, which means
the particular alignment occurs at a greater than random frequency, which means that it is a
biologically favorable substitution. This occurs with all the diagonal components (AA, BB, CC), and
the AC or CA off-diagonal substitution. We want these substitutions to get positive scores, and all
the rest negative scores. The function that does this is the log function (in particular the natural
log function "ln"). We won't go into details, but recall that if X > 1, then ln(X) > 0, and if 0 < X < 1,
then ln(X) < 0. Anyway, here's what we get if we take the natural logarithm of each of the nine
components above:
ln(qAA/PAA)= +0.34093 | ln(qAB/PAB)= -0.57536 | ln(qAC/PAC)= +0.08961 |
ln(qBA/PBA)= -0.57536 | ln(qBB/PBB)= +0.74194 | ln(qBC/PBC)= -0.98083 |
ln(qCA/PCA)= +0.08961 | ln(qCB/PCB)= -0.98083 | ln(qCC/PCC)= +0.37729 |
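The ratio and logarithm steps can be sketched together in a few lines of Python. The fractions used for p and q below are the ones that round to the decimals tabulated earlier; writing them as fractions is a choice made here to avoid rounding error, not something taken from the text.

```python
import math

letters = "ABC"

# Assumption: exact fractions behind the tabulated decimals for p and q.
p = {"A": 16 / 60, "B": 20 / 60, "C": 24 / 60}
q = {
    ("A", "A"): 3 / 30, ("A", "B"): 3 / 60, ("A", "C"): 7 / 60,
    ("B", "A"): 3 / 60, ("B", "B"): 7 / 30, ("B", "C"): 3 / 60,
    ("C", "A"): 7 / 60, ("C", "B"): 3 / 60, ("C", "C"): 7 / 30,
}

# Log-odds: natural log of the observed pair frequency over the chance frequency.
log_odds = {
    (i, j): math.log(q[i, j] / (p[i] * p[j]))
    for i in letters
    for j in letters
}

for i in letters:
    print(" | ".join(f"ln(q{i}{j}/P{i}{j})={log_odds[i, j]:+.5f}" for j in letters))
```

Pairs seen more often than chance predicts come out positive, and the rest come out negative, which is the sign behavior described above.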
Well, this would be a fine scoring matrix, but we really have no reason to believe that we need this
much precision, and as biologists it would be nicer to deal with integers. One way to make a more
comfortable matrix of alignment scores is to take these numbers, multiply by 10, and round to the
nearest integer. In this manner we get our final result, the alignment scoring matrix below:
AA= +3 | AB= -6 | AC= +1 |
BA= -6 | BB= +7 | BC= -10 |
CA= +1 | CB= -10 | CC= +4 |
Given this scoring matrix, the Q and L alignment above scores 45 (the 3 AA, 7 BB, 7 CC, 7 AC,
3 AB and 3 BC pairs contribute 9 + 49 + 28 + 7 - 18 - 30), which is quite high. The question is,
is it high enough to convince the researcher that it's biologically significant? Well, since we've
assumed from the start that their alignment is significant, we can say yes in this case. We'll
attempt to answer that kind of question in general in the Statistics section. On the next page of
this section we'll finish our discussion of scoring, and begin discussing multiple alignments.
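For readers who want to check the rounding and scoring steps, here is a minimal Python sketch. It starts from the natural-log values tabulated above, multiplies by 10, rounds to the nearest integer, and sums the resulting scores over the aligned columns of Q and L. The helper name pair_score and the two-letter dictionary keys are illustrative choices, not part of the original page.

```python
# Natural-log odds as tabulated above (symmetric, so each unordered pair is listed once).
log_odds = {
    "AA": 0.34093, "AB": -0.57536, "AC": 0.08961,
    "BB": 0.74194, "BC": -0.98083, "CC": 0.37729,
}

# Multiply by 10 and round to the nearest integer.
scores = {pair: round(10 * value) for pair, value in log_odds.items()}

Q = "ABCCCABAAC" "ACCCBACBCC" "CAABBBAABB"
L = "CBCCCABCBC" "BBCABCBBCC" "ACABBAACBC"


def pair_score(a, b):
    """Look up the score of an aligned pair regardless of letter order."""
    return scores["".join(sorted((a, b)))]


# The alignment score is the sum of the pair scores over all aligned columns.
total = sum(pair_score(a, b) for a, b in zip(Q, L))

print(scores)
print(total)
```

This alignment has no gaps, so no gap penalty enters the sum. An alignment with gaps would need an extra rule that this sketch does not cover.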