Log-odds Scores.
Sequence Q: ABCCCABAAC ACCCBACBCC CAABBBAABB
Sequence L: CBCCCABCBC BBCABCBBCC ACABBAACBC
Log-odds.
Here we have the probabilities of encountering each of the nine possible aligned pairs of our 3-letter alphabet purely by chance.
P_AA = p_A p_A = 0.07111    P_AB = p_A p_B = 0.08889    P_AC = p_A p_C = 0.10667
P_BA = p_B p_A = 0.08889    P_BB = p_B p_B = 0.11111    P_BC = p_B p_C = 0.13333
P_CA = p_C p_A = 0.10667    P_CB = p_C p_B = 0.13333    P_CC = p_C p_C = 0.16
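As a quick check, here is a minimal Python sketch that reproduces these chance probabilities. It assumes the background frequencies p_A, p_B and p_C are simply the letter frequencies pooled over both Q and L; that assumption is not stated on this page, but it does give the numbers in the table. The variable names are only for illustration.

```python
from collections import Counter
from itertools import product

# The two sequences from the top of the page, with the spacing removed.
Q = "ABCCCABAAC ACCCBACBCC CAABBBAABB".replace(" ", "")
L = "CBCCCABCBC BBCABCBBCC ACABBAACBC".replace(" ", "")

# Assumption: background frequencies are pooled over both sequences.
counts = Counter(Q + L)
total = sum(counts.values())
p = {letter: counts[letter] / total for letter in "ABC"}

# Chance probability of each ordered pair: P_ij = p_i * p_j.
P = {(i, j): p[i] * p[j] for i, j in product("ABC", repeat=2)}

for i in "ABC":
    print("    ".join(f"P_{i}{j} = {P[(i, j)]:.5f}" for j in "ABC"))
```

The nine values sum to 1, as they should for a complete set of pair probabilities.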
And here we have the probabilities derived from the alignment of Q and L up top, an alignment assumed to represent Nature's preferences.
q_AA = M_AA p_A = 0.1        q_AB = M_AB p_B = 0.05       q_AC = M_AC p_C = 0.11667
q_BA = M_BA p_A = 0.05       q_BB = M_BB p_B = 0.23333    q_BC = M_BC p_C = 0.05
q_CA = M_CA p_A = 0.11667    q_CB = M_CB p_B = 0.05       q_CC = M_CC p_C = 0.23333
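The observed probabilities can be recovered by counting the aligned pairs column by column. The sketch below assumes that mismatched pairs are symmetrized, meaning an A-over-C column and a C-over-A column are pooled and split evenly between q_AC and q_CA; this is an inference from the symmetric values in the table, not something the page states. The helper name `q` is just for illustration.

```python
from collections import Counter

# The two aligned sequences from the top of the page, with the spacing removed.
Q = "ABCCCABAAC ACCCBACBCC CAABBBAABB".replace(" ", "")
L = "CBCCCABCBC BBCABCBBCC ACABBAACBC".replace(" ", "")

# Count each aligned column as an ordered (Q letter, L letter) pair.
pair_counts = Counter(zip(Q, L))
n = len(Q)  # number of aligned columns (30)

def q(i, j):
    """Observed frequency of the pair (i, j), symmetrized (assumed)."""
    return (pair_counts[(i, j)] + pair_counts[(j, i)]) / (2 * n)

for i in "ABC":
    print("    ".join(f"q_{i}{j} = {q(i, j):.5f}" for j in "ABC"))
```

For identical pairs the formula reduces to count / n, so the three AA columns give q_AA = 3/30 = 0.1.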
The first step to computing the log-odds matrix is to take the ratios of all the corresponding elements in both the q-array and the P-array. Here's the result:
q_AA/P_AA = M_AA/p_A = 1.40625    q_AB/P_AB = M_AB/p_A = 0.5625    q_AC/P_AC = M_AC/p_A = 1.09375
q_BA/P_BA = M_BA/p_B = 0.5625     q_BB/P_BB = M_BB/p_B = 2.1       q_BC/P_BC = M_BC/p_B = 0.375
q_CA/P_CA = M_CA/p_C = 1.09375    q_CB/P_CB = M_CB/p_C = 0.375     q_CC/P_CC = M_CC/p_C = 1.45833
OK, so look at these numbers. If the ratio is greater than 1, then q is greater than P, which means the particular alignment occurs at a greater-than-random frequency, which means that it is a biologically favorable substitution. This occurs with all the diagonal components (AA, BB, CC) and with the AC or CA off-diagonal substitution. We want these substitutions to get positive scores, and all the rest negative scores. The function that does this is the log function (in particular the natural log function "ln"). We won't go into details, but recall that if X > 1, then ln(X) > 0, and if 0 < X < 1, then ln(X) < 0. Anyway, here's what we get if we take the natural logarithm of each of the nine components above:
ln(q_AA/P_AA) = +0.34093    ln(q_AB/P_AB) = -0.57536    ln(q_AC/P_AC) = +0.08961
ln(q_BA/P_BA) = -0.57536    ln(q_BB/P_BB) = +0.74194    ln(q_BC/P_BC) = -0.98083
ln(q_CA/P_CA) = +0.08961    ln(q_CB/P_CB) = -0.98083    ln(q_CC/P_CC) = +0.37729
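Here is a small sketch of the ratio-then-log step. To avoid rounding noise it starts from exact counts rather than the five-decimal values: the letter counts (16 A, 20 B, 24 C out of 60) and the pair counts (3 AA, 7 BB, 7 CC, and 7, 3, 3 for the mixed A/C, A/B, B/C columns out of 30) were tallied from Q and L, with mixed pairs split evenly between the two orderings. The function name `log_odds` is hypothetical.

```python
import math

# Background letter frequencies, tallied from Q and L pooled together.
p = {"A": 16 / 60, "B": 20 / 60, "C": 24 / 60}

# Observed pair frequencies; a mixed pair such as A/C is stored once
# and already split evenly between q_AC and q_CA.
q = {"AA": 3 / 30, "BB": 7 / 30, "CC": 7 / 30,
     "AB": 3 / 60, "AC": 7 / 60, "BC": 3 / 60}

def log_odds(i, j):
    """Natural log of the observed-to-chance ratio for the pair (i, j)."""
    key = "".join(sorted(i + j))      # "CA" and "AC" share one entry
    return math.log(q[key] / (p[i] * p[j]))

for i in "ABC":
    print("    ".join(f"ln(q_{i}{j}/P_{i}{j}) = {log_odds(i, j):+.5f}" for j in "ABC"))
```

Ratios above 1 come out positive and ratios below 1 come out negative, which is exactly the sign behaviour we wanted from the scores.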
Well, this would be a fine scoring matrix, but we have no reason to believe that we need this much precision, and as biologists it would be nicer to deal with integers. One way to make a more comfortable matrix of alignment scores is to take these numbers, multiply by 10, and round to the nearest integer. In this manner we get our final result, the alignment scoring matrix below:
       A     B     C
A      3    -6     1
B     -6     7   -10
C      1   -10     4
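The scale-and-round step is short enough to sketch directly. The dictionary below just holds the six distinct natural-log values from the table of logarithms (the matrix is symmetric, so BA, CA and CB repeat AB, AC and BC); the names are illustrative only.

```python
# Natural-log odds for the six distinct pairs, copied from the ln table.
log_odds = {
    "AA": 0.34093, "AB": -0.57536, "AC": 0.08961,
    "BB": 0.74194, "BC": -0.98083, "CC": 0.37729,
}

# Multiply by 10 and round to the nearest integer.
scores = {pair: round(10 * value) for pair, value in log_odds.items()}

print(scores)
# {'AA': 3, 'AB': -6, 'AC': 1, 'BB': 7, 'BC': -10, 'CC': 4}
```

The factor of 10 is just a convenient choice of scale; a different multiplier would give a coarser or finer integer matrix with the same pattern of signs.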
Given this scoring matrix, the Q and L alignment above scores 45: the 3 AA, 7 BB and 7 CC columns contribute 9 + 49 + 28 = 86, the 7 A/C columns add 7, and the 3 A/B and 3 B/C columns subtract 18 and 30. That is quite high. The question is, is it high enough to convince the researcher that it's biologically significant? Well, since we've assumed from the start that the alignment is significant, we can say yes in this case. We'll attempt to answer that kind of question in general in the Statistics section. On the next page of this section we'll finish our discussion of scoring, and begin discussing multiple alignments.
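If you'd like to check the total yourself, the sketch below scores the alignment column by column with the integer matrix. It assumes an ungapped alignment in which each column is scored independently and the column scores are simply added; the function name `alignment_score` is hypothetical.

```python
# Integer scoring matrix, stored once per unordered pair (it is symmetric).
SCORES = {
    ("A", "A"): 3, ("A", "B"): -6, ("A", "C"): 1,
    ("B", "B"): 7, ("B", "C"): -10, ("C", "C"): 4,
}

def alignment_score(a, b):
    """Sum the matrix score of every aligned column of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (no gaps handled)")
    return sum(SCORES[tuple(sorted(pair))] for pair in zip(a, b))

Q = "ABCCCABAAC ACCCBACBCC CAABBBAABB".replace(" ", "")
L = "CBCCCABCBC BBCABCBBCC ACABBAACBC".replace(" ", "")

print(alignment_score(Q, L))  # 45
```

Sorting each pair before the lookup means a C-over-A column finds the same entry as an A-over-C column, so only six scores need to be stored.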