Alignment Scores.
 Sequence Q: A B C C C A B A A C A C C C B A C B C C C A A B B B A A B B
 Sequence L: C B C C C A B C B C B B C A B C B B C C A C A B B A A C B C
Random Alignment Probabilities.
The two sequences above are assumed to be biological, and to be subject to biological constraints. One place we see this is in the codon frequencies taken over all 60 positions of the two sequences: pA=16/60, pB=20/60 and pC=24/60. In a perfectly random world we might expect the A, B and C codons to occur with equal probability, 20/60 = 1/3 each. But in this imaginary world with its imaginary genes, the C codon is more useful biologically than the A, so the A's occur less frequently than the C's.
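As a quick check, the frequencies can be recounted directly from the two sequences. The sketch below is only illustrative: it assumes the first 30 letters listed above belong to Q and the last 30 to L, and the variable names are not part of the original material.

```python
from collections import Counter
from fractions import Fraction

# Assumption: the first 30 letters listed are sequence Q, the last 30 are sequence L.
Q = "ABCCC ABAAC ACCCB ACBCC CAABB BAABB".replace(" ", "")
L = "CBCCC ABCBC BBCAB CBBCC ACABB AACBC".replace(" ", "")

# Count every codon across both sequences.
counts = Counter(Q + L)
total = sum(counts.values())

# Exact codon frequencies (Fraction reduces 16/60 to 4/15, and so on).
p = {codon: Fraction(counts[codon], total) for codon in "ABC"}

for codon in "ABC":
    print(f"p{codon} = {counts[codon]}/{total} = {float(p[codon]):.5f}")
```

Using `Fraction` keeps the arithmetic exact, so the counts can be compared against 16/60, 20/60 and 24/60 without rounding.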

There is yet another level at which biology reduces randomness, and that is in having a preference for some mutations over others. To see this we first need a random baseline. Let's suppose we have a large pool of A, B and C codons with which we want to build a 30-residue gene, and each time we draw a codon to fill a residue we have a chance of 16/60 that the codon is an A, 20/60 that it's a B, and 24/60 that it's a C. Then on average such genes will have the same fraction of A's, B's and C's as the genes Q and L above.

Now let's make another random 30-residue gene with the same codon probabilities, and align it with the first. Take a look at position 1. What is the probability that it's an A aligned with an A? This probability, denoted PAA, is just the probability that the first sequence has an A in that position times the probability that the second sequence has an A in that position, because the sequences are independent. The same argument can be made for all the pairwise random alignment probabilities. This gives us a matrix of random pairwise alignment probabilities:
 PAA=pApA=0.07111   PAB=pApB=0.08889   PAC=pApC=0.10667
 PBA=pBpA=0.08889   PBB=pBpB=0.11111   PBC=pBpC=0.13333
 PCA=pCpA=0.10667   PCB=pCpB=0.13333   PCC=pCpC=0.16

Note that this P-matrix is symmetric.
Since these nine alignments are everything that can happen, the sum of these nine probabilities must be 1 (NOTE: this is not a probability matrix of the kind developed on the previous pages; in that case the sum of the probabilities down each column was 1).
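The random alignment probabilities are simply products of the codon frequencies, so they are easy to recompute. The following is a minimal sketch, with the dictionary layout and names chosen here purely for illustration; it also checks the two properties stated above (symmetry and a total of 1).

```python
from fractions import Fraction

# Codon frequencies from the text.
p = {"A": Fraction(16, 60), "B": Fraction(20, 60), "C": Fraction(24, 60)}

# P[x][y] = p[x] * p[y]: probability that x is aligned with y in two
# independent random sequences.
P = {x: {y: p[x] * p[y] for y in "ABC"} for x in "ABC"}

for x in "ABC":
    print("   ".join(f"P{x}{y}={float(P[x][y]):.5f}" for y in "ABC"))

# The nine entries cover every possible aligned pair, so they sum to 1.
total = sum(P[x][y] for x in "ABC" for y in "ABC")
symmetric = all(P[x][y] == P[y][x] for x in "ABC" for y in "ABC")
print("sum =", total, " symmetric =", symmetric)
```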

Nonrandom Alignment Probabilities.
This is the mutation probability matrix from the previous page:
  6/16    3/20    7/24
  3/16   14/20    3/24
  7/16    3/20   14/24
From this we will form a new matrix the components of which are defined below:
 qAA=MAApA=(6/60)=0.1       qAB=MABpB=(3/60)=0.05       qAC=MACpC=(7/60)=0.11667
 qBA=MBApA=(3/60)=0.05      qBB=MBBpB=(14/60)=0.23333   qBC=MBCpC=(3/60)=0.05
 qCA=MCApA=(7/60)=0.11667   qCB=MCBpB=(3/60)=0.05       qCC=MCCpC=(14/60)=0.23333
This matrix, like the P-matrix, is symmetric, and also like the P-matrix the sum of all nine components is 1. Let's take a look at one of the components and make sense of it. For example,

qBA=MBApA
= [(# times A aligned with B)/(# times A occurs)] x [(# times A occurs)/(# of total codons)]
= (# times A aligned with B)/(# of total codons) = (# times B aligned with A)/(# of total codons)
= qAB.
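The cancellation in this derivation can be checked numerically. The sketch below counts aligned pairs directly and compares the result with the product of the mutation matrix and the codon frequencies. It rests on two assumptions that are not spelled out on this page: that the first 30 letters listed at the top are Q and the last 30 are L, and that each aligned position is counted in both directions (A over B and B over A), which is what makes the counts symmetric. All names are illustrative.

```python
from collections import Counter
from fractions import Fraction as F

# Assumption: first 30 letters listed are Q, last 30 are L.
Q = "ABCCC ABAAC ACCCB ACBCC CAABB BAABB".replace(" ", "")
L = "CBCCC ABCBC BBCAB CBBCC ACABB AACBC".replace(" ", "")
codons = "ABC"

# Count each aligned position in both directions so the counts are symmetric.
pair_counts = Counter()
for a, b in zip(Q, L):
    pair_counts[(a, b)] += 1
    pair_counts[(b, a)] += 1

total = len(Q) + len(L)  # total number of codons
q_counted = {x: {y: F(pair_counts[(x, y)], total) for y in codons} for x in codons}

# Mutation probability matrix as given in the text (each column sums to 1).
M = {
    "A": {"A": F(6, 16), "B": F(3, 20), "C": F(7, 24)},
    "B": {"A": F(3, 16), "B": F(14, 20), "C": F(3, 24)},
    "C": {"A": F(7, 16), "B": F(3, 20), "C": F(14, 24)},
}
p = {"A": F(16, 60), "B": F(20, 60), "C": F(24, 60)}

# q[x][y] = M[x][y] * p[y]; the "# times y occurs" factor cancels.
q_from_M = {x: {y: M[x][y] * p[y] for y in codons} for x in codons}

print("counted q equals M*p:", q_counted == q_from_M)
print("qBA =", q_counted["B"]["A"], " qAB =", q_counted["A"]["B"])
```

Under those assumptions the two routes give the same matrix, which is the point of the derivation: q depends only on how often each pair is aligned, not on which member of the pair is treated as the original codon.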

This would seem to be exactly the same as PBA, that is, the probability of finding an A aligned with a B. The difference is that PBA has no connection to the idea that some alignments are biologically preferred; it's a random alignment probability. qBA, on the other hand, is defined in terms of MBA, the value of which is determined by studying a biological genome database (which in this case has two genes).

 The Point. Suppose qBA > PBA. That means an A arising by mutation from a B occurs more frequently than we would expect if mutations were purely random. That is, Nature likes this idea, and if Nature likes it, then we had better score it positively, despite the fact that it's a mismatch. Likewise, if qBA < PBA, then Nature disapproves. To keep her mollified we choose to score this mismatch negatively. The way we achieve reasonable scores is to produce from P and q a log-odds matrix, which we'll do on the next page.
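To see which pairings Nature "likes" in this toy example, q and P can be compared entry by entry. This is a small sketch using the values from the two matrices above; the labels "favored" and "disfavored" are informal names introduced here, not terms from the text.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Codon frequencies and the q-matrix entries (in sixtieths) from the text.
p = {"A": Fraction(16, 60), "B": Fraction(20, 60), "C": Fraction(24, 60)}
q_sixtieths = {
    ("A", "A"): 6, ("A", "B"): 3, ("A", "C"): 7,
    ("B", "B"): 14, ("B", "C"): 3, ("C", "C"): 14,
}

verdicts = {}
# Both matrices are symmetric, so the six unordered pairs are enough.
for x, y in combinations_with_replacement("ABC", 2):
    q_xy = Fraction(q_sixtieths[(x, y)], 60)
    P_xy = p[x] * p[y]
    if q_xy > P_xy:
        verdicts[(x, y)] = "favored"      # more common than random: score positively
    elif q_xy < P_xy:
        verdicts[(x, y)] = "disfavored"   # less common than random: score negatively
    else:
        verdicts[(x, y)] = "neutral"
    print(f"q{x}{y}={float(q_xy):.5f}  P{x}{y}={float(P_xy):.5f}  {verdicts[(x, y)]}")
```

With these numbers the three matches come out above their random expectation, as does the A-C mismatch, while the A-B and B-C mismatches come out below it.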