Let's apply these ideas to a simple sequence alignment. Suppose in some perfect world every gene consists of exactly 32 amino acid codons. In this world there are

$20^{32} = 4.29 \times 10^{41}$

possible genes, a huge number, although not even remotely as large as the number of possible genes in our real but imperfect world. In the 32-residue-per-gene world it is easy to test the similarity of one gene to another: simply line them up and compare the 32 pairs of residues (for the time being we'll ignore the idea that some mismatches are better than others). There are 33 possible outcomes to such a comparison:
  • 0 matches;
  • 1 match;
  • 2 matches;
  • ...
  • 32 matches.
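
The comparison itself is easy to sketch in Python. The function name `count_matches` and the two example sequences below are hypothetical, made up only for illustration; the sequences use one-letter amino acid codes and are not real genes.

```python
def count_matches(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences have the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # zip pairs up residue 1 with residue 1, residue 2 with residue 2, and so on.
    return sum(a == b for a, b in zip(seq_a, seq_b))


# Two invented 32-residue sequences (illustrative only).
gene_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEV"
gene_b = "MKTAYLAKQRNISFVKAHFSRQLDERLGLIEV"

matches = count_matches(gene_a, gene_b)
print(f"{matches} matches out of {len(gene_a)}")
```

The result is always a whole number from 0 to 32, which is why there are exactly 33 outcomes.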

Each outcome has its own probability of occurring at random. Denote these:
$p_0, p_1, p_2, \ldots, p_{32}$.
Since these 33 outcomes are everything that can happen, we know the sum of their probabilities must be 1:

$p_0 + p_1 + p_2 + \cdots + p_{32} = 1$.

For each pair of residues the probability of a match (in a perfect world) is 1 chance out of 20, or $p = 1/20 = 0.05$. The probability of a mismatch is $1 - p = 19/20 = 0.95$. Using the ideas developed on the preceding page we can easily compute the probabilities $p_k$, $k = 0, 1, 2, \ldots, 32$. They are:
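
The individual values can be reproduced with a short Python sketch. It assumes that the result from the preceding page is the binomial formula, that is, the probability of exactly k matches is C(32, k) * p^k * (1 - p)^(32 - k), where C(32, k) counts the ways to choose which k of the 32 positions match. That assumption is consistent with the value of the 32-match probability quoted in the text. The names `match_probability` and `probabilities` are illustrative.

```python
from math import comb

N_RESIDUES = 32   # residues per gene in the "perfect world"
P_MATCH = 1 / 20  # chance that one pair of residues matches


def match_probability(k: int, n: int = N_RESIDUES, p: float = P_MATCH) -> float:
    """Probability of exactly k matches in n independent comparisons (binomial, assumed)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# One probability for each of the 33 possible outcomes, k = 0 .. 32.
probabilities = [match_probability(k) for k in range(N_RESIDUES + 1)]

for k, pk in enumerate(probabilities):
    print(f"k = {k:2d}   probability = {pk:.4e}")

# The 33 outcomes cover everything that can happen, so this should print 1
# up to floating-point rounding.
print("sum =", sum(probabilities))
```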

The probability $p_{32}$ is not zero, but it is close. In fact,
$p_{32} = 2.328 \times 10^{-42}$,
close enough to zero that if two random 32-residue genes did match completely, we would have to consider it miraculous; in that case everything that follows would be unreliable, or at least suspect.

Here's an interesting and important point:
$p_0 + p_1 + p_2 + p_3 + p_4 = 0.9796$. That means that about 98% of the time two random genes of length 32 will match in 4 or fewer residues. Put another way, there is only about 1 chance out of 50 (probability = 0.02) of finding 5 or more matches out of 32. There is less than 1 chance out of 100,000 of finding 10 or more matches out of 32, so two sequences with 10 or more matches are probably not random: they are likely related in an evolutionary sense.
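
These cumulative figures can be checked with a small Python sketch, again assuming the binomial model with 32 independent comparisons and a match probability of 1/20. The helper name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(m: int, n: int = 32, p: float = 0.05) -> float:
    """Probability of m or more matches in n comparisons (binomial model, assumed)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


# "4 or fewer matches" is the complement of "5 or more matches".
print("P(4 or fewer matches) =", 1 - prob_at_least(5))
print("P(5 or more matches)  =", prob_at_least(5))
print("P(10 or more matches) =", prob_at_least(10))
```

A threshold such as "10 or more matches" works here as a rough significance cutoff: the smaller the tail probability, the harder it is to attribute the observed similarity to chance.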