The information/residue

of an N-letter alphabet, in which the ith letter appears in any given residue with probability p_i, is given by the following expression:

-[p_1 log2(p_1) + p_2 log2(p_2) + ... + p_N log2(p_N)].

If all the p_i are the same (p_i = 1/N for all i = 1 to N), then each of the N terms equals (1/N) log2(1/N) = -(1/N) log2 N, and the expression given above reduces to

log2 N.
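As a quick check of the expression above, here is a minimal Python sketch. The function name `information_per_residue` is only an illustrative choice, and the skewed distribution is a made-up example, not data from this page.

```python
import math

def information_per_residue(probabilities):
    """Return -sum(p * log2(p)) in bits; letters with p == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Uniform case: every letter of an N-letter alphabet has probability 1/N.
N = 4
uniform = [1 / N] * N
print(information_per_residue(uniform))  # 2.0
print(math.log2(N))                      # 2.0

# Hypothetical non-uniform case (illustrative numbers only):
# the result is smaller than log2(N).
skewed = [0.7, 0.1, 0.1, 0.1]
print(information_per_residue(skewed) < math.log2(N))  # True
```

The uniform case gives the largest value an N-letter alphabet can have; any other set of probabilities gives less.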

The nucleotide alphabet contains 4 symbols: {A,G,T,C}. We will assume that they occur with equal probability (1/4). Therefore the information/residue of this alphabet is

log2 4 = 2,

or two bits. Therefore, a nucleotide string of length 900 is equivalent to 1800 bits, which can code for

4^900 = 2^1800 = 7.144 x 10^541 states.
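The size of this number can be checked with a short Python sketch. The helper `scientific_notation` is a hypothetical name introduced here; it works through base-10 logarithms because the number is far too large for an ordinary floating-point value.

```python
import math

def scientific_notation(base, exponent):
    """Return (mantissa, power) such that base**exponent ~= mantissa * 10**power."""
    log10_value = exponent * math.log10(base)
    power = math.floor(log10_value)
    mantissa = 10 ** (log10_value - power)
    return mantissa, power

length = 900          # nucleotides in the string
bits = 2 * length     # two bits per nucleotide

# Python integers are exact, so the identity can be verified directly.
assert 4 ** length == 2 ** bits

mantissa, power = scientific_notation(2, bits)
print(bits)             # 1800
print(mantissa, power)  # mantissa is a little above 7.14, power is 541
```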

Thus a gene of this length can code for not just an unimaginably large number of states, but in some sense an unknowably large number. If the universe were a brain, this number might surpass even its capacity. In any case, numbers of this size easily account for the diversity of life on this and, potentially, all other planets. But as it turns out, this number is too big.

Single nucleotides are less directly responsible for genetic coding than triples of nucleotides, which code for amino acids and ultimately proteins. Three nucleotide residues contain 3 x 2 = log2(4^3) = log2(64) = 6 bits of information. But there is much redundancy in the capacity of these nucleotide triples to code for amino acids, since only 20 amino acids arise in this context (instead of 64). So from a genetic standpoint it is reasonable to replace the 4-letter nucleotide alphabet with the 20-letter amino acid alphabet. If all of these were to occur with equal probability, the information per residue would be maximal, log2 20 = 4.322 bits. But they don't occur with equal probability, so the more complicated expression for information/residue given at the top of the page must be used. In the end, given our best guess for the 20 probabilities, the information/residue for the amino acid alphabet is about 4.19 bits.

A 900-nucleotide sequence corresponds to 300 amino acids, hence is equivalent to about 4.19 x 300 = 1257 bits. This many bits can code for 2^1257 = 2.48 x 10^378 states. That's fewer than we got above, but still unknowably large. If the whole known universe were divided into this many little cubes, each cube would be vastly smaller than an electron. Or, to put it another way, if you had that many pennies, you could go to 12 movies at today's prices. The mind boggles.
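The amino acid arithmetic can be reproduced with a small Python sketch. It takes the 4.19 bits per residue quoted above as given, since the 20 individual probabilities are not listed on this page; the variable names are illustrative only.

```python
import math

# Upper bound: 20 equally likely amino acids.
max_bits_per_residue = math.log2(20)

# Figure quoted in the text; assumed here, not derived from probabilities.
estimated_bits_per_residue = 4.19

nucleotides = 900
amino_acids = nucleotides // 3  # one amino acid per nucleotide triple
total_bits = round(estimated_bits_per_residue * amino_acids)

# 2**total_bits in scientific notation, via base-10 logarithms.
log10_states = total_bits * math.log10(2)
power = math.floor(log10_states)
mantissa = 10 ** (log10_states - power)

print(f"{max_bits_per_residue:.3f}")   # 4.322
print(total_bits)                      # 1257
print(f"{mantissa:.2f} x 10^{power}")  # 2.48 x 10^378
```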