One bit has 2 states. k bits have 2^k states. So 4 bits, shown at far left, have 2 x 2 x 2 x 2 = 2^4 = 16 states.
The same number of states can be coded using two residues, each of which can be filled by any of 4 symbols (instead of using the "alphabet" {0, 1}, I've chosen {A, G, T, C}). Two residues having a 4 letter alphabet can code for 4 x 4 = 4^2 = 16 states.
But this is equivalent, as far as the number of codable states is concerned, to 4 bits (four residues with a 2 symbol alphabet). Therefore we say in both cases that the information content of the residue/alphabet combination is 4 bits. We compute this information content by taking the logarithm to the base 2 of the number of states that can be coded.

4 = log2(2^4) = log2(4^2) = log2(16).
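
To check that arithmetic, here is a small Python sketch. The function name `information_bits` and its parameters are my own hypothetical choices for illustration; the only library calls are the standard `math.log2` and `itertools.product`.

```python
import itertools
import math


def information_bits(alphabet_size: int, residues: int) -> float:
    """Return log2 of the number of states codable by `residues` slots,
    each filled from an alphabet of `alphabet_size` symbols."""
    states = alphabet_size ** residues
    return math.log2(states)


# Four residues over the alphabet {0, 1}.
print(information_bits(2, 4))

# Two residues over the alphabet {A, G, T, C}.
print(information_bits(4, 2))

# Enumerate the two-residue states explicitly to confirm the count.
dna_states = ["".join(pair) for pair in itertools.product("AGTC", repeat=2)]
print(len(dna_states), dna_states[:4])
```

Both calls give the same information content, and the explicit enumeration lists the same number of states that the formula counts.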

This formula can be applied to cases where the number of states is not an integral power of 2. Given a three letter alphabet with 5 residues (slots/positions), we have 3^5 = 243 states; so the bit information in this case is

7.9248125 = log2 243.

That is, it takes about 7.925 bits to code for 243 states. What's that mean? Well, it's more than 7 bits (128 states), and less than 8 bits (256 states), but it's much closer to 8 bits. Just flow with it.
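
One way to make the fractional figure concrete is the sketch below. Rounding up with `math.ceil` is my own addition, on the assumption that you want to store any one of the 243 states in a whole number of physical bits; the variable names are hypothetical.

```python
import math

# Five residues over a three letter alphabet.
states = 3 ** 5

# Information content in bits (generally not a whole number).
bits = math.log2(states)

# Assumption: to store one of these states in whole bits, round up.
whole_bits = math.ceil(bits)

print(states)           # number of codable states
print(round(bits, 7))   # information content, rounded for display
print(whole_bits)       # whole bits needed to hold any one state

# 7 bits are too few and 8 bits are enough.
print(2 ** 7 < states <= 2 ** 8)
```

So the fractional value sits between the two whole-bit sizes, and 8 whole bits would leave 256 - 243 = 13 codes unused.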