In the real world, the statistical analysis of alignment scores is seldom as clean as has been outlined on the previous pages. As an example, let's suppose you're a biologist working in a secret laboratory 37 stories beneath the Nevada desert. You've sequenced a gene from an extraterrestrial sold to the lab by a group of children who wouldn't let it phone home. The gene consists of 300 amino acid codons (why the alien's amino acids should be the same as ours was explained on Star Trek). In order to test its relatedness to earthbound genes, you type it into a computer and let the BLAST program run it up against each of 250,000 sequences in a genome database. The BLAST program finds local alignment scores, but to save time it doesn't use a straightforward dynamic programming algorithm. In particular, BLAST local alignments are gapless.

Anyway, each query/database comparison may result in several local alignments. The maximum segment pair (MSP) is the one with the highest score. It's an extreme value. The statistical distribution that best approximates the distribution of MSP scores is the extreme value distribution. Even so it does not give us exact probabilities, only lower limits. Let P(s>S) be the probability of randomly encountering a score s greater than some cut-off S. Then

P(s > S) > 1 - exp(-Kmn e^{-vS}).

K and v are constants dependent upon the make-up of the database, m is the length of the query, and n depends on the context (for a single query/database comparison, n is the length of the database sequence).

We're interested in high scores S. Note that the bigger S gets, the smaller e^{-vS} gets, and the smaller that gets, the closer exp(-Kmn e^{-vS}) gets to 1, and the closer the lower bound for P(s > S) gets to zero. That is, a big S yields a small P.
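
To watch the bound shrink as S grows, here is a minimal sketch in Python. The query length m = 300 comes from the alien gene; everything else (the function name, n = 400, K = 0.1, v = 0.3) is a placeholder I picked for illustration, not a value taken from any real database.

```python
import math

def msp_tail_bound(S, m, n, K, v):
    """Lower bound on P(s > S): 1 - exp(-K*m*n*e^(-v*S)).

    Uses -expm1(-x) instead of 1 - exp(-x) so that very small
    values are not lost to rounding.
    """
    return -math.expm1(-K * m * n * math.exp(-v * S))

# Assumed, illustrative parameters (only m = 300 is from the text).
m, n, K, v = 300, 400, 0.1, 0.3

for S in (20, 40, 60, 80):
    print(S, msp_tail_bound(S, m, n, K, v))
```

Each step up in S should print a smaller number than the one before it.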

In addition, if Kmn e^{-vS} is close to zero, then exp(-Kmn e^{-vS}) is well approximated by 1 - Kmn e^{-vS}. In that case the lower bound above can be well approximated by Kmn e^{-vS}. This value is called the expect. According to Setubal and Meidanis in their book "Introduction to Computational Molecular Biology", it is interpreted as the expected number of distinct segment pairs between two random sequences with score above S. If this is not all clear, don't fret, it will either be clarified in another section (if it's funded), or will be clarified in class.
Notice that the function 1 - exp(-Kmn e^{-vS}) is not the distribution itself, but the area under its right tail. Recall that areas are associated with probabilities. (The extreme value distribution reappears in the context of FASTA.)
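
If you'd like to check how good the "expect" approximation is, the following rough Python sketch compares it with the full right-tail expression. As before, the function names and the values of n, K and v are my own assumptions for illustration; only m = 300 comes from the example.

```python
import math

def expect(S, m, n, K, v):
    """The expect: K*m*n*e^(-v*S)."""
    return K * m * n * math.exp(-v * S)

def tail_bound(S, m, n, K, v):
    """The right-tail expression 1 - exp(-expect)."""
    return -math.expm1(-expect(S, m, n, K, v))

# Assumed, illustrative parameters (only m = 300 is from the text).
m, n, K, v = 300, 400, 0.1, 0.3

for S in (20, 40, 60, 80):
    E = expect(S, m, n, K, v)
    P = tail_bound(S, m, n, K, v)
    # The two agree only when E is close to zero; for a large E the
    # tail expression levels off near 1 while E keeps growing.
    print(S, E, P)
```

For low scores the expect can be much bigger than 1, so it can't be read as a probability there. It only stands in for the tail expression once it has become small.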

On the next and final page of this section I've put a Java applet that graphs the extreme value distribution. I do this because prior to my involvement in this subject I'd never even heard of the thing, and I want to see what it looks like.