Normal Distribution

At the end of the animation on the previous page the point was made that as distributions get more spread out, the intervals corresponding to fixed probabilities also expand. The easiest way to see this is with the normal distribution (also known as the bell-shaped curve), which is the most commonly encountered continuous distribution in statistical theory ("continuous" means that all values in the allowable range are possible outcomes, not just a discrete collection of values like the results of rolling two dice).

Although we will make very little use of the Normal distribution, two other continuous distributions will play important roles in what follows. But the Normal distribution is a good place to define the Standard Deviation, which is a measure of spread.

To begin with, go to the animation below and slide the ball back and forth. The effect of this action is too cool for words. Play around with it for a while and watch how the shape of the distribution changes. Then skip to the bottom and we'll talk again.

Cool, huh?

The first thing to observe is that, unlike most of the distributions we'll be dealing with, the Normal distribution is symmetric. It is a theoretical curve that many collections of sample data aspire to fit (as the sample data on page 1 of this section aspired to fit the binomial distribution). For example, the IQs of a random sample (randomness is essential) of humans should form a distribution that looks Normal. As is often true, the full population is unattainable in this case, since it could be defined as not just the IQs of everyone on the planet, but those of every conceivable human who could theoretically exist. That's an infinite collection, which is what a continuous distribution requires. As to the curves appearing in the animation above, I'll ask you to imagine that the tops of the sliding bars are connected by a smoothly fitting curve; that curve is then the Normal distribution.

So what happens as you slide the ball back and forth? The area under the curve stays the same (let's set it equal to 1, making this a probability distribution), but the distribution itself spreads out and lowers as the ball is slid to the right, and narrows and rises as it is slid to the left. Tightly clustered data will result in a narrow and high distribution, while a broad but low distribution results from broadly scattered data.

A narrow distribution of IQs might result from a sample of genetically similar individuals, a sample that doesn't vary much. A more spread out distribution might result from a sample of individuals taken after WWIII, when the planet is populated by weird mutants, all of whom spend their time chasing Charlton Heston.
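
If you'd like to check the trade-off between width and height with numbers rather than a slider, here is a minimal sketch. It assumes the ball in the animation controls the spread, which we'll call s, and it uses the standard formula for the Normal curve; the function names normal_pdf and area_under_curve are made up for this illustration and are not part of the animation.

```python
import math

def normal_pdf(x, m, s):
    """Height of the Normal curve with center m and spread s at the point x."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))

def area_under_curve(m, s, half_width=10.0, steps=20000):
    """Midpoint-rule estimate of the area from m - 10s to m + 10s.

    The tails beyond 10 spreads from the center are negligibly small,
    so this is effectively the whole area under the curve.
    """
    a = m - half_width * s
    b = m + half_width * s
    dx = (b - a) / steps
    total = sum(normal_pdf(a + (i + 0.5) * dx, m, s) for i in range(steps))
    return total * dx

# Three spreads, all centered at 0 (an arbitrary choice for illustration).
for s in (0.5, 1.0, 2.0):
    peak = normal_pdf(0.0, 0.0, s)
    area = area_under_curve(0.0, s)
    print(f"s = {s}: peak height = {peak:.4f}, area = {area:.4f}")
```

Each time s doubles, the peak height is cut in half, while the estimated area stays at 1 to within rounding. That is the numerical version of sliding the ball to the right.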

The spread of a distribution is generally measured by what is called the Standard Deviation. In the case of the theoretical Normal distribution one can define the standard deviation, s, as the distance one has to go from the center (mean = m) of the distribution, both to the left and to the right, such that the area above the interval from m-s to m+s is 0.6827. That is, 68.27% of the total area lies above this interval. In the animation above this interval is indicated by the purple and green parts of the distribution, which expand as the distribution spreads. On the next page we'll give a definition of the standard deviation for a finite sample of experimental values. However, although it is common to see references to the standard deviation for other distributions, it is with the Normal distribution that the standard deviation achieves its greatest usefulness.
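
The 0.6827 figure can be checked directly. The sketch below is only an illustration: it relies on the standard relationship between the Normal distribution and the error function (math.erf in Python's standard library), and the helper names normal_cdf and area_within are invented here. The particular values of m and s are assumptions chosen for the example; m = 100 and s = 15 is the conventional IQ scale.

```python
import math

def normal_cdf(x, m, s):
    """Area under the Normal curve (center m, spread s) to the left of x."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

def area_within(m, s, k=1.0):
    """Area above the interval from m - k*s to m + k*s."""
    return normal_cdf(m + k * s, m, s) - normal_cdf(m - k * s, m, s)

# The interval widens as s grows, but the area above it does not change.
for m, s in ((0.0, 1.0), (100.0, 15.0), (100.0, 30.0)):
    low, high = m - s, m + s
    print(f"m = {m}, s = {s}: interval {low} to {high}, area = {area_within(m, s):.4f}")
```

Whatever m and s you try, the area above the interval from m-s to m+s comes out as 0.6827. Doubling s doubles the width of the interval, which is exactly the expansion of the purple and green region described above.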