Statistics for Practical People

PART V - Sampling Error and Confidence Intervals
Published April 1989


WHAT IS THE IDEA?

At the end of a cruise you want to give an indication not only of the stand volume, but of how good a job you have done. Depending on how much money is at risk (and whose money it is), you might care if the answer is good within ±3% or ±40% -- and if you don't care, somebody else might. Since you have already spent most of your money for the cruise on the field work, it makes sense to spend a few minutes to know the risk you are taking.

When you state the average, this is called a "point estimate". It is usually the most believable answer, but not the only believable answer. When you state a range of possible answers, it is called a "confidence interval". If a 95% confidence interval is from 20,000 to 36,000, you are saying that you are 95% certain that the true answer is in that range ... somewhere. Pure statisticians might argue slightly with that definition, but it is about the most practical and useful one that can be given. The average from your cruise is almost always placed right in the middle of that confidence interval, and while that average is your best estimate, you are saying that answers anywhere in the interval are possible, even those at its extremes -- though they are much less likely.
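The arithmetic behind the 20,000 to 36,000 example can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and it assumes the point estimate sits exactly in the middle of the interval, which the text says is almost always (not always) the case.

```python
def summarize_interval(low, high):
    """Describe a symmetric confidence interval as 'estimate +/- amount'.

    Assumes the point estimate is the midpoint of the interval.
    Returns (midpoint, half_width, percent_of_midpoint).
    """
    midpoint = (low + high) / 2
    half_width = (high - low) / 2
    percent = 100 * half_width / midpoint
    return midpoint, half_width, percent


# The interval quoted in the article.
midpoint, half_width, percent = summarize_interval(20_000, 36_000)
print(f"{midpoint:,.0f} +/- {half_width:,.0f} (+/- {percent:.1f}%)")
# prints "28,000 +/- 8,000 (+/- 28.6%)"
```

Stated this way, the interval is directly comparable with the "±3% or ±40%" kind of answer mentioned earlier.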

HOW FAR EACH WAY?

How far do you go on each side of the mean to form this confidence interval? That depends on how confident you want to be. This is your choice, or maybe your boss or company policy dictates it. At any rate, the further you go on each side, the more likely you are to be right. All this assumes, of course, that you have not created any biases in selecting the sample, measuring it, or calculating the results. All the confidence interval can do is warn you of the statistical effects of sampling a population. It does not take into account any mistakes you have made in doing the work.

In the previous articles we explained the Z-table. This is how you determine how far to go on each side of the mean to create the confidence interval. For a 68% confidence interval you go one standard deviate to each side of the mean. For a 95% confidence interval you go about 2 standard deviates (more exactly, 1.96) to each side. When you don't know FOR SURE exactly how big the standard deviate is, then there is a slight correction to the Z value, but we will cover that in the next part of this series. It is a minor point, and doesn't affect the main idea.
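Instead of reading a printed Z-table, the multiplier can be looked up with Python's standard library. The sketch below is one possible way to do it; the function names are illustrative, and the mean of 28,000 and standard deviate of 4,000 are hypothetical values chosen so the result lands close to the 20,000 to 36,000 interval used earlier. It ignores the small-sample correction mentioned above.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Standard deviates to go on EACH side of the mean (two-sided interval)."""
    tail = (1 - confidence) / 2          # area left out in each tail
    return NormalDist().inv_cdf(1 - tail)


def confidence_interval(mean, standard_deviate, confidence):
    """Return (low, high) for the given confidence level, e.g. 0.95."""
    z = z_multiplier(confidence)
    return mean - z * standard_deviate, mean + z * standard_deviate


for level in (0.68, 0.95):
    z = z_multiplier(level)
    low, high = confidence_interval(28_000, 4_000, level)  # hypothetical values
    print(f"{level:.0%}: z = {z:.2f}, interval = {low:,.0f} to {high:,.0f}")
```

The exact multipliers come out slightly under the rounded values of 1 and 2 used in the text, which is why the 95% interval is a little narrower than 20,000 to 36,000.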


WHAT IS THE "STANDARD ERROR"?

Now when we are dealing with an average, we have a special term for a standard deviate. We call it a "standard error". It tells us how spread out the sample means would be if we sampled repeatedly from a population. This is the second kind of standard deviate we have discussed, so a quick review is called for. We are concerned with just 2 kinds of "standard deviates":

1. The STANDARD DEVIATION -- it tells us how spread out the THINGS are in our population. You might see it written as one of these symbols:

σ or σn if you KNOW what it is by measuring all the items (we will use σ in this series),

OR:

SD or σ(n-1) if you are ESTIMATING it from the data in your sample (we will use SD in this series).

2. The STANDARD ERROR -- it tells us how spread out the MEANS are when we sample. You will most often see this written in the following symbols:

σ(x-bar) if you KNOW what it is by measuring all the data in the population,

OR:

SE or S(x-bar) if you are CALCULATING it from the data (we will use SE).

There can be a confusing number of symbols and terms when talking about statistics -- but it will help to keep in mind that you can only be talking about one of TWO ideas. You are either talking about how spread out the THINGS are (Standard Deviation) or how spread out the AVERAGES are (Standard Error). Both of these are special cases of the term "standard deviate", but we give them these special names because they are quite different in practice.
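The two ideas can be put side by side in a short Python sketch. The plot volumes below are made-up numbers, and the relationship SE = SD / sqrt(n) is the standard textbook formula rather than something derived in this part of the series, so treat this as an illustration under those assumptions.

```python
import math
import statistics

# Made-up volumes from 8 sample plots (any consistent unit).
plot_volumes = [22, 31, 27, 35, 25, 28, 30, 26]

n = len(plot_volumes)
mean = statistics.mean(plot_volumes)

# Spread of the THINGS, estimated from a sample (the n-1 divisor): "SD".
sd = statistics.stdev(plot_volumes)

# Spread of the THINGS if these 8 plots WERE the whole population
# (the n divisor): "sigma".
sigma = statistics.pstdev(plot_volumes)

# Spread of the MEANS, calculated from the sample: "SE".
se = sd / math.sqrt(n)

print(f"mean = {mean}, SD = {sd:.2f}, sigma = {sigma:.2f}, SE = {se:.2f}")
# prints "mean = 28, SD = 4.00, sigma = 3.74, SE = 1.41"
```

Note how much smaller SE is than SD: individual plots vary by about 4 units, but the average of 8 plots varies by only about 1.4.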

One of the most important differences is one that we have talked about in a previous article. The standard deviation (SD) may not always be easily used to create a confidence interval for THINGS, because the population may not be normally distributed. On the other hand, we know that AVERAGES from samples of reasonable size will be very nearly normally distributed, whatever the shape of the population, and so a standard error (SE) is almost always useful to create a confidence interval.
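The behavior of averages can be checked with a small simulation. In the sketch below, the skewed (exponential) population, the sample size of 30, and the 2,000 repeated samples are all assumptions picked for illustration; the SE = SD / sqrt(n) relationship is the standard formula rather than one stated in this article.

```python
import math
import random
import statistics

rng = random.Random(1989)   # fixed seed so the run is repeatable

SAMPLE_SIZE = 30            # items measured in each sample
NUM_SAMPLES = 2000          # how many times the sampling is repeated
TRUE_MEAN = 1.0             # mean of an exponential population with rate 1
TRUE_SD = 1.0               # its standard deviation is also 1


def one_sample_mean():
    """Draw one sample from a strongly skewed population; return its average."""
    values = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(values)


means = [one_sample_mean() for _ in range(NUM_SAMPLES)]

theoretical_se = TRUE_SD / math.sqrt(SAMPLE_SIZE)
observed_spread = statistics.stdev(means)

# Fraction of sample means within 1 and 2 standard errors of the true mean.
within_1 = sum(abs(m - TRUE_MEAN) <= theoretical_se for m in means) / NUM_SAMPLES
within_2 = sum(abs(m - TRUE_MEAN) <= 2 * theoretical_se for m in means) / NUM_SAMPLES

print(f"spread of the means: {observed_spread:.3f} (formula: {theoretical_se:.3f})")
print(f"within 1 SE: {within_1:.1%}, within 2 SE: {within_2:.1%}")
```

Even though the individual THINGS here are far from normally distributed, the fractions should come out close to the 68% and 95% figures from the Z-table.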

In the next issue we will put all this together and go through a complete example.

