Statistics for Practical People (7)

If you have taken a statistics course, or just tried to read a statistics book, you will remember how confusing the terms can be. There seem to be a lot of them, but there are really only a few. The problem is that every special form of each term has its own name. While that may be handy once you understand all this, it is a terrific hassle when you are just trying to learn the material. We would like to talk about four "new" terms in this short article:

Variance (V)
Variance of the mean (V_xbar)
Coefficient of Variation (CV)
Standard error in percent (SE%)

These are just variations on the two terms we have been talking about already. They aren't new; they are just different ways of talking about what we already know.

Let's consider the first term, the "Variance" (V). This is just the Standard Deviation (SD) squared. What use is it? None, really. The only way it is used is when you are combining several Standard Deviations. The proper mathematical way of combining these kinds of terms, and doing a few other kinds of mathematical gyrations, is to square them first. There is no real interpretation that can be placed on this term; it is just useful in doing some sorts of mathematical operations. The term which has a meaning is the Standard Deviation (SD) itself, which tells you how "spread out" the population is.
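As a small illustration of "square them first", here is a minimal sketch of combining two Standard Deviations through their variances. It assumes the two sources of variation are independent and measured in the same units; the function name and the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def combine_sds(sds):
    """Combine the SDs of independent sources of variation.

    Each SD is squared into a variance, the variances are added,
    and the square root brings the total back to the original units.
    Assumes the sources are independent (an assumption of this sketch).
    """
    total_variance = sum(sd ** 2 for sd in sds)
    return math.sqrt(total_variance)

# Hypothetical SDs, both in the same units.
combined = combine_sds([3.0, 4.0])
print(combined)  # 5.0, not the 7.0 you would get by adding the SDs directly
```

The variance is only the intermediate step here; the number you would actually report is the combined SD.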

As you might expect, the Variance of the Mean (V_xbar) is the same sort of situation. It is simply the Standard Error (SE) squared, and again has no useful interpretation. It is only something you do before combining amounts, and the math people have decided to give it a special name. Luckily, you don't see it too often. The SE itself, telling you how spread out the averages from repeated samples would be, is the logical term.

The last two terms are more frequently used, and again are just different ways of expressing the terms you are already familiar with. These really do have a logical interpretation. They are both just attempts to use SD or SE in percentage terms, rather than using them in their original units. This makes a good deal of sense.

Suppose you were measuring both trees and logs in some sort of study. You want an "equally good" answer in each case. You know that the trees have a standard deviation of ±0.3 cubic meters, while the logs have a SD of ±130 board feet. Which is more variable? Clearly the most reasonable way to express these terms would be in percentages. We therefore divide the SD by the mean, and are able to express the "spread" of the data in a way that is easy to understand and to compare to other types of distributions. We give this percentage version of the Standard Deviation the special name "Coefficient of Variation (CV)". It would have been handy if they had called it "SD%", which would have made the situation obvious, but this was not done.

If we now say that trees have a CV of ±25%, and logs have a CV of ±17%, it is clear that trees are "more variable" than logs and will need to be measured more often to get the same relative answer.
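The CV calculation is just the SD divided by the mean, expressed as a percent. The sketch below uses the two SDs from the example; the means (1.2 cubic meters and 765 board feet) are not given in the text and are assumed values, picked so the results agree with the CVs quoted above.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV in percent: the SD as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * sd / mean

# The SDs come from the example; the means are assumed for illustration.
tree_cv = coefficient_of_variation(sd=0.3, mean=1.2)     # cubic meters
log_cv = coefficient_of_variation(sd=130.0, mean=765.0)  # board feet

print(f"trees: {tree_cv:.0f}%  logs: {log_cv:.0f}%")  # trees: 25%  logs: 17%
```

Once both are in percent, the different units no longer matter and the two can be compared directly.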

The same situation applies to the Standard Error. We are more likely to get a reasonable reaction to a statement like "we know the stand volume within ±7.3%" than if we say "we know the stand volume within 12,300 Board Feet". The standard error is divided by the mean, and given the special term "Standard Error in Percent (SE%)". This is a fairly descriptive term, and you have seen it before in the discussion of "Bruce’s Formula". There are many instances where the percentage expressions are easier to use (or at least more logical) than the units themselves. This is not always true, of course, but in most cases people are more comfortable and better able to make judgments in percentage terms than in actual units.
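The same arithmetic gives SE%. In this sketch the 12,300 board feet is the figure from the paragraph above, while the mean stand volume of 168,500 board feet is an assumed value, chosen only so that the two statements in the text describe the same stand.

```python
def standard_error_percent(se, mean):
    """Return the Standard Error as a percentage of the mean."""
    if mean == 0:
        raise ValueError("SE% is undefined when the mean is zero")
    return 100.0 * se / mean

# SE is from the text; the mean stand volume is assumed for illustration.
se_pct = standard_error_percent(se=12_300, mean=168_500)  # board feet
print(f"{se_pct:.1f}%")  # 7.3%
```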

These percentage terms themselves are also squared at times, but luckily we are not plagued by special terms for these "squared percentage versions".

The Coefficient of Variation (CV) is often seen in statistics, and is a very useful item. It is only misleading when the average is near zero (seldom a problem in forestry applications). If the zero point in a measurement is not a "real zero" it can also cause a slight problem. The CV of temperature, for instance, is not the same in Celsius and Fahrenheit. This is because the zero point of each temperature scale is at a different arbitrary position. Statisticians would say that these measurements are not "ratio scales", and therefore the CV should not be applied. In fact, even in these cases CV is a useful term as long as you are not too close to the zero point. In forestry, CV is perhaps the most widely used statistic.
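The temperature problem is easy to see with a few numbers. The sketch below computes the CV of the same set of readings in Celsius and in Fahrenheit, and then does the same for a set of heights in meters and in feet. All of the readings are hypothetical, and the SD is the usual sample SD (n - 1 divisor) from Python's standard library.

```python
import statistics

def cv_percent(values):
    """CV in percent, using the sample SD (n - 1 divisor)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical temperature readings: the two scales have different zero points.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
fahrenheit = [c * 9.0 / 5.0 + 32.0 for c in celsius]

# Hypothetical tree heights: meters and feet share a "real zero".
meters = [18.0, 22.0, 25.0, 30.0, 27.0]
feet = [m * 3.28084 for m in meters]

cv_c, cv_f = cv_percent(celsius), cv_percent(fahrenheit)
cv_m, cv_ft = cv_percent(meters), cv_percent(feet)

print(f"temperature CV: {cv_c:.1f}% (C) vs {cv_f:.1f}% (F)")
print(f"height CV:      {cv_m:.1f}% (m) vs {cv_ft:.1f}% (ft)")
```

The two temperature CVs disagree because adding 32 shifts the mean without changing the spread. The two height CVs agree, since changing units only multiplies the SD and the mean by the same factor.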

The chart below summarizes all these terms and shows how they are related. Note that they are broken down into terms that discuss things (in the population) on the left side, and averages on the right side.

Individuals in the Population       Sample Averages
Variance, V = SD²                   Variance of the Mean, V_xbar = SE²
Standard Deviation, SD              Standard Error, SE
Coefficient of Variation, CV        Standard Error in Percent, SE%

As you would expect, the central terms are therefore:

Standard Deviation (things)
Standard Error (averages)

When you square each of these you move up one level. If you want to express them as a percentage, you move down one level. To move from a description of the population (left side) to a description of the sample average (right side), you can just divide by the square root of the sample size (or by "n" itself for the variance where it has been squared already). This is because of the central equation of sampling:

SD / n^0.5 = SE
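To see the whole chart come out of one set of data, here is a minimal sketch that computes all six terms from a sample. The sample values and the function name are hypothetical. It uses the sample SD (n - 1 divisor) and the simple SD / n^0.5 relationship given above, with no other corrections applied.

```python
import math
import statistics

def chart_terms(sample):
    """Compute the six terms of the chart from one sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample SD, n - 1 divisor
    se = sd / math.sqrt(n)          # the central equation: SD / n^0.5 = SE
    return {
        "V": sd ** 2,               # square SD to move up one level
        "V_xbar": sd ** 2 / n,      # divide by n itself, since V is already squared
        "SD": sd,
        "SE": se,
        "CV": 100.0 * sd / mean,    # percent of the mean moves down one level
        "SE%": 100.0 * se / mean,
    }

# Hypothetical measurements, n = 9.
sample = [12.0, 15.0, 9.0, 14.0, 10.0, 13.0, 11.0, 12.0, 16.0]
terms = chart_terms(sample)
for name, value in terms.items():
    print(f"{name:7s} {value:8.3f}")
```

Every right-hand term can be checked against its left-hand partner: V_xbar equals SE squared, and SE% equals CV divided by the square root of n.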

The chart makes it clear that you are only dealing with 2 terms, which are either being squared for math purposes or being expressed as percentages for easier understanding. The special names given to these terms may make you think something really different is being created, but this is an illusion.

One of the hard parts of statistics is to tell when something really different is being introduced. This doesn't happen very often. There are really only a few important ideas in statistics. It isn't easy to pull them out and get them in perspective. If you have had trouble relating these terms in the past you have had a very common experience.

One of the things we will try to do in this series is to reduce the number of items you need to think about. Suffering through a statistics course doesn't always get you to that point. As boring as repetition and practice may be, that seems to be the only way to really learn this material. If you think reading it once in a statistics book will do the trick -- by all means do it that way. On the other hand, if you have already tried that routine, we hope you will be patient when we seem to repeat ourselves and hammer away at the main ideas. Our experience is that almost everybody can learn this stuff if they don't get overwhelmed by the jargon and distracted by the details.

Now that we have covered all these terms, and how they relate, the next discussion will be on sample size. There are a few logical ideas (and a few cheap tricks) that will make it much easier for you to calculate and understand sample size.

PART VII - All Those Statistical Terms Published October 1989
