# Is it true that ...

### I can put plots in a stand until I run out of time, weather, patience…?

Maybe. Maybe not. There is a technical issue you should be aware of. While it does not occur frequently with timber, it is becoming more common with the non-timber resources being sampled, and you will want to recognize it.

If plots take quite different amounts of time to do, it may not be quite correct to work until you run out of time (or money - the same logic can apply). This is true even if you are installing the plots randomly. This is easy to illustrate, but not widely understood. Consider the following example (made extreme for clarity and simplicity).

We are installing plots to learn about decay, and how frequently it occurs in a polygon. The travel time is such that we can only visit one polygon each day. We would like to "make a day of it" and put as many plots into that polygon as possible. Sounds reasonable, doesn't it?

There are 2 situations, and we cannot tell which will occur before we arrive at the sample point.

1. The center of the plot is infected ("IN"fected) with the decay, which we can quickly spot. In that case, we have to tear trees apart, dig up the plot and spend a little less than a day working there. At the end of the plot measurements we go home.
2. We can see quickly that it is clean ("C"lean) with no infection present, and go on to the next plot in the same polygon.

For illustration, assume that half of the area in every polygon is infected. You should get an equal number of decay and non-decay situations. Right?

Here is how the situation plays out every time we go to a polygon.

Half the time, we find an infected area first and the day stops right there. Our estimate would be that the whole polygon area was infected.

Half the time we find no decay, and immediately select another random plot. The next plot also has a 1/2 chance of being decay free, and we merrily put in plots until we hit decay, and then the day ends by completing that plot. Our estimate is that part of every polygon is infected, depending upon how many clean plots were seen before we ran into the infected one.

Here is the pertinent observation: No matter what the sequence of plots visited, you always end the day with an infected plot, even though you are choosing plots randomly each time. In effect, you "tack on" an extra observation of that type more frequently than you should.

• 1/2 of the time {INfected seen first}  You estimate all of the polygon is infected
• 1/4 of the time {Clean, then IN}  You estimate 1/2 is infected
• 1/8 of the time {C, C then IN}  You estimate 1/3 is infected
• 1/16 of the time {C, C, C then IN}  You estimate 1/4 is infected
• etc, etc, etc...

I will not bore you with the math, but when you work this out, it looks like about 70% of the area is infected. Clearly not the right answer, even though you used randomly selected plot locations. You have a "stopping rule" that causes the sequences to end with decayed plots each time.
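For readers who do want to check that figure, here is a minimal simulation sketch of the example above. It assumes exactly what the example assumes (each random plot is infected with probability 1/2, and the day ends on the first infected plot). The function names `one_day` and `mean_daily_estimate` are made up for this illustration. The average of the daily estimates should land near the sum of (1/2)^n / n over n = 1, 2, 3, ..., which equals ln 2, or roughly 0.69.

```python
import math
import random


def one_day(rng, p_infected=0.5):
    """Return one day's estimated infected fraction under the stopping rule."""
    clean = 0
    # A clean plot is quick, so we move on to another random plot.
    while rng.random() >= p_infected:
        clean += 1
    # The infected plot uses up the rest of the day, so it is always last.
    return 1 / (clean + 1)


def mean_daily_estimate(days=200_000, seed=1):
    """Average the daily estimates over many simulated polygon visits."""
    rng = random.Random(seed)
    return sum(one_day(rng) for _ in range(days)) / days


simulated = mean_daily_estimate()
exact = math.log(2)  # sum over n >= 1 of (1/2)**n / n
print(f"simulated {simulated:.3f}, exact {exact:.3f}, true value 0.500")
```

The plots themselves are perfectly random. Only the rule for when to stop is tied to the outcome, and that alone is enough to push the average well above one half.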

This unusual bias arises because the sample size is determined by the results of the data. In forestry, this is not much of a problem. Due to travel time and other considerations, this form of bias is generally trivial. In the case of glaciers, of course, it might apply because we can see them on photos before we visit the site, but this kind of problem is seldom encountered in forestry work.

Therefore, this technically possible bias is generally ignored, and we seldom worry about stopping the process based on cost or time limitations.

### The Reverse Result
Suppose we go to the same large area each day. We start a line of plots, and put in as many as we can for the day. The next day, we go back and start another line in a new random location. Sounds good so far, doesn't it? We will make this example a bit extreme as well, but the principles apply in all cases.

The area has two forest types, equal in area, and the patches are too small to type out. One type has no trees, and the other has heavy timber. There is only time to do one plot in the heavy timber. The size of the patches is such that a line averages 5 sample locations before it enters the other type.

If we start in the half of the area that is timbered, we put in one plot. Conclusion? The area is fully timbered.

You can see this coming, right? The results are going to be biased if we average the results for each line, in much the same way as in the first example.

"Wait", you say. "How about if we use all the plots as one big sample? Wouldn't that be OK? We are, after all, in the same area for all this work."

Sorry, it doesn't work. Half the time you have a string averaging 5 zero plots before you hit one timbered plot and end the day's work.

• 1/2 of the time:  1 timbered plot
• 1/2 of the time:  An average of 5 zeros, then one timbered plot.

Result of the combined data:
5/7 (71.4%) of the area estimated as zero timber, when 50% is the correct answer.
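The pooled result can be checked with a short simulation sketch. The article only says that a line averages 5 sample locations before it changes type, so the run-length model here is an assumption: the number of bare plots is drawn as a geometric count with mean 5. The names `one_line` and `pooled_bare_fraction` are hypothetical helpers for this illustration.

```python
import random


def one_line(rng, mean_bare_run=5):
    """Return (bare_plots, timbered_plots) for one day's line of plots."""
    if rng.random() < 0.5:
        # Started in timber: the single heavy plot fills the day.
        return 0, 1
    # Started in the treeless type: quick zero plots until we reach timber.
    # ASSUMPTION: run length is geometric on 1, 2, 3, ... with mean 5.
    bare = 1
    while rng.random() >= 1 / mean_bare_run:
        bare += 1
    # The first timbered plot ends the day.
    return bare, 1


def pooled_bare_fraction(days=200_000, seed=1):
    """Pool every plot from every day and return the zero-timber fraction."""
    rng = random.Random(seed)
    bare = timbered = 0
    for _ in range(days):
        b, t = one_line(rng)
        bare += b
        timbered += t
    return bare / (bare + timbered)


pooled = pooled_bare_fraction()
print(f"pooled zero-timber fraction {pooled:.3f}, expected about {5 / 7:.3f}")
```

Under these assumptions the pooled fraction settles near 5/7 no matter how many days are added, so a bigger sample does not cure the problem.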

So here, too, we ran into a problem with a stopping rule that determines sample size based upon the outcomes of the observations.

### What to do?

"Making a day of it" is a good idea. If the time required for a plot varies a lot based on the results, just set the plan before you reach the area and see the outcome for those plots. If you think that you can do 4-5 plots in an area, then plan on 4 to be safe and stop when you are done with those 4. Travel time to other areas might indicate that only 2 can be expected, so make that your plan.

The averages from clusters of samples like this are easy to properly weight, and this is much smarter than taking the same number of plots at each location and having a large amount of "down time."

If it takes too long, make it a long day, but finish the series of plots you planned. If you finish early at the polygon, then use the time for some useful quality control or training objective rather than putting in more plots.
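As a rough check on this advice, the sketch below reruns the decay example with a fixed plan instead of a stopping rule. It assumes, for illustration only, that the crew always finishes the planned 4 plots (however long the day gets) and that each plot is infected with probability 1/2. The function name `fixed_plan_estimate` is hypothetical.

```python
import random


def fixed_plan_estimate(days=50_000, plots_per_day=4, p_infected=0.5, seed=1):
    """Average the daily infected fraction when the plot count is set in advance."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(days):
        # ASSUMPTION: all planned plots are completed, whatever they turn out to be.
        infected = sum(rng.random() < p_infected for _ in range(plots_per_day))
        total += infected / plots_per_day
    return total / days


planned = fixed_plan_estimate()
print(f"fixed-plan estimate {planned:.3f}, true value 0.500")
```

Because the number of plots no longer depends on what the plots show, the estimate comes out close to the true one half.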

Is this type of bias a big problem in forestry? No, generally not. Will it be a big problem in other disciplines like wildlife observations or stream assessments? That remains to be seen. Before you let the sample size vary because of the results at those plots, give it some careful thought.

When the plots take nearly the same amount of time, it's OK to pick a series of extra plot locations and randomly do plots until your time is exhausted. This allows you to use all your time, and use it fairly efficiently.

Originally published January 2000