Sampling from a List - a Big Improvement

One of the guest articles by Norm Marsh (newsletter issue #40, 1997) discussed "sampling from a sorted list." It is worth describing the mechanics of this kind of sampling, because it is so efficient.

By efficient, we mean that you get a much smaller sampling error for the same number of observations. In addition, you sample a very good cross-section of the population, and therefore get a better description of it.

Systematic samples are common in forestry, and a square grid of plots is one example. The operational reason for a grid sample is that it makes plot location and navigation easy; it also looks convincing. A grid is only one kind of systematic sample, however. The statistical reason for systematic samples is that they give better answers than random samples, and how much better depends on how cleverly you design them.

Nature may give you a convenient gradient to sample across, such as volume/acre increasing up or down a slope. If nature does not give you a gradient, you can create your own.

Suppose you have just taken over a company with an out-of-date inventory, and you want to remeasure the stands yourself. A sorted list is a good way to select which stands to measure. Although a real ownership would have hundreds of stands, we illustrate the idea with only 20. The table also lists the actual new measurements (in case you want to draw a sample yourself), but of course you would only learn a stand's measurement after sampling it.

Note that we are simply putting these stands in order from smallest to largest; we do not really need values for them. The old estimates are just a convenient way to list the stands from smallest to largest volume/acre. Several statistical techniques make use of the estimates as actual numbers, but this method needs only the approximate order of the stands from small to large.

You cannot introduce a bias by how you sort the stands, so the order is entirely up to you. Some people might prefer to sort by age, for instance, to get a very good distribution of ages; if they used strata, they might sort by age within each stratum. The main difference between sort orders is how efficient they are at estimating the total or average.

After sorting, we assign each stand a range of acres equal to its area, running cumulatively down the list (the Range Low and Range High columns in the table). This gives every acre in the ownership the same chance of being sampled, so each stand's probability of selection is proportional to its size. That is the same probability a simple grid or random sample of the area would give.
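The sorting and range assignment described above can be sketched in Python. This is a minimal sketch, not a prescribed implementation: the `Stand` record, its field names, and the `assign_ranges` helper are hypothetical, and the default sort key is the old estimate, although any key can be passed in its place.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Stand:
    name: str            # hypothetical stand identifier
    old_estimate: float  # old volume/acre estimate; used only for ordering
    area: float          # stand area in acres


def assign_ranges(stands, key=lambda s: s.old_estimate):
    """Sort stands by `key` and give each a range of acres equal to its area.

    Returns (stand, low, high) tuples. Each range starts where the previous
    one ends, so together they cover 0 to the total area with no gaps.
    """
    ordered = sorted(stands, key=key)
    highs = list(accumulate(s.area for s in ordered))
    lows = [0.0] + highs[:-1]
    return list(zip(ordered, lows, highs))


if __name__ == "__main__":
    demo = [Stand("C", 6548, 43.0), Stand("A", 1744, 69.0), Stand("B", 4571, 16.0)]
    for stand, low, high in assign_ranges(demo):
        print(f"{stand.name}: acres {low:g} to {high:g}")
    # A: acres 0 to 69
    # B: acres 69 to 85
    # C: acres 85 to 128
```

To sort by another attribute, pass a different `key`; the ranges are computed the same way.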

The total area of all the stands is 1,093 acres. For a sample size of 6, we select a stand every 1,093/6 = 182.17 acres.

The first selection is a random point between acre 0 and acre 182.17; this is called the "random start" of the sample. Suppose it is acre 57.55. From then on, the constant interval of 182.17 acres (1,093/6, carried at full precision) is added for each additional sample location. This gives the following 6 sample locations:

Selection	Acre
1	57.55
2	239.72
3	421.88
4	604.05
5	786.22
6	968.38
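The interval and random start can be sketched as follows. The `systematic_points` helper is hypothetical; it assumes the start is drawn uniformly from [0, interval), and it also accepts a fixed start so the example above can be reproduced.

```python
import random


def systematic_points(total_area, n, start=None, rng=random):
    """Return n equally spaced acre positions beginning at a random start.

    The interval is total_area / n. If `start` is not given, it is drawn
    uniformly from [0, interval); a fixed start reproduces a worked example.
    """
    interval = total_area / n
    if start is None:
        start = rng.random() * interval
    elif not 0 <= start < interval:
        raise ValueError("start must lie in [0, interval)")
    return [start + k * interval for k in range(n)]


if __name__ == "__main__":
    points = systematic_points(1093, 6, start=57.55)
    print([round(p, 2) for p in points])
    # [57.55, 239.72, 421.88, 604.05, 786.22, 968.38]
```

Passing a seeded `random.Random` as `rng` makes a draw repeatable, which is handy when documenting which random start was used.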

The stands containing these acres are marked with an asterisk (*) in the table below; these are the stands to measure. Larger stands have a proportionally larger chance of being sampled, and a very large stand (one bigger than the 182.17-acre interval) might be sampled several times.

Old Estimate (BF/acre)	Area (acres)	Range Low (acre)	Range High (acre)	New Measurement (BF/acre)	Sampled
1,744	69	0	69	2,831	*
4,571	16	69	85	5,108
6,548	43	85	128	11,027
6,657	57	128	185	12,890
7,126	35	185	220	12,025
13,044	88	220	308	15,546	*
14,987	53	308	361	23,920
17,785	83	361	444	33,677	*
19,539	57	444	501	24,719
21,359	27	501	528	27,168
30,517	77	528	605	51,209	*
30,926	82	605	687	43,369
34,231	15	687	702	47,208
36,323	79	702	781	67,132
36,866	26	781	807	71,333	*
37,633	41	807	848	55,271
39,212	66	848	914	69,752
40,620	84	914	998	61,426	*
45,697	85	998	1,083	54,167
45,718	10	1,083	1,093	75,908

As you can see, there is a rough relationship between the old estimates and the new measurements, but with considerable variability; on an area-weighted basis, the new values are about 55% higher. A graph also shows this:

[Figure: graph of old versus new estimates]

The average for this sample is 39,337 BF/acre, a 57% increase over the old estimate of average volume/acre.
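Mapping each selected acre to its stand is a lookup in the cumulative Range High column. The sketch below uses Python's `bisect` module on the table's data (the helper names `stand_index` and `area_weighted_mean` are hypothetical) to find the six stands for the random start of 57.55 and take their simple average. A simple average is the appropriate estimate of volume/acre here because each stand's selection chance is already proportional to its area. The sketch assumes each range includes its low end and excludes its high end, which only matters for a point landing exactly on a boundary.

```python
from bisect import bisect_right
from itertools import accumulate

# (old estimate, area in acres, new measurement), volumes in BF/acre,
# in sorted order, copied from the table above.
STANDS = [
    (1744, 69, 2831), (4571, 16, 5108), (6548, 43, 11027), (6657, 57, 12890),
    (7126, 35, 12025), (13044, 88, 15546), (14987, 53, 23920), (17785, 83, 33677),
    (19539, 57, 24719), (21359, 27, 27168), (30517, 77, 51209), (30926, 82, 43369),
    (34231, 15, 47208), (36323, 79, 67132), (36866, 26, 71333), (37633, 41, 55271),
    (39212, 66, 69752), (40620, 84, 61426), (45697, 85, 54167), (45718, 10, 75908),
]
AREAS = [area for _, area, _ in STANDS]
HIGHS = list(accumulate(AREAS))  # Range High column: 69, 85, 128, ..., 1093


def stand_index(acre):
    """0-based index of the stand whose range [low, high) contains `acre`."""
    return bisect_right(HIGHS, acre)


def area_weighted_mean(values, areas):
    """Average volume/acre over the whole ownership (total volume / total area)."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


if __name__ == "__main__":
    interval = HIGHS[-1] / 6
    picks = [stand_index(57.55 + k * interval) for k in range(6)]
    sample_mean = sum(STANDS[i][2] for i in picks) / len(picks)
    old_mean = area_weighted_mean([old for old, _, _ in STANDS], AREAS)
    new_mean = area_weighted_mean([new for _, _, new in STANDS], AREAS)
    print("stands sampled (1-based):", [i + 1 for i in picks])
    print(f"sample average: {sample_mean:,.0f} BF/acre")
    print(f"increase over old average: {sample_mean / old_mean - 1:.0%}")
    print(f"all stands, new vs old: {new_mean / old_mean - 1:.0%}")
    # stands sampled (1-based): [1, 6, 8, 11, 15, 18]
    # sample average: 39,337 BF/acre
    # increase over old average: 57%
    # all stands, new vs old: 55%
```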

This systematic sample across the sorted list forces a "good" sample, in which large and small values balance each other better than they would in a random sample. Each stand keeps exactly the same selection probability that a random sample or a grid of plots would give it; what the design controls is which combinations of stands can appear together in a set of 6 observations, and that is what improves the result.

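The better the job you do of getting the list into the right order, the better the sample balances internally and the better the average. In this example, a simulation indicated that a random sample of 6 would give a sampling error of about 23%, while a sample of 6 from the sorted list gives a sampling error under 9%. Because sampling error shrinks with the square root of sample size, efficiency goes with the square of the ratio of the two errors: (23/9) squared is about 6.5, and more when the systematic error is below 9%. The systematic sample is therefore about 7 times as efficient; in other words, a systematic sample of 6 is equivalent to about 42 observations in a random sample.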
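The efficiency arithmetic can be checked directly. This is a small sketch with a hypothetical helper name; because 9% is an upper bound on the systematic sampling error, the value it returns for these inputs is a lower bound.

```python
def relative_efficiency(se_random_pct, se_systematic_pct):
    """Random observations that one systematic observation is worth.

    Sampling error shrinks with the square root of sample size, so the
    efficiency is the squared ratio of the two sampling errors.
    """
    return (se_random_pct / se_systematic_pct) ** 2


if __name__ == "__main__":
    eff = relative_efficiency(23, 9)  # 9% is an upper bound, so this is a floor
    print(f"at least {eff:.1f} times as efficient")
    print(f"6 systematic observations ~ at least {6 * eff:.0f} random ones")
    # at least 6.5 times as efficient
    # 6 systematic observations ~ at least 39 random ones
```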

You do not get this advantage for free: it comes from your ability to put the stands into roughly the correct order, so the payoff is a return on your effort. When nature has not arranged the stands along a trend (such as up and down a slope), we can create a trend to sample across.

Even when you use more sophisticated sampling methods, sampling from a sorted list can ensure that you get good coverage of the population.

When you do the usual sampling error calculation on a systematic sample, you do not appear to get this advantage, because that formula assumes a random sample was used. To get full credit for a systematic sample, you need to calculate the sampling error a different way.

The simplest way is to take several systematic samples, each with its own random start, and compare their averages directly. Here, for instance, are the results of 3 systematic samples of size 6 from our example population, for a total of 18 plots:

Sample	Average (BF/acre)
1	39,581
2	34,016
3	38,050

Treating these 3 sample averages as the observations gives a Standard Error of ±4.9% (including a multiplier of 1.1, the t-value used here for a sample size of 3 at 68% confidence). With the usual computation, the Standard Error would be about ±15%. You would need about 160 random plots to get an SE% of ±4.9%. Being able to document this improvement is a big payoff for the extra work of setting up 3 random starts.
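The replicate calculation can be sketched as follows, treating each systematic sample's average as a single observation. The helper name `replicate_se_pct` is hypothetical. The t multiplier is left as a parameter: pass 1.1 to match the figure above, or check the value against your own t-table (the two-sided 68% value for 2 degrees of freedom is about 1.3).

```python
from math import sqrt
from statistics import mean, stdev


def replicate_se_pct(sample_means, t_mult=1.0):
    """Standard error, as a percent of the mean, from systematic replicates.

    Each replicate's average counts as one observation, so the SE is
    stdev / sqrt(number of replicates), scaled by an optional t multiplier.
    """
    se = stdev(sample_means) / sqrt(len(sample_means))
    return 100 * t_mult * se / mean(sample_means)


if __name__ == "__main__":
    reps = [39_581, 34_016, 38_050]  # averages of the 3 systematic samples
    print(f"SE% without t: {replicate_se_pct(reps):.2f}")
    print(f"SE% with t = 1.1: {replicate_se_pct(reps, t_mult=1.1):.1f}")
    # SE% without t: 4.46
    # SE% with t = 1.1: 4.9
```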

You get the benefit from a systematic sample whether you calculate it or not. Many people are content just to know that they are getting a better result.

Kim Iles
Iles and Associates