Sampling from a List - a Big Improvement

One of the guest articles by Norm Marsh (newsletter issue #40, 1997) talked about "sampling from a sorted list." It is worth going through the mechanics of this kind of sampling, because it is so efficient.

By efficient, we mean that you get a much better sampling error. In addition, you sample a very good cross-section of the population, and therefore get a better description of it.

Systematic samples are common in forestry. A "square grid" of plots is one example. The operational reason to do a grid sample is that it is handy for plot location and navigation. It also looks convincing. A grid is only one kind of systematic sample. The statistical reason for systematic samples is that they give better answers than random samples. How much better depends on how cleverly you do the systematic design.

It might be that nature gives you a nice gradient to sample across, like the increased volume/acre up and down a slope for instance. If nature does not give you a nice gradient, you can create your own.

Suppose you have just taken over a company with an out-of-date inventory. You want to do your own measurements on the stands. A sorted list is one good way to do the selection of stands to measure. Even though you would have hundreds of stands, we will illustrate the idea with an example of only 20 stands. The actual measurements are also listed in the table (in case you want to do a sample for yourself), but of course you would only know those measurements when you sampled that stand.

One thing to note is that we are simply putting these stands into order from smallest to largest. We do not really need to give them values. The old estimates for the stands are just a convenient way to get them listed from smallest to largest volume/acre. There are several statistical techniques that make use of your estimates when they are actually numbers, but this method only needs the approximate order of stands from small to large.

You cannot cause a bias by how you sort the stands, so the order is completely up to you. Some people might prefer to sort them by age, for instance, so that they get a very good distribution of ages. If they used strata, they might want to sort by age within each stratum. The main distinction between different sorts is how efficient they are at estimating the total or average.

After sorting, we specify a range for each stand, depending on its area. This will give each acre in the ownership the same chance of being sampled (and every stand has a probability of sampling proportional to its size). This is the same probability that a simple grid or random sample of the area would give.

The total area in all the stands is 1,093 acres. If we want a sample size of 6, we will select a stand every (1093/6)=182.17 acres.

The first selection is chosen randomly between acres 0 and 182.17. This is called a "random start" for the sample. For example, it might be acre number 57.55. From then on, the constant amount of 182.17 acres is added for each additional sample location. This gives the following sample of 6 stands to measure:

acre   57.55
239.72 (=57.55+182.17)
421.89 (=239.72+182.17)
604.05 ... etc ...
786.22
968.39
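The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_points` and its parameters are made up for this example, and the start of 57.55 is the one used in the article. Because the interval is not rounded to 182.17 here, the last decimal place can differ slightly from the list above.

```python
import random

def systematic_points(total_area, n, start=None, rng=random):
    """Return n equally spaced sample points along the cumulative area.

    Hypothetical helper for illustration; not from the original article.
    """
    interval = total_area / n              # 1093 / 6 = 182.17 acres (rounded)
    if start is None:
        start = rng.uniform(0, interval)   # the "random start"
    return [start + k * interval for k in range(n)]

# Reproduce the article's example with its random start of 57.55
points = systematic_points(1093, 6, start=57.55)
print([round(p, 2) for p in points])
```

Leaving out `start` draws a fresh random start each time, which is what you would do in practice.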

The stands these acres fall into are noted with an asterisk (*) in the table below. These identify the stands to sample. As you can see, the larger stands have a larger chance to be sampled. A very large stand might be sampled several times.

Old Value   Area   Range Low   Range High   New Value
    1,744     69           0         * 69       2,831
    4,571     16          69           85       5,108
    6,548     43          85          128      11,027
    6,657     57         128          185      12,890
    7,126     35         185          220      12,025
   13,044     88         220        * 308      15,546
   14,987     53         308          361      23,920
   17,785     83         361        * 444      33,677
   19,539     57         444          501      24,719
   21,359     27         501          528      27,168
   30,517     77         528        * 605      51,209
   30,926     82         605          687      43,369
   34,231     15         687          702      47,208
   36,323     79         702          781      67,132
   36,866     26         781        * 807      71,333
   37,633     41         807          848      55,271
   39,212     66         848          914      69,752
   40,620     84         914        * 998      61,426
   45,697     85         998        1,083      54,167
   45,718     10       1,083        1,093      75,908
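The Range columns and the asterisks in the table can be reproduced with a running total of the areas and a lookup of each sample acre. The following is a minimal sketch, assuming the stands are already in sorted order. The helper name `select_stands` and the rule that a point falling exactly on a boundary goes to the next stand are choices made for this example, not something the article specifies.

```python
from bisect import bisect_right
from itertools import accumulate

# Stand areas in acres, in the sorted order used in the table
areas = [69, 16, 43, 57, 35, 88, 53, 83, 57, 27,
         77, 82, 15, 79, 26, 41, 66, 84, 85, 10]

range_high = list(accumulate(areas))   # 69, 85, 128, ..., 1093
range_low = [0] + range_high[:-1]      # 0, 69, 85, ..., 1083

def select_stands(points, range_high):
    """Return the 0-based index of the stand whose range holds each point.

    Assumed convention: a point exactly on a boundary belongs to the next stand.
    """
    return [bisect_right(range_high, p) for p in points]

# Sample acres listed in the article
points = [57.55, 239.72, 421.89, 604.05, 786.22, 968.39]
selected = select_stands(points, range_high)
print([i + 1 for i in selected])       # prints [1, 6, 8, 11, 15, 18]
```

The printed numbers are the positions of the starred stands in the sorted list. Since each stand's range is as wide as its area, larger stands catch sample points more often.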

As you can see, there is a rough relationship between the estimate and the possible new measurement, but there is considerable variability, and the new values are about 55% higher. A graph also shows this:

[Figure: graph of old versus new estimates]

The average for this sample is 39,337 BF, which is a 57% increase over the old estimate of average volume/acre.
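The sample average can be checked directly from the table. Because each stand is selected with probability proportional to its area, the plain average of the sampled per-acre values estimates the per-acre average for the ownership. The sketch below assumes the "old estimate of average volume/acre" is the area-weighted average of the Old Value column, which the article does not state explicitly.

```python
# Columns copied from the table, in sorted order
areas = [69, 16, 43, 57, 35, 88, 53, 83, 57, 27,
         77, 82, 15, 79, 26, 41, 66, 84, 85, 10]
old = [1744, 4571, 6548, 6657, 7126, 13044, 14987, 17785, 19539, 21359,
       30517, 30926, 34231, 36323, 36866, 37633, 39212, 40620, 45697, 45718]
new = [2831, 5108, 11027, 12890, 12025, 15546, 23920, 33677, 24719, 27168,
       51209, 43369, 47208, 67132, 71333, 55271, 69752, 61426, 54167, 75908]

# 0-based positions of the six starred stands
selected = [0, 5, 7, 10, 14, 17]

# Unweighted mean of the sampled stands (selection was proportional to area)
sample_mean = sum(new[i] for i in selected) / len(selected)

# Assumption: the old average is weighted by stand area
old_mean = sum(a * v for a, v in zip(areas, old)) / sum(areas)

increase_pct = 100 * (sample_mean / old_mean - 1)
print(round(sample_mean), round(old_mean), round(increase_pct))
```

Under that assumption the increase comes out at about 57%, in line with the figure quoted above.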

This systematic sample across the sorted list forces a "good" sample, where the large and small amounts balance each other better than a random sample would do. It maintains exactly the same probability for each stand that a random sample or a grid of plots would provide. It is the combination of stands that is controlled for any 6 observations, and which improves the result.

The better the job you do getting the list into the right order, the better the sample will balance internally to provide an improved average. In this example, a simulation indicated that a random sample of 6 would give a sampling error of about 23%. A sample of size 6 from a sorted list gives a sampling error of less than 9%. Since the number of observations needed varies with the square of the sampling error, this is about 7 times as efficient (in other words, the systematic sample of 6 is equivalent to about 42 observations with a random sample).
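A comparison of this kind can be run on the 20-stand example. The sketch below is not the simulation used for the article. It is one possible setup, with a made-up seed and number of trials, where the "random" design picks each of the 6 acres independently and uniformly across the ownership. Its results need not match the article's figures exactly.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Areas and new values from the table, sorted by the old estimate
areas = [69, 16, 43, 57, 35, 88, 53, 83, 57, 27,
         77, 82, 15, 79, 26, 41, 66, 84, 85, 10]
new = [2831, 5108, 11027, 12890, 12025, 15546, 23920, 33677, 24719, 27168,
       51209, 43369, 47208, 67132, 71333, 55271, 69752, 61426, 54167, 75908]

cum = list(accumulate(areas))
total = cum[-1]
true_mean = sum(a * v for a, v in zip(areas, new)) / total

def stand_at(acre):
    """Index of the stand containing this acre (clamped at the last stand)."""
    return min(bisect_right(cum, acre), len(cum) - 1)

def random_sample_mean(n, rng):
    # n independent random acres, each stand chosen in proportion to its area
    return sum(new[stand_at(rng.uniform(0, total))] for _ in range(n)) / n

def systematic_sample_mean(n, rng):
    # one random start, then a fixed interval down the sorted list
    interval = total / n
    start = rng.uniform(0, interval)
    return sum(new[stand_at(start + k * interval)] for k in range(n)) / n

def sampling_error_pct(draw, n, trials, rng):
    """Root mean squared error of the sample mean, as a percent of the truth."""
    means = [draw(n, rng) for _ in range(trials)]
    rmse = (sum((m - true_mean) ** 2 for m in means) / trials) ** 0.5
    return 100 * rmse / true_mean

rng = random.Random(1999)   # arbitrary seed, for repeatability only
se_random = sampling_error_pct(random_sample_mean, 6, 20000, rng)
se_systematic = sampling_error_pct(systematic_sample_mean, 6, 20000, rng)
print(f"random: {se_random:.1f}%  systematic: {se_systematic:.1f}%")
print(f"relative efficiency: {(se_random / se_systematic) ** 2:.1f}")
```

The relative efficiency is the squared ratio of the two sampling errors, which is the same arithmetic that turns 23% versus 9% into "about 7 times".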

You do not get this advantage for free. You get it from your ability to put the stands into roughly the correct order. The payoff is due to your effort. When nature has not put stands into a trend (such as up and down the slope) we can create a trend to sample across.

Even when you are doing more sophisticated sampling methods, this technique of using a sorted list can ensure that you have good coverage of the population.

When you do the usual sampling error calculation with a systematic sample, it does not seem as if you are getting this advantage. This is because you are using a formula that assumes a random sample was used. If you want full credit for your systematic sample, you should calculate the sampling error with a different process.

The simplest way to do that is to take several systematic samples and compare the averages of each of them directly. Here, for instance, is the result of 3 systematic samples of size 6 (each with a different random start) using our example population, for a total of 18 plots.

Sample #   Result
1          39,581
2          34,016
3          38,050

Using these 3 observations, we get one Standard Error of 4.9% (including multiplying by 1.1, the t-value for a sample size of 3 at 68% confidence). With the usual computation, the Standard Error would be about 15%. You would require about 160 random plots to get an SE% of 4.9%. Being able to document this improvement is a big payoff for doing the extra work of setting up 3 random starts.
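The 4.9% figure can be reproduced by treating the three sample averages as three independent observations. This is a minimal sketch using Python's standard `statistics` module. The multiplier of 1.1 is taken from the article as given, and the variable names are only illustrative.

```python
from statistics import mean, stdev

# Averages of the three systematic samples (each from its own random start)
sample_means = [39581, 34016, 38050]
t_value = 1.1   # multiplier quoted in the article for 3 samples at 68% confidence

grand_mean = mean(sample_means)
# Standard error of the mean of the three sample averages
standard_error = stdev(sample_means) / len(sample_means) ** 0.5
se_pct = 100 * t_value * standard_error / grand_mean

print(f"mean = {grand_mean:,.0f} BF, SE = {se_pct:.1f}%")
# prints: mean = 37,216 BF, SE = 4.9%
```

The spread among the three averages is small because each systematic sample is balanced internally, and this calculation is what lets that show up in the reported error.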

You get the benefit from a systematic sample whether you calculate it or not. Many people are content just to know that they are getting a better result.

Kim Iles
Iles and Associates

Originally published October 1999
