Improving Data Quality using Computer-assisted Data Collection

by C.J. Cieszewski
WSFR, University of Georgia

While I was cruising the U.B.C. research forest in 1984, I was pleased with the waterproof paper we had available, but this was the only improvement in inventory tools over what I had used in any prior work since 1972 in Poland and France. Accordingly, the quality of the data collected was likely similar to what I had always been collecting. In the early 90’s, I was in Alberta and using a data logger for the first time, but there was rather little gain from writing the measurements directly into the data logger, and I wondered then what kind of deal it was to carry this heavy box through the bushes for the whole day just to save several minutes of keypunching. Even though I was then much into muscle building and thought that, after all, this was a good use of the “equipment,” I also thought that there must be a better way to use the computing capability of the data loggers. In the mid 90’s, when I went with a CFS crew to re-measure our permanent research sample plots in Greagburn and Tee Pee Pole, Alberta, using a data logger, the situation had changed dramatically. The gain from using the computer in the field was then nothing short of extraordinary. Particularly on a few Tee Pee Pole plots, measured in the past by temporary crews with summer students, we found an astounding number of errors in the individual tree measurements collected during the previous four decades. Here is the story.

First, our programmers wrote a C program for the data logger that ran real-time checks on the entered measurements by comparing them with all the past data (all the data collected on these plots were uploaded onto the data logger and taken into the field). The program used the most recent models for tree height and diameter growth of the measured species (lodgepole pine). The models, based on difference equations, were initialized with the current measurements as they were entered into the computer. Quick analyses then compared the model predictions with the previous years’ measurements at all the other measurement ages in the data. When the comparisons revealed large discrepancies between predictions and measurements at corresponding ages, the computer would halt the entry with a beep and issue a warning with a description of the problem. With such a warning, we would then investigate the reason for the large discrepancies, usually starting with redoing the current measurement.
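
The kind of check described above can be illustrated with a minimal sketch. It is not the original C program, and it does not use the lodgepole pine models mentioned in the text. As a stand-in it assumes the anamorphic difference form of the Schumacher height curve, with a made-up shape parameter B and a made-up relative tolerance rel_tol; the function names are also hypothetical.

```python
import math

# Hypothetical shape parameter for illustration only;
# NOT a fitted value for lodgepole pine.
B = 20.0


def predict_height(h_ref, age_ref, age):
    """Project a height from (age_ref, h_ref) to another age.

    Uses the anamorphic difference form of the Schumacher curve:
        ln(H2) = ln(H1) + B * (1/t1 - 1/t2)
    which is only a stand-in for the models used in the field.
    """
    return h_ref * math.exp(B * (1.0 / age_ref - 1.0 / age))


def check_history(current, history, rel_tol=0.15):
    """Flag past records that disagree with the current measurement.

    current : (age, height) just measured on the tree
    history : list of (age, height) records from earlier years
    rel_tol : allowed relative deviation from the model prediction
              (an assumed, adjustable threshold)

    Returns a list of (age, observed, predicted) for suspect records.
    """
    age_now, h_now = current
    warnings = []
    for age, h_obs in history:
        # The model is initialized with the current measurement and
        # evaluated at the age of each earlier record.
        h_pred = predict_height(h_now, age_now, age)
        if abs(h_obs - h_pred) > rel_tol * h_pred:
            warnings.append((age, h_obs, round(h_pred, 2)))
    return warnings
```

Calling check_history((60, 18.0), [(40, 15.2), (50, 21.0)]) would flag only the age-50 record, since a tree cannot have been taller at age 50 than it is at age 60. On a field computer, a non-empty result would correspond to the beep and the warning message.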

The abundance of errors we caught on these otherwise well maintained research plots was so unexpected that we did not think to keep track of any statistics on how many errors we found and where (not that anyone would want to advertise this either). My rough estimate is that, on average, 30 to 50% of trees had some kind of error in the previous several remeasurements that we were able to identify and at least partially correct right there on the plot. Some trees had multiple errors (for example, measurements on two trees would be recorded under each other’s numbers so that all measurements for a given year for each tree would be wrong). We were able to correct most of these errors by using increment cores, counting and measuring older internodes, considering the spatial locations of the suspect trees, and comparing measurement values for neighboring trees. Encountering so many warnings was most surprising, especially because we had a very good crew, and two of the guys taking measurements had over 30 years’ experience each and were some of the best cruisers around.

By far the biggest errors were in the data from earlier measurements, with some observable regularity associated with the individual crews taking the measurements. For example, one crew was notorious for recording their measurements under wrong tree numbers. They would measure, say, tree #54 and record all the measurements for this tree under the label of tree #53 or #45. Once we noticed the regularity, we were able to correct such errors quite expediently. The program on the field computer allowed overwriting of the original data while also keeping backup records of the original values. After making corrections, we would rerun the comparisons with the models and visually inspect how different series of measurements lined up over time. Sometimes it would take a few rounds either to walk away from a tree satisfied or to give up the corrections for the tree and just write a remark that its measurements were suspect.
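
As an illustration of overwriting data while keeping backup records of the original values, here is a minimal sketch. The class name PlotRecords, its methods, and the record layout are all hypothetical and are not taken from the original field program; the sketch only shows one way such a correction log could work, including the case of two trees recorded under each other’s numbers.

```python
class PlotRecords:
    """Measurements keyed by (tree_id, year, variable), with an audit log."""

    def __init__(self):
        self.values = {}  # (tree_id, year, variable) -> current value
        self.audit = []   # (key, old_value, new_value, remark), in order

    def record(self, tree_id, year, variable, value):
        """Store a measurement as originally recorded."""
        self.values[(tree_id, year, variable)] = value

    def correct(self, tree_id, year, variable, new_value, remark=""):
        """Overwrite a value, logging the old one so nothing is lost."""
        key = (tree_id, year, variable)
        self.audit.append((key, self.values.get(key), new_value, remark))
        self.values[key] = new_value

    def swap_trees(self, tree_a, tree_b, year, remark="tree numbers swapped"):
        """Exchange all measurements of two trees for one year."""
        variables = {v for (t, y, v) in self.values
                     if y == year and t in (tree_a, tree_b)}
        for var in variables:
            # Read both values before overwriting either of them.
            a = self.values.get((tree_a, year, var))
            b = self.values.get((tree_b, year, var))
            self.correct(tree_a, year, var, b, remark)
            self.correct(tree_b, year, var, a, remark)

    def original(self, tree_id, year, variable):
        """Return the value as first recorded, before any correction."""
        key = (tree_id, year, variable)
        for k, old, _new, _remark in self.audit:
            if k == key:
                return old
        return self.values.get(key)
```

For instance, after recording diameters for trees 53 and 54 in one year and calling swap_trees(53, 54, year), the current values are exchanged while original() still returns what the earlier crew wrote down. The remark field could also hold a note that a tree’s measurements remain suspect.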

In some instances, we were able to measure the heights corresponding to previous measurements by counting internodes below the tip of the tree and measuring the height to the identified internodes. Similarly, we were able to correct drastic errors in past diameters by taking and measuring increment cores from the suspect trees. We finished all the work in a time similar to that of other crews in previous years, but effectively we completed probably a couple of years’ worth of work. Thanks to the field computer, which contained all the data from the previous years and ran an elaborate program for real-time analysis of the measurements taken, we were able to eliminate many errors not only from our own measurements but even more so from past years’ measurements, while also saving much time.

Conclusion

While it may be deemed elegant and trendy, if not just gimmicky, to promenade with a small pocket computer holding phone numbers and a dentist appointment, into which one can enter notes by handwriting (as if it were better than a piece of paper), there is great potential value in using field computers with appropriate programs for data collection and field analyses, particularly on permanent sample plots. Using these computers with programs that analyze the entered data in real time (and report any discrepancies) can reveal data problems that can be rectified in the field but not in the office. An outstanding lesson from our experience is that one of the greatest opportunities with field computers is a quick analysis of consistency in all available measurements – both current and past. Comparing the current and past measurements with prediction models should be a minimum target. Further opportunity exists in more extensive analysis that cross-references the data on different trees and plots and alerts the field personnel to any atypical conditions.

Finally, I should add that the criteria for halting the inputs and making the alerts should be easily adjustable in the field, because it is impossible to make them universally suitable for all measuring conditions, and overly rigid parameters that turn the field computer into a “beeping device” can be a real annoyance and hindrance at work.
