Improving Data Quality using Computer-assisted Data Collection
by C.J. Cieszewski
WSFR, University of Georgia
While I was cruising the U.B.C. research forest in 1984, I was pleased with the
waterproof paper we had, but it was the only improvement in inventory equipment
over what I had used in all my prior work since 1972 in Poland and France.
Accordingly, the quality of the data collected was likely to be
similar to what I had always been collecting. In the early 90s, I was in Alberta,
using a data logger for the first time, but there was rather little gain from writing
the measurements directly into the data logger, and I wondered then what kind of
deal it was to carry this heavy box through the bushes for the whole day just to
save several minutes of keypunching. Even though I was then much into muscle building
and thought that, after all, this was a good use of the equipment, I also
thought that there must be a better way to use the computing capability of the data
loggers. In the mid 90s, when I went with a CFS crew to re-measure our permanent
research sample plots in Greagburn and Tee Pee Pole, Alberta, using a data logger, the
situation had changed dramatically. The gain from using the computer in the field was then
nothing short of extraordinary. Particularly on a few Tee Pee Pole plots, measured in the
past by temporary crews with summer students, we found an astounding number of errors in
the data of individual tree measurements collected during the previous four decades. Here
is the story.
First, our programmers wrote a C program for the data logger that we were using to run
real-time checks on the entered measurements by comparing them with all the past data (all
the data collected on these plots were uploaded onto the data logger and taken into the
field). The program used the most recent models for tree height and diameter growth of the
measured species (lodgepole pine). The models, based on difference equations, were
initialized with the current measurements as they were entered into the computer, and
quick analyses were then run to compare the model predictions for all the other
measurement ages with the data from previous years' measurements. When the
comparisons revealed large discrepancies between predictions and measurements at
corresponding ages, the computer would halt the entry with a beep and issue a warning with
a description of the problem. With such a warning, we would then investigate the reason for
the large discrepancies, usually starting with redoing the current measurement.
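
The check described above can be sketched roughly as follows. This is only an illustration in Python, not the original C program: the Schumacher-type difference equation, the shape parameter SHAPE_B, the 15% tolerance, and the function names are all assumptions made for the example, and none of them are fitted lodgepole pine values.

```python
import math

# Hypothetical shape parameter for illustration only; not a fitted
# lodgepole pine value.
SHAPE_B = 20.0


def predict_height(h1, t1, t2, b=SHAPE_B):
    """Project height h1 (m) at age t1 to age t2 with the difference form
    of the Schumacher curve ln(H) = a - b / t, in which the site-specific
    parameter a cancels out."""
    return math.exp(math.log(h1) + b * (1.0 / t1 - 1.0 / t2))


def check_entry(current, history, tolerance=0.15):
    """Compare past measurements with predictions anchored at the
    measurement just entered.

    current   -- (age, height) entered in the field
    history   -- list of (age, height) from earlier remeasurements
    tolerance -- assumed relative deviation that triggers a warning
    Returns a list of (age, measured, predicted) for suspect records.
    """
    age_now, height_now = current
    suspects = []
    for age, measured in history:
        predicted = predict_height(height_now, age_now, age)
        if abs(measured - predicted) / predicted > tolerance:
            suspects.append((age, measured, round(predicted, 2)))
    return suspects


if __name__ == "__main__":
    past = [(30, 14.3), (40, 12.0), (50, 18.7)]
    for age, measured, predicted in check_entry((60, 20.0), past):
        print(f"WARNING: age {age}: recorded {measured} m, "
              f"model expects about {predicted} m")
```

With these made-up numbers only the record at age 40 is flagged. In the field, such a flag would be the cue to redo the current measurement first and only then to question the old record.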
The abundance of errors we caught on these otherwise well maintained research plots was
so unexpected that we did not think to keep track of any statistics on how many errors we
found and where (not that anyone would want to advertise this either). My rough estimate
is that, on average, 30 to 50% of the trees had some kind of error in the previous several
remeasurements that we were able to identify and at least partially correct right there on
the plot. Some trees had multiple errors (for example, measurements on two trees would be
recorded under each other's numbers, so that all measurements for a given year for each
tree would be wrong). We were able to correct most of these errors by using increment cores,
counting and measuring older internodes, considering the spatial locations of the
suspect trees, and comparing measurement values for neighboring trees. Encountering so many
warnings was most surprising, especially because we had a very good crew, and two of the
guys taking measurements had over 30 years' experience each and were some of the best
cruisers around.
By far, the biggest errors were in the data from earlier measurements, with some
observable regularity associated with the individual crews taking the measurements.
For example, one crew was notorious for recording their measurements under wrong tree
numbers. They would measure a tree, say, #54 and record all the measurements for this tree
under a label of tree #53 or #45. Once we noticed the regularity, we were able to correct
such errors quite expediently. The program on the field computer allowed overwriting of
the original data while also keeping backup records of the original values. After making
corrections, we would rerun the comparisons with the models and visually inspect how
different series of measurements lined up over time. Sometimes it would take a few rounds
to either walk away from a tree satisfied or to give up on the corrections for the tree and
just write a remark that its measurements were suspect.
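
A minimal sketch of how overwriting with backups might be organized is given below, again in Python rather than in the original C. The Measurement record, its field names, and the swap_trees helper are hypothetical and only illustrate the idea of correcting a value (here, two heights recorded under each other's tree numbers) without losing what was originally written down.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    """One height record; all names here are illustrative assumptions."""
    tree_id: int
    year: int
    height_m: float
    backups: list = field(default_factory=list)  # earlier values, oldest first
    remark: str = ""

    def correct(self, new_height_m, remark=""):
        """Overwrite the value but keep the previous one as a backup."""
        self.backups.append(self.height_m)
        self.height_m = new_height_m
        if remark:
            self.remark = remark


def swap_trees(records, year, tree_a, tree_b):
    """Undo a recording mix-up by exchanging the values of two trees
    for one measurement year, with backups kept on both records."""
    rec_a = records[(tree_a, year)]
    rec_b = records[(tree_b, year)]
    value_a, value_b = rec_a.height_m, rec_b.height_m
    rec_a.correct(value_b, f"swapped with tree {tree_b}")
    rec_b.correct(value_a, f"swapped with tree {tree_a}")


if __name__ == "__main__":
    records = {
        (53, 1975): Measurement(53, 1975, 11.4),
        (54, 1975): Measurement(54, 1975, 9.1),
    }
    swap_trees(records, 1975, 53, 54)
    for rec in records.values():
        print(rec)
```

Because every correction appends the old value to the backups list, several rounds of corrections on the same tree remain traceable, and a record that cannot be resolved can simply carry a remark that it is suspect.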
In some instances we were able to measure the heights corresponding to previous
measurements by counting internodes below the tip of the tree and measuring the height to
the identified internodes. Similarly, we were able to correct drastic errors in past
diameters by taking and measuring increment cores from the suspect trees. We finished all
the work in a time similar to that of other crews in previous years, but effectively we
completed probably a couple of years' worth of work. Thanks to the use of the field computer
containing all data from the previous years and an elaborate program running real-time
analysis on the measurements taken, we were able to eliminate many errors, not only from
our own measurements but even more so from past years' measurements, while also
saving much time.
Conclusion
While it may be deemed elegant and trendy, if not just gimmicky, to promenade with a
small pocket computer with phone numbers and a dentist appointment, into which one can
enter notes by handwriting (as if it were better than a piece of paper), there is a great
potential value in using field computers with appropriate programs for data collection and
field analyses, particularly on permanent sample plots. Using these computers with
programs that analyze the entered data in real time (and report any discrepancies) can
reveal data problems that can be rectified in the field but not in the office. An
outstanding lesson from our experience is that one of the greatest opportunities with
field computers is a quick analysis of consistency in all available measurements,
both current and past. Comparing the current and past measurements with prediction models
should be a minimum target. Further opportunity exists in more extensive analysis for
cross-referencing the data on different trees and plots and alerting the field personnel
to any atypical conditions.
Finally, I should add that the criteria for halting the inputs and issuing the alerts
should be easily adjustable in the field, because it is impossible to make them universally
suitable for all measuring conditions, and overly rigid parameters that turn the field
computer into a beeping device can be a real annoyance and a hindrance at work.
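
One possible way to keep such criteria adjustable is to hold them in a small settings record that the crew can change on the plot, rather than fixing them inside the program. The sketch below is an assumption for illustration only: the AlertSettings name, the two-level warn and halt scheme, and the default percentages are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AlertSettings:
    """Field-adjustable limits (relative deviation from the model)."""
    warn_tolerance: float = 0.10  # log a warning above this deviation
    halt_tolerance: float = 0.25  # halt the entry above this deviation


def classify(measured, predicted, settings):
    """Return 'ok', 'warn', or 'halt' for one measurement."""
    deviation = abs(measured - predicted) / predicted
    if deviation > settings.halt_tolerance:
        return "halt"
    if deviation > settings.warn_tolerance:
        return "warn"
    return "ok"


if __name__ == "__main__":
    strict = AlertSettings()
    # Relaxed limits for, say, a plot with unusually variable trees.
    relaxed = AlertSettings(warn_tolerance=0.20, halt_tolerance=0.40)
    for settings in (strict, relaxed):
        print(settings, classify(12.0, 16.9, settings))
```

The same measurement that halts the entry under the strict defaults only produces a warning under the relaxed settings, so the crew can quiet the device on difficult plots without switching the checks off entirely.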