Tuesday, 12 August 2008

Measurements in operational research

Today I swam for the hundredth time this year in the city swimming pool here in Exeter. It is a 25 metre pool, and -- according to my spreadsheet -- I have swum 8300 lengths this year. (I deliberately swam enough lengths today to make the total a round number, and the mean an integer.)
But how far have I actually swum in that pool this year? I keep a spreadsheet for these swims and update it every time I swim, so the number of swims in the city pool is reliable. I have swum once elsewhere, and I am discounting that five-minute splash in a cold, open-air pool. There are three obvious sources of error.
First, I may have miscounted. There is no mechanism for recording the lengths except my head, and I know that sometimes I lose track. But to counter this source of error, I usually swim with my wife, and she also counts lengths and we can generally verify the number of lengths each has swum (she is a little faster); I also know how fast I swim, so the clock on the wall of the pool gives a safeguard against gross error. As I swim an even number of lengths, my count is likely to be in error by ±2 if at all. I would hazard that I have made an error on at most 5 occasions. So the count of the lengths is 8300 ± 10.
Second, I do not always stay in the same lane of the pool, so I don't swim exactly the length of the pool. But simple geometry tells me that even if I swim slightly off straight, the difference between what I swim and the length of the pool is very small. We are talking about a variation of +0.05% at most, say 4 lengths.
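The simple geometry is Pythagoras: a length swum on a slant is the hypotenuse of a triangle whose other sides are the pool length and the sideways drift. Here is a rough sketch of that sum. The drift of 0.8 metres is an illustrative figure I have assumed, not something I measured, and the function name is made up for the example.

```python
import math

POOL_LENGTH_M = 25.0  # nominal pool length


def extra_fraction(pool_length_m, sideways_drift_m):
    """Fractional extra distance of a straight diagonal over a straight length."""
    diagonal_m = math.hypot(pool_length_m, sideways_drift_m)
    return diagonal_m / pool_length_m - 1.0


# Assumed (illustrative) sideways drift of 0.8 m during one length.
fraction = extra_fraction(POOL_LENGTH_M, 0.8)
print(f"Extra distance per length: {fraction:.4%}")
```

With that assumed drift the extra distance comes out at roughly 0.05% of a length, which is where a bound of that size would come from.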
Third, I trust the pool builders to have measured correctly. But here is the most intriguing source of error. The pool is not exactly 25 metres. There is a tolerance because it was surveyed with a measuring line when it was built. Everyone believes that it is exactly 25 metres, but the tolerance is probably ±15 cm (6 inches) -- about 0.6%.
So, all in all, I have not swum exactly 207.50 kilometres this year. The extreme range is 8294 * 0.024850 to 8314 * 0.025150 kilometres, i.e. 206.1 to 209.1 kilometres. But that is the extreme range, and the confidence interval is smaller -- an exercise for the reader.
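For anyone who wants to check the arithmetic, here is a minimal sketch of the extreme-range sum. It assumes the figures used above: a count of 8300 ± 10 lengths, 4 extra lengths for lane drift added at both ends of the range (which is how 8294 and 8314 arise), and a pool of 0.025 km give or take 0.00015 km. The variable names are mine and purely illustrative.

```python
COUNTED_LENGTHS = 8300       # lengths recorded in the spreadsheet
COUNT_ERROR = 10             # possible miscount, plus or minus
DRIFT_LENGTHS = 4            # allowance for not swimming dead straight
POOL_LENGTH_KM = 0.025       # nominal 25 metre pool
POOL_TOLERANCE_KM = 0.00015  # builders' tolerance (15 cm), plus or minus

# Nominal distance: every length assumed to be exactly 25 metres.
nominal_km = COUNTED_LENGTHS * POOL_LENGTH_KM

# Extreme case: all errors push the same way at once.
low_lengths = COUNTED_LENGTHS - COUNT_ERROR + DRIFT_LENGTHS
high_lengths = COUNTED_LENGTHS + COUNT_ERROR + DRIFT_LENGTHS
low_km = low_lengths * (POOL_LENGTH_KM - POOL_TOLERANCE_KM)
high_km = high_lengths * (POOL_LENGTH_KM + POOL_TOLERANCE_KM)

print(f"Nominal: {nominal_km:.2f} km")
print(f"Extreme range: {low_km:.1f} to {high_km:.1f} km")
```

This only reproduces the worst case, where every error lines up in the same direction. The narrower confidence interval depends on how the errors are assumed to be distributed, so that part remains an exercise.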
Why does this matter?
One, the Olympic Games are currently happening. How accurately are lengths of tracks and pools measured? The times of races are recorded very accurately, because we are very good at recording time. But how much tolerance is there in the distances?
Two, for O.R. professionals: how often do we believe in spurious accuracy of data? When I learnt about L.P. in the oil industry, we were told a cautionary tale of the analysts who checked their data. One measurement, the viscosity of crude oil, was always given as an integer, and a small integer at that. This was then processed through the L.P. model. Where did this value come from, they asked, and, as one should, they checked. The data was supplied by an experienced worker, who dipped his thumb and forefinger into the crude oil, rubbed them together, and pronounced the measurement. Now, far too often, I see papers submitted to journals with tables of results quoted to six or more significant figures. Where did these come from? Usually from the analysis of a few dozen observations that were each measured to two or three significant figures. The best models in the world cannot conjure more accuracy than was in the source data; all too often we forget that, at our peril.
The results are only as good as the data.
