Tuesday, January 10, 2006
Iraq deaths "could be as high as half a million". posted by Richard Seymour
An interesting article on the Lancet study from Andrew Cockburn at Counterpunch:

Seeking further elucidation on the mathematical tools available to reveal the hidden miseries of today's Iraq, I turned to CounterPunch's consultant statistician, Pierre Sprey. He reviewed not only the Iraq study as published in the Lancet, but also the raw data collected in the household survey and kindly forwarded to me by Dr. Roberts.

"I have the highest respect for the rigor of the sampling method used and the meticulous and courageous collection of the data. I'm certainly not criticizing in any way Roberts' data or the importance of the results. But they could have saved themselves a lot of trouble had they discarded the straitjacket of the Gaussian distribution in favor of a more practical statistical approach," says Sprey. "As with all such studies, the key question is that of 'scatter', i.e. the random spread in the data between each cluster sampled. So cluster A might have a ratio of twice as many deaths after the invasion as before, while cluster B might experience only two thirds as many. The academically conventional approach is to assume that scatter follows the bell-shaped curve, otherwise known as the 'normal distribution', popularized by Carl Gauss in the early 19th century. This is a formula dictating that the most frequent occurrences of data will be close to the mean, or center, and that the frequency of occurrence will fall off smoothly and symmetrically as data scatters further and further from the mean - following the curve of a bell-shaped mountain as you move away from the center of the data.
"Generations of statisticians have had it beaten into their skulls that any data that scatters does so according to the iron dictates of the bell-shaped curve. The truth is that in no case has a sizable body of naturally occurring data ever been proven to follow the curve". (A $200,000 prize offered in the 1920s for anyone who could provide rigorous evidence of a natural occurrence of the curve remains unclaimed.)
"Slavish adherence to this formula obscures information of great value. The true shape of the data scatter almost invariably contains insights of great physical or, in this case, medical importance. In particular, fitting the data to the bell curve very frequently grossly exaggerates the true scatter. Why? Simply because the mathematics of making the data fit the bell curve inexorably leads one to place huge emphasis on isolated extreme 'outliers' of the data.
"For example, if the average cluster had ten deaths and most clusters had 8 to 12 deaths, but some had 0 or 20, the Gaussian math would force you to weight the importance of those rare points like 0 or 20 (i.e. 'outliers') by the square of their distance from the center, or average. So a point at 20 would have a weight of 100 (20 minus 10, squared) while a point at 11 would have a weight of 1 (11 minus 10, squared).
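Sprey's weighting arithmetic can be sketched directly (a minimal illustration; the cluster of ten deaths and the points at 11 and 20 are his hypothetical example, not actual survey data):

```python
def gaussian_weight(x, center=10):
    """Variance-style weight: the squared distance of a point from the center.

    This is the weighting Sprey objects to - an outlier's influence grows
    with the square of its distance, not linearly.
    """
    return (x - center) ** 2

print(gaussian_weight(20))  # outlier at 20 -> weight 100
print(gaussian_weight(11))  # typical point at 11 -> weight 1
```

A point twice as far from the center gets four times the weight, which is why a handful of extreme clusters can dominate a Gaussian fit.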
"This approach has inherently pernicious effects. Suppose, for example, one is studying survival rates of plant-destroying spider mites, and the sampled population happens to be a mix of a strain of very hardy mites and another strain that is quite vulnerable to pesticides. Fanatical Gaussians will immediately clamp the bell-shaped curve onto the overall population of mites being studied, thereby wiping out any evidence that this group is in fact a mixture of two strains.
"The commonsensical amateur, meanwhile, would look at the scatter of the data and see very quickly that instead of a single 'peak' in surviving mites, which would be the result if the data were processed by traditional Gaussian rules, there are instead two obvious peaks. He would promptly discern that he has two different strains mixed together on his plants, a conclusion of overwhelming importance for pesticide application".
(Sprey once conducted such a statistical study at Cornell - a bad day for mites.)
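The two-peaks observation is easy to reproduce. In this sketch, survival figures drawn from a mixture of a hardy and a vulnerable strain produce a plainly bimodal histogram that a single bell curve would smear into one hump; the strain means and spreads below are invented purely for illustration:

```python
import random

random.seed(1)
# Hypothetical mixture: a hardy strain surviving around 80, a vulnerable
# strain surviving around 20 (arbitrary units, made up for this sketch).
hardy = [random.gauss(80, 5) for _ in range(200)]
vulnerable = [random.gauss(20, 5) for _ in range(200)]
survival = hardy + vulnerable

# Crude histogram: count observations in ten 10-unit bins, 0-100.
bins = [0] * 10
for s in survival:
    bins[min(max(int(s // 10), 0), 9)] += 1

print(bins)  # counts pile up in two separate clusters of bins, with a
             # near-empty valley between them - the two strains
```

No distributional assumption was needed; simply tabulating the scatter reveals the mixture.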
So how to escape the Gaussian distortion?

"The answer lies in quite simple statistical techniques called 'distribution-free' or 'non-parametric' methods. These make the obviously more reasonable assumption that one hasn't the foggiest notion of what the distribution of the data should be - especially for data one hasn't yet seen - and so one lets the data define its own distribution, whatever that unusual shape may be, rather than forcing it into the bell curve. The relatively simple computational methods used in this approach basically treat each point as if it has the same weight as any other, with the happy result that outliers don't greatly exaggerate the scatter.
"So, applying that simple notion to the death rates before and after the US invasion of Iraq, we find that the confidence intervals around the estimated 100,000 'excess deaths' not only shrink considerably but also that the numbers move significantly higher. With a distribution-free approach, the 95 per cent confidence interval becomes 53,000 to 279,000. (Recall that the Gaussian approach gave a 95 per cent confidence interval of 8,000 to 194,000.) With an 80 per cent confidence interval, the lower bound is 78,000 and the upper bound is 229,000. This shift to higher excess deaths occurs because the real, as opposed to the Gaussian, distribution of the data is heavily skewed to the high side of the distribution center".
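Cockburn's article does not say which distribution-free method Sprey used, but the bootstrap percentile interval is one common such technique: resample the observed clusters with replacement and read the confidence bounds straight off the empirical distribution, with no bell-curve assumption. The cluster ratios below are made up for illustration, not the survey's data:

```python
import random

random.seed(0)
# Hypothetical post/pre death-rate ratios for a handful of sampled clusters.
ratios = [2.0, 0.67, 1.1, 1.4, 0.9, 3.2, 1.0, 1.8, 0.8, 1.3]

def bootstrap_ci(data, stat, n_resamples=10_000, alpha=0.05):
    """Percentile bootstrap CI for any statistic; no distribution assumed.

    Each resample draws len(data) points with replacement, so every point
    carries equal weight - outliers are not squared up in importance.
    """
    stats = sorted(
        stat(random.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lo = stats[int(n_resamples * alpha / 2)]
    hi = stats[int(n_resamples * (1 - alpha / 2))]
    return lo, hi

mean = lambda xs: sum(xs) / len(xs)
low, high = bootstrap_ci(ratios, mean)
print(f"95% CI for the mean ratio: {low:.2f} to {high:.2f}")
```

Because the interval is read off the resampled data itself, a skewed scatter produces a correspondingly skewed interval - which is exactly the asymmetry Sprey reports in the Iraq numbers.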
Cockburn adds:
Of course, the survey on which all these figures are based was conducted fifteen months ago. Assuming deaths have continued at the same rate since the study was carried out, Sprey calculates that deaths inflicted to date as a direct result of the Anglo-American invasion and occupation of Iraq could be, as a best estimate, 183,000, with an upper 95 per cent confidence bound of 511,000.
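Cockburn does not show the arithmetic, but Sprey's updated figures are consistent with simple proportional scaling of the survey-period estimates. The month counts below are an assumption on my part: the survey covered roughly the first 18 months of the occupation, and roughly 33 months had elapsed by the time of writing.

```python
# Assumed time spans (not stated in the article): the survey window and
# the total elapsed occupation at the time Cockburn was writing.
months_surveyed = 18
months_elapsed = 33
scale = months_elapsed / months_surveyed  # about 1.83

print(round(100_000 * scale))  # best estimate -> roughly 183,000
print(round(279_000 * scale))  # upper 95% bound -> roughly 511,000
```

Under that assumption the scaled best estimate and upper bound land within rounding distance of the 183,000 and 511,000 figures quoted above.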
In many ways, what we are seeing is a continuation under occupation of the genocidal level of deaths that we witnessed in the 1990s, about which more later.