GMO statistics Part 3. Trawling through lots of comparisons with a test designed for a single decision is a recipe for trouble that starts with t.
When you drive along the highway and exceed the speed limit, sometimes you get into real trouble, get caught by a speed trap, and even have to spend the night in jail.
Statistics is similar. Be aware of the limitations of the test you use, or you may get into intellectual trouble. Even worse, if you go public with preliminary results or a misinterpretation, you may spend the scientific equivalent of a night in jail: public humiliation over misinterpreted evidence. And you can be caught by a Type I error trap.
On the other hand, unless you are a scientist, being wrong isn't a big deal. ;0)
In earlier posts at GMO Pundit, we have highlighted the trouble you can get into by using statistical tests carelessly.
Way back in 2007 we used Chris Preston's post to describe how Doctor Gilles-Eric Séralini got into trouble by looking through hundreds of rat feeding test results and discovering that some were apparently significant at a "P=0.05 level".
The difficulty is that when there is no real effect at all, chance alone will produce results that meet the P=0.05 criterion about 5% of the time. Look through hundreds of tests and a handful of "significant" results is exactly what you should expect to find.
A lot of confusion about genetics is connected with similar kinds of intellectual misadventure. Interpreting experiments intended to address complicated issues requires attention to how the experiments are designed, considerable knowledge about the behaviour of the biological organisms being examined, and, in many cases, close attention to statistical rigour.
Doctor Séralini's problems are not the only ones that have plagued controversial claims about genetically modified crops. Doctor Arpad Pusztai also got into misadventures over the statistical interpretation of multiple observations on the same set of rodents.
With so many difficulties being created by the tricky statistics of animal feeding tests, the Pundit has realised there is a need to keep posting his special series on GMO statistics.
So in this post we present a little bit more about what happens with a common statistical test used for deciding if two groups of organisms or samples are different from one another.
The test involved is called the t-test and was invented by William Sealy Gosset (1876-1937), a brewer at the Guinness brewery in Dublin. Because he was indeed a modest man, he published his work under the pseudonym Student. Hence the test is sometimes called Student's t-test.
The t-test involves calculating the difference between two averages and dividing it by a measure of the degree of variation or variability in the samples.
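As a rough illustration of that calculation, here is a minimal Python sketch of the pooled-variance form of the two-sample t statistic. The function name and the body-weight numbers are made up for the example, and the pooled form assumes both groups have similar variability.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic using a pooled variance estimate (illustrative helper)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Pool the two variances, weighting each by its degrees of freedom.
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two averages.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Made-up body weights (grams) for two hypothetical diet groups.
control = [30.0, 32.0, 34.0, 36.0, 38.0]
treated = [32.0, 34.0, 36.0, 38.0, 40.0]

t_value, degrees_of_freedom = pooled_t_statistic(control, treated)
print(t_value, degrees_of_freedom)
```

The t value is then compared with the critical value for the given degrees of freedom (from a table such as the one in Zar) to decide whether the difference is significant at the chosen level.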
The Pundit uses Jerrold H. Zar's Biostatistical Analysis (fourth edition, Prentice-Hall, 1999) to guide him through the pitfalls of statistical analysis and Student's t.
Chapter 10 of Professor Zar's book starts a discussion, which goes on for several chapters, about how to analyse experiments involving multiple comparisons. The comparisons that are involved in deciding whether genetically modified foods have an effect on animals in laboratory tests stretch these concepts to the limit.
The clear message even from just reading this standard textbook is to consider carefully the numerous confounding factors when interpreting experiments where rodents are fed different diets and subjected to numerous different measurements of biological responses.
At the very least one needs to be cautious about any biological statistical data on rodent testing that have not been subjected to the rigours of professional peer review in appropriate journals. Such findings are only very preliminary until they have been rigorously critiqued.
Zar's textbook shows a very interesting table at the start of the chapter entitled "MULTI-SAMPLE HYPOTHESES: THE ANALYSIS OF VARIANCE".
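[Table from Zar: probability of committing at least one Type I error when t-tests at the 0.05 level are used to compare k means]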
Zar explains that for each "two-sample" test performed at the "P equals 0.05 level of significance" there is a low chance (5%) of making what is called a Type I error, that is to say, concluding that two samples are significantly different from one another when in fact they are not and the apparent difference is due only to chance.
This corresponds to the entry in the first line of the above table showing an error of 0.05 when k=2.
Making multiple comparisons corresponds to values of k larger than 2 in the above table, where k is the number of group means being compared. For example, if ten groups are compared with one another in pairs, which takes 45 separate t-tests at a significance level of P=0.05, the chance of making at least one Type I error rises to about 0.90, or 90%. This is the k=10 entry in the table above.
In other words, making many comparisons with the t-test, without any correction, is very likely to lead to erroneous conclusions.
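The arithmetic behind this inflation is easy to check. The sketch below assumes every pair of k group means is tested at the 0.05 level and treats the tests as independent, which is only an approximation when the same groups are reused across comparisons. The function name is illustrative.

```python
from math import comb


def familywise_error_rate(k, alpha=0.05):
    """Chance of at least one Type I error when all pairs of k means are tested.

    Assumes the C(k, 2) pairwise tests are independent, which is an
    approximation when groups are shared between comparisons.
    """
    n_comparisons = comb(k, 2)  # number of distinct pairs among k groups
    return 1 - (1 - alpha) ** n_comparisons


for k in (2, 3, 4, 5, 10):
    print(k, comb(k, 2), round(familywise_error_rate(k), 2))
```

With two groups there is one comparison and the rate is the familiar 0.05. With ten groups there are 45 comparisons and the rate is roughly 0.90.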
It is essential in these circumstances to read past the early chapters on the t-test and on to the later chapters of the textbook dealing with "THE ANALYSIS OF VARIANCE", usually abbreviated to ANOVA. These chapters also describe the use of computer methods to assess the meaning of experiments.
The Pundit knows that most people reading this post will have given up at the first mention of statistics, and so he will not inflict more damage by continuing to labour the point about ANOVA.
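For readers curious what such a computer method looks like, here is a minimal sketch of the F statistic for a one-way analysis of variance, which asks in a single test whether several group means differ. The function name and the data are invented for illustration, and the sketch stops at the F value rather than converting it to a P value.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across several groups (illustrative helper)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual observations around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up measurements for three hypothetical diet groups.
diets = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df_between, df_within = one_way_anova_f(diets)
print(f_value, df_between, df_within)
```

The F value is compared with a critical value for the two degrees of freedom, so the whole set of groups is judged with one test at the 0.05 level instead of many.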
But he will finish by leaving the kind readers who have laboured so far through this post with some scientific advice:
If readers happen to come across a densely packed statistical article about the scientifically controlled feeding of various diets to laboratory rodents, which contains lots of numbers and averages of numerous indicators of rodent health (such as weights of rodent mommas, numbers of rodent bubbas, weights of rodent livers, thicknesses of their intestines and so forth), first ask these questions:
Is there a comprehensive analysis of variance (ANOVA) up front, as a preliminary assessment of the extent of variability in the experiments, before the authors proceed to conclusions about differences seen among individual observations?
Are there multiple applications of the t-test, and if so, is there appropriate mention of correcting the P value used as the level of significance, to ensure the Type I error trap is avoided?
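One simple and common correction of this kind is the Bonferroni adjustment, which divides the significance level by the number of tests performed. It is offered here only as an example of what a correction looks like, not as the method any particular study should have used. The P values below are invented.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which P values stay significant after a Bonferroni correction."""
    # Each test must clear alpha divided by the number of tests.
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical P values from four t-tests on the same experiment.
p_values = [0.003, 0.020, 0.040, 0.300]

uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)
print(uncorrected)
print(corrected)
```

In this made-up case three of the four results look significant at the plain 0.05 level, but only one survives once the threshold is tightened to 0.05 / 4 = 0.0125.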
Labels: Risk management, Safety and Regulations, Statistical interpretation

1 Comment:
Usually, to compare multiple means, the F test is used and not the t-test. The F test tells you whether or not all the means are statistically equal. If they are not all equal, you can use a t-test to do pairwise comparisons of all means.