If we see the same amount of hyperactivity in the population prior to and after the study, then we cannot say that the dye causes hyperactivity in the general population.
Correct. But neither can we say that the dye does not cause hyperactivity in anyone.
The reason that the FDA says that food dyes are okay is that there is no evidence to the contrary. Food dye does not cause hyperactivity, according to numerous studies.
Like that. That’s what we can’t say from the results of this study, and of some other similar studies, for the reasons I explained in detail above.
Your making the claim “no evidence to the contrary” shows that you have not read the literature, have not done a PubMed search on “ADHD, food dye”, and have no familiarity with toxicity studies in general. There is always evidence to the contrary. An evaluation weighs the evidence on both sides. You can take any case where the FDA has said “There is no evidence that X”, and look up the notes from the panel they held where they considered the evidence for X and decided that the evidence against X outweighed it.
If you believe that there is no evidence that food dyes cause hyperactivity, fine. That is not the point of this post. This post analyzes the use of a statistical test in one study, and shows that it was used incorrectly to justify a conclusion which the data does not justify.
[quote]If 10% of kids become more hyperactive and 10% become less hyperactive after eating food coloring, such a methodology will never, ever detect it.[/quote]
While possible, how [b]likely[/b] is this? The answer is “Not very.”
(A) I analyzed their use of math and logic in an attempt to prove a conclusion, and showed that they used them incorrectly and their conclusions are therefore not logically correct. They have not proven what they claim to have proven.
(B) The answer is, “This is very likely.” This is how studies turn out all the time, partly due to genetics. Different people have different genetics, different bacteria in their gut, different lifestyles, etc. This makes them metabolize food differently. It makes their brain chemistry different. Different people are different.
This means that in order to have a high degree of confidence in the results of your study, you must increase the threshold for detection—massively.
That’s one of the problems I was pointing out! The F-test did not pass the threshold for detection. The threshold is set so that things that pass it are considered to be proven, NOT so that things that don’t pass it are considered disproven. Because of the peculiar nature of an F-test, not passing the threshold is not even weak evidence that the hypothesis being tested is false.
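To make that concrete, here is a minimal simulation sketch (the effect size, noise level, and sample size below are invented, not taken from the study): if a dye pushed 10% of children up and 10% down by the same amount, an F-test comparing group means stays blind to it at any sample size, while a test on the spread flags it immediately.
[code]
# Sketch only: invented effect size, noise level, and sample size (not the study's data).
import numpy as np
from scipy.stats import f_oneway, levene

rng = np.random.default_rng(0)
n = 2000          # children per group (hypothetical)
effect = 2.0      # size of the hypothetical individual response
noise = 1.0       # baseline spread of the hyperactivity score

control = rng.normal(0.0, noise, n)

# Dye group: 10% respond with +effect, 10% with -effect, 80% do not respond.
response = np.zeros(n)
response[: n // 10] = +effect
response[n // 10 : n // 5] = -effect
dye = rng.normal(0.0, noise, n) + response

_, p_means = f_oneway(control, dye)   # compares group MEANS
_, p_spread = levene(control, dye)    # compares group spreads

print(f"F-test on means:  p = {p_means:.3f}")   # group means are equal in expectation, so this is roughly uniform across runs
print(f"Levene on spread: p = {p_spread:.3g}")  # tiny: the symmetric effect shows up as extra variance
[/code]
The effect is real by construction; it just never moves the mean, so across repeated runs the mean-comparison p-value is roughly uniform. That is the sense in which failing the F-test is not evidence against an effect of this shape.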
People aren’t that different. I really doubt that, for example, there are people whose driving skills improve after drinking the amount of alcohol contained in six cans of beer.
You haven’t searched hard:
Consider the negative effects of high nervousness on driving skills, the nervousness-reducing effects of alcohol, the side effects of alcohol withdrawal on alcoholics, and the moderating effect of high body mass on the effects of alcohol:
A severely obese alcoholic who is nervous enough about driving and suffering from the shakes might perform worse stone-cold sober than he does with the moderate BAC that he has after drinking a six-pack.
What are the odds that there exists at least one sufficiently obese alcoholic who is nervous about driving?
That data point would not provide notable evidence that alcohol improves driving in the general population.
[quote]There is always evidence to the contrary. An evaluation weighs the evidence on both sides. You can take any case where the FDA has said “There is no evidence that X”, and look up the notes from the panel they held where they considered the evidence for X and decided that the evidence against X outweighed it.[/quote]
The phrase “There is no evidence that X” is the single best indicator of someone statistically deluded or dishonest.
I’d normally take “evidence that [clause]” or “evidence for [noun phrase]” to mean ‘(non-negligible) positive net evidence’. (But of course that can still be a lie, or the result of motivated cognition.) If I’m talking about evidence of either sign, I’d say “evidence whether [clause]” or “evidence about [noun phrase]”.
I think your usage is idiosyncratic. People routinely talk about evidence for and against, and evidence for is not the net, but the evidence in favor.
[quote]where they considered the evidence for X and decided that the evidence against X outweighed it.[/quote]
It’s quite standard to talk about evidence for and against a proposition in exactly this way, as he reports the FDA did. Having talked about “the evidence for” and weighed it against the “evidence against”, you don’t then deny the existence of the “evidence for” just because, on balance, you find the evidence against more convincing.
You’re slicing the language so thinly, and in such a nonstandard way, that it seems like rationalization and motivated reasoning. No evidence means no evidence. No means no. It can mean “very, very little”, too. Fine. But it doesn’t mean “an appreciable amount that has a greater countervailing amount”.
But here the FDA has taken “the balance of the evidence is not enough for us to be sure” and said “there is no evidence for it”. The evidence cited as “no evidence” should move the estimate towards 84% certain that there is an effect in the general population.
Very good point.
In this case, honest eyeballing of the data would lead one to conclude that there is an effect.
There actually isn’t any evidence against an effect hypothesis, because they’re not testing an effect hypothesis for falsification at all. There just isn’t enough evidence against the null by their arbitrary, too-high standard.
And this is the standard statistical test in medicine, whereby people think they’re being rigorously scientific. Still just 2 chromosomes away from chimpanzees.
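To put a rough number on the gap between “did not clear 0.05” and “no evidence”, here is a hedged sketch (the effect size and group size are invented, not the study’s): with a modest real effect and small groups, most repetitions of the experiment come back “not significant”.
[code]
# Sketch only: invented effect size and sample size, not the study's.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n, effect, runs = 15, 0.6, 10_000   # small groups, modest real effect, number of simulated studies

pvals = np.array([
    ttest_ind(rng.normal(0.0, 1.0, n), rng.normal(effect, 1.0, n)).pvalue
    for _ in range(runs)
])

# The effect exists by construction, yet most simulated studies fail the 0.05 cutoff.
print(f"share of runs with p >= 0.05: {np.mean(pvals >= 0.05):.0%}")
[/code]
Reading each of those non-significant runs as “there is no evidence” would be wrong by construction; they are simply underpowered.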
This is why you never eyeball data. Humans are terrible at understanding randomness. This is why statistical analysis is so important.
Something that is at 84% is not at 95%, which is a low level of confidence to begin with; it is a nice rule of thumb, but really if you’re doing studies like this you want to crank it up even further to deal with problems with publication bias. Publish regardless of whether you find an effect or not, and encourage others to do the same.
Publication bias (positive results are much more likely to be reported than negative results) further hurts your ability to draw conclusions.
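A toy version of how that skews the published record (the lab count and publication rates below are invented for illustration, and the dye is assumed to have no real effect at all):
[code]
# Toy numbers: 40 labs test a dye with NO real effect, each at alpha = 0.05.
# The publication rates are invented purely for illustration.
labs, alpha = 40, 0.05
pub_rate_positive, pub_rate_null = 0.9, 0.2

false_positives = labs * alpha          # ~2 labs get a spurious "effect"
nulls = labs - false_positives

published_positive = false_positives * pub_rate_positive
published_null = nulls * pub_rate_null

print(f"expected published 'effect found' papers: {published_positive:.1f}")
print(f"expected published 'no effect' papers:    {published_null:.1f}")
print(f"actual split across all {labs} labs: {false_positives:.0f} spurious positives vs {nulls:.0f} nulls")
[/code]
In the lab results the spurious positives are outnumbered 19 to 1; in the published record they look only about 4 to 1, so a reader of the literature overestimates the effect.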
The reason that the FDA said what they did is that there isn’t evidence to suggest that it does anything. If you don’t have statistical significance, then you don’t really have anything, even if your eyes tell you otherwise.
Some are more terrible than others. A little bit of learning is a dangerous thing. Grown-ups eyeball their data and know the limits of standard hypothesis testing.
[quote]The reason that the FDA said what they did is that there isn’t evidence to suggest that it does anything.[/quote]
Yeah, evidence that the FDA doesn’t accept doesn’t exist.
The people who believe that they are grown-ups who can eyeball their data and claim results which fly in the face of statistical rigor are almost invariably the people who are unable to do so. I have seen this time and again, and Dunning-Kruger suggests the same: the least able are the most likely to do it, on the theory that they are better at it than most, whereas the most able people will look at the data, try to figure out why they’re wrong, and consider redoing the study if they suspect a hidden effect that their present data pool is insufficient to detect. However, repeating your experiment is always dangerous if you are looking for an outcome. Repeating it until you get the result you want is bad practice, especially if you don’t raise the level of statistical rigor enough to compensate for doing it over again, so you have to keep that carefully in mind, control your experiment, and set your expectations accordingly.
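A quick sketch of the “compensate for doing it over again” arithmetic (the per-test threshold and number of reruns below are arbitrary): testing each rerun at the usual 0.05 inflates the overall false-positive rate, and the crude fix is to tighten the per-rerun threshold.
[code]
# Sketch: arbitrary alpha and rerun count, assuming the reruns are independent.
alpha, k = 0.05, 5   # per-test threshold, number of times the experiment is repeated

p_any_false_positive = 1 - (1 - alpha) ** k
print(f"chance of at least one spurious pass across {k} reruns: {p_any_false_positive:.0%}")

# Bonferroni-style correction: demand p < alpha/k on each rerun so the overall
# false-positive rate stays near the original alpha.
print(f"per-rerun threshold that keeps the overall rate near {alpha}: {alpha / k:.3f}")
[/code]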
The problem we started with was that “statistical rigor” is generally not rigorous. Most of those employing it don’t know what it would mean even under the assumptions of the test, and fewer still know that the assumptions make little sense.
[quote]Correct. But neither can we say that the dye does not cause hyperactivity in anyone.[/quote]
No, but that is not our goal in the first place. Doing a test on every single possible trait is economically infeasible and unreasonable; ergo, net impact is our best metric.
The benefit is “we get a new food additive to use”.
The net cost is zero in terms of health impact (no increase in hyperactivity in the general population).
Ergo, the net benefit is a new food additive. This is very simple math here. Net benefit is what we care about in this case, as it is what we are studying. If it redistributes ailments amongst the population, then there may be even more optimal uses, but we’re still looking at a benefit.
If you want to delve deeper, that’s going to be a separate experiment.
[quote]Your making the claim “no evidence to the contrary” shows that you have not read the literature, have not done a PubMed search on “ADHD, food dye”, and have no familiarity with toxicity studies in general. There is always evidence to the contrary. An evaluation weighs the evidence on both sides. You can take any case where the FDA has said “There is no evidence that X”, and look up the notes from the panel they held where they considered the evidence for X and decided that the evidence against X outweighed it.[/quote]
Your making the claim “evidence to the contrary” suggests that you think any of this is worth anything. The problem is that, unfortunately, it isn’t.
If someone does a study on 20 different colors of M&Ms, then they will, on average, find that one of the M&Ms will change someone’s cancer risk. The fact that their study showed that, with 95% confidence, blue M&Ms increased your odds of getting cancer, [b]is not evidence for the idea that blue M&M’s cause cancer[/b].
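The arithmetic behind that claim, sketched with 20 independent tests at the conventional 0.05 level, assuming none of the colors actually does anything:
[code]
# 20 colors, each tested at 95% confidence (alpha = 0.05), none with a real effect.
alpha, n_colors = 0.05, 20

expected_false_positives = n_colors * alpha    # = 1.0: about one spurious "finding" per such study
p_at_least_one = 1 - (1 - alpha) ** n_colors   # ~0.64, assuming the 20 tests are independent

print(f"expected spurious 'findings' among {n_colors} colors: {expected_false_positives:.1f}")
print(f"chance of at least one spurious 'finding':            {p_at_least_one:.0%}")
[/code]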
Worse, the odds of a negative-finding study being published are considerably lower than the odds of a positive-finding study being published. This is known as “publication bias”. Additionally, people are more likely to be biased against artificial additives than towards them, particularly “independent researchers”, who very likely are researching it precisely because they harbor the belief that it does in fact have an effect.
This is very basic and is absolutely essential to understanding any sort of data of this sort. When I say that there is no evidence for it, I am saying precisely that—just because someone studied 20 colors of M&M’s and found that one of them is associated with more cancer at the 95% confidence level tells me nothing. It isn’t evidence for anything. It is entirely possible that it DOES cause cancer, but the study has failed to provide me with evidence of that fact.
You are thinking in terms of formal logic, but that is not how science works. If you lack sufficient evidence to reject the null hypothesis, then you don’t have evidence. And the problem is that a single study is often insufficient to actually demonstrate an effect unless it is extremely blatant.
[quote]The answer is, “This is very likely.” This is how studies turn out all the time, partly due to genetics. Different people have different genetics, different bacteria in their gut, different lifestyles, etc. This makes them metabolize food differently. It makes their brain chemistry different. Different people are different.[/quote]
For this to happen, you would require the two groups (those helped and those harmed) to be very similar in size.
Is it possible for things to help one person and harm another? Absolutely.
Is it probable that something will help almost exactly as many people as it harms? No. Especially not on the basis of some random genetic trait (there are traits, such as sex, where this IS likely because the population is split roughly evenly, so you do have to be careful about that, but sex-dependence of results is pretty obvious).
The probability of the trait being equally distributed is vastly outweighed by the probability of it being unequally distributed. Ergo, the result you are espousing is in fact extremely unlikely.
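A sketch of how sensitive this is to the split (the group size, effect size, and splits below are invented): a mean-comparison F-test stays blind only when the harmed and helped fractions are almost exactly equal, while even a modest imbalance becomes detectable.
[code]
# Sketch only: invented group size, per-child effect, and splits.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)
n, effect, runs = 2000, 2.0, 500   # children per group, per-child response, simulated studies

def detection_rate(frac_worse, frac_better):
    """Share of simulated studies in which a mean-comparison F-test gets p < 0.05."""
    hits = 0
    for _ in range(runs):
        control = rng.normal(0.0, 1.0, n)
        response = np.zeros(n)
        n_worse, n_better = int(frac_worse * n), int(frac_better * n)
        response[:n_worse] = +effect
        response[n_worse:n_worse + n_better] = -effect
        dye = rng.normal(0.0, 1.0, n) + response
        hits += f_oneway(control, dye).pvalue < 0.05
    return hits / runs

for worse, better in [(0.10, 0.10), (0.12, 0.08), (0.15, 0.05), (0.20, 0.00)]:
    print(f"{worse:.0%} worse / {better:.0%} better -> flagged in {detection_rate(worse, better):.0%} of runs")
[/code]
The exactly balanced row never rises above the false-positive rate no matter how large the sample, which is the scenario the critique is about; the unbalanced rows illustrate the point here that a lopsided split is much harder to hide.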
[quote]This is very basic and is absolutely essential to understanding any sort of data of this sort. When I say that there is no evidence for it, I am saying precisely that—just because someone studied 20 colors of M&M’s and found that one of them is associated with more cancer at the 95% confidence level tells me nothing. It isn’t evidence for anything. It is entirely possible that it DOES cause cancer, but the study has failed to provide me with evidence of that fact.[/quote]
When I said that “making the claim ‘no evidence to the contrary’ shows that you have not read the literature, have not done a PubMed search on ‘ADHD, food dye’, and have no familiarity with toxicity studies in general,” I meant that literally. I’m well aware of what 95% means and what publication bias means. If you had read the literature on ADHD and food dye, you would see that it is closer to a 50-50 split between studies concluding that there is or is not an effect on hyperactivity. You would know that some particular food dyes, e.g., tartrazine, are more controversial than others. You would also find that over the past 40 years, the list of food dyes claimed not to be toxic by the FDA and its European counterparts has been shrinking.
If you were familiar with toxicity studies in general, you would know that this is usually the case for any controversial substance. For instance, the FDA says there is “no evidence” that aspartame is toxic, and yet something like 75% of independent studies of aspartame concluded that it was toxic. The phrase “no evidence of toxicity”, when used by the FDA, is shorthand for something like “meta-analysis does not provide us with a single consistent toxicity narrative that conforms to our prior expectations”. You would also know that toxicity studies are frequently funded by the companies trying to sell the product being tested, and so publication bias works strongly against findings of toxicity.