(Phys.org)—Under ancient Jewish law, if a suspect on trial was unanimously found guilty by all judges, then the suspect was acquitted. This reasoning sounds counterintuitive, but the legislators of the time had noticed that unanimous agreement often indicates the presence of systemic error in the judicial process, even if the exact nature of the error is yet to be discovered. They intuitively reasoned that when something seems too good to be true, most likely a mistake was made.
In a new paper to be published in Proceedings of the Royal Society A, a team of researchers, Lachlan J. Gunn, et al., from Australia and France has further investigated this idea, which they call the “paradox of unanimity.”
“If many independent witnesses unanimously testify to the identity of a suspect of a crime, we assume they cannot all be wrong,” coauthor Derek Abbott, a physicist and electronic engineer at The University of Adelaide, Australia, told Phys.org. “Unanimity is often assumed to be reliable. However, it turns out that the probability of a large number of people all agreeing is small, so our confidence in unanimity is ill-founded. This ‘paradox of unanimity’ shows that often we are far less certain than we think.”
The researchers demonstrated the paradox in the case of a modern-day police line-up, in which witnesses try to identify the suspect out of a line-up of several people. The researchers showed that, as the group of unanimously agreeing witnesses increases, the chance of them being correct decreases until it is no better than a random guess.
In police line-ups, the systemic error may be any kind of bias, such as how the line-up is presented to the witnesses or a personal bias held by the witnesses themselves. Importantly, the researchers showed that even a tiny bit of bias can have a very large impact on the results overall. Specifically, they show that when only 1% of the line-ups exhibit a bias toward a particular suspect, the probability that the witnesses are correct begins to decrease after only three unanimous identifications. Counterintuitively, if one of the many witnesses were to identify a different suspect, then the probability that the other witnesses were correct would substantially increase.
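A toy Bayesian model makes the effect of a small bias rate concrete. The parameters below are illustrative stand-ins, not the paper’s: a witness on a fair lineup picks the true culprit with some fixed accuracy, while a rare biased lineup sways every witness toward one (here, innocent) person.

```python
# Simplified Bayesian model of the lineup paradox (illustrative numbers,
# not the paper's exact parameters).
#
# Hypotheses:
#   "fair"   - lineup is unbiased; each witness independently picks the
#              true culprit with probability P_CORRECT
#   "biased" - lineup pushes every witness toward one designated
#              (innocent) person with probability P_PUSH
P_BIAS = 0.01      # prior probability that a lineup is biased
P_CORRECT = 0.48   # witness accuracy on a fair lineup
P_PUSH = 0.90      # chance a biased lineup sways any given witness

def p_guilty_given_unanimous(n):
    """Posterior probability the identified person is guilty,
    given n witnesses unanimously agree."""
    fair = (1 - P_BIAS) * P_CORRECT ** n     # fair lineup, all correct
    biased = P_BIAS * P_PUSH ** n            # biased lineup, all swayed
    return fair / (fair + biased)

for n in (1, 3, 5, 10, 20):
    print(n, round(p_guilty_given_unanimous(n), 3))
```

In this stripped-down model the posterior simply falls as unanimity grows, because each extra agreeing witness is better evidence of bias than of guilt; the paper’s fuller model produces the rise-then-decay curve described above.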
The mathematical reason for why this happens is found using Bayesian analysis, which can be understood in a simplistic way by looking at a biased coin. If a biased coin is designed to land on heads 55% of the time, then you would be able to tell after recording enough coin tosses that heads comes up more often than tails. The results would not indicate that the laws of probability for a binary system have changed, but that this particular system has failed. In a similar way, getting a large group of unanimous witnesses is so unlikely, according to the laws of probability, that it’s more likely that the system is unreliable.
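The biased-coin intuition can be made concrete with a direct Bayesian update. Assuming a hypothetical 55%-heads coin versus a fair coin, with equal prior odds:

```python
# Posterior probability that a coin is the biased (55% heads) one rather
# than a fair coin, after observing `heads` heads in `tosses` tosses.
# Equal prior odds on the two hypotheses; numbers are illustrative.
def p_biased(heads, tosses, p_heads=0.55, prior=0.5):
    like_biased = p_heads ** heads * (1 - p_heads) ** (tosses - heads)
    like_fair = 0.5 ** tosses
    post_biased = prior * like_biased
    return post_biased / (post_biased + (1 - prior) * like_fair)

print(p_biased(550, 1000))  # many extra heads: biased hypothesis favored
print(p_biased(500, 1000))  # an even split: fair hypothesis favored
```

A 5% edge is invisible in a handful of tosses but overwhelming in a thousand, which is the sense in which “enough coin tosses” lets you detect that the system itself has failed.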
Why too much evidence can be a bad thing
See:
“Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes”
http://www.gwern.net/The%20Existential%20Risk%20of%20Mathematical%20Error
Jaynes on the Emperor of China fallacy
Schimmack’s incredibility index
Looks like the paper is now out: http://arxiv.org/pdf/1601.00900v1.pdf
Thanks Panorama and Gwern, incredibly interesting quote and links
This isn’t “more evidence can be bad”, but “seemingly-stronger evidence can be weaker”. If you do the math right, more evidence will make you more likely to get the right answer. If more evidence lowers your conviction rate, then your conviction rate was too high.
Briefly, I think what’s going on is that a ‘yes’ presents N bits of evidence for ‘guilty’, and M bits of evidence for ‘the process is biased’, where M>N. The probability of bias is initially low, but lots of yeses make it shoot up. So you have four hypotheses (bias yes/no cross guilty yes/no), the two bias ones dominate, and their relative odds are the same as when you started.
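This four-hypothesis picture can be checked numerically. In the sketch below (all parameters hypothetical), each ‘yes’ is likelier under bias than under an honest process, so the bias hypotheses eventually dominate and the posterior on guilt drifts back toward its prior:

```python
# Four hypotheses: (process, verdict). A biased process says "yes" with
# probability 0.9 regardless of guilt; an unbiased witness says "yes"
# with probability 0.7 if guilty and 0.3 if innocent.
# Priors: P(bias) = 0.01, P(guilty) = 0.5, independent. All illustrative.
P_BIAS, P_GUILTY = 0.01, 0.5
P_YES = {
    ("bias", "guilty"): 0.9,
    ("bias", "innocent"): 0.9,
    ("fair", "guilty"): 0.7,
    ("fair", "innocent"): 0.3,
}

def p_guilty_after(n_yes):
    """Posterior probability of guilt after n_yes unanimous 'yes' votes."""
    joint = {}
    for (proc, verdict), p in P_YES.items():
        prior = ((P_BIAS if proc == "bias" else 1 - P_BIAS)
                 * (P_GUILTY if verdict == "guilty" else 1 - P_GUILTY))
        joint[(proc, verdict)] = prior * p ** n_yes
    total = sum(joint.values())
    return (joint[("bias", "guilty")] + joint[("fair", "guilty")]) / total

for n in (1, 5, 20, 100):
    print(n, round(p_guilty_after(n), 3))
```

The posterior on guilt first climbs with a few yeses, then sinks back toward 0.5 as unanimity piles up: exactly the “relative odds are the same as when you started” behavior.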
So, why not stab someone in front of everyone to ensure that they all rule you guilty?
If you expect the method to be noisy when it is operating, then a low spread in its output is an indication that it is not operating. A TV that shows a static image that flickers when you kick it is more likely receiving an actual feed than one that doesn’t flicker when punched.
If you have multiple TVs that all flicker at the same time, it is likely that the cause is the weather rather than the broadcast.
Can you clarify what you’re talking about without using the terms method, operating, and spread?
I have a device that displays three numbers when a button is pressed. If any two of the numbers are different, then one of the numbers is the exact room temperature, but there is no telling which one it is.
If all the numbers are the same, I don’t have any reason to think the displayed number is the room temperature. In a way I have two info channels: “did the button press result in a temperature reading?” and “if there was a temperature reading, what does it tell me about the true temperature?”. The first of these channels doesn’t tell me anything about the temperature, but it does tell me something.
Or I could have three temperature meters, one of which is accurate in cold temperatures, one in moderate temperatures, and one in hot temperatures. Suppose that cold and hot don’t overlap. If all the gauges show the same number, it would mean that both the cold and hot meters were in fact accurate at the same temperature. I cannot be more certain about the temperature than about the operating principles of the measuring device, as the reading is based on those principles. The gauges showing different temperatures supports me being right about the operating principles. Them showing the same number is evidence that I am ignorant of how those numbers are formed.
Very well explained :)
https://en.wikipedia.org/wiki/Central_limit_theorem
That is the case where summing among many factors should produce a Gaussian. If the distribution is too narrow to be Gaussian, it tells against the summing theory. Someone who is adamant that it is just a very narrow Gaussian could never be proven conclusively wrong. However, it places constraints on how random the factors can be. At some point the claim of regularity becomes implausible. If something claims that throwing a fair die will always come up with the same number, there is an error lurking about.
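The fair-die version is easy to quantify: under the fair-die hypothesis the probability of a run of identical throws collapses fast, so an observed run of repeats points at an error in the setup rather than at luck. A minimal check:

```python
from fractions import Fraction

# If a die is fair, the chance that n independent throws all show the
# same face is (1/6)**(n-1): the first throw is free, and each later
# throw must match it.
def p_all_same(n):
    return Fraction(1, 6) ** (n - 1)

print(p_all_same(10))  # roughly one in ten million
```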
The variance of the Gaussian you get isn’t arbitrary; it is related to the variance of the variables being combined. So unless you expect people picking folks out of a lineup to be mostly noise-free, a very narrow Gaussian would imply a violation of the assumptions of the CLT.
This Jewish law is essentially an informal version of how frequentist hypothesis testing works: assume everything is fine (the null) and see how surprised we are. If we are very surprised, reject the assumption that everything is fine.
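The analogy can be written as a literal test: take “witnesses vote independently with some accuracy” as the null, use unanimity as the test statistic, and reject when its probability under the null falls below a threshold. The per-witness probability and the threshold below are hypothetical:

```python
# Null hypothesis: n witnesses vote guilty/not-guilty independently,
# each saying "guilty" with probability p. The chance of unanimity
# either way is then p**n + (1 - p)**n. Numbers are illustrative.
def p_unanimity_under_null(n, p=0.7):
    return p ** n + (1 - p) ** n

ALPHA = 0.01  # significance threshold
for n in (5, 10, 15, 20):
    p_val = p_unanimity_under_null(n)
    verdict = "reject null" if p_val < ALPHA else "keep null"
    print(n, round(p_val, 5), verdict)
```

With these numbers, unanimity among a handful of witnesses is unremarkable, but unanimity among fifteen or more is surprising enough to reject the “everything is fine” assumption.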
Thus our knowledge that people are noisy means the mean is ill-defined rather than merely inaccurate.
Sorry, what?
Having unanimous testimony means that the Gaussian is too narrow to be the result of noisy testimonies. So either the witnesses gave absolutely accurate testimonies, or they did something other than testify. Having them all agree raises more doubt about whether everyone was trying to deliver justice than about their ability to deliver it. If a jury answers a “guilty or not guilty” question with “banana”, it sure ain’t the result of a valid justice process. Too-certain results are effectively as good as “banana” verdicts. If our assumptions about the process hold, they should not happen.
I believe I read somewhere on LW about an investment company that had three directors; when they decided whether to invest in some company, they voted, and invested only if 2 of the 3 agreed. The reasoning behind this policy was that if all 3 agreed, then it was probably just a fad.
Unfortunately, I am unable to find the link.