Suppose X has murdered someone with a knife, and is being tried in a courthouse. Two witnesses step forward and vividly describe the murder. The fingerprints on the knife match X’s fingerprints. In fact, even X himself confesses to the crime. How likely is it that X is guilty?
It’s easy to construct hypotheses in which X is innocent, but which still fit the evidence. E.g. X has an enemy, Z, who bribes the two witnesses to give false testimony. Z commits the murder, then plants X’s fingerprints on the knife (handwave; assume Z is the type of person who will research and discover methods of transplanting fingerprints). X confesses to a murder he did not commit because of a plea deal.
Is there any way to prove to Y (a single human) that X has committed the murder, with probability > 0.999999? (Even if Y witnesses the murder, there’s a >0.000001 chance that Y was hallucinating, or that the supposed victim is actually an animatronic, etc.)
People don’t generally form beliefs with that level of precision. “beyond a reasonable doubt” is the usual instruction, for exactly this reason. And the underlying belief is “appears likely enough that it’s preferable to hold the person publicly responsible”.
Having sat on a jury (for a rather dull case of a failed burglary), I concur with this.
Jury confidentiality is taken seriously in the UK, so I can’t comment on our deliberations, but the consensus was that it was him wot dunnit. He looked resigned rather than indignant when the verdict was read out, so with that and the evidence I’m as sure as I need to be that we got it right. I couldn’t put a number on it, but 0.000001 is way smaller than a reasonable doubt.
Six nines of reliability sounds like a lot, and it’s more than is usually achieved in criminal cases, but it’s hardly insurmountable. You just need to be confident enough that, given one million similar cases, you would make only one mistake. A combination of recorded video and DNA evidence, with reasonably good validation of the video chain of custody and of the DNA evidence-processing lab’s procedures, would probably clear this bar.
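To make the arithmetic concrete: if the two lines of evidence really did fail independently, their error rates would multiply. A minimal sketch, with error rates invented purely for illustration (not real figures for video or DNA evidence):

```python
# Rough arithmetic for stacking two independent lines of evidence.
# Both rates below are made up for illustration.

p_video_misleads = 1e-3  # chance the video evidence points at the wrong person
p_dna_misleads = 1e-4    # chance the DNA evidence points at the wrong person

# If the failure modes are genuinely independent, the chance that BOTH
# mislead you at once is their product:
p_both_mislead = p_video_misleads * p_dna_misleads
print(f"P(both lines of evidence wrong) = {p_both_mislead:.0e}")  # 1e-07
print(f"Implied confidence = {1 - p_both_mislead:.7f}")           # 0.9999999
```

Of course, the independence assumption is doing almost all the work; a single adversary (or a single sloppy lab) that touches both lines of evidence breaks it.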
This still seems crazy confident to me though. I do think there are hypothetical people who could do it, but I don’t currently have strong reason to believe that even trained rationalists could actually do it, even if they were extremely careful every single time.
Given a million evaluations of the video chain-of-custody or DNA evidence, you expect there are people who would not make a mistake (or be actively deceived by an adversary, or have forgotten to eat lunch and not noticed they’re tired) even twice?
If I sometimes write down a 6-nines confidence number because I’m sleepy, then this affects your posterior probability after hearing that I wrote down a 6-nines confidence number, but doesn’t reduce the validity of 6-nines confidence numbers that I write down when I’m alert. The 6-nines confidence number is inside an argument, while your posterior is outside the argument.
Not 100% sure I understand this.
My claim is “Basically everyone who writes down high confidence claims is, by default, miscalibrated and mistaken. It should take extraordinary evidence both for me to believe your high-confidence claim is calibrated, and separately, for you to believe a high confidence claim of yours is calibrated.” (But, I’d agree that you might have inside view knowledge that makes you justifiably more confident than me)
I do think there are types of things one could be theoretically 6-nine-confident about. (I’m probably that confident about how likely I am to stumble on my next footstep? But that’s because I’ve literally taken 1-3 million footsteps in my life)
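A quick sanity check on that footstep number, using Laplace’s rule of succession and pretending (counterfactually) that the lifetime stumble count is zero; the step counts are just the rough figures above:

```python
# Laplace's rule of succession: after n trials with zero failures, a
# reasonable estimate for the failure probability on the next trial is
# 1 / (n + 2).

for n_steps in (1_000_000, 3_000_000):
    p_stumble_next = 1 / (n_steps + 2)
    print(f"{n_steps:,} stumble-free steps -> "
          f"P(stumble on next step) ~ {p_stumble_next:.1e}")
# 1,000,000 stumble-free steps -> P(stumble on next step) ~ 1.0e-06
# 3,000,000 stumble-free steps -> P(stumble on next step) ~ 3.3e-07
```

So the order of magnitude roughly checks out, provided the next footstep really is exchangeable with the previous few million.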
I think my nearest-crux for this is “what is the actual world record for number of independent high-confidence claims anyone has made? Is there anyone with a perfect record for a large number of… even 99.99% claims, let alone 6-nine-claims?” (If there were someone who’d gotten a hundred 99.99% claims correct with no failures, I’d elevate to attention “this person might be the sort of person who can make 99.9999% claims and possibly be justified.”)
Do you think that overall reasoning is mistaken?
My short answer is “you probably can’t.” >0.999999 is just a lot of certainty.
There might exist particularly-well-calibrated humans who can have a justified >0.999999 probability in a given murder trial, but my guess is that most Well Calibrated People still probably sort of cap out in justified confidence at some point, based on what the human mind can reasonably process. After that, I think it makes less sense to think in terms of exact probabilities and more sense to think in terms of “real damn certain, enough that it’s basically certain for practical purposes, but you wouldn’t make complicated bets based on it.”
(I’m curious what Well Calibrated Rationalists think is the upper bound of how certain they can be about anything)
[Edit: yes, there are specific domains where you can fully understand a mathematical question, where you can be confident something won’t happen apart from “I might be insane or very misguided about reality” reasons.]
If I buy a ticket in the Euromillions lottery, I am over 0.99999999 sure I will lose. (There are more than 100 million possible draws.)
Yes, see response to Dagon. But, 0.99999999 seems overconfident to me. You have to account not only for “I might be insane” (what are the base rates on that?), but simpler things like “I misread the question or had a brain fart.”
Like, there’s an old LW chat log where someone claims they can be 99.999% confident about whether a low-digit number is prime. Then someone challenges them to answer “prime or not?” for ~100 numbers. And then like 25 questions in they get one wrong. 0.99999999 is Really God Damn Confident.
I was curious to re-read the chat log, and had to do some digging on archive.org to find it. The guy made 17 bets about numbers being prime, and lost the 17th.
Transcript here
Sequence article that referenced it here.
Interesting follow-up by Chris Hallquist here:
I do think this is an important counterpoint. Still, while I agree that a person who actually thought carefully about each prime number would have made it much farther than a 1-out-of-17 failure rate, I’d bet against them successfully making 10,000 careful statements without ever screwing up in some dumb way.
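To put rough numbers on that bet, here’s the arithmetic with a few round per-claim error rates (purely illustrative), showing how reliable each individual statement has to be before 10,000 in a row with no slip-ups becomes plausible:

```python
# Probability of making 10,000 statements in a row without a single
# mistake, for a few illustrative per-claim error rates.

for per_claim_error in (1e-3, 1e-4, 1e-5, 1e-6):
    p_perfect_run = (1 - per_claim_error) ** 10_000
    print(f"per-claim error {per_claim_error:.0e}: "
          f"P(10,000 with no mistake) = {p_perfect_run:.3f}")
# per-claim error 1e-03: P(10,000 with no mistake) = 0.000
# per-claim error 1e-04: P(10,000 with no mistake) = 0.368
# per-claim error 1e-05: P(10,000 with no mistake) = 0.905
# per-claim error 1e-06: P(10,000 with no mistake) = 0.990
```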
Anecdata: In the mobile game Golf Rivals, it is trivial to sink a putt from any distance on the green, with a little bit of care. I (and opponents) miss about 1 in 1000 times.
+3 for the concrete example.
Those could go either way.
Not so. “X is guilty” is a very specific hypothesis and 0.99999999 is Very Confident, so general increases in uncertainty should make you think it’s less likely that “X is guilty” is true. For example, if I’m told I misread the question, then since I will not be 0.99999999 confident on nearly any other question, and I now have non-trivial probability mass on other questions, I should become less confident.
The result is that it takes a specific misreading to make you more confident and that most misreadings will make you less confident, so you should become less confident.
In the log-odds space, both directions look the same. You can wander up as easily as down.
I don’t know what probability space you have in mind for the set of all possible phenomena leading to an error, that would give a basis for saying that most errors will lie in one direction.
When I calculated the odds for the Euromillions lottery, my first calculation omitted to divide by a factor to account for there being no ordering on the chosen numbers, giving a probability for winning that was too small by a factor of 240. The true value is about 140 million to 1.
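For reference, the corrected arithmetic (assuming the current EuroMillions format of 5 main numbers from 50 plus 2 Lucky Stars from 12):

```python
from math import comb, perm

# Unordered draws: choose 5 of 50 main numbers and 2 of 12 Lucky Stars.
unordered = comb(50, 5) * comb(12, 2)
print(f"{unordered:,}")      # 139,838,160 -- about 140 million to 1

# The first-pass mistake: counting ordered draws, i.e. forgetting to
# divide by the 5! * 2! = 240 orderings of the chosen numbers.
ordered = perm(50, 5) * perm(12, 2)
print(ordered // unordered)  # 240
```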
I have noted before that ordinary people, too ignorant to know that clever people think it impossible, manage to collect huge jackpots. It is literally news when they do not.
It’s not a random walk among probabilities, it’s a random walk among questions, which have associated probabilities. This results in a non-random walk downwards in probability.
The underlying distribution might be described best as “nearly all questions cannot be decided with probabilities that are as certain as 0.999999”.
There is a difference between “error in calculation” and “error in interpreting the question”. The former affects the result in such a way that makes it roughly as likely to go up as down. If you err in interpreting the question, you’re placing higher probability mass on other questions, which you are less than 0.999999 certain about on average. Roughly, I’m saying that you expect regression to the mean effects to apply in proportion to the uncertainty. E.g. If I tell you I scored a 90% on my test for which the average was a 70%, then you expect me to score a bit lower on a test of equal difficulty. However, if I tell you that I guessed on half the questions, then you should expect me to score a lot lower than you would have if you’d assumed I guessed on 0 questions.
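To see how fast this bites, here is the mixture written out, with made-up numbers for the misreading rate and for average accuracy on “other questions”:

```python
# Effective confidence is a mixture over "I read the question I thought
# I read" and "I'm actually answering some other question".
# All three numbers below are invented for illustration.

p_correct_if_read_right = 0.999999  # confidence in the intended question
p_correct_if_misread = 0.90         # average accuracy across other questions
p_misread = 0.001                   # chance of having misread in the first place

effective = ((1 - p_misread) * p_correct_if_read_right
             + p_misread * p_correct_if_misread)
print(f"{effective:.6f}")  # ~0.999899
```

Even a 1-in-1000 chance of having misread the question caps you below four nines, no matter how sure you are about the question you think you’re answering.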
I don’t know why the last comment is relevant. I agree that 1 in a million odds happen 1 in a million times. I also agree that people win the lottery. My interpretation is that it means “sometimes people say impossible when they really mean extremely unlikely”, which I agree is true.
The point was not that people win the lottery. It’s that when they do, they are able to update against the over 100 million-to-one odds that this has happened. “No, no,” say the clever people who think the human mind is incapable of such a shift in log-odds, “far more likely that you’ve made a mistake, or the lottery doesn’t even exist, or you’ve had a hallucination.” The clever people are wrong.
Anecdata: people who win large lotteries often express verbal disbelief, and ask others to confirm that they are not hallucinating. In fact, some even express disbelief while sitting in the mansion they bought with their winnings!
And yet, despite saying “Inconceivable!” they did collect their winnings and buy the mansion.
Right, but they don’t update to that from a single data point (looking at the winning numbers and their ticket once), they seek out additional data until they have enough subjective evidence to update to the very, very, unlikely event (and they are able to do this because the event actually happened). Probably hundreds of people think they won any given lottery at first, but when they double-check, they discover that they did not.
Seems like what matters is “if you make 1000000 claims that you’re .999999 confident in, will you be right 999999 times?” Yes, insanity and brain farts could go in any direction, but it goes in sufficiently many directions (at least two) such that I bet you if you try to make a hundred 99.9999% confidence claims you’ll screw up at least once.
Even if you include esoteric options, like being a Boltzmann brain, you can have negatives with way more probability than 999999/1000000. It’s EASY to be more certain than that on “will I fail to win the next powerball drawing”. And more certain still on “did I fail to win the previous powerball drawing”.
Some recursive positives approach 1, like “I exist”. Tautologies remain actually 1: P → P.
But for random human-granularity events where you have only very indirect evidence, you’re right. 99% would be surprising, 95% would take a fair bit of effort.
Yeah, I agree there are domains where you can be more confident because you fully understand the domain (and then only have to account for model uncertainty in “I’m literally insane or in a simulation or whatever.”)
how can you tell what your own limits are?
I would start by trying to get calibrated generally, using something like the credence game. (You will probably start out not even able to be reliably confident in 90% likely statements).
I think there might be a better variant of the game available somewhere that someone has built in the past few years, but this is what I could easily remember.
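If you want to track your own calibration outside the game, the bookkeeping is simple: bucket your claims by stated confidence and compare against the observed hit rate. A minimal sketch, with invented sample data:

```python
from collections import defaultdict

# (stated confidence, was the claim correct?) -- sample data is invented
records = [(0.7, True), (0.7, False), (0.9, True), (0.9, True),
           (0.9, False), (0.9, True), (0.99, True), (0.99, True)]

buckets = defaultdict(list)
for stated, correct in records:
    buckets[stated].append(correct)

for stated in sorted(buckets):
    outcomes = buckets[stated]
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.2f} -> observed {observed:.2f} over {len(outcomes)} claims")
```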
what’s an example of a complicated bet that you shouldn’t take even if you’re real damn certain?
Most of them. The fact that you’re being offered an unusual bet is itself evidence that you’re wrong. The Guys and Dolls quote applies pretty widely:
I’m pretty certain that an asteroid won’t destroy human civilisation next year, but I still want better asteroid defense (which is mostly more surveillance in our solar system).
This problem is known in the philosophy of science as the underdetermination problem. Multiple hypotheses can fit the data. If we don’t assign a priori probabilities to hypotheses, we will never reach a conclusion. For example, consider the hypothesis that (a) Stephen Hawking lived till 2018, against (b) there was a massive conspiracy by his relatives and friends to fake his existence after his death in 1985. (That was an actual conspiracy theory.) No quantity of evidence can refute the second theory. We can always increase the number of conspirators. The only reason we choose (a) over (b) is the implausibility of (b).
If X has confessed, how can he be on trial?
1. X confesses to police, but later claims that the confession was coerced, and asks for a trial.
2. X confesses to some part of the crime “I was holding the knife that penetrated the deceased” but not all of it “but I was sleepwalking at the time, so it’s not Murder” or “but I was in a jealous rage at the time, so it’s not pre-meditated”
3. X confesses, but the prosecutor believes that other people were involved (regardless of the status of X) and is holding a joint trial for all the accused.