Confidence levels inside and outside an argument
Related to: Infinite Certainty
Suppose the people at FiveThirtyEight have created a model to predict the results of an important election. After crunching poll data, area demographics, and all the usual things one crunches in such a situation, their model returns a greater than 999,999,999 in a billion chance that the incumbent wins the election. Suppose further that the results of this model are your only data and you know nothing else about the election. What is your confidence level that the incumbent wins the election?
Mine would be significantly less than 999,999,999 in a billion.
When an argument gives a probability of 999,999,999 in a billion for an event, then probably the majority of the probability of the event is no longer in “But that still leaves a one in a billion chance, right?”. The majority of the probability is in “That argument is flawed”. Even if you have no particular reason to believe the argument is flawed, the background chance of an argument being flawed is still greater than one in a billion.
More than one in a billion times a political scientist writes a model, ey will get completely confused and write something with no relation to reality. More than one in a billion times a programmer writes a program to crunch political statistics, there will be a bug that completely invalidates the results. More than one in a billion times a staffer at a website publishes the results of a political calculation online, ey will accidentally switch which candidate goes with which chance of winning.
So one must distinguish between levels of confidence internal and external to a specific model or argument. Here the model’s internal level of confidence is 999,999,999 in a billion. But my external level of confidence should be lower, even if the model is my only evidence, by an amount that depends on how much I trust the model: the less I trust it, the further my confidence should fall.
One might be tempted to respond “But there’s an equal chance that the false model is too high, versus that it is too low.” Maybe there was a bug in the computer program, but it prevented it from giving the incumbent’s real chances of 999,999,999,999 out of a trillion.
The prior probability of a candidate winning an election is 50%[1]. We need information to push us away from this probability in either direction. To push significantly away from this probability, we need strong information. Any weakness in the information weakens its ability to push away from the prior. If there’s a flaw in FiveThirtyEight’s model, that takes us away from their probability of 999,999,999 in a billion, and back closer to the prior probability of 50%.
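The pull back toward the prior can be made concrete with a small sketch. It assumes, purely for illustration, that a flawed model tells us nothing at all (so we fall back on the 50% prior) and that the model has a 999 in 1,000 chance of being sound. Both the mixing rule and that trust figure are assumptions chosen for the example, not anything the model itself reports.

```python
from fractions import Fraction

def external_probability(p_internal, p_sound, prior):
    """Blend the model's answer with the prior, weighted by trust in the model.

    Assumption (illustrative only): if the model is flawed it carries no
    information, so we fall back on the prior.
    """
    return p_sound * p_internal + (1 - p_sound) * prior

p_internal = Fraction(999_999_999, 1_000_000_000)  # what the model says
p_sound = Fraction(999, 1000)                      # hypothetical trust in the model
prior = Fraction(1, 2)                             # 50% before seeing the model

p_external = external_probability(p_internal, p_sound, prior)
p_upset_internal = 1 - p_internal   # chance of an upset according to the model
p_upset_external = 1 - p_external   # chance of an upset once model error is counted

print(float(p_external))
print(float(p_upset_external / p_upset_internal))  # how much likelier an upset becomes
```

Under these assumptions almost all of the remaining chance of an upset comes from the "model is flawed" branch rather than from the model's own one-in-a-billion tail.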
We can confirm this with a quick sanity check. Suppose we know nothing about the election (ie we still think it’s 50-50) until an insane person reports a hallucination that an angel has declared the incumbent to have a 999,999,999/billion chance. We would not be tempted to accept this figure on the grounds that it is equally likely to be too high as too low.
A second objection covers situations such as a lottery. I would like to say the chance that Bob wins a lottery with one billion players is 1 in 1 billion. Do I have to adjust this upward to cover the possibility that my model for how lotteries work is somehow flawed? No. Even if I am misunderstanding the lottery, I have not departed from my prior. Here, new information really does have an equal chance of going against Bob as of going in his favor. For example, the lottery may be fixed (meaning my original model of how to determine lottery winners is fatally flawed), but there is no greater reason to believe it is fixed in favor of Bob than anyone else.[2]
Spotted in the Wild
The recent Pascal’s Mugging thread spawned a discussion of the Large Hadron Collider destroying the universe, which also got continued on an older LHC thread from a few years ago. Everyone involved agreed the chances of the LHC destroying the world were less than one in a million, but several people gave extraordinarily low chances based on cosmic ray collisions. The argument was that since cosmic rays have been performing particle collisions similar to the LHC’s zillions of times per year, the chance that the LHC will destroy the world is either literally zero, or else a number related to the probability that there’s some chance of a cosmic ray destroying the world so minuscule that it hasn’t gotten actualized in zillions of cosmic ray collisions. Of the commenters mentioning this argument, one gave a probability of 1/(3*10^22), another suggested 1/10^25, both of which may be good numbers for the internal confidence of this argument.
But the connection between this argument and the general LHC argument flows through statements like “collisions produced by cosmic rays will be exactly like those produced by the LHC”, “our understanding of the properties of cosmic rays is largely correct”, and “I’m not high on drugs right now, staring at a package of M&Ms and mistaking it for a really intelligent argument that bears on the LHC question”, each of which probably has a greater than 1/10^20 chance of being false. So instead of saying “the probability of an LHC apocalypse is now 1/10^20”, say “I have an argument that has an internal probability of an LHC apocalypse as 1/10^20, which lowers my probability a bit depending on how much I trust that argument”.
In fact, the argument has a potential flaw: according to Giddings and Mangano, the physicists officially tasked with investigating LHC risks, black holes from cosmic rays might have enough momentum to fly through Earth without harming it, and black holes from the LHC might not[3]. This was predictable: this was a simple argument in a complex area trying to prove a negative, and it would have been presumptuous to believe with greater than 99% probability that it was flawless. If you can only give 99% probability to the argument being sound, then it can only reduce your probability in the conclusion by a factor of a hundred, not a factor of 10^20.
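The "factor of a hundred" ceiling can be checked with a few lines of arithmetic. This is only a sketch: it assumes that if the argument is unsound you are left with whatever probability you held before hearing it, and the one-in-a-million starting risk is a hypothetical figure picked for illustration.

```python
from fractions import Fraction

def risk_after_argument(prior_risk, internal_risk, p_sound):
    # Assumption: a flawed argument leaves the prior risk untouched.
    return p_sound * internal_risk + (1 - p_sound) * prior_risk

prior_risk = Fraction(1, 10**6)      # hypothetical estimate before the argument
internal_risk = Fraction(1, 10**20)  # risk according to the argument itself
p_sound = Fraction(99, 100)          # 99% chance the argument is flawless

posterior_risk = risk_after_argument(prior_risk, internal_risk, p_sound)
reduction_factor = prior_risk / posterior_risk

print(float(posterior_risk))
print(float(reduction_factor))  # just under 100, nowhere near 10**14
```

However tiny the internal figure gets, the 1% chance of a flaw keeps the reduction factor below 100.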
But it’s hard for me to be properly outraged about this, since the LHC did not destroy the world. A better example might be the following, taken from an online discussion of creationism[4] and apparently based off of something by Fred Hoyle:
In order for a single cell to live, all of the parts of the cell must be assembled before life starts. This involves 60,000 proteins that are assembled in roughly 100 different combinations. The probability that these complex groupings of proteins could have happened just by chance is extremely small. It is about 1 chance in 10 to the 4,478,296 power. The probability of a living cell being assembled just by chance is so small, that you may as well consider it to be impossible. This means that the probability that the living cell is created by an intelligent creator, that designed it, is extremely large. The probability that God created the living cell is 10 to the 4,478,296 power to 1.
Note that someone just gave a confidence level of 10^4478296 to one and was wrong. This is the sort of thing that should never ever happen. This is possibly the most wrong anyone has ever been.
It is hard to say in words exactly how wrong this is. Saying “This person would be willing to bet the entire world GDP for a thousand years if evolution were true against a one in one million chance of receiving a single penny if creationism were true” doesn’t even begin to cover it: a mere 1/10^25 would suffice there. Saying “This person believes he could make one statement about an issue as difficult as the origin of cellular life per Planck interval, every Planck interval from the Big Bang to the present day, and not be wrong even once” only brings us to 1/10^61 or so. If the chance of getting Ganser’s Syndrome, the extraordinarily rare psychiatric condition that manifests in a compulsion to say false statements, is one in a hundred million, and the world’s top hundred thousand biologists all agree that evolution is true, then this person should preferentially believe it is more likely that all hundred thousand have simultaneously come down with Ganser’s Syndrome than that they are doing good biology.[5]
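Two of these comparisons are easy to check by working in powers of ten. The sketch below uses rough textbook values for the Planck time and the age of the universe, and it treats the hundred thousand Ganser’s cases as independent; both are assumptions made only for this back-of-the-envelope check.

```python
import math

PLANCK_TIME_S = 5.39e-44          # approximate Planck time in seconds
AGE_OF_UNIVERSE_YEARS = 13.8e9    # approximate age of the universe
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Number of Planck intervals since the Big Bang, as a power of ten.
planck_intervals = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR / PLANCK_TIME_S
log10_intervals = math.log10(planck_intervals)

# One in a hundred million is 10**-8; assuming independence, 100,000
# simultaneous cases multiply the probabilities, i.e. add the exponents.
LOG10_GANSER = -8
log10_all_ganser = 100_000 * LOG10_GANSER

# The creationist argument's claimed chance of being wrong, as a power of ten.
log10_creationist_error = -4_478_296

print(round(log10_intervals, 1))
print(log10_all_ganser, log10_creationist_error)
print(log10_all_ganser > log10_creationist_error)
```

Staying in logarithms avoids floating-point underflow, since numbers like 10^-4478296 cannot be represented directly as floats.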
This creationist’s flaw wasn’t mathematical; the math probably does return that number. The flaw was confusing the internal probability (that complex life would form completely at random in a way that can be represented with this particular algorithm) with the external probability (that life could form without God). He should have added a term representing the chance that his knockdown argument just didn’t apply.
Finally, consider the question of whether you can assign 100% certainty to a mathematical theorem for which a proof exists. Eliezer has already examined this issue and come out against it (citing as an example this story of Peter de Blanc’s). In fact, this is just the specific case of differentiating internal versus external probability when internal probability is equal to 100%. Now your probability that the theorem is false is entirely based on the probability that you’ve made some mistake.
The many mathematical proofs that were later overturned provide practical justification for this mindset.
This is not a fully general argument against giving very high levels of confidence: very complex situations and situations with many exclusive possible outcomes (like the lottery example) may still make it to the 1/10^20 level, albeit probably not the 1/10^4478296. But in other sorts of cases, giving a very high level of confidence requires a check that you’re not confusing the probability inside one argument with the probability of the question as a whole.
Footnotes
1. Although technically we know we’re talking about an incumbent, who typically has a much higher chance, around 90% in Congress.
2. A particularly devious objection might be “What if the lottery commissioner, in a fit of political correctness, decides that ‘everyone is a winner’ and splits the jackpot a billion ways?” If this would satisfy your criteria for “winning the lottery”, then this mere possibility should indeed move your probability upward. In fact, since there is probably greater than a one in one billion chance of this happening, the majority of your probability for Bob winning the lottery should concentrate here!
3. Giddings and Mangano then go on to re-prove the original “won’t cause an apocalypse” argument using a more complicated method involving white dwarf stars.
4. While searching creationist websites for the half-remembered argument I was looking for, I found what may be my new favorite quote: “Mathematicians generally agree that, statistically, any odds beyond 1 in 10 to the 50th have a zero probability of ever happening.”
5. I’m a little worried that five years from now I’ll see this quoted on some creationist website as an actual argument.
That reminds me of one of my favourites, from a pro-abstinence blog:
In Terry Pratchett’s Discworld series, it is a law of narrative causality that 1 in a million chances work out 9 times out of 10. Some characters once made a difficult thing they were attempting artificially harder, to try to make the probability exactly 1 in a million and invoke this trope.
That’s pretty awesome. (He’s already on my list of authors to read if I ever acquire an attention span sufficient for novels.)
It’s worth pointing out that two of his books (Hogfather and Color of Magic) have been made into movies. I’m not sure how hard they are to find, but I know Netflix has at least one of them. I’ve only seen Hogfather, but I thought it was a pretty good adaptation of the book :)
Pratchett is near the top of my to-read list, but I don’t know which book(s) to start with. Color of Magic was the first in the series, but it doesn’t seem like the kind of series that needs to be read in order. Mort, Hogfather, Wee Free Men, and Witches Abroad have all been mentioned favorably on LW, so maybe one of those? Recommendations?
I started with Color of Magic, but didn’t really get into it much. It was fine writing, but nothing very special. Then I read some later works and realised that he got much better. As there’s no reason to read them in order (as you say), this means that you probably shouldn’t!
(My favourite is Night Watch, but I’ve still only read a few, so you should probably ignore that.)
This question comes up a lot! A fan has come up with a very sensible and helpful chart, in many languages no less! http://www.lspace.org/books/reading-order-guides/
There are more connections between the books than are laid out in that chart though. The Last Hero, for instance, features members of the Night Watch cast about as strongly as the Wizards cast, and other books have minor connections to each other that are simply inconvenient to draw out because they’re far away from each other on the chart.
Rincewind’s stories are pretty much all in the vein of fantasy novel satire, while later books tended more towards social commentary in a humorous fantasy setting, so they do end up being a bit disconnected from the books that come later in the series.
Thanks! (distributed also to the other replies)
I think I’ll start with Mort and then go from there.
This confirms my vague feeling that Rincewind’s stuff is not particularly well connected to the rest of Discworld.
I went to a talk by Pratchett and he pretty much admitted the same thing. He suggested starting with book 6 or so. :)
I’ve read all of them except the Tiffany Aching ones, and Night Watch is still my favorite.
I think it’s better if you’re already well familiar with the Night Watch books and the setting of Ankh Morpork before you read it though.
Read the Tiffany Aching ones. They’re not just for children, but especially read them if you have or ever expect to have children. These are the stories on which baby rationalists ought to be raised.
I have read the first three since I left that comment (so all but I Shall Wear Midnight), and I thought they were at least pretty good, as all the Discworld books were, but as far as younger-readers’ Discworld books go, I rate The Amazing Maurice and His Educated Rodents more highly.
Same here. I never finished CoM, but became hooked after picking up Equal Rites.
I started by reading a few from around the middle in no particular order (starting with Soul Music), then bought the whole series and read them from the start. Reading them out of order is not much of a problem: even books that are part of the same series with the same characters have stories that stand wholly on their own.
The series:
The Rincewind series: the first Discworld books are in it, but it’s not the best; I’d recommend the others first. It’s probably best to read the books in this series in order.
The Witches series: starts with Equal Rites, but starting with Wyrd Sisters is fine (Equal Rites is one of the early books, and not very heavily linked to the rest). I’d recommend reading Wyrd Sisters ⇒ Witches Abroad ⇒ Lords and Ladies etc. in order. Probably my favorite series.
The city watch series: starts with Guards! Guards!, I’d recommend reading them in order. A pretty good series.
The Death series: has several books, but they aren’t heavily linked to one another, except maybe towards the end (I’d recommend reading Soul Music before Hogfather).
Standalone books: Small Gods, Moving Pictures, Pyramids … not part of any series, but quite good.
Moist von Lipvig—Going Postal, Making Money. Don’t miss them.
Thief of Time (standalone but loosely related to the Death books) is a favourite of mine too.
Do you ever go to movies?
Once in a while.
In my experience reading a (good) novel requires little, if any, more attention than watching a movie. I do read unusually quickly, but I honestly find it almost easier to be wrapped up in a good book than to be invested in a movie, especially if it’s a book as good as one of Pratchett’s. You should definitely give him a try.
One thing I find is that books require a bit of effort to get into, whereas movies force themselves upon you.
I find almost the reverse. Movies seem to be significantly more likely to have weird errors or other elements that break my suspension of disbelief, whereas in books the fact that I’m imagining most of the events allows me to kind of filter anything that seems too implausible into a more logical narrative.
Interesting. I find it’s much easier to suspend disbelief and make excuses for movies, since I know that they only have two hours to work with. It’s much easier to convince myself that the explanation is correct, and they just didn’t have time to go into it on screen :)
Try and do that with Rudy Rucker, I dare you. I only endured the first thirty or so pages of his “Postsingular” before all that was left of my suspension of disbelief were sad ashes and smoke started to come out of my ears.
EDIT: Although, to be fair, I haven’t tried his other books. I hear the ‘ware’ trilogy is quite good. I can’t shake off the distaste after trying “Postsingular”, though.
I would say this is true for engaging novels. This is not precisely the same set as good novels, though there is certainly much overlap. Discworld, I think, is even more representative of the former set than the latter, though, so it certainly should apply here—though no doubt the stickiness varies from person to person.
They are only admitting their poor calibration.
Heh.
Though, admitting poor calibration that way is like saying “I incorrectly believe X to be true; it’s actually Y”.
I was in some discussion at SIAI once and made an estimate that ended up being off by something like three hundred trillion orders of magnitude. (Something about giant look-up tables, but still.) Anyone outdo me?
Wow. The worst I’ve ever done is giving 9 orders of magnitude inside my 90% confidence interval for the velocity of the earth and being wrong. (It turns out the earth doesn’t move faster than the speed of light!)
Surely declaring “x is impossible”, before witnessing x, would be the most wrong you could be?
I take more issue with the people who incredulously shout “That’s impossible!” after witnessing x.
I don’t. You can witness a magician, e.g., violating conservation of matter, and still declare “that’s impossible!”
Basically, you’re stating that you don’t believe that the signals your senses reported to you are accurate.
The colloquial meaning of “x is impossible” is probably closer to “x has probability <0.1%” than “x has probability 0”
This is good, but I feel like we’d better represent human psychology if we said:
Most people don’t make a distinction between the concepts of “x has probability <0.1%” and “x is impossible”.
I say this because I think there’s an important difference between the times when people have a precise meaning in mind, which they’ve expressed poorly, and the times when people’s actual concepts are vague and fuzzy. (Often, people don’t realise how fuzzy their concepts are).
Probability zero and impossibility are not exactly the same thing. A possible event can have the probability 0. But an impossible event has the probability 0.
You are referring to the mathematical definition of impossibility, and I am well aware of the fact that it is different from probability zero (flipping a coin forever without getting tails has probability zero but is not mathematically impossible). My point is that neither of those is actually what most people (as opposed to mathematicians and philosophers) mean by impossible.
Probabilities of 1 and 0 are considered rule violations and discarded.
What should we take for P(X|X) then?
And then what can I put you down for the probability that Bayes’ Theorem is actually false? (I mean the theorem itself, not any particular deployment of it in an argument.)
He’s addressed that:
and then
Ah, thanks for the pointer. Someone’s tried to answer the question about the reliability of Bayes’ Theorem itself too, I see. But I’m afraid I’m going to have to pass on this, because I don’t see how calling something a syntactic elimination rule instead of a law of logic saves you from incoherence.
I’d be interested to hear your thoughts on why you believe EY is incoherent? I thought that what EY said makes sense. Is the probability of a tautology being true 1? You might think that it is true by definition, but what if the concept is not even wrong, can you absolutely rule out that possibility? Your sense of truth by definition might be mistaken in the same way as the experience of a Déjà vu. The experience is real, but you’re mistaken about its subject matter. In other words, you might be mistaken about your internal coherence and therefore assign a probability to something that was never there in the first place. This might be on-topic:
Nothing has a probability of 1, including this sentence, as doubt always remains, or does it? It’s confusing for sure, someone with enough intellectual horsepower should write a post on it.
Did I accuse someone of being incoherent? I didn’t mean to do that, I only meant to accuse myself of not being able to follow the distinction between a rule of logic (oh, take the Rule of Detachment for instance) and a syntactic elimination rule. In virtue of what do the latter escape the quantum of sceptical doubt that we should apply to other tautologies? I think there clearly is a distinction between believing a rule of logic is reliable for a particular domain, and knowing with the same confidence that a particular instance of its application has been correctly executed. But I can’t tell from the discussion if that’s what’s at play here, or if it is, whether it’s being deployed in a manner careful enough to avoid incoherence. I just can’t tell yet. For instance,
I don’t know what this amounts to without following a more detailed example.
It all seems to be somewhat vaguely along the lines of what Hartry Field says in his Locke lectures about rational revisability of the rules of logic and/or epistemic principles; his arguments are much more detailed, but I confess I have difficulty following him too.
Although I’m not sure exactly what to say about it, there’s some kind of connection here to Created Already in Motion and The Bedrock of Fairness: in each case you have an infinite regress of asking for a logical axiom justifying the acceptance of a logical axiom justifying the acceptance of a logical axiom, asking for fair treatment of people’s ideas of fair treatment of people’s ideas of fair treatment, or asking for the probability that a probability of a ratio of probabilities being correct is correct.
Is the probability for the correctness of this statement—smaller than 1?
Obviously
So, you say, it’s possible it isn’t true?
I would say that according to my model (i.e. inside the argument (in this post’s terminology)), it’s not possible that that isn’t true, but that I assign greater than 0% credence to the outside-the-argument possibility that I’m wrong about what’s possible.
(A few relevant posts: How to Convince Me That 2 + 2 = 3; But There’s Still A Chance, Right?; The Fallacy of Gray)
You can think for a moment, that 1024*10224=1048578. You can make an honest arithmetic mistake. More probable for bigger numbers, less probable for smaller. Very, very small for 2 + 2 and such. But I wouldn’t say it’s zero, and also not that the 0 is always excluded with the probability 1.
Exclusion of 0 and 1 implies, that this exclusion is not 100% certain. Kind of a probabilistic modus tollens.
What is it that is true? (Just to clarify..)
This:
Discarding 0 and 1 from the game implies, that we have a positive probability—that they are wrongly excluded.
Indeed
I get quite annoyed when this is treated as a refutation of the argument that absolute truth doesn’t exist. Acknowledging that there is some chance that a position is false does not disprove it, any more than the fact that you might win the lottery means that you will.
Someone claiming that absolute truths don’t exist has no right to be absolutely certain of his own claim. This of course has no bearing on the actual truth of his claim, nor the truth of the supposed absolute truth he’s trying to refute by a fully generic argument against absolute truths.
I rather prefer Eliezer’s version, that confidence of 2^n to 1 requires [n - log base 2 of prior odds] bits of evidence to be justified. Not only does this essentially forbid absolute certainty (you’d need infinite evidence to justify absolute certainty), but it is actually useful for real life.
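To make the bits-of-evidence rule concrete, here is a minimal sketch. The function name and the example odds are mine, chosen only for illustration; prior odds are expressed as a ratio in favor of the claim (1.0 means even odds).

```python
import math

def bits_of_evidence_needed(n, prior_odds):
    """Bits needed to justify posterior odds of 2**n to 1, given prior odds.

    prior_odds is the odds ratio in favor of the claim before any evidence;
    this is just n - log2(prior odds) written out.
    """
    return n - math.log2(prior_odds)

# Starting from even odds, a million-to-one confidence (about 2**20 to 1)
# needs 20 bits of evidence.
even_start = bits_of_evidence_needed(20, 1.0)

# Starting from 1024-to-1 against, the same confidence needs 10 more bits.
skeptical_start = bits_of_evidence_needed(20, 1 / 1024)

print(even_start, skeptical_start)
```

As n grows without bound the required evidence does too, which is the sense in which absolute certainty would need infinite evidence.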
That’s quite a lot. Can you tell us what the estimate was?
Well there are billions of people who believe things with p=1… things like “God exists.”
Wow. Eliminating all “zero” probability estimates as illegal under the game rules, it’s possible that you singlehandedly dragged down the average Bayesian score of the human species by a noticeable decrement.
I’m a bit irked by the continued persistence of “LHC might destroy the world” noise. Given no evidence, the prior probability that microscopic black holes can form at all, across all possible systems of physics, is extremely small. The same theory (String Theory[1]) that has led us to suggest that microscopic black holes might form at all is also quite adamant that all black holes evaporate, and equally adamant that microscopic ones evaporate faster than larger ones by a precise factor of the mass ratio cubed. If we think the theory is talking complete nonsense, then the posterior probability of an LHC disaster goes down, because we favor the ignorant prior of a universe where microscopic black holes don’t exist at all.
Thus, the “LHC might destroy the world” noise boils down to the possibility that (A) there is some mathematically consistent post-GR, microscopic-black-hole-predicting theory that has massively slower evaporation, (B) this unnamed and possibly non-existent theory is less Kolmogorov-complex and hence more posterior-probable than the one that scientists are currently using[2], and (C) scientists have completely overlooked this unnamed and possibly non-existent theory for decades, strongly suggesting that it has a large Levenshtein distance from the currently favored theory. The simultaneous satisfaction of these three criteria seems… pretty f-ing unlikely, since each tends to reject the others. A/B: it’s hard to imagine a theory that predicts post-GR physics with LHC-scale microscopic black holes that’s more Kolmogorov-simple than String Theory, which can actually be specified pretty damn compactly. B/C: people already have explored the Kolmogorov-simple space of post-Newtonian theories pretty heavily, and even the simple post-GR theories are pretty well explored, making it unlikely that even a theory with large edit distance from either ST or SM+GR has been overlooked. C/A: it seems like a hell of a coincidence that a large-edit-distance theory, i.e. one extremely dissimilar to ST, would just happen to also predict the formation of LHC-scale microscopic black holes, then go on to predict that they’re stable on the order of hours or more by throwing out the mass-cubed rule[3], then go on to explain why we don’t see them by the billions despite their claimed stability. (If the ones from cosmic rays are so fast that the resulting black holes zip through Earth, why haven’t they eaten Jupiter, the Sun, or other nearby stars yet? Bombardment by cosmic rays is not unique to Earth, and there are plenty of celestial bodies that would be heavy enough to capture the products.)
[1] It’s worth noting that our best theory, the Standard Model with General Relativity, does not predict microscopic black holes at LHC energies. Only String Theory does: ST’s 11-dimensional compactified space is supposed to suddenly decompactify at high energy scales, making gravity much more powerful at small scales than GR predicts, thus allowing black hole formation at abnormally low energies, i.e. those accessible to LHC. And naked GR (minus the SM) doesn’t predict microscopic black holes. At all. Instead, naked GR only predicts supernova-sized black holes and larger.
[2] The biggest pain of SM+GR is that, even though we’re pretty damn sure that that train wreck can’t be right, we haven’t been able to find any disconfirming data that would lead the way to a better theory. This means that, if the correct theory were more Kolmogorov-complex than SM+GR, then we would still be forced as rationalists to trust SM+GR over the correct theory, because there wouldn’t be enough Bayesian evidence to discriminate the complex-but-correct theory from the countless complex-but-wrong theories. Thus, if we are to be convinced by some alternative to SM+GR, either that alternative must be Kolmogorov-simpler (like String Theory, if that pans out), or that alternative must suggest a clear experiment that leads to a direct disconfirmation of SM+GR. (The more-complex alternative must also somehow attract our attention, and also hint that it’s worth our time to calculate what the clear experiment would be. Simple theories get eyeballs, but there are lots of more-complex theories that we never bother to ponder because that solution-space doesn’t look like it’s worth our time.)
[3] Even if they were stable on the order of seconds to minutes, they wouldn’t destroy the Earth: the resulting black holes would be smaller than an atom, in fact smaller than a proton, and since atoms are mostly empty space the black hole would sail through atoms with low probability of collision. I recall that someone familiar with the physics did the math and calculated that an LHC-sized black hole could swing like a pendulum through the Earth at least a hundred times before gobbling up even a single proton, and the same calculation showed it would take over 100 years before the black hole grew large enough to start collapsing the Earth due to tidal forces, assuming zero evaporation. Keep in mind that the relevant computation, t = (5120 × π × G^2 × M^3) ÷ (ℏ × c^4), shows that a 1-second evaporation time corresponds to a mass of 2.28e8 grams[3a], i.e. about 250 tons, and the resulting Schwarzschild radius, r = 2 × G × M ÷ c^2, is 3.39e-22 meters[3b], or about 0.4 millionths of a proton radius[3c]. That one-second-duration black hole, despite being tiny, is vastly more massive than the ones that might be created by the LHC: about 10^28 times more massive, in fact[3d]. (FWIW, the Schwarzschild radius calculation relies only on GR, with no quantum stuff, while the time-to-evaporate calculation depends on some basic QM as well. String Theory and the Standard Model both leave that particular bit of QM untouched.)
[3a] Google Calculator: “(((1 s) h c^4) / (2pi 5120pi G^2)) ^ (1/3) in grams”
[3b] Google Calculator: “2 G 2.28e8 grams / c^2 in meters”
[3c] Google Calculator: “3.3856695e-22 m / 0.8768 femtometers”, where 0.8768 femtometers is the experimentally accepted charge radius of a proton
[3d] Google Calculator: “(2.28e8 g * c^2) / 14 TeV”, where 14 TeV is the LHC’s maximum energy (7 TeV per beam in a head-on proton-proton collision)
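For anyone who would rather not trust the calculator queries in [3a] to [3d], here is a minimal Python sketch that redoes the same arithmetic. The constants are rounded SI values typed in by hand, and the function names are made up for illustration, so treat the output as a sanity check rather than a precise result.

```python
import math

# Rounded physical constants in SI units (hand-entered, not authoritative).
G = 6.674e-11               # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34           # reduced Planck constant, J s
C = 2.998e8                 # speed of light, m/s
EV = 1.602e-19              # joules per electronvolt
PROTON_RADIUS = 0.8768e-15  # m, the charge radius quoted in [3c]


def mass_for_evaporation_time(t_seconds):
    """Invert t = 5120*pi*G^2*M^3 / (hbar*c^4) to get the mass M in kg."""
    return (t_seconds * HBAR * C**4 / (5120 * math.pi * G**2)) ** (1 / 3)


def schwarzschild_radius(mass_kg):
    """r = 2*G*M / c^2, in meters."""
    return 2 * G * mass_kg / C**2


mass = mass_for_evaporation_time(1.0)      # [3a]
radius = schwarzschild_radius(mass)        # [3b]
radius_ratio = radius / PROTON_RADIUS      # [3c]
mass_ratio = mass * C**2 / (14e12 * EV)    # [3d], rest energy vs 14 TeV

print(f"mass for 1 s lifetime: {mass * 1000:.3g} g")
print(f"Schwarzschild radius:  {radius:.3g} m")
print(f"radius / proton radius: {radius_ratio:.2g}")
print(f"mass / LHC collision energy: {mass_ratio:.2g}")
```

The four printed values should land close to the figures quoted in footnote [3].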
I wonder how the anti-LHC arguments on this site might look if we substitute cryptography for the LHC. Mathematicians might say the idea of mathematics destroying the world is ridiculous, but after all we have to trust that all mathematicians announcing opinions on the subject are sane, and we know the number of insane mathematicians in general is greater than zero. And anyway, their arguments would (almost) certainly involve assuming the probability of mathematics destroying the world is 0, so should obviously be disregarded. Thus, the danger of running OpenSSH needs to be calculated as an existential risk taking in our future possible light cone. (Though handily, this would be a spectacular tour de force against DRM.) For an encore, we need someone to calculate the existential risk of getting up in the morning to go to work. Also, did switching on the LHC send back tachyons to cause 9/11? I think we need to be told.
I reject Solomonoff induction as the correct technical formulation of Occam’s razor, and as an adequate foundation for Bayesian epistemology.
Looking back over ancient posts, I saw this. I upvoted it earlier, and am leaving that, but I’d like to quibble with one thing:
I think the bigger issue would be ‘this unnamed and possibly non-existent theory is an accurate description of reality’. If it’s more Kolmogorov-complex, so be it, that’s the universe’s prerogative. Increasing the Kolmogorov complexity decreases only our prior for it; it won’t change whether it is the case.
I’m not sure why one might be tempted to make this response. Is the idea that, when making any calculation at all, one is equally likely to get a number that is too big as one that is too small? But then, that’s before you have looked at the number.
Yet another counter-response is that even if the response were true, the false model could be much too high, but it can only be slightly too low, since 1-10^-9 is quite close to 1.
This is contingent upon the scale you have chosen for representing the answer. If you measure chances in log odds, they range from negative infinity to positive infinity, so any answer you come up with could have an unbounded error in either direction. See https://www.lesswrong.com/posts/QGkYCwyC7wTDyt3yT/0-and-1-are-not-probabilities
But I’m uncertain why this would be significant anyway? An asymmetry of maximum error does not necessarily imply an asymmetry of expected error.
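To make the scale point concrete, here is a small Python sketch of the log-odds conversion. The function names and the shift of 5 units are arbitrary choices for illustration; the only claim is that on the log-odds scale an estimate of 1-10^-9 has room to be wrong in both directions.

```python
import math


def log_odds(p):
    """Natural-log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def probability(log_odds_value):
    """Inverse of log_odds (the logistic function)."""
    return 1 / (1 + math.exp(-log_odds_value))


model_estimate = 1 - 1e-9
center = log_odds(model_estimate)

# Shift the estimate by the same amount in each direction on the log-odds scale.
too_low = probability(center + 5)   # the true chance was even higher
too_high = probability(center - 5)  # the true chance was lower

print(f"log odds of the model estimate: {center:.2f}")
print(f"1 - p if the model was too low:  {1 - too_low:.3g}")
print(f"1 - p if the model was too high: {1 - too_high:.3g}")
```

On the probability scale the first shift looks negligible and the second looks large, even though they are the same size in log odds.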
Why does looking at the number matter?
If you have a prior expectation about what the number is likely to be, then you might reason that the true answer is likely to be closer to your prior than farther from it. But that’s essentially the answer Scott already gave in the essay—that any argument is pushing us away from our prior, and our confidence in the argument determines how far it is able to push us.
Your phrasing seems to imply you believe you are giving a different reason for thinking that the expected error is asymmetrical than the one Scott gave. If that is the case, then I don’t understand your implied reasoning.
First, great post. Second, general injunctions against giving very low probabilities to things seem to be taken by many casual readers as endorsements of the (bad) behavior “privilege the hypothesis”, e.g. moving the probability that God exists from very small to moderately small. That’s not right, but I don’t have excellent arguments for why it’s not right. I’d love it if you wrote an article on choosing good priors.
Cosma Shalizi has done some technical work that seems (to my incompetent eye) to be relevant:
http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.ejs/1256822130&page=record
That is, he takes Bayesian updating, which requires modeling the world, and answers the question ‘when would it be okay to use Bayesian updating, even though we know the model is definitely wrong—e.g. too simple?’. (Of course, making your model “not obviously wrong” by adding complexity isn’t a solution.)
I am still confused about how small the probability I should use in the God question is. I understand the argument about privileging the hypothesis and about intelligent beings being very complex and fantastically unlikely.
But I also feel that if I tried to use an argument at least that subtle, when applied to something I am at least as confused about as how ontologically complex a first cause should be, to disprove things at least as widely believed as religion, a million times, I would be wrong at least once.
See Advancing Certainty. The fact that this statement sounds comfortably modest does not exempt it from the scrutiny of the Fundamental Question of Rationality (why do you believe what you believe?). I respectfully submit that if the answer is “because I have been wrong before, where I was equally confident, in previous eras of my life when I wasn’t using arguments this powerful (they just felt powerful to me at the time)”, that doesn’t suffice—for the same reason that the Lord Kelvin argument doesn’t suffice to show that arguments from physics can’t be trusted (unless you don’t think physics has learned anything since Kelvin).
I’ve got to admit I disagree with a lot of Advancing Certainty. The proper reference class for a modern physicist who is well acquainted with the mistakes of Lord Kelvin and won’t do them again is “past scientists who were well acquainted with the mistakes of their predecessors and plan not to do them again”, which I imagine has less than a hundred percent success rate and which might have included Kelvin.
It would be a useful exercise to see whether the most rational physicists of 1950 have more successful predictions as of 2000 than the most rational physicists of 1850 did as of 1900. It wouldn’t surprise me if this were true; if so, the physicists of 2000 could justly put themselves in a new reference class and guess they will be even more successful as of 2050 than the 1950ers were in 2000. But if the success rate after fifty years remains constant, I wouldn’t want to say “Yeah, well, we’ve probably solved all those problems now, so we’ll do better”.
Do you actually disagree with any particular claim in Advancing Certainty, or does it just seem “off” to you in its emphasis? Because when I read your post, I felt myself “disagreeing” (and panicking at the rapid upvoting), but reflection revealed that I was really having something more like an ADBOC reaction. It felt to me that the intent of your post was to say “Boo confident probabilities!”, while I tend to be on the side of “Yay confident probabilities!”—not because I’m in favor of overconfidence, but rather because I think many worries about overconfidence here tend to be ill-founded (I suppose I’m something of a third-leveler on this issue.)
And indeed, when you see people complaining about overconfidence on LW, it’s not usually because someone thinks that some political candidate has a 0.999999999 chance of winning an election; almost nobody here would think that a reasonable estimate. Instead, what you get is people saying that 0.0000000001 is too low a probability that God exists—on the basis of nothing else than general worry about human overconfidence.
I think my anti-anti-overconfidence vigilance started when I realized I had been socially intimidated into backing off from my estimate of 0.001 in the Amanda Knox case, when in fact that was and remains an entirely reasonable number given my detailed knowledge of the case. The mistake I made was to present this number as if it were something that participants in my survey should have arrived at from a few minutes of reading. Those states—the ones that survey participants were in, with reference classes like “highly controversial conviction with very plausible defense arguments”—are what probabilities like 0.1 or 0.3 are for. My state, on the other hand, was more like “highly confident inside-view conclusion bolstered by LW survey results decisively on the same side of 50%”.
But this isn’t what the overconfidence-hawks argued. What they said, in essence, was that 0.001 was just somehow “inherently” too confident. Only “irrational” people wear the attire of “P(X) = 0.001”; We Here, by contrast, are Aware Of Biases Like Overconfidence, and only give Measured, Calm, Reasonable Probabilities.
That is the mistake I want to fight, now that I have the courage to do so. Though I can’t find much to literally disagree about in your post, it unfortunately feels to me like ammunition for the enemy.
I definitely did have the “ammunition for the enemy” feeling about your post, and the “belief attire” point is a good one, but I think the broad emotional disagreement does express itself in a few specific claims:
Even if you were to control for getting tired and hungry and so on, even if you were to load your intelligence into a computer and have it do the hard work, I still don’t think you could judge a thousand such trials and be wrong only once. I admit this may not be as real a disagreement as I’m thinking, because it may be a confusion on what sort of reference class we should use to pick trials for you.
I think we might disagree on the Lord Kelvin claim. I think I would predict more of today’s physical theories are wrong than you would.
I think my probability that God exists would be several orders of magnitude higher than yours, even though I think you probably know about the same number of good arguments on the issue as I do.
Maybe our disagreement can be resolved empirically—if we were to do enough problems where we gave confidence levels on questions like “The area of Canada is greater than the area of the Mediterranean Sea” and use log odds scoring we might find one of us doing significantly better than the other—although we would have to do quite a few to close off my possible argument that we just didn’t hit that one “black swan” question on which you’d say you’re one in a million confident and then get it wrong. Would you agree that this would get to the heart of our disagreement, or do you think it revolves solely around more confusing philosophical questions?
(I took a test like that yesterday to test something and I came out overconfident, missing 2⁄10 questions at the 96% probability level. I don’t know how that translates to more real-world questions and higher confidence levels, but it sure makes me reluctant to say I’m chronically underconfident)
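As a rough check on how surprising that result is, here is a Python sketch of the binomial tail. It assumes all ten questions were answered at the 96% level and were independent, which the comment does not actually state, and the function name is made up.

```python
from math import comb


def prob_at_least_k_misses(n, k, accuracy):
    """P(at least k wrong answers out of n) if each answer is right with
    probability `accuracy`, independently of the others."""
    miss = 1 - accuracy
    return sum(
        comb(n, j) * miss**j * accuracy ** (n - j)
        for j in range(k, n + 1)
    )


p_two_or_more = prob_at_least_k_misses(10, 2, 0.96)
print(f"P(2+ misses in 10 at 96%): {p_two_or_more:.3f}")
```

Under those assumptions a perfectly calibrated person would miss two or more only about 6% of the time, so the result is suggestive of overconfidence but far from conclusive on its own.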
When I first saw this, I agreed with it. But now I don’t, partly because of the story (which I don’t have a link to, but it was linked to from LW somewhere) about someone who would bet they knew whether or not a number was a prime. This continued until they made a mistake (doing it mentally), and then they lost.
If they had a calculator, could they go up to the 1000th odd number and be wrong at most once? I’m pretty sure they could, actually. And so the question isn’t “can you judge 1000 trials and only get one wrong?” but “can you judge 1000 obvious trials and only get one wrong?”, or, more appropriately, “can you judge 1000 trials as either ‘obvious’ or ‘contested’ and be wrong at most once?”. Because originally I was imagining being a normal trial judge, but a normal trial judge has to deal with difficult cases. Ones like the Amanda Knox case (are/should be) rare. I’m pretty confident that once you put in a reasonable amount of effort (however much komponisto did for this case), you can tell whether the case is one you can be confident about or one you can’t, assuming you’re carefully thinking about what would make it not an open-and-shut case.
Added to Absolute certainty LW wiki page.
This raises the question: Should scientific journals adjust the p-value that they require from an experiment, to be no larger than the probability (found empirically) that a peer-reviewed article contains a factual, logical, methodological, experimental, or typographical error?
The meta-science part would change with time, e.g. how many people read the article and found no mistakes. Doesn’t seem to mix well with a fixed result.
Maybe some separate, online thing that just reported on the probability of claims could handle the meta-science.
I don’t think the lottery is an exception. There’s a chance that you misheard and they said “million”, not “billion”.
There are really two claims here. The first one—that if some guy on the Internet has a model predicting X with 99.99% certainty, then you should assign less probability to X, absent other evidence—seems interesting, but relatively easy to accept. I’m pretty sure I’ve been reasoning this way in the past.
The second claim is exactly the same, but applied to oneself. “If I have come up with an argument that predicts X with 99.99% certainty, I should be less than 99.99% certain of X.” This is not something that people do by default. I doubt that I do it unless prompted. Great post!
Stylistic nitpick, though: things like “999,999,999 in a billion” are tricky to parse, especially when compared to “999,999,999,999 in a trillion” (which I initially read as approximately 1 in 1000 before counting the 9s) or “1/1 billion”. Counting the 9s is part of the problem; the other part is that the numerator is a number and the denominator is a word. What’s wrong with writing 99.99% and 99.9999%? These are different from the original values in the post, but still carry the argument, and are easier to read.
I personally find the best way to deal with such numbers is to talk about nines.
999,999,999 in a billion = 99.9 999 999% = 9 nines
999,999,999,999 in a trillion = 99.9 999 999 999% = 12 nines
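Counting nines is just taking minus log10 of the failure fraction. A minimal Python sketch, with a made-up function name:

```python
import math
from fractions import Fraction


def nines(successes, total):
    """Number of nines in successes/total, i.e. -log10 of the failure fraction.
    Fraction keeps the subtraction exact before converting to float."""
    failure = Fraction(total - successes, total)
    return -math.log10(float(failure))


print(nines(999_999_999, 10**9))       # the billion example
print(nines(999_999_999_999, 10**12))  # the trillion example
print(nines(9999, 10_000))             # 99.99%
```

This also gives sensible fractional answers for values that are not all nines, such as 99.5%.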
Only to the extent you didn’t trust in the statement other than because this model says it’s probably true. It could be that you already believe in the statement strongly, and so your external level of confidence should be higher than the model suggests, or the same, etc. Closer to the prior, in other words, and on strange questions intuitive priors can be quite extreme.
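One simple way to write down "closer to the prior" is as a mixture: with some probability the model is sound and you take its number, otherwise you fall back on what you believed anyway. The Python sketch below is only that simplification; the function name and the example numbers (99% trust, fallback beliefs of 0.5 and 0.9999) are assumptions picked for illustration.

```python
def external_confidence(internal_p, p_model_sound, fallback_p):
    """Mix the model's internal probability with a fallback belief,
    weighted by how much you trust the model."""
    return p_model_sound * internal_p + (1 - p_model_sound) * fallback_p


internal = 1 - 1e-9  # what the model says

# Model is your only evidence: fall back to an ignorant 50/50.
only_evidence = external_confidence(internal, 0.99, 0.5)

# You already believed the statement strongly for other reasons.
strong_prior = external_confidence(internal, 0.99, 0.9999)

print(f"model is the only evidence: {only_evidence:.6f}")
print(f"strong independent prior:   {strong_prior:.8f}")
```

In both cases the result sits between the model's figure and the fallback, and the trust parameter decides how far toward the model it moves.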
Another voting example: “Common sense and statistics”, Andrew Gelman:
* “Is it Rational to Vote? Five Types of Answer and a Suggestion”, Dowding 2005; fulltext: https://pdf.yt/d/5veaHe6F5j-k6oNQ / https://www.dropbox.com/s/fxgfa04hmpfntgh/2005-dowding.pdf / http://libgen.org/scimag/get.php?doi=10.1111%2Fj.1467-856x.2005.00188.x
** 1/1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 ; or to put it in context, ‘inside’ the argument, the claim is that you could hold a presidential election for every atom in the universe, and still not ever have a candidate win by one vote
*** From the comments:
What leads you to conclude that the chance of a vote margin of 1 is anywhere near 1/X of the chance of a vote margin of X? That’s not obvious, and your quote doesn’t try to derive it.
The easy-but-not-very-rigorous method is to use the principle of indifference, since there’s no particular reason a tie +/-1 should be much less likely than any other result.
If the election is balanced (the mean of the distribution is a tie), and the distribution looks anything like normal or binomial, 1/X is an underestimate of P(tie | election is within vote margin of X), since a tie is actually the most likely result. A tie +/- 1 is right next to the peak of the curve, so it should also be more than 1/X.
The 10^-90 figure cited in the paper was an example of how the calculation is very sensitive to slight imbalances—a 50⁄50 chance for each voter gave a .00006 chance of tie, while 49.9/50.1 gave the 10^-90. But knowing that an election will be very slightly imbalanced in one direction is a hard epistemic state to get to. Usually we just know something like “it’ll be close”, which could be modeled as a distribution over possible near-balances. If that distribution is not itself skewed either direction, then we again find that individual results near the mean should be at least 1/X.
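Here is a Python sketch of that sensitivity under a plain binomial model. The electorate size of 100 million and the 49% to 51% uncertainty window are my own assumptions, not taken from the paper, so the outputs are not expected to reproduce its exact figures, only the pattern: a fixed slight imbalance crushes the tie probability, while uncertainty about the balance brings it back to roughly 1/X.

```python
import math


def log_tie_probability(n_voters, p):
    """Natural log of P(exact tie) when each of n_voters (even) independently
    votes for candidate A with probability p. Works in log space via lgamma
    because the raw binomial coefficient is far too large for a float."""
    half = n_voters // 2
    log_coeff = math.lgamma(n_voters + 1) - 2 * math.lgamma(half + 1)
    return log_coeff + half * (math.log(p) + math.log(1 - p))


def tie_probability_uncertain_p(n_voters, low, high, steps=20_000):
    """Average the tie probability over p uniform on [low, high],
    using the midpoint rule. Far-from-balanced p values underflow to 0.0."""
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        p = low + (i + 0.5) * width
        total += math.exp(log_tie_probability(n_voters, p))
    return total / steps


N = 100_000_000  # hypothetical electorate size

balanced = math.exp(log_tie_probability(N, 0.5))
skewed_log10 = log_tie_probability(N, 0.499) / math.log(10)
uncertain = tie_probability_uncertain_p(N, 0.49, 0.51)

print(f"p = 0.5 exactly:        {balanced:.2g}")
print(f"p = 0.499 exactly:      10^{skewed_log10:.0f}")
print(f"p uniform in 0.49-0.51: {uncertain:.2g}")
```

With a 2-point window on 100 million voters there are about 2 million possible margins, and the averaged tie probability comes out close to one over that number.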
I recently wrote about why voting is a terrible idea and fell into the same error as Gelman (I assumed 49.9-50.1 a priori is conservative). Wes and gwern, thanks for correcting me! In fact, due to the Median Voter Theorem and with better and better polling and analysis we may assume that the distribution of voter distributions should have a peak at 50-50.
Of course, there are other great reasons not to vote (mainly to avoid “enlisting in the army” and letting your mind be killed). My suggestion is always to find a friend who is a credible threat to vote for the candidate you despise most and invite him to a beer on election day under the condition that neither of you will vote and you will not talk about politics. Thus, you maintain your friendship while cancelling out the votes. I call it the VAVA (voter anti-voter annihilation) principle.
“Politics is the mindkiller” is an argument for why people should avoid getting into political discussion on Lesswrong; it is not an argument against political involvement in general. Rationalists completely retreating from Politics would likely lower the sanity waterline as far as politics is concerned. Rationalists should get more involved in politics (but outside Lesswrong) of course.
That’s an important and non-obvious assumption to make.
So, in short, the 10^-90 figure is based on the explicit assumption that the election is not balanced?
That’s why the two methods you mention produce such wildly different figures; they base their calculations on different basic assumptions. One can argue back and forth about the validity or lack thereof of a given set of assumptions, of course...
Yes, I agree.
I’m much more sympathetic to the 10^-90 estimate in the paper than Gelman’s quote is; I think he misrepresents the authors in claiming they asserted that probability, when actually they offered it as a conditional (if you model it this way, then it’s 10^-90).
That is why I posted it as a comment on this particular post, after all. It’s clear that our subjective probability of casting a tie-breaking vote is going to be far less extreme than 10^-90 because our belief in the binomial idealization being correct puts a much less extreme bound on the tie-breaking vote probability than just taking 10^-90 at face value.
This one seems pretty relevant here:
http://arxiv.org/abs/0810.5515
Thanks, also added to the wiki page (which now seems to have two related but non-identical topics and probably needs to split).
It seems to me we can use the very high confidence levels and our understanding of the area in question to justify ignoring, heavily discounting, or accepting the arguments. We can do this on the basis that it takes a certain amount of evidence to actually produce accurate beliefs.
In the case of the creationist argument, a confidence level of 10^4,478,296 to 1 requires (really) roughly 12,000,000 bits of evidence. (10^4,000,000 =~ 2^12,000,000). The creationist presents these twelve million bits in the form of observations about cells. Now, using our knowledge of biology and cells (specifically, their self-assembling nature, restrictions on which proteins can combine, persistence and reproduction) we can confidently say that observations of cells do not provide 12,000,000 bits of evidence.
I’m not knowledgeable about biology, so I can’t say how many bits of evidence for a creator they provide in this manner but I gather it’s not many. We then adjust the argument’s strength down to that many bits of evidence. In effect, we are discounting the creationist argument for a lack of understanding, and discounting it by exactly how much it lacks understanding.
Applying this to the LHC argument: the argument specifies odds of 10^25 to 1, and the evidence is in the form of cosmic ray interactions not destroying the world. Based on our understanding of the physics involved (including our understanding of the results of Giddings and Mangano), we can say that cosmic ray interactions don’t provide quite as much evidence as the argument claims—but they provide most of the evidence they claimed to (even if we have to resort to our knowledge about white dwarf stars).
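The odds-to-bits conversion used in the last few paragraphs is a change of base: odds of 10^k to 1 need k × log2(10) bits. A minimal Python sketch (the function name is made up):

```python
import math


def bits_for_odds_exponent(power_of_ten):
    """Bits of evidence needed to reach odds of 10**power_of_ten to 1."""
    return power_of_ten * math.log2(10)


creationist_bits = bits_for_odds_exponent(4_478_296)
lhc_bits = bits_for_odds_exponent(25)

print(f"10^4,478,296 to 1: about {creationist_bits:,.0f} bits")
print(f"10^25 to 1:        about {lhc_bits:.0f} bits")
```

The twelve-million figure above comes from rounding 10 to 2^3; using log2(10) ≈ 3.32 gives a somewhat larger count of the same order of magnitude, which does not change the argument. The LHC odds, by contrast, need fewer than a hundred bits.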
I think we should prefer to downgrade the argument from our knowledge about the relevant area rather than hallucinations or simple error, because the prior for ‘our understanding is not complete’ is higher than ‘hallucinating’ and ‘simple error’ put together—and, to put it bluntly, in the social process of beliefs and arguments, most people are capable of completely dismissing arguments from hallucination and simple error, but are far less capable of dismissing arguments from incomplete knowledge.
As for inside / outside the argument, I found it helpful while reading the post to think of the outside view as a probability mass split between A and ~A, and then inside the argument tells us how much probability mass the argument steals for its side. This made it intuitive, in that if I encountered an argument that boasted of stealing all the probability mass for one side, and I could still conceive of the other side having some probability mass left over, I should distrust that argument.
As I recall, there was a paper in 2008 or 2009 about the LHC problem which concluded, in effect, that the small probabilities that an analysis was carried out incorrectly cumulatively put a high floor on how small a risk we could conclude the LHC posed.
Unfortunately, I can’t seem to refind it to see whether it’s a better version of this argument, so perhaps someone else remembers specifics.
Probing the Improbable
Looks like it, thanks:
Very interesting principle, and one which I will bear in mind since I very recently had a spectacular failure to apply it.
What happens if we apply this type of thinking to Bayesian probability in general? It seems like we have to assign a small amount of probability to the claim that all our estimates are wrong, and that our methods for coming to those estimates are irredeemably flawed. This seems problematic to me, since I have no idea how to treat this probability; we can’t use Bayesian updating on it, for obvious reasons.
Anyone have an idea about how to deal with this? Preferably a better idea than “just don’t think about it” which is my current strategy.
The issue is basically that the idealized Bayesian agent is assumed to be logically omniscient and humans clearly are not. It’s an open problem in the Bayesian epistemology literature.
There is an Eliezer post on just this subject. Anyone remember the title?
I’ve been looking through some of Eliezer’s posts on the subject and the closest I’ve come is “Where Recursive Justification Hits Bottom”, which looks at the problem that if you start with a sufficiently bad prior you will never attain accurate beliefs.
This is a slightly different problem to the one I pointed out (though no less serious, in fact I would say it’s more likely by several orders of magnitude). However, unlike that case, where there really is nothing you can do but try to self improve and hope you started above the cut-off point, my problem seems like it might have an actual solution, I just can’t see what it is.
You might be thinking of Ends Don’t Justify Means, which considers the question “What if I’m running on corrupt hardware”. It doesn’t actually say much about how a (would-be) rational agent ought to adjust its opinion-forming mechanisms to deal with that possibility, though.
[EDITED to remove superfluous apostrophe.]
I have been toying with an idea for this based on an analogy to evolutionary biology.
An organism attempts to adapt to the environment it finds itself in, up to the limits allowed by its genetic programming. But a population of organisms, all exposed to the same environment, can adapt even further, by mutating the genetic programming of some of its members, and then using natural selection to change the relative proportions of different genomes in the population.
Similarly, a Bayesian attempts to adjust his belief probabilities according to the evidence he is exposed to, up to the limits allowed by his system of core assumptions and priors. But a population of Bayesians, all exposed to the same evidence, can adjust even further—by mutating priors and core beliefs, and then using a selection process to extinguish those belief systems that don’t work well in practice and to replicate variants that do perform well.
Now, imagine that this population of Bayesians exists within the head of a single rational agent (well, almost rational) and that decision making is done by some kind of proportional voting scheme (with neural-net-like back-feedback).
In this scheme, assigning probabilities of 0 or 1 to propositions is OK for a member of this Bayesian population. If that assignment is never refuted, then there is some efficiency in removing the epsilons from the calculations. However, such a sub-agent risks being extinguished should contradictory evidence ever arise.
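A toy Python sketch of the scheme, to show the mechanics: three sub-agents predict coin flips, and each one's voting weight is multiplied by the probability it gave to what actually happened. Everything here (the class names, the coin-flip setting, the particular priors) is invented for illustration, and the weighting rule is just one possible reading of "selection".

```python
class BetaAgent:
    """Sub-agent with a Beta prior over P(heads), stored as pseudo-counts."""

    def __init__(self, name, heads, tails):
        self.name, self.heads, self.tails = name, heads, tails

    def predict_heads(self):
        return self.heads / (self.heads + self.tails)

    def update(self, is_heads):
        if is_heads:
            self.heads += 1
        else:
            self.tails += 1


class DogmaticAgent:
    """Sub-agent that assigns probability exactly 1 to heads and never updates."""

    name = "dogmatic"

    def predict_heads(self):
        return 1.0

    def update(self, is_heads):
        pass


def make_agents():
    return [DogmaticAgent(), BetaAgent("uniform", 1, 1), BetaAgent("optimist", 9, 1)]


def run_population(agents, observations):
    """Return normalized voting weights after scoring every agent on each flip."""
    weights = {agent.name: 1 / len(agents) for agent in agents}
    for is_heads in observations:
        for agent in agents:
            p = agent.predict_heads()
            weights[agent.name] *= p if is_heads else 1 - p
            agent.update(is_heads)
        total = sum(weights.values())
        weights = {name: w / total for name, w in weights.items()}
    return weights


after_heads_only = run_population(make_agents(), [True] * 5)
after_one_tail = run_population(make_agents(), [True] * 5 + [False] + [True] * 4)
print(after_heads_only)
print(after_one_tail)
```

While only heads come up, the dogmatic sub-agent gains weight fastest, which is the efficiency mentioned above. A single tail sets its weight to zero for good, and the remaining sub-agents carry on.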
A true Bayesian is epistemically perfect. I could have different subroutines computing estimates conditional on different chunks of my prior as a way to approximate true Bayesianism, but if you have access to one Bayesian, you don’t need another.
Are you 100% sure about that?
I don’t know how to compute beliefs, conditional on it being false.
My point is that there are some propositions—for instance the epistemic perfection of Bayesianism—to which you attach a probability of exactly 1.0. Yet you want to remain free to reject some of those “100% sure” beliefs at some future time, should evidence or argument convince you to do so. So, I am advising you to have one Bayesian in your head who believes the ‘obvious’, and at least one who doubts it. And then if the obvious ever becomes falsified, you will still have one Bayesian you can trust.
I don’t think the other guy counts as a Bayesian.
That’s definitely a good approximation of the organizational structure of the human mind of an imperfect Bayesian. You have a human consciousness simulating a Bayesian probability-computer, but the human contains heuristics powerful enough to, in some situations, overrule the Bayesian.
This has nothing to do with arguments, though.
This doesn’t really solve the problem. If Bayesian updating is flawed, and all the sub-agents use Bayesian updating, then they are all untrustworthy. A better approach might be to make some of the agents non-Bayesian (giving them very low initial weights). However, this only pushes back the problem, as it requires me to put 100% of my confidence in your method, rather than in Bayes theorem.
But Bayesian updating is not flawed. What may be flawed are prior assumptions and probabilities. All of the subagents should be Bayesian because Bayes’s theorem is the one unique solution to the problem of updating. But there is no one unique solution to the problem of axiomatizing logic and physics and ontology. No one unique way to choose priors. That is where choosing a variety of solutions and choosing among them using a natural selection process can be useful.
The problem I was specifically asking to solve is “what if Bayesian updating is flawed”, which I thought was an appropriate discussion on an article about not putting all your trust in any one system.
Bayes’ theorem looks solid, but I’ve been wrong about theorems before. So has the mathematical community (although not very often and not for this long, but it could happen and should not be assigned 0 probability). I’m slightly sceptical of the uniqueness claim, given I’ve often seen similar proofs which are mathematically sound, but make certain assumptions about what is allowed, and are thus vulnerable to out-of-the-box solutions (Arrow’s impossibility theorem is a good example of this). In fact, given that a significant proportion of statisticians are not Bayesians, I really don’t think this is a good time for absolute faith.
To give another example, suppose tomorrow’s main page article on LW is about an interesting theorem in Bayesian probability, and one which would affect the way you update in certain situations. You can’t quite understand the proof yourself, but the article’s writer is someone whose mathematical ability you respect. In the comments, some other people express concern with certain parts of the proof, but you still can’t quite see for yourself whether it’s right or wrong. Do you apply it?
Assign a probability 1-epsilon to your belief that Bayesian updating works. Your belief in “Bayesian updating works” is determined by Bayesian updating; you therefore believe with 1-epsilon probability that “Bayesian updating works with probability 1-epsilon”. The base level belief is then held with probability less than 1-epsilon.
As the recursive nature of holding Bayesian beliefs about believing Bayesianly allows chains to tend toward large numbers, the probability of the base level belief tends towards zero.
There is a flaw with Bayesian updating.
I think this is just a semi-formal version of the problem of induction in Bayesian terms, though. Unfortunately the answer to the problem of induction was “pretend it doesn’t exist and things work better”, or something like that.
I think this is a form of double-counting the same evidence. You can only perform Bayesian updating on information that is new; if you try to update on information that you’ve already incorporated, your probability estimate shouldn’t move. But if you take information you’ve already incorporated, shuffle the terms around, and pretend it’s new, then you’re introducing fake evidence and get an incorrect result. You can add a term for “Bayesian updating might not work” to any model, except to a model that already accounts for that, as models of the probability that Bayesian updating works surely do. That’s what’s happening here; you’re adding “there is an epsilon probability that Bayesian updating doesn’t work” as evidence to a model that already uses and contains that information, and counting it twice (and then counting it n times).
You can also fashion a similar problem regarding priors.
Determine what method you should use to assign a prior in a certain situation.
Then determine what method you should use to assign a prior to “I picked the wrong method to assign a prior in that situation”.
Then determine what method you should use to assign a prior to “I picked the wrong method to assign a prior to “I picked the wrong method to assign a prior in that situation” ”.
This doesn’t seem like double-counting of anything to me; at no point can you assume you have picked the right method for any prior-assigning with probability 1.
This one is different, in that the evidence you’re introducing is new. However, the magnitude of the effect of each new piece of evidence on your original probability falls off exponentially, such that the original probability converges.
I’m pretty sure there is an error in your reasoning. And I’m pretty sure the source of the error is an unwarranted assumption of independence between propositions which are actually entangled—in fact, logically equivalent.
But I can’t be sure there is an error unless you make your argument more formal (i.e. symbol intensive).
I think it would take the form of X being an outcome, p(X) being the probability of the outcome as determined by Bayesian updating, “p(X) is correct” being the outcome Y, p(Y) being the probability of the outcome as determined by Bayesian updating, “p(Y) is correct” being the outcome Z, and so forth.
If you have any particular style or method of formalising you’d like me to use, mention it, and I’ll see if I can rephrase it in that way.
I don’t understand the phrase “p(X) is correct”.
Also I need a sketch of the argument that went from the probability of one proposition being 1-epsilon to the probability of a different proposition being smaller than 1-epsilon.
p(X) is a measure of my uncertainty about outcome X—“p(X) is correct” is the outcome where I determined my uncertainty about X correctly. There are also outcomes where I incorrectly determined my uncertainty about X. I therefore need to have a measure of my uncertainty about outcome “I determined my uncertainty correctly”.
The argument went from the initial probability of one proposition being 1-epsilon to the updated probability of the same proposition being less than 1-epsilon, because there was higher-order uncertainty which multiplies through.
A toy example: We are 90% certain that this object is a blegg. Then, we receive evidence that our method for determining 90% certainty gives the wrong answer one case in ten. We are 90% certain that we are 90% certain, or in other words—we are 81% certain that the object in question is a blegg.
Now that we’re 81% certain, we receive evidence that our method is flawed one case in ten—we are now 90% certain that we are 81% certain. Or, we’re 72.9% certain. Etc. Obviously epsilon degrades much slower, but we don’t have any reason to stop applying it to itself.
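A minimal sketch of the arithmetic in the toy example above, assuming (as that comment does) that every meta-level multiplies in the same 90% reliability. The function name is made up for illustration, and whether this compounding is legitimate at all is exactly what the thread is disputing.

```python
def degraded_confidence(base: float, reliability: float, levels: int) -> float:
    """Confidence left after discounting `base` by `reliability` once per meta-level."""
    return base * reliability ** levels

# Levels 1 and 2 reproduce the 81% and 72.9% figures from the toy example.
for levels in range(4):
    print(levels, round(degraded_confidence(0.9, 0.9, levels), 4))

# With no stopping rule the product keeps shrinking towards zero.
print(degraded_confidence(0.9, 0.9, 200))
```

If the discounts really were independent pieces of new evidence, nothing in this loop would tell you when to stop, which is the worry being raised.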
Thank-you for expressing my worry in much better terms than I managed to. If you like, I’ll link to your comment in my top-level comment.
I still don’t know why everyone thinks this is the problem of induction. You can certainly have an agent which is Bayesian but doesn’t use induction (the prior which assigns equal probability to all possible sequences of observation is non-inductive). I’m not sure if you can have a non-Bayesian that uses induction, because I’m very confused about the whole subject of ideal non-Bayesian agents, but it seems like you probably could.
Interesting that Bayesian updating seems to be flawed if and only if you assign non-zero probability to the claim that it is flawed. If I was feeling mischievous I would compare it to a religion: it works so long as you have absolute faith, but if you doubt even for a moment it doesn’t.
It’s similar to Hume’s philosophical problem of induction (here and here specifically). Induction in this sense is contrasted with deduction—you could certainly have a Bayesian agent which doesn’t use induction (never draws a generalisation from specific observations) but I think it would necessarily be less efficient and less effective than a Bayesian agent that did.
Feel free! I am all for increasing the number of minds churning away at this problem—the more Bayesians that are trying to find a way to justify Bayesian methods, the higher the probability that a correct justification will occur. Assuming we can weed out the motivated or biased justifications.
I’d love to see someone like EY tackle the above comment.
On a side note, why do I get an error if I click on the username of the parent’s author?
I’m actually planning on tackling it myself in the next two weeks or so. I think there might be a solution that has a deductive justification for inductive reasoning. EY has already tackled problems like this but his post seems to be a much stronger variant on Hume’s “it is custom, and it works”—plus a distinction between self-reflective loops and circular loops. That distinction is how I currently rationalise ignoring the problem of induction in everyday life.
Also—I too do not know why I don’t have an overview page.
You have piqued my curiosity. A trick to get around Arrow’s theorem? Do you have a link?
Regarding your main point: Sure, If you want some members of your army of mutant rational agents to be so mutated that they are no longer even Bayesians, well … go ahead. I suppose I have more faith in the rough validity of trial-and-error empiricism than I do in Bayes’s theorem. But not much more faith.
I’m afraid I don’t know how to post links.
I think there is already a main-page article on this subject, but the general idea is that Arrow’s theorem assumes the voting system is preferential (you vote by ranking candidates) and so you can get around it with a non-preferential system.
Range voting (each voter gives each candidate a score out of ten, and the candidate with the highest total wins) is the one that springs most easily to mind, but it has problems of its own, so somebody who knows more about the subject can probably give you a better example.
As for the main point, I doubt you actually put 100% confidence in either idea. In the unlikely event that either approach led you to a contradiction, would you just curl up in a ball and go insane, or abandon it?
Ah. You mean this posting. It is a good article, and it supports your point about not trusting proofs until you read all of the fine print (with the warning that there is always some fine print that you miss reading).
But it doesn’t really overthrow Arrow. The “workaround” can be “gamed” by the players if they exaggerate the differences between their choices so as to skew the final solution in their own favor.
All deterministic non-dictatorial systems can be gamed to some extent (Gibbard-Satterthwaite theorem; I’m reasonably confident that this one doesn’t have a work-around), although range voting is worse than most. That doesn’t change the fact that it is a counter-example to Arrow.
A better one might be approval voting, where you have as many votes as you want but you can’t vote for the same candidate more than once (equivalent to the degenerate case of range voting where there are only two scores you can give).
Thanks for the help with the links.
Next time you comment, click on the Help link to the lower right of the comment editing box.
Great post!
The moment the topic came up, I also thought back to something I once heard a creationist say. Most amusingly, not only did that probability have some fatuously huge order of magnitude, its mantissa was quoted to about 5 decimal places.
One gets ‘target confusion’ in such cases—shall I point out that no engineer would ever quote a probability like that to their boss, on pain of job loss? Shall I ask if my interlocutor even knows what a “power” IS?
This is at best weakly related to the statistics of error in a communications channel. Here, simulations are often used to run trillions of trials to simulate (Monte Carlo calculate) the conditions to get bit error rates (BER) of 10^-7, 10^-8, and so on. As an engineer more familiar with the physical layer (transistor amplifiers, thermal noise in channels, scattering of RF etc), I know that the CONDITIONS for these Monte Carlo calculations to mean something in the real circuits are complex and not as common as the new PhD doing the calculation thinks they are. Further, the lower the BER calculated, the more likely something else has come along to bite you on the arse and raise the actual error rate in an actual circuit. STILL, in engineering presentation after presentation, people put these numbers up and other people nod gravely when they see them.
Amazingly, I’m finding the feeling of the post, but in error rates which are gigantic compared to the probabilities discussed in the article. We get wiggly when a 1 has 6 zeros in front of it; you are using exponential notation to avoid writing much longer strings of zeros.
Maybe the “great filter” that prevents us seeing a universe filled with at least a few other intelligent species is that one of the big physics experiments large smart civilizations build finally does destroy the local solar system. Maybe we should ban successors to the Large Hadron Collider until we are ensconced in at least one other solar system.
We have hypothesis H and evidence E, and we dutifully compute
P(H) * P(E | H) / P(E)
It sounds like your advice is: don’t update yet! Especially if this number is very small. We might have made a mistake. But then how should we update? “Round up” seems problematic.
I read it to mean “update again” based on the probability that E is flawed. This will tend to adjust back toward your prior.
While you do that, the probability for the estimate being dynamically unstable should go up and then down again. Otherwise, you might make some strange decisions in-between, where the tradeoff between waiting for new information and deciding right now will be as for the honest estimate and not an intermediate step in a multi-step updating procedure with knowably incorrect intermediate results.
I’m not saying not to use Bayes’ theorem, I’m saying to consider very carefully what to plug into “E”. In the election example, your evidence is “A guy on a website said that there was a 999,999,999 in a billion chance that the incumbent would win.” You need to compute the probability of the incumbent winning given this actual evidence (the evidence that a guy on a website said something), not given the evidence that there really is a 999,999,999/billion chance. In the cosmic ray example, your evidence would be “There’s an argument that looks like it should make a less than 1 in 10^20 chance of apocalypse”, which may have different evidence value depending on how well your brain judges the way arguments look.
EDIT: Or what nerzhin said.
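To make the election example concrete, here is a rough sketch that conditions on "the model reported this number" rather than on the number itself. The 99% chance that the model is sound and the 50% fallback when it is flawed are invented for illustration, as is the function name; only the 999,999,999 in a billion figure comes from the post.

```python
def external_confidence(internal: float, p_sound: float, p_if_flawed: float) -> float:
    """Mix the model's own answer with a fallback for the case where the model is broken."""
    return p_sound * internal + (1.0 - p_sound) * p_if_flawed

internal = 1 - 1e-9  # what the model itself reports
# Hypothetical inputs: 99% trust in the model, coin-flip odds if it is flawed.
external = external_confidence(internal, p_sound=0.99, p_if_flawed=0.5)

print(internal)
print(external)  # dominated by the 1% chance that the model is flawed
```

With these made-up inputs the external confidence comes out near 0.995, so almost all of the remaining doubt comes from "the model is flawed" rather than from the model's own one in a billion.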
I think this amounts to saying: real-world considerations force an upper bound on abs(log(P(E | H) / P(E))). I’m on board with that, but can we think about how to compute and increase this bound?
Yes.
P(E) can be broken down into P(E|A)P(A) + P(E|~A)P(~A). Our temptation, when looking at a model, is to treat P(E|~A)*P(~A) as smaller than it really is; the question is, “Is the number of worlds in which the hypothesis is false but the evidence exists anyway large or small?” Yvain is noting that, because we are crazy, we tend to forget about many (or most) of these worlds when looking at evidence. We should expect the number of these worlds to be much larger than the number of worlds in which our probability calculations are everywhere and always correct.
The math doesn’t work out to “round up” exactly. It’s situation-dependent. It’s entirely possible that the model is so ill-specified that every variable has the wrong sign. The math will usually work out to deviation towards priors, even if only slightly.
Here’s a post on the same problem in social sciences.
What’s A?
“Deviation towards priors” sounds again like we are positing a bound on log(P(E|H)/P(E)). How can I estimate this bound?
I have a different response to this than the one you gave.
Consider your meta (“outside”) uncertainty over log-odds, in which independent evidence can be added, instead of probabilities. A distribution that averages out to the “internal” log-odds would, when translated back into probabilities, have an expected probability closer to 1⁄2 than the “inside” probability.
If you apply this to your prior probability as well as the evidence, this should generally move your probabilities towards 1⁄2.
This looks wrong to me. You can write your priors as a log-odds, and your pieces of evidence as several log-likelihood ratios, but while it’s fairly obvious to me that your meta-uncertainty over log-likelihoods sends the extra evidence toward 0 and thus the overall probability toward the prior, I don’t see at all why it makes sense to do something analogous to the log-odds prior which sends that to 0 and thus the overall probability to 0.5.
What’s going on? Is the argument something like “well I have one possibility and then not-that-possibility, so if I look purely at the structure I should say ‘two possibilities, symmetric, 50/50!’”? I think that works if you generate all possibilities in estimations like this uniformly (esp. a possibility and its complement)? Anyway, IMO it’s a much stricter “outside view” to send your priors to 0.5 than it is to send your evidence to 0.
It might help to work an example.
Suppose we are interested in an event B with prior probability P(B) = 1⁄2 which is prior log odds L(B) = 0, and have observed evidence E which is worth 1 bit, so L(B|E) = 1 and P(B|E) = 2⁄3 ~= .67. But if we are meta uncertain of the strength of evidence E such that we assign probability 1⁄2 that it is worth 0 bits, and probability 1⁄2 that it is worth 2 bits, then the expected log odds is EL(B|E) = 1, but the expected probability EP(B|E) = (1/2)*(1/2) + (1/2)*(4/5) = (.5 + .8)/2 = .65, decreasing towards 1⁄2 from P(B|E) ~= .67.
But what if instead the prior probability was P(B) = 1⁄5, or L(B) = −2. Then, with the same evidence with the same meta uncertainty, EL(B|E) = L(B|E) = −1, P(B|E) = 1⁄3 ~= .33, and EP(B|E) = .35, this time increasing towards 1⁄2.
Note this did not even require meta uncertainty over the prior, only the uncertainty over the total posterior log-odds is important. Also note that even though uncertainty moves the expected probability towards 1⁄2, it does not move the expected log-odds towards 0.
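The two worked examples above can be checked with a few lines of code. This is only a sketch of that arithmetic, with log-odds measured in bits (base 2) as in the comment; the helper names are made up.

```python
def prob_from_bits(log_odds_bits: float) -> float:
    """Convert base-2 log-odds into a probability."""
    odds = 2.0 ** log_odds_bits
    return odds / (1.0 + odds)

def expected_probability(prior_bits: float, evidence_dist) -> float:
    """evidence_dist is a list of (weight, evidence strength in bits) pairs."""
    return sum(w * prob_from_bits(prior_bits + bits) for w, bits in evidence_dist)

# Evidence worth 0 bits or 2 bits with equal chance: 1 bit on average.
meta = [(0.5, 0.0), (0.5, 2.0)]

print(prob_from_bits(0.0 + 1.0), expected_probability(0.0, meta))    # about .67 vs .65
print(prob_from_bits(-2.0 + 1.0), expected_probability(-2.0, meta))  # about .33 vs .35
```

In both cases the expected log-odds is unchanged, while the expected probability sits slightly closer to 1/2 than the point estimate.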
Note that your observation does not generalize to more complex logodds-distributions. Here is a simple counterexample:
Let’s say that L(B|E)=1+x with chance 2⁄3, and L(B|E)=1-2x with chance 1⁄3. It still holds that EL(B|E)=1. But the expected probability EP(B|E) is now not a monotone function of x. It has a global minimum at x=2.
x EP(B|E)
0 0.66666666666666663
1 0.64444444444444438
2 0.62962962962962954
3 0.63755199049316691
4 0.64904862579281186
5 0.65706002898985361
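The table above can be regenerated with a short script. This is a sketch using the same base-2 log-odds convention as the earlier worked example; the function names are made up.

```python
def prob_from_bits(log_odds_bits: float) -> float:
    """Convert base-2 log-odds into a probability."""
    odds = 2.0 ** log_odds_bits
    return odds / (1.0 + odds)

def expected_probability(x: float) -> float:
    """L(B|E) = 1 + x with chance 2/3 and 1 - 2x with chance 1/3, so EL(B|E) = 1."""
    return (2.0 / 3.0) * prob_from_bits(1.0 + x) + (1.0 / 3.0) * prob_from_bits(1.0 - 2.0 * x)

for x in range(6):
    print(x, expected_probability(x))
```

The values fall until x = 2 and then rise again, which is the non-monotone behaviour the counterexample is pointing at.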
Indeed. It looks like the effect I described occurs when the meta uncertainty is over a small range of log-odds values relative to the posterior log-odds, and there is another effect that could produce arbitrary expected probabilities given the right distribution over an arbitrarily large range of values. For any probability p, let L(B|E) = average + (1-p)*x with probability p and L(B|E) = average − p*x with probability (1-p), and then the limit of the expected probability as x approaches infinity is p.
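A quick numerical check of that construction, again with base-2 log-odds. The function names and the particular values of p, the average, and x are arbitrary choices for illustration.

```python
def prob_from_bits(log_odds_bits: float) -> float:
    """Convert base-2 log-odds into a probability without overflowing for large inputs."""
    if log_odds_bits >= 0:
        return 1.0 / (1.0 + 2.0 ** (-log_odds_bits))
    odds = 2.0 ** log_odds_bits
    return odds / (1.0 + odds)

def expected_probability(p: float, average_bits: float, x: float) -> float:
    """Two-point distribution over log-odds whose mean stays at average_bits."""
    high = prob_from_bits(average_bits + (1.0 - p) * x)  # occurs with probability p
    low = prob_from_bits(average_bits - p * x)           # occurs with probability 1 - p
    return p * high + (1.0 - p) * low

for x in (0, 10, 50, 200):
    print(x, expected_probability(0.3, 1.0, x))
```

As x grows, the high branch goes to probability 1 and the low branch to 0, so the mixture tends to p regardless of the average log-odds.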
I notice that this is where |1 + x| = |1 − 2x|. That might be interesting to look into.
(Possibly more rigorous and explicit math to follow when I can focus on it more)
I let L(B|E) be uniform from x-s/2 to x+s/2 and got that P(B|E) =
(1/s)*ln((1 + A*e^(s/2)) / (1 + A*e^(-s/2)))
where A is the odds if L(B|E)=x. In the limit as s goes to infinity, it looks like the interesting pieces are a term that’s the log of the prior probability dropping off as s grows linearly, plus a term that eventually looks like (1/s)*ln(e^(s/2))=1/2 which means we approach 1⁄2.
Oh I see, I thought you were saying something completely different. :D Yes, it looks like keeping the expectation of the evidence constant, the final probability will be closer to 0.5 the larger the variance of the evidence. I thought you were talking about what our priors should be on how much evidence we will tend to receive for propositions in general from things we intuit as one source or something.
Splitting it by internal/external is a nice system.
I think people do this instinctively in real life. Exhibit A: people buy lottery tickets. My theory for this is that they know that the odds of winning are too low to justify buying a ticket assuming it is actually fully random. However, most people are willing to put the probability that karma, divine justice, God’s plan or their lucky ritual might swing the lottery in their direction at some nonzero value. If they believe in one of these things with even 1% certainty then the ticket is a good deal for them.
A lottery ticket can be justified in terms of utility even if it can’t be justified in terms of expected value.
On the LHC black holes vs cosmic ray black holes, both kinds of black holes emerge with nonzero charge and will very rapidly brake to a halt. And there’s cosmic rays hitting neutron stars, as well, and cosmic rays colliding in the magnetic field of neutron stars, LHC style. Bottom line is, the LHC has to be extremely exceptional to destroy the earth. It just doesn’t look this exceptional.
The thing is that a very tiny black hole has an incredibly low accretion rate (quite reliable argument here; it takes a long time to push Earth through a needle’s eye, even at very high pressure) and even if we had many of those inside stars, planets, etc. we would never know. The LHC may have ‘doomed’ the Earth, but only to be destroyed on a timespan of many billions of years.
The more interesting example would be PRA (probabilistic risk analysis), such as is done for the space shuttle, nuclear reactors, et cetera. The risk is calculated based on a sum of risks over a very small selection of events (picked out of the space of possible events), and the minuscule risk figures that get calculated are representative not of a low probability of failure but of a low probability that the failure will be among the N guesses.
At the same time, we have no good reason to believe PRA works at all, and plenty of examples (Space Shuttle, nuclear reactors) where PRA was found to be off by a factor of 1000 (a high-confidence result, ’cause it’s highly unlikely the space shuttle PRA was correct yet two were lost).
The way I’d describe PRA is as estimating failure rate of a ball bearing in a car by adding up failure rates of the individual balls and other components. That’s obviously absurd; the balls and their environment interact in such complicated, non-linear ways that you can’t predict their failure rates by adding up component failure rates.
If a method clearly won’t work for something as simple as a ball bearing, why would anyone assume it’d work for a space shuttle or a nuclear power plant, which are much, much more complex than a ball bearing? My theory is that those things are so complicated that a person has such difficulty reasoning about them as to be unable to even see that they are too complex for PRA to work, while a ball bearing is simple enough. At the same time there’s a demand for some number to be given; this demand creates pseudoscience.
The map being distinct from the territory, you must go outside your map to discount your probability calculations made in the map. But how to do this? You must resort to a stronger map. But then the calculations there are subject to the errors in designing that map.
You can run this logic down to the deepest level. How does a rational person adopt a Bayesian methodology? Is there not some probability that the choice of methodology is wrong? But how do you conceive of that probability, when Bayesian considerations are the only ones available to evaluate truth from given evidence?
Why don’t these considerations prove that Bayesian epistemology isn’t the true account of knowledge?
Looks to me like you’ve proved that no one can ever change their beliefs or methodology, so not only have you disproven Bayesian epistemology, you’ve managed to disprove everything else too!
Counter example: I changed my epistemology from Aristotelian to Aristotle + Bayes + frequentism.
You are unwinding past the brain that does the unwinding.
A rational agent goes “golly, I seem to implement Occam’s Razor, and looking at that principle with my current implementation of Occam’s Razor, it seems like it is a simple hypothesis describing that hypotheses should be simple because the universe is simple.”
That is literally all you can do. If you implement anti-occamian priors the above goes something like: “It seems like a stochastic hypothesis describing that hypotheses should all differ and be complicated because the universe is complicated and stochastic.”
So, you cannot ‘run this logic down to the deepest level’ because at the deepest level there is nothing to argue with.
Particularly in the light of the fact that he seems to have got the numbers the wrong way round from what he intended in the final sentence.
Did he? I thought he just meant ‘odds’ when he said ‘probability’.
Not really; “The odds that God created the living cell are 10 to the 4,478,296 power to 1” would mean that it’s that ridiculously improbable that God created the cell, which is clearly not what that author was arguing.
No, no. The guy’s worse mistake is not that. If he really thinks that a cell can be jigsawed from individual proteins etc. (and think of all the water and ions and stuff) in a single event, then the odds he gives are the odds of God getting the cell right.
I speculate there’s at least two problems with the creationism odds calculation. First, it looks like the person doing the calculation was working with maybe 60,000 protein molecules rather than zillions of protein molecules.
The second problem I’m having trouble putting precisely in words, concerning the use of the uniform distribution as a prior. Sometimes the use of the uniform distribution as a prior seems to me to be entirely justified. An example of this is where there is a well-constructed model as to subsequent outcomes.
Other times, when the model for subsequent outcomes is sketchy, the uniform distribution is used as a prior simply as a default. Or, as in this case, it’s clearly not an appropriate prior. In this case, the person is probably assuming that all combinations of proteins are equally likely (I suspect this assumption is false.)
Consider that 1) There is more than one possible arrangement of proteins which qualifies as a living cell, and that 2) the materials of which proteins are made had quite a long time to shuffle around and try out different configurations between when the earth cooled and the present day, to say nothing of other planets elsewhere in the universe, and that 3) once a living, self-replicating, self-repairing cell has come to exist in an area with appropriate raw materials and a steady energy source it will create more such cells, so it only has to happen once.
So, we’re looking at a sample size equal to, by my back-of-the-envelope estimation, the number of cell-sized volumes in Earth’s atmosphere and oceans, times the number of planck instants in a little over four billion years, times the number of earth-like planets in the universe. The actual universe, not just the part we can see.
For intelligent design to be the most reasonable explanation, the probability of life emerging spontaneously would have to be low enough that, in a sample of that size, we wouldn’t expect to see it happen even once, and, furthermore, the designer’s own origin would need to be explained in such a way as to be less improbable.
You shouldn’t use Planck times unless the proteins can rearrange themselves that quickly.
If the temperature is high enough that there’s molecular movement at all, you could observe a collection of proteins every Planck-instant and see a (slightly) different arrangement each time. You might be stuck with similar ones, especially stable configurations, for a long time… but that’s exactly the sort of bias that makes life possible.
Isn’t the problem more like: they are ignoring the huge number of bits of evidence that say that cells in fact exist. They aren’t comparing between hypotheses that say cells exist. They are comparing the uniform prior for cells existing to the prior for only random proteins existing. They sound more like they are trying to argue that all our experiences cannot be enough evidence that there are cells, which seems weird.
This is a misinterpretation. The argument goes like this:
True statement: There is lots of evidence for cells. P(Evidence|Cells)/P(Evidence|~Cells)>>1.
False statement: Without intelligent design, cells could only be produced by random chance. P(Cells|~God) is very very small.
Debatable statement: P(Cells|God) is large.
Conclusion: We update massively in favor of God and against ~God, because of, not in opposition to, the massive evidence in favor of the existence of cells.
This is valid Bayesian updating, it’s just that the false statement is false.
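A small sketch of why the structure of that argument is valid while its conclusion depends entirely on the false premise. Every number here is invented for illustration, and the function name is hypothetical; nothing below is an estimate of any real probability.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem for a hypothesis H and its negation."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior_h))

# Hypothetical prior of 1% for H, and P(Cells|H) = 0.5 (the "debatable statement").
# Case 1: accept the false premise that P(Cells|~H) is astronomically small.
print(posterior(0.01, 0.5, 1e-12))

# Case 2: P(Cells|~H) is not small, so seeing cells barely moves the prior.
print(posterior(0.01, 0.5, 0.5))
```

The update in case 1 is mechanically correct, so any disagreement has to be about the likelihood fed into it rather than about the updating rule.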
You’re absolutely right! This is one of the key mistaken beliefs that creationists hold. I’ve had the most success in convincing them otherwise (or at least making them doubt) using the argument given by Dawkins in The God Delusion:
Our likelihood heuristic is strongly tied to both our lifespans and the subjective rate at which we experience time passing. Example: if we lived hundreds of times longer, current probabilities of, say, dying in a car accident, would appear totally unacceptable, because the expected number of car accidents in our lifetime would correspondingly be hundreds of times higher.
The hundreds of millions of years between the formation of the Earth and the appearance of life are simply much too large of a time-span for our likelihood heuristic to apply, and doing some simple math [omitted; if someone wants to give some approximate numbers that’d be nice] shows that the probability of replicators arising in that time-span is far from negligible.
Upvoted for successfully correcting my confusion about this example and helping me get updating a little better.
Edit: wow, this was a really old comment reply. How did I just notice it...
This argument doesn’t work for anthropic reasons. It could be that in the vast majority of Everett branches Earth was wiped out by cosmic ray collisions.
Anthropic reasoning only goes so far. Even if I accept the silliness in which zillions of Earths are destroyed every year for each one that survives… the other planets in the solar system could also have been destroyed. And the stars and galaxies in the sky would all be devoured by now, no? And no anthropic reasons would prevent us from witnessing that from a safe distance.
Here’s a fun game: Try to disprove the hypothesis that every single time someone says “Abracadabra” there’s a 99.99% chance that the world gets destroyed.
We haven’t been anthropically forced into a world where humans can’t say “Abracadabra”.
Oh, but a non-trivial number of people have mild superstitions against saying “Abracadabra”. Does this not constitute (weak) anthropic evidence?
Warning, spoiler alert:
Abracadabra.
You’ve just murdered 99.99% of all Earths. Our Everett branch survived for anthropic reasons.
This is totally testable. I’m going to download some raw quantum noise. If the first byte isn’t FF I will say the magic word. I will then report back what the first byte was.
Update: the first byte was 1B
...
Abracadabra.
Still here.
Initially this was anthropic evidence for normality, until people would have had time to replicate the experiment. Suppose the word was that dangerous, and the first byte had been FF. By now, all the people replicating the experiment have destroyed those universes. Only the universes where the experiment failed to show FF on the first try are still around.
Which means we have to cut down on the worlds where FF didn’t happen. Say it with me everyone.
Abracadabra, Abracadabra, Abracadabra, Abracadabra, Abracadabra, Abracadabra...
If everyone who reads this comment says the word, say, thirty times, we should be good, right?
At what point would you have accepted that saying “Abracadabra” does destroy the world? How would you have felt about that? And what service have you been using? I only know about random.org. Thanks.
ETA:
HotBits generates random numbers from radioactive decay.
QRBG Quantum Random Bit Generator
I used this one. After two FFs I would have decided I was in a simulation which some Less Wrong poster had set up post-singularity to screw with us. Those kind of Cartesian Joker scenarios are way more probable than “Abracadabra” destroying the world…
Just two FFs? That doesn’t seem all that improbable even forgetting all thought of world destruction. After about 100 FFs I would suspect that there was a problem with my experimental procedure (e.g. internet quantum byte source broken). That too would be testable. (“I’m not going to say Abracadabra this time. FF? FF? Now I am. FF? FF?”)
Well two FFs by chance is 1 in 65536. And my prior for “I’m in a simulation” isn’t that low. You’re right about the service being broken or fraudulent and really right about needing to test what happens if I don’t say Abracadabra. But you definitely don’t have to wait for 100 FFs!
That isn’t the number to consider here. The relevant prior is “I’m in a simulation and this particular simulation involves the abracadabra trick”. That number is quite a bit lower!
True enough. I estimate that I’d start testing after 4 or 5. :)
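For a feel of the numbers in this exchange, here is a sketch of how fast a run of FF bytes overwhelms a small prior. It assumes the alternative hypothesis (broken server, prankster, simulation, whatever) predicts FF with certainty, and the one in a million prior odds is purely a placeholder; the function names are made up.

```python
from fractions import Fraction

def chance_of_all_ff(n_bytes: int) -> Fraction:
    """Probability that n uniformly random bytes are all FF."""
    return Fraction(1, 256 ** n_bytes)

def posterior_odds(prior_odds: Fraction, n_bytes: int) -> Fraction:
    """Odds of 'something makes FF certain' versus 'pure chance' after n FF bytes."""
    return prior_odds * 256 ** n_bytes

print(chance_of_all_ff(2))  # the 1 in 65536 figure quoted above

prior = Fraction(1, 10 ** 6)  # hypothetical prior odds for the rigged hypothesis
for n in range(1, 6):
    print(n, float(posterior_odds(prior, n)))
```

With that placeholder prior the odds cross even between the second and third FF, which shows how much the answer hinges on the prior being argued over.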
Yeah. Hmm. I don’t really have a stable estimate of that probability. Of course, it’s not like I would have stopped after two trials, but at that point I’m pouring myself a drink. Worth noting that by coming up with the hypothesis I drastically increased its probability and then by mentioning it here I increased its probability even further.
Would you mind attempting to narrate any internal dialog you’d imagine yourself having after the 3rd? Lol.
“Um. WTF? Is this even working?”
(Yes, since the test is so trivial I might even click through a test after 2. I just wouldn’t start suspecting modded sims.)
Really?
Well chance is 1 in 65,536. Is there some hypothesis I’ve neglected?
The person running the qrng server decided to screw with you.
Damn!
I accept this counter-argument.
This is unlikely because it is wildly incompatible with everything we know about physics, not because we have never observed it to happen. It is unlikely because it has an extremely low prior probability, not because we have any (direct) evidence against it.
I should like to know Yvain’s prior on this.
On the “abracadabra” example? The overwhelming majority would come from the possibility that any time anything whatsoever happens the world is “destroyed”, for some weird, maybe anthropic use of the word “destroyed” I don’t understand compatible with me still being here.
If we limit it to “abracadabra” and nothing else, that’s complex enough that < 1/trillion just picking it out of hypothesis space (lots of combinations of sounds that could destroy the world, lots of things that aren’t combinations of sounds).
Just the world? Well, all you need is a good rocket ship so you aren’t on it anymore, and take a look.
If you mean destroy the MW branch in which it’s said, then Nick Tarleton’s answer works—that rule would make the choice to say ‘Abracadabra’ far smaller in probability than saying similar things that don’t destroy the world. People saying that one thing would be greatly suppressed relative to, say, “Alakazam” or “Poof” or “Presto Change-o”, and it would quickly leave the lexicon.
Indeed—none of us would have ever heard it.
Perhaps rather than just causing a black hole, it causes a tear in space-time that expands at the speed of light. By the time you see it, you’re already dead.
Of course, there’s still the fact that early worlds would be weighted much more heavily, so this is probably about the first instant that you exist. And there’s the fact that, if that’s true, the LHC wouldn’t decrease the expected lifetime of the world by a noticeable amount.
I feel vaguely disapproving of anthropic reasoning when it rewards elaborate and contrived scenarios over simpler ones with similar characteristics.
There are some interesting replies here.
“‘This person believes he could make one statement about an issue as difficult as the origin of cellular life per Planck interval, every Planck interval from the Big Bang to the present day, and not be wrong even once’ only brings us to 1/10^61 or so.”
Wouldn’t that be 1/ 2^(10^61) or am I missing something?
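As a rough check on where the “10^61 or so” comes from, the number of Planck intervals since the Big Bang can be estimated directly. The constants below are approximate textbook values supplied here as assumptions. They do not come from the thread.

```python
# Approximate physical constants (assumed values, not from the thread).
PLANCK_TIME_S = 5.39e-44             # Planck time in seconds
AGE_OF_UNIVERSE_YEARS = 13.8e9       # age of the universe in years
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def planck_intervals_since_big_bang():
    """Rough count of Planck intervals from the Big Bang to the present."""
    return AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR / PLANCK_TIME_S


if __name__ == "__main__":
    print(f"{planck_intervals_since_big_bang():.2e}")
```

Call that count N. The two figures in question then answer different questions: 1/N is the per-statement error rate at which about one mistake would be expected across N statements, while 1/2^N is the chance of getting N independent coin-flip guesses all right.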
I’m a bit irked by the continued persistence of “LHC might destroy the world” noise. Given no evidence, the prior probability that microscopic black holes can form at all, across all possible systems of physics, is extremely small. The same theory (String Theory[1]) that has led us to suggest that microscopic black holes might form is also quite adamant that all black holes evaporate, and just as adamant that microscopic ones evaporate faster than larger ones, by a precise factor of the mass ratio cubed. If we think the theory is talking complete nonsense, then the posterior probability of an LHC black hole forming in the first place goes down, because we slide back to the prior of a universe without microscopic black holes.
Thus, the “LHC might destroy the world” noise boils down to the possibility that (A) there is some mathematically consistent post-GR, microscopic-black-hole-predicting theory that has massively slower evaporation, (B) this unnamed and possibly non-existent theory is less Kolmogorov-complex and hence more posterior-probable than the one that scientists are currently using[2], and (C) scientists have completely overlooked this unnamed and possibly non-existent theory for decades, strongly suggesting that it has a large Levenshtein distance from the currently favored theory. The simultaneous satisfaction of these three criteria seems… pretty f***ing unlikely, since each tends to reject the others. A/B: it’s hard to imagine a theory that predicts post-GR physics with LHC-scale microscopic black holes that’s more Kolmogorov-simple than String Theory, which can actually be specified pretty damn compactly. B/C: people already have explored the Kolmogorov-simple space of post-Newtonian theories pretty heavily, and even the simple post-GR theories are pretty well explored, making it unlikely that even a theory with large edit distance from either ST or SM+GR has been overlooked. C/A: it seems like a hell of a coincidence that a large-edit-distance theory, i.e. one extremely dissimilar to ST, would just happen to also predict the formation of LHC-scale microscopic black holes, then go on to predict that they’re stable on the order of hours or more by throwing out the mass-cubed rule[3], then go on to explain why we don’t see them by the billions despite their claimed stability. (If the ones from cosmic rays are so fast that the resulting black holes zip through Earth, why haven’t they eaten Jupiter, the Sun, or other nearby stars yet? Bombardment by cosmic rays is not unique to Earth, and there are plenty of celestial bodies that would be heavy enough to capture the products.)
[1] It’s worth noting that our best theory, the Standard Model with General Relativity, does not predict microscopic black holes at LHC energies. Only String Theory does: ST’s 11-dimensional compactified space is supposed to suddenly decompactify at high energy scales, making gravity much more powerful at small scales than GR predicts, thus allowing black hole formation at abnormally low energies, i.e. those accessible to LHC. And GR without the SM doesn’t predict microscopic black holes. At all. Naked GR only predicts supernova-sized black holes and larger.
[2] The biggest pain of SM+GR is that, even though we’re pretty damn sure that that train wreck can’t be right, we haven’t been able to find any disconfirming data that would lead the way to a better theory. This means that, if the correct theory were more Kolmogorov-complex than SM+GR, then we would still be forced as rationalists to trust SM+GR over the correct theory, because there wouldn’t be enough Bayesian evidence to discriminate the complex-but-correct theory from the countless complex-but-wrong theories. Thus, if we are to be convinced by some alternative to SM+GR, either that alternative must be Kolmogorov-simpler (like String Theory, if that pans out), or that alternative must suggest a clear experiment that leads to a direct disconfirmation of SM+GR. (The more-complex alternative must also somehow attract our attention, and also hint that it’s worth our time to calculate what the clear experiment would be. Simple theories get eyeballs, but there are lots of more-complex theories that we never bother to ponder because that solution-space doesn’t look like it’s worth our time.)
[3] Even if they were stable on the order of seconds to minutes, they wouldn’t destroy the Earth: the resulting black holes would be smaller than an atom, in fact smaller than a proton, and since atoms are mostly empty space the black hole would sail through atoms with low probability of collision. I recall that someone familiar with the physics did the math and calculated that an LHC-sized black hole could swing like a pendulum through the Earth a hundred times before gobbling up even a single proton, and the same calculation showed it would take over 100 years before the black hole grew large enough to start collapsing the Earth due to tidal forces, assuming zero evaporation. Keep in mind that the relevant computation, t = (5120 × π × G^2 × M^3) ÷ (ℏ × c^4), shows that a 1-second evaporation time corresponds to a mass of 2.28e8 grams[3a], i.e. about 250 tons, and the resulting radius, r = 2 × G × M ÷ c^2, is 3.39e-22 meters[3b], or about 0.4 millionths of a proton radius[3c]. That one-second-duration black hole, despite being tiny, is vastly larger than the ones that might be created by the LHC: about 10^28 times more massive, in fact[3d]. (FWIW, the Schwarzschild radius calculation relies only on GR, with no quantum stuff, while the time-to-evaporate calculation depends on some basic QM as well. String Theory and the Standard Model both leave that particular bit of QM untouched.)
[3a] Google Calculator: “(((1 s) h c^4) / (2pi 5120pi G^2)) ^ (1/3) in grams”
[3b] Google Calculator: “2 G 2.28e8 grams / c^2 in meters”
[3c] Google Calculator: “3.3856695e-22 m / 0.8768 femtometers”, where 0.8768 femtometers is the experimentally accepted charge radius of a proton
[3d] Google Calculator: “(2.28e8 g * c^2) / 14 TeV”, where 14 TeV is the LHC’s maximum energy (7 TeV per beam in a head-on proton-proton collision)
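The four calculator queries above can be reproduced with a short script. This is a minimal sketch using the same two formulas from footnote [3]. The constant values are standard SI figures supplied here as assumptions, and the function names are made up for illustration, so the results may differ from the quoted ones in the last digit or two.

```python
import math

# Physical constants in SI units (assumed values, not quoted in the thread).
G = 6.67430e-11           # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8          # speed of light, m/s
HBAR = 1.054571817e-34    # reduced Planck constant, J s
EV = 1.602176634e-19      # joules per electronvolt
PROTON_RADIUS_M = 0.8768e-15  # proton charge radius used in footnote [3c]


def mass_for_evaporation_time(t_seconds):
    """Invert t = 5120 * pi * G^2 * M^3 / (hbar * c^4) to get M in kilograms."""
    return (t_seconds * HBAR * C**4 / (5120 * math.pi * G**2)) ** (1 / 3)


def schwarzschild_radius(mass_kg):
    """r = 2 * G * M / c^2, in meters."""
    return 2 * G * mass_kg / C**2


def mass_ratio_to_collision(mass_kg, collision_energy_ev=14e12):
    """Black hole rest energy divided by the collision energy (default 14 TeV)."""
    return mass_kg * C**2 / (collision_energy_ev * EV)


if __name__ == "__main__":
    m = mass_for_evaporation_time(1.0)
    r = schwarzschild_radius(m)
    print(f"mass:   {m * 1000:.3g} g")
    print(f"radius: {r:.3g} m")
    print(f"radius / proton radius: {r / PROTON_RADIUS_M:.3g}")
    print(f"mass ratio to a 14 TeV collision: {mass_ratio_to_collision(m):.3g}")
```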
To ground this issue in more concrete terms, imagine you are writing an algorithm to compress images made up of 8-bit pixels. The algorithm plows through several rows until it comes to a pixel, and predicts that the distribution of that pixel is Gaussian with a mean of 128 and a variance of 0.1. Then the model probability that the real value of the pixel is 255 is some astronomically small number, but the system must reserve some probability (and thus codespace) for that outcome. If it does not, then it violates the general contract that a lossless compression algorithm should assign a code to any input, though some inputs will end up being inflated. In other words, it risks breaking.
On the other hand, it is completely reasonable that it should assign zero probability to the outcome that the pixel value is 300. That all pixel values fall between 0 and 255 is a deductive consequence of the problem definition.
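A small sketch of the pixel example, under some assumptions of mine: the Gaussian is discretized by giving each integer value the mass on [v - 0.5, v + 0.5], and codespace is reserved by mixing in a uniform distribution with a small weight `epsilon`. That mixing is just one possible way to do it, not necessarily what a real codec does. The function names and the value of `epsilon` are hypothetical.

```python
import math


def gaussian_bin_probs(mean, variance, levels=256):
    """Discretize a Gaussian over the legal pixel values 0..levels-1.

    Each value v gets the Gaussian mass on [v - 0.5, v + 0.5], renormalized
    so the probabilities sum to 1. In floating point the far tail rounds to
    exactly zero.
    """
    sd = math.sqrt(variance)

    def cdf(x):
        return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

    raw = [cdf(v + 0.5) - cdf(v - 0.5) for v in range(levels)]
    total = sum(raw)
    return [p / total for p in raw]


def with_reserved_codespace(probs, epsilon=1e-6):
    """Mix with a uniform distribution so every legal value keeps some mass."""
    n = len(probs)
    return [(1 - epsilon) * p + epsilon / n for p in probs]


def code_length_bits(p):
    """Ideal code length for an outcome of probability p (infinite if p is 0)."""
    return math.inf if p == 0 else -math.log2(p)


if __name__ == "__main__":
    model = gaussian_bin_probs(128, 0.1)
    safe = with_reserved_codespace(model)
    print(code_length_bits(model[255]))  # no finite code exists for this outcome
    print(code_length_bits(safe[255]))   # long but finite
    print(code_length_bits(safe[128]))   # short code for the expected value
```

A value of 300 has no entry in either list at all, which matches the point that it can safely get zero probability because it is ruled out by the problem definition.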
What is your argument for claiming that the LHC will not destroy the world?
That the world still exists despite ongoing experiments is easily explained by the fact that we are necessarily living in those branches of the universe where the LHC didn’t destroy the world. (On a related side note: Has the great filter been found yet?)
Good point. I’ve changed this to “since the LHC did not destroy the world”, which is true regardless of whether it destroyed other branches.
This post raises very similar issues to those discussed in comments here.