What Bayesianism taught me
David Chapman criticizes “pop Bayesianism” as just common-sense rationality dressed up as intimidating math[1]:
Bayesianism boils down to “don’t be so sure of your beliefs; be less sure when you see contradictory evidence.”Now that is just common sense. Why does anyone need to be told this? And how does [Bayes’] formula help?
[...]
The leaders of the movement presumably do understand probability. But I’m wondering whether they simply use Bayes’ formula to intimidate lesser minds into accepting “don’t be so sure of your beliefs.” (In which case, Bayesianism is not about Bayes’ Rule, after all.)
I don’t think I’d approve of that. “Don’t be so sure” is a valuable lesson, but I’d rather teach it in a way people can understand, rather than by invoking a Holy Mystery.
What does Bayes’s formula have to teach us about how to do epistemology, beyond obvious things like “never be absolutely certain; update your credences when you see new evidence”?
I list below some of the specific things that I learned from Bayesianism. Some of these are examples of mistakes I’d made that Bayesianism corrected. Others are things that I just hadn’t thought about explicitly before encountering Bayesianism, but which now seem important to me.
I’m interested in hearing what other people here would put on their own lists of things Bayesianism taught them. (Different people would make different lists, depending on how they had already thought about epistemology when they first encountered “pop Bayesianism”.)
I’m interested especially in those lessons that you think followed more-or-less directly from taking Bayesianism seriously as a normative epistemology (plus maybe the idea of making decisions based on expected utility). The LW memeplex contains many other valuable lessons (e.g., avoid the mind-projection fallacy, be mindful of inferential gaps, the MW interpretation of QM has a lot going for it, decision theory should take into account “logical causation”, etc.). However, these seem further afield or more speculative than what I think of as “bare-bones Bayesianism”.
So, without further ado, here are some things that Bayesianism taught me.
Banish talk like “There is absolutely no evidence for that belief”. P(E | H) > P(E) if and only if P(H | E) > P(H). The fact that there are myths about Zeus is evidence that Zeus exists. Zeus’s existing would make it more likely for myths about him to arise, so the arising of myths about him must make it more likely that he exists. A related mistake I made was to be impressed by the cleverness of the aphorism “The plural of ‘anecdote’ is not ‘data’.” There may be a helpful distinction between scientific evidence and Bayesian evidence. But anecdotal evidence is evidence, and it ought to sway my beliefs.
Banish talk like “I don’t know anything about that”. See the post “I don’t know.”
Banish talk of “thresholds of belief”. Probabilities go up or down, but there is no magic threshold beyond which they change qualitatively into “knowledge”. I used to make the mistake of saying things like, “I’m not absolutely certain that atheism is true, but it is my working hypothesis. I’m confident enough to act as though it’s true.” I assign a certain probability to atheism, which is less than 1.0. I ought to act as though I am just that confident, and no more. I should never just assume that I am in the possible world that I think is most likely, even if I think that that possible world is overwhelmingly likely. (However, perhaps I could be so confident that my behavior would not be practically discernible from absolute confidence.)
Absence of evidence is evidence of absence. P(H | E) > P(H) if and only if P(H | ~E) < P(H). Absence of evidence may be very weak evidence of absence, but it is evidence nonetheless. (However, you may not be entitled to a particular kind of evidence.)
Many bits of “common sense” rationality can be precisely stated and easily proved within the austere framework of Bayesian probability. As noted by Jaynes in Probability Theory: The Logic of Science, “[P]robability theory as extended logic reproduces many aspects of human mental activity, sometimes in surprising and even disturbing detail.” While these things might be “common knowledge”, the fact that they are readily deducible from a few simple premises is significant. Here are some examples:
It is possible for the opinions of different people to diverge after they rationally update on the same evidence. Jaynes discusses this phenomenon in Section 5.3 of PT:TLoS.
Popper’s falsification criterion, and other Popperian principles of “good explanation”, such as that good explanations should be “hard to vary”, follow from Bayes’s formula. Eliezer discusses this in An Intuitive Explanation of Bayes’ Theorem and A Technical Explanation of Technical Explanation.
Occam’s razor. This can be formalized using Solomonoff induction. (However, perhaps this shouldn’t be on my list, because Solomonoff induction goes beyond just Bayes’s formula. It also has several problems.)
You cannot expect[2] that future evidence will sway you in a particular direction. “For every expectation of evidence, there is an equal and opposite expectation of counterevidence.”
Abandon all the meta-epistemological intuitions about the concept of knowledge on which Gettier-style paradoxes rely. Keep track of how confident your beliefs are when you update on the evidence. Keep track of the extent to which other people’s beliefs are good evidence for what they believe. Don’t worry about whether, in addition, these beliefs qualify as “knowledge”.
What items would you put on your list?
ETA:
[1] See also Yvain’s reaction to David Chapman’s criticisms.
[2] ETA: My wording here is potentially misleading. See this comment thread.
- Bayesianism for humans: “probable enough” by 2 Sep 2014 21:44 UTC; 52 points) (
- Why is Bayesianism important for rationality? by 1 Sep 2020 4:24 UTC; 37 points) (
- Bayesianism for humans: prosaic priors by 2 Sep 2014 21:45 UTC; 30 points) (
- 26 Oct 2018 20:47 UTC; 19 points) 's comment on Schools Proliferating Without Practicioners by (
- Rationality Compendium: Principle 1 - A rational agent, given its capabilities and the situation it is in, is one that thinks and acts optimally by 23 Aug 2015 8:01 UTC; 9 points) (
- 1 Jun 2014 19:14 UTC; 7 points) 's comment on June 2014 Media Thread by (
- Meetup : Philadelphia—What Bayesianism taught me by 4 Oct 2013 11:50 UTC; 6 points) (
- Consequences of Bayesian Epistemology? by 6 Jul 2021 20:05 UTC; 5 points) (
- 29 Oct 2013 16:33 UTC; 5 points) 's comment on Bayesianism for Humans by (
- 21 Apr 2014 22:20 UTC; 3 points) 's comment on Rationality Quotes April 2014 by (
- 4 Feb 2014 22:33 UTC; 2 points) 's comment on Rationality & Low-IQ People by (
- 2 Dec 2013 13:58 UTC; 0 points) 's comment on Reasons to believe by (
- 10 Feb 2014 12:13 UTC; 0 points) 's comment on White Lies by (
- 26 Jan 2017 11:30 UTC; 0 points) 's comment on Too Much Effort | Too Little Evidence by (
- 14 Jun 2017 21:15 UTC; 0 points) 's comment on Epistemology vs Critical Thinking by (
- 4 May 2014 0:22 UTC; 0 points) 's comment on The Universal Medical Journal Article Error by (
- 18 Sep 2013 19:43 UTC; 0 points) 's comment on Welcome to Less Wrong! (5th thread, March 2013) by (
- 1 May 2014 22:42 UTC; 0 points) 's comment on Open Thread, April 27-May 4, 2014 by (
- 16 Feb 2016 22:00 UTC; -1 points) 's comment on Is Spirituality Irrational? by (
The (related) way I would expand this is “if you know what you will believe in the future, then you ought to believe that now.”
Quoting myself from Yvain’s blog:
Another useful thing for qualitative Bayes from Jaynes—always include a background information I in the list of information you’re conditioning on. It reminds you that your estimates are fully contextual on all your knowledge, most of which is unstated and unexamined.
Actually, this seems like a General Semantics meets Bayes kind of principle. Surely Korzybski had a catchy phrase for a similar idea. Anyone got one?
Korzybski did “turgid” rather than “catchy”, but this seems closely related to his insistence that characteristics are always left out by the process of abstraction, and that one can never know “all” about something. Hence his habitual use of “etc.”, to the degree that he invented abbreviations for it.
Anecdotal evidence is filtered evidence. People often cite the anecdote that supports their belief, while not remembering or not mentioning events that contradict them. You can find people saying anecdotes on any side of a debate, and I see no reason the people who are right would cite anecdotes more.
Of course, if you witness an anecdote with your own eyes, that is not filtered, and you should adjust your beliefs accordingly.
Unless you too selectively (mis)remember things.
Or selectively expose yourself to situations.
If I can always expose myself to situations in which I anecdotally experience success, isn’t that Winning?
Yes. What it isn’t is an unbiased scientific study. The anecdotal experience of situations which are selected to to provide success is highly filtered evidence.
I think the value of anecdotes often doesn’t lie so much in changing probabilities of belief but in illustrating what a belief actually is about.
That, and existence/possibility proofs, and, in the very early phases of investigation, providing a direction for inquiry.
Right, the existence of the anecdote is the evidence, not the occurrence of the events that it alleges.
It is true that, if a hypothesis has reached the point of being seriously debated, then there are probably anecdotes being offered in support of it. (… assuming that we’re taking about the kinds of hypotheses that would ever have an anecdote offered in support of it.) Therefore, the learning of the existence of anecdotes probably won’t move much probability around among the hypotheses being seriously debated.
However, hypothesis space is vast. Many hypotheses have never even been brought up for debate. The overwhelming majority should never come to our attention at all.
In particular, hypothesis space contains hypotheses for which no anecdote has ever been offered. If you learned that a particular hypothesis H were true, you would increase your probability that H was among those hypotheses that are supported by anecdotes. (Right? The alternative is that which hypotheses get anecdotes is determined by mechanisms that have absolutely no correlation, or even negative correlation, with the truth.) Therefore, the existence of an anecdote is evidence for the hypothesis that the anecdote alleges is true.
A typical situation is that there’s a contentious issue, and some anecdotes reach your attention that support one of the competing hypotheses.
You have three ways to respond:
You can under-update your belief in the hypothesis, ignoring the anecdotes completely
You can update by precisely the measure warranted by the existence of these anecdotes and the fact that they reached you.
You can over-update by adding too much credence to the hypothesis.
In almost every situation you’re likely to encounter, the real danger is 3. Well-known biases are at work pulling you towards 3. These biases are often known to work even when you’re aware of them and trying to counteract them. Moreover, the harm from reaching 3 is typically far greater than the harm from reaching 1. This is because the correct added amount of credence in 2 is very tiny, particularly because you’re already likely to know that the competing hypotheses for this issue are all likely to have anecdotes going for them. In real-life situations, you don’t usually hear anecdotes supporting an incredibly unlikely-seeming hypothesis which you’d otherwise be inclined to think as capable of nurturing no anecdotes at all. So forgoing that tiny amount of credence is not nearly as bad as choosing 3 and updating, typically, by a large amount.
The saying “The plural of anecdotes is not data” exists to steer you away from 3. It works to counteract the very strong biases pulling you towards 3. Its danger, you are saying, is that it pulls you towards 1 rather than the correct 2. That may be pedantically correct, but is a very poor reason to criticize the saying. Even with its help, you’re almost always very likely to over-update—all it’s doing is lessening the blow.
Perhaps this as an example of “things Bayesianism has taught you” that are harming your epistemic rationality?
A similar thing I noticed is disdain towards “correlation does not imply causation” from enlightened Bayesians. It is counter-productive.
This is the problem. I know, as an epistemic matter of fact, that anecdotes are evidence. I could try to ignore this knowledge, with the goal of counteracting the biases to which you refer. That is, I could try to suppress the Bayesian update or to undo it after it has happened. I could try to push my credence back to where it was “manually”. However, as you point out, counteracting biases in this way doesn’t work.
Far better, it seems to me, to habituate myself to the fact that updates can by miniscule. Credence is quantitative, not qualitative, and so can change by arbitrarily small amounts. “Update Yourself Incrementally”. Granting that someone has evidence for their claims can be an arbitrarily small concession. Updating on the evidence doesn’t need to move my credences by even a subjectively discernible amount. Nonetheless, I am obliged to acknowledge that the anecdote would move the credences of an ideal Bayesian agent by some nonzero amount.
So, let’s talk about measurement and detection.
Presumably you don’t calculate your believed probabilities to the n-th significant digit, so I don’t understand the idea of a “miniscule” update. If it has no discernible consequences then as far as I am concerned it did not happen.
Let’s take an example. I believe that my probability of being struck by lightning is very low to the extent that I don’t worry about it and don’t take any special precautions during thunderstorms. Here is an anecdote which relates how a guy was stuck by lightning while sitting in his office inside a building. You’re saying I should update my beliefs, but what does it mean?
I have no numeric estimate of P(me being struck by lightning) so there’s no number I can adjust by 0.0000001. I am not going to do anything differently. My estimate of my chances to be electrocuted by Zeus’ bolt is still “very very low”. So where is that “miniscule update” that you think I should make and how do I detect it?
P.S. If you want to update on each piece of evidence, surely by now you must fully believe that product X is certain to enlarge your penis?
It is interesting that you think of this as typical, or at least typical enough to be exclusionary of non-contentious issues. I avoid discussions about politics and possibly other contentious issues, and when I think of people providing anecdotes I usually think of them in support of neutral issues, like the efficacy of understudied nutritional supplements. If someone tells you, “I ate dinner at Joe’s Crab Shack and I had intense gastrointestinal distress,” I wouldn’t think it’s necessarily justified to ignore it on the basis that it’s anecdotal. If you have 3 more friends who all report the same thing to you, you should rightly become very suspicious of the sanitation at Joe’s Crab Shack. I think the fact that you are talking about contentious issues specifically is an important and interesting point of clarification.
Thanks for that comment! Eliezer often says people should be more sensitive to evidence, but an awful lot of real-life evidence is in fact much weaker, noisier, and easier to misinterpret than it seems. And it’s not enough to just keep in mind a bunch of Bayesian mantras—you need to be aware of survivor bias, publication bias, Simpson’s paradox and many other non-obvious traps, otherwise you silently go wrong and don’t even know it. In a world where most published medical results fail to replicate, how much should we trust our own conclusions?
Would it be more honest to recommend people to just never update at all? But then everyone will stick to their favorite theories forever… Maybe an even better recommendation would be to watch out for “motivated cognition”, try to be more skeptical of all theories including your favorites.
Doesn’t look implausible to me. Here’s an alternative hypothesis: the existence of anecdotes is a function of which beliefs are least supported by strong data because such beliefs need anecdotes for justification.
In general, I think anecdotes are way too filtered and too biased as an information source to be considered serious evidence. In particular, there’s a real danger of treating a lot of biased anecdotes as conclusive data and that danger, seems to me, outweighs the miniscule usefulness of anecdotes.
We may agree. It depends on what work the word “serious” is doing in the quoted sentence.
In this context “serious” = “I’m willing to pay attention to it”.
I would raise a hypothesis to consideration because someone was arguing for it, but I don’t think anecdotes are good evidence in that I would have similar confidence in a hypothesis supported by an anecdote, and a hypothesis that is flatly stated with no justification. The evidence to raise it to consideration comes from the fact that someone took the time to advocate it.
This is more of a heuristic than a rule, because there are anecdotes that are strong evidence (“I ran experiments on this last year and they didn’t fit”), but when dealing with murkier issues, they don’t count for much.
Yes, it may be that the mere fact that a hypothesis is advocated screens off whether that hypothesis is also supported by an anecdote. But I suspect that the existence of anecdotes still moves a little probability mass around, even among just those hypotheses that are being advocated.
I mean, if someone advocated for a hypothesis, and they couldn’t even offer an anecdote in support of it, that would be pretty deadly to their credibility. So, unless I am certain that every advocated hypothesis has supporting anecdotes (which I am not), I must concede that anecdotes are evidence, howsoever weak, over and above mere advocacy.
Here’s a situation where an anecdote should reduce our confidence in a belief:
A person’s beliefs are usually well-supported.
When he offers supporting evidence, he usually offers the strongest evidence he knows about.
If this person were to offer an anecdote, it should reduce our confidence in his proposition, because it makes it unlikely he knows of stronger supporting evidence.
I don’t know how applicable this is to actual people.
I don’t think this is necessarily valid, because people also know that anecdotes can be highly persuasive. So for many people, if you have an anecdote it will make sense to say so, since most people argue not to reach the truth but to persuade.
I agree that it is at least hypothetically possible that the offering of an anecdote should reduce our credence in what the anecdote claims.
… For example, if you told me that you once met a powerful demon who works to stop anyone from ever telling anecdotes about him (regardless of whether the anecdotes are true or false), then I would decrease my credence in the existence of such a demon.
Still evidence.
After accounting for the filtering, which way does it point? If you’re left with a delta log-odds of zero, it’s “evidence” only in the sense that if you have no apples you have “some” apples.
Yes, “Daaad, Zeus the Greek god ate my homework!” isn’t strong evidence, certainly.
But the way it points (in relation to P(Zeus exists)) is clear. I agree with your second sentence, but I’m not sure I understand your first one.
I don’t think it is. If Zeus really had eaten the homework, I wouldn’t expect it to be reported in those terms. Some stories are evidence against their own truth—if the truth were as the story says, that story would not have been told, or not in that way. (Fictionally, there’s a Father Brown story hinging on that.)
And even if it theoretically pointed in the right direction, it is so weak as to be worthless. To say, “ah, but P(A|B)>P(A)!” is not to any practical point. It is like saying that a white wall is evidence for all crows being black. A white wall is also evidence, in that sense, for all crows being magenta, for the moon being made of green cheese, for every sparrow falling being observed by God, and for no sparrow falling being observed by God. Calling this “evidence” is like picking up from the sidewalk, not even pennies, but bottle tops.
Yes, in a world in which Zeus existed, people would not proclaim the importance of faith in Zeus, anymore than they proclaim the importance of faith in elephants or automobiles. Everyone would just accept that they exist.
I don’t know: consider the classic cargo cult. It proclaims the importance of faith in airplanes.
Or consider Christianity: people who fully believe in Jesus Christ (=from their point of view they live in the world in which Jesus exists) tend to proclaim the importance of faith in Jesus.
Yes, that’s the point—people don’t tend to proclaim the importance of faith in things that actually exist. You won’t hear them say “have faith in the existence of tables” or “have faith in the existence of chairs”.
I would suspect that this is because a) everybody believes in tables and chairs (with the exception of a few very strange people, who are probably easy enough to spot), and b) nobody (again with a few odd exceptions) believes in any sort of doctrine or plan of action for chair-and-table-believers, so faith doesn’t have many consequences (except for having somewhere to sit and place things on).
We, on the other hand, proclaim the importance of confidence in rational thought, for the same reasons that theists proclaim the importance of belief in their god: it is a belief which is not universal in the population, and it is a belief which we expect to have important consequences and prescriptions for action.
What I was just about to say. See also Yvain on self-defeating arguments.
Okay, but…
How so?
Every white wall is a non-sparrow not observed by God, hence evidence for God observing every sparrow’s fall. It is also a, um, no, you’re right, the second one doesn’t work.
How do we know that the wall is not observed by God?
Ah, quite so. God sees all, sparrows and walls alike. Both of those examples are broken.
An omnipotence-omniscience paradox: “God, look away!”—“I can’t!”
“There’s something a human could do that God couldn’t do, namely committing suicide.”
-- someone long ago, IIRC (Google is turning up lots of irrelevant stuff)
And since we usually desire the one thing we cannot have …
That one’s easily solvable, isn’t it? God could look away if he wanted to, but chose not to.
If sparrows do not exist, then “every sparrow falling is observed by God” and “no sparrow falling is observed by God” are both true. (And of course, every white wall is a tiny bit of evidence for “sparrows do not exist”, although not very good evidence since there are so many other things in the universe that also need to be checked for sparrow-ness.)
Well, we could use the word “evidence” in different ways (you requiring some magnitude-of-prior-shift).
But then you’d still need a word for “that-which-[increases|decreases]-the-probability-you-assign-to-a-belief”. Just because that shift is tiny doesn’t render it undefined or its impact arbitrary. You can say with confidence that 1/x remains positive for any positive x however large, and be it a googolplex (btw, TIL in which case 1/x would be called a googolminex).
Think of what you’re advocating here: whatever would we do if we disallowed strictly-speaking-correct-nitpicks on LW?
There’s a handy table, two of them in fact, of terminology for strength of evidence here. Up to 5 decibans is “barely worth mentioning”. How many microbans does “Zeus ate my homework” amount to?
You may be joking, but I do think LW (and everywhere else) would be improved if people didn’t do that. I find nitpicking as unappealing as nose-picking.
Nitpicking is absolutely critical in any public forum. Maybe in private, with only people who you know well and have very strong reason to believe are very much more likely to misspeak than to misunderstand, nitpicking can be overlooked. Certainly, I don’t nitpick every misspoken statement in private. But when those conditions do not hold, when someone is speaking on a subject I am not certain they know well, or when I do not trust that everyone in the audience is going to correctly parse the statement as misspoken and then correctly reinterpret the correct version, nitpicking is the only way to ensure that everyone involved hears the correct message.
Charitably I’ll guess that you dislike nitpicking because you already knew all those minor points, they were obvious to anyone reading after all, and they don’t have any major impact on the post as a whole. The problem with that is that not everyone who reads Less Wrong has a fully correct understanding of everything that goes into every post. They don’t spot the small mistakes, whether those be inconsequential math errors or a misapplication of some minor rule or whatever. And the problem is that just because the error was small in this particular context, it may be a large error in another context. If you mess up your math when doing Bayes’ Theorem, you may thoroughly confuse someone who is weak at math and trying to follow how it is applied in real life. In the particular context of this post, getting the direction of a piece of evidence wrong is inconsequential if the magnitude of that evidence is tiny. But if you are making a systematic error which causes you to get the direction of certain types of evidence, which are usually small in magnitude, wrong, then you will eventually make a large error. And unless you are allowed to call out errors dealing with small magnitude pieces of evidence, you won’t ever discover it.
I’d also like to say that just because a piece of evidence is “barely worth mentioning” when listing out evidence for and against a claim, does not mean that that evidence should be immediately thrown aside when found. The rules which govern evidence strong enough to convince me that 2+2=3 are the same rules that govern the evidence gained from the fact that when I drop an apple, it falls. You can’t just pretend the rules stop applying and expect to come out ok in every situation. In part you can gain practice from applying the rules to those situations, and in part it’s important to remember that they do still apply, even if in the end you decide that their outcome is inconsequential.
I disagree. Not all things that are true are either relevant or important. Irrelevancies and trivialities lower discussion quality, however impeccable their truth. There is practically nothing that anyone can say, that one could not find fault with, given sufficient motivation and sufficient disregard for the context that determines what matters and what does not.
In the case at hand, “evidence” sometimes means “any amount whatever, including zero”, sometimes “any amount whatever, except zero, including such quantities as 1/3^^^3”, and sometimes “an amount worth taking notice of”.
In practical matters, only the third sense is relevant: if you want to know the colour of crows, you must observe crows, not non-crows, because that is where the value of information is concentrated. The first two are only relevant in a technical, mathematical context.
The point of the Bayesian solution to Hempel’s paradox is to stop worrying about it, not to start seeing purple zebras as evidence for black crows that is worth mentioning in any other context than talking about Hempel’s paradox.
Few enough that it’s in the “barely worth mentioning” bracket, of course. (Under any kind of resource constraint, it wouldn’t be mentioned at all, however that only relates to its infinitesimal weight, not the nature of what it is (evidence).)
You say that shouldn’t be classified as evidence, I say it should. Note that the table is about strength of evidence.
If you look into your spam folder you’ll find plenty of evidence for penis extension pills and the availability of large amount of money in abandoned accounts at Nigerian banks.
This is actually a really tidy example of Bayesian thinking. People send various types of emails for a variety of reasons. Of those who send penis extension pill emails, there are (vaguely speaking) three possible groups:
People who have invented penile embiggening pills and honestly want to sell them. (I’ve never confirmed anybody to be in this group, so it may be empty.)
Scammers trying to find a sucker by spamming out millions of emails.
Trolls.
If you see emails offering to “Eml4rge your m3mber!!”, this is evidence for the existence of someone from one or more of these groups. Which group do you think is largest? Those spam emails are evidence for all of these, but not such strong evidence for choosing between them.
Don’t spam algorithms actually use Bayes rule to filter spam from non-spam, updating when you click “this is spam” or “this is not spam”?
Yes, this is exactly how Paul Graham went about solving the spam problem.
The value of anecdotal evidence on a subject depends on how good the other sources are. For example, in something like medicine where something like 1 in 5 studies wind up retracted, anecdotal evidence is reasonable useful. To say nothing of the social “sciences”.
Chapman’s follow-up.
“Absence of evidence isn’t evidence of absence” is such a ubiquitous cached thought in rationalist communities (that I’ve been involved with) that its antithesis was probably the most important thing I learned from Bayesianism.
I find it interesting that Sir Arthur Conan Doyle, the author of the Sherlock Holmes stories, seems to have understood this concept. In his story “Silver Blaze” he has the following conversation between Holmes and a Scotland Yard detective:
I am confused. I always thought that the “Bayes” in Bayesianism refers to the Bayesian Probability Model. Bayes’ rule is a powerful theorem, but it is just one theorem, and is not what Bayesianism is all about. I understand that the video being criticized was specifically talking about Bayes’ rule, but I do not think that is what Bayesianism is about at all. The Bayesian probability model basically says that probability is a degree of belief (as opposed to other models that only really work with possible possible worlds or repeatable experiments). I always thought this was the main thesis of Bayesianism was “The best language to talk about uncertainty is probability theory,” which agrees perfectly with the interpretation that the name comes from the Bayesian probability model, and has nothing to do with Bayes’ rule. Am I using the word in a way differently than everyone else?
That’s how I use it. This showed up in Yvain’s response:
That sounds too weak. Bayes is famous because of his rule—surely, Bayesianism must invoke it.
Just because Bayes did something awesome, doesn’t mean that Bayesianism can’t be named after other stuff that he worked on.
However, Bayesianism does invoke Bayes’ Theorem as part of probability theory. Bayes’ Theorem is a simple and useful part of Bayesian probability, so it makes a nice religious symbol, but I don’t see it as much more than that. Saying Bayesianism is all about Bayes’ Rule is like saying Christianity is about crosses. It is a small part of the belief structure.
Seems more like saying that Christianity is all about forgiveness. There’s a lot more to it than that, but you’re getting a lot closer than ‘crosses’ would suggest.
Yeah, that was a bit of an exaggeration.
I didn’t get a lot out of Bayes at the first CFAR workshop, when the class involved mentally calculating odds ratios. It’s hard for me to abstractly move numbers around in my head. But the second workshop I volunteered at used a Bayes-in-everyday-life method where you drew (or visualized) a square, and drew a vertical line to divide it according to the base rates of X versus not-X, and then drew a horizontal line to divide each of the slices according to how likely you were to see evidence H in the world where X was true, and the world where not-X was true. Then you could basically see whether the evidence had a big impact on your belief, just by looking at the relative size of the various rectangles. I have a strong ability to visualize, so this is helpful.
I visualize this square with some frequency when I notice an empirical claim about thing X presented with evidence H. Other than that, I query myself “what’s the base rate of this?” a lot, or ask myself the question “is H actually more likely in the world where X is true versus false? Not really? Okay, it’s not strong evidence.”
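For those who would rather compute than draw, the square method described above is just Bayes’ rule expressed as areas. A minimal sketch (the function name and example numbers are mine, not CFAR’s):

```python
# The "Bayes square": split the unit square vertically by the base rate of X,
# then split each slice horizontally by how likely evidence H is in that world.
# The posterior is the share of the total H-area lying in the X slice.

def bayes_square(p_x, p_h_given_x, p_h_given_not_x):
    area_x_and_h = p_x * p_h_given_x                # left slice, H portion
    area_not_x_and_h = (1 - p_x) * p_h_given_not_x  # right slice, H portion
    return area_x_and_h / (area_x_and_h + area_not_x_and_h)

# Example: 10% base rate, evidence 8x more likely if X is true.
print(bayes_square(0.10, 0.8, 0.1))  # ≈ 0.47
```

Comparing the relative sizes of the two H-rectangles by eye is exactly the division in the last line.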
Maybe this wasn’t your intent, but framing this post as a rebuttal of Chapman doesn’t seem right to me. His main point isn’t “Bayesianism isn’t useful”—more like “the Less Wrong memeplex has an unjustified fetish for Bayes’ Rule” which still seems pretty true.
While this is true mathematically, I’m not sure it’s useful for people. Complex mental models have overhead, and if something is unlikely enough then you can do better to stop thinking about it. Maybe someone broke into my office and when I get there on Monday I won’t be able to work. This is unlikely, but I could look up the robbery statistics for Cambridge and see that this does happen. Mathematically, I should be considering this in making plans for tomorrow, but practically it’s a waste of time thinking about it.
(There’s also the issue that we’re not good at thinking about small probabilities. It’s very hard to keep unlikely possibilities from taking on undue weight except by just not thinking about them.)
I think about such things every time I lock a door. Or at least, I lock doors because I have thought about such things, even if they’re not at the forefront of my mind when I do them. Do you not lock yours? Do you have an off-site backup for your data? Insurance against the place burning down?
Having taken such precautions as you think useful, thinking further about it is, to use Eliezer’s useful concept, wasted motion. It is a thought that, predictably at the time you think it, will as events transpire turn out to not have contributed in any useful way. You will go to work anyway, and see then whether thieves have been in the night.
Tiny probabilities do not, in general, map to tiny changes in actions. Decisions are typically discontinuous functions of the probabilities.
I always lock doors without thinking, because the cost of thinking about whether it’s worth my time is higher than the cost of locking the door.
“Someone will break into something I own someday” is much more likely than “someone will break into my office tonight”. The former is likely enough that I do take general preparations (a habit of locking doors), but while there are specific preparations I could make for the intersection of what I planned to do at the office tomorrow and dealing with the aftermath of a burglary, that intersection is unlikely enough not to be worth it.
Does locking doors generally lead to preventing break-ins? I mean, certainly in some cases (cars most notably) it does, but in general, if someone has gone up to your back door with the intent to break in, how likely are they to give up and leave upon finding it locked?
And then one day 4 years later you find out that a black swan event has occurred and because you never prepare for such things (‘it’s a waste of time thinking about it’) you will face losses big enough to influence you greatly all at once.
Or not. - that’s the thing with rare events.
Well … you can have an expected direction, just not if you account for magnitudes.
For example if I’m estimating the bias on a weighted die, and so far I’ve seen 2⁄10 rolls give 6′s, if I roll again I expect most of the time to get a non-6 and revise down my estimate of the probability of a 6; however on the occasions when I do roll a 6 I will revise up my estimate by a larger amount.
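The die example above can be sketched numerically, showing both the asymmetry and why it nets out to zero on expectation. I am assuming a Beta(3, 9) belief state about P(six) purely for illustration (a uniform prior plus the 2-of-10 sixes already seen):

```python
# Belief state about P(six): Beta(3, 9). One more roll updates it either way.

a, b = 3, 9
p_six = a / (a + b)          # current estimate: 0.25

up = (a + 1) / (a + b + 1)   # after rolling a six: jumps up to ~0.308
down = a / (a + b + 1)       # after a non-six: drifts down to ~0.231

# Rare big jump up, frequent small drift down: the expectation is unchanged.
expected = p_six * up + (1 - p_six) * down
print(expected)              # ≈ 0.25 (up to float rounding)
```

The upward move is larger, but it happens only a quarter of the time, which is exactly what conservation of expected evidence requires.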
Sometimes it’s useful to have this distinction.
Yes, on reflection it was a poor choice of words. I was using “expect” in that sense according to which one expects a parameter to equal zero if the expected value of that parameter is zero. However, while “expected value” has a well-established technical meaning, “expect” alone may not. It is certainly reasonably natural to read what I wrote as meaning “my opinion is equally likely to be swayed in either direction,” which, as you point out, is incorrect. I’ve added a footnote to clarify my meaning.
That’s not what “expected” means in these contexts.
Maybe ‘on expectation’ is clearer?
I’m well aware of this. My point was that there’s a subtle difference between “direction of the expectation” and “expected direction”.
The expectation of what you’ll think after new evidence has to be the same as you think now, so can’t point in any particular direction. However “direction” is a binary variable (which you might well care about) and this can have a particular non-zero expectation.
I’m being slightly ambiguous as to whether “expected” in “expected direction” is meant to be the technical sense or the common English one. It works fine for either, but to interpret it as an expectation you have to choose an embedding of your binary variable in a continuous space, which I was avoiding because it didn’t seem to add much to the discussion.
So to summarise in pop Bayesian terms, akin to “don’t be so sure of your beliefs; be less sure when you see contradictory evidence.” :
There is always evidence; if it looks like the contrary, you are using too high a bar. (The plural of ‘anecdote’ is ‘qualitative study’.)
You can always give a guess; even if it later turns out incorrect, you have no way of knowing now.
The only thing that matters is the prediction; hunches, gut feelings, hard numbers or academic knowledge, it all boils down to probabilities.
Absence of evidence is evidence of absence, but you can’t be picky with evidence.
The maths don’t lie; if it works, it is because somewhere there are numbers and rigour saying it should. (Notice the direction of the implication “it works” ⇒ “it has maths”.)
The more confident you are, the more surprised you can be; if you are unsure it means you expect anything.
“Knowledge” is just a fancy-sounding word; ahead-of-time predictions or bust!
ETA:
Choosing to believe is wishful thinking.
Yes—though that idea can usefully be generalised to conservation of evidence.
I’ll add the Bayesian definition of evidence and an awareness of selection effects to the list.
Belongs in Main, methinks.
I suppose we all came across Bayesianism from different points of view—my list is quite a bit different.
For me the biggest one is that the degree to which I should believe in something is basically determined entirely by the evidence, and IS NOT A MATTER OF CHOICE or personal belief. If I believe something with degree of probability X, and see Y happen that is evidence for X, then the degree of probability Z with which I then should believe is a mathematical matter, and not a “matter of opinion.”
The prior seems to be a get-out clause here, but since all updates are in principle layered on top of the first prior I had before receiving any evidence of any kind, it surely seems a mistake to give it too much weight.
My own personal view is also that often it’s not optimal to update optimally. Why? Lack of computing power between the ears. Rather than straining the grey matter to get the most out of the evidence you have, it’s often best to just go out and get more evidence to compensate. Quantity of evidence beats out all sorts of problems with priors or analysis errors, and makes it more difficult to reach the wrong conclusions.
On a non-Bayesian note, I have a rule to be careful of cases which consist of lots of small bits of evidence combined together. This looks fine mathematically until someone points out the lots of little bits of evidence pointing to something else which I just ignored or didn’t even see. Selection effects apply more strongly to cases which consist of lots of little parts.
Of course if you have the chance to actually do Bayesian mathematics rather than working informally with the brain, you can of course update exactly as you should, and use lots of little bits of evidence to form a case. But without a formal framework you can expect your innate wetware to mess up this type of analysis.
Reading this clarified something for me. In particular, “Banish talk like “There is absolutely no evidence for that belief”.
OK, I can see that mathematically there can be very small amounts of evidence for some propositions (e.g. the existence of the deity Thor.) However in practice there is a limit to how small evidence can be for me to make any practical use of it. If we assign certainties to our beliefs on a scale of 0 to 100, then what can I realistically do with a bit of evidence that moves me from 87 to 87.01? or 86.99? I don’t think I can estimate my certainty accurately to 1 decimal place—in fact I’m not sure I can get it to within one significant digit on many issues—and yet there’s a lot of evidence in the world that should move my beliefs by a lot less than that.
Mathematically it makes sense to update on all evidence. Practically, there is a fuzzy threshold beyond which I need to just ignore very weak evidence, unless there’s so much of it that the sum total crosses the bounds of significance.
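The threshold intuition can be checked numerically: in odds form, weak updates compound, so a pile of individually negligible likelihood ratios can eventually cross the bounds of significance. A toy sketch with invented numbers:

```python
def update(prob, likelihood_ratio):
    # One Bayesian update in odds form: posterior odds = prior odds * LR.
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

p = 0.87
for _ in range(100):     # 100 scraps of evidence, each with LR 1.001
    p = update(p, 1.001)
print(round(p, 4))       # ≈ 0.8809: each scrap is invisible, the pile is not
```

Each individual step moves the probability by about 0.0001, well below anything a human can introspect, yet a hundred of them produce a shift you would actually notice.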
Very small amounts of evidence? Entire mythologies are quite strong evidence of something Thor-like. The point is to be able to say “I don’t believe in Thor” and “That is strongish evidence for the existence of Thor” without conflict.
Your point about neglecting small shifts (likelihood ratio 1.0001) is well made, but your numbers are too charitable. When someone says “there is no evidence for X”, there is usually some substantial piece of evidence (LR>10), even quite strong evidence, known to them: not a tiny shift, but not totally conclusive either. The problem is that even substantial evidence usually has the problem you are pointing out (cost of consideration exceeds Value of Information).
Consider the difficulties of programming something like that:
Ignore evidence. If the accumulated ignored evidence crosses some threshold, process the whole of it.
You see the problem. If the quoted sentence is your preferred modus operandi, you’ll have to restrict what you mean by “ignore”. You’ll still need to file it somewhere, and to evaluate it somehow, just so when the cumulative weight exceeds your given threshold, you’ll be able to still update on it.
Realistically, humans seem to ignore it (and forget about it) unless they get a lot all at once. Yes, that’s a failure mode, but it’s not usually a major problem.
Or, I suppose, if they want to believe it, but that’s hardly the same thing.
We should unpack “banish talk of X” to mean that we should avoid assessments/analysis that would naturally be expressed in such surface terms.
Since most of us don’t do deep thinking unless we use some notation or words, “banish talk of” is a good heuristic for such training, if you can notice yourself (or others can catch you) doing it.
The selection biases in anecdotes make them nearly useless for updating. A more correct version would be that you can update on the first anecdote, less on a similar second one, even less on a third, and so on. Once you have ten or so anecdotes pointing in the same direction, then extra anecdotes should have essentially no impact.
So yes, the plural of anecdote is not data. Their value does not scale in the same way.
It depends on how independent the anecdotes are.
I wouldn’t agree that 100 reviews on TripAdvisor for a hotel should weigh little more than 10 reviews.
That approaches data more.
Essentially the difference between the two is how systematic the gathering of information is. A juicy urban legend gets passed around and repeated with small variations all over the place: hearing it ten times is uninformative. TripAdvisor gathers reviews in a more systematic way, so it is better. If people started sending each other the snarkiest reviews they saw on TripAdvisor, this would degenerate back into anecdotes.
The question is always, are you getting a reasonable sample of the anecdotes out there.
The plural of anecdote is qualitative study.
One thing it apparently taught Jaynes:
Interesting that Jaynes took that position! It seems to mesh with the MWI position on these things, that all quantum uncertainty about the future is really a kind of anticipated indexical uncertainty.
What were all the physicists smoking?
There’s consistency there, but since when did consistency imply correspondence with reality?
Did you learn these lessons exclusively by exposing yourself to the Bayesian ideas floating on LessWrong, or would you credit these insights at least partly to “external” sources? You mention Jaynes’s book. Has reading this book taught you some of the lessons you list above? Is there other material you’d recommend for folks interested in having more Bayesian Aha! moments?
I’d just like to note that Bayes’ Rule is one of the first and simplest theorems you prove in introductory statistics classes after you have written down the preliminary definitions/axioms of probability. It’s literally taught and expected that you are comfortable using it after the first or second week of classes.
I have never been taught Bayes’ theorem in statistics classes.
And, presumably, you have taken an introductory statistics class? Hm. Probably in high school, and then in college (assuming you’ve been) you skipped the introductory class and took one with only T-tests etc. and no counting problems? Seems like the most likely way to miss learning Bayes’ theorem and still make that statement.
I had no choice over my classes, and the statistics classes at university that were part of my program taught us from the very basics (and I’ve had four times more statistics classes than any other class in my program except for research methods). I studied Psychology in the UK.
Bayes’ Theorem is taught in High-School here, at all levels of math.
Where is “here”? I didn’t encounter Bayes’ Rule in an academic setting until I took a finite maths class at university (in Arizona, US).
Edit: Well, actually I was recommended a book called Choice and Chance by Brian Skyrms by a philosophy professor which explicitly teaches it in the context of Bayesian epistemology, but that was the result of out-of-class conversation and was not related to any particular course I was taking. BTW, I whole-heartedly recommend the book as an introduction to inductive logic.
By “here,” I meant Israel.
A data point from me as well:
At my university, I learned Bayes’ theorem in both my Intro to Statistics class (there was a whole section on Bayesian probability), and in my AI class.
Data point—in my intro stats course (at college), ‘Bayes Theorem’ was never explicitly taught, but you get all the probability required and are given Bayes-like problems (that I explicitly used Bayes’ to solve) - they just never put an intimidating theorem on the board.
This may or may not be standard in intro stats courses
In the CMU OLI course on statistics, Bayes Theorem is presented late on in the course, very briefly, and as restricted to simple population sampling; it’s very easy to see how someone taking it could forget about it the day after doing the problems.
I’d like to add that if the curriculum has a distinction between “probability” and “statistics”, it is taught in the “probability” class. Much later, the statistics class has “frequentist” part and “bayesian” part.
Honestly, I feel like if Eliezer had left out any mention of the math of Bayes’ Theorem from the sequences, I would be no worse off. The seven statements you wrote seem fairly self-evident by themselves. I don’t feel like I need to read that P(A|B) > P(A) or whatever to internalize them. (But perhaps certain people are highly mathematical thinkers for whom the formal epistemology really helps?)
Lately I kind of feel like rationality essentially comes down to two things:
Recognizing that as a rule you are better off believing the truth, i.e. abiding by the Litany of Tarski.
Having probabilistic beliefs, i.e. abiding by the Bayesian epistemology and not the Aristotelian or the Anton-Wilsonian as Yvain defined in his reaction to Chapman, or having a many-color view as opposed to a two-color view or a one-color view as Eliezer defined in the Fallacy of Gray.
Once you’ve internalized these two things, you’ve learned this particular Secret of the Universe. I’ve noticed that people seem to have their minds blown by the sequences, not really learn all that much more by spending a few years in the rationality scene, and then go back to read the sequences and wonder how they could have ever found them anything but obvious. (Although apparently CFAR workshops are really helpful, so if that’s true that’s evidence against this model.)
It’s a bit like learning thermodynamics. It may seem self-evident that things have temperatures, that you can’t get energy from nowhere, and that the more you put things together, the more they fall apart, but the science of thermodynamics puts these intuitively plausible things on a solid foundation (being respectively the zeroth, first, and second laws of thermodynamics). That foundation is itself built on lower-level physics. If you do not know why perpetual motion machines are ruled out, but just have an unexplained intuition that they can’t work, you will not have a solid ground for judging someone’s claim to have invented one.
The Bayesian process of updating beliefs from evidence by Bayes theorem is the foundation that underlies all of these “obvious” statements, and enables one to see why they are true.
Yes, who knows how many other ‘obvious’ statements you might believe otherwise, such as “Falsification is a different type of process from confirmation.”
Falsifying X is obviously the same as confirming not-X … but confirming that the culprit was Mortimer Q. Snodgrass is quantitatively very different from confirming that the culprit was not Mortimer Q. Snodgrass, and like someone once said, a qualitative difference is just a quantitative difference that is large enough.
I saw Yvain describe this experience. My experience was actually kind of the opposite. When I read the sequences, they seemed extremely well written, but obvious. I thought that my enjoyment of them was the enjoyment of reading what I already knew, but expressed better than I could express it, plus the cool results from the heuristics-and-biases research program. It was only in retrospect that I noticed how much they had clarified my thinking about basic epistemology.
That’s very interesting that your experience was the opposite.
And yeah, I saw where Yvain wrote that he and a friend shared that experience, and I noticed that I shared it exactly as well. It also seems to match with attitudes I had seen around, so I feel like it could be fairly general.
For me, reading the first chapter of Probability Theory by Jaynes showed me that what thus far had only been a vague intuition of mine (that neither what Yvain calls Aristotelianism nor what Yvain calls Anton-Wilsonism were the full story) actually had a rigorous quantitative form that can be derived mathematically from a few entirely reasonable desiderata, which did put it on a much more solid ground in my mind.
Really? Even the fifth one ;) ?
What happens when they reach this post?
The math definitely very much helped me understand the concepts. I’ve found myself sitting down and explicitly working out the probability calculations when reading some posts in the Sequences (and other posts on LW). (I guess I count as a “highly mathematical thinker”?)
If you are not going to do an actual data analysis, then I don’t think there is much point in thinking about Bayes’ rule. You could just reason as follows: “Here are my prior beliefs. Ooh, here is some new information. I will now adjust my beliefs, by trying to weigh the old and new data based on how reliable and generalizable I think the information is.” If you want to call an epistemology that involves attaching probabilities to beliefs, and updating those probabilities when new information is available, ‘bayesian’, that’s fine. But, unless you have actual data, you are just subjectively weighing evidence as best you can (and not really using Bayes’ rule).
The thing that can be irritating is when people then act as if that kind of reasoning is what bayesian statisticians do, and not what frequentist statisticians do. In reality, both types of statisticians use Bayes’ rule when it’s appropriate. I don’t think you will find any statisticians who do not consider themselves ‘bayesian’ who disagree with the law of total probability.
If you are actually going to analyze data and use bayesian methods, you would end up with a posterior distribution (not simply a single probability). If you simply report the probability of a belief (and not the entire posterior distribution), you’re not really doing conventional bayesian analysis. So, in general, I find the conventional Less Wrong use of ‘bayesian’ a little odd.
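To illustrate the distinction: with a uniform Beta(1, 1) prior and, say, 7 successes in 10 trials (data invented for the example), a conventional Bayesian analysis reports the posterior Beta(8, 4), not just a single number:

```python
# Posterior for 7 successes in 10 trials under a uniform Beta(1, 1) prior:
# Beta(1 + 7, 1 + 3) = Beta(8, 4). Reporting only the mean throws away the
# spread, which is the part a full posterior distribution actually gives you.

a, b = 8, 4
mean = a / (a + b)                                   # point summary: 2/3
sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5   # spread: ≈ 0.13
print(mean, sd)
```

Two analysts could report the same mean of 2/3 while holding very different amounts of uncertainty; only the distribution distinguishes them.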
Yes, the importance of thinking in terms of distributions instead of individual probabilities is another valuable lesson of “pop” Bayesianism.
...Common? Maybe in successful circles of academia.
What a bizarre question. I find it difficult to believe that this person has any experience with the average/median citizen.
Your argument might be helped if you provided some examples of the average/median citizen needing to be told such things. There might even be a name for what those examples are, which you would present to induce others to be less sure of the beliefs which these examples contradict.
I’ve heard “evidence” tossed around as something you might want to provide.
How about religion? There is a variety of them and they can’t all be right (many claim to be the only true one), yet people tend to just believe whichever one they happen to have been raised to believe. They are believing in these massive cosmic arrangements and belief structures...by accident of where they happen to have been raised. And I always have to tell them this.
I expected a very high “obviousness” to my assertion that the median citizen needs to be told these things; that’s why I didn’t even bother giving evidence. Why is this necessary?
Nightly news being so incredibly vacuous is pretty strong evidence that the mean citizen is bad at weighing probabilistic evidence. (e.g. “What common household item might kill you? Find out at the end of this newshour.”)
“Knife”, “bleach”, “alcohol”, “(overdose of) medication”, “rope”, “plastic bag”, “Hufflepuff bones...”.
As a matter of fact, there are thresholds below which the extra processing cost does not pay off (or, in the case of the human head, below which it is extremely implausible that the processing will even be performed correctly).
Probabilistic reasoning is, in general, computationally expensive. In many situations, a very large number of combinations of uncertain parameters has to be processed, the cross correlations must be accounted for, et cetera.
Actions conditioned on evidence generally have higher expected utility than those not conditioned on evidence, and likewise for processing of beliefs conditioned on evidence.
The expected-utility sums for things such as expenditure of resources have a term for resources being kept for future uses, which may be more conditional on evidence, and actions that are less conditioned on evidence than usual ought to lose out to the bulk of possible ways one may act in the future (edit: the ways which you can’t explicitly enumerate).
That’s just some of the thresholds that an optimally programmed intelligence (on a physically plausible computer) would apply.
This is sort of like going on about what Maxwell’s equations taught you about painting. Maxwell’s equations are quite far from painting, about as far in terms of inference length as Bayes theorem is from most actual decision making or belief forming. edit: make that much further, actually, considering that there’s no AI.
Let’s make an example to clarify. There is Bob. Bob being rich would be evidence for him having a good job. Bob having a good job would be evidence for him being rich. Both of those would be evidence with regard to Bob’s education, and so on and so forth. Everything cross-correlates with everything else, exact belief propagation is NP-hard in general, the algorithms for computing it are very nontrivial, and various subtle implementation errors would make everything converge on a completely wrong value.
Strengths of relevant relations between beliefs about Bob are themselves beliefs, so the graph is pretty damn huge. When known probabilities get fairly close to 0 and 1, it is tractable whenever unknowns are close to 0 or 1. But when they’re closer to the middle, you’re dealing with a very complicated relation. And if you could compute the resulting equation in your head, well, there’s a lot of lesser engineering tasks that you should absolutely breeze through.
edit: expanded a bit. Really, we do know how to progressively approximate from quantum electrodynamics to geometric optics to drawing 3d shapes to painting, but we do not know how to get from Bayes theorem to a full blown AI on physically plausible hardware.
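A toy version of the Bob example makes the computational point concrete: even exact inference by brute-force enumeration over a tiny three-variable chain takes some care, and the joint table doubles with every added binary variable. All probabilities below are invented:

```python
from itertools import product

# Toy Bob network: educated -> good_job -> rich. Invented numbers.
p_e = 0.3                          # P(educated)
p_j = {True: 0.7, False: 0.2}      # P(good_job | educated)
p_r = {True: 0.6, False: 0.1}      # P(rich | good_job)

def joint(e, j, r):
    pe = p_e if e else 1 - p_e
    pj = p_j[e] if j else 1 - p_j[e]
    pr = p_r[j] if r else 1 - p_r[j]
    return pe * pj * pr

# P(good_job | rich): condition on 'rich', sum out the hidden 'educated'.
num = sum(joint(e, True, True) for e in (True, False))
den = sum(joint(e, j, True) for e, j in product((True, False), repeat=2))
print(num / den)   # ≈ 0.764; the table here has 2**3 rows, and it only grows
```

Three variables are easy; thirty binary variables mean a billion-row joint table, which is why exact enumeration (and informal head-math) stops scaling almost immediately.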
I agree with your points about the value of information. Indeed, as Vaniver said, Bayesianism (i.e., “qualitative Bayes”), together with the idea of expected-utility maximization, makes the importance of VoI especially salient and easy to understand. So I’m a little puzzled by your conclusion that
… because your argument leading up to this conclusion seems to me to be steeped in Bayesian thinking through-and-through :). E.g., this:
I’d describe Bayesianism as a belief in powers of qualitative Bayes.
E.g. you seem to actually believe that taking into account low grade evidence, and qualitatively at that, is going to make you form more correct beliefs. No it won’t. Myths about Zeus are weak evidence for a great many things, a lot of which would be evidence against Zeus.
The informal algebra of “small”, “a little”, “weak”, “strong”, “a lot”, just doesn’t work for the equations involved, and even if you miraculously used actual real numbers behind those labels, you’d still have enormously huge sums over all the things implied by existence of the myths.
Firstly, I’m trying to deal just with the things that I am very confident about (computational difficulties), so the inferences are normal logic, and secondarily, I’m trying to persuade you, so I express that in your ideology.
edit: To summarize. You are accustomed to processing evidence1, and to saying that many things are not evidence1. Bayes taught you that everything is evidence2. You started treating everything as evidence1 because it’s the same word. Whereas evidence1 is evidence that is strong enough and unequivocal enough that a lot of quite rough but absolutely essential approximations work correctly (and it can be more or less usefully processed), and evidence2 is weak and nearly equivocal, all things considered, and those approximations will just plain not work, while exact solutions are too expensive and very complicated even for simple cases such as my Bob example above.
Occam’s razor should be on your list. Not in the “Solomonoff had the right definition of complexity” sense, but in the sense that any proper probability distribution has to integrate to 1, and so for any definition of complexity satisfying a few common sense axioms the limit of your prior probability has to go to zero as the complexity goes to infinity.
I think you’ve oversimplified the phrasing of 6 (not your fault, though; more the fault of the English language). Although your expected value for your future estimate of P(H) should be the same as your current estimate of P(H), that doesn’t imply symmetry of expected future evidence. For example, I have a very high expectation that future evidence will very slightly increase my already very strong belief that aliens are not visiting Earth; this is mostly balanced out by a very tiny expectation that future evidence will strongly decrease that belief.
What are these axioms?
Right. In general, the distribution for your posterior probability is by no means symmetric about your prior probability.
Assuming you think only in terms of discrete options, I think the only axiom you need is that for any level of complexity k there is at least one option that complex.
EDIT: I’m wrong, you don’t even need this.
Does this give one any reason to believe that, if two hypotheses are under consideration, the simpler one is a priori more likely? If not, it seems to me to be missing something too crucial to be called a formalization of Occam’s razor.
Right, you’d need more than that one axiom before you could really say you had a formulation of Occam’s Razor. I’m just making a more specific point, that whatever formulation of complexity you come up with, so long as it satisfies the axiom above, will have the property that any probability distribution over discrete outcomes must assign diminishing probability to increasingly complex hypotheses in the limit.
EDIT: actually even without that axiom, so long as you consider only discrete hypotheses and your definition of complexity maps hypotheses to a real positive number representing complexity, you will have that the mass of probability given to hypotheses more complex than x falls to zero as x goes to infinity.
Assuming you restrict to discrete probability distributions.
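The tail-mass point is easy to check for a concrete (assumed) prior: take P(k) = 2^(-k) over integer complexity levels k = 1, 2, 3, ...; it sums to 1, and the mass on hypotheses more complex than x is 2^(-x), which vanishes as x grows:

```python
# An assumed geometric prior over integer complexity levels: P(k) = 2**-k
# for k = 1, 2, 3, ... sums to 1, so the tail mass beyond x is 2**-x.

def tail_mass(x):
    """Prior mass on hypotheses with complexity greater than x."""
    return 2.0 ** -int(x)

for x in (1, 10, 30):
    print(x, tail_mass(x))   # the tail vanishes: 0.5, ~0.001, ~1e-9
```

Any other proper prior over the same discrete hypotheses must show the same qualitative behavior: the tail mass beyond complexity x has to go to zero, or the total would exceed 1.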
Modern Bayesianism may perhaps be notable for showing the limitations of Occam’s razor—which was previously a widely accepted doctrine.
There might be a normative rule to that effect, but probabilities in your brain can’t change by infinitesimal increments. Bayes as applied by cognitively limited agents like humans has to have some granularity.
Why not? Because it is not useful? Because those problems have been solved?
In the interests of full disclosure :-)
I think of Bayesianism as a philosophy in statistics, specifically one which is opposed to frequentism. The crucial difference is the answer to the question “What does ‘probability’ mean?”
There is also Bayesian analysis in statistics which is a toolbox of techniques and approaches (see e.g. Gelman) but which is not an “-ism”.
I do not apply the label of “Bayesianism” to all the Kahneman/Tversky kind of stuff, I tend to think of it a mind bugs (in the programming sense) and mind hacks.
I do not think of Bayesianism as an epistemology. I understand Yvain’s middle ground between Aristotle and Robert Anton Wilson, but it looks pretty obvious to me and I wouldn’t call it Bayesianism.
Why not? It looks to me like the Bayesianist statistical philosophy’s answer to “What does ‘probability’ mean?” is the same as Yvain’s middle ground epistemology’s answer to “what does ‘certainty’ mean?”, and if they’re not the same they seem very close.
(Also, I’m curious how you would unpack the practical consequences of a position being obvious to you. I think ‘empiricism’ is obvious, but I also think that labeling it is a very good idea.)
I think of epistemology as bigger than just “what does ‘certainty’ mean?”—e.g. the issues of what is knowledge, how can it be acquired, etc. You can build an epistemology on the foundation of the Bayesian approach, but you cannot reduce the whole epistemology to it.
In this particular case it’s easy. Yvain set up an axis with two endpoints: the black-and-white Aristotelian approach and the pure-black Wilsonian approach. Both of these endpoints fail pretty clearly in real life. No one can operate either on the basis “I only believe what I proved to be 100% true and do not believe the rest at all” or “I do not believe anything”. Everyone has degrees of belief regardless of how ready they are to formalize it in some framework. Therefore a spectrum-of-grey approach, the “middle ground” looks obvious to me.
The thing as I understand is that Bayesian Reasoning is half of the pie. In Highly Advanced Epistemology 101 for Beginners, EY explains the thesis that “meaning” is twofold: Either you talk about causality links in the Bayesian sense (which requires Bayesian reasoning), or you talk about mathematical proofs and definitions.
In short, inductive logic is Bayesian reasoning, and deductive logic is proof writing. With that age-old dispute out of the way:
“What is knowledge?” It’s a word. It corresponds to an epistemic cluster in thingspace, composed mostly of high-probability beliefs, supported by much physical evidence and experience. How you go about acquiring it should be obvious from that description.
What is an “epistemic cluster in thingspace”?
And let’s take, say, a mystic, a Christian mystic as an example. Does she have knowledge of God?
1) The Cluster Structure of Thingspace: it is an artifact of reality that similar things are grouped together in Gaussian-like distributions in the phase space of observables’ values. We, as humans, have evolved to use that.
2) Subjectively to herself, yes. Subjectively to me, no. A mystic and I most likely have differing opinions on what constitutes evidence; I however describe evidence in such a way that it works when I do things out in the real world. Rationalists should Win and all that.
Still not making sense to me.
Knowledge corresponds to the claim “that similar things are grouped together in gaussian-like distributions in the phase space of observables’ values”, really?
So, I know how to ride a bike. How does that knowledge fit into this cluster?
Is knowledge subjective, then? You and I can have radically different knowledge?
Lumifer, you are falling prey to several of the traps detailed in A Human’s Guide to Words. So far I have basically parroted EY’s 102 material.
Meditation: Taboo “Knowledge” and describe your relation with riding a bicycle.
Meditation: Taboo “Knowledge” and describe your relation with some field of science you are proficient in.
Meditation: Taboo “Knowledge” and describe a religious person’s views on god.
...
...
You and I both know what ‘knowledge’ is in everyday speech. The problem is what constitutes ‘knowledge’ in extreme situations.
The thing is that “knowledge” is ambiguous in everyday speech. We misunderstood each other when I initially answered your question: I thought you were speaking about the tried and tired philosophical issue that has been discussed for ages.
The answer in the Philosophical Issue of Knowledge is: “You philosophers are all morons; you are using the same word to mean different things.”
Plato has a famous definition of “Knowledge”: Justified True Belief. Notice how he has moved the problem of explaining “Knowledge” into the problem of explaining “Justification.” (And “True.” And “Belief.” Neither concept was actually well explained when Plato was alive and kicking.)
“Knowledge” can also be a synonym for “Skill,” such as knowing how to ride a bicycle. Notice how the grammatical construction “knowing how to do something” is different from “knowing something to be true.” One could argue that they are the same thing, but I think they are not. So we have at least two types of everyday knowledge: Procedural Knowledge (how to do stuff) and Object Knowledge (facts and stuff).
The distinction between the two is obvious when you really taboo it: procedural knowledge is like a tool. It is a means to an end, an extension of your primitive action set. Having lots of procedural knowledge is a boon in Instrumental Rationality, but most skills are irrelevant to Epistemological Rationality. (Riding a bicycle will only very rarely tell you the secrets of the universe.)
Object Knowledge, or Facts, are thingies in your mental model of how the world works. This mental model is what you use when you want to predict how the world is going to behave in the future, so that you can make plans. (Because you have goals you want to attain.)
Your world model is updated automatically by processes which you do not control. A sufficiently advanced agent might be able to exercise some control, at least at the design level, over its updating algorithms. In short, you take in sensory data, crunch some numbers, and out comes a Bayesian-esque update.
So my standing viewpoint is: I don’t care what you call it; “knowledge” or “hunch” or “divine inspiration.” I care about what your probability distribution over future events is. I don’t care what you call it “skills” or “knowledge” or “talent.” I care about what sort of planning algorithm you implement.
And on the topic of subjectivity: If I have trained skills or observed evidence different from you, then yes we have subjectively different “knowledge.” I for instance know 12 programming languages and intimate facts about my significant other.
But the thing is that there is only One Correct Way of updating on evidence: Bayes Theorem. If you deviate from that you will have less than optimal predictive power.
I really suggest you go and read some of the core sequences to refresh this.
I think the dichotomy between procedural knowledge and object knowledge is overblown, at least in the area of science. Scientific object knowledge is (or at least should be) procedural knowledge: it should enable you to A) predict what will happen in a given situation (e.g. if someone drops a mento into a bottle of diet coke) and B) predict how to set up a situation to achieve a desired result (e.g. produce pure L-glucose).
Sure. I have the ability to manipulate the physical object commonly known as “bicycle” to perform actions which roughly correspond to my wishes.
Sure. I am familiar with a commonly accepted (in this particular field) set of facts about the reality and I can use the usual (to this particular fields) methods to explore the reality further and/or use the methods to figure out the outputs/consequences knowing the inputs/conditions/actions.
Sure. I specifically mentioned mystics, so a mystic has had direct, personal experience of being in the presence of God and of communicating with God.
To continue, I am aware of the difference between procedural knowledge and object knowledge. It’s not absolute, of course, and can be argued to be an artifact of the map, not present in the territory. Note that both are subtypes of knowledge.
You can think of both of these types as knowing which levers of reality to press and which dials to turn to get the results you want. You say that object knowledge is “mental model of how the world works”—but isn’t this exactly what procedural knowledge is? You can make the argument that procedural knowledge is “active” and objective knowledge is “passive”, but that doesn’t look like that major a difference.
Partially. My world model is updated both consciously and subconsciously.
Well, just because that’s the only thing you care about doesn’t mean the rest of the humanity is limited in the same way.
The Sacred Truth Not To Be Doubted! :-D
I think you’re confusing some basic statistics and real life which is, to put it very mildly, complex.
Only provided you have looked, and looked in the right place.
Many things, if real, would have some nonzero chance of obtruding on your awareness, even if you haven’t looked for them. The fact that this hasn’t happened is evidence against their existence.
That presents an interesting chicken-and-egg problem, don’t you think?
I can’t consider existence or non-existence of something without that something “obtruding on my awareness” which automatically grants it evidence for existence. And I cannot provide this evidence against the existence of anything because as soon as it enters my mind, poof! the evidence against disappears and the evidence for magically appears in its place.
Anyway, I know the point you’re trying to make. But taking it to absurd lengths leads to absurd results which are generally not the desired outcome.
Sorry, I wasn’t clear. I didn’t mean “obtruding on your awareness” in the sense of having the idea of the thing occur to you. I meant that you encounter the thing in a way that is overwhelming evidence for its existence. Like, maybe you aren’t looking for goblins, but you might one day open the lid of your trashcan and see a goblin in there.
I am confused. So if you DON’T “encounter the thing in a way that is overwhelming evidence for its existence” then you have evidence against its existence?
That doesn’t seem reasonable to me.
Yes. Let
H = “Goblins exist.”
E = “I’ve seen a goblin in my trashcan under circumstances that make the existence of goblins overwhelmingly likely (in particular, the probability that I was hallucinating or dreaming is very low).”
Let us further suppose that the prior probability that I assign to the existence of goblins is very low.
Then P(H | E) > P(H). Hence, P(H | ~E) < P(H). Therefore, the fact that I haven’t seen a goblin in my trashcan is evidence against the existence of goblins.
Of course, it may be very weak evidence. It may not be evidence that I, as a computationally limited being, should take any time to weigh consciously. But it is still evidence.
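The step from P(H | E) > P(H) to P(H | ~E) < P(H) follows from the law of total probability (“conservation of expected evidence”). A quick numerical sketch, with all probabilities invented for the goblin example:

```python
# Conservation of expected evidence: if E would raise P(H), then ~E
# must lower it. All numbers below are invented for illustration.
p_h = 1e-6              # prior: goblins exist
p_e_given_h = 1e-4      # chance of a convincing trashcan sighting if they do
p_e_given_not_h = 1e-9  # chance of such a "sighting" if they don't

# P(E) by total probability
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem conditioned on E, and on ~E
p_h_given_e = p_e_given_h * p_h / p_e
p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)

assert p_h_given_e > p_h      # seeing a goblin is evidence for goblins
assert p_h_given_not_e < p_h  # not seeing one is (very weak) evidence against
```

The second assertion holds for any choice of numbers with P(E|H) > P(E|~H); the gap between prior and posterior is just microscopically small here, which is the “very weak evidence” point.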
As I said, I understand the point. To demonstrate my problem, replace goblins with tigers. I don’t think the fact that I haven’t seen a tiger in my trashcan is evidence against the existence of tigers.
In a world where tigers didn’t exist, I wouldn’t expect to see one in my trashcan. In a world where tigers did exist, I also wouldn’t expect to see a tiger in my trashcan, but I wouldn’t be quite as surprised if I did see one. My prior probability that tigers exist is very high, since I have lots of independent reasons to believe that they do exist. The conditional probability of observing no tiger in my trashcan is skewed very slightly towards the world where tigers do not exist, but not enough to affect a prior probability that is very close to 100% already. You could say the same for the goblin example: my prior probability is close to zero, and although I’m more likely not to observe a goblin in my trashcan in the world where goblins don’t exist, I’m also not likely to see one in the world where goblins do exist. The prior probability is far more skewed than the conditional probability, so the evidence of not observing a goblin doesn’t affect my belief much.
The fact that you haven’t seen a tiger in your trashcan is, however, evidence that there is no tiger in your trashcan.
Edit: Which I think is more or less harmonious with your original post. It appears to me, however, that at some step in the discussion, there was a leap of levels from “absence of evidence for goblins in the trashcan is evidence of absence of goblins from the trashcan” to “absence of evidence for goblins in the trashcan is evidence for the complete nonexistence of goblins”.
For practical purposes, sure, this is a case where “absence of evidence is evidence of absence” is not a very useful refrain. The evidence is so weak that it’s a waste of time to think about it. P(I see a tiger in my trashcan|Tigers exist) is very small, and not much higher than P(I see [hallucinate] a tiger in my trashcan|Tigers don’t exist). A very small adjustment to P(Tigers exist), of which you already have very high confidence, isn’t worth keeping track of… unless maybe you’re systematically searching the world for tigers, by examining small regions one at a time, each no more likely to contain a tiger than your own trashcan. Then you really would want to keep track of that very small amount of evidence: if you round it down to no evidence at all, then even after searching the whole world, you’d still have no evidence about tigers!
It’s not fully accurate to say “only provided you have looked, and looked in the right place,” but it might be a useful heuristic. “Be mindful of the strength of evidence, not just its presence” would be more precise, because looking in the right place does provide a much higher likelihood ratio than not looking at all.
Is it because you deny that P(H | E) > P(H) in this case? Or do you acknowledge that P(H | ~E) < P(H) is true in this case, but you don’t interpret it as meaning “the fact that I haven’t seen a tiger in my trashcan is evidence against the existence of tigers”?
If you deny that P(H | E) > P(H), this might be because your implicit prior knowledge already screens off E from H. Perhaps we should, following Jaynes, always keep track of your prior knowledge X. Then we should rewrite P(H | E) > P(H) as P(H | E & X) > P(H | X). But if your prior knowledge already includes, say, seeing tigers at the zoo, then the additional experience of seeing a tiger in your trashcan may not make tigers any more likely to exist. That is, you could have that P(H | E & X) = P(H | X).
In that case, if you’ve already seen tigers at the zoo, then their absence from your trashcan does not count as evidence against their existence.
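The screening-off condition can be stated precisely: X screens E off from H exactly when, given X, the observation E is equally likely whether or not H holds, so the likelihood ratio is 1 and the posterior equals the prior. A toy sketch with invented numbers:

```python
# X screens E off from H when P(E | H, X) == P(E | ~H, X): given X,
# the observation no longer discriminates between H and ~H.
# All numbers below are invented for illustration.
p_h_given_x = 0.999        # zoo visits (X) already settled the question
p_e_given_h_x = 1e-9       # trashcan tiger: equally (un)likely either way
p_e_given_not_h_x = 1e-9

# Bayes' theorem with everything conditioned on X
posterior = (p_e_given_h_x * p_h_given_x) / (
    p_e_given_h_x * p_h_given_x + p_e_given_not_h_x * (1 - p_h_given_x))

# With equal likelihoods the update is a no-op: P(H | E, X) = P(H | X).
assert abs(posterior - p_h_given_x) < 1e-12
```

Note this is a stronger condition than the tiger case as described, where the likelihoods differ slightly and the update is merely tiny rather than exactly zero.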
In this case I don’t think P(H | ~E) < P(H) applies.
/me looks into the socks drawer, doesn’t find any tigers
/me adjusts downwards the possibility of tigers existing
/me looks into the dishwasher, doesn’t find any tigers
/me further adjusts downwards the possibility of tigers existing
/me looks into the fridge, doesn’t find any tigers
...
You get the idea.
Sorry, I think that I was editing my comment after you replied. (I have no excuse. I think what happened was that I was going to make a quick typofix, but the edit grew longer, and by the end I’d forgotten that I had already submitted the comment.)
How do you react to my conjecture that your background knowledge screens off (or seems to) the experience of seeing a tiger in your trashcan from the hypothesis that tigers exist?
I don’t think screening off helps with the underlying problem.
Let’s recall where we started. I commented on the expression “absence of evidence is evidence of absence” by saying “Only provided you have looked, and looked in the right place.”
The first part should be fairly uncontroversial. If you don’t look you don’t get any new evidence, so there’s no reason to update your beliefs.
Now, the second part, “the right place”. In this thread Wes_W gives a numerical example that involves searching for tigers in houses and says that you need to search about 5 billion houses to drop your confidence to 90% -- and if you search a trillion houses and still don’t find a tiger, “then you’d be insane to still claim that tigers probably do exist.”
Well, let’s take this example as given but change one little thing. Let’s say I’m not looking for tigers—instead, I heard that there are two big rocks, Phobos and Deimos, and I’m looking for evidence of their existence.
I search a house and I don’t find them. I search 5 billion houses and I don’t find them. I search a trillion houses and still don’t find them. At this point would I be insane to believe Phobos and Deimos exist?
That is the issue of “looking in the right place”.
I agree that the “looking” part is important: Looking and not finding evidence is a different kind of “absence of evidence” than just not looking.
I think it would indeed be pretty silly to maintain that a) they exist and b) each house has an independent 10^-9 chance of containing them, after searching a trillion houses and finding neither. But if you didn’t place much credence in anything like b) in the first place, your confidence in a) may not be meaningfully altered. If you already thought Phobos and Deimos were moons of Mars, then you would have extremely minimal evidence against their existence. But again, we can construct a Paradox of the Heap-type setup where you search the solar system, one household-volume at a time, and if all of them come up empty you should end up thinking Phobos and Deimos probably aren’t real, so each individual household-volume must be some degree of evidence.
My thought here—and perhaps we agree on this, in which case I’m happy to concede the point—is that the need to look in the right place is technically already covered by the relevant math, specifically by the different strengths of evidence. But for us puny humans that are doing this without explicit numerical estimates, and who aren’t well-calibrated to nine significant figures, it’s a good rule of thumb.
(This comment has been edited multiple times. My apologies for any confusion.)
Well, you’d do better to search all of those volumes at once. Doing it one volume at a time has a significant chance of failing to find the moons even if they exist, since the moons move over time, and therefore failing to find them isn’t significant evidence of their nonexistence.
But that’s largely orthogonal to your point.
Kinda hard to do, but more to the point, the assumption that a single search is sufficient (= nothing changes with time) may not be true.
In fact, if you want to update your beliefs with absence of evidence, then every time your glance sweeps across a volume of space which physically could hold a tiger you need to update your beliefs about non-existence of tigers.
And then you get into more trouble because if your beliefs in (non)existence of tigers are time-specific, as they should be, the evidence from the previous second might not be relevant to the (non)existence of tigers in the next second. You need specific assumptions about persistence of entities like tigers on certain time scales (e.g. tigers don’t persist on the time scale where the unit of time is a billion years).
(nods) Systems that don’t assign very low priors to such “evasive” events can easily wind up incorrigibly believing falsehoods, even if they process evidence properly.
Meaningfully? I thought we were counting infinitesimals :-D
If we are talking about “meaningfully altered” (or what I’d call “detectable”) then not finding a tiger in my rubbish bin does not meaningfully alter my beliefs and the absence of evidence is NOT evidence of absence.
I am not sure of that. First, we’re concerned with statistics, not math (and I think this is a serious difference). Second, I haven’t thought this through, but I suspect a big issue here is what exactly your belief is. To give a quick example, when you don’t find a tiger in your garbage, is the tiger live and real or plush and a toy? When you’re unsure about the existence of something, your idea of what exactly that something is can be fuzzy and that affects what kind of evidence you’ll accept and where will you look for it.
As in “for most practical purposes, and with human computational abilities, this is no update at all”. I’m not sure we can usefully say this isn’t really evidence after all, or we run into Paradox of the Heap problems.
Let me give an example where I think “absence of evidence is evidence of absence” is applicable, even though I’m not sure anyone has ever looked in the right place: Bigfoot.
Bigfoot moves around. It is possible that all of our searches happen to have missed it, like the one-volume-at-a-time search mentioned above.
We don’t really know much about Bigfoot, so it’s hard to be sure if we’ve been looking in the right place. Nor are we quite sure what we’re looking for.
And any individual hike through the woods has a very, very small chance of encountering Bigfoot, even if it does exist, so any looking that has happened by accident won’t be especially rigorous.
Nevertheless, if Bigfoot DID exist, we would expect there to be some good photographs by now. No individual instance of not finding evidence for Bigfoot is particularly significant, but all of the searches combined failing to produce any good evidence for Bigfoot makes me reasonably confident that Bigfoot doesn’t exist, and every year of continued non-findings would drive that down a little more, if I cared enough to keep track.
Similar reasoning is useful for, say, UFOs and the power of prayer. In both cases, it is plausible that none of our evidence is really “looking in the right place” (because aliens might have arbitrarily good evasion capabilities [although beware of Giant Cheesecake Fallacy], because s/he who demands a miracle shall not receive one and running a study on prayer is like demanding a miracle, etc), but the dearth of positive evidence is pretty important evidence of absence, and justifies low confidence in those claims until/unless some strong positive evidence shows up.
Oh, of course there are situation where “absence of evidence is evidence of absence” is applicable.
For a very simple example, consider belief in my second head. The absence of evidence that I have a second head is for me excellent evidence that I do not, in fact, have a second head.
The discussion is really about whether AoE=EoA is universal.
The second half of the sentence was the reason I was bringing it up in this context. We’ve looked, kinda, and not very systematically, and maybe not in the right places, but haven’t found any evidence. Is it fair to call this evidence against paranormal claims?
It’s complicated, I don’t think this problem can be resolved in one or two sentences.
For example, there is clear relationship to how specific the claim/belief is. Lack of evidence is more important for very specific and easily testable claims (“I can bend this very spoon in front of your eyes”) and less important for general claims (“some people can occasionally perform telekinesis”).
Oh, and there’s lot of evidence for paranormal claims. It’s just that this evidence is contested. Some of it has been conclusively debunked, but not all.
Trying to not get sidetracked into that specific sub-discussion: should you be skeptical of any given paranormal claim (specific or general), if some people have tried but nobody has been able to produce clear evidence for it? “Clear evidence” here meaning “better evidence than we would expect if the claim is false”, per the Bayesian definition of evidence.
Should you be more or less skeptical than upon first hearing the claim, but before examining the evidence about it?
I think I’m not getting why you object to “AoE is EoA”, if appending ”...but sometimes it’s so weak that we humans can’t actually make use of it” doesn’t resolve the disagreement in much the same way that ”...but only provided you have looked, and looked in the right place” does.
I am not sure what that means. Example: I claim that this coin is biased. I do a hundred coin flips, it comes up heads 55 times. Is this “clear evidence”?
Oh-oh, that’s a question about how you should form your prior. The Bayesian approach is notoriously reticent about discussing this.
But you can think about it this way: invert the belief and make it “Everyone who claims to have paranormal powers is a fraud”. When another one is debunked, it’s positive evidence for your belief and you should update it. The more people get debunked, the stronger your belief gets.
Does it ever get strong enough for you to dismiss all claimed evidence of paranormal powers sight unseen? I don’t know—it depends on your prior and on how you update. I expect different results with different people.
Without crunching the numbers, my best guess is no, a fair coin is not very unlikely to come up heads 55 times out of 100. I would guess that no possible P(heads) would have a likelihood ratio much greater than 1 from that test.
If one of the hypotheses is that the coin is unfair in a way that causes it to always get exactly 55 heads in 100 flips, that might be clear/strong evidence, but this would require a different mechanism than usually implied when discussing coin flips.
I don’t know either. This is a rather different question from whether you’re getting evidence at all, though.
No need for best guesses—this is a standard problem in statistics. What it boils down to is that there is a specific distribution of the number of heads that 100 tosses of a fair coin would produce. You look at this distribution, note where 55 heads are on it… and then? What is clear evidence? How high a probability number makes things “likely” or “unlikely”? It’s up to you to decide what level of certainty is acceptable to you.
The Bayesian approach, of course, sidesteps all this and just updates the belief. The downside is that the output you get is not a simple “likely” or “unlikely”, it’s a full distribution and it’s still up to you what to make out of it.
As I said, it’s complicated and, in particular, depends on the specifics of the belief you’re interested in.
I would expect it to be hard to get to high levels of certainty in beliefs of the type “It’s impossible to do X” unless there are e.g. obvious physical constraints.
Right, it’s definitely not a hard problem to calculate directly; I specifically chose not to do so, because I don’t think you need to run the numbers here to know roughly what they’ll look like. Specifically, this test shouldn’t yield even a 2:1 likelihood ratio for any specific P(heads):fair coin, and it’s only one standard deviation from the mean. Either way, it doesn’t give us much confidence that the coin isn’t fair.
Asking what is clear evidence sounds to me like asking what is hot water; it’s a quantitative thing which we describe with qualitative words. 55 heads is not very clear; 56 would be a little clearer; 100 heads is much clearer, but still not perfectly so.
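The “not even 2:1” claim can be checked directly. A sketch: the binomial coefficient is common to both hypotheses and cancels out of the ratio, and p = 0.55 is the maximum-likelihood bias, so no other point hypothesis does any better against the fair coin.

```python
# Likelihood ratio for "P(heads) = 0.55" vs "coin is fair", given
# 55 heads in 100 flips. The binomial coefficient cancels from the
# ratio, and p = 0.55 maximizes the likelihood, so this is the best
# any specific biased-coin hypothesis can do here.
heads, flips = 55, 100
tails = flips - heads

def likelihood(p):
    # probability of one particular sequence with this many heads
    return p ** heads * (1 - p) ** tails

ratio = likelihood(0.55) / likelihood(0.5)
assert ratio < 2          # indeed under 2:1 -- weak evidence of bias
print(round(ratio, 2))    # about 1.65
```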
Suppose the chance of finding a tiger somewhere in a given household, on a given day, is one in a billion. Or so say the pro-tigerians. The tiger denialist faction, of course, claims that statistic is made-up, and tigers don’t actually exist. But one household in a trillion might hallucinate a tiger, on any given day.
Today, you search your entire house—the dishwasher AND the fridge AND the trashcan etc.
P(You find a tiger|tigers exist) = .000000001
P(You find a tiger|tigers don’t exist) = .000000000001
P(You don’t find a tiger|tigers exist) = .999999999
P(You don’t find a tiger|tigers don’t exist) = .999999999999
And suppose you are 99.9% confident that tigers exist—you think you could make statements like that a thousand times in a row, and be wrong only once. (Perhaps rattling off all the animals you know.) Your prior odds ratio is 999 to 1. So you take your prior odds, (.999/.001) and multiply by the likelihood ratio, (.999999999/.999999999999), to get a posterior odds ratio of 998.999999002 to 1. This is, clearly, a VERY small adjustment.
What if you search more households: how many would you have to search, without finding a tiger, before you dropped just to 90% confidence in tigers, where you still think tigers exist but would not willingly bet your life on it? If I’ve done the math right, about five billion. There probably aren’t that many households in the world, so searching every house would be insufficient to get you down to just 90% confidence, much less 10% or whatever threshold you’d like to use for “tigers probably don’t exist”.
(And my one-in-a-billion figure is probably far too high, and so searching every household in the world should get you even less adjustment...)
But if you could search a trillion houses at those odds, and still never found a tiger—then you’d be insane to still claim that tigers probably do exist.
And if a trillion searches can produce such a shift, then each individual search can’t produce no evidence. Just very little.
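The arithmetic behind the “about five billion” figure can be sketched in odds form, reusing the example’s made-up likelihoods:

```python
import math

# Odds-form Bayes: posterior odds = prior odds * likelihood ratio,
# applied once per (assumed independent) empty search. The numbers
# are the made-up ones from the example above.
prior_odds = 999 / 1                    # 99.9% confident tigers exist
lr = 0.999999999 / 0.999999999999       # P(no tiger|exist) / P(no tiger|don't)

# Empty searches needed to fall to 90% confidence (odds of 9 to 1):
target_odds = 9
n = math.log(prior_odds / target_odds) / -math.log(lr)
print(f"{n:.2e}")  # on the order of five billion searches
```

Since the per-search likelihood ratio is strictly below 1, each empty search carries nonzero evidence; it just takes billions of them to move a 999:1 prior appreciably.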
I’ve posted a comment that answers you here
Bayes’ Theorem implies that you can take the prior odds of the hypothesis A, i.e. the ratio of its probability to the probability of its negation a, P(A)/P(a), and update that to take the evidence E into account by multiplying in the ratio of the probability of that evidence given A and given a: new odds = old odds * P(E|A)/P(E|a).
Play around with that until you see the truth of the claim you asked about. Note that P(a) = 1 − P(A).
Under the technical definition of “evidence”, yes. In practice, it’s a question of how likely you would be to have seen one by now if they were real.
Well, at some point the upper bound of consequences for being wrong multiplied by the likelihood that you expect to be wrong is so tiny that it’s worth less than the mental overhead of keeping track of the level of certainty. Like, I’m confident enough that physics works (to the level of everyday phenomena), that keeping track of what might happen if I’m wrong about that doesn’t add enough value to be worthwhile.
So it’s about a protocol for language instead?
No, I don’t think so. But I’m not sure how to elaborate without knowing why you thought that.
I’d just like to point out that even #1 of the OP’s “lessons” is far more problematic than they make it seem. Consider the statement:
“The fact that there are myths about Zeus is evidence that Zeus exists. Zeus’s existing would make it more likely for myths about him to arise, so the arising of myths about him must make it more likely that he exists.” (supposedly an argument of the form P(E | H) > P(E)).
So first, “Zeus’s existing would make it more likely for myths about him to arise”—more likely than what? Than “a priori”? This is essentially impossible to know, since to compute P(E) you must do P(E) = sum(i) { P(E|H[i])*P(H[i]) }, i.e. marginalise over a mutually exclusive set of hypotheses (and no, “Zeus” and “not Zeus” does not help, because “not Zeus” is a compound hypothesis which you also need to marginalise over).
I will grant you that it may seem plausible to guess that the average P(E|H[i]) over all possible explanations for E is lower than P(E|Zeus) (since most of them are bad explanations). But since the average is weighted by the various priors P(H[i]), if our background knowledge causes some high-likelihood explanation for E (high P(E|H[i])) to dominate the average, then P(E) may not be less than P(E|Zeus) even if P(E|Zeus) is relatively high! In which case E actually counts against the Zeus hypothesis, since P(H|E)<P(H) if P(E|H)<P(E).
Whether this is the case or not in the example is tough to say (and of course it is relative to the agent’s background knowledge), but I think it worth emphasising that it is not as easy as it seems.
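The dominant-alternative point can be made concrete with toy numbers (all invented, and the three-way partition is of course a crude simplification): if a high-prior hypothesis like “humans invent myths about powerful agents” predicts E even better than Zeus does, then P(E) exceeds P(E|Zeus) and the myths count against Zeus.

```python
# Toy marginalisation over a mutually exclusive set of hypotheses for
# E = "myths about Zeus exist". All numbers are invented.
hypotheses = {
    # name: (prior P(H), likelihood P(E|H))
    "Zeus exists":         (1e-6, 0.5),
    "humans invent myths": (0.9, 0.8),     # dominant, high-likelihood alternative
    "something else":      (0.1 - 1e-6, 0.01),
}

# P(E) = sum over i of P(E|H[i]) * P(H[i])
p_e = sum(prior * lik for prior, lik in hypotheses.values())

prior_zeus, lik_zeus = hypotheses["Zeus exists"]
posterior_zeus = lik_zeus * prior_zeus / p_e

# The dominant alternative predicts E better, so P(E|Zeus) < P(E),
# and observing the myths actually lowers the Zeus hypothesis:
assert lik_zeus < p_e
assert posterior_zeus < prior_zeus
```

With different invented numbers (say, a tiny prior on the myth-invention hypothesis) the same evidence would instead favour Zeus, which is exactly the comment’s point: the direction of the update depends on the whole weighted average, not on P(E|Zeus) alone.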
The post on whether you’re “entitled” to evidence has always annoyed me a bit… what does “entitled” even mean? If the person you’re talking to isn’t updating on the evidence you’re giving them for some bad reason, what can you really do?
In “I don’t know”, Eliezer isn’t arguing that you shouldn’t say it, but that you shouldn’t think it:
I agree that “I don’t know” is useful. It’s the longer statement that I’m “banishing”.
Point-by-point take on this:
1.
The evidence can be weak enough, and/or be evidence for an immense number of other things besides what it is claimed to be evidence for, as to be impossible to process qualitatively. If there’s “no evidence”, the effect size is usually pretty small: much smaller than the filtering that the anecdotes pass through, much smaller than can be inferred qualitatively, etc.
2.
There are a great many things that you have never even thought of, and you know nothing about those things. They have no probabilities assigned and, worse, they work as if they had a probability of zero. And you can’t avoid that, because there are far more things you ought to enumerate than you can enumerate, by a very, very huge factor.
Having heard of something leads to a quite non-Bayesian change of belief (effective zero to non-zero). In light of this, degrees of belief are not probabilities, but some sort of tag values attached to the propositions that were considered (a very small subset of the totality of propositions), tags which need to be processed in a manner as to arrive at the most accurate choices in the end despite the absence of processing of the vast majority of relevant beliefs. (A manner which does resemble probability theory to some extent.)
Treating them more like probabilities will just lead to a larger final error, even though superficially the edit distance from your algorithm to some basic understanding of probability theory can seem smaller.
3.
Imposing thresholds on both beliefs and evidence in support of the beliefs allows you to compensate for and decrease the consequences of the unavoidable errors described above. The thresholds have been established after a very long history of inferring some completely wrong conclusions based on accumulation of evidence that was weaker than what can be usefully processed qualitatively but instead requires very accurate set up and quantitative calculations.
4.
Sometimes, and sometimes it’s really weak evidence that isn’t statistically independent from the belief that it ought to affect.
5.
But you threw away those that can not be easily demonstrated.
6.
Theorems of probability are not going to hold exactly for the optimum value that should be assigned to the beliefs in the light of what’s described in 2, and working as ff they do hold can not be expected to improve outcomes.
Keep in mind that you can reasonably expect that in the future great many things that you have never thought of may be brought to your attention, without being able to actually enumerate and process a significant fraction of them right now and then.
edit: improved that some. Also, many of those limitations would hold for any physically plausible Jupiter Brains, Matroshka Brains, or other such giant objects which, while they can process great many more beliefs than you can, are still stuck with processing only a minuscule fraction of the beliefs they ought to process.
edit2: interestingly, David Chapman touches on much same points.
You don’t need to enumerate beliefs to assign them nonzero probability. You can have a catch-all “stuff nothing like anything that’d ever even occur to me, unless it smacked me in the face” category, to which you can assign nonzero probability.
Those beliefs don’t propagate where they should, that’s the issue, and universe doesn’t care if you made an excuse to make it sound better. Those beliefs still have zero effect on inferences, and that’s what matters. And when you get some of that weak “evidence” such as your Zeus example it doesn’t go towards other hypotheses, but it goes towards Zeus, because the latter you have been prompted with.
Or when you process an anecdote, it would seem to me that with your qualitative Bayes you are going to tend to affect your belief about the conclusion too much and your belief about how the anecdote has been picked, too little (for contentious issues you expect anecdotes for both sides). Since you are doing everything qualitatively rather than quantitatively, that’s an approximation, and approximation that breaks down for what is normally not called “evidence”.
edit: I’d think, by the way, that a real deity and a made up deity would result in statistically different sets of myths, making a specific set of myths evidence either for or against a deity depending on the actual content of the myths. Just as a police report by the suspect where the suspect denies guilt can be either evidence against or for the guilt depending on what the suspect actually said and how it squares together with the other facts.
edit2: an analogy. Suppose you have a huge, enormous network of water pipes, or an electronic circuit. A lot of pipes, trillions. You want to find water flow in a specific point, or you want to find voltage at a spot. (Probability flows in an even more complicated manner than water in pipes or electricity through resistor networks, by the way, and numbers are larger than trillions). I am telling you that you aren’t considering a lot of pipes, they have effective flow of zero where they should have non-zero. You’re saying that no, you can have one thick pipe which is all the flows that you didn’t even consider—a pipe that aren’t really connected much to anything. As far as processing flows does, that does not even make any coherent sense.
Bayes theorem only works with as much information as you put into it. Humans can only ever be approximate Bayesian agents. If you learn about some proposition you never though of before it is not a failing of Bayesian reasoning, it is just that you learn you have been doing it wrong up until that point and have to recompute everything.
That doesn’t look useful to me.
By the same token, my mentioning here the name of the monster Ygafalkufeoinencfhncfc is evidence that it exists. Funnily enough, the same reasoning provides evidence for the monster Grrapoeiruvnenrcancaef and a VERY large number of other, ahem, creatures.
True.
No it doesn’t. Most of the creatures in that class have, in fact, not been mentioned by you or anyone else.
Also true.
If one of dem monsters exists, that would be evidence that more of dem monsters exist.
Realize then that a conclusion of “one of those monsters exists” is just assigning a high probability. It follows that just increasing the probability of “one of those monsters exists” also increases the probability ýou’d assign to more monsters of its class existing. It’s a continuous updating relationship, there’s no discontinuous jump in the belief in other monsters of the class which only occurs once you’re sure that one of them exists.
Compare this to seeing an alien-engineered kaiju and then being less surprised at Godzilla (even if that’s only in a neighboring class).
True. But then I can write a one-line Perl script which will bring into being evidence for a LOT of monsters.
Which itself brings into being the question of what kind of evidence the output of a RNG is. Or, perhaps, what kind of evidence does software produce.
I’d think that how much mentioning a monster updates the probability that it exists depends on the context of mentioning the monster. Furthermore, mentioning it in the context of examples of probability should score particularly low in this regard.
Do you mean that if I bothered to write a short story about how I met the monster Ygafalkufeoinencfhncfc while hiking in remote mountains and how all the locals were like “Oooh, there is something there but we don’t talk about it”—in such a case you’d update your probabilities higher?
P.S. Oh, an now we have TWICE the amount of evidence for Ygafalkufeoinencfhncfc compared to what we had a few minutes ago. Can we extrapolate the trend? :D
If you wrote a short story about a monster, there’s some chance that the monster exists, there are legends about it, one of those legends made its way to you, and you then chose to use it in your story. Thus, if I never heard of the monster, then you wrote such a story and I had no further information about your story-writing process, that would indeed increase my estimate that the monster exists by a miniscule amount.
If you wrote the story specifically to prove a point in a lesswrong discussion about monster names that are pulled out of thin air, that would be further information, so my estimate wouldn’t increase in the same way.
Moreover, it’s not twice the evidence anyway, since the story and the lesswrong post aren’t independent.
Note: It’s not a linear trend. If you mention the name one million times, it does not make it one million times as likely. Also, being mentioned one million times by the same person is not the same evidence as being mentioned by one million people, once by each. And even that is not a linear trend.
Nobody said it was. In any case, I was talking about the amount of evidence, not about what does that imply in terms of belief probabilities.
A “linear trend” in probabilities would have big issues, of course, because there are hard caps at zero and one.
We have some misunderstanding here. My best guess is that you think evidence makes something likely, while the typical usage here is that evidence makes things more likely. An example: Imagine that according to your best knowledge, some thing X has a probability 0.000001. Now you get some new information E, and based on all the knowledge you have now, the probability of X is 0.000001001.
On one hand, the information E increased the probability of X from 0.000001to 0.000001001. This is what we mean by saying that E is an evidence for X. It made it more likely.
On the other hand, even the increased probability is pretty close to zero. Therefore, more likely does not imply likely or even worth considering seriously (you can imagine even more zeroes before the first nonzero digit).
Similarly, probability of Ygafalkufeoinencfhncfc is pretty close to zero, but not exactly zero. Mentioning it on a discussion forum (choosing this specific topic instead of other millions of topic that could have been chosen) slightly increases the probability (at the cost of those other topics that were not chosen to be mentioned). The change is very small, but technically it is an increase. That’s why we call it evidence for Ygafalkufeoinencfhncfc.
I understand the argument. I don’t accept it.
No. We can’t extrapolate a trend. That’s what”You cannot expect that future evidence will sway you in a particular direction” means.
That there were myths about Troy and Mycenae was highly useful to Heinrich Schliemann who discovered them and proved their existence.
Troy and Mycenae were not mythical cities—they were described in many writings other than the Iliad. There wasn’t much doubt that they existed.
Oh, and the myth about Schliemann is wrong—see Wikipedia:
What? I don’t know of any writing that referred to Troy and Mycenae except the ones that directly related to the myths (I never said it was just Iliad of course, I mean the entire corpus of Greek myths).
We have a few Egyptian incriptions of the era that mention “Mukana” or some such, and perhaps a Hittite inscription that refers to some name similar sounding to Ilium, but those hardly count as “descriptions”—just a mention of a foreign city/country name without evidence to its location or anything really relating to it.
The evidence of the myths were pretty much the only evidence we had about Troy and Mycenae prior to the physical discovery of their remains.
First, the question of whether the Trojan war was fiction is different from the question of whether the city of Troy was real and actually existed. It seems to me that during the first half of the XIX century there were claims that the Trojan war didn’t actually happen but was just imagined by Homer—but I don’t think that the mainstream consensus of that time insisted that the city of Troy was invented by him as well.
In fact, if you look at ancient texts you’ll find Troy being mentioned and discussed by such people as Herodotus and Thucydides. It’s not that their word should be taken as gospel, but their writings aren’t usually called “myths”.
See e.g. http://riversfromeden.wordpress.com/2011/09/06/the-trojan-war-in-greek-historical-sources/ for more details.
All the texts described (Herodotus, Thucydides, etc) in your link only seem to discuss Troy in the context of the Trojan war which was itself known to the Greeks via the work of the myths passed down. So it seems strange to say that we knew Troy existed, but we doubted that the Trojan war was real.
Thucydides likewise mentions Mycenae—and he argued in favour of taking the poets’ words seriously about the past importance of Mycenae, though at his time no physical evidence of such remained (the location was by Thucydides’s time become mere insignificant villages).
Around the time of the Ancient Greeks and Romans, do you distinguish “myth” and “history” at all? It seems to me you’re calling everything without physical evidence a “myth”.