My impression from the Sequences is that Eliezer considers Bayesianism to be a core element of rationality. Some people have even referred to the community as Bayesian Rationalists. I’ve always found this curious, since it seemed like more of a technicality most of the time. Why is Bayesianism important, or why did Eliezer consider it important?
[Question] Why is Bayesianism important for rationality?
(Not speaking for Eliezer, obviously.) “Carefully adjusting one’s model of the world based on new observations” seems like the core idea behind Bayesianism in all its incarnations, and I’m not sure if there is much more to it than that. The stronger the evidence, the more significant the update, yada-yada. It seems important to rational thinking because we all tend to fall into the trap of either ignoring evidence we don’t like or being overly gullible when something sounds impressive. Not that it helps a lot: way too many “rationalists” uncritically accept the local egregores and defend them like a religion. But the allegiance to an ingroup is emotionally stronger than logic, so we sometimes confuse rationality with rationalization. Still, relative to many other ingroups this one is not bad, so maybe Bayesianism does its thing.
Egregores?
From https://en.wikipedia.org/wiki/Egregore#Contemporary_usage
a kind of group mind that is created when people consciously come together for a common purpose
I am new to this stuff, but did we not have something like 200 years of observations supporting Newton’s theories? How would a Bayesian have adjusted their models here? I use this as an example of “we now know better”. Is it the “new” observation that is key?
In my opinion, “a rationalism” (IE, a set of memes designed for an intellectual community focused on the topic of clear thinking itself) requires a few components to work.
It requires a story in which more is possible (as compared with “how you would reason otherwise” or “how other people reason” or such).
The first component is an overarching theory of reasoning. This is a framework in which we can understand what reason is, analyze good and bad reasoning, and offer advice about reasoning.
The second component is an account of how this is not the default. If there is a simple notion of good reasoning, but also everyone is already quite good at that, then there is not a strong motivation to learn about it, practice it, or form a community around it.
The sequences told a story in which the first role was mostly played by a form of Bayesianism, and the second was mostly played by the heuristics and biases literature. The LessWrong memeplex has evolved somewhat over time, including forming some distinct subdivisions with slightly different answers to those two questions.
Most notably, I think CFAR has changed its ideas about these two components quite a bit. One version I heard once: the sequences might give you the impression that people are overall pretty bad at Bayesian reasoning, and the best way to become more rational is to specifically de-bias yourself by training Bayesian reasoning and un-training all the known biases or coming up with ways to compensate for them. Initially, this was the vision of CFAR as well. But what CFAR found was that humans are actually really really good at Bayesian reasoning, when other psychological factors are not getting in the way. So CFAR pivoted to a model more focused on removing blockers rather than increasing basic reasoning skills.
Note that this is a different answer to the second question, but keeps Bayesianism as the overarching theory of rationality. (Also keep in mind that this is, quite probably, a pretty bad summary of how views have changed since the beginning of CFAR.)
Eliezer has now written Inadequate Equilibria, which offers a significantly different version of the second component. I could understand starting there and getting an impression of what’s important about rationalism which is quite distant from Bayesianism: there, the primary story about rationality is social blockers, to which the primary antidote is thinking for yourself rather than going with the crowd. Why is Bayesianism important for that? Well, the answer is that Bayesianism offers a nuts-and-bolts theory of how to think. You need some such theory in order to ground attempts at self-improvement (otherwise you run the risk of making haphazard changes without any standard by which to judge whether you are thinking better/worse). But the quality of the theory has a significant bearing on how well the self-improvement will turn out!
I think it’s important that the overarching reasoning was some form of probabilism for “obvious” reasons I won’t go into.
I think it was important that it was Bayesianism in particular for a few reasons, some better than others.
Bayesianism allows probabilism to be applied in the most broad way. Frequentist and propensity interpretations of probability both hold that it’s inappropriate to apply probabilistic judgement in hypothesis testing. This makes it much more difficult to apply lessons from probabilistic reasoning, since you’re being restricted in where to apply them. (Of course, if that restriction were appropriate then it would be better to avoid applying the lessons of probability...)
Although vanilla Bayesianism is subjectivist about the prior, it offers a completely objective story about how reasoning should go once we’ve fixed the prior. I recently argued against this aspect of classical Bayesianism. However, I can see how this was an advantage in terms of memetics—a totally objective story for this part makes for strong dividing lines between correct and incorrect reasoning.
The addition of algorithmic information theory also offers a “more objective” story about the prior.
As I have recently argued, classical Bayesianism ends up sidelining some important “frequentist” properties, which we should also want. So, to an extent, my current perspective is a hybrid of Bayesianism and frequentism. But given a choice between the two, it seems much better that I started out Bayesian and had to figure out how to integrate frequentist ideas, rather than the other way around.
What’s the source for that claim? Is that a public position of CFAR?
No. It is something somebody said to me once. (Possibly in a context where I’m supposed to anonymize the source—I don’t remember for sure.) I’m sure there are a lot of other complexities to CFAR’s history, and a lot of different summaries one could give. And maybe this particular summary is actually really bad for some reason I’m not aware of.
See Eliezer’s post Beautiful Probability, and Yvain on ‘probabilism’; there’s a core disagreement about what sort of knowledge is possible, and unless you’re thinking about things in Bayesian terms, you will get hopelessly confused.
Also, Yvain’s article really helped clarify how Bayesianism is important, although maybe I’d call it probabilistic reasoning more than anything else.
Seems to me that there is a disagreement on how the “probabilistic reasoning” is done correctly, and Bayesianism is one of the possible answers.
The other is frequentism, which could be simplified (strawmanned?) as “the situation must happen many times, and then ‘probability’ is the frequency of this specific outcome given the situation”. Which is nice if the situation indeed happens often, but kinda useless if the situation happens very rarely or, in the extreme case, this is the first time it has happened. In those cases, frequentism still provides a few tricks, but there doesn’t seem to be a coherent story behind them, and I think that in a few cases different tricks provide different answers, with no standard way to choose.
More technically, Bayesianism admits that one always starts with some “prior belief”, and then only “updates” on it based on evidence. Which of course invites the questioning of how the prior belief was obtained—and this is outside the scope of Bayesianism. However, many frequentist tricks can be interpreted as making a Bayesian update upon an unspoken prior belief (for example, a belief that all unknown probabilities follow a uniform distribution). So Bayesianism provides a unifying framework for the dozen tricks and exposes their unspoken assumptions.
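To make the “unspoken prior” point concrete, here is a minimal sketch. It assumes a simple coin-like experiment with a fixed unknown success probability, and it uses a Beta(a, b) prior, which is my choice for illustration (the function names are hypothetical, not from any library). With the uniform prior Beta(1, 1), the most probable value after the update is exactly the observed frequency that the frequentist estimate reports; with a different prior, the same data give a different answer.

```python
from fractions import Fraction


def frequentist_estimate(successes: int, trials: int) -> Fraction:
    """Maximum-likelihood estimate: simply the observed frequency."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return Fraction(successes, trials)


def beta_posterior_mode(successes: int, trials: int, a: int = 1, b: int = 1) -> Fraction:
    """Most probable success rate after updating a Beta(a, b) prior on the data.

    The posterior is Beta(a + successes, b + failures). Its mode is
    (a + successes - 1) / (a + b + trials - 2) when both parameters exceed 1.
    The default a = b = 1 is the uniform prior (an assumption for illustration).
    """
    failures = trials - successes
    if a + successes <= 1 or b + failures <= 1:
        raise ValueError("posterior has no interior mode for these inputs")
    return Fraction(a + successes - 1, a + b + trials - 2)


print(frequentist_estimate(7, 10))          # 7/10
print(beta_posterior_mode(7, 10))           # 7/10, uniform prior: same answer
print(beta_posterior_mode(7, 10, 50, 50))   # 14/27, strong prior near 1/2
```

The first two lines agree because the uniform prior is the hidden assumption; the third shows what changes once that assumption is stated and replaced.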
But I am not an expert, so this is just my impression from having read something about it.
Okay, but why is this important?
First, it is an example of a universal law. That is an important part of the “Less Wrong mindset”; once you get it, then the idea that “probability = dozen unrelated tricks with no underlying system” will seem completely crazy; almost like believing that the laws of physics only apply while you are in the lab studying them, but stop applying when you take off your white coat and go home (and then you are free to believe in religion, homeopathy, or whatever).
Second, some people update incorrectly (too little or too much), and Bayesianism explains why they make the mistake and what should be done instead. Probably the most important thing is that if the prior belief is much stronger than the evidence, you should not update too much. For example, if there is a disease people have with probability 1:1000000, and a test tells you that you have it, but the test provides a wrong result in 10% of cases, that still means the probability of you having the disease is only about 1:110000 (roughly nine times higher than before, but still very small). Some people instead go “well, if the test is wrong in 10% of cases, that means 90% probability I have the disease”. Many people make this mistake, including doctors who actually use tests like this. It is already difficult to teach people that “X implies Y” is not the same as “Y implies X”, and it becomes almost impossible when probabilities are involved: “X with probability P implies Y” versus “Y with probability P implies X”. Yet another place this happens is scientific journals. How can you have journals full of research with p<0.05 and yet most of it fails to replicate? The answer is obvious, if you understand Bayesianism.
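The disease example can be checked directly with Bayes’ theorem. This is a rough sketch, assuming “wrong in 10% of cases” means both a 10% false-positive rate and a 10% false-negative rate; the function name `posterior` is mine. The exact figure comes out near 1 in 111,000, the same order of magnitude as the number quoted above and nowhere near 90%. The journal case works the same way, but the share of tested hypotheses that are true and the statistical power used below are made-up numbers purely for illustration.

```python
def posterior(prior: float, p_pos_if_true: float, p_pos_if_false: float) -> float:
    """Bayes' theorem for a yes/no hypothesis after one positive result."""
    true_pos = prior * p_pos_if_true              # hypothesis true AND positive result
    false_pos = (1.0 - prior) * p_pos_if_false    # hypothesis false AND positive result
    return true_pos / (true_pos + false_pos)


# Disease test: prior of one in a million, test assumed wrong 10% of the time
# in both directions (an interpretation of the text, not stated in it).
p_disease = posterior(prior=1e-6, p_pos_if_true=0.9, p_pos_if_false=0.1)
print(f"about 1 in {1 / p_disease:,.0f}")   # about 1 in 111,112

# Journals: a "positive" is a result with p < 0.05, so a false hypothesis
# passes 5% of the time. The prior (5% of tested hypotheses are true) and
# the power (50%) are hypothetical values chosen only to show the effect.
p_real_effect = posterior(prior=0.05, p_pos_if_true=0.5, p_pos_if_false=0.05)
print(f"{p_real_effect:.0%} of significant results reflect a real effect")
```

Under those assumed inputs, well under half of the significant findings are real, even though every one of them cleared p<0.05. Different inputs give different percentages; the structure of the calculation is the point.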
That definition has the advantage of defining probability as something that’s objective while the Bayesian definition depends on the prior beliefs of a particular person and is subjective.
Sometimes the subjectivity comes back in the form of choosing the proper reference class.
If I flip a coin, should our calculation include all coins that were ever flipped, or only coins that were flipped by me, or perhaps only the coins that I flipped on the same day of week...?
Intuitively, sometimes the narrower definitions are better (maybe a specific type of coin produces unusual outcomes), but the more specific you get, the fewer examples you find.
That’s important. Bayes and Frequentism are not just different ways of doing calculations, they also make different implications about what probability is.
Universal law has its challengers, e.g. Nancy Cartwright’s How The Laws of Physics Lie.
It’s also not clear that Bayes has that much to do with physics. Most Bayesians would say that you should still use Bayes if you find yourself in a different universe.
It’s fairly standard in the mainstream to say that frequentism is suitable for some purposes, Bayes for others. Is the mainstream crazy?
Haven’t read the book, so I looked at some reviews, and… it seems to me that there are two different questions:
a) Are there universal laws in math and physics? (Yes.)
b) Are the consequences of such laws trivial? (No.)
So we seem to have two groups of people talking past each other, when one group says “there is a unifying principle behind this all, it’s not just an arbitrary hodgepodge of tricks all the way down”, and the other group says “but calculating everything from the first principles is difficult, often impossible, and my tricks work, so what’s your problem”.
To simplify it a lot, it’s like one person saying “multiplying by 10 is really simple, you just add a zero to the end” and another person says “the laws of multiplication are the same for all numbers, 10 is not a separate magisterium”. Both of them are right. It is very useful to be able to multiply by 10 quickly. But if your students start to believe that multiplication by 10 follows separate laws of math, something is seriously wrong. (Especially if they sometimes happen to apply the rule like this: “2.0 × 10 = 2.00”. At that moment you should realize they were just following the motions, even if they got their previous 999 calculations right.) Using tricks is okay, if you understand why they work. Believing it is arbitrary tricks all the way down is not.
Bayesians don’t say “the frequentist tricks don’t work”. They say “they work, because they are simplifications of a more general principle, and by the way these are their unstated assumptions, so of course if you apply two tricks using different assumptions, you might get two different results”. But that doesn’t mean one shouldn’t know or shouldn’t use the tricks.
Also, looking at another review...
...yeah, of course. Classical Less Wrong topic.
I see absolutely no problem with this. The laws may be simple, their consequences complex.
To be more precise (although this comment is already too long), it would make sense to distinguish two kinds of “laws”. I don’t know if there is already a name for this. Some laws are simply “generalizations of observations”. You observe a thousand white sheep, you conclude “all sheep are white”. Then you see a black sheep. Oops! But there is another approach, which goes something like “imagine that this world is a simulation; what would be the rules of the simulation so that they would produce the kind of outcomes we observe”. Simulation here is only a metaphor; Einstein would use the metaphor of understanding God’s mind, etc. The idea is to think which underlying principles could be responsible for what we see, as opposed to merely noticing the trends in what we see.
And yes, it works differently in math and in physics; physics tries to describe a given existing universe, math is kinda its own map and territory at the same time. But in both cases, there is this idea of looking for the underlying principles, whether those are universal laws in physics or axioms in math, as opposed to merely collecting stamps (which is also a useful thing to do).
No.
The argument against universal laws in physics is based on the fact that they use ceteris paribus clauses. You said it was ridiculous for different laws to hold outside the laboratory, but CP is only guaranteed inside the laboratory: the first rule of experimentation is to change only one thing per experiment, thus enforcing CP artificially.
As for maths, there are disputes about proof by contradiction (intuitionism), the axiom of choice, and so on.
There is a difference between “the law applies randomly” and “multiple laws apply, you need to sum their effects”.
If you say “if one apple costs 10 cents, then three apples cost 30 cents”, the rule is not refuted by saying “but I bought three apples and a cola, and I paid 80 cents”. The law of gravity does not stop being universal just because the ball stops falling downwards after I kick it.
The way “laws” combine is much more complex than simple summation. If it were that simple, we would already have a TOE.
But it’s worse than that. There’s a difference between being able to use shortcuts, and having to. And there’s a difference between the shortcut resulting in the same answer, and the shortcut being an approximation.
Since Bayes is uncomputable in the general case, cognitively limited agents have to use heuristic replacements instead. That means Bayes isn’t important in practice, unless you forget about the maths and focus on non-quantitative maxims, as has happened.
Cognitively limited agents include AIs. At one time, LessWrong believed that Bayes underpinned decision theory, decision theory underpinned rationality, and some combination of decision theory and Bayes could be used to predict the behaviour of ASIs.
Edit:
(Which is to say that they disbelieved in the simple argument that agents cannot predict more complex agents, in general). But if an agent is using heuristics to overcome its computational limitations, you can’t predict it using pure Bayes, even assuming you somehow don’t have computational limitations, because heuristics give different and worse answers. That is, you can’t predict it as a black box and would need to know its code.
So Bayes isn’t useful for the two things it was believed to be useful for, so what’s left is basically a philosophical claim, that Bayes subsumes frequentism, so that frequentism is not really rivalrous. But Bayes itself is subsumed by radical probabilism, which is more general still!
I think “probabilistic reasoning” doesn’t quite point at the thing; it’s about what type signature knowledge should have, and what functions you can call on it. (This is a short version of Viliam’s reply, I think.)
To elaborate, it’s different to say “sometimes you should do X” and “this is the ideal”. Like, sometimes I do proofs by contradiction, but not every proof is a proof by contradiction, and so it’s just a methodology; but the idea of ‘doing proofs’ is foundational to mathematics / could be seen as one definition of ‘what mathematical knowledge is.’
The example in Beautiful Probability seems underdefined. It doesn’t specify whether or not the second researcher had the option of stopping before N=100.
I don’t know how many people, if any, are actually going around in daily life trying to assign or calculate probabilities (conditional or otherwise) or directly apply Bayes’ theorem. However, there are core insights that come from learning to think about probability theory coherently that are extremely non-obvious to almost everyone, and require deliberate practice. This includes seemingly simple things like “Mathematical theorems hold whether or not you understand them,” “Questions of truth and probability have right answers, and if you get the wrong answers you’ll fail to make optimal decisions,” or “It’s valuable, psychologically and for interpersonal communication, to be able to assign numerical estimates of your confidence in various beliefs or hypotheses.” Other more subtle ones like “it is fundamentally impossible to be 100% certain of anything” are also important, and *much* harder to explain to people who aren’t aware of the math that defines the relevant terms.
My day job as a research analyst involves making a lot of estimates about a lot of things based on fairly loose and imprecise evidence. In recent years I’ve been involved in helping train a lot of my coworkers. I find myself paraphrasing ideas from the Sequences constantly (recommending people read them has been less helpful; most won’t, and in any case transfer of learning is hard). I notice that their writing, speaking, and thinking become a lot more precise, with fewer mistakes and impossibilities, when I ask them to try doing simple mental exercises like “In your head, assign a probability estimate to everything you claim will happen or think is true now, and add appropriate “likeliness” quantifiers to your sentences based on that.”
Also, I’ve had multiple people tell me that they won’t, or even literally can’t, make numerical assumptions and estimates without numerical data to back them up, sometimes with very strict ideas about what counts as data. The fact that their colleagues manage to make such assumptions and get useful answers isn’t enough to persuade them otherwise. Math is often more likely to get through to such people.
I wrote a LessWrong post that addressed this: What Bayesianism Taught Me