Also, Yvain’s article really helped clarify how Bayesianism is important, although maybe I’d call it probabilistic reasoning more than anything else.
Seems to me that there is a disagreement on how the “probabilistic reasoning” is done correctly, and Bayesianism is one of the possible answers.
The other is frequentism, which could be simplified (strawmanned?) as “the situation must happen many times, and then ‘probability’ is the frequency of this specific outcome given the situation”. Which is nice if the situation indeed happens often, but kinda useless if the situation happens very rarely, or in the extreme case, if this is the first time it has happened. In those cases, frequentism still provides a few tricks, but there doesn’t seem to be a coherent story behind them, and I think that in a few cases different tricks provide different answers, with no standard way to choose.
More technically, Bayesianism admits that one always starts with some “prior belief”, and then only “updates” on it based on evidence. Which of course invites the question of how the prior belief was obtained, and this is outside the scope of Bayesianism. However, many frequentist tricks can be interpreted as making a Bayesian update upon an unspoken prior belief (for example, a belief that all unknown probabilities follow a uniform distribution). So Bayesianism provides a unifying framework for the dozen tricks and exposes their unspoken assumptions.
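A minimal sketch of that last point, in Python. The specific example is an assumption for illustration, not something taken from the comment: estimating a coin's bias from k heads in n flips. The frequentist estimate k/n coincides with the most probable value (the posterior mode) after a Bayesian update from a uniform Beta(1, 1) prior, while the posterior mean gives Laplace's rule of succession, (k + 1)/(n + 2). All function names are hypothetical.

```python
from fractions import Fraction


def posterior_beta(successes, trials, prior_a=1, prior_b=1):
    """Update a Beta(prior_a, prior_b) prior on binomial data.

    The default Beta(1, 1) is the uniform prior on [0, 1].
    Returns the (a, b) parameters of the Beta posterior.
    """
    return prior_a + successes, prior_b + (trials - successes)


def beta_mode(a, b):
    # Mode of Beta(a, b); this formula holds for a, b >= 1 and a + b > 2.
    return Fraction(a - 1, a + b - 2)


def beta_mean(a, b):
    # Mean of Beta(a, b).
    return Fraction(a, a + b)


# Illustrative data (an assumption): 7 heads in 10 flips.
a, b = posterior_beta(successes=7, trials=10)
print("frequentist estimate k/n:", Fraction(7, 10))
print("posterior mode, uniform prior:", beta_mode(a, b))
print("posterior mean, uniform prior:", beta_mean(a, b))
```

Under these assumptions the mode reproduces k/n exactly, so the "trick" and the Bayesian update differ only in which summary of the posterior gets reported, and in the prior being stated out loud.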
But I am not an expert, so this is just my impression from having read something about it.
Okay, but why is this important?
First, it is an example of a universal law. That is an important part of the “Less Wrong mindset”; once you get it, then the idea that “probability = dozen unrelated tricks with no underlying system” will seem completely crazy; almost like believing that the laws of physics only apply while you are in the lab studying them, but stop applying when you take off your white coat and go home (and then you are free to believe in religion, homeopathy, or whatever).
Second, some people update incorrectly (too little or too much), and Bayesianism explains why they make the mistake and what should be done instead. Probably the most important thing is that if the prior belief is much stronger than the evidence, you should not update too much. For example, if there is a disease people have with probability 1:1000000, and a test tells you that you have it, but the test provides a wrong result in 10% of cases, that still means the probability of you having the disease is only about 1:110000 (roughly nine times higher than before, but still very small). Some people instead go “well, if the test is wrong in 10% of cases, that means 90% probability I have the disease”. Many people make this mistake, including doctors who actually use tests like this. It is already difficult to teach people that “X implies Y” is not the same as “Y implies X”, and it becomes almost impossible when probabilities are involved: “X with probability P implies Y” versus “Y with probability P implies X”. Yet another place where this happens is scientific journals. How can you have journals full of research with p<0.05 and yet most of it fails to replicate? The answer is obvious, if you understand Bayesianism.
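The disease arithmetic can be checked with a short Python sketch. It assumes the test is wrong 10% of the time in both directions (a 90% true-positive rate and a 10% false-positive rate), which the comment does not spell out. The journal numbers at the end are purely illustrative assumptions, not measured values, and the function name is hypothetical.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis is true | positive result), by Bayes' rule."""
    true_pos = prior * true_positive_rate
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Disease example: prior of 1 in 1,000,000, test wrong in 10% of cases
# (assumed symmetric: 90% sensitivity, 10% false positives).
p_disease = posterior(prior=1e-6, true_positive_rate=0.9, false_positive_rate=0.1)
print(f"P(disease | positive test) is about 1 in {1 / p_disease:,.0f}")

# Journal example with made-up inputs: suppose 5% of tested hypotheses
# are true, studies detect a true effect half the time, and p < 0.05
# lets through 5% of the false ones.
p_real = posterior(prior=0.05, true_positive_rate=0.5, false_positive_rate=0.05)
print(f"P(effect is real | p < 0.05) is about {p_real:.2f}")
```

With these inputs the positive test moves the odds by a factor of nine, nowhere near 90%, and the same function shows how a literature of significant results can still be mostly false when the prior is low.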
The other is frequentism, which could be simplified (strawmanned?) as “the situation must happen many times, and then ‘probability’ is the frequency of this specific outcome given the situation”.
That definition has the advantage of defining probability as something that’s objective while the Bayesian definition depends on the prior beliefs of a particular person and is subjective.
Sometimes the subjectivity comes back in the form of choosing the proper reference class.
If I flip a coin, should our calculation include all coins that were ever flipped, or only coins that were flipped by me, or perhaps only the coins that I flipped on the same day of the week...?
Intuitively, sometimes the narrower definitions are better (maybe a specific type of coin produces unusual outcomes), but the more specific you get, the fewer examples you find.
That’s important. Bayes and Frequentism are not just different ways of doing calculations, they also carry different implications about what probability is.
First, it is an example of a universal law. That is an important part of the “Less Wrong mindset”; once you get it, then the idea that “probability = dozen unrelated tricks with no underlying system” will seem completely crazy
Universal law has its challengers: e.g. Nancy Cartwright’s How The Laws of Physics Lie.
It’s also not clear that Bayes has that much to do with physics. Most Bayesians would say that you should still use Bayes if you find yourself in a different universe.
It’s fairly standard in the mainstream to say that frequentism is suitable for some purposes, Bayes for others. Is the mainstream crazy?
Haven’t read the book, so I looked at some reviews, and… it seems to me that there are two different questions:
a) Are there universal laws in math and physics? (Yes.)
b) Are the consequences of such laws trivial? (No.)
So we seem to have two groups of people talking past each other, when one group says “there is a unifying principle behind this all, it’s not just an arbitrary hodgepodge of tricks all the way down”, and the other group says “but calculating everything from first principles is difficult, often impossible, and my tricks work, so what’s your problem”.
To simplify it a lot, it’s like one person saying “multiplying by 10 is really simple, you just add a zero to the end” and another person saying “the laws of multiplication are the same for all numbers, 10 is not a separate magisterium”. Both of them are right. It is very useful to be able to multiply by 10 quickly. But if your students start to believe that multiplication by 10 follows separate laws of math, something is seriously wrong. (Especially if they sometimes happen to apply the rule like this: “2.0 × 10 = 2.00”. At that moment you should realize they were just going through the motions, even if they got their previous 999 calculations right.) Using tricks is okay, if you understand why they work. Believing it is arbitrary tricks all the way down is not.
Bayesians don’t say “the frequentist tricks don’t work”. They say “they work, because they are simplifications of a more general principle, and by the way these are their unstated assumptions, so of course if you apply two tricks using different assumptions, you might get two different results”. But that doesn’t mean one shouldn’t know or shouldn’t use the tricks.
Also, looking at another review...
She strongly objects, for example, to the idea that consciousness is somehow intimately involved in the measurement process
...yeah, of course. Classical Less Wrong topic.
Each law thus comes with a ceteris paribus (all else being equal) clause attached. So, for example, the ideal gas law tells us how pressure, volume, and temperature are related, but it is reliable only for closed systems. When she says that such laws are not very useful she means something quite specific, namely, that such a law is, by itself, almost useless for understanding p-t-v relationships in open systems, like the Earth’s atmosphere, where all else is NOT equal. Such laws are extremely useful as foundational concepts in our abstract understanding of how the universe works, but it can take years, or decades, or even centuries after the discovery of a law for engineers and technologists to figure out how to cash out all of the “all else being equal” clauses in the real situations where the laws operate. For example, the central laws governing fusion in plasma are pretty well understood, but turning that understanding into an operating fusion generator is proving extremely difficult.
the “law of gravity” is a great example: No two bodies REALLY interact SOLELY in accordance with the “law of gravity.” In the real world, electromagnetic forces, inertia, gravitation from other bodies, and a host of other forces are at play—and you must “correct” for those other forces, Luc, if you want to land safely on Mars.
I see absolutely no problem with this. The laws may be simple, their consequences complex.
To be more precise (although this comment is already too long), it would make sense to distinguish two kinds of “laws”. I don’t know if there is already a name for this. Some laws are simply “generalizations of observations”. You observe a thousand white sheep, you conclude “all sheep are white”. Then you see a black sheep. Oops! But there is another approach, which goes something like “imagine that this world is a simulation; what would be the rules of the simulation so that they would produce the kind of outcomes we observe”. Simulation here is only a metaphor; Einstein would use the metaphor of understanding God’s mind, etc. The idea is to think which underlying principles could be responsible for what we see, as opposed to merely noticing the trends in what we see.
And yes, it works differently in math and in physics; physics tries to describe a given existing universe, math is kinda its own map and territory at the same time. But in both cases, there is this idea of looking for the underlying principles, whether those are universal laws in physics or axioms in math, as opposed to merely collecting stamps (which is also a useful thing to do).
Are there universal laws in math and physics? (Yes.)
No.
The argument against universal laws in physics is based on the fact that they use ceteris paribus clauses. You said it was ridiculous for different laws to hold outside the laboratory, but CP is only guaranteed inside the laboratory: the first rule of experimentation is to change only one thing per experiment, thus enforcing CP artificially.
As for maths, there are disputes about proof by contradiction (intuitionism), the axiom of choice, and so on.
There is a difference between “the law applies randomly” and “multiple laws apply, you need to sum their effects”.
If you say “if one apple costs 10 cents, then three apples cost 30 cents”, the rule is not refuted by saying “but I bought three apples and a cola, and I paid 80 cents”. The law of gravity does not stop being universal just because the ball stops falling downwards after I kick it.
To simplify it a lot, it’s like one person saying “multiplying by 10 is really simple, you just add a zero to the end” and another person says “the laws of multiplication are the same for all numbers, 10 is not a separate magisterium”. Both of them are right. It is very useful to be able to multiply by 10 quickly.
But it’s worse than that. There’s a difference between being able to use shortcuts, and having to. And there’s a difference between the shortcut resulting in the same answer, and the shortcut being an approximation.
Since Bayes is uncomputable in the general case, cognitively limited agents have to use heuristic replacements instead. That means Bayes isn’t important in practice, unless you forget about the maths and focus on non-quantitative maxims, as has happened.
Cognitively limited agents include AIs. At one time, Less Wrong believed that Bayes underpinned decision theory, decision theory underpinned rationality, and some combination of decision theory and Bayes could be used to predict the behaviour of ASIs.
Edit:
(Which is to say that they disbelieved in the simple argument that agents cannot predict more complex agents, in general.) But if an agent is using heuristics to overcome its computational limitations, you can’t predict it using pure Bayes, even assuming you somehow don’t have computational limitations yourself, because heuristics give different and worse answers. That is, you can’t predict it as a black box and would need to know its code.
So Bayes isn’t useful for the two things it was believed to be useful for, so what’s left is basically a philosophical claim, that Bayes subsumes frequentism, so that frequentism is not really rivalrous. But Bayes itself is subsumed by radical probabilism, which is more general still!
I think “probabilistic reasoning” doesn’t quite point at the thing; it’s about what type signature knowledge should have, and what functions you can call on it. (This is a short version of Viliam’s reply, I think.)
To elaborate, it’s different to say “sometimes you should do X” and “this is the ideal”. Like, sometimes I do proofs by contradiction, but not every proof is a proof by contradiction, and so it’s just a methodology; but the idea of ‘doing proofs’ is foundational to mathematics / could be seen as one definition of ‘what mathematical knowledge is.’
There is a difference between “the law applies randomly” and “multiple laws apply, you need to sum their effects”.
If you say “if one apple costs 10 cents, then three apples cost 30 cents”, the rule is not refuted by saying “but I bought three apples and a cola, and I paid 80 cents”. The law of gravity does not stop being universal just because the ball stops falling downwards after I kick it.
The way “laws” combine is much more complex than simple summation. If it were that simple, we would already have a TOE.