In a bayesian rationalist view of the world, we assign probabilities to statements based on how likely we think they are to be true. But truth is a matter of degree, as Asimov points out. In other words, all models are wrong, but some are less wrong than others.
Consider, for example, the claim that evolution selects for reproductive fitness. Well, this is mostly true, but there’s also sometimes group selection, and the claim doesn’t distinguish between a gene-level view and an individual-level view, and so on...
So just assigning it a single probability seems inadequate. Instead, we could assign a probability distribution over its degree of correctness. But because degree of correctness is such a fuzzy concept, it’d be pretty hard to connect this distribution back to observations.
Or perhaps the distinction between truth and falsehood is sufficiently clear-cut in most everyday situations for this not to be a problem. But questions about complex systems (including, say, human thoughts and emotions) are messy enough that I expect the difference between “mostly true” and “entirely true” to often be significant.
Has this been discussed before? Given Less Wrong’s name, I’d be surprised if not, but I don’t think I’ve stumbled across it.
This feels generally related to the problems covered in Scott and Abram’s research over the past few years. One of the sentences that stuck out to me the most was (roughly paraphrased since I don’t want to look it up):
In order to be a proper bayesian agent, any single hypothesis you formulate has to be as big and complicated as a full universe that includes yourself
I.e. our current formulations of bayesianism, like Solomonoff induction, only formulate the idea of a hypothesis at such a low level that even trying to think about a single hypothesis rigorously is basically impossible in bounded computational time. So in order to actually think about anything you have to somehow move beyond naive bayesianism.
This seems reasonable, thanks. But I note that “in order to actually think about anything you have to somehow move beyond naive bayesianism” is a very strong criticism. Does this invalidate everything that has been said about using naive bayesianism in the real world? E.g. every instance where Eliezer says “be bayesian”.
One possible answer is “no, because logical induction fixes the problem”. My uninformed guess is that this doesn’t work, because there are comparable problems with applying it to the real world. But if this is your answer, follow-up question: before we knew about logical induction, were the injunctions to “be bayesian” justified?
(Also, for historical reasons, I’d be interested in knowing when you started believing this.)
I think it definitely changed a bunch of stuff for me, and it does invalidate some of the things that Eliezer said at least a bit, though not actually very much.
In most of his writing Eliezer used bayesianism as an ideal that was obviously unachievable, but that still gives you a rough sense of what the actual limits of cognition are, and rules out a bunch of methods of cognition as being clearly in conflict with that theoretical ideal. I did definitely get confused for a while and tried to apply Bayes to everything directly, and then felt bad when I couldn’t actually apply Bayes’ theorem in some situations, which I now realize is because those tended to be problems where embeddedness or logical uncertainty mattered a lot.
My shift on this happened over the last 2-3 years or so. I think starting with Embedded Agency, but maybe a bit before that.
rules out a bunch of methods of cognition as being clearly in conflict with that theoretical ideal
Which ones? In Against Strong Bayesianism I give a long list of methods of cognition that are clearly in conflict with the theoretical ideal, but in practice are obviously fine. So I’m not sure how we distinguish what’s ruled out from what isn’t.
which I now realize is because those tended to be problems where embeddedness or logical uncertainty mattered a lot
Can you give an example of a real-world problem where logical uncertainty doesn’t matter a lot, given that without logical uncertainty, we’d have solved all of mathematics and considered all the best possible theories in every other domain?
I think in practice there are lots of situations where you can confidently create a kind of pocket universe in which you can actually consider hypotheses in a bayesian way.
Concrete example: Trying to figure out who voted a specific way on a LW post. You can condition pretty cleanly on vote-strength, and treat people’s votes as roughly independent, so if you have guesses on how different people are likely to vote, it’s pretty easy to create the odds ratios for basically all final karma + vote numbers and then make a final guess based on that.
It’s clear that there is some simplification going on here: assigning static probabilities to people’s vote behavior, treating them as independent (though modeling some of the dependence wouldn’t be too hard), etc. But overall I expect it to perform pretty well and to give you good answers.
(Note, I haven’t actually done this explicitly, but my guess is my brain is doing something pretty close to this when I do see vote numbers + karma numbers on a thread)
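Here is a minimal sketch of that kind of pocket-universe calculation. The voter priors, vote strengths, and observed totals below are made-up illustrations, not numbers from any actual post:

```python
# Enumerate every combination of "voted / didn't vote", treating voters as
# independent, and keep only the combinations consistent with the observed
# karma and vote count. All numbers here are hypothetical.
from itertools import product

# Guesses per person: (probability they voted, vote strength if they did)
voters = {
    "alice": (0.8, +2),   # strong-upvoter, very likely to have voted
    "bob":   (0.3, +1),
    "carol": (0.5, -1),   # possible downvote
    "dave":  (0.6, +1),
}

observed_karma = 3
observed_vote_count = 2

posterior = {}
for choices in product([False, True], repeat=len(voters)):
    prob = 1.0
    karma, count = 0, 0
    for (_name, (p_vote, strength)), voted in zip(voters.items(), choices):
        prob *= p_vote if voted else (1 - p_vote)
        if voted:
            karma += strength
            count += 1
    if karma == observed_karma and count == observed_vote_count:
        who = tuple(name for (name, _), v in zip(voters.items(), choices) if v)
        posterior[who] = posterior.get(who, 0.0) + prob

# Normalize to get the posterior over who voted, given the observed totals.
total = sum(posterior.values())
for who, p in posterior.items():
    print(who, round(p / total, 3))
# ('alice', 'bob') 0.222
# ('alice', 'dave') 0.778
```

Brute-force enumeration only works because the pocket universe is tiny; with many voters or correlated voting behavior you’d need something smarter, which is exactly where the simplifications mentioned above start to matter.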
So I’m not sure how we distinguish what’s ruled out from what isn’t.
Well, anything that claims to do better than the ideal bayesian update is clearly ruled out. E.g. arguments that by writing really good explanations of a phenomenon you can get to a perfect understanding, or arguments that you can derive the rules of physics from first principles.
There are also lots of hypotheticals where you do get to just use Bayes properly, and then it provides very strong bounds on the ideal approach. Lots of standard statistical models rest on implicit assumptions that, when made explicit in a bayesian framework, give rise to a more general formulation. See the Wikipedia article for “Bayesian interpretations of regression” for a number of examples.
Of course, in reality it is always unclear whether the assumptions that give rise to various regression methods actually hold, but I think you can totally say things like “given these assumptions, the bayesian solution is the ideal one, and you can’t perform better than this, and if you put in the computational effort you will actually achieve this performance”.
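As a concrete instance of that kind of statement (a textbook example rather than one taken from this thread): ridge regression can be read as the MAP estimate of a bayesian linear model with a Gaussian prior on the weights, so under those assumptions the bayesian posterior mean is exactly the ideal that the penalized method is aiming at. A small numpy check of the equivalence, on synthetic data:

```python
# Numerical check: the MAP / posterior-mean estimate of bayesian linear
# regression with prior w ~ N(0, prior_var * I) and noise N(0, noise_var)
# equals ridge regression with penalty lam = noise_var / prior_var.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5])
noise_var, prior_var = 0.25, 1.0
y = X @ true_w + rng.normal(scale=np.sqrt(noise_var), size=n)

# Ridge regression: argmin ||y - Xw||^2 + lam * ||w||^2
lam = noise_var / prior_var
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Bayesian linear regression: posterior mean (= MAP, since everything is Gaussian)
posterior_cov = np.linalg.inv(X.T @ X / noise_var + np.eye(d) / prior_var)
w_map = posterior_cov @ (X.T @ y) / noise_var

print(np.allclose(w_ridge, w_map))  # True: the two estimates coincide
```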
Are you able to give examples of the times you tried to be Bayesian and it failed because of embeddedness?
Scott and Abram? Who? Do they have any books I can read to familiarize myself with this discourse?
Scott: https://lesswrong.com/users/scott-garrabrant
Abram: https://lesswrong.com/users/abramdemski
Scott Garrabrant and Abram Demski, two MIRI researchers.
For introductions to their work, see the Embedded Agency sequence, the Consequences of Logical Induction sequence, and the Cartesian Frames sequence.
Related but not identical: this shortform post.
See the section about scoring rules in the Technical Explanation.
Hmmm, but what does this give us? He talks about the difference between vague theories and technical theories, but then says that we can use a scoring rule to change the probabilities we assign to each type of theory.
But my question is still: when you increase your credence in a vague theory, what are you increasing your credence about? That the theory is true?
Nor can we say that it’s about picking the “best theory” out of the ones we have, since different theories may overlap partially.
If we can quantify how good a theory is at making accurate predictions (or rather, quantify a combination of accuracy and simplicity), that gives us a sense in which some theories are “better” (less wrong) than others, without needing theories to be “true”.
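One way to make that concrete (an illustrative sketch with made-up data, a log scoring rule, and an ad-hoc per-parameter simplicity penalty; none of these choices are prescribed by the thread): score two candidate “theories” purely on their predictions, and call the higher-scoring one less wrong.

```python
# Compare two predictive "theories" by cumulative log score minus a crude
# complexity penalty, without asking whether either theory is "true".
# The data and the penalty are hypothetical illustrations.
import math

observations = [1, 1, 0, 1, 0, 1, 1, 1]  # made-up binary outcomes

# Theory A: vague, a flat "probably yes" at 0.7 (1 parameter).
def theory_a(i):
    return 0.7

# Theory B: more specific, its prediction depends on context (2 parameters).
def theory_b(i):
    return 0.9 if i % 2 == 0 else 0.5

def score(theory, n_params, obs, penalty_per_param=1.0):
    # Log score of each observed outcome under the theory's prediction,
    # summed, minus a simplicity penalty proportional to parameter count.
    log_score = sum(
        math.log(theory(i)) if x == 1 else math.log(1 - theory(i))
        for i, x in enumerate(obs)
    )
    return log_score - penalty_per_param * n_params

print("Theory A:", round(score(theory_a, 1, observations), 3))
print("Theory B:", round(score(theory_b, 2, observations), 3))
# Whichever scores higher is "less wrong" on this data, in the predictive sense.
```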