This feels generally related to the problems covered in Scott and Abram’s research over the past few years. One of the sentences that stuck out to me the most was (roughly paraphrased since I don’t want to look it up):
In order to be a proper bayesian agent, a single hypothesis you formulate is as big and complicated as a full universe that includes yourself
I.e. our current formulations of bayesianism like solomonoff induction only formulate the idea of a hypothesis at such a low level that even trying to think about a single hypothesis rigorously is basically impossible with bounded computational time. So in order to actually think about anything you have to somehow move beyond naive bayesianism.
This seems reasonable, thanks. But I note that “in order to actually think about anything you have to somehow move beyond naive bayesianism” is a very strong criticism. Does this invalidate everything that has been said about using naive bayesianism in the real world? E.g. every instance where Eliezer says “be bayesian”.
One possible answer is “no, because logical induction fixes the problem”. My uninformed guess is that this doesn’t work because there are comparable problems with applying it to the real world. But if this is your answer, follow-up question: before we knew about logical induction, were the injunctions to “be bayesian” justified?
(Also, for historical reasons, I’d be interested in knowing when you started believing this.)
I think it definitely changed a bunch of stuff for me, and does at least a bit invalidate some of the things that Eliezer said, though not actually very much.
In most of his writing Eliezer used bayesianism as an ideal that was obviously unachievable, but that still gives you a rough sense of what the actual limits of cognition are, and rules out a bunch of methods of cognition as being clearly in conflict with that theoretical ideal. I did definitely get confused for a while and tried to apply Bayes to everything directly, and then felt bad when I couldn’t actually apply Bayes’ theorem in some situations, which I now realize is because those tended to be problems where embeddedness or logical uncertainty mattered a lot.
My shift on this happened over the last 2-3 years or so. I think starting with Embedded Agency, but maybe a bit before that.
rules out a bunch of methods of cognition as being clearly in conflict with that theoretical ideal
Which ones? In Against Strong Bayesianism I give a long list of methods of cognition that are clearly in conflict with the theoretical ideal, but in practice are obviously fine. So I’m not sure how we distinguish what’s ruled out from what isn’t.
which I now realize is because those tended to be problems where embeddedness or logical uncertainty mattered a lot
Can you give an example of a real-world problem where logical uncertainty doesn’t matter a lot, given that without logical uncertainty, we’d have solved all of mathematics and considered all the best possible theories in every other domain?
I think in practice there are lots of situations where you can confidently create a kind of pocket-universe where you can actually consider hypotheses in a bayesian way.
Concrete example: Trying to figure out who voted a specific way on a LW post. You can condition pretty cleanly on vote-strength, and treat people’s votes as roughly independent, so if you have guesses on how different people are likely to vote, it’s pretty easy to create the odds ratios for basically all final karma + vote numbers and then make a final guess based on that.
It’s clear that there is some simplification going on here, by assigning static probabilities for people’s vote behavior, treating them as independent (though modeling some of the dependence between them wouldn’t be too hard), etc. But overall I expect it to perform pretty well and to give you good answers.
(Note, I haven’t actually done this explicitly, but my guess is my brain is doing something pretty close to this when I do see vote numbers + karma numbers on a thread)
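To make the pocket-universe idea concrete, here is a minimal sketch of the vote-guessing calculation. Everything specific in it is an assumption for illustration: the candidate names, their vote strengths, their static upvote/downvote probabilities, and the simplification that the post starts at zero karma with no automatic self-vote. It enumerates every combination of up / down / no vote, weights each by its prior probability under independence, keeps only the combinations that match the observed karma and vote count, and renormalizes.

```python
from itertools import product

# Hypothetical candidates: (name, vote strength, P(upvote), P(downvote)).
# The remaining probability mass is "did not vote".
CANDIDATES = [
    ("alice", 2, 0.6, 0.1),
    ("bob", 1, 0.3, 0.2),
    ("carol", 3, 0.2, 0.4),
]


def posterior_votes(candidates, observed_karma, observed_count):
    """Return {name: (P(upvoted | data), P(downvoted | data))} by enumeration.

    Assumes votes are independent and that karma is the signed sum of the
    vote strengths of everyone who voted.
    """
    total = 0.0
    up = {name: 0.0 for name, _, _, _ in candidates}
    down = {name: 0.0 for name, _, _, _ in candidates}

    # Each person's action is +1 (upvote), -1 (downvote), or 0 (no vote).
    for actions in product((1, -1, 0), repeat=len(candidates)):
        prob, karma, count = 1.0, 0, 0
        for (_, strength, p_up, p_down), action in zip(candidates, actions):
            if action == 1:
                prob *= p_up
            elif action == -1:
                prob *= p_down
            else:
                prob *= 1.0 - p_up - p_down
            karma += action * strength
            count += abs(action)

        # Condition on the observation: discard inconsistent hypotheses.
        if karma != observed_karma or count != observed_count:
            continue

        total += prob
        for (name, _, _, _), action in zip(candidates, actions):
            if action == 1:
                up[name] += prob
            elif action == -1:
                down[name] += prob

    if total == 0.0:
        raise ValueError("observation is impossible under this model")
    return {name: (up[name] / total, down[name] / total) for name in up}


if __name__ == "__main__":
    # Example observation: the post sits at -1 karma after 2 votes.
    for name, (p_up, p_down) in posterior_votes(CANDIDATES, -1, 2).items():
        print(f"{name}: P(up)={p_up:.3f}, P(down)={p_down:.3f}")
```

The enumeration is exponential in the number of candidates, which is fine for a handful of plausible voters and is exactly the kind of small, closed hypothesis space where doing the update explicitly is feasible.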
So I’m not sure how we distinguish what’s ruled out from what isn’t.
Well, anything that claims to do better than the ideal bayesian update is clearly ruled out. E.g. arguments that by writing really good explanations of a phenomenon you can get to a perfect understanding, or arguments that you can derive the rules of physics from first principles.
There are also lots of hypotheticals where you do get to just use Bayes properly, and then it provides very strong bounds on the ideal approach. Lots of standard statistical models have implicit assumptions behind them that, when put into a bayesian framework, give rise to a more general formulation. See the Wikipedia article for “Bayesian interpretations of regression” for a number of examples.
Of course, in reality it is always unclear whether the assumptions that give rise to various regression methods actually hold, but I think you can totally say things like “given these assumptions, the bayesian solution is the ideal one, and you can’t perform better than this, and if you put in the computational effort you will actually achieve this performance”.
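One standard instance of this, sketched under stated assumptions rather than taken from the thread: for a one-parameter linear model y = w*x + noise, with Gaussian noise of known variance and a zero-mean Gaussian prior on w, the bayesian posterior mean for w coincides with the ridge regression estimate when the ridge penalty equals noise variance divided by prior variance. The function names and the toy data below are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def posterior_mean_slope(xs, ys, noise_var, prior_var):
    """Posterior mean of w given y = w*x + N(0, noise_var), w ~ N(0, prior_var).

    Gaussian prior times Gaussian likelihood gives a Gaussian posterior whose
    precision is the sum of the prior precision and the data precision.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    posterior_precision = 1.0 / prior_var + sxx / noise_var
    return (sxy / noise_var) / posterior_precision


if __name__ == "__main__":
    # Toy data, assumed for illustration only.
    xs = [1.0, 2.0, 3.0]
    ys = [1.1, 1.9, 3.2]
    noise_var, prior_var = 0.5, 2.0

    bayes = posterior_mean_slope(xs, ys, noise_var, prior_var)
    ridge = ridge_slope(xs, ys, lam=noise_var / prior_var)
    print(bayes, ridge)  # the two estimates agree up to rounding
```

So the choice of ridge penalty is implicitly a claim about how noisy the data is relative to how large you expected the slope to be, which is the sense in which the bayesian framing makes the hidden assumptions explicit.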
Are you able to give examples of the times you tried to be Bayesian and it failed because of embeddedness?
Scott and Abram? Who? Do they have any books I can read to familiarize myself with this discourse?
Scott: https://lesswrong.com/users/scott-garrabrant
Abram: https://lesswrong.com/users/abramdemski
Scott Garrabrant and Abram Demski, two MIRI researchers.
For introductions to their work, see the Embedded Agency sequence, the Consequences of Logical Induction sequence, and the Cartesian Frames sequence.