expectation calibrator: stimulant-fueled vomiting of long-considered thoughts
In a 2004 essay, “An Intuitive Explanation of Bayes’ Theorem”, Yudkowsky suggests that it’s not clear where Bayesian priors originally come from. Here’s a dialogue from the post poking fun at that difficulty:
Q. How can I find the priors for a problem?
A. Many commonly used priors are listed in the Handbook of Chemistry and Physics.
Q. Where do priors originally come from?
A. Never ask that question.
Q. Uh huh. Then where do scientists get their priors?
A. Priors for scientific problems are established by annual vote of the AAAS. In recent years the vote has become fractious and controversial, with widespread acrimony, factional polarization, and several outright assassinations. This may be a front for infighting within the Bayes Council, or it may be that the disputants have too much spare time. No one is really sure.
Q. I see. And where does everyone else get their priors?
A. They download their priors from Kazaa.
Q. What if the priors I want aren’t available on Kazaa?
A. There’s a small, cluttered antique shop in a back alley of San Francisco’s Chinatown. Don’t ask about the bronze rat.
The problem of where priors originally come from has caused significant philosophical confusion on LessWrong, but I think it actually has a pretty clear naturalistic solution. Our brains supply the answers to questions of probability (e.g. “How likely is Donald Trump to win the 2024 presidential election?”), and our brains were shaped by natural selection. That is to say, they were shaped by a process which generates cognitive algorithms which are reproduced if they work in practice. It wasn’t like we were being evaluated on how well our brains performed in every possible universe. They just had to produce well-calibrated expectations in our universe, or more accurately, just the parts of the universe they actually had to deal with.
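To make “well-calibrated” a bit more concrete, here is a minimal sketch of how one might score a forecaster against what actually happened. Everything in it is an assumption for illustration: the forecasts and outcomes are made-up numbers, and `brier_score` and `calibration_table` are hypothetical helper names, not anything taken from the essay or from a particular library.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes, n_bins=5):
    """Group forecasts into probability bins and compare the average stated
    probability in each bin with how often the event actually happened."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, o))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(o for _, o in members) / len(members)
            table.append((mean_p, freq, len(members)))
    return table


# Made-up data: 20 yes/no events. Nine of the first ten happened,
# and two of the last ten happened.
outcomes = [1] * 9 + [0] + [1] * 2 + [0] * 8

# A forecaster whose stated probabilities match the observed frequencies...
calibrated = [0.9] * 10 + [0.2] * 10
# ...and one who makes the same calls but is far too sure of them.
overconfident = [0.99] * 10 + [0.01] * 10

table = calibration_table(calibrated, outcomes)
score_calibrated = brier_score(calibrated, outcomes)
score_overconfident = brier_score(overconfident, outcomes)

for mean_p, freq, n in table:
    print(f"said {mean_p:.2f}, happened {freq:.2f} of the time (n={n})")
print(f"Brier scores: {score_calibrated:.3f} vs {score_overconfident:.3f}")
```

Note that a score like this only measures performance on the events the forecaster actually faced, which is the same sense in which our brains only had to do well in the parts of the universe they actually had to deal with.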
You can build some intuition for this topic by considering large language models. Prior to undergoing reinforcement learning and becoming chatbots, LLMs are pure next-word predictors. If such an LLM has good enough training data and a good enough architecture, its outputs will tend to be pretty good predictions of the kinds of things that humans actually say in real life.[1] However, it’s not hard to contrive situations where an LLM base model’s predictions fail dramatically.
For instance, if you type
Once upon a
into a typical base model, the vast majority of its probability mass falls on the token “time”. However, you could easily continue training the model on documents which followed up “once upon a” with random noise tokens, and its predictions would completely fail on this new statistical distribution.
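As a toy illustration of this failure mode, here is a minimal sketch that swaps a real LLM for a simple count-based next-token predictor. The tiny corpus, the noise tokens, the smoothing constants, and all of the function names are assumptions made up for the example; the only point is that the same predictor can look fine on data resembling its training set and score terribly, as measured by cross-entropy loss, once the distribution shifts.

```python
import math
from collections import Counter, defaultdict


def train_counts(corpus, context_len=3):
    """Count which token follows each run of `context_len` tokens."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for i in range(context_len, len(tokens)):
            context = tuple(tokens[i - context_len:i])
            counts[context][tokens[i]] += 1
    return counts


def next_token_prob(counts, context, token, vocab_size, alpha=0.001):
    """Additively smoothed estimate of P(token | context)."""
    seen = counts.get(tuple(context), Counter())
    return (seen[token] + alpha) / (sum(seen.values()) + alpha * vocab_size)


def mean_loss(counts, pairs, vocab_size):
    """Average cross-entropy, in nats, over (context, next_token) pairs."""
    total = sum(math.log(next_token_prob(counts, c, t, vocab_size)) for c, t in pairs)
    return -total / len(pairs)


# Made-up training data standing in for "things humans actually say".
train_corpus = [
    "once upon a time there was a king",
    "once upon a time a fox lived here",
    "once upon a time in a far land",
    "once upon a hill stood a castle",
]
VOCAB_SIZE = 100  # hypothetical vocabulary size, used only for smoothing

counts = train_counts(train_corpus)
context = ["once", "upon", "a"]
p_time = next_token_prob(counts, context, "time", VOCAB_SIZE)

# Continuations that look like the training data...
in_distribution = [(context, "time")] * 3 + [(context, "hill")]
# ...versus the contrived case where the prompt is followed by noise tokens.
shifted = [(context, tok) for tok in ["qzx", "blorp", "vvv", "17"]]

loss_in = mean_loss(counts, in_distribution, VOCAB_SIZE)
loss_shifted = mean_loss(counts, shifted, VOCAB_SIZE)

print(f"P(time | once upon a) = {p_time:.3f}")
print(f"loss in-distribution = {loss_in:.2f}, loss after shift = {loss_shifted:.2f}")
```

Nothing inside the predictor can tell it in advance which of the two test sets it is about to face; it simply bets that new text will resemble old text.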
In this analogy, humans are like the language models, in the sense that we’re both designed to predict that the future will broadly behave the same way as the past (in a well-defined statistical learning sense). In practice, this has historically worked out well for both humans and LLMs. Both have proven well-calibrated as models of the world, giving them advantages which have resulted in their architectures being selected for replication and further improvement, by natural selection and deep learning engineers respectively.
However, if an LLM were exposed to a malicious distributional shift, or if the laws of physics a human lived under suddenly completely changed, both systems would more or less stop working. It’s impossible to rule this possibility out; indeed, it’s impossible to even prove that it’s unlikely, except by deferring to the very probabilistic systems whose calibrations would be thrown off by such a cataclysm. The best each system can do is keep working with its current learning algorithm and hope for the best.
Anyway, all of that’s to say that there’s a good chance that human priors don’t come from any bespoke mathematical process which provably achieves relatively good results in all possible universes, however we’d want to define that. There’s just some learning algorithm the brain runs, which gives us the ability to make natural-language statements about the probabilities of future events, and those statements have turned out to be reasonably well-calibrated in practice.
The fact that this works comes down to two things: the probably-inexplicable fact that we seem to live in a universe amenable enough to induction that natural selection could find an algorithm that has worked well enough historically, and whatever the implementation details of the brain’s learning algorithm actually are. I doubt that it’s a reflection of some deep insight supporting an airtight strategy for universal learning.
(I also have a more negative critique of popular theories like “our brains approximate Solomonoff induction”, which I don’t think are very enlightening or even coherent as explanations of where priors come from or ought to come from, but that seems like a topic for another post.)
A pragmatic story about where we get our priors
[1] This can be rigorously quantified using the LLM’s loss function; see this video series if you’re ignorant and curious about what that means.