Frames in context
In my previous post, I introduced meta-rationality and frames, and described some examples of frames and some of their properties. In this post I’ll outline some of the limitations of existing ways of thinking about cognition, and some of the dynamics that they can’t describe which meta-rationality can. This post (especially the second half) can be seen as a summary of the key ideas from the rest of the sequence; if you find it too dense, feel free to skip it and come back after reading the next five posts. To quickly list my main claims:
1. Unlike logical propositions, frames can’t be evaluated as discretely true or false.
2. Unlike Bayesian hypotheses, frames aren’t mutually exclusive, and can overlap with each other. This (along with point #1) means that we can’t define probability distributions of credences over frames.
3. Unlike in critical rationalism, we evaluate frames (partly) in terms of how true they are (based on their predictions) rather than just whether they’ve been falsified or not.
4. Unlike Garrabrant traders and Rational Inductive Agents, frames can output any combination of empirical content (e.g. predictions about the world) and normative content (e.g. evaluations of outcomes, or recommendations for how to act).
5. Unlike model-based policies, policies composed of frames can’t be decomposed into modules with distinct functions, because each frame plays multiple roles.
6. Unlike in multi-agent RL, frames don’t interact independently with their environment, but instead contribute towards choosing the actions of a single agent.
I’ll now explain these points in more detail. Epistemology typically focuses on propositions which can (at least in principle) be judged true or false. Traditionally, truth and knowledge are both taken as binary criteria: each proposition is either true or false, and we either know which it is or we don’t. Intuitively speaking, though, this doesn’t match very well with our everyday experience. There are many propositions which are kinda true, or which we kinda know: cats are (mostly) carnivorous (I think); Bob is tall(ish, if I’m looking at the right person); America is beautiful (in some ways, by my current tastes).
The most straightforward solution to the problem of uncertainty is to assign credences based on how much evidence we have for each proposition. This is the Bayesian approach, which solves a number of “paradoxes” in epistemology. But there’s still the question: what are we assigning credences to, if not to the proposition being discretely true or false? You might think that we can treat propositions which are “kinda true” (aka fuzzily true) as edge cases, but they’re omnipresent not only in everyday life, but also when thinking about more complex topics. Consider a scientific theory like Darwinian evolution. Darwin got many crucial things right when formulating his theory, but there were also many gaps and mistakes. So applying a binary standard of truth to the theory as a whole is futile: even though some parts of Darwin’s original theory were false or too vague to evaluate, the overall theory was much better than any other in that domain. The mental models which we often use in our daily lives (e.g. our implicit models of how bicycles work), and all the other examples of frames I listed at the beginning of this post, can also be seen as “kinda but not completely true”. (From now on I’ll use “models” as a broad term which encompasses both scientific theories and informal mental models.)
Not being “completely true” isn’t just a limitation of our current models, but a more fundamental problem. Perhaps we can discover completely true theories in physics, mathematics, or theoretical CS. But in order to describe high-level features of the real world, it’s always necessary to make simplifying assumptions and use somewhat-leaky abstractions, and so those models will always face a tradeoff between accuracy and usefulness. For example, Newtonian mechanics is less true than Einsteinian relativity, but still “close enough to true” that we use it in many cases, whereas Aristotelian physics isn’t.
I’m implicitly appealing here to the idea that we can categorize and compare “how true” different models are. This often makes intuitive sense. As Isaac Asimov put it: “When people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together.” We could also treat uncertainty over wrongness from a Bayesian perspective: e.g. we might place a high credence on our current understanding of evolution being fairly close to the truth, and low credences on it being almost entirely true or mostly false. And it also applies to more prosaic examples: e.g. it seems reasonable to expect that my mental model of a close friend Alice (including things like her appearance, life story, personality traits, etc.) is probably more accurate than my model of Alice’s partner Bob, but unlikely to be more accurate than Bob’s model of Alice.
Unfortunately, even if we accept continuous degrees of truth in theory, it turns out to be very hard to describe clearly what it means for a model to have a given degree of truth, or to be more or less true than another model. Philosophers of science have searched extensively for a principled account of this, especially in the case where two models predict all the same phenomena. But I think the point is better illustrated by comparing two models which both make accurate predictions, but where the questions they make predictions about don’t fully overlap. For example, one model might have made more accurate predictions overall, but skew towards trivial predictions. Even if we excluded those, models would still have better average accuracy if they only made predictions that they were very confident about. Concretely, imagine an economist and an environmentalist debating climate change. The economist might do much better at predicting the specific consequences of many policies; the environmentalist might focus instead on a few big predictions, like “these measures won’t keep us below 450 ppm of atmospheric carbon dioxide” or “we’ll need significant cultural or technological progress to solve climate change”, and think that most of the metrics the economist is forecasting are basically irrelevant.
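To make the selection effect concrete, here is a minimal sketch with invented numbers. The two forecasters, their question sets, and every probability are hypothetical, and the Brier score is my own stand-in for “accuracy”; nothing here is meant as a proposal for how models should actually be scored.

```python
def brier(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# Each entry is (stated probability, outcome as 0 or 1). All numbers are invented.
# The "selective" forecaster only answers questions it is nearly sure about.
selective = [(0.99, 1), (0.98, 1), (0.99, 1), (0.97, 1)]
# The "broad" forecaster answers the same first two easy questions,
# and then also takes on four hard ones.
broad = [(0.99, 1), (0.98, 1), (0.7, 1), (0.6, 0), (0.8, 1), (0.3, 0)]

selective_score = brier(selective)
broad_score = brier(broad)

# Restricted to the two questions both forecasters answered, they are identical.
selective_on_shared = brier(selective[:2])
broad_on_shared = brier(broad[:2])

print(f"selective: {selective_score:.5f}, broad: {broad_score:.5f}")
print(f"on shared questions: {selective_on_shared:.5f} vs {broad_on_shared:.5f}")
```

On the shared questions the two forecasters are indistinguishable, yet the selective one looks far better on average simply because it declined to answer anything hard. The average tells us about the choice of questions at least as much as about the quality of the model.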
In cases like these, the idea that we can compare multiple models on a single scale of “truth” is much less compelling. What alternatives do we have? One is given by critical rationalism, an epistemology developed by Karl Popper and, later, David Deutsch. Critical rationalism holds that we don’t ever have reasons to accept a theory as true; we can only ever reject theories after they’ve been falsified. However, there are at least two reasons to be skeptical of this position. Firstly, the claim that there’s no reason to believe more strongly in models which have made good predictions is very counterintuitive (as I argue more extensively in post #5). Secondly, critical rationalism is very much focused on science, where it’s common for new theories to conclusively displace old ones, sometimes on the basis of only a few datapoints. But other domains are much less winner-takes-all. In the example above, the economist and the environmentalist probably each have both some good points and also some big mistakes or oversights. We’re not primarily trying to discard the one that’s less true, but rather to form reasonable judgments by taking both perspectives into account.
Garrabrant induction provides a framework for doing so, by thinking of models as traders on a prediction market, who buy and sell shares in a set of propositions. Traders accrue wealth by “beating the market” in predicting which propositions will turn out to be true. Intuitively speaking, this captures the idea that successful predictions should be rewarded in proportion to how surprising they are, meaning that it matters less what the specific questions are, since a trader only makes money by beating existing predictions, and makes more money the more it beats them by. A trader’s wealth can therefore be seen as an indicator of how true it is compared with the other models it’s trading against.[1]
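The intuition that a trader only profits by beating existing predictions can be sketched with a toy market. This is not the actual Garrabrant induction construction, which is considerably more involved. The function name `run_market`, the pricing rule (a wealth-weighted average of beliefs), the payoff rule (wealth multiplied by the ratio of the trader’s probability to the market price), and all the numbers are assumptions I’ve chosen for simplicity.

```python
def run_market(beliefs, outcomes, start_wealth=1.0):
    """Toy wealth-weighted prediction market (an illustrative assumption, not Garrabrant induction).

    beliefs:  {trader: {question: probability strictly between 0 and 1}};
              a missing question means the trader abstains.
    outcomes: {question: True/False}, resolved in iteration order.
    """
    wealth = {trader: start_wealth for trader in beliefs}
    for question, outcome in outcomes.items():
        active = {t: b[question] for t, b in beliefs.items() if question in b}
        if not active:
            continue
        staked = sum(wealth[t] for t in active)
        # Market price = wealth-weighted average of the active traders' beliefs.
        price = sum(wealth[t] * p for t, p in active.items()) / staked
        for t, p in active.items():
            # A trader gains only to the extent that its belief beat the market price.
            wealth[t] *= p / price if outcome else (1 - p) / (1 - price)
    return wealth


# Hypothetical traders: trader_a answers three questions, trader_b only two.
beliefs = {
    "trader_a": {"q1": 0.9, "q2": 0.8, "q3": 0.5},
    "trader_b": {"q3": 0.9, "q4": 0.7},
}
outcomes = {"q1": True, "q2": True, "q3": True, "q4": True}
wealth = run_market(beliefs, outcomes)
print(wealth)
```

In this sketch trader_a earns nothing from q1 and q2, even though it was right, because nobody disagreed with it and so it never beat the price. All the movement in wealth comes from q3, the one question where the two traders’ beliefs diverge. Total wealth among the traders is conserved; it is only redistributed towards whoever was more right than the market.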
Garrabrant traders capture many key intuitions about how frames work, but unlike most of the examples of frames above, they don’t directly influence actions, only beliefs. In other words, the traders in a classic Garrabrant inductor only trade on epistemic claims, not normative ones. However, a variant of Garrabrant inductors called Rational Inductive Agents (RIAs) has been defined where the traders make bids on a decision market rather than a prediction market. My current best guess for how to formalize frames is as a combination of Garrabrant inductors, RIAs, and some kind of voting system: traders would earn wealth from a combination of making predictions and directly influencing decisions, and then spend that wealth to determine which values should be used as evaluation criteria for the decisions (as I’ll explore in post #6; see also the diagram below).
The idea of a single system which makes predictions, represents values, and makes decisions is common in the field of model-based reinforcement learning. Model-based RL policies often use a world-model to predict how different sequences of actions will play out, and then evaluate the quality of the resulting trajectories using a reward model (or some other representation of goals or values). However, crucially, these model-based policies are structured so that each module has a distinct role—as in the (highly simplified) diagram on the left. By contrast, a (hypothetical) frame-based policy would be a type of ensemble where each frame has multiple different types of output—as in the diagram on the right.[2] This distinction is important because in general we can’t cleanly factorize a frame into different components. In particular, both a frame’s empirical and its normative claims typically rely on the same underlying ontology, so that it’s hard to change one without changing the other.
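The structural contrast can be sketched in code, with heavy caveats. Everything below is hypothetical: the `Opinion` and `Frame` types, the two example frames, and especially the aggregation rule (weighted means of predictions and values, multiplied together) are assumptions made only so that the example runs. A sketch this small also can’t capture the point about a frame’s outputs sharing an underlying ontology.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def model_based_policy(world_model, reward_model, state, actions):
    """Modular design: one module predicts, a separate one evaluates."""
    return max(actions, key=lambda a: reward_model(world_model(state, a)))


@dataclass
class Opinion:
    """What a frame says about an action; either field may be left blank."""
    prediction: Optional[float] = None  # probability the action works as intended
    value: Optional[float] = None       # how good the resulting outcome would be


@dataclass
class Frame:
    name: str
    weight: float
    opine: Callable[[int, str], Opinion]  # (state, action) -> Opinion


def weighted_mean(pairs):
    """Weighted mean over (weight, x) pairs, skipping abstentions (x is None)."""
    kept = [(w, x) for w, x in pairs if x is not None]
    total = sum(w for w, _ in kept)
    return sum(w * x for w, x in kept) / total if total else None


def frame_based_policy(frames, state, actions):
    """Ensemble design: every frame may contribute both empirical and normative output."""
    def score(action):
        opinions = [(f.weight, f.opine(state, action)) for f in frames]
        p = weighted_mean((w, o.prediction) for w, o in opinions)
        v = weighted_mean((w, o.value) for w, o in opinions)
        # Assumption: if no frame has an opinion, fall back to neutral defaults.
        return (0.5 if p is None else p) * (0.0 if v is None else v)
    return max(actions, key=score)


def make_frames(caution_weight):
    # Hypothetical frames: one predicts and evaluates, the other only evaluates.
    return [
        Frame("cost-benefit", 2.0,
              lambda s, a: Opinion(0.9, 1.0) if a == "step" else Opinion(0.5, 0.2)),
        Frame("caution", caution_weight,
              lambda s, a: Opinion(value=-1.0 if a == "step" else 0.5)),
    ]


actions = ["step", "wait"]
modular_choice = model_based_policy(
    lambda s, a: s + 1 if a == "step" else s, float, 0, actions)
bold_choice = frame_based_policy(make_frames(1.0), 0, actions)
cautious_choice = frame_based_policy(make_frames(3.0), 0, actions)
print(modular_choice, bold_choice, cautious_choice)
```

In the modular policy you could swap out the reward model without touching the world model. In the ensemble there is no such seam: changing the weight on a single frame shifts predictions and evaluations together wherever that frame has opinions, which is why raising the weight on the “caution” frame is enough to flip the chosen action.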
The way I’ve portrayed frames here makes them seem in some ways like agents in their own right—which suggests that we could reason about the interactions between them using concepts from multi-agent RL, economics, politics, and other social sciences. I do think this is a useful perspective on frames, and will draw on it more strongly in the second half of this sequence. But a first-pass understanding of frames should focus on the fact that they’re controlling the predictions, plans and values of a single agent, rather than being free to act independently.[3]
As a final point: this section focused on ways in which frames are more general than concepts used in other epistemologies. However, the broader our conception of frames, the harder it is to identify how meta-rationality actually constrains our expectations or guides our actions. So this post should be seen mainly as an explanation for why there’s potentially a gap to be filled—but the question of whether meta-rationality actually fills that gap needs to be answered by subsequent posts.
[1] It may seem weird for this indicator to depend on the other models that already exist, but I think that’s necessary if we lack a way to compare models against each other directly.
[2] Note that in this diagram not all of the frames are weighing in on all three of predictions, plans and values. This is deliberate: as I’ve defined frames, they don’t need to have opinions on everything.
[3] This helps reconcile meta-rationality with Feyerabend’s epistemological anarchism, which claims that there are no rules for how scientific progress should be made. They are consistent in the sense that “anything goes” when constructing frames: frame construction often happens in chaotic or haphazard ways (as I’ll explore further in post #4). But meta-rationality then imposes restrictions on what frames are meant to do, and how we evaluate them.